
Methods and Tools, and Building Blocks

Methods and tools

Deploying neural network applications at the Edge is more challenging than in the Cloud, due to limited hardware resources such as computing power and memory capacity. Additionally, edge devices are power-constrained because they mostly run on batteries. AI algorithms must therefore be optimized for particular hardware targets: number of parameters, quantization of these parameters, etc. Conversely, the hardware accelerators must themselves be optimized for the tasks to be executed: their architectural specialization differs when processing still images versus sequential temporal inputs.
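As a concrete illustration of this kind of algorithm-side optimization, the sketch below applies symmetric 8-bit post-training quantization to a weight tensor with NumPy. The layer size, scale choice and bit width are illustrative assumptions only, not a description of any specific ANDANTE tool.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, n_bits: int = 8):
    """Symmetric, per-tensor post-training quantization of a weight matrix."""
    q_max = 2 ** (n_bits - 1) - 1                  # 127 for 8 bits
    scale = np.max(np.abs(weights)) / q_max        # one scale for the whole tensor
    q_weights = np.clip(np.round(weights / scale), -q_max, q_max).astype(np.int8)
    return q_weights, scale

# Illustrative fully connected layer: 256 inputs, 64 outputs (arbitrary sizes).
rng = np.random.default_rng(0)
w_fp32 = rng.normal(scale=0.05, size=(64, 256)).astype(np.float32)

w_int8, s = quantize_symmetric(w_fp32)
w_dequant = w_int8.astype(np.float32) * s          # reconstruct to measure error
print("max abs quantization error:", np.max(np.abs(w_fp32 - w_dequant)))
```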

This is why a hardware-algorithm co-development methodology is necessary. Design teams, composed of machine-learning and chip-design experts, must have access to a common framework and dedicated tools to explore optimization opportunities at multiple levels.

On the hardware side, the designers must be able to answer the following questions:

  • What is the optimal compute array size?
  • What is the best memory hierarchy?
  • What degree of reconfigurability is needed?
  • Should the implementation be analog, digital, or a combination of both? (A minimal exploration sketch follows this list.)
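
The early design-space exploration implied by these questions can be sketched as a brute-force sweep of candidate configurations against a cost model. The design knobs and the cost formula below are invented for illustration; a real flow would calibrate them against RTL, SPICE or silicon measurements.

```python
from itertools import product

# Hypothetical candidate values for a few accelerator design knobs.
array_sizes = [(16, 16), (32, 32), (64, 64)]   # MAC array rows x columns
buffer_kib  = [64, 128, 256]                   # on-chip buffer size (KiB)
domains     = ["digital", "analog"]            # implementation style

def cost(array, buf, domain):
    """Toy cost model trading silicon area against data-movement energy.
    The coefficients are invented for illustration only."""
    rows, cols = array
    area   = rows * cols * (0.8 if domain == "analog" else 1.0) + 0.5 * buf
    energy = 1000.0 / buf + (50.0 if domain == "digital" else 30.0) * 256 / (rows * cols)
    return area + 10.0 * energy            # single weighted objective

best = min(product(array_sizes, buffer_kib, domains), key=lambda c: cost(*c))
print("lowest-cost configuration:", best, "cost:", round(cost(*best), 2))
```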

On the algorithm side, the questions that will need to be answered are:

  • What is the best quantization ratio?
  • Is a mixed quantization scheme necessary?
  • How to cope with device variability in the case of an analog implementation? (A small simulation sketch follows this list.)
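
One common way to study these questions is to sweep the weight bit width and inject noise into the weights to mimic analog device variability, then observe the output error. The tiny single-layer example and the 5% multiplicative noise model below are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(w, n_bits):
    """Uniform symmetric quantization to n_bits, kept in float for simulation."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / q_max
    return np.round(w / scale).clip(-q_max, q_max) * scale

def with_device_variability(w, sigma=0.05):
    """Mimic analog conductance spread as multiplicative Gaussian noise (assumed model)."""
    return w * rng.normal(1.0, sigma, size=w.shape)

# Toy single-layer 'network' and input batch; sizes chosen arbitrarily.
w = rng.normal(scale=0.1, size=(32, 128))
x = rng.normal(size=(128, 10))
reference = w @ x

for bits in (8, 6, 4, 2):
    w_noisy = with_device_variability(quantize(w, bits))
    error = np.linalg.norm(w_noisy @ x - reference) / np.linalg.norm(reference)
    print(f"{bits}-bit weights + variability: relative output error = {error:.3f}")
```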

In the ANDANTE project, a great deal of effort was devoted to tool suites. Nineteen tools were developed to train and optimize both classically encoded (ANN) and spike-encoded (SNN) neural networks, to map these networks to different hardware accelerators, and to generate RTL code, mixed-signal IPs, the necessary compilers, etc. Eleven of those tools were released to the community as open source.

These tools played a key role in the design of hardware accelerators during the project, with several first-silicon successes. They were also essential to correctly map application algorithms onto these accelerators.

Workflow for embedded Neural Networks

AI Building Blocks

When running AI applications at the edge, the number-one priority is to reduce power consumption. In integrated circuits, most of the energy is spent moving data rather than performing calculations: the further away the data is, the higher the energy cost. This is why this project focused on exploiting new integrated non-volatile memories (NVM) to store all the parameters of a neural network on-chip, thus eliminating the need for external memories.
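
A back-of-the-envelope estimate makes the point. The per-access energy figures used below are order-of-magnitude values commonly quoted in the literature for roughly 45 nm technology (hundreds of pJ for an off-chip DRAM access versus a few pJ for an on-chip access or a MAC operation); they are illustrative assumptions, not measurements from this project.

```python
# Order-of-magnitude energy per 32-bit access/operation, adapted from commonly
# cited ~45 nm estimates. Illustrative assumptions, not values measured in ANDANTE.
ENERGY_PJ = {
    "off_chip_dram_access":  640.0,
    "on_chip_memory_access":  10.0,
    "mac_operation":           4.0,
}

def layer_energy_uj(n_macs, weight_source):
    """Coarse estimate: compute energy plus one weight fetch per MAC (no reuse)."""
    pj = n_macs * (ENERGY_PJ["mac_operation"] + ENERGY_PJ[weight_source])
    return pj / 1e6

macs = 256 * 256   # one 256x256 fully connected layer
print("weights fetched from external DRAM:", layer_energy_uj(macs, "off_chip_dram_access"), "uJ")
print("weights stored on-chip (e.g. NVM): ", layer_energy_uj(macs, "on_chip_memory_access"), "uJ")
```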

These embedded NVMs can be used either as traditional storage devices or as computational memories. In the first case, the design of the memory macros is classic, using single-valued binary cells (bitcells). In the latter case, multi-valued or continuous-valued bitcells are used to perform analog multiplications.

In both cases, however, dedicated IP blocks must be designed to program and read these bitcells. For instance, the read mechanism of a ferroelectric memory cell (such as FeRAM technology) is different from that of a resistive memory cell (such as OxRAM or PCRAM technologies).

Other technological building blocks have been developed, particularly in the mixed-signal domain: neurons, buffers, analog-to-digital converters, digital-to-analog converters, etc.

Analog compute-in-memory (ACiM): mapping a DNN layer to an analog matrix-vector multiply (MVM). The matrix-vector product is calculated in the analog domain using Ohm's and Kirchhoff's laws, where each weight is encoded by a differential conductance pair.
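
The caption above can be turned into a small numerical sketch: each weight is encoded as a differential pair of conductances, the input activations are applied as voltages, Ohm's law (I = G·V) gives a current per cell, and Kirchhoff's current law sums these currents along each output line. The conductance range and the variability model below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

G_MAX = 100e-6   # assumed maximum programmable conductance (100 uS), illustrative

def weights_to_conductance_pairs(w):
    """Encode each weight as a differential pair: w is proportional to G_plus - G_minus."""
    scale = G_MAX / np.max(np.abs(w))
    g_plus  = np.where(w > 0,  w * scale, 0.0)
    g_minus = np.where(w < 0, -w * scale, 0.0)
    return g_plus, g_minus, scale

def analog_mvm(g_plus, g_minus, v_in, sigma=0.02):
    """Ohm's law per cell (I = G * V) and Kirchhoff summation along each output line,
    with multiplicative noise standing in for device variability (assumed model)."""
    noisy = lambda g: g * rng.normal(1.0, sigma, size=g.shape)
    return noisy(g_plus) @ v_in - noisy(g_minus) @ v_in   # differential read-out

w = rng.normal(scale=0.1, size=(16, 64))   # one small DNN layer, sizes arbitrary
x = rng.uniform(0.0, 0.2, size=64)         # input activations applied as voltages

g_p, g_m, scale = weights_to_conductance_pairs(w)
y_analog = analog_mvm(g_p, g_m, x) / scale  # rescale output currents back to weight units
y_exact  = w @ x
print("relative error vs. ideal MVM:", np.linalg.norm(y_analog - y_exact) / np.linalg.norm(y_exact))
```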