Advanced Neuromorphic Processors for IoT Edge Solutions
Deep neural network (DNN) algorithms now achieve high performance in many applications, such as autonomous driving, smart health, smart home, and smart farming. However, these algorithms require high computational power for both training and inference. High-performance DNN acceleration has largely been dominated by cloud platforms running NVIDIA GPUs and Google TPUs, and the general trend has been to offer flexibility and performance for a wide range of DNN applications, without major consideration of power consumption. Since the performance requirements for cloud computing differ from those for edge or near-sensor computing, new architectures and circuits need to be explored. The trend for edge and smart-sensor AI solutions is therefore to leverage inference accelerators for tiny neural network models, which offer low energy consumption, high throughput, and low latency, making it possible to move processing close to the sensor and sensor nodes.
Inference accelerators optimize data movement and the computation of vector-matrix multiplications. Data movement, even on-chip, is a major contributor to the overall energy consumption of such ICs. In the ANDANTE project, the designed inference accelerators have an energy consumption in the range of microjoules for the use cases addressed and a power consumption in the range of milliwatts, which is roughly 1000 times lower than the values provided by their FPGA counterparts.
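To make the data-movement argument concrete, the following minimal sketch counts how often weights are read while computing the same batch of vector-matrix products in two ways: re-reading each weight for every multiply-accumulate (MAC), and loading each weight once and reusing it for all inputs. The function names and the simple fetch-counting model are assumptions made for illustration; they do not describe any specific ANDANTE circuit.

```python
# Illustrative sketch only: the function names and the fetch-counting model
# are assumptions for this example, not part of any ANDANTE design.

def vmm_naive(weights, inputs):
    """Compute y = W x for each input, re-reading every weight for every MAC."""
    weight_fetches = 0
    outputs = []
    for x in inputs:
        y = []
        for row in weights:
            acc = 0
            for w, xi in zip(row, x):
                weight_fetches += 1  # weight read from memory for this MAC
                acc += w * xi
            y.append(acc)
        outputs.append(y)
    return outputs, weight_fetches


def vmm_weight_reuse(weights, inputs):
    """Same result, but each weight is loaded once and reused for all inputs."""
    weight_fetches = 0
    outputs = [[0] * len(weights) for _ in inputs]
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            weight_fetches += 1  # weight loaded once, then held locally
            for b, x in enumerate(inputs):
                outputs[b][r] += w * x[c]
    return outputs, weight_fetches


weights = [[1, 2], [3, 4]]
inputs = [[1, 0], [0, 1], [1, 1]]
naive_out, naive_fetches = vmm_naive(weights, inputs)
reuse_out, reuse_fetches = vmm_weight_reuse(weights, inputs)
print(naive_out == reuse_out, naive_fetches, reuse_fetches)  # prints: True 12 4
```

Both functions perform the same 12 MACs, but the second reads each of the 4 weights only once. In this toy model the number of weight fetches no longer grows with the number of inputs, which is the kind of saving a reuse-oriented dataflow aims for.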
The contribution of the ANDANTE project to the topic of NN inference accelerators covers:
Digital spiking NNs: Spiking neural network (SNN) inference accelerators use spikes as their data representation. In ANDANTE, two designs were carried out: 1) The University of Zurich designed an SNN core with on-chip learning. The choice of spikes is motivated not only by their inherent low-power characteristics but also by their seamless integration with spiking retina and cochlea sensors for task-agnostic in-sensor computing. 2) SynSense designed a front end for audio use cases. Low-power neuromorphic hardware can move processing close to the sensor and sensor nodes; for maximum efficiency, this requires front ends that connect sensors directly to neuromorphic processors.
Digital deep NNs: Digital accelerators compute the MAC operations using many processing elements that operate in parallel. These designs reduce data movement through dataflows that keep input data, activations, and/or weights stationary. In ANDANTE, three designs were carried out: 1) STMicroelectronics extended the STM32 MCU family with a dedicated AI accelerator co-processor that targets the mass market, is therefore usable with limited knowledge in the field of AI, and operates as autonomously as possible. 2) CEA's design, NeuroCorgi, is a feature extraction circuit intended for image classification, segmentation, and detection applications. This circuit mainly aims to minimize the energy required for inference in ANN applications at the edge while keeping latency extremely low. 3) CSEM's design, a neural computing engine, accelerates NNs for smart vision applications in digital-life use cases. The engine is integrated into a complete SoC (Visage2) with a processor and general-purpose peripherals.
Analog/mixed-signal deep NN circuit-based designs: The following three designs are mixed-signal ICs that leverage analog in-memory computing using crossbar architectures. This approach inherently avoids data movement, since the weights are stored where the vector-matrix multiplication is computed. 1) Fraunhofer IIS, in cooperation with Fraunhofer EMFT, designed the ADELIA Gen2 circuit, a mixed-signal multi-core inference accelerator using SRAM-based in-memory computing, implemented in 22 nm FDSOI. The design has 6 analog processing units (APUs) supporting convolutional layers and digital on-chip communication. 2) Fraunhofer IPMS designed a flexible SoC for convolutional neural network applications, centered around a FeFET-based analog in-memory computing accelerator with a crossbar architecture. The design integrates the multiply-accumulate accelerator with a RISC-V microcontroller. 3) Infineon designed an analog NN (aNN) circuit for tinyML applications. A central element of the aNN ASIC is RRAM, which offers excellent scalability, high speed, long endurance, and low power consumption together with a simple structure and CMOS compatibility. This ASIC was evaluated in a color recognition application for a limited range of colors.
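The crossbar principle mentioned above can be sketched in a few lines. In an idealized crossbar, each weight is stored as a conductance at a row/column crossing, inputs are applied as row voltages, and each column current is the sum of conductance times voltage, so the multiply-accumulate happens where the weights are stored. The sketch below is a simplified model under stated assumptions: it ignores noise, wire resistance, and device variation, uses arbitrary units, and adds a hypothetical uniform quantizer to stand in for the conversion back to digital values. None of the names or parameters are taken from the ADELIA, FeFET, or RRAM designs.

```python
# Idealized crossbar model; all names, units, and the quantizer are
# assumptions for illustration, not details of any ANDANTE circuit.

def crossbar_vmm(conductances, voltages):
    """Return column currents I_j = sum_i G[i][j] * V[i] (Ohm + Kirchhoff)."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            currents[j] += g * v  # each cell contributes G * V to its column
    return currents


def quantize(current, full_scale, bits):
    """Hypothetical uniform converter: clip to [0, full_scale], map to a code."""
    levels = (1 << bits) - 1
    clipped = min(max(current, 0.0), full_scale)
    return round(clipped / full_scale * levels)


# Weights stored as conductances (rows = inputs, columns = outputs).
G = [[1.0, 0.5],
     [0.0, 2.0],
     [1.5, 1.0]]
V = [0.2, 0.1, 0.4]  # input activations applied as row voltages

currents = crossbar_vmm(G, V)
# Digital reference: the same vector-matrix product computed explicitly.
reference = [sum(G[i][j] * V[i] for i in range(len(V))) for j in range(len(G[0]))]
codes = [quantize(i_col, full_scale=1.0, bits=4) for i_col in currents]
print(currents, reference, codes)
```

The column currents are approximately 0.8 and 0.7 in these arbitrary units and match the explicit product. In a real mixed-signal design, the precision of the conversion step and the analog non-idealities left out here determine how closely the result tracks the digital reference.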
Most of these neuromorphic accelerators have been integrated into 5 different edge platforms and validated in 19 use cases across 5 application domains, as illustrated in the next figure.
Neuromorphic HW accelerators and platforms developed for the 14 ANDANTE use cases