
Seamless integration of brain-inspired AI processing? Yes, we can!

Neuromorphic (brain-inspired) architectures claim to drastically reduce the power consumption and latency of AI processing in edge devices. Why, then, have these technologies not reached mainstream products? Neuromorphic AI has so far depended on non-mainstream technologies, such as memristive devices and spiking neural networks (SNNs). As a result, current neuromorphic devices do not benefit from the latest silicon nodes and machine learning models. ANDANTE member GrAI Matter Labs (GML) shows that the benefits of brain-inspired AI can be obtained with fully digital architectures, while leveraging advanced convolutional neural networks (CNNs). GML's digital design uses TSMC's 12FFC CMOS node, which opens the door to yet more power and performance gains. GML employs brain-inspired digital architecture concepts: (i) massively parallel processing and (ii) near-memory compute, in combination with (iii) a novel event-based dataflow and (iv) radical exploitation of sparsity.

GML's NeuronFlow(TM) architecture exploits all types of sparsity and thereby avoids 95% of neural network computations. NeuronFlow processes SparNets, in which the synaptic activity consists of sparse delta-value events. These events are considered sparse because they occur at a much lower frequency than the non-valued spikes in comparable SNNs, or the activations in comparable ANNs. GML's simulations show that the average frequency of SparNet events is significantly lower than the frequency of activations in regular ANNs. In Fig. 1 below, measured on an automotive dataset, the difference between normal (dense) inference and average sparse inference is 16.6x.

Fig. 1: ANN vs SparNet operations per inference
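To make the operation-count comparison of Fig. 1 concrete, the short Python sketch below contrasts a regular dense layer with a delta-event version of the same layer on a synthetic, slowly changing input stream. The layer sizes, the 0.1 delta threshold and the input statistics are illustrative assumptions, not GML figures; the actual reduction depends entirely on the data and the network.

```python
# Illustrative sketch only: a toy comparison of dense vs delta-event processing
# for a single fully connected layer. Layer sizes, the 0.1 threshold and the
# synthetic "video" input are made-up assumptions, not GML parameters.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_frames = 256, 128, 50
W = rng.normal(size=(n_out, n_in))

# Synthetic slowly changing input stream (e.g. consecutive video frames).
frames = [rng.normal(size=n_in)]
for _ in range(n_frames - 1):
    frames.append(frames[-1] + 0.05 * rng.normal(size=n_in))

THRESHOLD = 0.1          # minimum |delta| that triggers an input event
dense_macs = 0           # multiply-accumulates for regular ANN inference
sparse_macs = 0          # multiply-accumulates when only deltas are processed

prev_in = np.zeros(n_in)     # input values last "sent" as events
state = np.zeros(n_out)      # accumulated neuron state (membrane-like value)

for x in frames:
    # Dense ANN: every input drives every output, every frame.
    dense_macs += n_in * n_out

    # Delta-event processing: only inputs whose change exceeds the threshold
    # emit an event; each event updates the output state incrementally.
    delta = x - prev_in
    events = np.abs(delta) > THRESHOLD
    state += W[:, events] @ delta[events]
    prev_in[events] = x[events]
    sparse_macs += int(events.sum()) * n_out

print(f"dense MACs:  {dense_macs}")
print(f"sparse MACs: {sparse_macs}")
print(f"reduction:   {dense_macs / max(sparse_macs, 1):.1f}x")
```

On slowly changing inputs only a small fraction of values cross the threshold in each frame, so the multiply-accumulate count drops roughly in proportion to the event rate, which is the effect Fig. 1 quantifies for a real automotive workload.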

This sparsity is obtained by only generating events when the accumulated synaptic activity has a sufficiently large impact (delta) on the membrane value of the affected neurons. This delta-based event suppression starts at the input layers of the network, which only generate input events when the input values show a sufficiently large delta compared to the previous input values. This effect, both on the inputs and on the activations in subsequent layers, is shown in Fig. 2 below.

Fig. 2: Neural Network activations, ANN vs SparNet
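The sketch below illustrates the per-neuron suppression mechanism described above: each neuron accumulates incoming deltas into a membrane-like value and only emits an event to the next layer once that value has drifted far enough from the value it last reported. This is a conceptual sketch, not GML's NeuronFlow implementation; the DeltaLayer class, the 0.1 threshold and the layer sizes are hypothetical.

```python
# Illustrative sketch only: delta-based event suppression inside one layer.
# A neuron accumulates incoming deltas into a membrane-like value and emits
# an output event only when that value has moved far enough from the value
# it last propagated downstream. Sizes and threshold are assumptions.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class DeltaLayer:
    weights: np.ndarray                 # (n_out, n_in) synaptic weights
    threshold: float = 0.1              # minimum drift that triggers an event
    membrane: np.ndarray = field(init=False)   # accumulated neuron state
    last_sent: np.ndarray = field(init=False)  # value last emitted downstream

    def __post_init__(self):
        n_out = self.weights.shape[0]
        self.membrane = np.zeros(n_out)
        self.last_sent = np.zeros(n_out)

    def step(self, in_events: dict) -> dict:
        """Apply sparse input events {input_index: delta_value} and return
        the sparse output events {neuron_index: delta_value} they cause."""
        # Accumulate the effect of each input delta on all membranes.
        for idx, delta in in_events.items():
            self.membrane += self.weights[:, idx] * delta

        # Emit events only for neurons whose membrane drifted far enough
        # from the value that was last propagated to the next layer.
        drift = self.membrane - self.last_sent
        fire = np.abs(drift) > self.threshold
        out_events = {int(i): float(drift[i]) for i in np.where(fire)[0]}
        self.last_sent[fire] = self.membrane[fire]
        return out_events

# Usage: feed a few sparse input events and see how many propagate onward.
rng = np.random.default_rng(1)
layer = DeltaLayer(weights=0.05 * rng.normal(size=(64, 32)))
events = {3: 0.4, 17: -0.2}            # only two inputs changed this step
print(len(layer.step(events)), "of 64 neurons emitted an event")
```

Chaining such layers gives the behaviour sketched in Fig. 2: small changes at the input are absorbed early, and only sufficiently large activation changes propagate deeper into the network.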

In the ANDANTE project, GML shows that, using NeuronFlow technology, its GrAI VIP chip performs end-to-end processing of a diverse set of CNN-based algorithms, e.g. path planning for autonomous steering of vehicles in use case 3.5, and detection of Corona in medical ultrasound image acquisition or processing in use case 4.2. In accordance with the ANDANTE specifications, GrAI VIP executes these high-performance tasks within a power budget of less than 2 Watts. This allows the integration of such advanced AI algorithms in a wide range of low-power edge devices.

For additional information, contact: mlindwer@snapchat.com