The ALSEAMAR SeaExplorer glider is a long-range, low-energy underwater drone. This UUV carries an acoustic payload over several weeks and tens of kilometres while performing on-board acoustic signal processing for a wide range of applications, such as maritime traffic monitoring, ambient noise monitoring, classification for underwater warfare, and the classification and inventory of marine mammals and fish.
When the glider surfaces, the amount of data that can be transmitted is limited, and high-sample-rate raw audio cannot be sent. Depending on the application, it is therefore important to analyze the acoustic data on board and transmit only the relevant information. For the classification task, the detection probability of each class (for example: dolphin, boat, sperm whale, etc.) could be sent. Integrated signal processing and very low-power AI are then necessary. Two approaches are studied:
Video classification:
This approach is based on vision processing (spectrogram representations) and deep learning, using a CNN (Convolutional Neural Network) hardware accelerator named NeuroCorgi that supports the MobileNetV1 model. A custom head network is added on top and trained on the ALSEAMAR dataset (Fig. 2 a, top) using the CEA LIST Aidge (formerly N2D2) simulation framework. The advantage of this approach is that it pairs an optimized computer-vision backbone, yielding a very low-power, low-latency circuit design, with a light custom head network that adapts the model to a specific use case.
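To illustrate the split between the fixed backbone and the trainable head, the sketch below models the hardware-accelerated MobileNetV1 as a frozen feature extractor and attaches a small trainable head. This is a minimal PyTorch sketch: the class list, feature-map sizes, and layer choices are assumptions for illustration, not the actual ALSEAMAR/NeuroCorgi configuration.

```python
import torch
import torch.nn as nn

# The NeuroCorgi circuit runs a frozen MobileNetV1 backbone in hardware and
# returns its feature maps; only the small head below needs to be trained.
# Sizes are illustrative: a width-1.0 MobileNetV1 at 224x224 input ends in
# 1024 channels on a 7x7 spatial grid.

NUM_CLASSES = 4  # hypothetical, e.g. dolphin / boat / sperm whale / noise

class CustomHead(nn.Module):
    """Lightweight classification head trained on top of frozen features."""
    def __init__(self, in_channels: int = 1024, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # collapse spatial grid
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = self.pool(features).flatten(1)
        return self.fc(x)                            # per-class logits

# Stand-in for features returned by the accelerator for a batch of
# 8 spectrogram images.
features = torch.randn(8, 1024, 7, 7)
head = CustomHead()
probs = head(features).softmax(dim=1)  # detection probability per class
```

Freezing the backbone keeps all heavy computation on the fixed-function accelerator; only the head's comparatively few parameters need retraining when the model is adapted to a new use case.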
Audio classification:
This approach is based on time series and an SNN (Spiking Neural Network) architecture. The implemented SNN is the SynNet architecture designed and developed by SynSense (Fig. 2 a, bottom). The SNN architecture leads to ultra-low power consumption and low latency. It was trained on the ALSEAMAR dataset using the SynSense simulation framework Rockpool and the Xylo Audio 2 hardware development kit.
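A minimal sketch of how such a network could be instantiated and evaluated with Rockpool is shown below. The channel count, class count, spike-raster shape, and input rate are assumptions for illustration, not the deployed ALSEAMAR configuration, and the constructor arguments follow Rockpool's public `SynNet` API.

```python
import torch
from rockpool.nn.networks import SynNet

# Illustrative configuration: 16 input spike channels (e.g. from an audio
# filterbank front-end) feeding a SynNet classifier. The sizes used on the
# ALSEAMAR task are not public; these values are placeholders.
net = SynNet(
    n_channels=16,  # input spike channels
    n_classes=4,    # e.g. dolphin / boat / sperm whale / noise
)

# Forward pass on a batch of spike rasters: (batch, time steps, channels).
# Rockpool modules return (output, state, record) when called.
spikes = (torch.rand(8, 100, 16) < 0.1).float()
out, _, _ = net(spikes)

# Decide the class from the readout activity accumulated over time.
pred = out.sum(dim=1).argmax(dim=1)
```

Because the network is simulated in Rockpool, the same trained parameters can then be mapped onto the Xylo Audio 2 development kit for on-device inference.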
For both approaches, training and testing were conducted on ALSEAMAR data with the respective simulation frameworks, leading to promising results. Fig. 1 shows the data processing flow.
Although the scores obtained are quite high (>90% accuracy), the results obtained on non-annotated raw data do not seem very robust. This can be explained by the fact that the signals used for training have a high SNR. A simple way to address this problem is to augment the dataset by adding random noise to the original signals. However, it is more relevant to shape such white noise with a mean spectrum filter. With this filtering process, 3 new files are generated for each file in the dataset, each with a randomly chosen noise coefficient. The results obtained on the augmented test dataset are slightly lower (~85% accuracy), but robustness is increased. Examples of data augmentation and the resulting confusion matrix are shown in Fig. 2 b, top and bottom, respectively.
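The augmentation described above can be sketched as follows. This is a minimal NumPy sketch of the idea (white noise spectrally shaped by the dataset's mean spectrum, then added with a random coefficient); the function names, FFT size, and coefficient range are illustrative assumptions, not ALSEAMAR's exact implementation.

```python
import numpy as np

def mean_spectrum(signals, n_fft=1024):
    """Average magnitude spectrum over the dataset (the 'mean spectrum filter').
    For simplicity, only the first n_fft samples of each file are used."""
    spectra = [np.abs(np.fft.rfft(s[:n_fft], n_fft)) for s in signals]
    return np.mean(spectra, axis=0)

def augment(signal, mean_spec, n_copies=3, rng=None):
    """Generate n_copies noisy versions of `signal`.

    White noise is filtered so its spectrum follows the dataset's mean
    spectrum, then added with a randomly chosen coefficient, lowering the
    SNR of the training examples in a realistic way.
    """
    rng = rng or np.random.default_rng()
    n = len(signal)
    # Interpolate the dataset mean spectrum onto this signal's FFT bins.
    spec = np.interp(np.linspace(0.0, 1.0, n // 2 + 1),
                     np.linspace(0.0, 1.0, len(mean_spec)),
                     mean_spec)
    out = []
    for _ in range(n_copies):
        white = rng.standard_normal(n)
        shaped = np.fft.irfft(np.fft.rfft(white) * spec, n)
        coeff = rng.uniform(0.1, 1.0)  # illustrative coefficient range
        out.append(signal + coeff * shaped)
    return out

# Example: three noisy variants of each file in a toy dataset.
dataset = [np.random.randn(48000) for _ in range(10)]
spec = mean_spectrum(dataset)
augmented = [v for s in dataset for v in augment(s, spec)]
```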
Finally, these two classification approaches were integrated into different hardware platforms:
- Video classification approach: CNN accelerator NeuroCorgi circuit provided by CEA LIST, connected to a Xilinx Kria KV260.
- Audio classification approach: SNN Xylo Audio development kit provided by SynSense.
For additional information, contact: mmichau@alseamar-alcen.com