FPGA-based Analog VLSI Neural Network Emulator – Daniel Herrera

This work presents an \ac{FPGA}-based emulator that reproduces the behavior of analog adaptive circuits in \ac{VLSI}. Its arithmetic units are emulated using data from laboratory measurements. Built on reconfigurable hardware, the emulator runs up to 9.44 times faster than a software-based solution, with power consumption as low as 100~mW. In contrast to the software solution, the emulation achieves a resolution of 0.5\% error in the output range.

The simplest neural network consists of a single neuron with adaptable synaptic weights, known as the perceptron. The output of this network is a linear combination of the inputs with the synaptic weights. The synaptic weights are updated by the \ac{LMS} algorithm, which computes an error signal by comparing the reference from a training set against the output produced by the network, and then moves the weights toward the solution vector in steps scaled by the learning rate.
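The update rule described above can be illustrated with a minimal software sketch. It is not the emulator's implementation; the function name `lms_train`, the learning rate, and the synthetic training set are assumptions chosen only for illustration.

```python
import random

def lms_train(samples, n_inputs, learning_rate=0.05, epochs=50):
    """Train a single linear neuron with the LMS rule (illustrative sketch)."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            # Network output: linear combination of inputs and weights.
            output = sum(w * xi for w, xi in zip(weights, x))
            # Error signal: reference minus produced output.
            error = target - output
            # Move each weight in proportion to the error and its input.
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
    return weights

# Hypothetical training set generated from known weights.
random.seed(0)
true_weights = [0.5, -0.3, 0.8]
inputs = [[random.uniform(-1, 1) for _ in true_weights] for _ in range(200)]
samples = [(x, sum(w * xi for w, xi in zip(true_weights, x))) for x in inputs]

learned = lms_train(samples, n_inputs=3)
```

With a noise-free reference and a small learning rate, `learned` approaches `true_weights`.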

An artificial synapse in analog \ac{VLSI} consists primarily of a multiplier circuit and a memory circuit that stores the weight value of the synapse. A Gilbert cell topology is used as a current multiplier. This multiplier receives two input voltages and produces a current approximately proportional to the product of the tanh functions of each voltage. The output also depends on the cell saturation current and the thermal voltage. To emulate an analog multiplier cell, we measured the transfer functions of the Gilbert cells and fitted these measurements with tanh curves. The tanh is then approximated by its first-order Taylor expansion, which is sufficiently precise for small input voltages.
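To give a sense of how well the first-order approximation holds, the following sketch compares a tanh-shaped multiplier model with its linearized form. The model form, the saturation current `I_SAT`, and the voltage scale `V_SCALE` are hypothetical stand-ins, not the fitted values obtained from the measurements.

```python
import math

I_SAT = 1e-6    # hypothetical saturation current [A]
V_SCALE = 0.05  # hypothetical voltage scale set by the thermal voltage [V]

def gilbert_tanh(v1, v2, i_sat=I_SAT, v_scale=V_SCALE):
    """Tanh-shaped multiplier model: current from two input voltages."""
    return i_sat * math.tanh(v1 / v_scale) * math.tanh(v2 / v_scale)

def gilbert_linear(v1, v2, i_sat=I_SAT, v_scale=V_SCALE):
    """First-order Taylor approximation, tanh(x) ~ x near zero."""
    return i_sat * (v1 / v_scale) * (v2 / v_scale)

def relative_error(v1, v2):
    """Relative error of the linear model with respect to the tanh model."""
    exact = gilbert_tanh(v1, v2)
    return abs(gilbert_linear(v1, v2) - exact) / abs(exact)

small = relative_error(0.005, 0.005)  # inputs well below the voltage scale
large = relative_error(0.05, 0.05)    # inputs comparable to the voltage scale
```

In this sketch the linear model stays within one percent of the tanh model for the small inputs, while the error grows substantially once the inputs approach the voltage scale.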

The analog memory, in turn, is implemented in \ac{CMOS} with a floating-gate \ac{pFET} transistor. The output voltage of an analog \ac{VLSI} memory cell depends linearly on the number of electrons stored on the floating gate. However, the transfer function of each memory cell has a different slope due to \ac{PVT} spread. The weight voltage of a memory cell is changed by applying pulses to the floating gate, generated by a digitally implemented \ac{PWM}. To emulate the memory cell, we change the stored weight value in proportion to the number of pulses applied in the analog implementation. Since the memory update is a discrete process that depends on the number of pulses, the emulated update closely follows the real value.
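A pulse-driven memory cell of this kind can be sketched as a counter with a per-cell slope. The class name `EmulatedMemoryCell` and the slope values are hypothetical; in the emulator the slopes come from the laboratory measurements.

```python
class EmulatedMemoryCell:
    """Linear memory cell whose weight voltage changes by discrete pulses."""

    def __init__(self, volts_per_pulse, initial_voltage=0.0):
        # Per-cell slope; it differs between cells because of PVT spread.
        self.volts_per_pulse = volts_per_pulse
        self.initial_voltage = initial_voltage
        self.pulse_count = 0

    def apply_pulses(self, n_pulses):
        """Positive pulses raise the weight, negative pulses lower it."""
        self.pulse_count += int(n_pulses)

    @property
    def voltage(self):
        return self.initial_voltage + self.volts_per_pulse * self.pulse_count

# Two cells with different (hypothetical) slopes receive the same pulses.
cell_a = EmulatedMemoryCell(volts_per_pulse=1.0e-3)
cell_b = EmulatedMemoryCell(volts_per_pulse=1.2e-3)
for cell in (cell_a, cell_b):
    cell.apply_pulses(10)
    cell.apply_pulses(-4)
```

After the same net six pulses, the two cells hold different weight voltages, which is the behavior the emulator has to reproduce for each individual cell.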

The adaptive network in this project is an adaptive filter whose weights are trained with the \ac{LMS} algorithm. The emulator receives input signals from the outside world and weights them in the blocks that represent the synapses. Each synapse block reproduces the behavior of a real analog multiplier, and the multiplier outputs are added to form the network’s output. The reference signal comes from a similar perceptron with fixed, known weights, and its result is compared to the network’s output. The \ac{LMS} algorithm combines the resulting error with the learning rate to calculate the weight update in terms of pulses applied to each emulated analog memory cell, strengthening or weakening the synaptic connection according to the algorithm’s computation. This architecture is written in Verilog for \ac{FPGA} synthesis.
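The training loop described above can be sketched in software as follows. This is a simplified model, not the Verilog architecture: the multipliers are taken as ideal linear products, and the function name, slopes, learning rate, and input distribution are all assumptions made for illustration.

```python
import random

def emulate_adaptive_filter(reference_weights, volts_per_pulse,
                            learning_rate=0.1, n_steps=5000, seed=0):
    """LMS training where each weight changes only in whole pulses."""
    rng = random.Random(seed)
    n = len(reference_weights)
    pulse_counts = [0] * n  # one emulated memory cell per synapse
    for _ in range(n_steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # Current weights follow from the pulse counts and per-cell slopes.
        weights = [s * p for s, p in zip(volts_per_pulse, pulse_counts)]
        output = sum(w * xi for w, xi in zip(weights, x))
        # Reference perceptron with fixed, known weights.
        reference = sum(w * xi for w, xi in zip(reference_weights, x))
        error = reference - output
        for i in range(n):
            # Convert the ideal LMS update into an integer number of pulses.
            ideal_update = learning_rate * error * x[i]
            pulse_counts[i] += round(ideal_update / volts_per_pulse[i])
    return [s * p for s, p in zip(volts_per_pulse, pulse_counts)]

reference_weights = [0.4, -0.2]
final_weights = emulate_adaptive_filter(reference_weights,
                                        volts_per_pulse=[1.0e-3, 1.2e-3])
```

Because updates are quantized to whole pulses, the weights in this sketch settle close to the reference weights rather than matching them exactly; the residual offset depends on the pulse size and the learning rate.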

A \ac{V2P} board was used to build the emulator. A computer provides an interface for creating a customized network from user parameters: this Matlab\texttrademark -based tool generates the Verilog configuration, which is then synthesized in Xilinx ISE\texttrademark . After the Virtex board is programmed, the emulation runs, and its results are sent back to the computer over a C-based serial communication link. Finally, another Matlab\texttrademark -based script processes the results and prepares the reports. These results show a proper fit between the real analog devices and those emulated in the \ac{FPGA}; they also show the average convergence of the \ac{LMS} algorithm and how each synaptic weight behaves under different pulse responses. In addition, several notable findings were identified: a speed of up to 9.44 times that of a software-based solution; a low resource requirement, which allows the emulator to run on a variety of inexpensive boards; and a resolution of 0.5\% error in the output range, which supports high reliability of the emulator's results.

We have studied and deployed a tool that opens the way to implementing adaptive networks in \ac{VLSI} with low energy consumption and convergence times on the order of milliseconds. It enables the evaluation and development of medium- to large-scale adaptive processing systems before their final design and fabrication, in a simple and practical manner. The speed achieved with this solution makes it possible to emulate essential adaptive circuits such as those used in the recognition of faces, human voices, or handwritten notes.