In a large radio-astronomy interferometer such as the \ac{ALMA} telescope, the number of stations produces a considerable volume of data in a short period of time (visibilities can be produced every 16[ms]), so it becomes difficult to detect problems in the signal produced by each antenna in a timely manner (each antenna produces a 16[GHz]-bandwidth signal which is later digitized using 3-bit samplers). An undetected or late-detected problem translates into the generation of useless information. Given this limitation, we present an approach based on machine-learning algorithms for detecting issues in the already digitized signal produced by the active antennas. This work aims to detect totally corrupted or otherwise unsuitable signals. In addition, the development provides a timely warning that helps operators stop and investigate the problem, preventing the collection of useless information.
The investigation started by analyzing the three most common causes of corrupted or non-coherent data: a \textit{large DC component}, \textit{reflections in the \ac{IF} signal}, and \textit{harmonics}. DC components appear during the digitization step and can be introduced by a bad digitizer adjustment or by a common-mode voltage. Reflections are caused by poorly terminated cables or by a closed antenna shutter (the surface that can block the path between the radio source and the receiver). Harmonics are produced by a wrong timing adjustment of the connection between the digitizer and the formatter. These signal problems are not always promptly detected, and troubleshooting them can be a cumbersome task for scientists, especially during critical observation campaigns under a tight schedule or array constraints. Identifying such problems can take a considerable amount of time, and failing to detect them early can cause data loss during subsequent observations.
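As a rough illustration, the three pathologies can be emulated on synthetic baseband samples. The generator functions below, and their delay, gain, and tone-frequency parameters, are illustrative assumptions, not ALMA parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096
t = np.arange(N)

def normal():
    # Band-limited noise standing in for a healthy digitized IF signal.
    return rng.standard_normal(N)

def dc_offset(level=0.5):
    # Large DC component, e.g. from a misadjusted digitizer
    # or a common-mode voltage.
    return normal() + level

def reflection(delay=64, gain=0.4):
    # Delayed, attenuated copy of the signal, as produced by a
    # poorly terminated cable.
    x = normal()
    x[delay:] += gain * x[:-delay]
    return x

def harmonic(freq=0.1, amp=0.5):
    # Spurious periodic tone, as from a timing misadjustment
    # between digitizer and formatter.
    return normal() + amp * np.sin(2 * np.pi * freq * t)
```

A large mean reveals the DC case, while the reflection and harmonic cases show up as structure in the autocorrelation and power spectrum, respectively, which motivates feeding correlator output to a classifier.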
To distinguish normal behavior from these three anomalies in the signal, a first neural-network topology consisting of a single layer of perceptrons was introduced. This network retrieves weighted data from the correlator's output (64 values) for training each neuron. To improve classification efficiency, a sigmoidal activation function is employed. The network was trained using 25\% of each signal pathology and then validated against the remaining 75\%. The results obtained were 100\% accurate in almost all cases. We therefore moved forward with the introduction of an additional hidden layer, increasing the classification capabilities of the network.
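The single-layer scheme can be sketched as follows. The class count (normal plus three pathologies), learning rate, and squared-error update rule are assumptions for illustration; the source only fixes the 64 correlator values per input and the sigmoid activation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SingleLayerPerceptron:
    """One sigmoid neuron per class over the 64 correlator output values."""

    def __init__(self, n_inputs=64, n_classes=4, lr=0.5):
        self.w = rng.normal(0, 0.1, (n_classes, n_inputs))
        self.b = np.zeros(n_classes)
        self.lr = lr

    def forward(self, x):
        return sigmoid(self.w @ x + self.b)

    def train(self, X, Y, epochs=200):
        # Per-sample gradient descent on squared error;
        # the predicted class is the neuron with the largest output.
        for _ in range(epochs):
            for x, y in zip(X, Y):
                out = self.forward(x)
                err = (out - y) * out * (1 - out)
                self.w -= self.lr * np.outer(err, x)
                self.b -= self.lr * err
```

With each neuron independent of the others, this layer can only carve linearly separable regions of the 64-dimensional input space, which is why a hidden layer is the natural next step.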
The second approach consists of a two-layer perceptron network, which uses the same activation function and is trained with the same samples. In addition to the features carried over from the single-layer approach, the back-propagation algorithm was used to train the whole network, and an analysis of the minimum number of nodes in the hidden layer was required in this strategy to limit resource usage while still obtaining acceptable results, which were 100\% accurate for all signal pathologies.
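A minimal sketch of the two-layer network trained with back-propagation follows; the hidden-layer size, learning rate, and squared-error loss are again assumed for illustration (the analysis of the minimum hidden-layer size would amount to sweeping `n_hidden` and keeping the smallest value that still classifies correctly):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerPerceptron:
    """64 inputs -> sigmoid hidden layer -> 4 sigmoid outputs,
    trained with back-propagation on squared error."""

    def __init__(self, n_inputs=64, n_hidden=8, n_classes=4, lr=0.5):
        self.w1 = rng.normal(0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0, 0.1, (n_classes, n_hidden))
        self.b2 = np.zeros(n_classes)
        self.lr = lr

    def forward(self, x):
        h = sigmoid(self.w1 @ x + self.b1)
        return h, sigmoid(self.w2 @ h + self.b2)

    def train(self, X, Y, epochs=500):
        for _ in range(epochs):
            for x, y in zip(X, Y):
                h, out = self.forward(x)
                d2 = (out - y) * out * (1 - out)     # output-layer delta
                d1 = (self.w2.T @ d2) * h * (1 - h)  # back-propagated hidden delta
                self.w2 -= self.lr * np.outer(d2, h)
                self.b2 -= self.lr * d2
                self.w1 -= self.lr * np.outer(d1, x)
                self.b1 -= self.lr * d1
```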
The presented mechanism for detecting anomalies in the signals produced by the antennas was deployed within the \ac{DPI} of the Baseline correlator. This module was selected because it is connected to all antennas (the correlator results pass through this device) and because it provides the logic resources (an \ac{FPGA} programmed in Verilog and a micro-controller programmed in C) needed to deploy a neural network able to produce results in real time.
The diagnostic tool presented in this work has proved effective for detecting and classifying the set of anomalies described above. As a means of detecting signal inconsistencies in real time, the development works well, and the idea can be extrapolated to other systems where a vast set of data needs to be validated in a short time. The research presented focuses on the primary descriptors for artificial neural networks, intending to prove that machine classification is feasible, and opens a new path towards improving diagnostics of the processed astronomical signal.