An international team of experts was assembled to study the applicability of new algorithms and technologies to an upgrade of the \ac{ALMA} radio telescope's digital processing systems, in particular the correlator and the phased array. The technology used in the present \ac{ALMA} correlator is a decade old. With current and projected technology it is possible to improve bandwidth and spectral resolution, and to reduce size and power consumption (likely in absolute terms, and dramatically in proportion to performance). Other features may also be added; for example, while phased-array summing and fast recording for \ac{VLBI} and pulsar work have been proven through the addition of equipment developed by the \ac{APP}, any future \ac{ALMA} digital back-end should combine the correlation, phased-array, and recording functions natively. This study examines the various ways the \ac{ALMA} correlator might be upgraded as a next-generation instrument, together with the technologies, algorithms, costs, and timelines, balanced against the benefits.
The study objectives were divided into work packages assigned to the team members. My involvement in this project concerned the study and documentation of the following concepts:
\subsubsection{Delay Tracking}
Four levels of delay compensation were identified as necessary to properly maintain signal coherence across the different stages of signal processing; they are named bulk delay, coarse delay, fine delay, and residual delay. The bulk delay, intended to provide delay adjustment in steps of 64 samples, can be implemented in two alternative ways: an on-chip approach, where the memory is integrated inside the UltraScale+\texttrademark\ \ac{FPGA}; and an off-chip strategy using \ac{RLDRAM} 3, a memory interface supported by the UltraScale+\texttrademark\ \ac{FPGA} family. The coarse delay, providing a resolution of one sample per step, is intended to be implemented in the \ac{FPGA} fabric as an array of 2-bit multiplexers known as a barrel shifter. The fine delay, which adjusts delays in steps of 1/16 of a sample, is expected to remain as in the current \ac{ALMA} implementation, meaning it will be processed in the digital rack before data is transmitted to the main building. The residual delay, which compensates delays with a resolution finer than 1/16 of a sample, is expected to be integrated into a post-processing stage after the F-engine block, aligning signal phases by applying corrections in the frequency domain.
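As an illustration of the residual-delay concept, the following Python sketch applies a sub-sample delay to a block of samples by imposing a linear phase slope in the frequency domain. The block length, tone frequency, and floating-point FFT formulation are assumptions made for the sketch; the real stage would operate on channelized, fixed-point F-engine data.

```python
import numpy as np

def apply_residual_delay(block, frac_delay):
    """Delay `block` by `frac_delay` samples (a sub-sample amount here)
    by rotating phases in the frequency domain.  Sketch only."""
    n = len(block)
    spectrum = np.fft.fft(block)
    freqs = np.fft.fftfreq(n)                      # cycles per sample
    spectrum *= np.exp(-2j * np.pi * freqs * frac_delay)
    return np.fft.ifft(spectrum).real

# A tone that is periodic on the block can be checked exactly:
# the phase-rotated result matches a directly delayed cosine.
t = np.arange(256)
f = 16 / 256                                       # 16 cycles per block
delay = 0.03                                       # < 1/16 of a sample
shifted = apply_residual_delay(np.cos(2 * np.pi * f * t), delay)
expected = np.cos(2 * np.pi * f * (t - delay))
```

The check works because a pure delay is exactly a per-frequency phase rotation; for signals not periodic on the block, a streaming implementation must handle block edges with overlap.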
\subsubsection{Monitor and Control}
One of the aspects to consider in this new correlator, besides the new signal-processing capabilities, is how to monitor and control the several operations involved. This section identifies the fundamental communication points to be included in the architecture of the new machine; they relate to the configuration mode, delays (bulk, coarse, fine), link status, board temperatures, Walsh sequences, and the square-law detector. Moreover, the required communication bandwidth is estimated in terms of the maximum delay rate to be supported, the number of antennas in the array (72), and the data packet size.
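The shape of such a bandwidth estimate can be sketched with a few lines of arithmetic. The antenna count (72) comes from the study; the packet size and update rate below are placeholder assumptions, not figures from the study.

```python
# Illustrative sizing of the monitor-and-control link for delay updates.
# PACKET_BYTES and UPDATE_RATE_HZ are assumptions for this sketch only.
N_ANTENNAS = 72          # array size considered in the study
PACKET_BYTES = 64        # assumed size of one delay-update packet
UPDATE_RATE_HZ = 100     # assumed delay-update rate per antenna

bits_per_second = N_ANTENNAS * PACKET_BYTES * 8 * UPDATE_RATE_HZ
print(bits_per_second)   # 3686400 b/s, i.e. about 3.7 Mb/s
```

The point of the exercise is that even generous per-antenna update rates leave the monitor-and-control traffic far below the data-path bandwidth.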
\subsubsection{Polyphase \acs{FIR} Followed by \acs{FFT}}
One of the main candidates for the F-engine in an \ac{FX} correlator is based on a two-fold processing scheme: a polyphase \ac{FIR} filter (coarse channelization) followed by an \ac{FFT} per \ac{FIR} output channel (fine spectral analysis), a scheme commonly known as a \ac{PFB}. The proposed architecture consists of two stages: the first is a coarse channelization step, where the wideband signal is decomposed into several narrowband signals; the second is the spectral analysis of each individual sub-band. Adding this filtering as a pre-processing stage helps minimize the effects of leakage and scalloping loss, increasing the quality of the resulting spectrum. This section exposes the most essential specifications needed to comply with the processing requirements for the F-engine, proposes a concept architecture to connect the filter and channelizer stages, and finally presents an overview of the number of multipliers, adders, and memories that would be required to build this F-engine.
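The two stages can be sketched in floating-point Python: a windowed-sinc prototype \ac{FIR} decomposed into polyphase branches, with an \ac{FFT} across the branches. The channel count, tap count, and Hamming window below are illustrative assumptions; the real F-engine is a streaming fixed-point \ac{FPGA} implementation.

```python
import numpy as np

def pfb_channelize(x, n_chan, n_taps):
    """Critically sampled polyphase filter bank sketch."""
    # Prototype low-pass FIR (windowed sinc), reshaped into
    # n_taps x n_chan polyphase branches.
    m = n_chan * n_taps
    proto = np.sinc(np.arange(m) / n_chan - n_taps / 2) * np.hamming(m)
    coeffs = proto.reshape(n_taps, n_chan)

    n_frames = len(x) // n_chan
    frames = np.reshape(x[:n_frames * n_chan], (n_frames, n_chan))
    out = []
    for i in range(n_frames - n_taps + 1):
        # Weighted sum across taps = polyphase FIR;
        # FFT across branches = fine channels.
        summed = np.sum(frames[i:i + n_taps] * coeffs, axis=0)
        out.append(np.fft.fft(summed))
    return np.array(out)

# A complex tone centred on channel 3 should land in output bin 3.
n = np.arange(4096)
tone = np.exp(2j * np.pi * (3 / 16) * n)
spectra = pfb_channelize(tone, n_chan=16, n_taps=8)
```

Compared with a plain \ac{FFT}, the tap-weighted sum sharpens the channel response, which is what suppresses leakage and scalloping loss.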
\subsubsection{\acs{DTS} to Ethernet Conversion}
It is assumed that \ac{ALMA} will keep the current Digital Transmission System approach for communicating the already-digitized samples to the correlator. The existing transmission system is based on a custom protocol implemented over the SONET communication standard. The new design aims to replace this custom protocol with an Ethernet protocol that allows connecting the F-engine to the digitized samples. The motivation for moving to Ethernet is that it is a widely used standard with a large set of debugging tools, including network sniffers, which can be beneficial during the commissioning process. Moreover, it relieves the development effort of bringing up communication devices, since the raw data coming from the antennas is foreseen to be used by stages other than the F-engine itself. This section identifies the main requirements and provides some assumptions to determine the logic resources needed to build an interface between the customized communication standard in \ac{ALMA} and the new communication approach.
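To make the conversion concrete, the sketch below packs digitized samples into an Ethernet-friendly payload behind a small header. The header layout (antenna id, timestamp, sample count) and the 8-bit sample packing are purely assumptions of this sketch; the study does not fix a packet format.

```python
import struct

def frame_samples(antenna_id, timestamp, samples):
    """Pack samples into a payload suitable for an Ethernet frame.
    Header fields and widths are illustrative assumptions only."""
    # Big-endian header: 2-byte antenna id, 8-byte timestamp,
    # 4-byte sample count (14 bytes total).
    header = struct.pack('>HQI', antenna_id, timestamp, len(samples))
    body = bytes(s & 0xFF for s in samples)   # assume 8-bit packing
    return header + body

payload = frame_samples(antenna_id=7, timestamp=123456789,
                        samples=[1, 2, 3, 250])
# 14-byte header followed by 4 sample bytes
```

A sequence counter or timestamp of this kind is what lets a receiver detect dropped frames, replacing the ordering guarantees the point-to-point custom link gave for free.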
\subsubsection{Gaussian-Distributed Simulated Signal}
As part of an exercise to test the novel capabilities of the next-generation \ac{ALMA} correlator, a new signal pattern was requested to test each processing stage, this time using simulated data from a 4-bit Gaussian-distributed signal generator. Among the most valuable capabilities such a generator brings are the ability to verify the Van Vleck correction, to test correlation efficiency under weak-tone injection, and to support testing of new features without the use of real antennas. To accomplish this goal, the design was based on the Central Limit Theorem, by which the sum of several independent random signals sharing the same (uniform) distribution approaches a Gaussian distribution. The random signals were obtained from the \ac{PRN} units currently used in the Baseline Correlator, and they were combined through a binary adder tree. This noise-generator block was implemented in \ac{VHDL} using Xilinx Vivado, including a design-time parameter that lets the designer choose, before hardware synthesis, how many \ac{PRN} blocks to include. For this study, 32 \acp{PRN} with random seeds were chosen, and the behavioral simulation demonstrated the effectiveness of the proposed implementation. The results were compared against a Matlab\texttrademark\ simulation, and both confirmed that 32 \acp{PRN} as the basis for the Gaussian-like distributed signal is precise enough for testing the digital processing of the new correlator.
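The adder-tree combination of uniform streams can be mimicked in a few lines of Python. Here numpy's uniform generator stands in for the hardware \ac{PRN} units, and the scaling factor used for the 4-bit quantization is an assumption of the sketch, not a value from the design.

```python
import numpy as np

def gaussian_like_samples(n, n_prn=32, seed=0):
    """Approximate Gaussian samples by summing n_prn independent
    uniform streams (Central Limit Theorem), then quantizing to
    4-bit signed levels.  numpy uniforms stand in for hardware PRNs."""
    rng = np.random.default_rng(seed)
    # Sum of n_prn uniforms on [-1, 1): mean 0, variance n_prn / 3.
    summed = rng.uniform(-1.0, 1.0, size=(n_prn, n)).sum(axis=0)
    normalized = summed / np.sqrt(n_prn / 3.0)      # unit variance
    # 16 levels in [-8, 7]; the 2.67 loading factor is an assumption.
    return np.clip(np.round(normalized * 2.67), -8, 7).astype(int)

x = gaussian_like_samples(100_000)
```

With 32 streams the summed distribution is already visually indistinguishable from a Gaussian, which matches the study's conclusion that 32 \acp{PRN} suffice.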
\subsubsection{Results}
The work was carried out through monthly group discussions based on presentations and documentation reviews; two in-person meetings were held, one at the beginning to establish the basis of the study and another at the end of the project to consolidate conclusions. From the study group, a new \ac{ALMA} memo has been submitted; it is one of the primary documents the team produced to summarize and describe each scientific and technical concept. Finally, the report presented the functional, timing, and financial results obtained in this study, together with the feasibility of moving forward with the development of a new correlator.