FPGA Projects

Figure 1-1 Precise clock distribution using optical fiber diagram

Figure 1-2 Data and instruction transmission using optical fiber diagram

Figure 1-3 Block diagram of the verification system and testing method

Figure 1-4 Real world system and testing environment

Figure 2-1 Block diagram of the digital phase lock drift compensator

Figure 3-1 FPGA system top level block diagram

Figure 3-2 Block diagram of the digital phase lock processing module

Figure 4-1 Clock synchronization performance (open loop)

Figure 4-2 Clock synchronization performance (closed loop)

Distributed data acquisition system over long distance for radio astronomy telescope based on optical fiber

1, Overview

This distributed data acquisition system uses optical fiber to carry precise clock distribution and data/instruction transmission at the same time. It is designed for a low-frequency radio astronomy telescope (China’s 21CMA), to replace its analog beamforming with digital beamforming and improve system performance.

I built a verification system that simulates two nodes, one local and one remote. The system contains two FPGA boards (Xilinx KC705), two ADC boards (ADI AD9361), two clock jitter cleaner boards (TI LMK04906) and supporting hardware. I designed and implemented the entire FPGA software and wrote part of the driver and the Windows application on the PC side.

The highlight of this system is a digital phase lock method that compensates for the length drift of the optical fiber caused by temperature changes and greatly reduces the long-term clock jitter on the remote side. We tested the clock synchronization performance between the local and remote nodes using this method: the short-term jitter is 1.2 ps RMS and the long-term jitter is 5 ps RMS. The optical fiber is 200 meters long and the temperature changed dramatically during the test (over a range of at least 30 degrees). I am writing a patent application based on this method and it will be published soon.

Figure 1-1 and Figure 1-2 are the basic diagrams of the precise clock distribution and the data/instruction transmission over optical fiber.

Figure 1-3 is the block diagram of the verification system and the testing method. The basic idea is as follows. We used a signal generator and a splitter to generate two identical input signals, which were fed to the local and remote ADC sampling boards. We controlled the system (through the local node) to sample both signals at the same time and stored the sampled data in the DDR3 memory on the FPGA boards. We then transmitted the remote node's samples to the local node. Finally, we transferred the local and remote samples to the control/analysis PC through the PCI-Express interface and analysed the short-term and long-term clock jitter using an FFT. Figure 1-4 shows the real-world system and testing environment.

2, Drift compensation method

The basic idea of the digital phase lock method, which compensates for the temperature-induced length drift of the optical fiber, is as follows. We rely on the reciprocity of the round trip over the fiber: the transmit link and the receive link change simultaneously. We therefore use a phase detector to measure the phase difference between the clock received over the fiber at the local node and the local node's source reference clock (in practice, the local sampling clock). The phase difference is digitized by the XADC in the FPGA. From the change in this phase difference (caused by the length drift of the fiber), the FPGA calculates the compensating phase for the remote node. Finally, a Direct Digital Synthesizer (DDS) adjusts the clock that the local node transmits over the fiber, which compensates the clock jitter on the remote side.
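The reciprocity argument can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the implemented loop: it assumes the phase detector reading equals the applied correction plus twice the one-way drift, that the correction is updated once per measurement, and it works in picoseconds for readability. The function names and the drift values are hypothetical.

```python
def estimate_one_way_drift(round_trip_ps, correction_ps):
    """Reciprocity assumption: round trip = correction + 2 * one-way drift."""
    return (round_trip_ps - correction_ps) / 2.0


def simulate(drift_ps, closed_loop=True):
    """Return the remote clock offset at each step for a given fiber drift."""
    correction = 0.0
    remote_offsets = []
    for drift in drift_ps:
        # The remote node sees the transmitted correction plus the one-way drift.
        remote_offsets.append(correction + drift)
        # The local phase detector sees the correction plus the drift of both links.
        round_trip = correction + 2.0 * drift
        if closed_loop:
            # Pre-compensate the transmit clock with the opposite of the drift.
            correction = -estimate_one_way_drift(round_trip, correction)
    return remote_offsets


drift = [0.0, 10.0, 20.0, 30.0, 40.0]  # made-up slow drift, in ps
open_loop = simulate(drift, closed_loop=False)
closed = simulate(drift, closed_loop=True)
```

In this model the open-loop offset follows the full drift, while the closed-loop offset is limited to the drift that accumulates between two consecutive updates.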

The block diagram of the digital phase lock drift compensator is shown in Figure 2-1.

3, FPGA system level design and implementation

I designed and implemented the entire FPGA software. I also handled the entire debugging (both hardware and software) and testing process.

The FPGA software system contains five main modules: 1, real-time data sampling module; 2, optical fiber transmission control module (using the CPRI IP); 3, memory control module; 4, PCI-Express interface module; 5, digital phase lock processing module. The FPGA systems of the remote node and the local node are almost identical, except that the remote node does not contain modules 4 and 5. Figure 3-1 is the top-level block diagram of the FPGA software system.

The FPGA software system contains four main clock domains: 1, real-time sampling data clock domain (200 MHz); 2, optical fiber transmission clock domain (153.6 MHz); 3, DDR3 interface clock domain (200 MHz); 4, PCI-Express interface clock domain (250 MHz).

3.1 Real-time data sampling module

The AD9361 PHY module was ported from the ADI demo design on GitHub, which includes a MicroBlaze soft core that runs the AD9361 physical driver (C code).

3.2 Optical fiber transmission control module

The optical fiber transmission control module uses the CPRI IP to implement the optical fiber physical interface. It has three basic functions: 1, clock control and distribution; 2, data transmission; 3, transmission of system control and monitoring instructions. Data and instruction transmission is implemented with a self-designed protocol.

3.3 Memory control module

The memory control module uses a static-priority data scheduling method to control the six basic data streams in real time. It also contains a DDR3 interface module based on the Xilinx MIG IP.
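Static-priority scheduling can be sketched in a few lines. This is an illustration only: the priority order of the six streams is not given here, so the sketch assumes index 0 is the highest priority and that one frame is served per grant. The function names and the queue depths are hypothetical.

```python
def static_priority_grant(requests):
    """Return the index of the highest-priority requesting stream, or None.

    Assumption: index 0 has the highest priority.
    """
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None


def schedule(pending_frames):
    """Serve one frame per grant until every queue is empty; return the grant order."""
    pending = list(pending_frames)
    order = []
    while any(pending):
        grant = static_priority_grant([count > 0 for count in pending])
        pending[grant] -= 1
        order.append(grant)
    return order


# Six streams with made-up queue depths.
order = schedule([1, 0, 2, 0, 0, 1])
```

A fixed order keeps the arbiter simple and deterministic, at the cost that low-priority streams wait whenever higher-priority streams have data pending.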

3.4 PCI-Express interface module

The PCI-Express interface module was implemented at the TLP level, using Memory_Rd, Memory_WR, Register_RD and Register_WR TLPs.

Together, the PCI-Express interface module and the memory control module form a basic Direct Memory Access (DMA) system with the host PC. The minimum DMA read and write size is 4096 bytes, which equals the size of one real-time data frame.

3.5 Digital phase lock processing module

Figure 3-2 is the detailed implementation diagram of the digital phase lock processing module in the local node, which is the main part of the drift compensation method.

4, Distributed clock synchronization performance test

Clock synchronization performance, i.e. clock jitter, is the core performance metric of this system, because it directly determines the sampling performance of the distributed acquisition system. The highlight of our system, the drift compensation method, greatly improves the clock synchronization performance. The following test compares operation without this method (open loop) and with it (closed loop).

The test environment was as follows. The optical fiber was 200 meters long. It was heated uniformly with a dryer and then left to cool naturally, to simulate real-world temperature drift. The input signal to the ADCs was 2401 MHz and the LO of the ADCs was 2400 MHz (referenced to the distributed clock recovered from the optical fiber). Every 15 seconds we sampled data at the local node and the remote node at the same time and analysed it. We calculated the average phase difference between the data from the two nodes and converted it into time units (picoseconds), which gives the clock jitter. In Figure 4-1 and Figure 4-2, the y axis is the phase difference at 2401 MHz.
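The conversion from phase difference to time can be sketched as follows. This is a minimal illustration, not the analysis software used in the test: it assumes complex baseband samples, estimates the average phase difference from the sum of conjugate products rather than from an FFT, and uses an illustrative sample rate. The function names are hypothetical; only the 2401 MHz and 2400 MHz figures come from the text.

```python
import cmath
import math

RF_HZ = 2401e6             # input tone frequency, from the test description
TONE_HZ = 1e6              # baseband tone after mixing with the 2400 MHz LO
SAMPLE_RATE_HZ = 30.72e6   # assumed sample rate, for illustration only


def mean_phase_difference(local, remote):
    """Average phase of the remote samples relative to the local ones, in radians."""
    acc = sum(r * l.conjugate() for l, r in zip(local, remote))
    return cmath.phase(acc)


def phase_to_ps(phase_rad, rf_hz=RF_HZ):
    """Convert a phase difference at rf_hz into a time offset in picoseconds."""
    return phase_rad / (2.0 * math.pi * rf_hz) * 1e12


def tone(num_samples, phase_rad):
    """Synthetic complex baseband tone with a given starting phase."""
    step = 2.0 * math.pi * TONE_HZ / SAMPLE_RATE_HZ
    return [cmath.exp(1j * (step * n + phase_rad)) for n in range(num_samples)]


# Synthetic check: give the remote tone the phase that a 5 ps offset produces.
offset_ps = 5.0
offset_rad = 2.0 * math.pi * RF_HZ * offset_ps * 1e-12
local = tone(1024, 0.0)
remote = tone(1024, offset_rad)
estimated_ps = phase_to_ps(mean_phase_difference(local, remote))
```

At 2401 MHz one full cycle is about 416 ps, so a few picoseconds of offset correspond to only a few degrees of phase, which is why the phase is averaged over many samples.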

Open loop:

  • long-term clock jitter peak to peak : 289.8391ps

  • long-term clock jitter RMS : 60.3373ps

Closed loop:

  • long-term clock jitter peak to peak : 22.6991ps

  • long-term clock jitter RMS : 3.9427ps
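The two figures reported for each case above can be derived from the series of per-measurement time offsets. The sketch below assumes that RMS means the RMS deviation about the mean (the text does not say which definition was used), and the sample values are made up for illustration.

```python
import statistics


def jitter_summary(offsets_ps):
    """Return (peak-to-peak, RMS about the mean) of a series of offsets in ps."""
    peak_to_peak = max(offsets_ps) - min(offsets_ps)
    rms = statistics.pstdev(offsets_ps)  # assumption: RMS deviation about the mean
    return peak_to_peak, rms


example_offsets_ps = [0.0, 10.0, -10.0, 5.0, -5.0]  # made-up values
peak_to_peak, rms = jitter_summary(example_offsets_ps)
```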

Figure 1 Block diagram of the high speed real-time data caching and scheduling architecture

Figure 2 The actual picture of the verification system

High speed real-time data caching and scheduling architecture based on MicroBlaze and AXI on Xilinx FPGA

1, Overview

This high-speed real-time data caching and scheduling architecture is designed mainly for high-speed data acquisition and real-time data processing systems in applications such as radar, communications and radio astronomy. It uses the standard AXI bus and a soft-core processor (MicroBlaze) to move the complex data caching and scheduling logic from RTL to a general-purpose processor. It offers a simplified API both to the PC (over the PCIe interface) and to the soft processor (Xilinx MicroBlaze), so that highly complex data caching and scheduling logic can be implemented in large-scale high-speed signal processing systems.

Recently this architecture was applied to a channel sounding system for the 5th generation mobile communication system, conducted by HUAWEI. It works well.

I designed the architecture and built a verification system. The system uses one FPGA board (Xilinx KC705) and two ADC boards (ADI AD9361) to simulate real-time data stream loads. The FPGA project was developed with the latest Vivado IDE (version 2014.4). I reused parts already verified in the project "Distributed data acquisition system over long distance for radio astronomy telescope based on optical fiber", including the PCI-Express interface, the DDR3 interface and the DMA subsystem. These modules were packaged as Vivado IP and used in this project.

Figure 1 is the block diagram of the architecture.

Figure 2 is the actual picture of the verification system.

Figure 1-1 Block diagram of the real-time signal processing subsystem in board level

Figure 1-2 Block diagram of the synchronous acquisition subsystem in board level

Figure 1-3 Actual picture of the four channel signal acquisition (500Msps) board

Figure 2-1 Block diagram of the signal detection module

Figure 2-2 Block diagram of the digital channelizer module

Figure 2-3 Block diagram of the new pulse detector module

Figure 2-4 Block diagram of the parameter coarse measure module

Figure 2-5 Block diagram of the frequency measurement module

Figure 2-6 Block diagram of the time/phase difference measurement module

FPGA implementation of pulse signal detection and parameters measurement subsystem of a 24 channel synchronous digital receiver

1, Overview

The 24-channel broadband real-time synchronous digital receiver is the essential infrastructure of a passive direction finding and locating (DLS) system. The digital receiver system consists of two 6U 13-slot rackmount chassis. One is the synchronous acquisition subsystem, which consists of 1 host PC board, 6 four-channel signal acquisition (500 Msps) boards and 6 data transmission boards. The other is the real-time signal processing subsystem, which contains 1 host PC board, 6 data receive boards and 6 signal processing boards. The customized FPGA boards were all designed by the hardware team in my lab.

The synchronous acquisition subsystem samples the 24-channel IF signal and caches the sampled data in DDR2 memory in real time. The sampled data of each channel is divided into 12 time slots and transmitted to the real-time signal processing subsystem through the 6 transmission boards. Each signal processing board receives 2 time slots of the 24-channel data and performs the signal processing (1 FPGA per time slot). The board-level block diagrams of the real-time signal processing subsystem and the synchronous acquisition subsystem are shown in Figure 1-1 and Figure 1-2 respectively.

I implemented the pulse signal detection and parameter measurement IP cores, which form the core DSP module of the real-time signal processing subsystem. The FPGA platform is the XC5VSX95T-1FF1136 and the development environment is Xilinx ISE 12.3 with ModelSim 6.5d. I also took part in the entire debugging and testing process, including board-level debugging, system-level debugging, system testing and field testing.

Figure 1-3 is a photograph of the four-channel signal acquisition (500 Msps) board. The signal processing board is the same except that it does not have the 4 ADC chips. It contains 2 FPGAs (XC5VSX95T-1FF1136) and two 1 GB DDR2 memories.

2, Detailed FPGA design and implementation

The pulse signal detection and parameter measurement subsystem contains 3 modules, each packaged as an IP core: 1, signal detection module; 2, frequency measurement module; 3, time/phase difference measurement module. These IP cores run at a processing frequency of 200 MHz.

The signal detection module is the most sophisticated of the three. It has a large amount of computing logic and complex control logic. It detects pulse signals in real time and measures the coarse pulse descriptor word (CPDW) of each pulse. The frequency measurement module and the time/phase difference measurement module use the CPDW to calculate the precise parameters of the pulse signal.

2.1, Signal detection module

The signal detection module contains three parts: 1, digital channelizer; 2, coarse parameter measurement; 3, CPDW filter. Figure 2-1 is the block diagram of the signal detection module.

In the digital channelizer part, the 24-channel input signal is first processed by a simple narrow-band digital beamforming (weighted accumulation) module, since the input signal is in X band (10 GHz). The beam data is then processed by a digital channelizer, which divides the input signal in frequency into 128 subbands. The output power spectrum (equivalent to a 128-point FFT) is used to coarsely measure the pulse signal parameters, which are packaged into a coarse pulse descriptor word (CPDW). Finally, the CPDW is processed by the CPDW filter module.

Figure 2-2 is the block diagram of the digital channelizer module (128 sub-channels). Figure 2-3 is the block diagram of the new pulse detector module, which detects and judges new pulse signals. Figure 2-4 is the block diagram of the parameter coarse measure (PCM) module. There are two parallel PCM modules, so the system can detect and measure two pulse signals simultaneously (same arrival time, different frequency). The PCM module coarsely measures the frequency (128-point FFT equivalent), power, time of arrival (256 ns), etc., and packages them into a CPDW.
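The front end of this chain, weighted accumulation followed by a 128-point power spectrum, can be illustrated with a short sketch. It is a simplified stand-in, not the implemented channelizer: it uses a plain 128-point DFT, which matches only the "128-point FFT equivalent" description, and the per-channel phase offsets, the weights and the function names are made up for the example.

```python
import cmath
import math

N_SUBBANDS = 128
N_CHANNELS = 24


def beamform(channels, weights):
    """Weighted accumulation: one beam sample per time index."""
    return [sum(w * ch[n] for w, ch in zip(weights, channels))
            for n in range(len(channels[0]))]


def power_spectrum(block):
    """Power in each bin of a plain DFT of the block."""
    n = len(block)
    spectrum = []
    for k in range(n):
        acc = sum(block[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


# Synthetic input: the same tone (bin 20) on every channel, with a made-up
# phase offset per channel that the weights are chosen to cancel.
tone_bin = 20
channels = [[cmath.exp(1j * (2.0 * math.pi * tone_bin * n / N_SUBBANDS + 0.3 * k))
             for n in range(N_SUBBANDS)] for k in range(N_CHANNELS)]
weights = [cmath.exp(-0.3j * k) for k in range(N_CHANNELS)]

beam = beamform(channels, weights)
spectrum = power_spectrum(beam)
peak_bin = max(range(N_SUBBANDS), key=lambda k: spectrum[k])
```

With matched weights the 24 channels add coherently, so the beam amplitude is 24 times that of a single channel, and the index of the strongest bin gives the coarse frequency that goes into the CPDW.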

2.2, Frequency measurement module

The frequency measurement module is designed to calculate the precise frequency of the detected pulse signal. Its input is the data of 1 channel. It takes the coarse frequency parameter (128-point FFT equivalent) from the CPDW calculated by the signal detection module and uses a 9-bit local oscillator (LO) frequency to mix (modulate) the input signal down to baseband. The mixed data is then accumulated (the number of accumulated points is controlled by a 5-bit parameter). This is equivalent to a discrete Fourier transform (DFT) calculation. The DFT length, which ranges from 2048 to 4096 points, is controlled by the LO frequency parameter and the accumulation-points parameter.

Finally, the different DFT results are sent to a peak search module that finds the precise frequency of the pulse signal. If the pulse signal is a broadband signal, the output frequency is that of the highest bin of the power spectrum. The precise frequency result is then used to calculate the time and phase differences of the pulse signal between channels. The block diagram of the frequency measurement module is shown in Figure 2-5.
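Mixing with an LO and accumulating is the same as evaluating a single DFT bin at the LO frequency, and the peak search picks the LO frequency with the largest result. The sketch below illustrates this under stated assumptions: it uses floating-point arithmetic, a uniform grid of candidate frequencies inside one coarse bin, and a 2048-point accumulation. The grid spacing, the function names and the test tone are hypothetical and do not reflect the 9-bit and 5-bit parameter encoding of the real module.

```python
import cmath
import math


def mix_and_accumulate(samples, lo_cycles_per_sample):
    """Mix with a complex LO and sum: one DFT bin evaluated at the LO frequency."""
    return sum(x * cmath.exp(-2j * math.pi * lo_cycles_per_sample * n)
               for n, x in enumerate(samples))


def fine_frequency(samples, coarse, span, steps):
    """Search candidate LO frequencies around the coarse estimate; return the peak."""
    candidates = [coarse + span * (i / steps - 0.5) for i in range(steps + 1)]
    return max(candidates, key=lambda f: abs(mix_and_accumulate(samples, f)))


# Synthetic tone a quarter of a coarse bin above the centre of bin 20 of 128.
true_freq = 20.25 / 128
samples = [cmath.exp(2j * math.pi * true_freq * n) for n in range(2048)]

# Coarse estimate from the 128-point stage, refined within one coarse bin width.
estimate = fine_frequency(samples, coarse=20 / 128, span=1 / 128, steps=64)
```

Because the accumulation is much longer than 128 points, each candidate bin is far narrower than a coarse bin, which is what allows the estimate to be refined.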

2.3, Time/phase difference measurement module

The time/phase difference measurement module contains two main parts. 1, time difference measurement. In this part the 24-channel data is mixed down to baseband and then passed through a variable-bandwidth low-pass filter and a smoothing filter. The time differences of the pulse signal across the 24 channels are then calculated with the threshold method.

2, phase difference measurement. In this part the data in each channel is processed by frequency conversion (mixing) and accumulation, which is again an equivalent DFT calculation. The results are then sent to the phase difference calculation module, which calculates the phase difference using the coordinate rotation digital computer (CORDIC) algorithm. Figure 2-6 is the block diagram of the time/phase difference measurement module.
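The phase difference step can be illustrated with CORDIC in vectoring mode, which finds the angle of a vector using only shift-and-add style iterations. This is a floating-point sketch, not the fixed-point FPGA implementation: the iteration count, the quadrant pre-rotation and the function names are assumptions, and the two complex inputs stand in for the per-channel DFT results.

```python
import cmath
import math

ITERATIONS = 16
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]


def cordic_phase(x, y):
    """Angle of the vector (x, y) in radians, using CORDIC vectoring mode."""
    # Pre-rotate by 90 degrees so the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, z = y, -x, math.pi / 2
        else:
            x, y, z = -y, x, -math.pi / 2
    else:
        z = 0.0
    # Rotate towards the x axis in steps of atan(2^-i), accumulating the angle.
    for i, angle in enumerate(ANGLES):
        shift = 2.0 ** -i
        if y >= 0:
            x, y, z = x + y * shift, y - x * shift, z + angle
        else:
            x, y, z = x - y * shift, y + x * shift, z - angle
    return z


def phase_difference(a, b):
    """Phase of a relative to b, from the conjugate product of two DFT results."""
    d = a * b.conjugate()
    return cordic_phase(d.real, d.imag)


# Two made-up DFT results with phases 0.9 rad and 0.2 rad.
diff = phase_difference(cmath.rect(1.0, 0.9), cmath.rect(1.0, 0.2))
```

With 16 iterations the residual angle error is bounded by atan(2^-15), about 3e-5 rad. In hardware the multiplications by 2^-i become bit shifts, which is the main reason CORDIC suits FPGA logic.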