A DSP chip, or digital signal processor, is a microprocessor designed specifically for digital signal processing. Product development often requires real-time signal processing: the system must finish the specified processing of an externally input signal within a bounded time, that is, the processing rate must exceed the rate at which the signal is updated. The processor structure, instruction set, and data flow of a DSP chip make it well suited to these real-time requirements. DSPs are now used in almost every field of electronics and information technology. This article does not list those applications one by one, nor does it spend space on the structure and principles of DSPs, since many books and other materials already cover them.
Instead, drawing on the author’s experience with baseband signal processing in a national defense research project built on TI’s C5510 DSP, this article discusses the application and implementation of the C55x series DSP in baseband signal processing. Plenty of material exists on the C54x series, but books and documentation on the C55x series are comparatively scarce. Although the C55x and C54x both belong to TI’s C5000 family, many books dismiss the differences with the single remark that “C54x and C55x are fully compatible in software”. For a DSP developer the matter is not that simple: we care not only about implementing a function but also about optimizing it and making good use of resources. It is therefore worth examining what the C55x improves over the C54x before discussing its application.
2. Comparison between C55x and C54x
The C54x series is a family of fixed-point DSPs designed for low-power, high-performance, high-speed real-time signal processing, and it is widely used in wireless communication systems. Its CPU has the following characteristics:
⑴ An improved Harvard architecture with one program bus (PB), three data buses (CB, DB, EB) and four address buses (PAB, CAB, DAB, EAB);
⑵ A 40-bit arithmetic logic unit (ALU), a 40-bit shifter and two 40-bit accumulators (A, B), supporting 32-bit or dual 16-bit operations;
⑶ A 17-bit × 17-bit hardware multiplier combined with a dedicated 40-bit adder (MAC), which completes a multiply-accumulate in one cycle;
⑷ A compare, select and store unit that accelerates Viterbi decoding;
⑸ A dedicated EXP encoder that computes the exponent of the 40-bit accumulator value in one cycle;
⑹ A data address generation unit (DAGEN) and a program address generation unit (PAGEN), which together allow three read operations and one write operation at the same time.
Compared with the C54x, the C55x raises overall performance fivefold by adding functional units, while its power consumption is only 1/6 that of the C54x. The C55x uses variable-length instructions to improve code efficiency and strengthens its parallel mechanisms to improve loop efficiency. Because it both adds hardware resources and manages them better, performance improves greatly, and its processing capacity can reach 400-800 MIPS. The C55x extends the functional units of the CPU as follows:
⑴ Two buses have been added, one for read operations (BB) and one for write operations (FB);
⑵ One more multiply-accumulate unit (MAC) has been added;
⑶ A 16-bit ALU has been added;
⑷ The number of accumulators has been increased to four, namely AC0, AC1, AC2 and AC3;
⑸ The number of temporary registers has been increased to four, namely T0, T1, T2 and T3;
Because of these structural changes, the system design must take account of how C54x registers map onto C55x registers. This matters most when a C55x design runs in C54x compatibility mode rather than in enhanced mode. The following table shows the register correspondence between C54x and C55x.
Although the C55x is compatible with the C54x and can run C54x instructions, the two are not the same, and the C55x instruction set is considerably simplified. For example, where the C54x has separate load (LD) and store (ST) instructions, the C55x uses the more flexible and easier-to-use move (MOV) instructions for both, and extends move operations to data exchange and stack operations. In compatibility mode, particular care is also needed with XC, SACCD and ARx+0.
3. Application of C5510 in baseband signal processing
This section describes how the C5510 is applied to baseband signal processing in a communication system, based on a national defense project in which the author took part. Because of space limitations, only the program flow chart is given, and the source code is omitted.
1. The task of DSP in baseband signal processing
In the baseband signal processing of this system, the DSP mainly performs data scrambling and descrambling, convolutional encoding and Viterbi decoding, interleaving and de-interleaving, and framing (or sub-framing) and de-framing. First, the main transmitted data is randomized by scrambling (externally synchronized preset, m-sequence of n = 17 stages). It is then encoded with a (2, 1, 7) convolutional code of constraint length K = 7, whose generator polynomials are G0 = 1 + D + D^2 + D^3 + D^6 (octal 171) and G1 = 1 + D^2 + D^3 + D^5 + D^6 (octal 133), so K − 1 = 6 tail bits must be added to the data before each block is encoded. A subframe contains 50 bits after encoding (this allows for the share of the transmission rate taken by the control information bits within a large frame). After the control information bits of each subframe are added (such as the subframe data type bits), the subframe has 56 effective bits. These pass through 7 × 8 block interleaving, and an 8-bit synchronization protection code is then added to form a 64-bit subframe, which is sent to the modulator after buffering and other processing.
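The octal generator notation and the subframe bit budget above can be checked with a short sketch. It is illustrative only: the helper names `generator_exponents` and `as_polynomial` are hypothetical, and the code assumes the common convention that the leftmost of the K = 7 bits in the octal value is the D^0 tap.

```python
def generator_exponents(octal_text, constraint_length=7):
    """Return the exponents of D whose coefficient is 1.

    Assumed convention: the leftmost of the K bits is the D^0 tap.
    """
    bits = format(int(octal_text, 8), "0{}b".format(constraint_length))
    return [i for i, bit in enumerate(bits) if bit == "1"]


def as_polynomial(exponents):
    """Format a list of exponents as a polynomial in D."""
    names = {0: "1", 1: "D"}
    return " + ".join(names.get(e, "D^{}".format(e)) for e in exponents)


print(as_polynomial(generator_exponents("171")))  # 1 + D + D^2 + D^3 + D^6
print(as_polynomial(generator_exponents("133")))  # 1 + D^2 + D^3 + D^5 + D^6

# Subframe bit budget as stated in the text.
CODED_BITS, CONTROL_BITS, SYNC_BITS = 50, 6, 8
INTERLEAVED_BITS = CODED_BITS + CONTROL_BITS   # 56 bits fill the 7 x 8 block
SUBFRAME_BITS = INTERLEAVED_BITS + SYNC_BITS   # 64 bits go to the modulator
```

Both octal values expand to the polynomials quoted in the text, and the 56 interleaved bits exactly fill a 7 × 8 block.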
2. Implementation of baseband signal processing based on C5510
A. Data scrambling and descrambling
Scrambling uses an m-sequence of n = 17 stages. The generator polynomial is g = 400011 in octal, that is, f(x) = x^17 + x^3 + 1, and there are three feedback taps. An externally synchronized preset is used to limit error propagation: the preset pulse is triggered once for every large frame transmitted (20 subframes), and the preset can be done in software. The logic of scrambling and descrambling is shown in Figure 3. Both can be implemented simply by applying the C55x XOR src, dst instruction in a loop, so no further detail is needed.
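A minimal sketch of such a preset (synchronous) scrambler follows, written in Python rather than C55x assembly. The tap placement (stages 17 and 3), the all-ones preset value and the function names are assumptions made for illustration; the project's actual register layout is given by Figure 3, not by this code.

```python
MASK17 = (1 << 17) - 1  # 17-stage shift register


def lfsr_step(state):
    """Advance the register by one bit; return (new_state, output_bit)."""
    out = (state >> 16) & 1             # stage 17
    feedback = out ^ ((state >> 2) & 1)  # stage 17 XOR stage 3
    return ((state << 1) | feedback) & MASK17, out


def scramble(bits, preset=MASK17):
    """XOR the data with the m-sequence, starting from the preset state.

    The same call descrambles, provided both ends load the same preset
    at the start of each large frame.
    """
    state = preset
    result = []
    for bit in bits:
        state, pn = lfsr_step(state)
        result.append(bit ^ pn)
    return result


data = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
sent = scramble(data)
received = scramble(sent)  # descrambling is the same operation
```

Because the sequence generator does not depend on the received data, a channel error affects only the bit it hits, which is the reason for using the preset type.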
B. Convolutional encoding and decoding
A (2, 1, 7) convolutional code is used because it performs better than block codes. The constraint length is K = 7, the generator polynomials (in octal) are G0 = 171 and G1 = 133, the free distance is df = 10, and the asymptotic coding gain is Gh = 3.98 dB. The schematic diagram of the convolutional encoder is shown in the figure below.
The convolutional encoder outputs the sequence G0 G1 G0 G1 G0 G1 .... In C5510 programming, the BFXPA instruction can be used to arrange this output sequence. To simplify and shorten the source program, the operation can be defined as a macro that is called repeatedly:
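For reference, the encoder can be sketched in a few lines of Python. This is a bit-serial model, not the C5510 implementation; it assumes the newest input bit sits in the leftmost position of a 7-bit register so that the octal values 171 and 133 can be used directly as tap masks, and the function names are hypothetical.

```python
G0 = 0o171  # 1 + D + D^2 + D^3 + D^6
G1 = 0o133  # 1 + D^2 + D^3 + D^5 + D^6
K = 7       # constraint length


def parity(value):
    """Return the XOR of all bits of value."""
    return bin(value).count("1") & 1


def conv_encode(bits, add_tail=True):
    """Encode bits at rate 1/2; output order is G0 G1 G0 G1 ..."""
    stream = list(bits) + ([0] * (K - 1) if add_tail else [])
    reg = 0
    out = []
    for bit in stream:
        reg = (reg >> 1) | (bit << (K - 1))  # newest bit enters on the left
        out.append(parity(reg & G0))
        out.append(parity(reg & G1))
    return out


# A single 1 followed by the six tail bits reads out both impulse responses.
impulse = conv_encode([1])
# impulse[0::2] is the G0 response, impulse[1::2] the G1 response
```

Encoding a single 1 returns the tap patterns of G0 and G1 themselves, which is a quick way to confirm that the masks match the polynomials.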
merge   .macro src1, src2, temp, dst    ; macro definition
        BFXPA #5555h, src1, dst         ; spread the low 8 bits of src1 over the even bit positions (kept in dst so the next BFXPA does not overwrite them)
        BFXPA #0AAAAh, src2, temp       ; spread the low 8 bits of src2 over the odd bit positions
        XOR temp, dst                   ; combine the two fields; they do not overlap, so XOR acts as a merge
        SFTL src1, #-8, src1            ; shift src1 right by 8 bits
        SFTL src2, #-8, src2            ; shift src2 right by 8 bits as well
        .endm
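The effect of the two BFXPA operations in the merge macro can be modelled in Python to see how two 8-bit groups of encoder output become one 16-bit interleaved word. The function `field_expand` is a hypothetical software model of a bit-field expand (low-order source bits are placed, LSB first, into the positions where the mask has a 1); it is an assumption about the instruction's behaviour, not TI code.

```python
def field_expand(value, mask):
    """Spread the low bits of value into the set bit positions of a 16-bit mask."""
    out = 0
    src_bit = 0
    for pos in range(16):
        if (mask >> pos) & 1:
            out |= ((value >> src_bit) & 1) << pos
            src_bit += 1
    return out


def merge(g0_bits, g1_bits):
    """Interleave 8 G0 bits (even positions) with 8 G1 bits (odd positions)."""
    even = field_expand(g0_bits, 0x5555)
    odd = field_expand(g1_bits, 0xAAAA)
    return even ^ odd  # fields are disjoint, so XOR simply merges them


word = merge(0xFF, 0x00)  # only the even bit positions are set: 0x5555
```

Whether G0 lands on the even or the odd positions, and whether the word is then sent LSB first or MSB first, depends on the transmit path and is not fixed by this sketch.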
The convolutional code is decoded with a maximum-likelihood decoder, that is, a Viterbi decoder. The process is shown in Figure 5.
The algorithm idea is:
① Starting at time unit j = m, compute the partial metric of the single path entering each state, and store that path together with its metric. Such a path is called a survivor path.
② Increase j by 1. For each state, add the branch metric of every branch entering the state to the metric of the survivor it extends from the previous time unit, which gives the partial metrics of all paths entering that state. Keep the path with the largest metric for each state (the new survivor) together with its metric, and discard all other paths.
③ If j < L + m, repeat step ②; otherwise stop. Here L is the code word length and m = 6 is the encoder memory (K − 1).
The branch metrics are computed from soft decisions, that is, from the Euclidean distance between the received values and the expected code symbols. For a convolutional code of rate 1/2, the branch metric reduces to the correlation:
T = SD0·G0(j) + SD1·G1(j)
where SD0 and SD1 are the soft-decision values received for the two code bits at time j. To simplify the calculation, Gn(j) is written in bipolar form, with 0 represented as +1 and 1 as −1 (or vice versa), so that the branch metric calculation reduces to additions and subtractions. In the DSP implementation, the two distinct values can be held in registers:
T0: + SD0 + SD1
T1: + SD0 − SD1
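The following sketch shows why two register values are enough: with the bipolar mapping 0 → +1 and 1 → −1, the four possible code-bit pairs give only the metrics ±T0 and ±T1. The function name and the dictionary layout are illustrative choices, not part of the original program.

```python
def branch_metrics(sd0, sd1):
    """Return the correlation metric for each expected code-bit pair (c0, c1).

    Bipolar mapping assumed: code bit 0 -> +1, code bit 1 -> -1.
    """
    t0 = sd0 + sd1  # value held in T0
    t1 = sd0 - sd1  # value held in T1
    return {
        (0, 0): t0,    # +SD0 + SD1
        (0, 1): t1,    # +SD0 - SD1
        (1, 0): -t1,   # -SD0 + SD1
        (1, 1): -t0,   # -SD0 - SD1
    }


metrics = branch_metrics(3, -1)
# {(0, 0): 2, (0, 1): 4, (1, 0): -4, (1, 1): -2}
```

For the received pair (3, −1), the largest metric belongs to the code bits (0, 1), which is the pair whose bipolar symbols (+1, −1) best match the received signs.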
On the C55x, the application-specific instructions ADDSUB, SUBADD and MAXDIFF can be used to accumulate, compare and select the path metrics of each state, which takes full advantage of the C55x pipeline. For ease of reuse, the pipelined Viterbi butterfly operation can be defined as a macro.
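The add-compare-select step and the traceback can be sketched in Python as follows. This is a plain reference model under stated assumptions (correlation metrics with the 0 → +1, 1 → −1 mapping, a zero-terminated block, full traceback from the all-zero state); it does not model ADDSUB, SUBADD or MAXDIFF, and the test message is arbitrary.

```python
G0, G1, K = 0o171, 0o133, 7
N_STATES = 1 << (K - 1)  # 64 trellis states


def parity(value):
    return bin(value).count("1") & 1


def outputs(state, bit):
    """Code bits produced when `bit` enters the encoder in `state`."""
    reg = (bit << (K - 1)) | state  # newest bit on the left
    return parity(reg & G0), parity(reg & G1)


def next_state(state, bit):
    return ((bit << (K - 1)) | state) >> 1


def encode(bits):
    """Rate-1/2 encoder with K - 1 zero tail bits (used here for testing)."""
    state, out = 0, []
    for bit in list(bits) + [0] * (K - 1):
        out.extend(outputs(state, bit))
        state = next_state(state, bit)
    return out


def viterbi_decode(soft, n_info):
    """Decode soft values (positive ~ code bit 0, negative ~ code bit 1)."""
    neg_inf = float("-inf")
    metric = [neg_inf] * N_STATES
    metric[0] = 0.0  # the encoder starts in the all-zero state
    history = []
    for j in range(n_info + K - 1):
        sd0, sd1 = soft[2 * j], soft[2 * j + 1]
        new_metric = [neg_inf] * N_STATES
        choice = [None] * N_STATES
        for state in range(N_STATES):
            if metric[state] == neg_inf:
                continue  # state not reachable yet
            for bit in (0, 1):
                c0, c1 = outputs(state, bit)
                branch = (1 - 2 * c0) * sd0 + (1 - 2 * c1) * sd1
                target = next_state(state, bit)
                candidate = metric[state] + branch       # add
                if candidate > new_metric[target]:       # compare
                    new_metric[target] = candidate       # select survivor
                    choice[target] = (state, bit)
        metric = new_metric
        history.append(choice)
    state, decoded = 0, []  # the tail bits force the final state to zero
    for choice in reversed(history):
        state, bit = choice[state]
        decoded.append(bit)
    decoded.reverse()
    return decoded[:n_info]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
soft = [1 - 2 * c for c in encode(message)]
soft[3], soft[20] = -soft[3], -soft[20]  # two channel errors
decoded = viterbi_decode(soft, len(message))
```

With a free distance of 10, two sign errors in the block leave the correct path with the strictly largest metric, so the decoder recovers the message.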
C. Interleaving and de-interleaving
Ordinary error correction coding targets random errors, but the errors produced in a wireless channel are mostly burst errors. Interleaving is therefore used to spread burst errors out into random errors. It is in effect an implicit form of diversity and helps against deep fading. Interleaving does, however, add to the system delay. Weighing error correction performance against system complexity, a 56-bit block interleaver within each subframe is adopted. Viewed as a matrix, the data is written in by rows and read out by columns at the sending end, and the receiving end reverses the process. The roles of rows and columns can equally be swapped.
To implement interleaving on the C55x, AR0 can point to the input buffer holding the data to be interleaved, AR1 to the interleaving table, and AR2 to the output buffer for the interleaved data. Each time AR1 is incremented by 1, the bit position within the output word pointed to by AR2 also advances by 1. The table entry pointed to by AR1 is a bit offset into the input buffer, and the bit at that offset is the one to be placed at the current bit position of the word pointed to by AR2. The program is essentially two nested loops: the outer loop increments the pointer AR2, and the inner loop runs 16 times, once per bit of the word. De-interleaving is the inverse of interleaving and uses the same interleaving table. Its program structure is roughly the same, but the bits move in the opposite direction, so it can be obtained with small changes to the interleaving program.
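A minimal sketch of the 56-bit block interleaver follows, operating on a list of bits. It assumes 7 rows by 8 columns with row-wise writing and column-wise reading; the text only states "7 × 8", so the orientation is an assumption.

```python
ROWS, COLS = 7, 8  # assumed orientation of the 7 x 8 block


def interleave(bits):
    """Write row by row, read column by column."""
    assert len(bits) == ROWS * COLS
    return [bits[r * COLS + c] for c in range(COLS) for r in range(ROWS)]


def deinterleave(bits):
    """Write column by column, read row by row (inverse of interleave)."""
    assert len(bits) == ROWS * COLS
    return [bits[c * ROWS + r] for r in range(ROWS) for c in range(COLS)]


positions = list(range(56))
sent = interleave(positions)
# sent starts 0, 8, 16, 24, 32, 40, 48, 1, ...
restored = deinterleave(sent)
```

Neighbouring bits in the transmitted stream come from positions 8 apart in the encoded stream, so a burst of up to 7 channel errors is spread over separate parts of the code sequence after de-interleaving.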
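The table-driven, two-loop structure described above can be modelled on packed 16-bit words as follows. The variable names echo the roles of AR0, AR1 and AR2 only loosely; the LSB-first bit numbering, the 7 × 8 table and the sample input words are assumptions for illustration.

```python
WORD_BITS = 16


def get_bit(words, offset):
    """Read one bit from a packed buffer (bit 0 = LSB of word 0, assumed)."""
    return (words[offset // WORD_BITS] >> (offset % WORD_BITS)) & 1


def interleave_words(in_words, table):
    """Output bit k is the input bit at offset table[k]."""
    n_words = (len(table) + WORD_BITS - 1) // WORD_BITS
    out_words = [0] * n_words
    entry = 0                                  # index into the table (AR1)
    for w in range(n_words):                   # outer loop: next output word (AR2)
        for bit_pos in range(WORD_BITS):       # inner loop: 16 bit positions
            if entry < len(table):
                out_words[w] |= get_bit(in_words, table[entry]) << bit_pos
                entry += 1
    return out_words


def deinterleave_words(in_words, table):
    """Same table, opposite direction: input bit k goes to offset table[k]."""
    n_words = (len(table) + WORD_BITS - 1) // WORD_BITS
    out_words = [0] * n_words
    for entry, offset in enumerate(table):
        bit = get_bit(in_words, entry)
        out_words[offset // WORD_BITS] |= bit << (offset % WORD_BITS)
    return out_words


# Table for a 7 x 8 block written by rows and read by columns.
table = [r * 8 + c for c in range(8) for r in range(7)]
subframe = [0x1234, 0xABCD, 0x0F0F, 0x00A5]  # 56 bits; top 8 bits unused
mixed = interleave_words(subframe, table)
restored = deinterleave_words(mixed, table)
```

Keeping the permutation in a table means that a different interleaving pattern only requires a different table, not a different program.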
With the rapid development of DSP technology, higher chip integration has lowered the cost of DSP chips, which in turn has increased demand and broadened the range of applications. DSPs have moved from military to civilian use and are now widely used across the electronic information field, and more and more people are taking up DSP design and development. The digital, broadband, intelligent and multimedia requirements of modern communication systems all place high demands on signal processing. A DSP alone can often handle only physical-layer processing and cannot take on processing control and higher-layer signaling, so it must be paired with another processor. TI combined the C55x DSP core with an ARM9 microprocessor, which has strong control capabilities, to launch the Open Multimedia Application Platform (OMAP). The combination of DSPs with other microprocessors can be expected to be the future direction of DSP development.