Advanced Processors
Delve into the world of DSP processors and hardware architectures for signal processing, exploring topics such as Harvard architecture, pipelining, and specialized instructions like MAC. Discover the simplified architecture of standard microprocessors and the need for DSP architecture with elements like SIMD, VLIW, and parallelism. Gain an understanding of the specialized nature of DSP processors and their crucial role in real-time operations and heavy data flow. Explore the concepts behind pipelined MAC operations, hardware structures, and the importance of parallelism in signal processing.
Presentation Transcript
Advanced Processors (Books shared in separate link)
Agenda: Overview of DSP; Unit-5; Unit-6
DSP Processors
DSP processors are specialized microprocessors with an architecture optimized for the fast operational needs of digital signal processing.
Need for DSP Architecture
DSP requirements: filtering, correlation and FFT; heavy data flow through the CPU; real-time operations; parallelism.
Architectural features that meet them: Harvard architecture; pipelining; fast dedicated hardware MAC; special instructions; replication; on-chip memory and cache; extended parallelism (SIMD, VLIW, superscalar).
Simplified Architecture of Standard Microprocessor
Von Neumann architecture: program and data share a single memory and bus; operations are carried out independently, one after another; this limits how far speed can be increased.
Hardware Architecture for Signal Processing
Multiple bus structure with separate data and program memory.
Data memory holds coefficients, input data, output samples and intermediate data.
Special Instructions
Special instruction: MAC (multiply and accumulate)
Repeat: RPT
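A loop in C can show what a MAC instruction driven by RPT computes: the same multiply-accumulate is repeated n times over a data array and a coefficient array. This is a minimal sketch, not processor code; the function name `mac_repeat` is hypothetical, and `int64_t` stands in for the extended-precision accumulators (for example 40 bits) that DSPs typically provide.

```c
#include <stdint.h>

/* Hypothetical sketch of the work done by "RPT n" followed by a MAC.
   Each iteration is one multiply-accumulate; the accumulator is wider
   than the 32-bit product so a long sum of products does not overflow. */
static int64_t mac_repeat(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;                            /* accumulator cleared before the repeat */
    for (int i = 0; i < n; i++) {
        int32_t product = (int32_t)x[i] * h[i]; /* 16x16 -> 32-bit multiply */
        acc += product;                         /* accumulate */
    }
    return acc;
}
```

Two products of (-32768)(-32768) already sum to 2^31, which overflows a 32-bit signed accumulator but fits in the wider one.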
Single Instruction Multiple Data (SIMD) Processing
Figure: data bus A and data bus B feed two execution units (Execution Unit A and Execution Unit B), each with its own ALU, MAC and shifter.
SIMD Processing
Figure: eight 16-bit operands feed four 16x16 MAC units in parallel, each producing a 32-bit result.
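The four-lane SIMD MAC can be modelled in C as one operation over packed 16-bit lanes, each lane keeping its own 32-bit result. This is a hypothetical sketch: the types `vec_i16`/`vec_i32` and the function `simd_mac` are illustrative names, and no saturation is modelled.

```c
#include <stdint.h>

#define SIMD_LANES 4

typedef struct { int16_t lane[SIMD_LANES]; } vec_i16;  /* four 16-bit operands */
typedef struct { int32_t lane[SIMD_LANES]; } vec_i32;  /* four 32-bit results */

/* One "instruction": four independent 16x16 multiply-accumulates.
   In hardware all lanes operate in the same cycle; the loop only
   walks over them one by one. */
static vec_i32 simd_mac(vec_i32 acc, vec_i16 a, vec_i16 b)
{
    for (int i = 0; i < SIMD_LANES; i++)
        acc.lane[i] += (int32_t)a.lane[i] * b.lane[i];
    return acc;
}
```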
Very Long Instruction Word (VLIW)
Instruction fetch packet: eight 32-bit instructions, always 256 bits wide, read from internal program memory (8 x 32 bits).
Instruction fetch, decode and dispatch: splits the fetch packet into execution packets and dispatches the instructions to the appropriate execution units.
Execution packet: varies from one to eight instructions (32 to 256 bits, i.e. n x 32 bits).
Two data paths, each with four functional units (.L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2) and its own register file (A and B), connected to internal data RAM over 32-bit paths.
Superscalar Processors
Use instruction-level parallelism; developed to execute multiple instructions in one cycle.
This is achieved through multiple execution units and extensive use of pipelining.
Instruction width is not fixed.
An instruction can be issued to execute in parallel, as in SIMD.
Use a load/store architecture, suited to taking two inputs and computing an output.
Fixed point and Floating point representation
Fixed point: 16-bit signed fractional format, often written Q1.15 (1 sign bit, 15 fractional bits).
Floating point: IEEE 754 normalized representation of a single-precision floating-point number.
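In Q1.15 a value v is stored as the 16-bit integer round(v x 2^15), so the representable range is -1 to 1 - 2^-15. The sketch below (hypothetical helper names) converts to and from Q15 and multiplies two Q15 numbers: the 32-bit product is in Q2.30, so it is rounded and shifted right by 15 to return to Q1.15, saturating the single overflowing case (-1 x -1). It assumes an arithmetic right shift for negative values, which C leaves implementation-defined before C23.

```c
#include <stdint.h>

/* Real value -> Q1.15, rounding to nearest and saturating to [-1, 1 - 2^-15]. */
static int16_t float_to_q15(double v)
{
    double scaled = v * 32768.0;              /* multiply by 2^15 */
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    /* round half away from zero; the cast then truncates toward zero */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Q1.15 -> real value. */
static double q15_to_float(int16_t q)
{
    return q / 32768.0;                       /* divide by 2^15 */
}

/* Q1.15 x Q1.15 -> Q2.30 product, rounded back to Q1.15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;               /* Q2.30 */
    p += 1 << 14;                             /* add half an LSB of the result */
    p >>= 15;                                 /* assumes arithmetic shift */
    if (p > INT16_MAX) p = INT16_MAX;         /* only -1 * -1 overflows */
    return (int16_t)p;
}
```

For example, 0.5 is stored as 16384, and 16384 x 16384 in Q15 gives 8192, i.e. 0.25.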
General purpose DSP architecture
DSP processors are divided into fixed-point processors and floating-point processors.
Fixed-point processors represent each number with a minimum of 16 bits, so 2^16 = 65,536 possible bit patterns can represent a number:
Unsigned integer: 0 to 65,535
Signed integer: -32,768 to 32,767
Unsigned fraction: spread uniformly between 0 and 1
Signed fraction: spread uniformly between -1 and 1
Floating-point processors represent each number with a minimum of 32 bits, so 2^32 = 4,294,967,296 possible bit patterns can represent a number. The represented numbers are not uniformly spaced. In ANSI/IEEE Std 754-1985 single precision, the largest and smallest normalized magnitudes are about 3.4 x 10^38 and 1.2 x 10^-38, respectively. The represented values are unequally spaced between these two extremes, such that the gap between any two adjacent numbers is about ten million times smaller than the value of the numbers. This is important because it places large gaps between large numbers, but small gaps between small numbers.
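The single-precision format behind these figures packs 1 sign bit s, an 8-bit exponent e (bias 127) and a 23-bit fraction f; a normalized value is (-1)^s x 1.f x 2^(e-127). The sketch below (hypothetical helper names) extracts the fields from a `float` and steps to the next representable value by incrementing the bit pattern, which makes the unequal spacing visible: the gap is 2^-23 just above 1.0 but 0.125 just above 2^20. It assumes `float` is IEEE 754 single precision, as on mainstream platforms.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits; normalized values have an implied leading 1 */
} f32_fields;

static uint32_t f32_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bits without aliasing issues */
    return bits;
}

static f32_fields f32_decode(float x)
{
    uint32_t bits = f32_bits(x);
    f32_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* Next representable float above a positive, finite x. */
static float f32_next_up(float x)
{
    uint32_t bits = f32_bits(x) + 1u;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```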
Fixed point digital signal processors
First generation: TMS320C1X by TI in 1982. Dedicated arithmetic unit (AU) with multiplier and accumulator; Harvard architecture with separate program and data memory; on-chip memory and special instructions for executing basic DSP algorithms.
Second generation: TMS320C5X from TI, DSP5600X from Motorola, ADSP21XX from Analog Devices, DSP16XX from Lucent Technologies. Enhanced features compared with the first generation; larger on-chip memory and more special instructions for executing DSP algorithms; MAC with repeat.
Third generation: TMS320C54xx, DSP563X and DSP16000. Aimed at digital communication and digital audio; special instructions for adaptive filtering (including echo cancellation and adaptive equalization) and Viterbi decoding; low power, with a power management facility.
Fourth generation: TMS320C62XX (VLIW). Extensive parallelism while maintaining the features of earlier versions; wider instructions, wider data paths, more registers, a larger instruction cache and multiple AUs.
Floating point DSP processors
First generation: TMS320C3X from TI. Larger memory and many on-chip peripheral facilities; program cache and on-chip dual-access memories; graphics and image processing; supported three floating-point formats.
Second generation: TMS320C4X, ADSP-2106x SHARC. Emphasis on multiprocessing and multiprocessor support.
Third generation: TMS320C67xx, ADSP-TS001 (VLIW).
Special purpose Digital Signal Processor: Hardware digital filters: FIR
Figures: FIR structure; hardware architecture for FIR filter
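The FIR structure maps directly onto hardware: a delay line (a chain of z^-1 registers), a coefficient memory, and one multiply-accumulate per tap. The sketch below models that data flow sample by sample in C; the tap count `FIR_TAPS` and all names are hypothetical, and doubles replace fixed-point values to keep it short.

```c
#include <stddef.h>

#define FIR_TAPS 4  /* hypothetical filter length */

typedef struct {
    double h[FIR_TAPS];      /* coefficient memory */
    double delay[FIR_TAPS];  /* delay line: delay[k] holds x[n-k] */
} fir_filter;

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k] */
static double fir_step(fir_filter *f, double x)
{
    for (size_t k = FIR_TAPS - 1; k > 0; k--)
        f->delay[k] = f->delay[k - 1];   /* shift the delay line by one sample */
    f->delay[0] = x;                     /* newest sample enters the line */

    double acc = 0.0;
    for (size_t k = 0; k < FIR_TAPS; k++)
        acc += f->h[k] * f->delay[k];    /* one multiply-accumulate per tap */
    return acc;
}
```

Feeding a unit impulse returns the coefficients one by one, which is the FIR impulse response.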
Special Purpose Digital Signal Processor: Hardware digital filters: IIR
Figures: IIR structure; hardware architecture for IIR filter
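An IIR section adds feedback from past values. One common hardware-friendly structure is the second-order (biquad) section in direct form II, which needs only two delay elements; the slide's figure may show a different form. The sketch below is a minimal C model with hypothetical names and a0 normalized to 1, implementing H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).

```c
typedef struct {
    double b0, b1, b2;   /* feed-forward coefficients */
    double a1, a2;       /* feedback coefficients (a0 normalized to 1) */
    double w1, w2;       /* state: w[n-1], w[n-2] */
} biquad;

/* Direct form II:
   w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
   y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2] */
static double biquad_step(biquad *s, double x)
{
    double w0 = x - s->a1 * s->w1 - s->a2 * s->w2;       /* feedback path */
    double y = s->b0 * w0 + s->b1 * s->w1 + s->b2 * s->w2; /* feed-forward path */
    s->w2 = s->w1;                                       /* shift the two delays */
    s->w1 = w0;
    return y;
}
```

With b0 = 1, a1 = -0.5 and the other coefficients 0, the section computes y[n] = x[n] + 0.5 y[n-1], so its impulse response is 1, 0.5, 0.25, ...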
Special purpose Digital Signal Processor: Hardware FFT Processors
Figures: simplified architecture of hardware FFT processor; concept of hardware butterfly processor
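The hardware butterfly is the basic radix-2 FFT operation: one complex multiplication by a twiddle factor W and two complex additions. A minimal C sketch of a decimation-in-time butterfly follows; the `cplx` type and function names are hypothetical, and a real FFT processor would run many of these in a pipelined, usually fixed-point datapath.

```c
typedef struct { double re, im; } cplx;

static cplx cmul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* Radix-2 decimation-in-time butterfly, in place:
   X' = X + W*Y,  Y' = X - W*Y */
static void butterfly(cplx *x, cplx *y, cplx w)
{
    cplx t = cmul(w, *y);     /* the single complex multiply */
    cplx a = *x;
    x->re = a.re + t.re;  x->im = a.im + t.im;
    y->re = a.re - t.re;  y->im = a.im - t.im;
}
```

With W = 1 the butterfly is a 2-point DFT: inputs (3, 1) become (4, 2).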
Special purpose Digital Signal Processor: Hardware FFT Processors
Double buffering in real-time FFT: the FFT is performed on the N-point data in buffer A while buffer B is being filled.
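Double (ping-pong) buffering can be sketched as two buffers whose roles swap each time one fills up: while the FFT runs on the full buffer, new samples go into the other. The single-threaded C model below uses hypothetical names and a hypothetical block length `BLOCK_N`; in a real system an interrupt or DMA would fill one buffer while the CPU processes the other. The real-time requirement is that the N-point FFT finishes within N sample periods, before the buffer it reads is refilled.

```c
#include <stddef.h>

#define BLOCK_N 4  /* hypothetical FFT length N */

typedef struct {
    float buf[2][BLOCK_N];  /* buffer A = buf[0], buffer B = buf[1] */
    int fill;               /* index of the buffer currently being filled */
    size_t count;           /* samples written into that buffer so far */
} pingpong;

/* Store one incoming sample. When the fill buffer becomes full, the roles
   swap and the full block is returned for processing (e.g. an FFT);
   otherwise NULL is returned. The returned block is not overwritten
   while the next N samples go into the other buffer. */
static const float *pingpong_push(pingpong *pp, float sample)
{
    pp->buf[pp->fill][pp->count++] = sample;
    if (pp->count == BLOCK_N) {
        const float *full = pp->buf[pp->fill];
        pp->fill ^= 1;      /* start filling the other buffer */
        pp->count = 0;
        return full;
    }
    return NULL;
}
```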
| Type of operation | .L unit | .S unit | .M unit | .D unit |
|---|---|---|---|---|
| Arithmetic operations | 32/40-bit operations | 32-bit operations | - | 32-bit add and subtract operations only |
| Logical operations | 32-bit operations | 32-bit operations | - | 32-bit logical operations* |
| Multiply operations | - | - | 16x16 multiply operations | - |
| Shift operations | - | 32/40-bit shift operations | - | - |
| Compare operations | 32/40-bit operations | - | - | - |
| Branch operations | - | Yes | - | - |
| Load and store operations | - | - | - | Loads and stores with 5-bit constant offset (15-bit constant offset in .D2 only) |
| Linear and circular address calculation | - | - | - | Yes |
| Constant generation | - | Yes | - | - |
| Count operations | 32/40-bit count operations | - | - | - |
| Move operations | Register to register only | 16-bit move operations | - | Register to register only |
TMS320C67XX CPU data paths
Data lines: src1 and src2 are 32 bits wide on all units; .L and .S also have 40-bit source paths.
Register file cross paths: functional units read and write operands from their own register file; .L1, .S1, .M1, .L2, .S2 and .M2 also have access to the opposite side's registers through the cross paths.
Memory load and store paths: LD1 and LD2 (including double-word loads, LDDW); ST1 and ST2.
Data address paths: DA1 and DA2 allow a data address generated by either path to be used to access data to or from any register.
Control Registers (accessed only by .S2, using MVC)
| Register name | Abbreviation | Description |
|---|---|---|
| Addressing mode register | AMR | Specifies linear or circular addressing for A4-A7 and B4-B7 |
| Control status register | CSR | Contains important control and status bits of the processor |
| Program counter, E1 phase | PCE1 | Contains the address of the fetch packet that is in the E1 phase of the pipeline |
| Interrupt flag register | IFR | Contains the status of the INT4-INT15 and NMI interrupts |
| Interrupt set register | ISR | Used to manually set pending maskable interrupts |
| Interrupt clear register | ICR | Used to manually clear pending maskable interrupts |
| Interrupt enable register | IER | Used to enable/disable the individual maskable interrupts |
| Interrupt service table pointer | ISTP | Points to the beginning of the interrupt service table |
| Interrupt return pointer | IRP | Contains the address used to return from a maskable interrupt |
| Nonmaskable interrupt return pointer | NRP | Contains the address used to return from a nonmaskable interrupt |
Address Mode Register (AMR)
Bits 31-26: reserved; bits 25-21: BK1; bits 20-16: BK0; bits 15-0: 2-bit mode fields, from most to least significant: B7 mode, B6 mode, B5 mode, B4 mode, A7 mode, A6 mode, A5 mode, A4 mode.
| Mode select | Description of mode |
|---|---|
| 00 | Linear modification of address |
| 01 | Circular addressing using BK0 |
| 10 | Circular addressing using BK1 |
| 11 | Reserved |
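The AMR fields can be packed and unpacked with ordinary bit manipulation. In the sketch below, which uses hypothetical helper names, the mode field for register index r (0-3 for A4-A7, 4-7 for B4-B7) occupies bits 2r+1:2r, BK0 occupies bits 20-16 and BK1 bits 25-21. It also assumes the block-size rule given in TI's C67x CPU reference guide: a BK value N selects a circular block of 2^(N+1) bytes.

```c
#include <stdint.h>

enum { AMR_LINEAR = 0, AMR_CIRC_BK0 = 1, AMR_CIRC_BK1 = 2 };

/* reg_index: 0..3 -> A4..A7, 4..7 -> B4..B7 */
static uint32_t amr_set_mode(uint32_t amr, unsigned reg_index, uint32_t mode)
{
    unsigned shift = 2u * reg_index;
    amr &= ~(3u << shift);                /* clear the 2-bit mode field */
    return amr | ((mode & 3u) << shift);
}

static uint32_t amr_get_mode(uint32_t amr, unsigned reg_index)
{
    return (amr >> (2u * reg_index)) & 3u;
}

/* bk: 0 selects BK0 (bits 20-16), 1 selects BK1 (bits 25-21) */
static uint32_t amr_set_bk(uint32_t amr, unsigned bk, uint32_t n)
{
    unsigned shift = bk ? 21u : 16u;
    amr &= ~(0x1Fu << shift);
    return amr | ((n & 0x1Fu) << shift);
}

/* Circular block size in bytes: 2^(N+1); 64-bit because N = 31 gives 2^32. */
static uint64_t amr_block_bytes(uint32_t amr, unsigned bk)
{
    uint32_t n = (amr >> (bk ? 21u : 16u)) & 0x1Fu;
    return (uint64_t)1 << (n + 1);
}
```

For example, selecting circular mode via BK0 for A4 with BK0 = 4 gives AMR = 0x00040001 and a 32-byte block.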
Unit 5
Introduction to Computer Architecture: R5-12.1, R5-12.2
General purpose Digital Signal Processors: R5-12.3
Selecting digital signal processors: R5-12.4
Special purpose DSP Hardware: R5-12.6
Architecture of TMS320C67X: Reference Guide TMS320C67XX / T2-13.2
Features of C67X processors: Reference Guide TMS320C67XX / T2-13.2
CPU: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.4
General purpose register files: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.5
Functional units and operation: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.6
Data paths: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.7
Control register file: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.8
Functional Units
.L: arithmetic, logical, compare, other
.S: arithmetic, logical, shift, branch, move, other
.M: multiply
.D: arithmetic, load/store, other
Addressing Modes
Register addressing mode: mnemonic .unit src1, src2, dst
The mnemonic could be ADD, SUB, MPY, etc. Examples:
ADD .L1 A1, A2, A3
ADD .S2 B1, B2, B2
ADD .L1X A1, B2, A2 (X: src2 is read from the opposite register file through the cross path)
Linear addressing mode and circular addressing mode use .D1 and .D2.
Addressing Modes
Linear addressing mode: uses .D1 and .D2. Syntax for loads and stores: mnemonic .unit mode field, dst
*+baseR[offsetR/ucst5]: positive offset from baseR, specified by offsetR/ucst5
*-baseR[offsetR/ucst5]: negative offset from baseR, specified by offsetR/ucst5
*++baseR[offsetR/ucst5]: pre-increment of baseR by offsetR/ucst5
*--baseR[offsetR/ucst5]: pre-decrement of baseR by offsetR/ucst5
*baseR++[offsetR/ucst5]: post-increment of baseR by offsetR/ucst5
*baseR--[offsetR/ucst5]: post-decrement of baseR by offsetR/ucst5
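Addressing Modes
Linear addressing mode: uses .D1 and .D2. Syntax: mnemonic .unit mode field, dst
LDW .D1 *A0[1], A1: loads the contents of the memory location pointed to by A0 + offset into register A1, where the offset 1 is shifted left twice (4 bytes). The offset is shifted left by 3, 2, 1 or 0 for double-word, word, half-word or byte accesses, respectively.
LDW .D1 *++A0[A4], A1: A0 is first incremented by A4 words (A4 shifted left twice), then the word at the new address is loaded into A1.
LDW .D1 *A0++[2], A1: the word at A0 is loaded into A1, then A0 is incremented by 2 words (8 bytes).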
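The effective-address arithmetic of these linear modes can be modelled in a few lines of C. This is a hypothetical sketch (the enum and function name are illustrative): `shift` is the scaling for the access size (3, 2, 1 or 0 for double word, word, half word or byte), and only the linear case (AMR mode 00) is modelled.

```c
#include <stdint.h>

typedef enum {
    EA_OFFSET_PLUS,   /* *+baseR[off] */
    EA_OFFSET_MINUS,  /* *-baseR[off] */
    EA_PRE_INC,       /* *++baseR[off] */
    EA_PRE_DEC,       /* *--baseR[off] */
    EA_POST_INC,      /* *baseR++[off] */
    EA_POST_DEC       /* *baseR--[off] */
} ea_mode;

/* Returns the address actually accessed; updates *baseR for the
   increment/decrement modes. The offset is scaled by the access size. */
static uint32_t c6x_linear_ea(uint32_t *baseR, uint32_t offset, unsigned shift, ea_mode mode)
{
    uint32_t scaled = offset << shift;
    uint32_t b = *baseR;
    switch (mode) {
    case EA_OFFSET_PLUS:  return b + scaled;                 /* baseR unchanged */
    case EA_OFFSET_MINUS: return b - scaled;
    case EA_PRE_INC:      *baseR = b + scaled; return *baseR; /* modify, then access */
    case EA_PRE_DEC:      *baseR = b - scaled; return *baseR;
    case EA_POST_INC:     *baseR = b + scaled; return b;      /* access, then modify */
    case EA_POST_DEC:     *baseR = b - scaled; return b;
    }
    return b;
}
```

With A0 = 0x1000, `*A0[1]` for a word reads 0x1004, and `*A0++[2]` reads 0x1000 and leaves A0 = 0x1008.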
Addressing Modes
Circular addressing mode: uses .D1 and .D2. Only A4-A7 and B4-B7 can be used as circular pointers; the address mode register (AMR) selects the mode for each of A4-A7 and B4-B7. Syntax: mnemonic .unit mode field, dst
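In circular mode the pointer wraps within a block whose size is a power of two (set through BK0 or BK1) and which is aligned on a multiple of that size: the low address bits move around the block while the upper bits stay fixed. A minimal C sketch of that update follows; `circ_update` is a hypothetical name, the byte offset is assumed to be already scaled by the access size, and its magnitude is assumed to be smaller than the block size.

```c
#include <stdint.h>

/* Add a (possibly negative) byte offset to addr inside a circular block
   of block_bytes bytes (a power of two). The upper address bits are kept
   and only the bits inside the block change, so the pointer wraps. */
static uint32_t circ_update(uint32_t addr, int32_t byte_offset, uint32_t block_bytes)
{
    uint32_t mask = block_bytes - 1u;
    return (addr & ~mask) | ((addr + (uint32_t)byte_offset) & mask);
}
```

Stepping a word pointer by 4 bytes through a 16-byte block at 0x100 visits 0x100, 0x104, 0x108, 0x10C and then returns to 0x100.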
Fixed Point Instructions
| Instruction | Functional unit | Description |
|---|---|---|
| MV | .L1 or .L2, .S1 or .S2, .D1 or .D2 | Move value from one register to another |
| MVC | .S2 only | Move value between a control register and the register file |
| MVK | .S1 or .S2 | Move a 16-bit constant into the lower 16 bits of a register, sign-extended |
| MVKLH | .S1 or .S2 | Move a 16-bit constant into the upper 16 bits of a register |
| MVKH | .S1 or .S2 | Move the upper 16 bits of a 32-bit constant into the upper 16 bits of a register |
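Because MVK sign-extends its 16-bit constant, a full 32-bit constant is built in two steps, with MVKH then overwriting the upper half. The C sketch below models the register effect of MVK, MVKH and MVKLH as described in the table; the function names are hypothetical and registers are plain `uint32_t` values.

```c
#include <stdint.h>

/* MVK: 16-bit constant into the lower half, sign-extended to 32 bits. */
static uint32_t mvk(uint16_t cst16)
{
    return (cst16 & 0x8000u) ? (0xFFFF0000u | cst16) : cst16;
}

/* MVKH: upper 16 bits of a 32-bit constant into the upper half;
   the lower half of the register is kept. */
static uint32_t mvkh(uint32_t reg, uint32_t cst32)
{
    return (reg & 0x0000FFFFu) | (cst32 & 0xFFFF0000u);
}

/* MVKLH: a 16-bit constant into the upper half; lower half kept. */
static uint32_t mvklh(uint32_t reg, uint16_t cst16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)cst16 << 16);
}
```

Loading 0x12348765: MVK alone leaves 0xFFFF8765, because bit 15 of the low half is set; MVKH then writes 0x1234 into the upper half.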
Flow of Execution
Conditional operations: all instructions can be conditional. Registers A1, A2, B0, B1 and B2 can be tested for conditional operation (zero or nonzero). The specified condition register is tested at the beginning of the E1 execute phase.
Parallel operation: 8 instructions are fetched together to form a fetch packet. Execution of these instructions is controlled by scanning the p-bits from left to right:
If p = 1 for the ith instruction, the (i+1)th instruction is executed in parallel with the ith instruction.
If p = 0 for the ith instruction, the (i+1)th instruction is executed in the machine cycle after the ith instruction.
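Flow of Execution
Fully serial: all p-bits are 0; 8 machine cycles are needed to execute the fetch packet.
Fully parallel: all p-bits are 1 (apart from the last instruction's); 1 machine cycle is needed.
Partially serial: some p-bits are 1 and some are 0; the fetch packet splits into several execution packets, each taking one machine cycle.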
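The p-bit rule can be expressed as a small function that splits an eight-instruction fetch packet into execution packets; the number of execution packets equals the number of cycles needed to issue the fetch packet. This is a hypothetical C sketch: `p[i]` holds the p-bit of instruction i, and the last instruction always ends an execution packet, since on the C62x/C67x an execution packet cannot cross a fetch-packet boundary.

```c
#define FETCH_PACKET_SIZE 8

/* packet_of[i] receives the execution-packet index of instruction i.
   Returns the number of execution packets (= cycles to issue the fetch packet). */
static int split_execution_packets(const int p[FETCH_PACKET_SIZE],
                                   int packet_of[FETCH_PACKET_SIZE])
{
    int packet = 0;
    for (int i = 0; i < FETCH_PACKET_SIZE; i++) {
        packet_of[i] = packet;
        /* p = 0: the next instruction starts a new execution packet.
           The last instruction always closes the final packet. */
        if (p[i] == 0 && i < FETCH_PACKET_SIZE - 1)
            packet++;
    }
    return packet + 1;
}
```

For p-bits 0 1 1 0 1 0 1 0 the execution packets are {0}, {1, 2, 3}, {4, 5} and {6, 7}, so the fetch packet takes four cycles.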
Flow of Execution In summary
Pipelining
Fetch operation:
PG (program address generate): the memory address of the 8 instructions of the fetch packet is generated.
PS (program address send): the address is sent to memory.
PW (program access ready wait): the memory read operation is performed.
PR (program fetch packet receive): the 8 instructions are received by the CPU.
How the fetch packet is executed depends on whether it is fully serial, fully parallel or partially serial.
Pipelining
Decode operation:
DP (instruction dispatch): fetch packets are split into execution packets; an execution packet consists of one instruction, or two to eight parallel instructions; the instructions are assigned to the appropriate functional units.
DC (instruction decode): source registers, destination registers and associated paths are decoded.
Pipelining
Execute operation:
Fixed-point processor: E1 E2 E3 E4 E5
Floating-point processor: E1 E2 E3 E4 E5 E6 E7 E8 E9 E10
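Putting the fetch, decode and execute phases together, a packet passes through PG, PS, PW, PR, DP, DC and then the execute phases. The C sketch below reports which phase a given packet occupies in a given cycle, assuming (hypothetically) that a new fully parallel fetch packet enters the pipeline every cycle with no stalls; `exec_phases` is 5 or 10 for the fixed-point and floating-point cases on the slide.

```c
#include <string.h>   /* NULL */

/* Phase names in order: 4 fetch phases, 2 decode phases, then E1..E10. */
static const char *const phase_names[] = {
    "PG", "PS", "PW", "PR",          /* fetch */
    "DP", "DC",                      /* decode */
    "E1", "E2", "E3", "E4", "E5",    /* execute (fixed point stops here) */
    "E6", "E7", "E8", "E9", "E10"    /* further execute phases (floating point) */
};

/* Phase occupied at `cycle` by the packet that entered PG in cycle `packet`;
   NULL if it has not started yet or has already finished.
   exec_phases: 5 (fixed point) or 10 (floating point). */
static const char *pipeline_phase(int packet, int cycle, int exec_phases)
{
    int idx = cycle - packet;
    if (idx < 0 || idx >= 6 + exec_phases)
        return NULL;
    return phase_names[idx];
}
```

In cycle 6, packet 0 is in E1 while packet 2 is in DP, which is the overlap that lets the pipeline complete one packet per cycle once it is full.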
Internal Memory
Cache-based internal memory architecture with two levels:
L1P and L1D: 4 KB each; not included in the memory map; always enabled.
L2: 64 KB, shared by program and data.
L1P and L1D are accessed first; if a miss occurs, L2 is accessed.
The L2 controller facilitates access between the CPU, the EMIF and the peripherals.
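The "L1 first, then L2 on a miss" rule can be illustrated with a tiny cache model. This C sketch is hypothetical: it assumes a direct-mapped 4 KB L1 with 32-byte lines purely for illustration (the real L1P/L1D organisation may differ), treats L2 as always holding the data, and does not model the EMIF or external memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_BYTES   4096u                    /* 4 KB L1, as on the slide */
#define LINE_BYTES 32u                      /* hypothetical line size */
#define L1_LINES   (L1_BYTES / LINE_BYTES)

typedef struct {
    bool valid[L1_LINES];
    uint32_t tag[L1_LINES];
} l1_cache;

typedef enum { SERVED_BY_L1 = 1, SERVED_BY_L2 = 2 } mem_level;

/* Look up addr in L1; on a miss the line is filled from L2. */
static mem_level mem_access(l1_cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t index = line % L1_LINES;       /* direct-mapped: one slot per line */
    uint32_t tag = line / L1_LINES;
    if (c->valid[index] && c->tag[index] == tag)
        return SERVED_BY_L1;                /* hit */
    c->valid[index] = true;                 /* miss: fill the line from L2 */
    c->tag[index] = tag;
    return SERVED_BY_L2;
}
```

Two addresses exactly 4 KB apart map to the same L1 line in this model, so alternating between them misses every time and each access is served by L2.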