The TMS320C64x+ DSPs (including the TMS320C6421 device) are the highest-performance fixed-point DSP generation in the TMS320C6000 DSP platform. The C6421 device is based on the third-generation high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI), making these DSPs an excellent choice for digital signal processor applications. The C64x+ devices are upward code-compatible from previous devices that are part of the C6000 DSP platform. The C64x DSPs support added functionality and have an expanded instruction set from previous devices.
Any reference to the C64x DSP or C64x CPU also applies, unless otherwise noted, to the C64x+ DSP and C64x+ CPU, respectively.
With performance of up to 4800 million instructions per second (MIPS) at a clock rate of 600 MHz, the C64x+ core offers solutions to high-performance DSP programming challenges. The DSP core possesses the operational flexibility of high-speed controllers and the numerical capability of array processors. The C64x+ DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly independent functional unitstwo multipliers for a 32-bit result and six arithmetic logic units (ALUs). The eight functional units include instructions to accelerate the performance in telecom, audio, and industrial applications. The DSP core can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 2400 million MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 4800 MMACS. For more details on the C64x+ DSP, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732).
The C6421 also has application-specific hardware logic, on-chip memory, and additional on-chip peripherals similar to the other C6000 DSP platform devices. The C6421 core uses a two-level cache-based architecture. The Level 1 program memory/cache (L1P) consists of a 128K-bit memory space that can be configured as mapped memory or direct mapped cache, and the Level 1 data (L1D) consists of a 384K-bit memory space that can be configured as mapped memory or 2-way set-associative cache. The Level 2 memory/cache (L2) consists of a 512K-bit memory space that is shared between program and data space. L2 memory can be configured as mapped memory, cache, or combinations of the two.
The peripheral set includes: a 10/100 Mb/s Ethernet MAC (EMAC) with a management data input/output (MDIO) module; a 4-bit transmit, 4-bit receive VLYNQ interface; an inter-integrated circuit (I2C) Bus interface; a multichannel buffered serial port (McBSP0); a multichannel audio serial port (McASP0) with 4 serializers; 2 64-bit general-purpose timers each configurable as 2 independent 32-bit timers; 1 64-bit watchdog timer; a user-configurable 16-bit host-port interface (HPI); up to 111-pins of general-purpose input/output (GPIO) with programmable interrupt/event generation modes, multiplexed with other peripherals; 1 UART with hardware handshaking support; 3 pulse width modulator (PWM) peripherals; and 2 glueless external memory interfaces: an asynchronous external memory interface (EMIFA) for slower memories/peripherals, and a higher speed synchronous memory interface for DDR2.
The Ethernet Media Access Controller (EMAC) provides an efficient interface between the C6421 and the network. The C6421 EMAC supports 10Base-T and 100Base-TX or 10 Mbits/second (Mbps) and 100 Mbps in either half- or full-duplex mode, with hardware flow control and quality of service (QOS) support.
The Management Data Input/Output (MDIO) module continuously polls all 32 MDIO addresses in order to enumerate all PHY devices in the system.
The I2C and VLYNQ ports allow C6421 to easily control peripheral devices and/or communicate with host processors.
The rich peripheral set provides the ability to control external peripheral devices and communicate with external processors. For details on each of the peripherals, see the related sections later in this document and the associated peripheral reference guides.
The C6421 has a complete set of development tools. These include C compilers, a DSP assembly optimizer to simplify programming and scheduling, and a Windows debugger interface for visibility into source code execution.
High-Performance Digital Signal Processor (C6421)
2.5-, 2.-, 1.67, 1.43-ns Instruction Cycle Time
400-, 500-, 600-MHz C64x+ Clock Rate
Eight 32-Bit C64x+ Instructions/Cycle
3200, 4000, 4800, 5600 MIPS
Fully Software-Compatible With C64x
Commercial and Automotive (Q or S suffix) Grades
Low-Power Device (L suffix)
VelociTI.2 Extensions to VelociTI Advanced Very-Long-Instruction-Word (VLIW) TMS320C64x+ DSP Core
Eight Highly Independent Functional Units With VelociTI.2 Extensions:
Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
Two Multipliers Support Four 16 × 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 × 8-Bit Multiplies (16-Bit Results) per Clock Cycle
Load-Store Architecture With Non-Aligned Support
64 32-Bit General-Purpose Registers
Instruction Packing Reduces Code Size
All Instructions Conditional
Additional C64x+ Enhancements
Protected Mode Operation
Exceptions Support for Error Detection and Program Redirection
Hardware Support for Modulo Loop Auto-Focus Module Operation
C64x+ Instruction Set Features
Byte-Addressable (8-/16-/32-/64-Bit Data)
8-Bit Overflow Protection
Bit-Field Extract, Set, Clear
Normalization, Saturation, Bit-Counting
VelociTI.2 Increased Orthogonality
C64x+ Extensions
Compact 16-bit Instructions
Additional Instructions to Support Complex Multiplies
C64x+ L1/L2 Memory Architecture
128K-Bit (16K-Byte) L1P Program RAM/Cache [Flexible Allocation]
384K-Bit (48K-Byte) L1D Data RAM/Cache [Flexible Allocation]