4-way set associative Next186MP3 Decoder


Category: System on Chip

Created: February 27, 2015

Updated: January 27, 2020

Language: Verilog

Other project properties

Development Status: Stable

WishBone compliant: No

WishBone version: n/a

License: LGPL


This is an evolution of my previous project, Next186SoC PC, that adds the ability to play MP3 files in real time (at any bitrate).
It is written in Verilog, and it contains all the features of Next186SoC PC, plus a few more.


This is a PC SoC able to run 16bit DOS. It features the following elements:
- an 80186-compatible CPU (Next186), running at 40MIPS
- 16KB of cache: 4-way set associative
- SDRAM interface (up to 64MB of SDRAM supported, through EMM and XMM)
- High Memory Area (HMA) useable in DOS
- Text mode, EGA(320x200x16), VGA(640x480x16, 320x200x256, ModeX), VESA(640x480x256)
- a sound queue (16KB), able to deliver CD-quality stereo sound (44100Hz) on 2 digital pins (pulse density modulation over an RC integrator: 1Kohm + 10nF). For the best sound quality I recommend a high-impedance (>10Kohm) low-pass filter (I use a 4th-order active Butterworth low-pass filter). The sound interface is also compatible with Disney Sound Source and Covox Speech Thing, which provides improved sound for some DOS games.
- a 32bit DSP coprocessor, able to assist the main CPU with MP3 decoding (and with other tasks). It takes only 5-7% of the SoC's LUTs, plus 4x18bit multipliers
- SD card interface (in SPI mode). FAT16-formatted disks are limited to 2GB. In order to use SD cards of up to 32GB, I used 16bit FreeDOS, which supports FAT32.
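The pulse density modulation output mentioned in the sound bullet above can be illustrated with a small software model. This is only a sketch of a generic first-order delta-sigma modulator, not the SoC's actual Verilog. The function names `pdm_encode` and `rc_cutoff_hz` are hypothetical, and the real modulator order and bit clock are not stated here. The second function simply evaluates the standard RC corner frequency for the 1Kohm + 10nF integrator.

```python
import math


def pdm_encode(samples):
    """Generic first-order delta-sigma modulator (illustrative model only).

    samples: iterable of floats in [-1.0, 1.0].
    Returns a list of 0/1 bits whose local density of ones tracks the input.
    """
    bits = []
    error = 0.0  # running difference between the input and the emitted pulses
    for x in samples:
        error += x
        if error >= 0.0:
            bits.append(1)
            error -= 1.0  # a '1' pulse represents +1.0
        else:
            bits.append(0)
            error += 1.0  # a '0' pulse represents -1.0
    return bits


def rc_cutoff_hz(r_ohm, c_farad):
    """-3 dB corner frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


if __name__ == "__main__":
    bits = pdm_encode([0.5] * 4000)
    print("density of ones:", sum(bits) / len(bits))
    print("RC corner (Hz):", round(rc_cutoff_hz(1e3, 10e-9)))
```

In this model a constant input of 0.5 gives three '1' pulses out of every four, and the RC integrator averages the pulse stream back into an analog level. With 1Kohm and 10nF the corner is at roughly 15.9kHz, so a single pole removes only part of the high-frequency modulation noise, which is consistent with the recommendation of an additional higher-order low-pass filter.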

My current implementation runs on a Spartan-6 XC6SLX9 (Papilio Pro) at 80MHz (40MIPS), with 8MB of SDRAM running at 140MHz.
The DSP runs at the bus speed of 80MHz.
The design uses ~4200 LUT6 (of 5700 available) and 5x18bit multipliers (of 16 available).
For the interface, I extended the Arcade Mega Wing with an 18bit VGA DAC and added an SD card interface.

Without the DSP, the Next186 CPU alone provides 50% of the processing power required to decode an MP3 file in real time. With the DSP coprocessor, the system provides ~150%.


The DSP has room for 2K instructions of 16 bits each, and 256x32bit registers.
Each instruction can address 2 registers from a 64-register page. The 64-register pages can be mapped over the 256 registers.
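As a rough illustration of the register paging described above, the sketch below maps a 6bit in-page index onto the 256-register file. The details are assumptions made for the example only: the function name `physical_register`, the idea of a freely chosen page base, and the wrap-around at 256 are all hypothetical. The actual mapping is defined in DSP32.v.

```python
NUM_REGS = 256   # total 32bit registers in the DSP
PAGE_SIZE = 64   # registers visible to one instruction operand


def physical_register(page_base, index):
    """Translate an in-page register index to a physical register number.

    page_base: where the 64-register page starts in the register file
               (assumed freely selectable here; the real hardware may
               restrict it).
    index:     0..63, the value an instruction operand field can hold.
    """
    if not 0 <= page_base < NUM_REGS:
        raise ValueError("page_base out of range")
    if not 0 <= index < PAGE_SIZE:
        raise ValueError("index does not fit in a 64-register page")
    # Wrap-around past register 255 is an assumption of this sketch.
    return (page_base + index) % NUM_REGS


if __name__ == "__main__":
    print(physical_register(0, 5))     # page at the start of the file
    print(physical_register(192, 63))  # last register of the last aligned page
```

The point of such a scheme is that a 64-register page needs only a 6bit operand field, which keeps two register operands within a 16bit instruction while still giving access to all 256 registers by remapping the page.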
The DSP is based on a two-stage exposed pipeline, executes one instruction per clock, and is able to execute code in parallel with the main CPU. In addition, the main CPU can transfer data to and from the DSP registers while the DSP is running, allowing further parallelization.
The DSP can do 32bit operations (additions, subtractions, multiplications, shift right, 16bit packing, logical operations). It has no jump, looping, or subroutine call capabilities.
The instruction set is detailed in the Verilog DSP32.v file.

The MP3 decoding code occupies under 1K instructions.