Parallel Two Dimensional DCT in VHDL


Category: Arithmetic Core

Created: April 14, 2006

Updated: January 27, 2020

Language: VHDL

Other project properties

Development Status: Stable

Additional info: Design done, FPGA proven, Specification done

WishBone compliant: No

WishBone version: n/a

License: n/a


NEW: A 12-bit input version of MDCT, created by Emrah Yuce, has been added to the project downloads.

Parallel, synthesizable implementation of the 2D DCT in VHDL. It currently works on 8-bit input data using 12-bit DCT coefficients (12-bit DCT output). The design is multiplier-less: parallel distributed arithmetic with butterfly computation is used instead of multipliers. The 2D transform is implemented by row-column decomposition, with two 1D DCT units and a transpose matrix between them (double-buffered as a ping-pong buffer for performance). Latency (the time between sampling the first 8-bit input value and the first DCT value appearing on the output) is 94 clock cycles.
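The row-column decomposition can be illustrated with a small floating-point software model. This is only a sketch in Python, not the core's VHDL: all function names are hypothetical, the orthonormal DCT-II scaling is an assumption, and fixed-point rounding, distributed arithmetic and the ping-pong buffering are not modeled. It shows that a 1D DCT over the rows, a transpose, and a second 1D DCT give the same result as the direct 2D definition.

```python
import math

N = 8  # block size processed by the core (8x8)


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C (assumed scaling; the core's scaling may differ)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def dct_1d(vec, c):
    """1D DCT of one vector: y[k] = sum_i c[k][i] * vec[i]."""
    return [sum(c[k][i] * vec[i] for i in range(len(vec)))
            for k in range(len(c))]


def transpose(m):
    """Software stand-in for the transpose matrix between the two 1D units."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2D DCT as: 1D DCT on rows, transpose, 1D DCT again, transpose back."""
    c = dct_matrix(len(block))
    stage1 = [dct_1d(row, c) for row in block]              # first 1D DCT unit
    stage2 = [dct_1d(row, c) for row in transpose(stage1)]  # second 1D DCT unit
    return transpose(stage2)


def dct_2d_direct(block):
    """Direct 2D definition Y = C * X * C^T, used only as a cross-check."""
    n = len(block)
    c = dct_matrix(n)
    return [[sum(c[u][x] * block[x][y] * c[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


if __name__ == "__main__":
    sample = [[(r * 8 + col) % 256 for col in range(N)] for r in range(N)]
    a = dct_2d_row_column(sample)
    b = dct_2d_direct(sample)
    worst = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
    print("max difference between the two methods:", worst)
```

The decomposition matters in hardware because each 1D unit only needs eight inner products of length eight, rather than one 64-term sum per output coefficient.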

A self-verifying testbench is included, which takes a Matlab-converted image as input. The core transforms the image to DCT coefficients, and behavioral IDCT code in the testbench reconstructs the original image from them. PSNR is computed between the original and the reconstructed image to measure the error introduced by fixed-point arithmetic. For the sample Lena images, PSNR is 48 dB.
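For reference, the PSNR figure can be reproduced in software along the following lines. This is a minimal sketch assuming the usual definition, PSNR = 10 * log10(peak^2 / MSE), with a peak value of 255 for 8-bit images. The function name `psnr` and the flat-list pixel representation are hypothetical, and the testbench's exact formula is not stated here.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized flat lists of pixel values.

    Assumes the standard definition 10*log10(peak^2 / MSE); peak=255
    corresponds to 8-bit image data.
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # Every pixel is off by exactly one grey level, so the MSE is 1.
    print(psnr([0, 10, 20, 30], [1, 11, 21, 31]))
```

Under this definition, a mean squared error of 1 on 8-bit data gives about 48.13 dB.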

Matlab scripts are included for computing the floating-point DCT/IDCT as a reference. Scripts for converting 8-bit bitmaps to a text format readable by the testbench, and back, are also available.

The core was tested on a Digilent S3 board with a Spartan-3 XC3S1000 FPGA.


+ 8 bit input, 12 bit output
+ Throughput 10 MSamples/s with 10 MHz input clock
+ Latency 94 clock cycles
+ Transforms 8x8 block of 64 samples in 64 cycles when pipeline is full
+ FPGA proven
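The timing figures in the list above scale with the clock frequency. The sketch below works through the arithmetic, assuming one input sample is accepted per clock cycle once the pipeline is full (consistent with 64 samples in 64 cycles). The constant and function names are hypothetical.

```python
LATENCY_CYCLES = 94   # from the feature list
BLOCK_SAMPLES = 64    # one 8x8 block


def latency_seconds(clock_hz):
    """Time from the first input sample to the first DCT output."""
    return LATENCY_CYCLES / clock_hz


def samples_per_second(clock_hz):
    """Assumes one sample per clock cycle when the pipeline is full."""
    return clock_hz


def blocks_per_second(clock_hz):
    """8x8 blocks transformed per second at full pipeline occupancy."""
    return clock_hz / BLOCK_SAMPLES


if __name__ == "__main__":
    f = 10e6  # 10 MHz input clock, as in the feature list
    print("latency [us]:", latency_seconds(f) * 1e6)
    print("samples/s:", samples_per_second(f))
    print("blocks/s:", blocks_per_second(f))
```

At a 10 MHz clock this corresponds to a latency of 9.4 microseconds and 156250 blocks per second.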

Figure: MDCT block diagram (block_diagram.jpg)