Motion Estimation Processor for H.264, VC-1 and AVS video codecs



Category: Video Controller

Created: August 27, 2009

Updated: January 27, 2020

Language: VHDL

Other project properties

Development Status: Stable

WishBone compliant: No

WishBone version: n/a

License: GPL


High-definition programmable and configurable motion estimation processor for H.264, VC-1 and AVS video codecs.


The LiquidMotion LMx1 processor is a reconfigurable ASIP (Application-Specific Instruction-set Processor) designed to execute user-defined block-matching motion estimation algorithms optimized for hybrid video codecs such as MPEG-2, MPEG-4, H.264 AVC and Microsoft VC-1. The core offers scalable performance that depends on the features of the chosen algorithm and on the number and type of execution units implemented. Because both the search algorithm and the underlying hardware it executes on can be changed, the result is an extremely flexible motion estimation processing platform.
A base configuration consists of a single 64-bit integer pipeline and can be implemented in 2,300 FPGA logic cells. It is capable of processing 1080p HD video at 30 frames per second using a hexagonal motion estimation search followed by a square refine (as used in the open-source H.264 encoder x264) with 1 reference frame and a 16x16 block size. In contrast, a complex configuration that includes support for motion vector candidates, sub-blocks, motion vector costing using Lagrangian optimization, four integer-pel execution units and one fractional-pel execution unit plus interpolation needs around 14,000 logic cells. At least one integer-pel execution unit must always be present for a valid processor configuration; the other units are optional and are configured at synthesis time.
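The hexagonal search with square refine mentioned above can be illustrated in software. The following is a minimal sketch, not the LMx1 program or the x264 source: it assumes SAD as the matching cost, integer-pel motion vectors, frames stored as lists of rows, and the commonly used six-point hexagon and eight-point square patterns. All function names (`sad`, `best_of`, `hex_search`) are hypothetical.

```python
# Sketch of a hexagon search followed by a square refine for one 16x16 block.
# Frames are lists of rows of integer luma samples (an assumption for brevity).

BLOCK = 16

# Search patterns as (dx, dy) offsets from the current centre.
HEXAGON = [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)]
SQUARE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def sad(cur, ref, bx, by, mvx, mvy):
    """Sum of absolute differences between the current block at (bx, by) and
    the reference block displaced by (mvx, mvy). Returns None if the displaced
    block falls outside the reference frame."""
    rx, ry = bx + mvx, by + mvy
    if rx < 0 or ry < 0 or ry + BLOCK > len(ref) or rx + BLOCK > len(ref[0]):
        return None
    total = 0
    for j in range(BLOCK):
        cur_row = cur[by + j]
        ref_row = ref[ry + j]
        for i in range(BLOCK):
            total += abs(cur_row[bx + i] - ref_row[rx + i])
    return total


def best_of(cur, ref, bx, by, centre, best_cost, pattern):
    """Evaluate every point of one pattern around centre and return the best
    (motion_vector, cost) pair. The centre is kept on ties."""
    best_mv = centre
    for dx, dy in pattern:
        mv = (centre[0] + dx, centre[1] + dy)
        cost = sad(cur, ref, bx, by, mv[0], mv[1])
        if cost is not None and cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost


def hex_search(cur, ref, bx, by, max_steps=16):
    """Repeat the hexagon pattern until the centre stops moving (or max_steps
    is reached), then refine once with the square pattern."""
    centre = (0, 0)
    cost = sad(cur, ref, bx, by, 0, 0)
    for _ in range(max_steps):
        mv, new_cost = best_of(cur, ref, bx, by, centre, cost, HEXAGON)
        if mv == centre:
            break
        centre, cost = mv, new_cost
    return best_of(cur, ref, bx, by, centre, cost, SQUARE)
```

Each call to `best_of` evaluates a whole pattern of search points around one centre, which is the kind of work that multiple integer-pel execution units could share. The starting point (0, 0) is an assumption; a real encoder would normally start from a predicted motion vector.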


• Intuitive and easy programming of user-defined block-matching motion estimation algorithms using a C-like syntax.
• Highly configurable architecture enables the designer to optimize the hardware for the selected algorithm.
• Binary compatibility so that once an algorithm has been compiled it can be executed by any hardware configuration.
• Support for advanced features such as rate-distortion optimization using Lagrangian techniques, sub-partitions and fractional-pel searches, according to the codec standard.
• Efficient evaluation of multiple user-defined motion vector candidates transparently to the rest of the algorithm.
• Toolset available to enable the efficient exploration of the large design space and the generation of the RTL configuration file for the hardware processor library.
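Rate-distortion optimization using Lagrangian techniques, as listed in the features above, is usually expressed as a cost J = D + λ·R, where D is the distortion of a candidate and R is the number of bits needed to code its motion vector. The sketch below is one possible illustration, not the processor's documented costing: it assumes SAD as the distortion and models the rate with signed Exp-Golomb code lengths of the motion vector difference, as used for H.264 CAVLC. The names `se_golomb_bits` and `mv_cost` and the value of λ are hypothetical.

```python
def se_golomb_bits(value):
    """Length in bits of the signed Exp-Golomb code for an integer."""
    # Map the signed value to an unsigned code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    # An unsigned Exp-Golomb code uses 2 * floor(log2(code_num + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1


def mv_cost(distortion, mv, predictor, lam):
    """Lagrangian cost J = D + lambda * R for one motion vector candidate.
    mv and predictor are (x, y) pairs in the same units."""
    rate = (se_golomb_bits(mv[0] - predictor[0])
            + se_golomb_bits(mv[1] - predictor[1]))
    return distortion + lam * rate


# A far candidate with lower SAD can lose to a near one once rate is counted.
far = mv_cost(100, (8, 0), (0, 0), lam=4)    # 100 + 4 * (9 + 1) = 140
near = mv_cost(120, (0, 0), (0, 0), lam=4)   # 120 + 4 * (1 + 1) = 128
```

With these example numbers the candidate at the predictor wins despite its higher SAD, which is the behaviour motion vector costing is meant to produce.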


• Video coding (H.264, MPEG-4, MPEG-2, VC-1, AVS)
• Video enhancement applications such as frame rate conversion, de-interlacing, super-resolution and video


A toolset has been developed that gives the algorithm designer access to the hardware features without requiring any knowledge of the processor microarchitecture. The toolset IDE is a fully integrated environment composed of a compiler, an assembler, a cycle-accurate model and RTL export. The cycle-accurate model includes a full implementation of the x264 encoder (open-source H.264), so the designer can quickly evaluate the effects of different motion estimation algorithms in terms of PSNR and bit rate. The algorithm designer can create a new algorithm for the required application using typical C constructs such as for and while loops and if-else statements. The compiler automatically recognises the search points that correspond to fractional-pel searches and generates the correct instructions.
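For readers unfamiliar with the PSNR figure used to compare algorithms, the following minimal sketch shows the standard definition, PSNR = 10·log10(peak² / MSE). It is not taken from the toolset; the function name `psnr`, the list-of-rows frame layout and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames,
    each given as a list of rows of integer samples."""
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak * peak / mse)
```

A higher PSNR at the same bit rate, or the same PSNR at a lower bit rate, indicates that a motion estimation algorithm is finding better predictions.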
The compiler extracts parallelism by encoding search patterns composed of a variable number of search points in a single instruction. The hardware analyses the instruction and distributes the load across the available execution units. Using the cycle-accurate model, the designer can quickly explore the performance of many configurations in terms of frames-per-second throughput, compressed video bit rate and PSNR, hardware complexity, and power/energy consumption.
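How one pattern instruction might spread over a varying number of execution units can be sketched as follows. This is purely illustrative: the source does not describe the actual scheduling policy, so the round-robin assignment, the function names `distribute` and `passes_needed`, and the idea of counting "passes" are all assumptions.

```python
# One pattern instruction carrying six search points (a hexagon, as an example).
HEXAGON = [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)]


def distribute(points, num_units):
    """Assign the search points of one instruction to execution units in
    round-robin order (hypothetical policy). Returns one list per unit."""
    if num_units < 1:
        raise ValueError("at least one integer-pel execution unit is required")
    lanes = [[] for _ in range(num_units)]
    for index, point in enumerate(points):
        lanes[index % num_units].append(point)
    return lanes


def passes_needed(points, num_units):
    """Number of sequential passes before every point has been evaluated,
    i.e. the length of the busiest lane."""
    return max(len(lane) for lane in distribute(points, num_units))


# The same instruction runs on any configuration; only the pass count changes.
single_unit = passes_needed(HEXAGON, 1)   # 6 passes
four_units = passes_needed(HEXAGON, 4)    # 2 passes
```

The point of the sketch is that the program does not change between configurations: the same list of search points is simply consumed faster when more units are present.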
The impact of changes to the original search algorithm can be evaluated before exporting the selected hardware configuration file and program binary. The final implementation can then be generated by processing the configuration file and the rest of the RTL processor description with standard tools such as Synplicity and/or Xilinx ISE.


Toolset (compiler, cycle-accurate model and analysis tools) available at . VHDL configurable processor description, VHDL testbench and FPGA prototype implementation using the PCI bus are also available, together with the original design team. The cycle-accurate simulator supports all the features. The open-source RTL version is free for academic and research purposes and currently supports a single integer pipeline with 16x16 macroblocks. See it working at . The commercial RTL version supports all the features.