Adding Custom Instructions to RISC-V to Boost Performance While Reducing Power and Code Density
June 20, 2019 by Tommy Lin, Andes Technology
The article discusses the benefits of custom instructions for RISC-V and introduces the ACE framework, which simplifies adding custom instructions to the standard RISC-V CPU ISA.
RISC-V is an open-source instruction set architecture (ISA) that designers can implement in their own processors. Because it is open source, anyone can use the standard RISC-V ISA, but most design teams will want to customize it to (1) differentiate their implementations from others, (2) build in features that competitors can't easily copy, or (3) add functionality that boosts performance, reduces power consumption, and improves code density.
Custom Extensions for RISC-V
Extensions to the standard RISC-V ISA can speed up critical operations and improve code density. Consider a video decoder. Implementing it entirely as C software provides the most flexibility. At the other extreme, a pure hardware design is the most efficient, but it requires the design team to hard-wire the solution into circuits that can't be changed once completed.
The middle ground between these extremes is to add instructions that speed up frequently used code. These can be simple operations such as multiply and multiply-add, or more complex functions such as motion compensation, inverse discrete cosine transform, and variable-length decoding. This approach speeds up processing while keeping the flexibility of a software approach, because the parts of the design that remain programmable can still be changed.
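To make "frequently used code" concrete, here is a minimal sketch of one motion-compensation building block in pure C: the rounded average of two reference pixels, as used for half-pixel interpolation in MPEG-style decoders. The function name `mc_halfpel_avg` is hypothetical; the point is that a loop like this runs for every predicted block, which makes it a natural candidate for a custom instruction that processes several pixels at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pure-C kernel (hypothetical name): rounded average of two reference rows,
 * out[i] = (a[i] + b[i] + 1) >> 1, as in half-pixel motion compensation.
 * A custom instruction could replace the loop body for several pixels. */
void mc_halfpel_avg(const uint8_t *ref_a, const uint8_t *ref_b,
                    uint8_t *out, size_t n)
{
    assert(n == 0 || (ref_a != NULL && ref_b != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Add in unsigned int so 255 + 255 + 1 cannot overflow 8 bits. */
        out[i] = (uint8_t)(((unsigned)ref_a[i] + ref_b[i] + 1u) >> 1);
    }
}
```

Each output pixel costs two loads, two adds, a shift, and a store in software; a dedicated instruction could do the same work for a whole row segment.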
Creating Custom Extensions
Until now, creating custom instructions has required a great deal of design and verification effort from the design team. Andes Technology Corporation has developed a framework called Andes Custom Extension™ (ACE) that greatly simplifies adding custom instructions to the standard RISC-V CPU ISA. ACE comes with a tool called COPILOT that eliminates the housekeeping chores needed to bring a custom instruction into the EDA design and verification flow.
Figure 1. Software migration with pure C code vs. ACE
To use ACE, the designer begins by profiling the algorithm to find which functions consume the most CPU resources. The profiling analysis identifies the time-critical code, or hot spots, of the algorithm. After converting this critical code into custom instructions with ACE, the designer can determine whether the new instructions significantly reduce the cycle count, which is easy to check with a software simulator.
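Before committing to a custom instruction, it helps to estimate how much a faster hot spot can improve the whole program. A standard way to do this is Amdahl's law (our choice here, not something the article prescribes): if the hot spot takes a fraction f of all cycles and the custom instruction makes it s times faster, the overall speedup is 1 / ((1 - f) + f / s). A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a hot spot that takes hot_fraction
 * of total cycles is accelerated by a factor of hot_speedup. */
double amdahl_speedup(double hot_fraction, double hot_speedup)
{
    assert(hot_fraction >= 0.0 && hot_fraction <= 1.0);
    assert(hot_speedup > 0.0);
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup);
}
```

For example, a hot spot that accounts for half of all cycles and becomes 8x faster yields an overall speedup of only about 1.78x, which is why profiling to find where the cycles really go comes first.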
Once the designer determines that the new instruction meets the design requirements, the designer writes the corresponding concise RTL. Andes' COPILOT tool produces the complete RISC-V RTL, incorporating the designer's concise RTL along with the semantics and operands of the new instruction. The designer can then check the results from EDA tools run on this RTL to determine whether the new instruction meets the performance and power criteria. For verification, COPILOT also creates test cases for the extended RTL from the instruction set simulator (ISS), along with the cross-checking environment needed to verify the new instruction within the designer's conventional EDA design and verification flow.
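The idea behind cross-checking can be sketched in plain C: run the same pseudo-random operands through the reference semantics (what the ISS executes) and through an independent implementation, then count mismatches. This is only an illustration of the concept, not COPILOT's generated environment. Here a bit-level shift-and-add multiplier stands in for the RTL, and all names and the assumed multiply-add semantics (acc + a * b, modulo 2^32) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics, as an ISS would model it: acc + a * b mod 2^32. */
static uint32_t madd32_ref(uint32_t acc, uint32_t a, uint32_t b)
{
    return (uint32_t)(acc + (uint64_t)a * b);
}

/* Hardware-style model standing in for the RTL: shift-and-add multiplier
 * whose 32-bit partial products are accumulated onto acc. */
static uint32_t madd32_shift_add(uint32_t acc, uint32_t a, uint32_t b)
{
    uint32_t sum = acc;
    for (int bit = 0; bit < 32; bit++) {
        if (b & (UINT32_C(1) << bit))
            sum += a << bit; /* partial product, truncated to 32 bits */
    }
    return sum;
}

/* Deterministic xorshift32 generator for reproducible test vectors. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns the number of mismatching vectors (0 means the models agree). */
unsigned cross_check_madd32(unsigned n_vectors, uint32_t seed)
{
    assert(seed != 0); /* xorshift32 never leaves the all-zero state */
    uint32_t state = seed;
    unsigned mismatches = 0;
    for (unsigned i = 0; i < n_vectors; i++) {
        uint32_t acc = xorshift32(&state);
        uint32_t a = xorshift32(&state);
        uint32_t b = xorshift32(&state);
        if (madd32_ref(acc, a, b) != madd32_shift_add(acc, a, b))
            mismatches++;
    }
    return mismatches;
}
```

In a real flow the second model would be the RTL running in a logic simulator, with the ISS producing the expected results.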
When designing new instructions, ACE allows scalar instructions to be single-cycle or multi-cycle. Vector instructions can incorporate a for loop or a do-while loop. Designers can also direct ACE to make an instruction retire immediately while its execution continues in the background. For operands, ACE lets the designer specify an immediate operand, an operand in a general-purpose register, or an operand residing in the baseline memory accessed through the CPU. ACE also allows the designer to create custom operands (registers and memory of arbitrary width and number), which are implicit in the instruction and therefore save opcode encoding space. COPILOT automatically generates the following: opcode assignment; all required tools and simulator add-ons; RTL code for instruction decoding, operand mapping, dependence checking, input accesses, and output updates; and the waveform control file.
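Two of these options, a vector instruction described with a for loop and a custom register that is accessed implicitly, can be pictured with a C model of instruction semantics. The sketch below is hypothetical and is not ACE syntax: it models a 16-bit vector dot-product instruction that reads its inputs from memory and accumulates into a 64-bit custom accumulator. Because the accumulator is implied, the instruction encoding needs no register field for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical custom state: a 64-bit accumulator read and written
 * implicitly, so it consumes no opcode encoding space. */
static int64_t custom_acc;

/* Hypothetical companion instructions to clear and read the custom state. */
void custom_acc_clear(void) { custom_acc = 0; }
int64_t custom_acc_read(void) { return custom_acc; }

/* Hypothetical vector dot-product semantics: a for loop over n elements
 * read from memory, accumulating into the implicit register. */
void vdot16_semantics(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        custom_acc += (int32_t)x[i] * (int32_t)y[i]; /* fits in 32 bits */
}
```

Software would clear the accumulator, issue the vector instruction one or more times, and then read the result, so successive calls keep accumulating until the next clear.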
Custom Extension Example
Consider creating a 32-bit multiply-add instruction. In an ACE definition file, the designer specifies the new instruction's name, MADD32, and the names and attributes of its operands: which are inputs and which are outputs, which register file each is associated with, and whether each is an immediate. The designer then provides the instruction semantics in C for use by the instruction set simulator and, finally, an estimate of the clock cycles the new instruction will require. The designer can thus focus on the instruction's functionality while the tool takes care of the housekeeping, automatically expanding the concise RTL into a full implementation and generating the simulator add-on for the new instruction.
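The article does not show the ACE definition file format, so the sketch below captures the same information in a C data structure instead: the instruction name, operand names and attributes, a cycle estimate, and C semantics for the simulator. All type names, field names, and the operand layout (an accumulator that is both input and output, plus two general-purpose-register inputs) are assumptions for illustration; the single-cycle estimate matches the figure quoted for MADD32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for what an ACE definition captures. Not ACE syntax. */
typedef enum { OPERAND_IN, OPERAND_OUT, OPERAND_INOUT } operand_dir;

typedef struct {
    const char *name;     /* operand name */
    operand_dir dir;      /* input, output, or both */
    const char *reg_file; /* associated register file ("GPR" assumed) */
    int is_immediate;     /* nonzero if the operand is an immediate */
} operand_desc;

typedef struct {
    const char *mnemonic;
    operand_desc operands[3];
    unsigned cycles; /* designer's latency estimate */
    uint32_t (*semantics)(uint32_t acc, uint32_t a, uint32_t b);
} insn_desc;

/* C semantics for the simulator; assumed to be acc + a * b mod 2^32. */
static uint32_t madd32_semantics(uint32_t acc, uint32_t a, uint32_t b)
{
    return (uint32_t)(acc + (uint64_t)a * b);
}

static const insn_desc MADD32 = {
    .mnemonic = "MADD32",
    .operands = {
        { "acc", OPERAND_INOUT, "GPR", 0 },
        { "a",   OPERAND_IN,    "GPR", 0 },
        { "b",   OPERAND_IN,    "GPR", 0 },
    },
    .cycles = 1,
    .semantics = madd32_semantics,
};

static const insn_desc *const INSN_TABLE[] = { &MADD32 };

/* Look up an instruction description by mnemonic; NULL if unknown. */
const insn_desc *find_insn(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof INSN_TABLE / sizeof INSN_TABLE[0]; i++)
        if (strcmp(INSN_TABLE[i]->mnemonic, mnemonic) == 0)
            return INSN_TABLE[i];
    return NULL;
}
```

Separating the description (operands, latency) from the behavior (the semantics function) mirrors the division of labor described above: the designer supplies what the instruction does, and tooling derives the rest.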
The C program that uses the new instruction is shown in the figure. Once the instruction is created, it is available to C code as an intrinsic function, ace_madd32. In pure C, the 32-bit multiply-add operation required 8 clock cycles; with the custom instruction it completes in a single cycle. If the designer's program performed a thousand multiply-add operations, the cycles spent on them would drop from about 8,000 to about 1,000, an 8x reduction that approaches an order of magnitude.
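The article gives the intrinsic's name, ace_madd32, but not its prototype. The sketch below assumes it takes an accumulator and two 32-bit operands and returns the updated accumulator. So that the code also builds on a host without the custom instruction, it supplies a portable C fallback guarded by a hypothetical `HAVE_ACE_MADD32` macro. It also reproduces the cycle arithmetic from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef HAVE_ACE_MADD32 /* hypothetical macro: defined when the real intrinsic exists */
/* Portable fallback with an ASSUMED signature: acc + a * b mod 2^32. */
static inline uint32_t ace_madd32(uint32_t acc, uint32_t a, uint32_t b)
{
    return (uint32_t)(acc + (uint64_t)a * b);
}
#endif

/* Dot product: one multiply-add per element. */
uint32_t dot32(const uint32_t *x, const uint32_t *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = ace_madd32(acc, x[i], y[i]);
    return acc;
}

/* Cycles spent on n multiply-adds at cycles_per_op each,
 * ignoring loop overhead and memory stalls. */
unsigned long madd_cycles(unsigned long n, unsigned long cycles_per_op)
{
    return n * cycles_per_op;
}
```

With the figures from the text, `madd_cycles(1000, 8)` gives 8,000 cycles and `madd_cycles(1000, 1)` gives 1,000. The whole program speeds up by less than 8x unless multiply-add dominates its run time.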
The Benefits of Custom Instructions
Adding custom instructions to a standard RISC-V ISA can greatly boost performance while reducing power and improving code density. The drawback has been the time and design resources needed to incorporate a new instruction into the design and verification flow, which many design teams found too daunting to attempt. With automated tools that perform the housekeeping chores needed to incorporate custom extensions, design teams can convert critical software functions into hardware and achieve higher performance with lower power and smaller code size in their final implementation.
Industry Articles are a form of content that allows industry partners to share useful news, messages, and technology with All About Circuits readers in a way editorial content is not well suited to. All Industry Articles are subject to strict editorial guidelines with the intention of offering readers useful news, technical expertise, or stories. The viewpoints and opinions expressed in Industry Articles are those of the partner and not necessarily those of All About Circuits or its writers.