Double Precision Floating Point Unit IEEE-754 Compliant
- The unit is designed to be synchronous to one global clock. All registers are updated on the rising edge of the clock.
- All registers can be reset with one global reset.
- The multiply operation is broken up to take advantage of the 25 x 18 multiply blocks in the Virtex5 DSP48E slices. Because each block is a 25 x 18 two's complement (signed) multiplier, it can perform at most a 24 x 17 unsigned multiply. The design uses 9 DSP48E slices to perform the 53 x 53 bit mantissa multiply required to multiply two double-precision floating point numbers.
- fpu_double.v is the top-level module. The input signals are:
  1) clk
  2) rst
  3) enable
  4) rmode (rounding mode)
  5) fpu_op (operation code)
  6) opa (64-bit floating point number)
  7) opb (64-bit floating point number)
- The output signals are:
  1) out (64-bit floating point output)
  2) ready (goes high when the output is ready)
  3) underflow
  4) overflow
  5) inexact
  6) exception
  7) invalid
- Each operation takes the following number of clock cycles to complete:
  1. addition: 20 clock cycles
  2. subtraction: 21 clock cycles
  3. multiplication: 24 clock cycles
  4. division: 71 clock cycles
- These latencies are longer than those of some floating point units, because support for denormalized numbers requires several more logic levels and therefore more clock cycles.
- version 1
- Pipelined versions of add/sub and multiply are included in the "pipeline" folder.
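The way a 53 x 53 bit unsigned multiply can be assembled from 24 x 17 unsigned partial products, as described for the DSP48E slices above, can be sketched in Python. This is an illustrative model under stated assumptions, not the Verilog in fpu_double.v: the function names and the simple chunking order are hypothetical. A plain tiling like this one produces 12 partial products (three of them with a 2-bit operand), so it does not by itself show how the design arrives at 9 slices.

```python
def split_chunks(value, total_bits, chunk_bits):
    """Split an unsigned integer into (chunk, shift) pairs, least significant first.

    Hypothetical helper: each chunk is at most chunk_bits wide, and the
    last chunk holds whatever bits remain.
    """
    chunks = []
    shift = 0
    while shift < total_bits:
        width = min(chunk_bits, total_bits - shift)
        chunks.append(((value >> shift) & ((1 << width) - 1), shift))
        shift += chunk_bits
    return chunks


def tiled_multiply(a, b, total_bits=53, a_chunk=24, b_chunk=17):
    """Multiply two unsigned total_bits-wide integers from small partial products.

    Each partial product is at most a_chunk x b_chunk bits, which is the
    unsigned size one 25 x 18 signed multiplier can handle.
    Returns (product, number_of_partial_products).
    """
    product = 0
    tiles = 0
    for a_part, a_shift in split_chunks(a, total_bits, a_chunk):
        for b_part, b_shift in split_chunks(b, total_bits, b_chunk):
            # Shift each partial product to its weight and accumulate.
            product += (a_part * b_part) << (a_shift + b_shift)
            tiles += 1
    return product, tiles
```

For example, calling tiled_multiply on two 53-bit mantissas (with the implicit leading 1 included) returns the same value as a direct integer multiply, together with the number of partial products used.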
Double Precision Floating Point Unit
IEEE-754 compliant double-precision floating point unit. Four operations (addition, subtraction, multiplication, division) are supported, as are the four rounding modes (round to nearest, toward zero, toward +inf, toward -inf). This unit also supports denormalized numbers, which is uncommon: many floating point units treat denormalized numbers as zero. The unit can run at clock frequencies up to 230 MHz on a Virtex5 target device.
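To make the point about denormalized numbers concrete, the following Python sketch decodes the bit fields of a double and contrasts full denormal support with a flush-to-zero model. It is only an illustration: the helper names are hypothetical, and the flush_to_zero function (including its choice to keep the sign) is an assumed model of "treating denormalized numbers as zero", not code taken from this unit.

```python
import struct


def decode_double(x):
    """Return the (sign, biased_exponent, fraction) fields of a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


def is_denormal(x):
    """A double is denormalized when its exponent field is 0 and its fraction is not."""
    _, exponent, fraction = decode_double(x)
    return exponent == 0 and fraction != 0


def flush_to_zero(x):
    """Assumed model of a unit that replaces denormal inputs with a signed zero."""
    if is_denormal(x):
        return -0.0 if x < 0 else 0.0
    return x


smallest_normal = 2.0 ** -1022       # smallest positive normalized double
denormal = smallest_normal / 4       # 2**-1024, representable only as a denormal
```

With denormal support, denormal * 4 recovers smallest_normal exactly. Under the flush-to-zero model, the same input is replaced by zero before any arithmetic and the information is lost.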
A pipelined version of add/sub and multiply is also available in the pipeline folder. Add/sub has a latency of 24 clock cycles, after which a result is available on every clock cycle. Multiply has a latency of 21 clock cycles. The pipelined versions treat denormalized numbers as 0.
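The trade-off between the two versions can be estimated with a short calculation. The Python sketch below compares total cycle counts for a batch of additions. It assumes that the non-pipelined unit handles one operation at a time with no overlap, and that the pipelined unit accepts a new operation on every clock. The function names are hypothetical, and the clock frequency passed to cycles_to_ns is a parameter rather than a measured figure for the pipelined version.

```python
def cycles_unpipelined(n_ops, cycles_per_op):
    """Total cycles when each operation must finish before the next one starts."""
    return n_ops * cycles_per_op


def cycles_pipelined(n_ops, latency):
    """Total cycles when one result emerges per clock after the initial latency."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz


# Cycle counts taken from the figures quoted in this README.
ADD_CYCLES_UNPIPELINED = 20
ADD_LATENCY_PIPELINED = 24
```

Under these assumptions, a single addition finishes sooner on the non-pipelined unit (20 cycles versus 24), while a long stream of additions finishes far sooner on the pipelined one.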