GridVortex SHA256 Hash Core Algorithm

Details
Category: Crypto Core
Created: July 14, 2016
Updated: January 27, 2020
Language: VHDL
Other project properties
Development Status: Stable
WishBone compliant: No
WishBone version: n/a
License: LGPL
Description
This is the GV_SHA256, a fast SHA-256 engine (580Mbps @ 74MHz), fully compliant to the NIST FIPS-180-4 SHA-256 approved algorithm.
It is implemented as a single-cycle combinational logic for one iteration of the hash core, with a total of 65 clocks for the full 512-bit block hash operation.
The design is fully static, with a clean data path / control path approach.
The core has minimal registers, and is a parallel 256-bit processor, that receives a stream of 32-bit words, and delivers a 256-bit parallel hash output.
Interfacing with the engine is easy, with flow control signals that facilitate bus interfacing.
The testbench includes NIST vectors to verify against all corner cases for the SHA256 algorithm and block padding.
This logic synthesizes to system clocks of 72MHz, with ~10 layers of longest combinational logic path, without pipelining, in Spartan-6 (speed grade 2) technology.
The main objective is to share the design in order to benefit from the process Verification in several technologies, and maybe get feedback on design glitches for the algorithm optimizations. The implementation is straightforward, with aspect naming after the NIST-180-4 source description, and is optimized for minimum registers, instead of for minimum combinational length. That is a good fit for IoT applications, where silicon area may be more important than maximum clock speed.
I have more aggressively optimized versions of the SHA256 family (HMAC_SHA256, HKDF, DRBG), used on our closed-source cyber security ASICs.
Algorithm optimizations and parallel adder optimizations reduce the cycles per function and increase the top clock speeds of these crypto functions.
This core is the base of the GridVortex Crypto Library, for HMAC-SHA256, HMAC-SHA256-DRBG, HKDF and HMAC-DRBG-KDF cryptographic constructions.
The VHDL is written as hand-optimized RTL targeted for ASIC processes, especially IoT ASICs at 130nm/90nm.
However, it is easily integrated on FPGA targets. No FPGA hard logic are used in the description.
All feedback are highly appreciated.