A Real-World Example of How Error Correction Codes are Critical in Microcontrollers
June 23, 2020 by Jake Hertz
How exactly do error correction codes work and how do they affect the integrity of an MCU? A new ECC-protected Arm Cortex-M4F MCU from Maxim Integrated serves as a real-world example.
One of the great challenges that engineers face in digital communication is addressing errors in transmitted data. Whether in the field of wireless communications or in PCB design, errors in some capacity are unavoidable over the transmission channel or in memory.
Errors most commonly come in the form of bit flips (a 1 becomes a 0 or vice versa), but in some cases, bits can be deleted entirely or new erroneous bits can be inserted into a data stream.
Example of a single-bit error. Image used courtesy of Dinesh Thakur
In a chapter on digital transmission, Dr. Edwin V. Jones explains that errors in digital communications are often caused by natural phenomena such as thermal noise, power noise, cross talk, attenuation, and other forms of electromagnetic interference. With these unavoidable errors becoming more prevalent as devices scale down, engineers have developed several means of handling them. One of these solutions is error-correcting codes (ECC).
How Do Error Correction Codes Work?
Engineers have come up with many different schemes to not only detect errors but also to correct them on the receiver end, allowing for minimal retransmissions.
Backup mode RAM retention—including RAM size with and without ECC—of Maxim Integrated's MAX32670. Image used courtesy of Maxim Integrated
A very simple type of ECC is "brute force repetition." An example of brute force repetition would be to send each bit multiple times; let's say each bit is sent five times. In an example like this, our original message might be 0101, but we would actually transmit 00000111110000011111.
Our receiver would then take a majority vote over each group of five bits. In this way, even if one or two bits in a group were flipped, the receiver would still decode the correct message. Obvious issues with this approach are the significant overhead (4 original bits become 20 bits) and the case where a majority of the bits in a group (three or more of the five) are in error, which makes the vote come out wrong.
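The five-fold repetition scheme described above can be sketched in a few lines of Python. This is a minimal illustration; the function names are made up for this example and do not come from any library.

```python
def encode_repetition(bits: str, n: int = 5) -> str:
    """Repeat every bit of the message n times."""
    return "".join(b * n for b in bits)


def decode_repetition(received: str, n: int = 5) -> str:
    """Split the stream into groups of n bits and majority-vote each group."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)


codeword = encode_repetition("0101")
print(codeword)  # 00000111110000011111

# One flipped bit in the first, second, and fourth groups: still decodable.
corrupted = "00100" + "11011" + "00000" + "10111"
print(decode_repetition(corrupted))  # 0101

# Three flips inside one group outvote the original bit, so decoding fails.
too_noisy = "01101" + "11111" + "00000" + "11111"
print(decode_repetition(too_noisy))  # 1101
```

The last case shows the weakness mentioned above: once most of the bits in a group are wrong, the vote silently returns the wrong value.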
In practice, engineers use much more elegant solutions, like block error correction codes and convolutional codes, to correct errors. Algorithms such as Hamming codes (a type of block code) have been developed to minimize overhead and maximize reliability in ECC.
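As a rough illustration of how a block code cuts overhead, here is a sketch of the classic Hamming(7,4) code, which protects 4 data bits with only 3 parity bits and corrects any single bit flip. The function names and the bit layout (parity bits at positions 1, 2, and 4) are choices made for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, syndrome) after fixing at most one flipped bit."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit (0 = no error seen).
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


codeword = hamming74_encode([0, 1, 0, 1])
print(codeword)  # [0, 1, 0, 0, 1, 0, 1]

received = list(codeword)
received[4] ^= 1  # flip the bit at position 5
print(hamming74_decode(received))  # ([0, 1, 0, 1], 5)
```

Here the same 4-bit message costs 7 transmitted bits instead of the 20 needed by five-fold repetition, at the price of correcting only one error per block.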
Maxim’s New ECC-Protected Microcontroller
Yesterday, Maxim Integrated announced its newest product: a microcontroller for industrial, healthcare, and IoT solutions. Maxim claims that the MAX32670 "saves 40% power and 50% space" while also including "ECC-protected memory for increased equipment uptime."
This new MCU, which integrates up to 384 KB of flash and 160 KB of SRAM, implements ECC over the entire flash, RAM, and cache. This provides reliability across the entire memory space of the microcontroller, which is especially valuable given the device's small footprint.
Simplified block diagram of MAX32670. Image used courtesy of Maxim Integrated
It is important to note that the ECC being used is a single error correction and double error detection (SEC-DED) code. This means that it can correct only one bit error per block and can detect, but not correct, two. If three or more bits in a block are in error, the errors may go entirely unnoticed or may even be miscorrected.
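To make the SEC-DED behavior concrete, here is a small sketch using an extended Hamming (8,4) code: a Hamming(7,4) codeword plus one overall parity bit. This is only an illustration of the general technique. The article does not state which code the MAX32670 uses, real memory ECC operates on much wider words, and all function names here are made up for the example.

```python
def secded_encode(data):
    """Encode 4 data bits as p1 p2 d1 p4 d2 d3 d4 plus an overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = sum(word) % 2  # makes the total number of 1s even
    return word + [overall]


def secded_decode(code):
    """Return (data, status); data is None when an uncorrectable error is seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    parity_ok = sum(c) % 2 == 0

    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: assume exactly one and correct it.
        if syndrome:
            c[syndrome - 1] ^= 1
        else:
            c[7] ^= 1  # the overall parity bit itself was hit
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped, cannot fix.
        return None, "double error detected"
    return [c[2], c[4], c[5], c[6]], status


code = secded_encode([1, 0, 1, 1])

one_flip = list(code)
one_flip[2] ^= 1
print(secded_decode(one_flip))   # ([1, 0, 1, 1], 'corrected')

two_flips = list(code)
two_flips[2] ^= 1
two_flips[5] ^= 1
print(secded_decode(two_flips))  # (None, 'double error detected')

three_flips = list(code)
for i in (0, 1, 2):
    three_flips[i] ^= 1
print(secded_decode(three_flips))  # ([0, 0, 1, 1], 'corrected'), which is wrong data
```

The third case shows why errors beyond two bits are dangerous: the decoder reports a successful correction while returning data that differs from the original.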
While it's not obvious which specific code is being employed here, it is apparent that Maxim places a high value on reliability in the MAX32670.
ECCs in Reliability-Critical Applications
The MAX32670 is a great example of ECCs being used in real-world applications. In industrial, healthcare, and IoT spaces, data reliability is paramount, and this news from Maxim Integrated further shows the growing demand for reliability in these fields.
Do you frequently work with error-correction codes? Which methods are most familiar to you? Share your experience in the comments below.