How to Fix Hardware Issues in Software
In this article, we are going to discuss how to identify and fix 'errors' in your software and hardware designs.
A common factor of both software and hardware design is that mistakes happen. This is the inevitable consequence of human beings’ involvement.
Commonly, when problems occur, there is talk about “bugs”. This term is unfortunate, as it suggests that the fault is somehow external to the development process; it sounds as if a bug crept in during the night under the cover of darkness. It would be much better to call them “errors”, because that is what they are. If a developer thinks in those terms, they can own the errors and, hopefully, be wary of making such mistakes in future.
The development process for hardware and software seems somewhat similar: a requirements specification is issued; a design is crafted; the result is coded using a specialized language (an HDL for hardware and a programming language for software – amazingly similar).
But the similarities end here.
The hardware design gets frozen at some point, when hardware production is initiated. Making changes after this point tends to be troublesome and expensive. Software, on the other hand, never gets finished; adjustments and refinements continue up to the last minute. It is so easy to tweak the code and do a rebuild. This is both a strength and a weakness.
Because software remains malleable until late in the day, there is the opportunity to do extensive testing and fix the bugs – sorry, errors – that persist. The downside of the flexibility is that there is a temptation to make last-minute enhancements and refinements; “creeping elegance” is a real danger. These challenges are all software related, but it gets worse …
What if a hardware error is detected after the design is frozen?
I mentioned earlier that this is problematic, as it is expensive and inconvenient to address. A very common solution is to “fix it in the software”. Such fixes may be quite trivial, and the software engineers should be pleased to be able to assist their hardware counterparts. However, in other cases, the adjustments to the code may completely compromise the software design.
There is a degree of fuzziness between an error in the hardware design and a “quirk”, the latter being something that is unsurprising to the hardware developer but needs to be accommodated in the software. I would like to explore a few examples of where software needs to accommodate oddities of hardware.
The simplest kind of input device to an embedded system is a switch or a press-button. Intuitively, it might be expected that a switch in the “off” position (or a button that is not being pressed) would appear as a value of 0 and transition to 1 when it is switched on (or the button is pressed). However, in almost all circumstances, the reverse is true: an off switch shows as 1 and transitions to 0. This is because it simplifies the hardware design to “pull up” an input pin to logical 1 and ground it (pull down to 0) to signify input. Of course, once a developer is clear about this, accommodation in software is trivial.
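As a minimal sketch of that accommodation, the helper below inverts the sense of a pulled-up input so the rest of the code can think in terms of "on" and "off". The name `switch_is_on` and the idea of passing in an 8-bit port snapshot are assumptions made for illustration; how the port is actually read depends on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: port_value is a snapshot of an 8-bit input port,
   obtained however the target hardware requires. */
bool switch_is_on(uint8_t port_value, unsigned pin)
{
    assert(pin < 8u);
    /* Pulled-up input: the bit reads 1 when the switch is open and
       0 when the switch closes the pin to ground. */
    return ((port_value >> pin) & 1u) == 0u;
}
```

Keeping the inversion in one small function means that a later change of polarity in the hardware touches a single line.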
Another natural expectation about a switch or a button is that it will simply transition between the two logical states. However, more often than not, a mechanical switch is not so well behaved: the contacts close, then bounce open one or more times before settling down in the closed position. The result of this behaviour is that the expected 1 to 0 transition may be repeated several times in quick succession, which, unchecked, might be interpreted incorrectly. Imagine a press button that increments a setting, but often increments it by 2 or 3 instead of 1!
This can be fixed in hardware, but that comes at a cost in complexity and bill of materials. It is a prime candidate to “fix in the software” and numerous “debouncing” algorithms have been designed over the years.
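One of the simpler approaches is a counter: a new level is accepted only after it has been seen on several consecutive samples. The sketch below illustrates that idea and is not a definitive implementation. The names `debouncer_t`, `debounce_init` and `debounce_update` are hypothetical, and the threshold `DEBOUNCE_SAMPLES` is an assumed value that would need tuning to the switch and the sampling rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u  /* assumed value: tune to switch and sample rate */

typedef struct {
    bool    stable;  /* last accepted (debounced) level */
    uint8_t count;   /* consecutive samples that differ from 'stable' */
} debouncer_t;

void debounce_init(debouncer_t *d, bool initial_level)
{
    assert(d != NULL);
    d->stable = initial_level;
    d->count = 0u;
}

/* Feed one raw sample, taken at a fixed interval.
   Returns true only when the debounced level has just changed. */
bool debounce_update(debouncer_t *d, bool raw_level)
{
    assert(d != NULL);
    if (raw_level == d->stable) {
        d->count = 0u;           /* contact bounced back: start again */
        return false;
    }
    d->count++;
    if (d->count < DEBOUNCE_SAMPLES) {
        return false;            /* not yet seen for long enough */
    }
    d->stable = raw_level;       /* accept the new level */
    d->count = 0u;
    return true;
}
```

Called from a periodic timer tick, `debounce_update` reports one change per genuine press, so the setting in the example above would increment by exactly 1.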
It's All About Timing
Some sophisticated peripheral hardware may respond to commands written to registers by software. It is not uncommon for the hardware to respond to a command by performing a sequence of actions, during which it will not be responsive to further commands. This is not a fault, as the hardware designer intended the device to function in this way. However, from the software developer’s point of view, it seems quite illogical.
This presents a challenge to the software. A safeguard is needed so that, once a command has been issued, a suitable time elapses before another command is written. In simple applications, a delay loop of some kind may suffice. In more complex software, it may be impossible to tie up the CPU in an idle loop, as there is other processing to be done. In this case, a more sophisticated timing mechanism is needed.
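One possible shape for such a mechanism is a non-blocking "gate" that remembers when the last command was written and refuses a new one until the device's busy time has passed. This is only a sketch under stated assumptions: `cmd_gate_t`, `cmd_gate_try_send`, the memory-mapped `cmd_reg` pointer and the free-running tick count passed in as `now` are all hypothetical, and the busy time would come from the device's data sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint8_t *cmd_reg;  /* hypothetical command register */
    uint32_t busy_ticks;        /* assumed worst-case busy time, in ticks */
    uint32_t last_issue;        /* tick count when the last command was written */
    bool     has_issued;        /* false until the first command is sent */
} cmd_gate_t;

/* Try to write a command. Returns false, without blocking, if the device
   may still be busy; the caller does other work and retries later. */
bool cmd_gate_try_send(cmd_gate_t *g, uint8_t cmd, uint32_t now)
{
    assert(g != NULL && g->cmd_reg != NULL);
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if (g->has_issued && (uint32_t)(now - g->last_issue) < g->busy_ticks) {
        return false;
    }
    *g->cmd_reg = cmd;
    g->last_issue = now;
    g->has_issued = true;
    return true;
}
```

Because the function returns instead of waiting, the main loop or scheduler stays free to get on with other processing while the peripheral finishes its sequence.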
Big-Endian or Little-Endian
There are a variety of ways to represent data within a word (or even within a byte). A simple example is the order of bytes in a word; they might be least significant first or most significant. Neither way is right or wrong and different CPUs have historically been little- or big-endian. It is, therefore, almost inevitable that two subsystems that are linked together may have a different idea about data representation.
This is another candidate for fixing in software—just a byte swap or rotate. The challenge is localizing usage of the interface to a small part of the software that performs the transformation.
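A sketch of what that small part might contain is shown below, assuming 32-bit values and a big-endian interface. The function names `swap32`, `load_be32` and `store_be32` are hypothetical. The load and store helpers build the value byte by byte, so they give the same result whatever the byte order of the CPU running them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order. */
void store_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

If every access to the interface goes through helpers like these, the rest of the application never needs to know which byte order the other subsystem uses.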
Beyond circumventing errors and quirks in hardware, it is not uncommon for functionality that might be efficiently implemented in hardware to be offloaded onto the software, in the name of cost-saving or because that functionality was dreamt up late in the development process.
Early in my career (at the start of the 1980s) I was fortunate enough to work on an early 16-bit embedded system. This was a microprocessor-controlled replacement for a large hard-wired control panel on a servo-hydraulic system. At that time the idea of such immense (it seemed) computing power in a microprocessor was quite inspiring and, of course, the developers got carried away. The hardware design was completed and frozen in very good time, enabling production to proceed on schedule. The software development went well until the stream of ideas for new functionality became a flood. The software was brought to its knees – i.e., its real-time behavior was compromised – and a major design review was necessitated.
I am sure that this could not happen nowadays, but it would be wise to make sure that history does not repeat itself ...
Hiding Hardware Oddities: Drivers
It has been recognized for a long time that accessing and controlling hardware can be a particular challenge to the unwary or inexperienced embedded software developer. As a result, the concept of a device driver came about. A driver is simply a small (or maybe not so small!) piece of software that encapsulates the awkwardness of using some hardware and presents a rational interface to the application code.
It is in a driver that eccentricities of hardware design are accommodated and, ideally, this is where tweaks to address unintended functionality or non-functionality should be implemented.
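To illustrate the idea on a very small scale, here is a sketch of a driver for a status LED whose control bit happens to be inverted. Everything named here (`led_t`, `led_set`, the `port` register pointer and the `active_low` flag) is hypothetical and chosen only for illustration. The point is that the application asks for "on" or "off" and the quirk stays inside the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint8_t *port;  /* hypothetical output register */
    uint8_t mask;            /* bit that drives the LED */
    bool    active_low;      /* hardware quirk, known only to the driver */
} led_t;

/* Application-facing call: 'on' means lit, whatever the wiring. */
void led_set(const led_t *led, bool on)
{
    assert(led != NULL && led->port != NULL);
    bool drive_high = led->active_low ? !on : on;
    if (drive_high) {
        *led->port |= led->mask;
    } else {
        *led->port &= (uint8_t)~led->mask;
    }
}
```

If a later board revision changes the polarity, only the `active_low` setting changes; the application code that calls `led_set` is untouched.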
In this article, we took a look at several instances where issues in hardware can be fixed in software. If you have other examples of this situation, please share them in the comments below.
Here is an extreme example of this topic. Some decades ago, the US launched the IUS, the Interim Upper Stage. It was a mini-booster, attached to a satellite or other payload, that was carried to low orbit by the Space Shuttle, removed from the payload bay, and launched to a higher orbit where the attached payload was released. On one of the first launches, the package began to spin out of control some moments after being launched outside the Shuttle. The entire mission was at risk of being lost. With limited battery power, there was a very tight window available to fix whatever had broken. Diagnostic download information indicated that the CPU had lost an internal instruction. I don’t know which one, except that it was a common one, like add or subtract. The lead programmer had to rewrite all the code without using that specific instruction, get it uploaded to a spinning target, and restart the system. He managed to rewrite the entire system in about 30 minutes, got it uploaded, and restarted the system with fifteen minutes to spare. The mission was saved.
I attended an award ceremony where the programmer was awarded the rare Silver Snoopy Award.
Since this was a long time ago, some of the details here may not be exact, but close enough to make the point.
This is the best example of software fixing hardware that I’ve ever heard of. It’s also the best example of programming under pressure I know of. Hope you enjoyed it.