Watchdog Timers in Microcontrollers
This article is the fourth of a series on microcontroller timers. The first article describes major features of most types of timers and covers periodic timers. If you are not familiar with the general operation of microcontroller timers, I recommend reading the first article. We've also discussed pulse width modulation timers and real-time clocks in MCUs.
This article describes watchdog timers, commonly abbreviated as WDT and also called computer operating properly (COP) timers. There are external watchdog devices and internal watchdog functions. This article describes only internal watchdogs.
What Is a Watchdog Timer? (An Unconventional Analogy)
A watchdog timer is a specialized timer module that helps a microprocessor to recover from malfunctions. If a watchdog timer reaches the end of its counting period, it resets the entire processor system. In order to prevent this, a processor must perform some type of specific action that resets the watchdog. Thus, a watchdog timer can be configured such that it will reach the end of its counting period only if a processor failure has occurred, and by forcing a system reset, the watchdog timer helps the processor to escape from the failure mode and continue normal operation.
In order to visualize the functionality of a WDT, I have an unconventional analogy for you to consider.
Lost was a wildly popular TV series about a group of survivors marooned on a mysterious island following a plane crash. One of the subplots involves characters who believe they must enter a short series of numbers before a counter on the screen goes to 0 or the world will end.
When the series is entered, the counter resets and starts counting down again. It is never clear if entering the series of numbers is simply part of a psychological experiment or the fate of the world is at stake. The only way to find out is to not enter the numbers. The characters are free to go outside, find food, go for a swim—but they must return in time to reset the counter. From the perspective of the computer, receiving the correct input means there is a person out there who is operating normally and servicing the counter. Not receiving the correct input indicates that something is wrong.
A watchdog timer operates like the computer system in Lost. The rest of the microcontroller plays the part of the stressed-out characters: it is free to do anything, but it must periodically service the counter or the microcontroller resets (AKA the end of the world). Properly operating software and hardware will service the watchdog within a fixed period of time; faulty software or hardware might not.
Types of Internal Watchdog Timers
There are two types of watchdogs, non-windowed and windowed. Both types cause a reset if servicing the counter is late.
A windowed watchdog also causes a reset if servicing occurs too soon. This diagram shows timing sequences for a watchdog using an up counter. Some watchdogs use a down counter but the principle is the same.
The timing sequence of a watchdog timer
For a non-windowed watchdog, any service before the counter reaches the upper limit resets the counter and everything is OK. A windowed watchdog adds a lower limit and creates a count window: only a service between the lower limit and the upper limit is OK. Here is a detailed description of the diagram.
- A - Shortly after boot up, the program initializes the watchdog with the upper limit for the counter and enables counting. For a windowed watchdog, a lower limit is also set.
- B and C - The software successfully services the counter before it reaches the upper limit and, for a windowed watchdog, after the lower limit. After servicing, the counter resets to 0 and starts counting up again. Everything is OK.
- D - The program does not service the counter and the count reaches the upper limit. The watchdog resets the microcontroller.
- D to E - The microcontroller boots up and initializes and enables the watchdog.
- E - The watchdog starts counting.
- F - The program services the counter before it reaches the upper limit and, for a windowed watchdog, after the lower limit. The counter resets to 0 and starts counting again. Everything is OK.
- G - The program services the counter before the count reaches the lower limit. A non-windowed watchdog has no lower limit, so there is no reset: the counter goes to 0 and starts counting up again. A windowed watchdog resets the microcontroller.
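The sequence A through G can be modeled in a few lines of C. This is only a host-side sketch to make the timing rules concrete, not code for any real peripheral: the type `wdt_model_t` and the functions `wdt_init`, `wdt_tick`, and `wdt_service` are hypothetical names, and a lower limit of 0 stands in for a non-windowed watchdog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of an up-counting watchdog.
 * It is not tied to any real device. lower == 0 models a
 * non-windowed watchdog. */
typedef struct {
    uint32_t count;  /* current counter value */
    uint32_t lower;  /* a service below this count is too early */
    uint32_t upper;  /* reaching this count is too late */
    bool     reset;  /* true once the model would reset the MCU */
} wdt_model_t;

/* Point A: set the limits and start counting from 0. */
void wdt_init(wdt_model_t *w, uint32_t lower, uint32_t upper)
{
    assert(lower < upper);
    w->count = 0;
    w->lower = lower;
    w->upper = upper;
    w->reset = false;
}

/* One watchdog clock period. Reaching the upper limit (point D)
 * requests a reset. */
void wdt_tick(wdt_model_t *w)
{
    if (w->reset) {
        return;
    }
    w->count++;
    if (w->count >= w->upper) {
        w->reset = true;
    }
}

/* Service the counter. Inside the window (points B, C, F) the count
 * restarts at 0. Below the lower limit (point G) a windowed watchdog
 * requests a reset. */
void wdt_service(wdt_model_t *w)
{
    if (w->reset) {
        return;
    }
    if (w->count < w->lower) {
        w->reset = true;
        return;
    }
    w->count = 0;
}
```

With `wdt_init(&w, 4, 10)`, a service after two ticks sets `reset`, while a service after six ticks clears the count and carries on. With `wdt_init(&w, 0, 10)`, a service at any count below 10 is accepted.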
Watchdogs vary a lot in their details. They can count up or down. They use different clocks. They have different selections for upper and lower limits. The following sections introduce many of the characteristics but are only an overview.
Initializing a watchdog can be tricky. For example, some watchdogs are automatically enabled when the microcontroller boots.
If you do not use the watchdog, you must include code at boot up to disable it. This is the case with the watchdog I am currently using. I must disable the watchdog or change window settings if my program needs to run longer than one second!
Some watchdogs only allow one write to a control register as a security feature. This feature prevents out-of-control software from changing the settings after initialization. If you are using a default setting in a watchdog with this feature, be sure to overwrite the default setting, even though the bit values are the same. This “locks in” the values and maintains security.
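To see why rewriting the default value matters, here is a small model of a write-once control register. It is a sketch under assumptions: the names `wdt_ctrl_model_t`, `wdt_ctrl_power_on`, `wdt_ctrl_write`, and the default value `WDT_CTRL_DEFAULT` are all hypothetical, and a real device may implement its lock differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WDT_CTRL_DEFAULT 0x01u  /* hypothetical power-on value */

/* Hypothetical model of a write-once watchdog control register. */
typedef struct {
    uint8_t ctrl;    /* current register contents */
    bool    locked;  /* set by the first write after power-on */
} wdt_ctrl_model_t;

/* Power-on state: default settings, register still writable. */
void wdt_ctrl_power_on(wdt_ctrl_model_t *r)
{
    assert(r != 0);
    r->ctrl = WDT_CTRL_DEFAULT;
    r->locked = false;
}

/* The first write is accepted and locks the register.
 * Later writes are ignored. Returns true if the write took effect. */
bool wdt_ctrl_write(wdt_ctrl_model_t *r, uint8_t value)
{
    if (r->locked) {
        return false;
    }
    r->ctrl = value;
    r->locked = true;
    return true;
}
```

In this model, calling `wdt_ctrl_write(&r, WDT_CTRL_DEFAULT)` during initialization leaves the bits unchanged but uses up the single write, so a later stray write of 0x00 is rejected. If the initialization write is skipped, that stray write would be the first one and would succeed.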
If you are programming in C, be aware there may be initialization of the watchdog in the startup code, which runs before your main( ) function. This code is sometimes included automatically by the development environment as part of running a standard C program. This code sets up memory and interrupt vector tables. Look in your startup code for operations on the watchdog and modify according to your system’s requirements.
Watchdogs can be non-windowed or windowed. It is common for a single timer to offer both types by making the lower limit of the window optional. The size of the window varies with different watchdogs.
The lower limit of the watchdog I am currently using is fixed at 75% of the upper limit. Another watchdog has the choice of 75%, 50%, 37.5%, or 25% of the upper limit. Yet another watchdog sets the upper and lower limits independently. There is also much diversity regarding how the upper limit is chosen.
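For the percentage-based windows, the lower limit follows directly from the upper limit. The sketch below expresses the four choices mentioned above in eighths of the upper limit so that 37.5% needs no floating point. The function name `wdt_window_lower_limit` and the enum names are hypothetical, and a real device may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Window choices expressed in eighths of the upper limit
 * (hypothetical names): 6/8 = 75%, 4/8 = 50%, 3/8 = 37.5%, 2/8 = 25%. */
enum {
    WDT_WINDOW_75   = 6,
    WDT_WINDOW_50   = 4,
    WDT_WINDOW_37_5 = 3,
    WDT_WINDOW_25   = 2
};

/* Lower limit of the window as a count, rounded down.
 * The 64-bit intermediate avoids overflow for large upper limits. */
uint32_t wdt_window_lower_limit(uint32_t upper, unsigned eighths)
{
    assert(eighths <= 8u);
    return (uint32_t)(((uint64_t)upper * eighths) / 8u);
}
```

For an upper limit of 1000 counts, the four choices give lower limits of 750, 500, 375, and 250 counts.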
Servicing AKA Watchdog Timer Reset
Periodically resetting the watchdog counter is called “servicing” the timer. Different watchdog timers have different servicing requirements, but they share one idea: servicing requires an operation that is unlikely to occur during the execution of out-of-control software.
The watchdog I am currently using requires two writes to a “service register” with a value of 0x55 followed by 0xAA. If any value other than 0x55 or 0xAA is written to the service register, at any time, the microcontroller immediately resets.
Another watchdog requires a single write to a service register with a value of 0xAAAA. There seems to be a fascination with the numbers 0xA and 0x5 and their alternating bit patterns of 1010 and 0101. One timer goes against this trend by using a single write with a magic number of 0x5743.
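The two-write sequence is easy to picture as a tiny state machine. The following is a model of the behavior described for the 0x55 then 0xAA watchdog, not the register interface of a real device. The names are hypothetical, and one detail is my assumption because it is not stated above: a 0xAA that arrives without a preceding 0x55 is simply ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WDT_SVC_NONE,      /* write accepted, counter not yet serviced */
    WDT_SVC_SERVICED,  /* sequence complete, counter restarts */
    WDT_SVC_RESET      /* illegal value, immediate reset */
} wdt_svc_result_t;

/* Hypothetical model of the service register's sequence tracking. */
typedef struct {
    bool armed;  /* true after 0x55 has been written */
} wdt_svc_model_t;

/* Model one write to the service register. */
wdt_svc_result_t wdt_svc_write(wdt_svc_model_t *s, uint8_t value)
{
    assert(s != 0);
    if (value == 0x55u) {
        s->armed = true;  /* first half of the sequence */
        return WDT_SVC_NONE;
    }
    if (value == 0xAAu) {
        bool complete = s->armed;
        s->armed = false;
        /* Assumption: an out-of-order 0xAA is ignored. */
        return complete ? WDT_SVC_SERVICED : WDT_SVC_NONE;
    }
    return WDT_SVC_RESET;  /* any other value resets immediately */
}
```

Runaway code that scribbles over the service register is unlikely to produce exactly 0x55 followed by 0xAA, and far more likely to write some other value and trigger the reset.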
Sleep and other low-power modes complicate things. What should the watchdog do if the processor goes to sleep and stops executing? Should watchdog time stand still?
The watchdog I am currently using halts and restarts at the initial count when the processor enters and exits from deep sleep. This topic is complicated and you will need to study the operation of your watchdog when the microcontroller uses low-power modes.
A complete reset of the microcontroller might be too harsh. Some watchdogs have features to allow a progressive response.
For example, the watchdog may have an option for an interrupt request sometime before a reset. This feature allows an interrupt service routine to fix or log a problem while the watchdog continues counting. If the interrupt routine cannot get things back on track, a reset occurs.
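A progressive response can be sketched as a counter with two thresholds: an earlier one that raises a warning interrupt and a later one that resets. The model below is an illustration only. The names `wdt_warn_model_t`, `wdt_warn_init`, `wdt_warn_tick`, and `wdt_warn_service` are hypothetical, and real watchdogs differ in how the warning point is configured.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    WDT_EVT_NONE,     /* nothing happened on this tick */
    WDT_EVT_WARNING,  /* early-warning interrupt request */
    WDT_EVT_RESET     /* watchdog reset */
} wdt_event_t;

/* Hypothetical model of a watchdog with an early-warning interrupt. */
typedef struct {
    uint32_t count;     /* current counter value */
    uint32_t warn_at;   /* count that raises the warning interrupt */
    uint32_t reset_at;  /* count that resets the microcontroller */
} wdt_warn_model_t;

void wdt_warn_init(wdt_warn_model_t *w, uint32_t warn_at, uint32_t reset_at)
{
    assert(warn_at < reset_at);
    w->count = 0;
    w->warn_at = warn_at;
    w->reset_at = reset_at;
}

/* One watchdog clock period. The counter keeps running after the
 * warning, so an unserviced watchdog still ends in a reset. */
wdt_event_t wdt_warn_tick(wdt_warn_model_t *w)
{
    if (w->count < w->reset_at) {
        w->count++;
    }
    if (w->count >= w->reset_at) {
        return WDT_EVT_RESET;
    }
    if (w->count == w->warn_at) {
        return WDT_EVT_WARNING;
    }
    return WDT_EVT_NONE;
}

/* Called by the main program, or by the warning handler if it
 * manages to recover. */
void wdt_warn_service(wdt_warn_model_t *w)
{
    w->count = 0;
}
```

With `wdt_warn_init(&w, 8, 10)`, the eighth tick reports the warning and the tenth reports the reset, which leaves the interrupt routine two counts to log or repair the problem and call `wdt_warn_service`.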
I use the term “strategy” for determining where to set a watchdog’s timing limits. The simplest strategy is to use a non-windowed watchdog with a timeout which is much longer than any possible execution time of the program between servicing the watchdog. I call it the “reset button” strategy because it takes the place of a person pushing a reset button. The strategy for a critical system such as a medical device or manufacturing robot needs faster response.
Many considerations can go into a watchdog strategy:
- Does the system use a predictable main loop or a complex multi-tasking structure based on a real-time operating system?
- Should a watchdog reset start a sequence of automated system checks?
- Is the watchdog the last stage of a series of attempts to diagnose and fix a problem?
- Do you want to test the watchdog along with other checks when the system starts up?
Watchdog strategy is a complicated topic and very application-dependent.
Before resetting the microcontroller, the watchdog sets a bit in a status register which survives the reset. Often this bit is stored along with other reset status bits, such as a brownout reset bit. Testing the watchdog reset bit should be part of the strategy.
At a minimum, test this bit during startup and do something to indicate a problem, for example, output a message on a display or light an LED. Otherwise, failures might go unnoticed, particularly infrequent ones.
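A startup check along these lines might look like the sketch below. The bit positions `RST_FLAG_WATCHDOG` and `RST_FLAG_BROWNOUT` and the function `startup_watchdog_reset_seen` are hypothetical, and the code assumes the flag is cleared by writing 0 to it. Many devices instead clear reset flags by writing 1 or through a separate register, so check the reference manual for yours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout of a reset status register. */
#define RST_FLAG_WATCHDOG (1u << 0)
#define RST_FLAG_BROWNOUT (1u << 1)

/* Returns true if the last reset came from the watchdog, and clears
 * the flag so that the next startup reports only a new event.
 * Assumption: the flag is cleared by writing 0 to it. */
bool startup_watchdog_reset_seen(volatile uint32_t *reset_status)
{
    assert(reset_status != 0);
    bool seen = (*reset_status & RST_FLAG_WATCHDOG) != 0u;
    if (seen) {
        *reset_status &= ~RST_FLAG_WATCHDOG;
    }
    return seen;
}
```

Early in startup, the program would call this function with the address of the device's reset status register and, if it returns true, light the LED or queue the message for the display.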
Bonus Topic: Dead Man Timer vs. Watchdog Timer
A dead man timer (DMT) is a variation on the watchdog concept. The name derives from a dead man switch which stops a machine if an operator releases a mechanical switch. Instead of using time as the pacing variable, a dead man timer counts CPU instruction fetches from program memory. There is a microcontroller reset if the DMT counter is not serviced before a specified number of instructions have been executed.
A key difference between a watchdog and a DMT is that a DMT can remain enabled during sleep and other power-saving modes. These modes use up time but execute no CPU instructions, so the DMT count does not advance while the processor is stopped.
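The difference in pacing variable shows up clearly in a small model. In this sketch the DMT count advances only when instructions execute, and a sleep period of any length leaves it untouched. The names `dmt_model_t`, `dmt_init`, `dmt_execute`, `dmt_sleep`, and `dmt_service` are hypothetical and do not correspond to any vendor's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dead man timer paced by instruction fetches. */
typedef struct {
    uint32_t fetches;  /* instructions fetched since the last service */
    uint32_t limit;    /* fetch count that triggers a reset */
    bool     reset;    /* true once the model would reset the MCU */
} dmt_model_t;

void dmt_init(dmt_model_t *d, uint32_t limit)
{
    assert(limit > 0u);
    d->fetches = 0;
    d->limit = limit;
    d->reset = false;
}

/* The CPU executes some instructions. Reaching the limit without a
 * service requests a reset. While reset is false, fetches < limit,
 * so the subtraction below cannot wrap. */
void dmt_execute(dmt_model_t *d, uint32_t instructions)
{
    if (d->reset) {
        return;
    }
    if (instructions >= d->limit - d->fetches) {
        d->fetches = d->limit;
        d->reset = true;
    } else {
        d->fetches += instructions;
    }
}

/* Sleep uses up time but fetches no instructions, so the count
 * stays where it is. */
void dmt_sleep(dmt_model_t *d, uint32_t time_units)
{
    (void)d;
    (void)time_units;
}

/* Servicing restarts the count. */
void dmt_service(dmt_model_t *d)
{
    if (!d->reset) {
        d->fetches = 0;
    }
}
```

After `dmt_init(&d, 100)` and `dmt_execute(&d, 60)`, a call such as `dmt_sleep(&d, 1000000)` leaves the count at 60. A time-based watchdog left running over the same sleep period would depend on how its clock behaves in that mode.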
This article is the last in the series. There are other interesting timers such as low-power timers, triple PWM timers for controlling tri-color LEDs, and what I call hybrid timers which combine hardware timers with closely coupled software. Leave a comment if you would like to see more on microcontroller timers.