Technical Article

When Things Go Wrong: Battery Management System Failure Mitigation

February 09, 2021 by Enrico Sanino

What is thermal runaway in Li-ion battery systems? And how do battery management systems help mitigate failure for improved safety? Learn more in this technical article.

Li-ion batteries are generally considered safe when operated in a properly controlled environment. We should say "mostly safe," because battery management systems (BMSs) and Li-ion cell manufacturing processes are not always perfect. We cannot fight the physics of Li-ion technology, but we can strive for a better BMS design.

In this article, we'll build on a previous piece that introduced battery management systems and their standard building blocks.

Here, we'll cover what can happen when a BMS fails and how to mitigate the effects. We'll also take a brief look at possible future BMS components, given the constant improvement of battery technology.


Thermal Runaway in Battery Management Systems

One of the best-known failure modes of a power system is thermal runaway, which is often associated with fire hazards. When a BMS malfunctions, thermal runaway can result from hardware failures or firmware bugs.

For example, if the firmware forgets to send a stop command to the balancer, the balancer could keep discharging a cell indefinitely. In such an event, even detecting the problem and blowing a fuse will not stop the discharge, because the balancing circuit typically sits directly across the cell rather than in the fused main current path. Over-discharge can lead to decomposition and perforation of the separator between the anode and the cathode, inducing a powerful internal short circuit at the next charge attempt.


Figure 1. Formation of internal copper shorts due to over-discharge. Image used courtesy of Xuning Feng


You may be wondering how such a short could avoid detection. The initial contact could have enough resistance to keep the battery voltage high while still drawing a very high self-discharge current, making it undetectable by the external current sensor or voltage monitor.
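To get a feel for the numbers, here is a rough back-of-the-envelope sketch using Ohm's law. The cell voltage, the short resistance, and the function name are illustrative assumptions, not values from the figure or from measurements. The point is that the short current circulates inside the cell, so a current sensor in series with the pack terminals never sees it, while the heat is released at the contact point.

```python
def internal_short(cell_voltage_v, short_resistance_ohm):
    """Return (self-discharge current in A, heat dissipated in W) for a
    resistive internal short, using Ohm's law. Both inputs are assumed,
    illustrative values, not measured data."""
    current_a = cell_voltage_v / short_resistance_ohm
    power_w = cell_voltage_v * current_a
    return current_a, power_w


# Hypothetical case: a 3.7 V cell with a 10-ohm internal contact.
current_a, power_w = internal_short(3.7, 10.0)

# The external current sensor sits in series with the pack terminals, so with
# no load attached it reads zero even though the cell is discharging inside.
external_sensor_current_a = 0.0

print(f"internal current: {current_a:.2f} A, heat: {power_w:.2f} W")
print(f"external sensor:  {external_sensor_current_a:.2f} A")
```

Even a fraction of an amp is far above normal self-discharge, yet under these assumptions the terminal voltage would sag only slowly, which is consistent with the short going unnoticed.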

A short circuit heats the cell. If the cell reaches the critical temperature above 60°C, it will burst and burn, heating its neighboring cells and triggering a chain reaction. This is thermal runaway, which has the potential for catastrophic consequences.

Figure 2. A burnt high-energy battery pack from a 2011 Chevrolet Volt. Image from the Chevrolet Volt Battery Incident Overview Report


Failure Mitigation

One solution to unforeseen bugs could be an external watchdog in case of MCU fatal errors, as shown in Figure 3.


Figure 3. A typical BMS block diagram with MCU watchdog implementation


If the MCU is not stuck but a command is forgotten, the cell monitor can implement a watchdog system, as shown in Figure 4.


Figure 4. A BMS block diagram with complete watchdog implementation
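The cell-monitor watchdog idea can be sketched as a command timeout: balancing stays active only while the MCU keeps refreshing it. The class below is a hypothetical model written for illustration (the names `BalancerWatchdog`, `refresh`, and `tick`, and the 2-second timeout, are assumptions), not the behavior of any particular monitor IC.

```python
class BalancerWatchdog:
    """Hypothetical model of a cell-monitor-side watchdog: balancing is only
    allowed while the MCU keeps refreshing the command within a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.balancing = False
        self.last_refresh_s = None

    def start_balancing(self, now_s):
        # The MCU's start command also counts as the first refresh.
        self.balancing = True
        self.last_refresh_s = now_s

    def refresh(self, now_s):
        # The MCU must call this periodically to keep balancing alive.
        if self.balancing:
            self.last_refresh_s = now_s

    def stop_balancing(self):
        self.balancing = False

    def tick(self, now_s):
        """Called periodically by the monitor itself; returns True if
        balancing is still active after the timeout check."""
        if self.balancing and now_s - self.last_refresh_s > self.timeout_s:
            # No refresh arrived in time: assume the stop command was lost.
            self.balancing = False
        return self.balancing


wd = BalancerWatchdog(timeout_s=2.0)
wd.start_balancing(now_s=0.0)
wd.refresh(now_s=1.5)
still_on = wd.tick(now_s=3.0)        # 1.5 s since last refresh: within timeout
timed_out = not wd.tick(now_s=4.0)   # 2.5 s since last refresh: watchdog trips
```

With this arrangement a forgotten stop command costs at most one timeout period of extra discharge instead of an indefinite one.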


Alternatively, should a latch-up occur due to EMC problems or radiation, it can be cleared by designing the watchdog to issue a power cycle rather than just a logic reset. This architecture is less common.
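One way such a watchdog might choose between the two recovery actions is sketched below. The escalation policy (try a logic reset first, then power cycle if the MCU still misses its deadline) and the name `watchdog_action` are assumptions made for illustration; the article only states that the watchdog should be able to power cycle.

```python
def watchdog_action(consecutive_timeouts, resets_before_power_cycle=1):
    """Pick the recovery action after a missed watchdog kick.

    Hypothetical policy: try a logic reset first, then cut and restore the
    supply if the MCU still does not recover (for example because of a
    latch-up, which a logic reset cannot clear)."""
    if consecutive_timeouts == 0:
        return "none"
    if consecutive_timeouts <= resets_before_power_cycle:
        return "logic_reset"
    return "power_cycle"


# Actions chosen after 0, 1, 2, and 3 consecutive missed kicks.
actions = [watchdog_action(n) for n in range(4)]
```

Setting `resets_before_power_cycle=0` in this sketch would skip the logic reset and go straight to a power cycle.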


Additional Solutions for Mitigating BMS Failure

With increasing energy densities and power demands, it’s getting easier to ask too much of battery cells. Fuel gauges must therefore become even more precise, and cell impedance is a key input to them.

An easy method to directly measure the impedance at run time would be of great use. Panasonic claims to have achieved just such a method, using a new localized AC stimulation technique to monitor the cell's electrochemical impedance. Other methods exist, but they require an unloaded voltage reference and calibration.
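As an illustration of why an unloaded reference is needed, here is a minimal sketch of one well-known approach: estimating the DC resistance from the voltage sag under a known load. The function name and the readings are hypothetical, and this is not the AC technique attributed to Panasonic above.

```python
def dc_resistance_ohm(rest_voltage_v, loaded_voltage_v, load_current_a):
    """Estimate DC internal resistance from the voltage sag under load:
    R = (V_rest - V_loaded) / I."""
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    return (rest_voltage_v - loaded_voltage_v) / load_current_a


# Hypothetical readings: 4.10 V at rest, 3.95 V while delivering 5 A.
r_ohm = dc_resistance_ohm(4.10, 3.95, 5.0)
print(f"estimated resistance: {r_ohm * 1000:.1f} mOhm")
```

The estimate depends on a rest-voltage reading, which is hard to obtain while the pack is in continuous use.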

Another improvement could rely on FRAM, a non-volatile memory that some MCUs can use in place of conventional RAM. Because FRAM retains its contents across a power cycle, buffering Coulomb counter samples in it reduces the chance that the firmware loses the last valid data after a sudden reset.
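A minimal sketch of how firmware might recover the last valid sample after a reset is shown below. The FRAM is simulated with a `bytearray`, and the two-slot layout with a CRC32 per record is an assumption added for illustration, not something the article prescribes; the function names and the record format are likewise hypothetical.

```python
import struct
import zlib

RECORD = struct.Struct("<Iq")    # sequence number, charge in microamp-hours
SLOT_SIZE = RECORD.size + 4      # payload plus CRC32


def write_sample(fram, seq, charge_uah):
    """Store a sample in one of two alternating slots, so an interrupted
    write can only damage the slot being written, not the other copy."""
    payload = RECORD.pack(seq, charge_uah)
    crc = struct.pack("<I", zlib.crc32(payload))
    offset = (seq % 2) * SLOT_SIZE
    fram[offset:offset + SLOT_SIZE] = payload + crc


def read_last_valid(fram):
    """Return (seq, charge_uah) of the newest slot with a valid CRC, or None."""
    best = None
    for slot in range(2):
        offset = slot * SLOT_SIZE
        payload = bytes(fram[offset:offset + RECORD.size])
        (stored_crc,) = struct.unpack_from("<I", fram, offset + RECORD.size)
        if zlib.crc32(payload) != stored_crc:
            continue  # torn or never-written slot
        seq, charge = RECORD.unpack(payload)
        if best is None or seq > best[0]:
            best = (seq, charge)
    return best


fram = bytearray(2 * SLOT_SIZE)   # stand-in for a small FRAM region
write_sample(fram, 1, 1_000)
write_sample(fram, 2, 1_250)
write_sample(fram, 3, 1_500)
fram[SLOT_SIZE + 5] ^= 0xFF       # simulate a reset that tore the newest write
recovered = read_last_valid(fram)
```

In this simulated run the torn record fails its CRC check, so the reader falls back to the previous sample rather than returning corrupted data.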

In the end, though, what makes the real difference is the cell chemistry: there are more options beyond Li-ion out there.

If you'd like to learn more about battery systems, leave a comment below to share your thoughts and questions.

  • abennink February 12, 2021

    I actually consulted with a firm that was working on battery charge equalization systems years ago: some of their early projects had associated fires also.  This article gives some of the science behind the sometimes runaway conditions.

  • Siemen February 17, 2021

    In most cases, the MCU is not able to affect the safety much in my experience, as that’s handled by the protection IC (handling the balancing, switches, etc.), like those from TI (BQ series). Some important parts of safety not listed are the physical ones: the pressure valve pops in the longitudinal direction, giving off high-pressure smoke and heat in that same direction. By placing cells in a direction so they don’t heat each other too much, or keeping more distance (1 cm) in series, the likelihood of a chain reaction causing catastrophic failure reduces a LOT.
