After applying some of the general troubleshooting tips to narrow the scope of a problem’s location, there are techniques useful in further isolating it. Here are a few:
In a system with identical or parallel subsystems, swap components between those subsystems and see whether or not the problem moves with the swapped component. If it does, you’ve just swapped the faulty component; if it doesn’t, keep searching!
This is a powerful troubleshooting method, because it gives you both a positive and a negative indication of the swapped component’s fault: when the bad part is exchanged between identical systems, the formerly broken subsystem will start working again and the formerly good subsystem will fail.
I was once able to troubleshoot an elusive problem with an automotive engine ignition system using this method: I happened to have a friend with an automobile sharing the exact same model of ignition system. We swapped parts between the engines (distributor, spark plug wires, ignition coil—one at a time) until the problem moved to the other vehicle. The problem happened to be a “weak” ignition coil, and it only manifested itself under heavy load (a condition that could not be simulated in my garage). Normally, this type of problem could only be pinpointed using an ignition system analyzer (or oscilloscope) and a dynamometer to simulate loaded driving conditions. This technique, however, confirmed the source of the problem with 100% accuracy, using no diagnostic equipment whatsoever.
Occasionally you may swap a component and find that the problem still exists, but has changed in some way. This tells you that the components you just swapped are somehow different (different calibration, different function), and nothing more. However, don’t dismiss this information just because it doesn’t lead you straight to the problem—look for other changes in the system as a whole as a result of the swap, and try to figure out what these changes tell you about the source of the problem.
An important caveat to this technique is the possibility of causing further damage. Suppose a component has failed because of another, less conspicuous failure in the system. Swapping the failed component with a good component will cause the good component to fail as well. For example, suppose that a circuit develops a short, which “blows” the protective fuse for that circuit. The blown fuse is not evident by inspection, and you don’t have a meter to electrically test the fuse, so you decide to swap the suspect fuse with one of the same rating from a working circuit. As a result of this, the good fuse that you move to the shorted circuit blows as well, leaving you with two blown fuses and two non-working circuits. At least you know for certain that the original fuse was blown, because the circuit it was moved to stopped working after the swap, but this knowledge was gained only through the loss of a good fuse and the additional “down time” of the second circuit.
Another example to illustrate this caveat is the ignition system problem previously mentioned. Suppose that the “weak” ignition coil had caused the engine to backfire, damaging the muffler. If swapping ignition system components with another vehicle causes the problem to move to the other vehicle, damage may be done to the other vehicle’s muffler as well. As a general rule, the technique of swapping identical components should be used only when there is minimal chance of causing additional damage. It is an excellent technique for isolating non-destructive problems.
Example 1: You’re working on a CNC machine tool with X, Y, and Z-axis drives. The Y axis is not working, but the X and Z axes are working. All three axes share identical components (feedback encoders, servo motor drives, servo motors).
Exchange these identical components, one at a time, Y axis and either one of the working axes (X or Z), and see after each swap whether or not the problem has moved with the swap.
Example 2: A stereo system produces no sound on the left speaker, but the right speaker works just fine.
Try swapping respective components between the two channels and see if the problem changes sides, from left to right. When it does, you’ve found the defective component. For instance, you could swap the speakers between channels: if the problem moves to the other side (i.e. the same speaker that was dead before is still dead, now that its connected to the right channel cable) then you know that speaker is bad. If the problem stays on the same side (i.e. the speaker formerly silent is now producing sound after having been moved to the other side of the room and connected to the other cable), then you know the speakers are fine, and the problem must lie somewhere else (perhaps in the cable connecting the silent speaker to the amplifier, or in the amplifier itself).
If the speakers have been verified as good, then you could check the cables using the same method. Swap the cables so that each one now connects to the other channel of the amplifier and to the other speaker. Again, if the problem changes sides (i.e. now the right speaker is now “dead” and the left speaker now produces sound), then the cable now connected to the right speaker must be defective. If neither swap (the speakers nor the cables) causes the problem to change sides from left to right, then the problem must lie within the amplifier (i.e. the left channel output must be “dead”).
If a system is composed of several parallel or redundant components which can be removed without crippling the whole system, start removing these components (one at a time) and see if things start to work again.
Example 1: A “star” topology communications network between several computers has failed. None of the computers are able to communicate with each other.
Try unplugging the computers, one at a time from the network, and see if the network starts working again after one of them is unplugged. If it does, then that last unplugged computer may be the one at fault (it may have been “jamming” the network by constantly outputting data or noise).
Example 2: A household fuse keeps blowing (or the breaker keeps tripping open) after a short amount of time.
Unplug appliances from that circuit until the fuse or breaker quits interrupting the circuit. If you can eliminate the problem by unplugging a single appliance, then that appliance might be defective. If you find that unplugging almost any appliance solves the problem, then the circuit may simply be overloaded by too many appliances, neither of them defective.
In a system with multiple sections or stages, carefully measure the variables going in and out of each stage until you find a stage where things don’t look right.
Example 1: A radio is not working (producing no sound at the speaker))
Divide the circuitry into stages: tuning stage, mixing stages, amplifier stage, all the way through to the speaker(s). Measure signals at test points between these stages and tell whether or not a stage is working properly.
Example 2: An analog summer circuit is not functioning properly.
I would test the passive averager network (the three resistors at the lower-left corner of the schematic) to see that the proper (averaged) voltage was seen at the noninverting input of the op-amp. I would then measure the voltage at the inverting input to see if it was the same as at the noninverting input (or, alternatively, measure the voltage difference between the two inputs of the op-amp, as it should be zero). Continue testing sections of the circuit (or just test points within the circuit) to see if you measure the expected voltages and currents.
Closely related to the strategy of dividing a system into sections, this is actually a design and fabrication technique useful for new circuits, machines, or systems. It’s always easier begin the design and construction process in little steps, leading to larger and larger steps, rather than to build the whole thing at once and try to troubleshoot it as a whole.
Suppose that someone were building a custom automobile. He or she would be foolish to bolt all the parts together without checking and testing components and subsystems as they went along, expecting everything to work perfectly after its all assembled. Ideally, the builder would check the proper operation of components along the way through the construction process: start and tune the engine before its connected to the drivetrain, check for wiring problems before all the cover panels are put in place, check the brake system in the driveway before taking it out on the road, etc.
Countless times I’ve witnessed students build a complex experimental circuit and have trouble getting it to work because they didn’t stop to check things along the way: test all resistors before plugging them into place, make sure the power supply is regulating voltage adequately before trying to power anything with it, etc. It is human nature to rush to completion of a project, thinking that such checks are a waste of valuable time. However, more time will be wasted in troubleshooting a malfunctioning circuit than would be spent checking the operation of subsystems throughout the process of construction.
Take the example of the analog summer circuit in the previous section for example: what if it wasn’t working properly? How would you simplify it and test it in stages? Well, you could reconnect the op-amp as a basic comparator and see if its responsive to differential input voltages, and/or connect it as a voltage follower (buffer) and see if it outputs the same analog voltage as what is input. If it doesn’t perform these simple functions, it will never perform its function in the summer circuit! By stripping away the complexity of the summer circuit, paring it down to an (almost) bare op-amp, you can test that component’s functionality and then build from there (add resistor feedback and check for voltage amplification, then add input resistors and check for voltage summing), checking for expected results along the way.
Set up instrumentation (such as a datalogger, chart recorder, or multimeter set on “record” mode) to monitor a signal over a period of time. This is especially helpful when tracking down intermittent problems, which have a way of showing up the moment you’ve turned your back and walked away.
This may be essential for proving what happens first in a fast-acting system. Many fast systems (especially shutdown “trip” systems) have a “first out” monitoring capability to provide this kind of data.
Example #1: A turbine control system shuts automatically in response to an abnormal condition. By the time a technician arrives at the scene to survey the turbine’s condition, however, everything is in a “down” state and its impossible to tell what signal or condition was responsible for the initial shutdown, as all operating parameters are now “abnormal.”
One technician I knew used a videocamera to record the turbine control panel, so he could see what happened (by indications on the gauges) first in an automatic-shutdown event. Simply by looking at the panel after the fact, there was no way to tell which signal shut the turbine down, but the videotape playback would show what happened in sequence, down to a frame-by-frame time resolution.
Example #2: An alarm system is falsely triggering, and you suspect it may be due to a specific wire connection going bad. Unfortunately, the problem never manifests itself while you’re watching it!
Many modern digital multimeters are equipped with “record” settings, whereby they can monitor a voltage, current, or resistance over time and note whether that measurement deviates substantially from a regular value. This is an invaluable tool for use in “intermittent” electronic system failures.