Monday, April 6, 2009

PC Failures, PC Fixes: Troubleshooting Mysterious Problems (1)

Has there ever been a truly trouble-free PC? We've gotten a lot closer to one in recent years, thanks to better diagnostics and improved software and hardware engineering -- but every now and then, things fall apart and the center cannot hold in a big country way.

The worst problems of all are the ones that strike without warning, sometimes again and again, and leave little or nothing for you to analyze afterward. That's when you need to call in a PC version of Gregory House, Fox TV's caustic but brilliant medical mastermind, or play a version of the role yourself, whittling down possible causes until the patient recovers. Or doesn't.

The good news is that you don't have to put up with such problems. Over time I've built up a repository of insights and strategies for dealing with these kinds of hard-to-trace failures. They take time and effort to track down, but the effort is well spent.

Note that most of the discussion here is aimed at a Windows-centric audience, but many of the same concepts apply to users of Linux or other operating systems, too -- especially the tips about hardware.

Types Of Failure

Most of the time, when something goes wrong, there's at least an error message or a warning of some kind, like the infamous Blue Screen of Death, to steer us in the right direction. This piece, though, deals with failures that have no warning at all -- no BSOD, no errors, nothing. The system may hang completely, reboot spontaneously, or even shut itself off without warning.

If there's no BSOD, then the system has been -- to use a euphemism employed by another of my industry colleagues -- "mugged," meaning whatever happened was beyond the operating system's ability to cope. Such things generally fall into a few basic categories: hardware failures, electrical problems, and untrappable OS issues.

Hardware Failures range from a component going bad, to memory failing, to a device being mistakenly disconnected. A fair number of hardware failures are "trappable," meaning the OS can anticipate disasters of that variety and tell the user what went wrong (via a BSOD). But not everything can be trapped in this fashion, simply because there's no way to anticipate it.

Electrical Problems might normally be filed under hardware failures, but I'm breaking them out as a category of their own for a few reasons. For one, electrical problems can come from outside the PC entirely (a frayed power cable, a bad socket, a dying UPS battery) or from within it (a failing system power supply, a faulty soft switch). Also, they can typically be fixed without affecting the rest of the PC or its hardware.