In technical troubleshooting there are two main approaches. The first asks "What's changed?" and the second asks "What could possibly cause this?". The distinction may seem subtle, but they are fundamentally different ways of approaching a problem.
Most IT managers will instruct their people to start with the former. After all, most trouble tickets start with something that used to work, but now doesn't. Find out when the problem started, what changed around that time, and you've probably got your culprit.
However, the successful troubleshooter will be able to strike a balance between these two ways of viewing a problem in order to arrive at the appropriate solution as quickly and efficiently as possible.
For example, we had a portable credit card machine that sat on a rolling counter for t-shirt sales. This counter was pulled out of a closet in the lobby before each concert and put away afterward, and naturally a power cord and network cable had to be snaked across the floor to connect the machine.
During the week in question we'd just had a new PoE IP telephone system put in that necessitated new CAT 6e patch panels and quite a lot of changes to the basic Ethernet cabling infrastructure.
A few days after the last trip out by the cabling guys to punch down new cable runs and patch panels, the guy in charge of the t-shirt sales says, 'Hey, did you know the credit card machine isn't working?' I asked how long this had been an issue, and he replied that it might have been a few days; he wasn't sure.
OK, at this point a "What's changed?" guy is pretty sure he knows what's happened, right? An issue with the cable run from the new patch panel. But before checking that I took a look at the machine itself. Sure enough, the RJ-45 jack on the little device was pressed in quite a bit, as if it had received a blow. I expressed concern that the unit had fallen off the table, and the sales manager replied that yes, this was quite possible. In fact, it had happened before, requiring replacement of the unit.
So, now a "What could cause this?" guy is thinking we're looking at a damaged unit. That seems far more likely than a sudden failure in infrastructure cabling that has worked for years, recent changes or not, right? After all, the cable guys had done a terrific job toning out each run, and no other connection was having any trouble.
That argument was compelling, but it was still untested. Luckily, we had an identical power supply and Ethernet connection upstairs for a similar machine. A quick check upstairs and guess what? Yup, the 'damaged' unit worked fine.
After more troubleshooting and elimination, it turned out that the falling unit had landed squarely on the end of the CAT5 cable where the RJ-45 connector was crimped on. It landed in just such a way that the port on the machine, although bent in, was not damaged electrically, but the fine wires in the connector were. New cable end and we're all set.
So, who was right? Well, I think a purely "What's changed?" approach could have taken the tech down the wrong road for quite a while as he toned out and tested a cable through the building. On the other hand, without checking the unit at another station we'd have been without a machine for weeks while it was 'repaired' at the factory. In the end it was a combination of troubleshooting techniques that had the unit back at work making money that same evening!
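For anyone who likes to see the elimination step written down, here is a minimal sketch of the "swap in a known-good part" idea in Python. It is only an illustration, not what we actually ran that night: the component labels, the `works` function and the `FAULTY` set are all made up for the example, and it assumes exactly one part is bad.

```python
from typing import Callable, Dict, List

# A setup maps each role in the chain to the specific part filling it.
Setup = Dict[str, str]


def isolate_by_substitution(
    failing: Setup,
    known_good: Setup,
    works: Callable[[Setup], bool],
) -> List[str]:
    """Swap one part at a time for its known-good twin.

    Returns the roles whose replacement alone makes the setup work.
    Assumes a single fault: with two bad parts, no single swap
    will fix things and the result will be empty.
    """
    culprits = []
    for role in failing:
        trial = dict(failing)           # copy, so each test changes one part only
        trial[role] = known_good[role]  # substitute the known-good part
        if works(trial):
            culprits.append(role)
    return culprits


# Hypothetical stand-in for "plug it in and try a transaction".
FAULTY = {"lobby patch cable"}


def works(setup: Setup) -> bool:
    # The setup works only if none of its parts is in the faulty set.
    return not (set(setup.values()) & FAULTY)


failing = {
    "unit": "lobby card machine",
    "patch_cable": "lobby patch cable",
    "wall_run": "lobby wall run",
}
known_good = {
    "unit": "upstairs card machine",
    "patch_cable": "upstairs patch cable",
    "wall_run": "upstairs wall run",
}

print(isolate_by_substitution(failing, known_good, works))
```

The point of changing only one part per test is the same as in the story: a part is cleared or convicted by evidence, not by how likely it looked going in.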
Saturday, May 22, 2010