Thursday, 27 March 2025
The multitude of manifestations of problem management
Being of a technocratic cast of mind, I love investigating problems. Mostly problem management involves discovering just how florid the causes of incidents can be. Almost nothing is excluded as a possible cause. Two particular instances occur to me as interesting.
Many years ago, I was called out to investigate a serious series of incidents at a coal mine. Their system was going down for no apparent reason, and, when it went down, it could no longer sort coal into railway bogies correctly, so some bogies that should have had high quality coal would get poor quality, and vice versa. This cost the mine so much money that, when the system went down, they had to close the whole plant.
I travelled to the plant to investigate. I had one vital clue: the outages occurred on Friday afternoons. This suggested to me that there might be human involvement, because, if the computer went down at three on a Friday afternoon, everybody had an early day, and a longer weekend.
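For anyone wanting to check for this kind of pattern in their own incident records, here is a minimal sketch in Python that tallies outages by weekday and hour. The function name and the timestamps are invented purely for illustration; nothing here comes from the mine's actual records.

```python
from collections import Counter
from datetime import datetime

# Fixed English names, indexed by datetime.weekday() (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def outages_by_weekday_and_hour(timestamps):
    """Count outages per (weekday name, hour of day) bucket."""
    return Counter((WEEKDAYS[t.weekday()], t.hour) for t in timestamps)


# Hypothetical outage times, made up for this example.
outages = [
    datetime(2025, 3, 7, 15, 5),
    datetime(2025, 3, 14, 14, 50),
    datetime(2025, 3, 21, 15, 10),
    datetime(2025, 3, 25, 9, 30),
]

counts = outages_by_weekday_and_hour(outages)

# List the buckets, most frequent first.
for (day, hour), n in counts.most_common():
    print(f"{day} {hour:02d}:00  {n} outage(s)")
```

A cluster in one bucket proves nothing by itself, but it is a cheap hint about where to look first.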
It wasn't certain that that was the cause, so I investigated the system, but found nothing obviously wrong. So I solved it by magic: I told the people at the mine that I'd installed some monitoring software that would record, in future, exactly how the system failed. The problem never occurred again.
Another conundrum, at about the same time, was a type of computer that would fail at random times. Each one sent in showed a blown power supply. In each case, next to the power supply was a small lump of misshapen carbon. Also, all the units sent in had empty slots, without blanking plates.
It turned out that, attracted by the warmth, mice would creep into the machines at night through the empty slots. When they went for a pee in the depths of the night, and peed onto the nicely warm power supply, they would be instantly carbonised.
In this case, supplying blanking plates, with instructions to fit them, solved the problem forever.
These days, with high technology in the cloud, we assume we'll find complex software reasons behind failures, but I think it's always important to keep a very open mind.