Usually, the feeling that the gods and the forces of nature habitually conspire against us is a product of confirmation bias: we forget all the times that woes come as single spies, because the times they come in battalions are so much more memorable.
It's important to be aware that this is not always the case. You are not paranoid when the bastards really are out to get you.
In particular, in many circumstances (maybe even the case of a washing machine), the underlying problem can be one of capacity. Capacity problems are difficult to detect because they are intermittent at first and then, finally and spectacularly, catastrophic.
There are, in fact, a few cognitive biases involved in producing such things as 'Murphy's Law' and 'Sod's Law'. We find things more important if they happen to us. We like to have a reason for things happening, and though the theory that the world is against us is an unlikely one, it is at least a theory, so we prefer it to accepting that happenstance is usually the best explanation for coincidences.
We are also very poor at judging the probability of events. Often, what seems a very unlikely event is, when you consider the size of the population and the length of time over which it could happen, actually almost certain to happen somewhere at least a few times a decade.
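The arithmetic behind that last point is short enough to sketch. Assuming each person's chance of the event is independent, the probability of at least one occurrence across n opportunities is 1 - (1 - p)^n. The figures in the example (a one-in-a-million-per-year event, 300,000 people, ten years) are purely hypothetical, chosen only to show the effect of scale.

```python
import math

def prob_at_least_once(p: float, trials: int) -> float:
    """Chance that an event with independent per-trial probability p happens
    at least once in `trials` trials: 1 - (1 - p) ** trials.
    log1p/expm1 keep the result accurate when p is tiny."""
    return -math.expm1(trials * math.log1p(-p))

def expected_occurrences(p: float, trials: int) -> float:
    """Average number of occurrences across all trials."""
    return p * trials

if __name__ == "__main__":
    # Hypothetical figures: a one-in-a-million-per-year event,
    # 300,000 people exposed, over ten years (independence assumed).
    trials = 300_000 * 10
    p = 1e-6
    print(f"expected occurrences: {expected_occurrences(p, trials):.1f}")
    print(f"P(at least one): {prob_at_least_once(p, trials):.2f}")
```

With these assumed numbers, an event that looks like a one-in-a-million fluke is expected about three times in the decade, with roughly a 95% chance of happening at least once.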
How, then, can we distinguish the events that signal a preventable catastrophe from those that are merely isolated?
Unfortunately, the simple answer is that we can't. Our brains are inclined to so many fallacies because we live in an uncertain world, and a collection of heuristics that works fairly well most of the time is worth having, and using, even though it also leads us into such errors.
The more complicated answer is that events connected to one cause, or a small number of related causes, arising from a mismatch between demand and capacity, have some characteristics that allow you to spot them against the camouflage of background noise.
Capacity-related problems cause events that are:
- Intermittent.
- Apparently unrelated, but often coincident with a specific time of day, week or month.
- Progressive. Strange things happen once or twice a week, but then more often: once or twice a day.
- Responsive to intervention. You may try to fix a symptom and find that the problems go away for a while.
- More serious over time. Before the final catastrophe, you'll see one or two events that are more serious than usual.
You'll notice that these characteristics fit a number of natural events: avalanches, earthquakes and volcanic eruptions, for example. That's not an accident; these events are also capacity related. Stresses build up over time, with cascades of minor events (there is often a series of small earthquakes before a volcanic eruption, for example).
What can we do about this unpredictability?
The relationship with natural events shows what we actually do about them. First, we need to anticipate where such a problem might occur, then assess how serious it would be (we're less concerned with volcanoes under the sea, far from any land, than with volcanoes near towns, for example), and then put monitoring in place.
We need to design the monitoring carefully, to make sure that the metrics we use make sense, are connected with the likely capacity problem, and are measuring the system itself.
Next, we need to measure trends: not just those obviously leading to a catastrophe, but all of them. We then need to correlate these trends with each other, project where they are heading, and find out what is causing them. Only then can we put measures in place to reverse a trend; if that isn't possible, to increase the capacity we have to deal with it; and if that isn't possible either, to find a way to mitigate the risk of a meltdown.
Measuring trends is a more subtle matter than it might seem. It's often not the most obvious trend, in the main demand, that's the danger. Smaller deviations during periods of quiet demand, or on the shoulders of a demand peak, are often the warnings.
The analysis required to detect such off-peak trends isn't mathematically difficult, but it does mean designing your thresholds more carefully than a simple maximum or minimum set as a percentage of historical demand.
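One way to do that, sketched here under stated assumptions, is to keep a separate baseline for each time slot (say, weekday and hour) and flag values that sit well above that slot's own normal range, instead of comparing everything with one global threshold. The slot keys, the mean-plus-k-standard-deviations rule and the history data are all illustrative choices, not the only option.

```python
from collections import defaultdict
from statistics import mean, stdev

def slot_baselines(history, min_samples=2):
    """Group (slot, value) observations by time slot, e.g. (weekday, hour),
    and return {slot: (mean, sample standard deviation)} for each slot with
    at least min_samples observations (stdev needs two or more)."""
    by_slot = defaultdict(list)
    for slot, value in history:
        by_slot[slot].append(value)
    return {slot: (mean(vals), stdev(vals))
            for slot, vals in by_slot.items() if len(vals) >= min_samples}

def is_anomalous(slot, value, baselines, k=3.0):
    """True if value is more than k standard deviations above the mean
    for its own time slot; slots without a baseline are never flagged."""
    if slot not in baselines:
        return False
    mu, sigma = baselines[slot]
    return value > mu + k * sigma

def exceeds_global_threshold(value, history, fraction=0.9):
    """The simpler approach, for contrast: a fixed fraction of the
    historical maximum, regardless of time slot."""
    return value > fraction * max(v for _, v in history)

if __name__ == "__main__":
    # Hypothetical history: a busy Monday 09:00 slot and a quiet Monday 03:00 slot.
    history = ([(("Mon", 9), v) for v in (800, 820, 790, 810)]
               + [(("Mon", 3), v) for v in (50, 52, 48, 51)])
    baselines = slot_baselines(history)
    # 120 at 03:00 is tiny next to the peak, but far outside that slot's usual range.
    print(exceeds_global_threshold(120, history))    # False
    print(is_anomalous(("Mon", 3), 120, baselines))  # True
```

The global check never notices the off-peak jump, while the per-slot baseline does. In practice you would also want a floor on the standard deviation, so that a slot with almost no variation doesn't flag every small change.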