
If you are in the prediction business, you must be accurate in your communication about your predictions — specifically around confidence.

Prediction confidence is an easy concept for data scientists to understand, but it’s a difficult concept to explain, especially to those who aren’t familiar with the idea. This is why it’s important to translate your ideas and analytics concepts (e.g., sigma level and yield) before communicating your analysis to non-analytic consumers.


Follow these four semantic categories


You would be remiss if you offered data or information for consideration based on your predictive analytics without an indication of prediction confidence. (Nothing bothers me more than people who claim to know the predicted outcome of something without explaining how confident they are about their prediction.) However, you cannot communicate that confidence to a general audience (i.e., non-analysts) without first considering semantic translation.

I developed a priority/governance model to help leaders make decisions and empower middle management using these four semantic categories: must, should, might, could. This is a good starting point for translating prediction confidence.

1: Must

If something must be done, it conveys a very strong degree of importance and certainty. If I’m in the prediction business, I might say that something shall or will happen. That said, we all know that you cannot make a 100% guarantee that anything will happen, so the first rule of prediction is: Never indicate that something is 100% likely to happen.

So, what sigma level or yield is close enough to 100% so that we can responsibly use language that represents this much certainty? That depends on your circumstances, but as a rule of thumb, I suggest 4 Sigma. At 4 Sigma, you have a 99.4% chance of something happening. That’s pretty close to certain for me. Let’s consider the other scenarios where we’re not so confident.
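The sigma-to-yield conversion the article relies on can be sketched in a few lines of Python. This is a minimal sketch of the conventional Six Sigma conversion, which assumes the standard 1.5-sigma long-term process shift (the function name is mine, not from the article):

```python
from statistics import NormalDist

def sigma_to_yield(sigma_level: float) -> float:
    """Convert a sigma level to its expected yield (probability of success),
    using the conventional 1.5-sigma long-term shift: yield = Phi(sigma - 1.5)."""
    return NormalDist().cdf(sigma_level - 1.5)

# 4 Sigma corresponds to roughly a 99.4% yield, the "must/will" threshold
print(round(sigma_to_yield(4.0) * 100, 1))  # 99.4
```

Note that this matches the figures used throughout: 4 Sigma yields about 99.4%, 2.4 Sigma about 81%, and 1 Sigma about 31%.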

2: Should

If something should be done, you should do everything within practical reason to accomplish it.

When communicating something that should happen, it’s useful to think about success management and the 80-20 rule. I often caution people — especially analytic-minded people — to stay on the road to success and avoid the path to perfection. In other words, when you’re about 80% done with something, stop and move on because you’re likely at a success point. The pursuit of the additional 20% to reach perfection is not a valuable use of time.

If we extend this idea into predictive analytics, I contend something that’s 80% likely to happen is something that should happen or something that will probably happen. If you look up an 80% probability (i.e., yield) in a Sigma Conversion table, you’ll see that’s about a 2.4 Sigma (actually you can expect an 81% yield from a 2.4 Sigma process, but it’s close enough). Bear in mind, even with a 2.4 Sigma process, you would still expect about 180,000 defects per million observations. So, even if something should happen, don’t be surprised when it doesn’t. Hopefully, the connotation of should or probably communicates that idea.
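The defects-per-million figure above follows directly from the yield. A short sketch, again assuming the conventional 1.5-sigma shift (the helper name is illustrative):

```python
from statistics import NormalDist

def defects_per_million(sigma_level: float) -> int:
    """Expected defects per million observations at a given sigma level,
    using the conventional 1.5-sigma shift."""
    process_yield = NormalDist().cdf(sigma_level - 1.5)
    return round((1 - process_yield) * 1_000_000)

# A 2.4 Sigma process: roughly 184,000 defects per million,
# in line with the ~180,000 figure cited above
print(defects_per_million(2.4))
```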


3 and 4: Might and could

To round out our semantic exploration of predictive confidence, let’s consider might and could. The two words may sound equivalent in the degree of confidence communicated, though I suggest helping your target audience make slight distinctions. When I talk to leaders about the difference between might and could in the context of setting priorities, I say that might is a bit more certain than could, but both are definitely lower priorities than must and should.

Technically, anything could happen, right? But to distinguish it from the other categories, let’s reserve the language of could for the lowest-probability outcomes and set an upper bound of confidence. I suggest about 1 Sigma, which is approximately 31%. Based on this definition, if I say it could rain tomorrow, I’m saying my confidence in this prediction is anywhere from a very low non-zero number to about 31%. And that leaves might: by process of elimination, it takes the range from 1 Sigma (approximately 31%) to 2.4 Sigma (approximately 81%). Although that’s a wide range, it seems to fit with our other definitions.



Your prediction confidence ranges map to the semantics you choose, so make sure you’re consistent and that these semantic translations are documented and clearly communicated to your target audiences. Just because the words will, should, might, and could reasonably translate to 4 Sigma (and above), 2.4 Sigma (and above), 1 Sigma (and above), and below 1 Sigma, don’t assume your target audience knows that. You should make it clear to them in several ways that this is your method for communicating the difficult-to-describe concept of prediction confidence.

Reserve the words will and shall for only the most certain predictable outcomes (99% or better); use should and probably to describe somewhat likely outcomes (between 81% and 99%); use might to describe less likely outcomes (between 31% and 81%); and use could for everything else.
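The full translation rule can be captured in one small function. This is a sketch of the mapping as defined above, with thresholds derived from the sigma levels (the function and constant names are mine):

```python
from statistics import NormalDist

# Confidence thresholds from the sigma levels discussed above
# (yield = Phi(sigma - 1.5), the conventional 1.5-sigma shift)
WILL_THRESHOLD   = NormalDist().cdf(4.0 - 1.5)  # ~0.994
SHOULD_THRESHOLD = NormalDist().cdf(2.4 - 1.5)  # ~0.816
MIGHT_THRESHOLD  = NormalDist().cdf(1.0 - 1.5)  # ~0.309

def confidence_word(probability: float) -> str:
    """Translate a prediction probability into its semantic category."""
    if probability >= WILL_THRESHOLD:
        return "will"
    if probability >= SHOULD_THRESHOLD:
        return "should"
    if probability >= MIGHT_THRESHOLD:
        return "might"
    return "could"

print(confidence_word(0.85))  # should
```

For example, a model that puts an outcome at 85% probability falls in the 81%–99.4% band, so you would tell your audience it should (or will probably) happen.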

Following this consistent standard and educating your target audiences on the style with which you choose to communicate will help you deliver accurate messages about your predictions in a responsible way.


Read original article: Analytics prediction confidence: It's all about semantics