Today’s post will be rather lengthy, so get something to drink before you get too far into it.
Back in 2011, I wrote about the coming time when every conceivable data point would be captured and potentially used. The question then was who should have it, and why. I argued then, and still argue today, that you are better off trusting a business, large or small, with your data than the government. The reasons are as pertinent today as they were then, so here they are again:
- You can sue the business that misuses your data
- Business wants to make a profit so will not waste energy trying to make use of data that doesn’t apply to its model
- Business only wants to make your decision-making more effective and less costly
- You can stop doing business with a Company at any time and choose a competitor.
Your data, in the hands of a government, do not adhere to these 4 points. As a matter of fact:
- It is difficult, and with regard to the federal government impossible, to sue for misuse of data.
- The government is not about “profit”, but control and risk mitigation.
- Government is about bureaucracy, which is antithetical to effective decision making.
- You can’t change your government.
Now some of you will disagree with my 8 points. That is your right. But before you dismiss any of the points, carefully consider them. Take Verizon and the data on your phone calls that was shared with the NSA:
Your number (given, as it is your account), number called, start time of call, end time of call, and geo-location (where you are).
These five data points provide Verizon with lots of opportunities to serve you better. And to do you a disservice. Verizon could (and does)
- check to see if the number you call is part of Verizon’s network and if not, send a targeted message encouraging the recipient to switch.
- check your location and find you are in a coffee shop and send you an SMS coupon
- offer direction assistance since you keep calling the same business and you are within 3 blocks of the store
- assume you are a drug dealer and have a SWAT team automatically sent to your location
Verizon is well within its rights, as the controller of the data points, to use them for any of the first 3 purposes. You, individually, might be annoyed with the messages, but overall, it might be that 78% actually appreciate the interaction that Verizon provides. If you are one who is not happy, you can follow rule 4 above: Switch Companies. But on the last purpose, I think we could agree that Verizon, in that hypothetical example, took a step that was outside its business purpose and thus would be held to Rule 1: Lawsuit.
The government, on the other hand, has an entirely different purpose for the data. It would be interested in the data only for the last purpose listed. Why? Because the government believes its mandate is risk mitigation, or to reduce overall threats. Thus, it is better to assume the worst, you are a drug dealer, than to believe you are just lost. After all, 61% of the time, when someone calls the same number 6 times in the span of a few minutes, it is to set up a drug drop… assuming the parties are actually engaged in drug sales.
And here lies the crux of the dilemma.
The results of Big Data are directly tied to the questions you ask of it.
The absolute worst question you can ask of Big Data is Why. The question Why implies a cause-and-effect relationship. The problem is, we have been taught there must be an underlying reason for doing something and that, if we just stare at the canvas long enough, we will see the root cause. Perhaps, maybe, potentially, on an individual level that can happen; in reality, on the scale at which we make decisions, it is a fantasy and more akin to a delusion. But to mitigate risk, you need to find a root cause, because only by addressing that can you avoid (or come closer to avoiding) the end result.
If you look at the phone record example above, the “effective” purposes of the data do not go to the “WHY”, but rather to the other questions: Who, What, Where, When and How.
- Who you are calling and How that person can be enticed to switch carriers
- Where you are calling from and How they can help you find the way
- When you call and Where to offer you a coffee coupon
The last purpose on the original list, the SWAT team, really deals with Why: why are you calling the same number? People call the same number over and over again for entirely different reasons. If you are trying to predict the motivation behind a behavior, you are entering the Why-sphere and you need to be careful. Just because 62% of drug dealers call the same number 6 times doesn’t mean everyone who calls the same number 6 times is a drug dealer. Verizon would be wrong to assume it based upon the data, and the government even more so. But if Verizon took this course of action on the data it could be sued (rule 1), whereas the government can’t be.
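The gap between “62% of drug dealers call the same number 6 times” and “this caller is a drug dealer” can be put in rough numbers with Bayes' rule. The sketch below is only an illustration: the 62% comes from the example above, while the share of callers who are dealers (1 in 1,000), the share of everyone else who also redials 6 times (5%), and the function name are all assumptions made up for the arithmetic.

```python
def probability_dealer_given_pattern(p_pattern_given_dealer, p_dealer, p_pattern_given_other):
    """Bayes' rule: chance a caller is a dealer, given the redial pattern."""
    # Overall chance of seeing the pattern, from dealers and non-dealers alike.
    p_pattern = (
        p_pattern_given_dealer * p_dealer
        + p_pattern_given_other * (1 - p_dealer)
    )
    return p_pattern_given_dealer * p_dealer / p_pattern


posterior = probability_dealer_given_pattern(
    p_pattern_given_dealer=0.62,  # from the example in the post
    p_dealer=0.001,               # assumption: 1 caller in 1,000 is a dealer
    p_pattern_given_other=0.05,   # assumption: 5% of other callers redial 6 times
)
print(f"Chance a 6-time redialer is a dealer: {posterior:.1%}")
```

Under these made-up inputs the answer is a little over 1%, so nearly everyone flagged by the pattern would be an ordinary caller. Different assumptions move the number, but the gap between the two statements remains.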
Big Data and Predictions
The assumptions are in fact part of a game methodology; they are called predictions. Your actions can be predicted based on two factors:
- Your historical choices in similar situations
- The choices you have made immediately before the upcoming choice
Now, some of you are already groaning (all 3 of my loyal readers who heard this lecture already :) LOL) but this is where Big Data analysis is heading and, unfortunately will cause more harm than good in the short run. Why is that? Because
- We have no situational data for historical choices stored and ready for access
- It presupposes that you cannot learn to avoid a hot stove
A case in point: IBM is out hawking its BI solution by talking about the bakery that discovered it sold more cakes than pies on rainy days. It goes to the questions above: Who, What, Where, When and How but avoids the Why. Chasing down Why leads you down a terrible path:
- When it rains, it is gloomy
- Gloomy days lead to gloomy personal outlooks
- Gloomy personal outlooks mean you are depressed
- Depressed people eat more gluten-based products to feel better
- People need to eat more gluten-based products to be happier
- Congress passes a law saying people must eat 4 lbs of wheat a day so they will be happy
Great for the National Wheat Futures Association, but bad for the rest of us. Thus the reasoning for avoiding the Why question: you will draw a conclusion that may, or may not, be backed by good data.
On the other hand, if we avoid the “Why”, game theory actually can help a business be more profitable and effective.
- On sunny days your store sells 72% pies and 28% cakes
- On rainy days, your store sells 42% pies and 58% cakes
- The week of May 17-24 has historically (over the last 10 years) been rainy 80% of the time
What IBM is saying, correctly, is, plan to sell more cakes during that week. You will have less waste (fruits that go bad), greater profit (less mark-down on stale pies) and happier customers. The data isn’t saying anything about motivation or reasoning, it just says, “here is what has happened and plan on it happening again”.
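The planning arithmetic behind that advice can be sketched in a few lines. This is a rough illustration, not IBM's method: the percentages are the ones listed above, the names `SALES_MIX` and `expected_mix` are made up here, and the blend assumes the store sells the same total volume on rainy and sunny days, which the example does not actually say.

```python
# Share of sales by product for each kind of day, from the bakery example.
SALES_MIX = {
    "sunny": {"pies": 0.72, "cakes": 0.28},
    "rainy": {"pies": 0.42, "cakes": 0.58},
}


def expected_mix(p_rain, sales_mix=SALES_MIX):
    """Blend the rainy and sunny mixes, weighted by the chance of rain.

    Assumption: total sales volume is the same on rainy and sunny days.
    """
    return {
        product: p_rain * sales_mix["rainy"][product]
        + (1 - p_rain) * sales_mix["sunny"][product]
        for product in sales_mix["rainy"]
    }


# The week of May 17-24 has been rainy 80% of the time.
week_mix = expected_mix(p_rain=0.80)
for product, share in week_mix.items():
    print(f"{product}: {share:.0%}")
```

For that week the blend works out to about 52% cakes and 48% pies, against 28% cakes on a sunny day. Nothing in the calculation asks why; it only counts what has happened before.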
The wrong thing to do is to start a marketing campaign which ties cakes to being sad because of the rain. It is wrong on several levels, but principally on the level of causation. The inference of a cause (rain) to the effect (cake sales) leads to the unstated cause premise – depression. Rain causes depression, which affects cake sales. It is a false inference and I strongly discourage you from following it.
You are going to ignore this advice. With more and more data available, you will begin to believe that Cause A generates Effect B. You will even run a marketing campaign that ties eating cake to feeling happier and pushes cake sales up to 60%. You will smile and say, “You were wrong.” Really.
On sunny days your total sales are down 80% because people have equated you with rain and depression. But thankfully you sold a lot of cakes in your day. My advice: leave motivation to actors and shrinks.
Using Data Correctly
Why am I beating up on this so much today? Because I haven’t written in a while and I have been reading far too many articles lately saying how bad it is that all these data are being captured and used. Sorry about this, but where machinery was the tool for building wealth in the 20th century, data is the tool of the 21st. But to generate wealth, the right data points must first be identified, then captured and finally analyzed. It will require us to rethink business relationships and create “data partnerships” which allow companies to share meaningful data to help customers.
To use data though, you have to first capture it. By capturing it, I mean putting the data into a database where you can access it and analyze it. It means, if you sell something, give that something a part number so it can be tracked. It means give each customer an identification code and encourage them to use it. Find out about their likes and dislikes. Friend them on the appropriate social website and glean data that way.
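As a rough picture of what “capturing” can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, customer ID, part numbers and sample rows are all invented for illustration; a real system would choose its own.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind; use a file path to keep data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE products (part_number TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE sales (
        customer_id TEXT REFERENCES customers(customer_id),
        part_number TEXT REFERENCES products(part_number),
        sold_on TEXT,
        quantity INTEGER
    );
""")

# Hypothetical sample rows: every customer has an ID, every item a part number.
conn.execute("INSERT INTO customers VALUES (?, ?)", ("C-001", "Sample Customer"))
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("P-100", "Apple pie"), ("P-200", "Chocolate cake")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("C-001", "P-100", "2024-05-17", 1),
        ("C-001", "P-200", "2024-05-18", 2),
        ("C-001", "P-200", "2024-05-20", 1),
    ],
)

# Once captured, the data can answer What and How many without asking Why.
rows = conn.execute("""
    SELECT p.description, SUM(s.quantity)
    FROM sales AS s
    JOIN products AS p ON p.part_number = s.part_number
    GROUP BY p.part_number, p.description
    ORDER BY p.part_number
""").fetchall()
print(rows)
```

The point of the part number and the customer code is visible in the query: without them there is nothing to join on, and the sales rows are just a pile of receipts.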
In closing, notice I have not once used the words “information” or “intelligence” to describe the stuff we capture. Remember,
- Data is the bedrock of business; without it you don’t have a business model
- Information comes from putting two or more data points together
- Intelligence comes from putting two or more information points together
- Cognition, or spontaneous thought, comes from putting two or more intelligence points together
We are not yet in the true “information age”. We are still in the data-capturing stage. It is messy and doesn’t appear to make a lot of sense, but this is the time to start working on gathering the right data about your customers, your products and your environment. The rest will come naturally.
As always, we are here to help you evaluate, design and implement the best system for capturing meaningful data in your business. Feel free to write us at firstname.lastname@example.org anytime.