"The past is prologue," wrote clairvoyant William Shakespeare in "The Tempest," predicting the rise of Big Data and the booming PR business of predictive analytics.
For as Viktor Mayer-Schonberger and Kenneth Cukier write in their just-released book, "Big Data" (Houghton Mifflin Harcourt), the key to Big Data is to overcome the brain's "cause and effect" orientation.
VM-S and KC give the following example. "Take the following three sentences: Fred's parents arrived late. The caterers were expected soon. Fred was angry." Thinking in terms of causality, a person will instantly surmise that Fred is upset because his parents were late. There is, however, no evidence to back that up. Fred could be angry because his beloved Mets blew another game, or because the caterers are finally showing up a day late.
Big Data is about "correlations." There's no rhyme or reason why consumers run to Wal-Mart before a hurricane to stock up on Pop-Tarts, strawberry in particular. But based on past sales data and weather information, the retailer moves Pop-Tart displays to the front of its stores to take advantage of the correlation between Pop-Tarts and hurricane forecasts. Pop-Tarts, as usual, fly out the door.
Correlation is also the reason New York City number crunchers use residential brickwork permits in predicting "illegal conversions," the cutting up of a house into smaller units to pack in more renters. A building with such a permit is a less likely suspect: if you are fixing up the place, it's highly unlikely that one day you are going to turn it into a dumping ground.
Mike Flowers, whom Mayor Bloomberg hired as NYC director of analytics, told VM-S and KC: "I am not interested in causation except as it speaks to action. Causation is for other people, and frankly it is very dicey when you start talking about causation…. I need a specific data point that I have access to and tell me it's significant."
VM-S and KC also address the many dark sides of Big Data. In 2009, Google, based on searches for phrases like "medicine for cough and fever," tracked the outbreak of the H1N1 flu before the Centers for Disease Control and Prevention did. That's because Google used real-time info, while the CDC received its feedback from doctors and health facilities a week after a patient was treated.
What if a future government, during an outbreak, used the Google data to order a mandatory quarantine of people who made flu-related searches? That would snare healthy people who searched Google for flu info either because they wanted to protect themselves or because they were just curious about the epidemic.
And then there's the specter of "prediction-based punishment," where Big Data is used to arrest people before a crime is committed. Think "Minority Report."
VM-S and KC wrote: "The core problem with relying on such predictions is not that they expose society to risk. The fundamental trouble is that with such a system we essentially punish people before they do something bad. And by intervening before they act (for instance by denying them parole if predictions show there is a high probability that they will murder), we never know whether or not they would have actually committed the predicted crime. We do not let fate play out, and yet we hold individuals responsible for what our predictions tell us they would have done. Such predictions can never be disproven.
"This negates the very idea of the presumption of innocence, the principle upon which our legal system, as well as our sense of fairness, is based."
As for privacy, forget about it. In the world of Big Data, the notion of "notice and consent" goes out the window. Much of data's value will lie in secondary uses, the vast majority of which may have been unimagined when the data was collected.
One example: an anti-theft system for cars is being developed based on the datafication of the driver's posterior. That data could be reused to analyze a driver's posture when he is drowsy or drunk and to alert nearby drivers to prevent accidents. The result: a person who allowed his posterior to be datafied is ultimately arrested by police for driving while drunk. Did he consent to the cops receiving his personal info? Nope.
"Big Data" is a fascinating book. It's a primer on how Big Data is going to fundamentally change the world for the better and, perhaps, for the worse.