Quick Read: If data is the new oil, how do we go about extracting and refining it – at scale? Introducing Digital Fracking and a few notable examples.
Fracking, at its core, is an aggressive, invasive technique for extracting valuable raw materials from hard-to-reach places.
While the term has traditionally been used in the context of oil and gas extraction, it applies equally well to data, which brings us to the concept of Digital Fracking. A few examples first:
Have you read this recent story of a brilliant entrepreneur who’s been making money off you without you even noticing? He is Luis von Ahn, the Carnegie Mellon professor who pioneered innovative ways to extract value from what ordinary people routinely do online. Some extracts from the story:
The ESP Game
A tremendous number of unlabeled images are floating around on the web, which impairs everything from the accuracy of image searching to the blocking of inappropriate content. So, in 2005, Von Ahn launched a fun game called the ESP Game.
The concept is simple: the program would randomly pair each player with another user on the web and show them a series of images. Both players were instructed simply to “type whatever the other guy is typing.” The more overlap you produced, the better your score was.
Result: Within just four months, it had lured 13,000 bored web surfers into producing 1.3 million labels for roughly 300,000 images (source). It was subsequently acquired by Google and relaunched as Google Image Labeler (2006 – 2011).
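To make the matching mechanic concrete, here is a minimal Python sketch of how a single round might be scored. It is not the ESP Game's actual code: the function names and the normalization step (trimming whitespace, ignoring case) are assumptions made for illustration. The idea it shows is that only words typed independently by both players are kept as labels for the image.

```python
def matched_labels(guesses_a, guesses_b):
    """Return the labels both players typed, in the order player A typed them.

    Hypothetical helper: normalization (strip + lowercase) is an assumption.
    """
    normalized_b = {guess.strip().lower() for guess in guesses_b}
    seen = set()
    agreed = []
    for guess in guesses_a:
        label = guess.strip().lower()
        # Keep a label only if the partner typed it too, and only once.
        if label in normalized_b and label not in seen:
            seen.add(label)
            agreed.append(label)
    return agreed


def score_round(guesses_a, guesses_b):
    """More overlap means a better score: here, one point per agreed label."""
    return len(matched_labels(guesses_a, guesses_b))


if __name__ == "__main__":
    player_a = ["Dog", "park", "ball"]
    player_b = ["puppy", "dog ", "BALL"]
    print(matched_labels(player_a, player_b))
    print(score_round(player_a, player_b))
```

Because neither player can see what the other types, agreement is a reasonable signal that the word really describes the image, which is why the overlap doubles as both the score and the harvested label.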
Most of us know what a CAPTCHA is. Essentially, it is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot, for example by showing a distorted string of letters.
Now, have you heard of reCAPTCHA? Most of us have been subjected to it, albeit unwittingly.
The brilliant twist of reCAPTCHA, which von Ahn launched, is that the test isn’t just verifying your humanity. As this article says, it’s also putting you to work on decoding a word that a computer can’t. The first word in a reCAPTCHA is an automated test generated by the system, but the second usually comes from an old book or newspaper article that a computer scanner is trying (and failing) to digitize. If the person answering the reCAPTCHA gets the first word correct (which the computer knows the answer to), then the system assumes the second word has been transcribed accurately as well.
In 2009, Google acquired reCAPTCHA and put the program to work on a tremendous scale, digitizing material for Google Books, the New York Times archives and more.
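The two-word trick can be sketched in a few lines of Python. This is an illustration only, not reCAPTCHA's real implementation: the function names are hypothetical, and the vote-counting step with a `min_votes` threshold is an added assumption (the article only says a correct first word leads the system to trust the second).

```python
from collections import Counter


def check_submission(control_answer, typed_control, typed_unknown, votes):
    """Grade the known (control) word; if it is right, record the user's
    reading of the unknown scanned word as one vote.

    Hypothetical helper: case-insensitive comparison is an assumption.
    """
    passed = typed_control.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[typed_unknown.strip().lower()] += 1
    return passed


def accepted_transcription(votes, min_votes=3):
    """Assumed extra safeguard: accept the most common reading only once
    enough verified humans agree on it."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


if __name__ == "__main__":
    votes = Counter()
    # A human who types the control word correctly contributes a vote.
    print(check_submission("morning", "Morning", "overlooks", votes))
    # A wrong control word fails the test and its second word is ignored.
    print(check_submission("morning", "mourning", "xyz", votes))
    print(accepted_transcription(votes, min_votes=1))
```

The design point is that the system never needs to know the answer to the second word in advance: passing the control word is what earns the user's reading of the unknown word any trust.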
(Source: Google reCAPTCHA page)
And then, more recently, von Ahn came up with Duolingo, a free language-learning program that is, again, a crowdsourced text translation platform at its core.
As he says, “It’s just taking something that people do anyways, and trying to extract value out of it.” See his amazing TEDx video, where he explains these in greater detail.
Drug Side Effects
Researchers estimate more than 90 percent of drug side effects go unreported, and it can take years for the FDA to detect a pattern of problems that leads to changes in how a drug is prescribed. Meanwhile, hundreds of millions of people wake up every morning and write about their personal experiences on forums and social networks.
Armed with insight into this gap, two startups, Treato and Epidemico, have begun fracking social networks and online medical forums to mine data on drugs and their potential side effects for pharma companies and patients.
Today, major pharma companies pay Treato and Epidemico for more detailed analyses of what patients are saying about their drugs: how they’re using the medication, what reactions they experience, or why they switch from one pill to another. (source)
Extracting Value From Online Reviews
And then there’s HugDug (a recent project by Seth Godin), a brilliant intervention that hits a sweet spot between two disparate concepts: affiliate marketing and generosity. I keenly look forward to HugDug achieving scale and becoming a truly unique example of Digital Fracking by extracting value from the tons of reviews lying out there.
As von Ahn says, “Look how many hours have gone into building the Panama Canal or the Pyramids – and with all the people that are on the web now, you can get a lot more hours.”
And to that point, the most important question that’s answered by the concept of Digital Fracking is this: How do you extract those hours – at scale?