Remember the data do-gooders

The Cambridge Analytica revelations are another reminder of the sinister uses of big data – but we should celebrate the good that algorithms have done us

The somewhat dreary world of data is starting to appear menacing. Cambridge Analytica has been caught by undercover journalists claiming that it can game the democratic process, just by using a combination of personality profiling and micro-targeted ads. At first, we might question the voters. Yet the tale of Cambridge Analytica could well lead us to question our own rationality.

So it’s worth remembering that however flawed our cognitive capacities are, there are just as many software engineers out there helping to program a better future.

Take the ‘corrupted blood’ glitch from World of Warcraft back in 2005. The game’s developers accidentally created a virtual pathogen that spread from player to player, mage to paladin, and inadvertently created a natural experiment for the study of epidemics. Scientists were able to use players’ online behavioural data to model how populations react in the real world. Some players were altruistic, attempting to heal those infected with the disease, while others created voluntary quarantine zones by directing uninfected players to safety. A few players who had already been infected by the curse attempted to spread the disease, a phenomenon that has been observed during real-world epidemics.

Although an online role-playing game cannot entirely reflect the realities of people’s behaviour, the data that was gathered helped reveal the importance of social factors in the spreading of diseases.

More recently, Netflix used their highly detailed viewing data to commission new shows. In 2013, they launched their first original series, House of Cards, and did so without creating a pilot or even using focus groups. Based purely on the behavioural data of their viewers, they invested $100 million into the series, making back the entire investment in just over three months. Instead of asking people what they liked, they went straight to the data itself, analysing which actors, directors and genres were most popular with their viewers.

The free language-learning site Duolingo has also found a way to take advantage of its users’ behavioural data. As students translate sentences and foreign texts, they are also unwittingly translating news articles from CNN and Buzzfeed. An algorithm then takes all of these answers and creates a single coherent document. The company argues that these crowdsourced translations are cheaper than a professional translator and more accurate than machine translation tools like Google Translate.

The founder of Duolingo, Luis von Ahn, is a master of turning these kinds of virtuous cycles into viable business models. Before Duolingo, he worked on Google’s reCAPTCHA program. Remember those annoying little quizzes where you had to type out some strangely contorted words just to prove you were a real human? That was him. And just like the principles behind Duolingo, the process did two jobs at once: it digitised a vast library of books while increasing online security.

Together, those tiny tasks have helped to digitise around two million books a year. This is the power of vast data sets. Even though the time taken to type in those words was probably less than ten seconds, when combined and properly directed, that brief behavioural output had world-shaping repercussions.

The corrupting influence of companies like Cambridge Analytica leaves us feeling queasy about big data. But it’s worth remembering that the power of tech is as malleable as the voters those companies tried to influence.
