In times gone by, there was one way of gleaning what a person thought about a particular subject: you asked them. For the public sector especially, surveys were an important way of understanding the problems that were plaguing its citizens and where best to focus its action and investment. Today we have data.
Surveys are imperfect in all sorts of ways, but principally because people lie. They say what they think will impress the person asking them. The likelihood of getting the absolute truth about a particular situation is essentially nil, particularly if the questioner is seen as a figure of authority.
At a recent Binary District event in Moscow, Seth Stephens-Davidowitz – a former Google data scientist and author of ‘Everybody Lies’, a New York Times bestseller and an Economist Book of the Year – gave a presentation on the opportunities that vast quantities of search data present to the public sector.
Our True Voting Habits Exposed
Stephens-Davidowitz’s first example of how data has transformed our understanding of the world came from elections. “After an election, if people are immediately asked whether or not they voted, more than half will lie and say they had, even if they hadn’t,” he explained. “No one wants to admit they failed to vote in an election.”
The rise of the world wide web has changed all that. “We now have a new method to be able to understand people, to understand human beings. It’s a really revolutionary method, thanks to the Internet: people’s Google searches. In Russia, I guess it’s more Yandex searches, but it’s a similar principle,” he said.
“People are remarkably honest on search engines. I call it ‘digital truth serum’. They tell Google what they might not tell anyone else, what they’re really thinking – things they might not reveal in surveys, or to family, friends, neighbours… to anyone.
“When you analyse this anonymous, aggregate data as I have, you acquire a very different view of people than ever before. For example, there are more searches on Google for porn than there are for the weather, even though only 20% of people admitted in a survey that they’d looked for porn, while, presumably, 100% would have said they’d checked the weather at some point. Google Trends is a tool anyone can use. Anyone in this audience can go home and see what people are searching for. We’re already seeing that Google Trends gives better data than surveys on many questions.”
The example Stephens-Davidowitz gave ties back to his earlier point about voting. By looking at how many people in an area search for terms such as ‘how to vote’ or ‘where to vote’ before an election, you can get a much clearer idea of turnout there than by simply asking people whether they voted.
Revelations in Our Search History
A more controversial example Stephens-Davidowitz gave concerns racism. Although very few people in the United States would admit to holding racial prejudices or racist views, millions of people search Google for jokes that specifically mock a particular race.
“You can use this to acquire a map of where racism is highest and lowest in the country,” he said. “You can do this in Russia, and you can do this in the United States. It’s data you otherwise wouldn’t see anywhere else.”
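A regional map like this is usually built from shares rather than raw counts, because a large region produces more searches for almost anything simply by having more people searching. The sketch below is a minimal illustration, loosely modeled on the relative 0 to 100 scale that Google Trends reports; the function name `regional_index`, the region names and all the counts are hypothetical, and this is not Google's actual methodology.

```python
def regional_index(term_counts: dict, total_counts: dict) -> dict:
    """Scale each region's share of searches for a term to 0-100, relative to the top region.

    term_counts: searches matching the term, per region (hypothetical data).
    total_counts: all searches, per region, used to turn raw counts into shares.
    """
    # Divide by each region's total volume so big regions don't dominate by size alone.
    shares = {
        region: term_counts.get(region, 0) / total
        for region, total in total_counts.items()
        if total > 0
    }
    top = max(shares.values(), default=0)
    if top == 0:
        # No matching searches anywhere: every region scores zero.
        return {region: 0 for region in shares}
    # The region with the highest share scores 100; the rest are scaled against it.
    return {region: round(100 * share / top) for region, share in shares.items()}


# Hypothetical counts: region_a has the most matching searches in absolute terms.
total_counts = {"region_a": 1_000_000, "region_b": 250_000, "region_c": 500_000}
term_counts = {"region_a": 200, "region_b": 100, "region_c": 50}
index = regional_index(term_counts, total_counts)
print(index)  # {'region_a': 50, 'region_b': 100, 'region_c': 25}
```

Region A has the most matching searches in absolute terms, but region B scores highest once each count is divided by that region's total search volume.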
Even social media can’t offer this level of honesty. This is unsurprising, Stephens-Davidowitz explained. “People are naturally less honest on social media than with Google, because they’re trying to present an image to their friends. I call Facebook data ‘digital brag’ because everyone’s trying to say, ‘Oh, my life is so good – I’m on a great vacation in a fancy hotel’. When you look at Facebook data, you see that it doesn’t correspond at all with what people are really doing.”
An example of this is a comparison between two magazines: the highbrow publication The Atlantic Monthly and the gossip magazine The National Enquirer. In terms of annual US sales, the latter is three times more popular than the former. Yet The Atlantic Monthly is 45 times more popular on Facebook, as people prefer to present an image of themselves on social media that exaggerates how intellectual they are.
To compare the two sources, Facebook and Google, Stephens-Davidowitz looked at the way people talk about their husbands in public and in private. He explained, “Let’s start with social media. How do people complete the phrase ‘My husband is…’? The number-one response is, ‘My husband is the best’, followed by ‘my best friend’, ‘amazing’, ‘the greatest’ and ‘so cute’. So that’s marriage according to social media. But again, everybody’s friends can see what they’re saying.
“Do we get a similar view on marriage when people are doing a search by themselves and no one is seeing what they’re saying? Actually, the third most popular on search is ‘my husband is amazing’. That shocked me, because these husbands must be really amazing, because how is Google going to help you in this situation?! But how else do people describe their husbands on search? Well, it’s ‘gay’, ‘a jerk’, ‘annoying’ and ‘mean’. You get different data when you give people different incentives.”
The Future is Personalised
Based on Facebook ‘likes’, data analysts can gauge a wide range of things, such as a sports team’s popularity. This is possible only because of the sheer scale of Facebook’s data, which dwarfs that of, say, a survey. This is why the major companies that hold vast amounts of data (Google, Facebook, Amazon, Netflix) can provide such finely honed, effective personalisation products.
Personalisation relies on scale to be effective. If you have data on a million like-minded people, you can make a fairly accurate prediction of what a given user might like.
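The idea that like-minded users predict each other’s tastes is the basis of user-based collaborative filtering. The sketch below assumes each user’s ratings are stored as a dictionary mapping items to scores, measures similarity with cosine similarity over the items two users have both rated, and predicts a score as a similarity-weighted average of the closest neighbours’ ratings. The users, items and ratings are hypothetical, and this is a textbook illustration, not how Amazon, Netflix or Pandora actually build their recommenders.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users' ratings, over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: dict, user: str, item: str, k: int = 2):
    """Predict a user's rating of an item from the k most similar users who rated it."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep only the k most similar neighbours with positive similarity.
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no like-minded users have rated this item
    # Weight each neighbour's rating by how similar they are to the user.
    return sum(sim * rating for sim, rating in neighbours) / total


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 5, "film_b": 5, "film_c": 1, "film_d": 5},
    "carol": {"film_a": 1, "film_b": 1, "film_c": 5, "film_d": 1},
}
prediction = predict(ratings, "alice", "film_d")
print(round(prediction, 2))
```

Alice’s tastes track Bob’s closely, so the prediction for `film_d` lands much nearer Bob’s 5 than Carol’s 1. With three users the estimate is crude; with a million, the pool of genuinely like-minded users becomes far more reliable, which is the scale argument above. Raw cosine similarity also scores Carol as somewhat similar to Alice despite her opposite tastes; centring each user’s ratings on their own mean is a common refinement.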
So, how can these tools be used to impact the public sector and improve society? “Well, the first is in this personalisation,” Stephens-Davidowitz remarked. “Amazon, Netflix, Pandora – their products are all personalised, based on analysing all of a user’s data. If you think of public sectors such as education or health, that really isn’t personalised. If students go into a school, they all basically get the same lesson. Little Johnny, little Sasha or little Seth are all going to get the same lesson plan.
“I think that’s going to change as data becomes more powerful and as the digital transformation continues. You’re going to see schools become more personalised. It’s already happening. Personalised learning is one of the biggest initiatives in the US government.”
This has the potential to seriously improve education, by tailoring each lesson to the data collected about each student. Some schools have already tested the approach. “They found out that when you add personalised learning, when you give each individual a different lesson plan based on all the data you’ve accumulated, their math scores and reading scores dramatically increase. It’s a really powerful tool,” commented Stephens-Davidowitz.
But the potential improvements extend far beyond education. Data from the Internal Revenue Service (IRS), the US tax agency, has been used to calculate social mobility in different cities and states across the country. “A poor person in San José has a 12.9% chance of getting rich,” said Stephens-Davidowitz, “versus a poor person in Charlotte, NC who has only a 4.4% chance.”
Using this data, you can draw a map of the United States based on upward mobility, from which you can identify the common factors in the areas with poor mobility and address them accordingly. Similarly, areas with good mobility can be assessed for their common traits to identify what initiatives should be promoted across the country.
Data: It’s Good for Your Health
Stephens-Davidowitz’s final example of how the public sector can harness big data was using it to measure behaviour across society more effectively. “My favourite example is influenza,” he said. “In the US, the government agency, the Centers for Disease Control and Prevention, uses doctors’ data to measure how many cases of flu there are every week.
“So, are a lot of people suffering from the flu or are a few people suffering from the flu? The problem is that it’s a very slow process. It takes more than a week to collate all this data, so it also takes more than a week to find out exactly how much flu there is, and ideally we’d have that data a lot sooner.”
He continued, “I said that people tend to tell Google and Yandex exactly what’s on their mind, and one of the things they do is conduct searches into their health condition. What Google data analysts have found is that if you analyse how many people are searching ‘runny nose’ or ‘cold’ or ‘fever’ or ‘flu’, you can very quickly figure out if there’s a rise in influenza. It only takes two seconds for a Google data scientist to do that.”
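One simple way to turn query counts into an early warning is to compare each week’s number of flu-related searches with the weeks just before it. The sketch below does that with a trailing mean and standard deviation. The search terms come from the quote above, but the function names, the four-week window, the two-standard-deviation threshold and the weekly counts are hypothetical choices, and this is not the model Google’s data scientists used.

```python
import statistics

# Terms taken from the quote above; matching on them is a simplifying assumption.
FLU_TERMS = ("runny nose", "cold", "fever", "flu")


def count_flu_queries(queries):
    """Count queries containing any flu-related term (naive substring matching)."""
    return sum(1 for q in queries if any(term in q.lower() for term in FLU_TERMS))


def flag_rises(weekly_counts, window=4, threshold=2.0):
    """Return indices of weeks whose count exceeds the trailing mean by `threshold` std devs."""
    flagged = []
    for week in range(window, len(weekly_counts)):
        history = weekly_counts[week - window:week]
        mean = statistics.mean(history)
        # Fall back to 1.0 so a perfectly flat history doesn't give a zero-width threshold.
        spread = statistics.stdev(history) or 1.0
        if weekly_counts[week] > mean + threshold * spread:
            flagged.append(week)
    return flagged


# Hypothetical sample queries and weekly counts of flu-related searches.
queries = ["Fever and runny nose", "weather tomorrow", "flu symptoms", "cheap flights"]
weekly_counts = [120, 130, 125, 118, 122, 127, 240, 310]
print(count_flu_queries(queries))  # 2
print(flag_rises(weekly_counts))  # [6, 7]
```

Here the sharp jump in weeks 6 and 7 is flagged as soon as each week’s count is in. The substring matching is deliberately naive: it would also count queries such as ‘cold brew’ or ‘fluent’, so a real system would need a more careful way of classifying queries.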
Being able to access this data in real time is valuable for the public sector. For Stephens-Davidowitz, the challenge now is uncovering all the ways in which Google’s ‘digital truth serum’ can benefit society. Search data can highlight many issues far more quickly and accurately than surveys, such as sexually transmitted diseases, depression, suicide and child abuse.
“To really know what’s going on in their towns, to really know how their citizens are doing, those in the public sector can analyse these huge data sets and instantaneously find out exactly what’s happening,” concluded Stephens-Davidowitz.
Immense public good can come from using search data in such positive ways. Of course, there is a flipside: the power that companies such as Google hold over this data. Stephens-Davidowitz responded to this by impressing on the audience the importance of regulation such as GDPR.
Citizens have to be as vigilant as their governments in ensuring their data isn’t being used nefariously, he said. For Stephens-Davidowitz, the future of data usage by the public sector and corporations is about the balance between embracing the good, and setting regulation to avoid the bad.
The potential for public sector entities to use Google to address societal issues is not restricted to the analysis of search engine data. There have already been promising research projects that use different Google products in tandem with independent platforms to tackle specific problems.
Obesity is one such issue, and researchers at the University of Washington have recently found a way to estimate its levels within US cities. To do this, they implemented an AI algorithm that analysed images from Google Street View, taking into account the placement of infrastructure such as parks, metro stations and even pet stores. The program could then estimate obesity levels within specific areas of a city.
Whether it could change people’s habits is another matter entirely.
Illustrations by Kseniya Forbender
To contact the editor responsible for this story:
Margarita Khartanovich