ON THE INTERNET, the personal data users give away for free is transformed into a precious commodity. The puppy photos people upload train machines to be smarter. The questions they ask Google uncover humanity’s deepest prejudices. And their location histories tell investors which stores attract the most shoppers. Even seemingly benign activities, like staying in and watching a movie, generate mountains of information, treasure to be scooped up later by businesses of all kinds.
Personal data is often compared to oil—it powers today’s most profitable corporations, just like fossil fuels energized those of the past. But the consumers it’s extracted from often know little about how much of their information is collected, who gets to look at it, and what it’s worth. Every day, hundreds of companies you may not even know exist gather facts about you, some more intimate than others. That information may then flow to academic researchers, hackers, law enforcement, and foreign nations—as well as plenty of companies trying to sell you stuff.
What Constitutes "Personal Data"?
The internet might seem like one big privacy nightmare, but don’t throw your smartphone out the window just yet. “Personal data” is a pretty vague umbrella term, and it helps to unpack exactly what it means. Health records, social security numbers, and banking details make up the most sensitive information stored online. Social media posts, location data, and search-engine queries may also be revealing but are also typically monetized in a way that, say, your credit card number is not. Other kinds of data collection fall into separate categories—ones that may surprise you. Did you know some companies are analyzing the unique way you tap and fumble with your smartphone?
All this information is collected on a wide spectrum of consent: Sometimes the data is forked over knowingly, while in other scenarios users might not understand they’re giving up anything at all. Often, it’s clear something is being collected, but the specifics are hidden from view or buried in hard-to-parse terms-of-service agreements.
Consider what happens when someone sends a vial of saliva to 23andme. The person knows they’re sharing their DNA with a genomics company, but they may not realize it will be resold to pharmaceutical firms. Many apps use your location to serve up custom advertisements, but they don’t necessarily make it clear that a hedge fund may also buy that location data to analyze which retail stores you frequent. Anyone who has witnessed the same shoe advertisement follow them around the web knows they’re being tracked, but fewer people likely understand that companies may be recording not just their clicks but also the exact movements of their mouse.
In each of these scenarios, the user received something in return for allowing a corporation to monetize their data. They got to learn about their genetic ancestry, use a mobile app, or browse the latest footwear trends from the comfort of their computer. This is the same sort of bargain Facebook and Google offer. Their core products, including Instagram, Messenger, Gmail, and Google Maps, don’t cost money. You pay with your personal data, which is used to target you with ads.
Who Buys, Sells, and Barters My Personal Data?
The trade-off between the data you give and the services you get may or may not be worth it, but another breed of business amasses, analyzes, and sells your information without giving you anything at all: data brokers. These firms compile info from publicly available sources like property records, marriage licenses, and court cases. They may also gather your medical records, browsing history, social media connections, and online purchases.
Depending on where you live, data brokers might even purchase your information from the Department of Motor Vehicles. Don’t have a driver’s license? Retail stores sell info to data brokers, too.
The information data brokers collect may be inaccurate or out of date. Still, it can be incredibly valuable to corporations, marketers, investors, and individuals. In fact, American companies alone are estimated to have spent over $19 billion in 2018 acquiring and analyzing consumer data, according to the Interactive Advertising Bureau.
Data brokers are also valuable resources for abusers and stalkers. Doxing, the practice of publicly releasing someone’s personal information without their consent, is often made possible because of data brokers. While you can delete your Facebook account relatively easily, getting these firms to remove your information is time-consuming, complicated, and sometimes impossible. In fact, the process is so burdensome that you can pay a service to do it on your behalf.
Amassing and selling your data like this is perfectly legal. While some states, including California and Vermont, have recently moved to put more restrictions on data brokers, they remain largely unregulated. The Fair Credit Reporting Act dictates how information collected for credit, employment, and insurance reasons may be used, but some data brokers have been caught skirting the law. In 2012 the “person lookup” site Spokeo settled with the FTC for $800,000 over charges that it violated the FCRA by advertising its products for purposes like job background checks. And data brokers that market themselves as being more akin to digital phone books don’t have to abide by the regulation in the first place.
There are also few laws governing how social media companies may collect data about their users. In the United States, no modern federal privacy regulation exists, and the government can even legally request digital data held by companies without a warrant in many circumstances (though the Supreme Court recently expanded Fourth Amendment protections to a narrow type of location data).
The good news is, the information you share online does contribute to the global store of useful knowledge: Researchers from a number of academic disciplines study social media posts and other user-generated data to learn more about humanity. In his book, Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, Seth Stephens-Davidowitz argues there are many scenarios where humans are more honest with sites like Google than they are on traditional surveys. For example, he says, fewer than 20 percent of people admit they watch porn, but there are more Google searches for “porn” than “weather.”
Personal data is also used by artificial intelligence researchers to train their automated programs. Every day, users around the globe upload billions of photos, videos, text posts, and audio clips to sites like YouTube, Facebook, Instagram, and Twitter. That media is then fed to machine learning algorithms, so they can learn to “see” what’s in a photograph or automatically determine whether a post violates Facebook’s hate-speech policy. Your selfies are literally making the robots smarter. Congratulations.
The History of Personal Data Collection
Humans have used technological devices to collect and process data about the world for thousands of years. Greek scientists developed the “first computer,” a complex gear system called the Antikythera mechanism, to trace astrological patterns as far back as 150 BC. Two millennia later, in the late 1880s, Herman Hollerith invented the tabulating machine, a punch card device that helped process data from the 1890 United States Census. Hollerith created a company to market his invention that later merged into what is now IBM.
By the 1960s, the US government was using powerful mainframe computers to store and process an enormous amount of data on nearly every American. Corporations also used the machines to analyze sensitive information including consumer purchasing habits. There were no laws dictating what kind of data they could collect. Worries over supercharged surveillance soon emerged, especially after the publication of Vance Packard’s 1964 book, The Naked Society, which argued that technological change was causing the unprecedented erosion of privacy.
The Future of Personal Data CollectionPersonal information is currently collected primarily through screens, when people use computers and smartphones. The coming years will bring the widespread adoption of new data-guzzling devices, like smart speakers, censor-embedded clothing, and wearable health monitors. Even those who refrain from using these devices will likely have their data gathered, by things like facial recognition-enabled surveillance cameras installed on street corners. In many ways, this future has already begun: Taylor Swift fans have had their face data collected, and Amazon Echos are listening in on millions of homes.
We haven’t decided, though, how to navigate this new data-filled reality. Should colleges be permitted to digitally tracktheir teenage applicants? Do we really want health insurance companies monitoring our Instagram posts? Governments, artists, academics, and citizens will think about these questions and plenty more.
And as scientists push the boundaries of what’s possible with artificial intelligence, we will also need to learn to make sense of personal data that isn’t even real, at least in that it didn’t come from humans. For example, algorithms are already generating “fake” data for other algorithms to train on. So-called deepfake technology allows propagandists and hoaxers to leverage social media photos to make videos depicting events that never happened. AI can now create millions of synthetic faces that don’t belong to anyone, altering the meaning of stolen identity. This fraudulent data could further distort social media and other parts of the internet. Imagine trying to discern whether a Tinder match or the person you followed on Instagram actually exists.
Whether data is fabricated by computers or created by real people, one of the biggest concerns will be how it is analyzed. It matters not just what information is collected but also what inferences and predictions are made based upon it. Personal data is used by algorithms to make incredibly important decisions, like whether someone should maintain their health care benefits, or be released on bail. Those decisions can easily be biased, and researchers and companies like Google are now working to make algorithms more transparent and fair.
Tech companies are also beginning to acknowledge that personal data collection needs to be regulated. Microsoft has called for the federal regulation of facial recognition, while Apple CEO Tim Cook has argued that the FTC should step in and create a clearinghouse where all data brokers need to register. But not all of Big Tech’s declarations may be in good faith. In the summer of 2018, California passed a strict privacy law that will go into effect on January 1, 2020, unless a federal law supersedes it. Companies like Amazon, Apple, Facebook, and Google are now pushing for Congress to pass new, less stringent privacy legislation in 2019 before the California law kicks in. Even in a divided Congress, lawmakers could come together around privacy—scrutinizing Big Tech has become an important issue for both sides.
Some companies and researchers argue it’s not enough for the government to simply protect personal data; consumers need to own their information and be compensated when it’s used. Social networks like Minds and Steemit have experimented with rewarding users with cryptocurrency when they share content or spend time using their platforms. Other companies will pay you in exchange for sharing data—your banking transactions, for instance—with them. But allowing people to take back ownership likely wouldn’t solve every privacy issue posed by personal data collection. It might also be the wrong way to frame the issue: Instead, perhaps, less collection should be permitted in the first place, forcing companies to move away from the targeted-advertising business model altogether.
Before we can figure out the future of personal data collection, we need to learn more about its present. The cascade of privacy scandals that have come to light in recent years—from Cambridge Analytica to Google’s shady location tracking practices—have demonstrated that users still don’t know all the ways their information is being sold, traded, and shared. Until consumers actually understand the ecosystem they’ve unwittingly become a part of, we won’t be able to grapple with it in the first place.