The privacy equation in artificial intelligence

Once the internet took off and technology became part of almost every activity, people began voluntarily trading their privacy for convenience. They did so for years. Now, individuals are finally questioning this exchange and paying more attention to their data privacy.

According to a study by Intouch International, 9 in 10 American internet users say they are concerned about the privacy and security of their personal information online, and 67% advocate for strict national privacy laws.

Considering the advent of artificial intelligence, whose successful application is entirely based on large amounts of data, the need to ensure individuals’ privacy is paramount.

Reading Hannah Fry’s book “Hello World: How to be human in the age of the machine”, it becomes clear that the information we give away, voluntarily or inadvertently, eventually ends up in large databases. In most cases, these databases are then exploited for any number of purposes, including marketing opportunities, purchasing recommendations, credit scoring, etc.

Even though we can argue that individuals technically give their consent by accepting the infamous ‘Terms and Conditions’, most user agreements are so lengthy that we may not even realize which privacy rights we are about to lose.

Sophia the Robot, 2017

How exactly is our privacy affected by AI?

Scoring and ratings

We see AI as the largest data collector and interpreter, yet we have never stopped to fully comprehend what this means. The information artificial intelligence analyzes is often used to classify, evaluate and rank individuals. Apart from the fact that this is usually done without users' consent, it can also lead to discrimination and missed opportunities. Take, for example, Sesame Credit, a citizen scoring system used by the Chinese government. When scoring becomes mandatory in 2020, people with low scores are expected to feel the repercussions in every aspect of their lives. Li Yingyun, the company’s technology director, stated that the details of the complex scoring algorithm will not be disclosed, but he did share some insights into how it works. For instance, someone who plays video games for a few hours a day is considered less reliable than someone whose shopping behavior, such as buying diapers frequently, suggests they are a parent.

Voice and facial recognition

Speech and facial recognition are two identification methods that AI systems are becoming increasingly adept at. The downside is that these methods can severely compromise anonymity in public spaces. To illustrate, consider a law enforcement agency that uses facial and voice recognition to find individuals while bypassing legal requirements.

A display shows a vehicle and person recognition system for law enforcement during the NVIDIA GPU Technology Conference, which showcases artificial intelligence, deep learning, virtual reality and autonomous machines (AFP Photo/SAUL LOEB)


Predictions

Obviously, using machine learning algorithms to predict certain outcomes could be of real help in many industries, including healthcare, justice, manufacturing, etc. Yet, in some situations, this could also significantly invade people’s privacy. Activity logs, location data, and other similar metrics are endless sources from which others can deduce a person’s political views, ethnic identity, sexual orientation, and even overall health.
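To make the point concrete, here is a minimal sketch of one such inference: guessing where someone lives from nothing but timestamped location pings. All cell IDs and log entries below are invented for illustration.

```python
from collections import Counter

# Hypothetical, simplified sketch: each log entry is (hour_of_day, cell_id).
# A common inference is that the cell a phone occupies most often at night
# is its owner's home -- one way seemingly neutral metadata reveals
# sensitive facts such as where (and with whom) someone lives.
def infer_home_cell(location_log):
    night = [cell for hour, cell in location_log if hour >= 22 or hour < 6]
    return Counter(night).most_common(1)[0][0] if night else None

log = [
    (23, "cell_17"), (2, "cell_17"), (5, "cell_17"),  # nights at cell_17
    (9, "cell_42"), (14, "cell_42"),                  # workdays at cell_42
    (1, "cell_17"),
]
print(infer_home_cell(log))  # -> cell_17
```

Real profiling systems are far more elaborate, but the principle is the same: the data need not contain the sensitive attribute for an algorithm to infer it.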

There are sophisticated ML algorithms that can predict someone’s predisposition to a certain disease, genetic or otherwise. Take the case of genomics and biotechnology company 23andMe, which provides a full report of an individual’s genetic traits in exchange for one of their most valuable assets: their DNA. In the end, the company holds a large amount of genetic data that it can use to train an ML model to predict outcomes of interest, or distribute further to data brokers.

Identification and tracking

We’ve already established that AI can be used to identify, track and monitor individuals across multiple devices, whether they are at work, at home, or in a public place. If you think you are safe because your data is anonymized, think again. This might come as a surprise to some, but once anonymized data becomes part of a large data set, machine learning can often re-identify it by drawing inferences from other devices and data sets.
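The simplest form of de-anonymization is a linkage attack: an “anonymized” record set still carries quasi-identifiers (ZIP code, birth date, sex) that can be joined against a public roster with names attached. A toy sketch, with entirely invented records, might look like this:

```python
# Hypothetical "anonymized" medical records: names removed, but
# quasi-identifiers (zip, dob, sex) left in place.
anonymized_visits = [
    {"zip": "02139", "dob": "1961-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "dob": "1975-03-12", "sex": "M", "diagnosis": "flu"},
]

# A public data set (e.g. a voter roll) that shares those quasi-identifiers
# and still has names attached.
public_roster = [
    {"name": "J. Doe",   "zip": "02139", "dob": "1961-07-31", "sex": "F"},
    {"name": "A. Smith", "zip": "02139", "dob": "1975-03-12", "sex": "M"},
]

def reidentify(visits, roster):
    """Join the two sets on (zip, dob, sex) to recover name -> diagnosis."""
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    by_key = {key(r): r["name"] for r in roster}
    return {by_key[key(v)]: v["diagnosis"] for v in visits if key(v) in by_key}

print(reidentify(anonymized_visits, public_roster))
# -> {'J. Doe': 'asthma', 'A. Smith': 'flu'}
```

No machine learning is even needed in this toy case; ML-based approaches extend the same idea by matching on statistical patterns (writing style, movement traces, ratings) rather than exact field values.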

What can policy-makers do?

In the last couple of years, we have witnessed a lot of progress in regulation related to privacy and data protection. In Europe, the General Data Protection Regulation, which came into effect in May 2018, brought European citizens much closer to being in charge of how their data is used. Similarly, in the US, the California Consumer Privacy Act is expected to come into effect in January 2020 and will grant Californians new privacy rights. Elsewhere in the US, experts, researchers and politicians are pushing for GDPR-like legislation to protect individuals’ data, especially in the age of the machines. For instance, in 2017 the Senate introduced a federal bill, the Future of Artificial Intelligence Act, aimed at protecting the privacy of individuals against potential abuses from AI.

Although it’s reassuring that regulators have finally understood the need for such legislation, this is only a small step towards fixing AI’s privacy problem. Because artificial intelligence requires access to large amounts of data, we still need to ensure that enough information is available to properly train a model. So how do we strike a balance between protecting individuals’ privacy and gathering the data our AI-based projects need?

According to Bernhard Debatin, an Ohio University professor and director of the Institute for Applied and Professional Ethics, good privacy legislation in the age of AI should include five components:

  • AI systems must be transparent.
  • Individuals must have a “deeply rooted” right to the information an AI collects about them.
  • Consumers must be able to opt out of the system.
  • The data collected and the purpose of the AI must be limited by design.
  • Data must be deleted upon consumer request.

Even though this might not guarantee that we’ll never face another privacy issue, at least it will protect us, to a certain extent, from potential AI-based discrimination, lack of consent and data abuse.

What can individuals do?

As with privacy and data protection outside the AI sphere, some of the responsibility lies with individuals themselves. The first thing we should do is become more aware of how, where and with whom we share our data. Second, we need to acknowledge that, in most cases, nothing comes for free, and we should understand what we are giving up in exchange for knowing how we’ll look when we’re older or whether we’re likely to develop a certain disease.

As usual, we leave you with an interesting talk about the broad privacy implications of data and artificial intelligence.