GDPR a challenge to AI black boxes
Most artificial intelligence “black boxes” do not comply with EU data protection laws and will have to be re-engineered, warns security researcher and consultant
Developers of machine learning systems fuelled by personal data need to comply with the EU’s General Data Protection Regulation (GDPR), says Alessandro Guarino, principal consultant at StudioAG.
The GDPR applies to all automated individual decision-making and profiling, which is currently the most common application of artificial intelligence (AI) and specifically machine learning algorithms, he told the EEMA ISSE 2018 cyber security conference in Brussels.
The GDPR defines profiling as any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.
However, Guarino said there was a problem because most of these machine learning decision-making systems were “black boxes” rather than old-style rule-based expert systems, and therefore failed to comply with the GDPR requirements of transparency, accountability and putting the data subject in control.
“Accountability is one of the main underlying principles of the GDPR, and that poses a very big problem for machine learning algorithms, especially for newer tools such as deep learning and automated feature extraction because we don’t know how the evaluation is being done or what features [data points] are being used,” he said.
Machine learning models are fuelled by data, and personal data in particular, said Guarino. This creates privacy risks that have to be addressed ethically, in a way that respects the privacy of the individual and does not amount to surveillance or discrimination.
Another problem, said Guarino, is that the GDPR applies to data controllers outside the EU that process the personal data of European citizens. In countries such as the US, however, the law does not give data subjects control of their data. Instead, data supplied to a company becomes that company's property, which means the company can reuse or resell it.
Risk-based approach
Developers and suppliers need to adopt a risk-based approach, he said, because the GDPR does not provide a compliance checklist. Instead, it requires data controllers to assess the risks to data subjects and to be accountable for how they handle personal data.
“We need to find a way to design and use machine learning algorithms in a way that is compliant with the GDPR, because they will generate value for both service providers and data subjects if done correctly,” said Guarino.
“The algorithms need to be accountable in some way, but it is not yet clear how this could be done, and research in this area is still ongoing. Machine learning processes cannot be treated as black boxes and will have to make it clear how they arrive at decisions,” he said.
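One way to picture the difference between a black box and a system that shows how it arrives at a decision is a simple linear scoring model, where each input's contribution to the score can be listed. The sketch below is illustrative only: the feature names, weights and bias are hypothetical and do not come from Guarino's talk, and a linear model is just one of several possible approaches.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for an illustrative linear scoring model.
WEIGHTS: Dict[str, float] = {
    "income": 0.5,
    "missed_payments": -2.0,
    "years_at_address": 0.25,
}
BIAS = 1.0


def score_with_explanation(
    features: Dict[str, float],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the score and each feature's contribution, largest effect first."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return score, ranked
```

Calling `score_with_explanation({"income": 4.0, "missed_payments": 2.0, "years_at_address": 2.0})` returns the score together with a ranked list showing which inputs pushed it up or down, which is the kind of record a deep learning model does not produce by default.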
To be compliant with the GDPR, Guarino said all products and services that use personal data should be designed from the ground up with privacy and data protection in mind. “The process will have to be documented and demonstrated to meet the requirements of transparency and accountability.”
The GDPR principle of data minimisation, said Guarino, is also a potential challenge to machine learning algorithms because any processing activity is required to process only the data needed for a specific purpose.
“But this is difficult to do in machine learning because the models require as much data as possible, so system developers will have to pay attention to data management from the start,” he said.
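As a rough illustration of what attention to data management could look like in practice, the sketch below filters each record down to the fields declared for a given purpose before it reaches a model. The purposes, field names and the `minimise` helper are hypothetical, chosen only to show the idea of data minimisation.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical mapping from each processing purpose to the fields it needs.
PURPOSE_FIELDS: Dict[str, FrozenSet[str]] = {
    "credit_scoring": frozenset({"income", "missed_payments"}),
    "delivery": frozenset({"name", "address"}),
}


def minimise(record: Dict[str, Any], purpose: str) -> Dict[str, Any]:
    """Keep only the fields declared for the purpose; reject undeclared purposes."""
    if purpose not in PURPOSE_FIELDS:
        raise ValueError(f"no declared field list for purpose: {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}
```

Keeping the purpose-to-fields mapping in one place also gives developers something concrete to document when they have to show which data was used and why.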
In general, the GDPR prohibits automated profiling unless one of three specific conditions is met. “The approach is very strict, and the most practical condition to meet is explicit consent from the individual concerned,” said Guarino.
“The other conditions are that a decision is necessary for the performance of a contract, or where the decision is authorised by EU or member state law applicable to the controller, but are more difficult to meet than explicit consent,” he said.
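The three conditions Guarino describes can be thought of as a gate that a system checks before it runs an automated decision. The sketch below is a simplified illustration under that assumption: the `Basis` enum and `automated_decision_allowed` function are hypothetical names, and deciding whether a condition is actually met is a legal question that the code does not answer.

```python
from enum import Enum
from typing import Iterable


class Basis(Enum):
    """The three conditions described in the article."""

    EXPLICIT_CONSENT = "explicit consent from the individual concerned"
    CONTRACT = "necessary for the performance of a contract"
    AUTHORISED_BY_LAW = "authorised by EU or member state law"


def automated_decision_allowed(recorded: Iterable[Basis]) -> bool:
    """Return True if at least one of the three conditions has been recorded."""
    return any(isinstance(basis, Basis) for basis in recorded)
```

With no recorded basis, for example `automated_decision_allowed([])`, the gate stays closed and the decision would have to be routed to a human or not made at all.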
GDPR compliance a challenge
The challenge, said Guarino, is to design and develop GDPR-compliant machine learning systems from the ground up, especially when they are predictive or used to support decisions about the data subject.
“The reality is that most existing applications of this kind will have to be re-engineered or re-designed according to the principles of privacy by design and accountability.
“This means developers will have to document everything they do in the design process, including data management and so on, with particular attention to demonstrating that the algorithms are fair or neutral and therefore not discriminatory,” he said.
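Guarino did not say how fairness should be demonstrated. One simple measure that developers sometimes use is the gap in approval rates between groups, often called demographic parity. The sketch below assumes a non-empty list of (group, approved) pairs. The function names are hypothetical, and this is only one of several fairness measures, not a test prescribed by the GDPR.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the share of positive decisions for each group."""
    totals: Dict[str, int] = defaultdict(int)
    approved: Dict[str, int] = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        approved[group] += int(outcome)
    return {group: approved[group] / totals[group] for group in totals}


def parity_gap(decisions: Iterable[Tuple[str, bool]]) -> float:
    """Return the difference between the highest and lowest group approval rate."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())
```

A gap close to zero means the groups are approved at similar rates. A large gap does not prove discrimination by itself, but it is the kind of result that would need to be investigated and documented.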
The requirement that algorithms are fair and transparent sets a higher bar than is applied to human decisions in many cases, said Guarino, which could skew the market towards non-adoption of innovative systems or adoption of “weakened AI”.
While machine learning systems can be designed to be GDPR compliant, he said it would not be straightforward and would require a lot of work.
“It implies a change in the way of thinking about big data and machine learning, and cross-disciplinary competences are needed because we need not only developers, but also domain experts as well as legal and data protection experts, so it will be interesting to see what the future will bring and how far the GDPR can reach.”