
How will the GDPR impact machine learning?


Answers to the three most commonly asked questions about maintaining GDPR-compliant machine learning programs.
Light structure (source: Pixabay)
Much has been made about the potential impact of the EU’s General Data Protection Regulation (GDPR) on data science programs. But there’s perhaps no more important—or uncertain—question than how the regulation will impact machine learning (ML), in particular. Given the recent advancements in ML, and given increasing investments in the field by global organizations, ML is fast becoming the future of enterprise data science.
This article aims to demystify this intersection between ML and the GDPR, focusing on the three biggest questions I’ve received at Immuta about maintaining GDPR-compliant data science and R&D programs. Granted, with an enforcement date of May 25, the GDPR has yet to come into full effect, and a good deal of what we do know about how it will be enforced is either vague or evolving (or both!). But key questions and key challenges have already started to emerge.

1. Does the GDPR prohibit machine learning?

The short answer to this question is that, in practice, ML will not be prohibited in the EU after the GDPR goes into effect. It will, however, involve a significant compliance burden, which I’ll address shortly.
Technically, and misleadingly, however, the answer to this question actually appears to be yes, at least at first blush. The GDPR, as a matter of law, does contain a blanket prohibition on the use of automated decision-making, so long as that decision-making occurs without human intervention and produces significant effects on data subjects. Importantly, the GDPR itself applies to all uses of EU data that could potentially identify a data subject—which, in any data science program using large volumes of data, means that the GDPR will apply to almost all activities (as study after study has illustrated the ability to identify individuals given enough data).
When the GDPR uses the term “automated decision-making,” the regulation is referring to any model that makes a decision without a human being directly involved in it. This could include anything from the automated “profiling” of a data subject, like bucketing them into specific groups such as “potential customer” or “40-50 year old males,” to determining whether a loan applicant is eligible for a loan.
As a result, one of the first major distinctions the GDPR makes about ML models is whether they are being deployed autonomously, without a human directly in the decision-making loop. If the answer is yes—as, in practice, will be the case in a huge number of ML models—then that use is likely prohibited by default. The Working Party 29, an official EU group involved in drafting and interpreting the GDPR, has said as much, despite the objections of many lawyers and data scientists (including yours truly).
So why is interpreting the GDPR as placing a ban on ML so misleading?
Because there are significant exceptions to the prohibition on the autonomous use of ML—meaning that “prohibition” is far too strong a word. Once the GDPR goes into effect, data scientists should expect most applications of ML to remain achievable—just with a compliance burden they won’t be able to ignore.
Now, a bit more detail on the exceptions to the prohibition.
The regulation identifies three areas where the use of autonomous decisions is legal: where the processing is necessary for contractual reasons, where it’s separately authorized by another law, or when the data subject has explicitly consented.
In practice, it’s that last basis—when a data subject has explicitly allowed their data to be used by a model—that’s likely to be a common way around this prohibition. Managing user consent is not easy, however. Users can consent to many different types of data processing, and they can also withdraw that consent at any time, meaning that consent management needs to be granular (allowing many different forms of consent), dynamic (allowing consent to be withdrawn), and user-friendly enough that data subjects are actually empowered to understand how their data is being used and to assert control over that use.
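To make that concrete, here is a minimal sketch of what granular, dynamic consent tracking could look like. The purposes, class names, and fields are illustrative assumptions on my part, not drawn from the GDPR or from any particular compliance library:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical processing purposes a data subject can consent to individually ("granular").
PURPOSES = {"marketing_model", "credit_scoring", "fraud_detection"}

@dataclass
class ConsentRecord:
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

@dataclass
class SubjectConsent:
    subject_id: str
    # One record per purpose, so consent can be granted or withdrawn per use.
    records: Dict[str, ConsentRecord] = field(default_factory=dict)

    def grant(self, purpose: str) -> None:
        assert purpose in PURPOSES, f"unknown purpose: {purpose}"
        self.records[purpose] = ConsentRecord(granted_at=datetime.now(timezone.utc))

    def withdraw(self, purpose: str) -> None:
        # "Dynamic": withdrawal can happen at any time and must be honored going forward.
        if purpose in self.records:
            self.records[purpose].withdrawn_at = datetime.now(timezone.utc)

    def may_process(self, purpose: str) -> bool:
        record = self.records.get(purpose)
        return record is not None and record.withdrawn_at is None

# Usage: check consent at processing time, every time, not once at collection.
consent = SubjectConsent(subject_id="subject-123")
consent.grant("credit_scoring")
assert consent.may_process("credit_scoring")
consent.withdraw("credit_scoring")
assert not consent.may_process("credit_scoring")
```

The design point is that may_process is consulted each time the data is about to be used, rather than treating consent as a one-off checkbox captured when the data was collected.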
So, does the GDPR really prohibit the use of ML models? Not completely, but it will, in many of ML’s most powerful use cases, make the deployment and management of these models and their input data increasingly difficult.

2. Is there a “right to explainability” from ML?

This is one of the most common questions I receive about the GDPR, so much so that I wrote an entire article devoted to the subject last year. This question arises from the text of the GDPR itself, which has created a significant amount of confusion. And the stakes for this question are incredibly high. The existence of a potential right to explainability could have huge consequences for enterprise data science, as much of the predictive power of ML models lies in complexity that’s difficult, if not impossible, to explain.
Let’s start with the text.
In Articles 13-15 of the regulation, the GDPR states repeatedly that data subjects have a right to “meaningful information about the logic involved” and to “the significance and the envisaged consequences” of automated decision-making. Then, in Article 22 of the regulation, the GDPR states that data subjects have the right not to be subject to such decisions when they’d have the type of impact described above. Lastly, Recital 71, which is part of the non-binding commentary included in the regulation, states that data subjects are entitled to an explanation of automated decisions after they are made, in addition to being able to challenge those decisions. Taken together, these three provisions create a host of new and complex obligations between data subjects and the models processing their data, suggesting a pretty strong right to explainability.
While it is possible, in theory, that EU regulators could interpret these provisions in the most stringent way—and assert that some of the most powerful uses of ML will require a full explanation of the model’s inner workings—this outcome seems implausible.
What’s more likely is that EU regulators will read these provisions as suggesting that when ML is used to make decisions without human intervention, and when those decisions significantly impact data subjects, those individuals are entitled to some basic form of information about what is occurring. What the GDPR calls “meaningful information” and “envisaged consequences” will likely be read within this context. EU regulators are likely to focus on a data subject’s ability to make informed decisions about the use of their data—basically, the level of transparency available to the data subject—based on information about the model and the context within which it’s deployed.
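As an illustration of what that basic level of information could look like, here is a hedged sketch built around a toy scikit-learn logistic regression for a loan decision. The features, training data, and the idea of surfacing the largest per-feature contributions are my own assumptions about one way to convey “meaningful information,” not a disclosure format prescribed by the GDPR or by regulators:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy loan-approval data; features and labels are invented for illustration only.
feature_names = ["income_thousands", "debt_ratio", "years_employed"]
X = np.array([[55.0, 0.30, 4.0],
              [23.0, 0.65, 1.0],
              [80.0, 0.20, 10.0],
              [30.0, 0.55, 2.0]])
y = np.array([1, 0, 1, 0])  # 1 = loan approved in the historical data

model = LogisticRegression().fit(X, y)

def explain_decision(x: np.ndarray) -> dict:
    """Return a basic, subject-facing summary of one automated decision."""
    proba = float(model.predict_proba(x.reshape(1, -1))[0, 1])
    # For a linear model, coefficient * feature value is a rough per-feature contribution.
    contributions = dict(zip(feature_names, model.coef_[0] * x))
    top_factors = sorted(contributions, key=lambda k: abs(contributions[k]), reverse=True)
    return {
        "decision": "approved" if proba >= 0.5 else "declined",
        "estimated_probability": round(proba, 2),
        "main_factors": top_factors[:2],  # "meaningful information about the logic involved"
        "consequence": "determines whether a loan offer is made",  # "envisaged consequences"
    }

print(explain_decision(np.array([40.0, 0.50, 3.0])))
```

For non-linear models a coefficient-times-value breakdown would not apply and some model-agnostic attribution method would be needed instead; the point of the sketch is the shape of the disclosure, not the attribution technique.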

3. Do data subjects have the ability to demand that models be retrained without their data?

This is perhaps one of the most difficult questions to answer about the impact of the GDPR on ML. Put another way: if a data scientist uses a data subject’s data to train a model, and then deploys that model against new data, does the data subject have any right over the model their data originally helped to train?
As best as I can tell, the answer is going to be no, at least in practice—with a very theoretical exception. To understand why, I’ll start with the exception.

Under the GDPR, all uses of data require a legal basis for processing, and Article 6 of the regulation sets forth six such bases. The two most important are likely to be the “legitimate interest” basis (where the interests of the organization justify specific uses of the data, which might cover a use like fraud prevention) and the consent basis (where the user has explicitly consented to the use of their data). When the legal basis for the processing is consent, the data subject will retain a significant degree of control over that data: they can withdraw consent at any time, at which point the legal basis for processing that data no longer exists.
So, if an organization collects data from a data subject who consents to having their data used to train a particular model, and that data subject later withdraws the consent, when could they force the model to be retrained on new data?
The answer is: only if that model continued to use that user’s data. As the Working Party 29 has specified, even after consent is withdrawn, all processing that occurred before the withdrawal remains legal. So, if the data was legally used to create a model or a prediction, whatever that data gave rise to may be retained. In practice, once a model is created with a set of training data, that training data can be deleted or modified without affecting the model.
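A short sketch makes the point: once a model is fitted, its learned parameters live in the model object, so the raw training records can be deleted without affecting its ability to score new data. The data and model choice here are arbitrary stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
true_weights = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_train = X_train @ true_weights + rng.normal(scale=0.1, size=1000)

model = Ridge().fit(X_train, y_train)

# Simulate honoring a deletion request for the raw training records.
del X_train, y_train

# The fitted weights persist inside the model object, independent of the deleted records.
X_new = rng.normal(size=(3, 5))
print(model.predict(X_new))
print(model.coef_)
```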
Technically, however, some research suggests that models may retain information about their training data in ways that could allow the original data to be rediscovered even after it has been deleted, as researchers Nicolas Papernot and others have written about extensively. This means that, in some circumstances, deleting the training data without retraining the model is no guarantee that the training data cannot be rediscovered, or that the original data isn’t, at least in some sense, still being used.
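For a rough sense of what that research probes, the sketch below runs a crude confidence-threshold membership test on synthetic data: an overfit model tends to be noticeably more confident on records it was trained on, which can leak the fact that a record was in the training set even after the records themselves are deleted. This is a deliberately simplified illustration in the spirit of the membership-inference literature, not the specific attacks Papernot and colleagues describe:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=1)

# A deep random forest memorizes much of its training set, i.e., it overfits.
model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_in, y_in)

def looks_like_training_member(x: np.ndarray, label: int, threshold: float = 0.95) -> bool:
    # Guess "member" when the model is unusually confident in the record's true label.
    confidence = model.predict_proba(x.reshape(1, -1))[0, label]
    return confidence >= threshold

in_rate = np.mean([looks_like_training_member(x, l) for x, l in zip(X_in[:200], y_in[:200])])
out_rate = np.mean([looks_like_training_member(x, l) for x, l in zip(X_out[:200], y_out[:200])])
print(f"flagged as training members: train={in_rate:.2f}, held-out={out_rate:.2f}")
```

The gap between the two rates is the information leak; a model trained with strong regularization or with differential privacy would typically narrow that gap.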
But how likely is it that training data will be rediscovered through a model? Pretty unlikely.
To my knowledge, rediscovery of this sort has only been demonstrated in academic environments that are pretty far removed from the everyday realities of enterprise data science. It’s for this reason that I don’t expect the GDPR to subject models to constant demands to be retrained on new data. Though this is theoretically a possibility, it seems to be an edge case that regulators and data scientists will only have to address if this type of rediscovery becomes more realistic.
All that said, there’s a huge amount of nuance to all these questions—and future nuances will surely arise. With 99 Articles and 173 Recitals, the GDPR is long, complex, and likely to get more complex over time as its many provisions are enforced.
At this point, however, at least one thing is clear: thanks to the GDPR, lawyers and privacy engineers are going to be a central component of large-scale data science programs in the future.
