You may not have noticed, but two of the world’s most popular machine learning frameworks — TensorFlow and PyTorch — have taken steps in recent months toward privacy with solutions that incorporate federated learning.
Instead of gathering user data in the cloud to train models, federated learning trains AI models on mobile devices in large batches, then transfers those learnings back to a global model without the need for data to leave the device.
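To make the idea concrete, here is a minimal sketch of one common way to combine on-device training: the server averages the locally trained weights, weighted by how much data each device holds. It is an illustration under stated assumptions, not TensorFlow Federated or PySyft code. The toy linear model, the synthetic data, and the names `local_update`, `federated_round`, and `make_device_data` are all hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train a toy model y = w*x + b on one device's private data (assumed model)."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_round(global_weights, device_datasets):
    """One round: every device trains locally, then the server averages the weights."""
    updates = [local_update(global_weights, d) for d in device_datasets]
    total = sum(len(d) for d in device_datasets)
    # Weight each device's result by its number of examples.
    w = sum(u[0] * len(d) for u, d in zip(updates, device_datasets)) / total
    b = sum(u[1] * len(d) for u, d in zip(updates, device_datasets)) / total
    return (w, b)

def make_device_data(rng, n=20):
    """Synthetic private data for one device, drawn from y = 2x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]

rng = random.Random(0)
devices = [make_device_data(rng) for _ in range(5)]

weights = (0.0, 0.0)
for _ in range(50):
    # Only weights travel between the devices and the server; raw (x, y) pairs never do.
    weights = federated_round(weights, devices)
```

In this sketch the server only ever receives model weights, which is the property the technique is built around.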
As part of the latest release of Facebook’s popular deep learning framework PyTorch last month, the company’s AI Research group rolled out Secure and Private AI, a free two-month Udacity course on the use of methods like encrypted computation, differential privacy, and federated learning. The first course began last week and is being taught by Andrew Trask, a senior research scientist at Google’s DeepMind. He’s also the leader of OpenMined, a privacy-focused open source AI community that in March released PySyft to bring PyTorch and federated learning together.
“It’s not just Facebook, I think the [AI] field in general is looking at this direction pretty seriously,” PyTorch creator Soumith Chintala told VentureBeat in an interview. “Yeah, I think you will absolutely see more effort, more direction, [and] more packages, both in terms of PyTorch and others, coming in this direction for sure.”
As privacy becomes a selling point, federated learning is poised to grow in popularity among both tech giants and industries where privacy protection is required, like health care.
Building privacy into AI
Google AI researchers first introduced federated learning in 2017, and since then the paper has been cited more than 300 times by research scientists, according to arXiv. In March, Google released TensorFlow Federated to make federated learning easier to perform with its popular machine learning framework.
At the Google I/O conference in May 2019, CEO Sundar Pichai talked about federated learning as part of his pitch to the world that Google is serious about privacy for all, alongside features like Incognito Mode in Google Maps and using your Android phone as a security key for two-step verification. Speed improvements from on-device machine learning will also make Google Assistant up to 10 times faster in the coming months.
Back in 2017, Gboard, the Android device keyboard, began to use federated learning to learn new words from users and predict the next word or emoji to use.
“It’s still very early, but we are excited about the progress and the potential of federated learning across many more of our products,” Pichai said onstage during the 2019 keynote address.
Beyond giving Android users a smarter keyboard, Google is exploring the use of federated learning to improve security, Google head of account security Mark Risher told VentureBeat AI staff writer Kyle Wiggers in a recent phone interview. Federated learning could let malicious third parties test against on-device anti-phishing security models, so it’s not a great fit for security yet, but Google is working toward that goal, Risher said.
Federated learning still faces challenges, though, including an inability to inspect training examples, bandwidth constraints, the need for a Wi-Fi connection, and the need for labels to be naturally inferred from user interactions.
Why federated learning improves privacy
Updates sent from devices can still contain some personal data or reveal something about a person, so differential privacy is used to add Gaussian noise to the data shared by devices, Google AI researcher Brendan McMahan said in a 2018 presentation.
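A rough sketch of what adding Gaussian noise to an update can look like follows. It assumes the update is first clipped to a maximum norm so that no single device can dominate, a step that is common in differential privacy work but is not described in the presentation. The function names and the `noise_multiplier` parameter are hypothetical, and the sketch does not compute a formal privacy guarantee.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm (assumed step)."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

rng = random.Random(42)
raw_update = [3.0, 4.0]  # L2 norm of 5.0
noisy_update = privatize(raw_update, max_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Any one noisy update says little about the device that sent it, but the noise tends to cancel when many updates are averaged, which is why the approach pairs naturally with aggregation.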
Distributing model training and predictions to devices instead of sharing data in the cloud also saves battery and bandwidth, since you would have to download the model on Wi-Fi, he said.
Use of federated learning, for example, led to a 50x decrease in the number of rounds of communication necessary to get a reasonably accurate CIFAR convolutional neural net for computer vision.
Looking at things in the aggregate means the server doesn’t need very much data from devices, McMahan said.
“In fact, all the server really needs to know is the average of the updates or the sum of those updates. It doesn’t care about any individual update,” he said in the presentation. “Wouldn’t it be great if Google could not see those individual updates and only got that aggregate?”
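One way to picture a server that learns only the sum is pairwise masking: each pair of devices agrees on a random mask that one adds and the other subtracts, so the masks cancel when everything is added up. The toy sketch below illustrates that idea only. It is not Google's protocol, it assumes updates have already been encoded as integers, and it ignores key exchange and devices that drop out. All names in it are hypothetical.

```python
import random

MODULUS = 2**32  # arithmetic is done modulo this value so masks cancel exactly

def masked_updates(updates, rng):
    """For each pair (i, j), device i adds a shared random mask and device j subtracts it."""
    n = len(updates)
    dim = len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS
    return masked

def server_sum(masked):
    """The server adds up what it receives; the pairwise masks cancel in the total."""
    dim = len(masked[0])
    return [sum(m[k] for m in masked) % MODULUS for k in range(dim)]

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = masked_updates(updates, random.Random(0))
total = server_sum(masked)  # equals the element-wise sum of the original updates
```

Each masked vector looks random on its own, yet the total matches the true sum, so the server could compute an average without reading any individual contribution.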
McMahan was coauthor of the influential 2017 research paper introducing federated learning to the world. A team of Google AI researchers including McMahan and Ian Goodfellow also authored a heavily cited 2016 paper titled “Deep Learning with Differential Privacy.” Goodfellow left Google in 2019 to be director of a machine learning special projects group at Apple.
In 2016, a year before Google introduced federated learning and differential privacy for Gboard, Apple did the same for QuickType and emoji suggestions in iOS 10.
Applications for protected data
Federated learning’s ability to mask data has led to exploration of its applications in industries like health care. The technique is powering a platform from Owkin, a company backed by GV. The platform helps medical professionals conduct tests and experiments to predict disease evolution and drug toxicity. In recent months, AI researchers from Harvard University, MIT’s CSAIL, and Tsinghua University’s Academy of Arts and Design devised a method to analyze electronic medical records with federated learning.
Training models with encrypted or protected data isn’t an altogether new thing. For example, Microsoft AI researchers applied neural networks to encrypted data for its CryptoNets model back in 2016.
However, federated learning and approaches that deliver machine intelligence without collection of raw data will likely grow in popularity as people care more about privacy and more device manufacturers turn to on-device machine learning.