Sadly, none of us could attend the NeurIPS conference this year. Hopefully, this will change in 2020. Nevertheless, it is always worth checking out the topics covered, and a few instantly caught my attention.
Have you ever enjoyed waking up several Siris at once by yelling "Hey Siri" while running through your favorite coworking space?
If not, you should hurry. Starting with iOS 13, this will become a prank of the past.
Privacy protection vs. service quality
For many use cases, privacy protection is not only required by law; it is also a selling point you shouldn't ignore.
On the other hand, services based on machine learning algorithms, like Apple's Siri, need plenty of example data to improve over time.
Federated learning keeps your data on your phone
Julien Freudiger, Apple's head of privacy, showed at the NeurIPS conference how Apple is using federated learning to recognize your voice without compromising your privacy.
Federated learning is a machine learning method introduced by Google in 2017. If you would like to dive deeper into the topic, I recommend starting with the talk given by Emily Glanz and Daniel Ramage at Google I/O '19: "Federated Learning: Machine Learning on Decentralized Data".
The approach uses the locally available audio data to train a local copy of the speaker recognition model. The updated models of all Siri users who have opted in are then sent back to a central server, where they are combined into a master model.
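To make the idea more concrete, here is a minimal sketch of federated averaging in plain NumPy. Everything in it is illustrative: the toy linear model, the three simulated devices, and the learning rate are my assumptions, not Apple's actual setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_update(weights, local_data, lr=0.1):
    """One gradient step on a toy linear model, computed only from
    the data that lives on this device. The raw data never leaves."""
    x, y = local_data
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

# Three simulated devices, each holding its own private (x, y) samples.
devices = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
weights = np.zeros(2)  # the shared master model

for _ in range(10):
    # Each opted-in device trains locally and sends back its weights only.
    updates = [local_update(weights, data) for data in devices]
    # The central server averages the updates into a new master model.
    weights = np.mean(updates, axis=0)

print("master model after 10 rounds:", weights)
```

The key point is in the loop: the server only ever sees model weights, never the `(x, y)` samples themselves.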
With this approach, your raw audio data never leaves your iOS devices. However, it has been shown that, with enough effort, private information can still be recovered. This is where "Differential Privacy" comes in.
Combining Differential Privacy with Federated Learning
Apple adds an additional layer of protection by injecting a small amount of noise into the original data, which makes it much more difficult to reverse-engineer the original information.
"Adding noise" sounds easy, but one has to keep in mind that the information is changed to protect the privacy, but we still want it to be very accurate so that we can train models with it.
If you are interested in how this magic happens, I can highly recommend the talk "The Definition of Differential Privacy" by Cynthia Dwork.
How to use it yourself
If you are interested in this technology, I would recommend heading over to the TensorFlow Federated documentation. Of course, dropping me a message would also work ;).
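To give you a head start, here is roughly what a federated training loop looks like with TensorFlow Federated, following its tutorials. The API has changed between releases, so treat this as a sketch and check the current docs before copying it.

```python
import tensorflow as tf
import tensorflow_federated as tff

# EMNIST comes pre-partitioned by writer, so each "client" acts as one device.
emnist_train, _ = tff.simulation.datasets.emnist.load_data()

def preprocess(ds):
    def fmt(element):
        return (tf.reshape(element["pixels"], [-1, 784]),
                tf.reshape(element["label"], [-1, 1]))
    return ds.batch(20).map(fmt)

federated_data = [
    preprocess(emnist_train.create_tf_dataset_for_client(c))
    for c in emnist_train.client_ids[:3]
]

def model_fn():
    # A deliberately tiny model; the point is the training loop, not accuracy.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
    ])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=federated_data[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
)
state = process.initialize()
for _ in range(5):
    # Each round: clients train locally, the server averages their updates.
    state, metrics = process.next(state, federated_data)
    print(metrics)
```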
In case you need a management-friendly version, I recommend this high-level summary by the Google AI team, although your management should have a sense of humor :). If they don't, you should probably switch your management, a.k.a. your job, anyway.
A matter of trust
I find it very promising that the tech giants are working on data privacy tools, and I am grateful that we can use this technology in our projects.
But in the end, it is still a matter of trust whether they actually use the described techniques or take the easy path and collect everything.
Photo by Przemyslaw Marczynski on Unsplash