The Intersection of Computation and Society: Privacy and Fairness Challenges
Exploring the impact of computational algorithms on society, particularly in terms of privacy, fairness, and policy considerations. Collaboration between computer science and other disciplines is highlighted, along with the importance of addressing issues such as data privacy, AI ethics, and digital footprints.
Computation and Society: The Case of Privacy and Fairness
Omer Reingold, Stanford CS, April 2017
Collaborators: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Aaron Roth, Guy Rothblum, Salil Vadhan, Rich Zemel
CS and Other Disciplines
First: "Tell me what you do again?" (a.k.a. "I have that problem with my modem.") Then: "We have tons of data; do you have a clever algorithm for us?" Now: the power of the computational lens. Various natural and social phenomena can be viewed as computations (recent example: "A Lizard With Scales That Behave Like a Computer Simulation," NYT, 4/12/17, reporting on a Nature article). The age of collaboration!
Computation and Society
With algorithms and data now central, more and more policy questions revolve around computation. Here: tradeoffs between privacy, fairness, and economic utility. Other examples: censorship vs. free speech on social platforms; filtering of news (the filter bubble); identifying fake news; net neutrality; national security vs. individual freedoms (the San Bernardino cell phone case); loss of jobs due to automation; fear of AI. CS can inform public debate, and it can also extend the range of solutions.
Sensitive Information
Digital footprint: browsing history, social network interactions, location, emails, pictures, physical-activity levels, food consumption.
Privacy vs. Secrecy
Private analysis/learning from a corpus of data: what can crypto do for us? Encryption, computation on encrypted data, secure function evaluation. This gives secrecy rather than privacy. Privacy asks what is safe to compute and share; crypto (secure function evaluation) asks how to compute it. Crypto is invaluable when the data curator is untrusted (or the data is distributed). There are lots of good research questions here, including many on the crypto side, but those are for another talk.
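To make the secrecy/privacy distinction concrete, here is a minimal Python sketch of one classic secure-computation building block: additive secret sharing, used to compute a sum of private inputs. It is an illustration and not from the talk. The sketch simulates the parties in a single process and assumes they are honest but curious. The modulus and the names `share` and `secure_sum` are illustrative choices.

No party sees another's input, which is secrecy. The released sum, however, is still information: with only two parties, each can subtract its own input and learn the other's exactly. Deciding whether that output is safe to release is the privacy question.

```python
import secrets

# Illustrative prime modulus; all shares live in the integers mod MODULUS.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their inputs."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    dealt = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received. For n >= 2, each partial
    # sum on its own is uniformly random and says nothing about any input.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing all partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


salaries = [52_000, 61_000, 48_500]  # hypothetical private inputs
print(secure_sum(salaries))  # 161500
```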
Notions of Privacy - Anonymization
The outcome of a learning algorithm may leak sensitive data. Traditional approaches (with some legal protections): anonymization, de-identification, k-anonymity. The President's Council of Advisors on Science and Technology, in its report to the President on big data and privacy: "Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating." Industry reaction: nah.
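k-anonymity, one of the traditional notions named on this slide, requires every record to be indistinguishable from at least k - 1 others on its quasi-identifiers. Quasi-identifiers are attributes, such as ZIP code or age, that could be linked to outside data. The minimal Python check below uses hypothetical field names and records.

Passing the check says nothing about linkage through attributes left out of the chosen quasi-identifier set. That gap is the kind of weakness the PCAST report warns about.

```python
from collections import Counter


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if each combination of quasi-identifier values occurs in at least k records."""
    group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


# Hypothetical generalized records: ZIP codes truncated, ages bucketed.
records = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "943**", "age": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], k=2))  # False: the 943** record is unique
print(is_k_anonymous(records[:2], ["zip", "age"], k=2))  # True
```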
Notions of Privacy - DP
The outcome of a learning algorithm may leak sensitive data. A recent (decade-old) notion: Differential Privacy (DP) [Dwork, McSherry, Nissim, Smith]. Differential privacy, loosely: your increased harm from being in the corpus is small. One motivation: it encourages opt-in. Incredible impact on various disciplines as well as industry (Google, Apple, startups, ...). Lots of variants: distributional DP, pan-privacy, ... Lots of good questions, for another talk.
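The loose statement on this slide has a standard formal version. A randomized algorithm M is ε-differentially private if, for every pair of databases D, D' that differ in one person's record and every set of outcomes S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S].

The simplest mechanism that meets this for a counting query is the Laplace mechanism. One record changes a count by at most 1, so adding Laplace noise of scale 1/ε suffices. The Python sketch below assumes that setting. The function names and the smoking records are hypothetical, and the density check only tests the bound on a grid of output values.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """epsilon-DP count: a counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


def laplace_density(x: float, center: float, scale: float) -> float:
    return math.exp(-abs(x - center) / scale) / (2 * scale)


# Neighboring databases give true counts c and c + 1. At every output x the
# ratio of the two output densities is at most e^epsilon: the DP guarantee.
epsilon, c = 0.5, 40
worst_ratio = max(
    laplace_density(x / 10, c, 1 / epsilon) / laplace_density(x / 10, c + 1, 1 / epsilon)
    for x in range(300, 500)
)
print(worst_ratio <= math.exp(epsilon) + 1e-9)  # True

rng = random.Random(0)
smokers = [{"smokes": i % 3 == 0} for i in range(120)]  # hypothetical records, 40 smokers
print(private_count(smokers, lambda r: r["smokes"], epsilon, rng))
```

The noisy count changes from run to run. Its expectation is the true count, and its spread grows as ε shrinks.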
DP via Expectation of Privacy
A study on the connection between smoking and cancer compromises the privacy of smokers (even with DP). There is no single definition; we need to incorporate social choice. What is a reasonable expectation of privacy? Assume I only want to protect Alice: allow Alice to erase herself and a few others from the database. DP provides similar protection simultaneously to everyone. Any different protection for individuals implies a different variant of DP. This gives a way to interface between policy-makers and privacy experts.
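The phrase "allow Alice to erase herself and a few others" can be made precise with a standard consequence of the DP definition, group privacy. Suppose D and D' differ in k records. Connect them by a chain D = D_0, D_1, ..., D_k = D' in which consecutive databases differ in one record, and apply the single-record ε-DP guarantee once per step:

```latex
\Pr[M(D_0) \in S] \le e^{\varepsilon}\,\Pr[M(D_1) \in S] \le e^{2\varepsilon}\,\Pr[M(D_2) \in S] \le \cdots \le e^{k\varepsilon}\,\Pr[M(D_k) \in S]
```

The protection degrades gracefully, from e^ε to e^{kε}, as the group Alice may erase grows.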
Classification
Advertising, health care, taxation, financial aid.
Privacy and Classifiers
Privacy-preserving classifiers (observable outcomes of classification): Alice sees a particular ad; Alice clicks on the ad. What information is leaked about Alice? This is a more challenging scenario, still missing even a good definition.
Apply Classifier on a Coarse Noisy Version?
Influenced by our definition of fairness (later). If the coarse version doesn't distinguish between possible Omers, then sensitive properties may be protected??
Good Definition?
Not as strong as crypto definitions or even DP: information is leaked. Protection: blend me in with the (surrounding) crowd. If your surroundings are normative, this may imply meaningful protection (and substantiate users' currently unjustified sense of security). Lots of possible failings (as with k-anonymity). Only as strong as the similarity metric.
Fairness in Classification
Advertising, health care, taxation, financial aid.
Concern: Discrimination
The population includes minorities: ethnic, religious, medical, geographic, protected by law, policy, and ethics. A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy. Discrimination may be subtle!
Credit Application (WSJ 8/4/10)
User visits capitalone.com. Capital One uses tracking information provided by the tracking network [x+1] to personalize offers. Concern: steering minorities into higher rates (illegal)*
A CS Perspective
Suggested: an individual-based notion of fairness, "fairness through awareness." A versatile framework for obtaining and understanding fairness (including fair affirmative action). Fairness vs. privacy: privacy does not imply fairness, but its definitions and techniques are useful.
Fairness through Blindness
Ignore all irrelevant/protected attributes (e.g., Facebook's "sex" and "interested in men/women" fields). Point of failure: redundant encodings. Machine learning: you don't need to see the label to be able to predict it. E.g., redlining.
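The redundant-encoding failure shows up in a tiny synthetic example; all data and names below are hypothetical. The classifier never reads the protected attribute. A neighborhood feature is strongly correlated with that attribute, though, so a rule on the neighborhood alone still gives very different outcomes by group, much as redlining did.

```python
# Hypothetical applicants as (group, neighborhood) pairs; the classifier never sees `group`.
applicants = (
    [("S", "zip_1")] * 90 + [("S", "zip_2")] * 10
    + [("T", "zip_1")] * 10 + [("T", "zip_2")] * 90
)


def blind_classifier(neighborhood: str) -> bool:
    # Hypothetical rule, e.g. learned from past data: approve only zip_2.
    return neighborhood == "zip_2"


def approval_rate(group: str) -> float:
    decisions = [blind_classifier(z) for g, z in applicants if g == group]
    return sum(decisions) / len(decisions)


# The neighborhood alone predicts group membership for 90% of applicants.
proxy_accuracy = sum((z == "zip_1") == (g == "S") for g, z in applicants) / len(applicants)

print(approval_rate("S"), approval_rate("T"))  # 0.1 0.9
print(proxy_accuracy)  # 0.9
```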
Group Fairness (Statistical Parity)
Equalize minority S with general population T at the level of outcomes:
Pr[outcome o | S] = Pr[outcome o | T]
Insufficient as a notion of fairness: it has some merit, but can be abused. Example: advertise a burger joint to carnivores in T and vegans in S. Example: self-fulfilling prophecy. Example: multiculturalism.
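The burger-joint example can be checked directly. In the hypothetical population below, half of each group are carnivores, and T is taken to be the non-minority members for simplicity. Showing the ad to carnivores in T and vegans in S satisfies statistical parity exactly, yet no carnivore in S ever sees the ad.

```python
from dataclasses import dataclass


@dataclass
class Person:
    group: str  # "S" (minority) or "T" (everyone else, a simplifying assumption)
    carnivore: bool


# Hypothetical population: 50 carnivores and 50 vegans in each group.
population = [Person(g, c) for g in ("S", "T") for c in (True, False) for _ in range(50)]


def show_burger_ad(p: Person) -> bool:
    # The abusive targeting from the slide: carnivores in T, vegans in S.
    return p.carnivore if p.group == "T" else not p.carnivore


def rate(people: list[Person]) -> float:
    """Fraction of `people` who are shown the ad."""
    return sum(show_burger_ad(p) for p in people) / len(people)


S = [p for p in population if p.group == "S"]
T = [p for p in population if p.group == "T"]

# Statistical parity holds: Pr[ad | S] = Pr[ad | T].
print(rate(S), rate(T))  # 0.5 0.5
# But among the people the ad is actually useful to, S is shut out.
print(rate([p for p in S if p.carnivore]), rate([p for p in T if p.carnivore]))  # 0.0 1.0
```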
Lesson: Fairness is task-specific
Fairness requires understanding of the classification task (utility and fairness align!) and a cultural understanding of protected groups. Awareness! Secrecy ≠ fairness.
Our Approach: Individual Fairness
Treat similar individuals similarly: "similar" for the purpose of (fairness in) the classification task; "similarly" meaning a similar distribution over outcomes.
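One way to formalize "treat similar individuals similarly," in the spirit of the fairness-through-awareness framework, is a Lipschitz condition. A classifier M maps each individual to a distribution over outcomes, and every pair x, y must satisfy D(M(x), M(y)) ≤ d(x, y). Here d is the task-specific similarity metric and D is a distance between distributions.

The Python sketch below picks total variation distance for D, one natural choice, and uses a hand-written metric table. The individuals, probabilities, and distances are all hypothetical.

```python
from itertools import combinations


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two distributions over outcomes."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


def lipschitz_violations(mapping, metric):
    """Pairs (x, y) whose outcome distributions differ by more than d(x, y)."""
    return [
        (x, y)
        for x, y in combinations(mapping, 2)
        if total_variation(mapping[x], mapping[y]) > metric(x, y)
    ]


# Hypothetical randomized classifier: each person's distribution over outcomes.
mapping = {
    "alice": {"approve": 0.7, "deny": 0.3},
    "bob": {"approve": 0.4, "deny": 0.6},
    "carol": {"approve": 0.1, "deny": 0.9},
}
# Hypothetical task-specific metric: alice and bob are nearly interchangeable.
distances = {
    frozenset({"alice", "bob"}): 0.1,
    frozenset({"alice", "carol"}): 0.8,
    frozenset({"bob", "carol"}): 0.7,
}

print(lipschitz_violations(mapping, lambda x, y: distances[frozenset({x, y})]))
# [('alice', 'bob')]
```

Alice and Bob are nearly interchangeable for this task, yet their approval probabilities differ by 0.3, so the pair is flagged. The whole check is only as meaningful as the metric d, which is the question the next slide raises.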
Metric: Who Decides?
Assume a task-specific similarity metric: the extent to which two individuals are similar w.r.t. the classification task at hand. Privacy and fairness are context-specific and depend on society's norms. How can we facilitate informed public discussion (taking into account algorithmic limitations and ML insights)? Can we learn a good metric? Can we avoid learning past biases? User control? Not obviously possible, and users would need to be informed.
I Was Rejected: Why?
NYC teachers. Simple explanations of complicated classifiers? Additional risk of gaming? Books in parents' home. Adversarial errors in deep learning.
False Discovery: Just Getting Worse
"Trouble at the Lab," The Economist.
Accuracy and Privacy Align
Showed how to facilitate adaptive investigations using differential privacy: the reusable holdout. There is a limit on how much we can squeeze the data, for privacy but also because of the risk of overfitting.
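The reusable holdout is usually presented through the Thresholdout mechanism (Dwork, Feldman, Hardt, Pitassi, Reingold, and Roth). The analyst's statistical queries are answered from the training set. The holdout set is consulted, with Laplace noise, only when the two sets disagree by more than a noisy threshold, which limits how much an adaptive analysis can learn about the holdout.

The Python class below is a simplified sketch under that reading. Its parameter defaults are placeholders, not recommended values. A query `phi` is assumed to map a data point to a number in [0, 1].

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class Thresholdout:
    """Simplified sketch of Thresholdout for queries phi: point -> [0, 1]."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01, budget=100, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma = threshold, sigma
        self.budget = budget  # how many holdout-based answers may be released
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + laplace(2 * sigma, self.rng)

    def query(self, phi):
        if self.budget <= 0:
            return None  # budget exhausted: refuse to answer
        train_mean = sum(map(phi, self.train)) / len(self.train)
        holdout_mean = sum(map(phi, self.holdout)) / len(self.holdout)
        gap = abs(train_mean - holdout_mean)
        if gap > self.noisy_threshold + laplace(4 * self.sigma, self.rng):
            # Train and holdout disagree (likely overfitting): release a noisy
            # holdout answer, spend budget, and redraw the noisy threshold.
            self.budget -= 1
            self.noisy_threshold = self.threshold + laplace(2 * self.sigma, self.rng)
            return holdout_mean + laplace(self.sigma, self.rng)
        # Otherwise answer from the training set; no budget is spent.
        return train_mean


# Hypothetical data: a feature that looks perfect on train but not on holdout.
mech = Thresholdout(train=[1] * 50, holdout=[0, 1] * 25)
print(mech.query(lambda x: x))  # a noisy value near 0.5, taken from the holdout
```

The key design choice is that budget is spent only on queries that expose a train/holdout gap. Queries on which the training set is already accurate cost nothing, which lets many adaptive queries reuse a single holdout.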