The Intersection of Computation and Society: Privacy and Fairness Challenges

 
Computation and Society: The Case of Privacy and Fairness
Omer Reingold
Stanford CS, April 2017
Collaborators: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Aaron Roth, Guy Rothblum, Salil Vadhan, Rich Zemel, …

CS and Other Disciplines
First: tell me what you do again? (aka “I have that problem with my modem …”)
Then: We have tons of data, do you have any clever algorithm for us?
Now: The power of the computational lens: various natural and social phenomena can be viewed as computations. (Recent example: “A Lizard With Scales That Behave Like a Computer Simulation,” NYT 4/12/17, reporting on a Nature article.)
The age of collaboration!
Big Data + ML Revolution
 
 
 
Computation and Society
With the centrality of algorithms and data, more and more policy questions revolve around computation:
Here: tradeoffs between privacy, fairness, and economic utility.
Other examples:
Censorship vs. free speech on social platforms,
Filtering of news (the filter bubble),
Identifying fake news,
Net neutrality,
National security vs. individual freedoms (the San Bernardino cell phone case),
Loss of jobs due to automation,
Fear of AI, …
CS can inform public debate but also extend the range of solutions.

Sensitive Information
Digital Footprint: browsing history, social network interactions, location, emails, pictures, levels of physical activity, food consumption

Privacy vs. Secrecy
Private analysis/learning from a corpus of data
What can Crypto do for us? Encryption, computation on encrypted data, Secure Function Evaluation
Secrecy rather than privacy:
Privacy: what (is safe) to compute and share?
Crypto (Secure Function Evaluation): how to compute?
Invaluable when the data curator is untrusted (or distributed)
Lots of good research questions, lots of good questions on the crypto side for another talk …

Notions of Privacy - Anonymization
The outcome of a learning algorithm may leak sensitive data
“Traditionally” (with some legal protections): Anonymization, De-identification, k-Anonymity, …
The President's Council of Advisors on Science and Technology report to the president on big data and privacy:
“Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.”
Industry reaction: nah
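
To make the re-identification risk concrete, here is a minimal illustrative sketch (in Python, with hypothetical data; not from the talk) of a linkage attack: a “de-identified” table is joined with a public roster on quasi-identifiers, re-attaching names to sensitive records.

import pandas as pd

# "De-identified" medical records: names removed, quasi-identifiers kept.
deidentified = pd.DataFrame([
    {"zip": "94305", "birth": "1980-01-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "94110", "birth": "1975-06-30", "sex": "M", "diagnosis": "diabetes"},
])

# Public roster (e.g., a voter file) with names and the same quasi-identifiers.
public_roster = pd.DataFrame([
    {"name": "Alice", "zip": "94305", "birth": "1980-01-02", "sex": "F"},
    {"name": "Bob", "zip": "94110", "birth": "1975-06-30", "sex": "M"},
])

# Joining on (zip, birth, sex) re-attaches names to the "anonymized" diagnoses.
reidentified = public_roster.merge(deidentified, on=["zip", "birth", "sex"])
print(reidentified[["name", "diagnosis"]])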

Notions of Privacy - DP
The outcome of a learning algorithm may leak sensitive data
Recent (decade-old) notion: Differential Privacy (DP) [Dwork, McSherry, Nissim, Smith]
Differential Privacy: (loosely) your increased harm from being in the corpus is small.
One motivation: encourages opt-in
Incredible impact on various disciplines as well as industry (Google, Apple, startups, …).
Lots of variants: Distributional DP, Pan-Privacy, …
Lots of good questions, for another talk …
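
As a concrete illustration (a minimal sketch, not from the slides; the function name and epsilon value are illustrative assumptions), the standard Laplace mechanism makes a single counting query differentially private by adding noise calibrated to the query's sensitivity:

import numpy as np

def private_count(records, predicate, epsilon=0.5):
    # Exact number of records satisfying the predicate.
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes the count by at most 1
    # (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    # epsilon-differential privacy for this query.
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# e.g., private_count(corpus, lambda r: r["smoker"]) releases an approximate
# smoker count whose distribution barely depends on any single record.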

DP via Expectation of Privacy
A study on the connection between smoking and cancer compromises the privacy of smokers (even with DP).
No single definition – need to incorporate social choice
What is a reasonable expectation of privacy?
Assume I only want to protect Alice
Allow Alice to erase herself and a few others from the database
DP provides similar protection simultaneously to everyone
Any different “protection for the individual” implies a different variant of DP
A way to interface policy-makers and privacy experts
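
For reference (the standard formalization, not spelled out on the slide; notation is ours): a randomized mechanism $M$ is $\varepsilon$-differentially private if for all databases $D, D'$ differing in one record and all events $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S].$$

Applying the definition repeatedly captures “Alice and a few others”: if $D$ and $D'$ differ in $k$ records, the factor degrades gracefully to $e^{k\varepsilon}$ (group privacy).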
 
Classification
Advertising, health care, taxation, financial aid

Privacy and Classifiers
Privacy-preserving classifiers (observable outcomes of classification):
Alice sees a particular ad
Alice clicks on the ad
What information is leaked about Alice?
More challenging scenario, missing even a good definition

Apply Classifier on a Coarse Noisy Version?
Influenced by our definition of fairness (later)
If the coarse version doesn’t distinguish possible Omers, then sensitive properties may be protected
??

Good Definition?
Not as strong as crypto definitions, or even DP: information is leaked
Protection: blend me in with the (surrounding) crowd
If your surrounding is “normative,” this may imply meaningful protection (and substantiate users’ currently unjustified sense of security).
Lots of possible failings (as with k-anonymity).
Only as strong as the similarity metric
 
Fairness in Classification
Advertising, health care, taxation, financial aid
 
Concern: Discrimination
Population includes minorities: ethnic, religious, medical, geographic
Protected by law, policy, ethics
A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, …
Discrimination may be subtle!

Credit Application (WSJ 8/4/10)
User visits capitalone.com
Capital One uses tracking information provided by the tracking network [x+1] to personalize offers
Concern: Steering minorities into higher rates (illegal)*

A Suggested CS Perspective
An individual-based notion of fairness – fairness through awareness
Versatile framework for obtaining and understanding fairness (including fair affirmative action)
Fairness vs. Privacy: privacy does not imply fairness, but definitions and techniques are useful

Fairness through Blindness
Ignore all irrelevant/protected attributes
e.g., Facebook “sex” & “interested in men/women”
Point of failure: redundant encodings
Machine learning: you don’t need to see the label to be able to predict it
E.g., redlining
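
A toy sketch of why redundant encodings defeat blindness (synthetic data; the feature construction is an illustrative assumption, not from the talk): if the protected attribute is recoverable from the remaining features, dropping it hides nothing.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=2000)        # protected attribute, hidden from the classifier
X = np.column_stack([
    s + rng.normal(0, 0.3, 2000),        # proxy feature correlated with s (e.g., zip code)
    rng.normal(0, 1, 2000),              # unrelated feature
])

X_tr, X_te, s_tr, s_te = train_test_split(X, s, random_state=0)
clf = LogisticRegression().fit(X_tr, s_tr)
print("protected attribute recovered with accuracy:", clf.score(X_te, s_te))
# Accuracy far above 50% shows that "blindness" to s does not hide s.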

Group Fairness (Statistical Parity)
Equalize minority S with general population T at the level of outcomes:
Pr[outcome o | S] = Pr[outcome o | T]
Insufficient as a notion of fairness
Has some merit, but can be abused
Example: advertise the burger joint to carnivores in T and to vegans in S.
Example: self-fulfilling prophecy
Example: multiculturalism …
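
A minimal sketch of checking statistical parity for a binary decision (hypothetical data and function name; not from the talk). Note that a zero gap says nothing about who inside each group receives the outcome, which is exactly how parity can be abused.

import numpy as np

def statistical_parity_gap(outcomes, in_S):
    """|Pr[outcome = 1 | S] - Pr[outcome = 1 | T]| where T is the whole population."""
    outcomes = np.asarray(outcomes)
    in_S = np.asarray(in_S, dtype=bool)
    return abs(outcomes[in_S].mean() - outcomes.mean())

# The burger-joint abuse: show the ad to carnivores in T and to vegans in S
# at equal rates, and the gap above is still 0.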

Lesson: Fairness is task-specific
Fairness requires understanding of the classification task
Utility and fairness align!
Cultural understanding of protected groups
Awareness!
Secrecy ≠ fairness

Our Approach: Individual Fairness
Treat similar individuals similarly
Similar for the purpose of (fairness in) the classification task
Similar distribution over outcomes
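
One standard way to make “treat similar individuals similarly” precise (following the fairness-through-awareness formulation; the notation here is ours) is as a Lipschitz condition on a randomized classifier $M$ that maps each individual to a distribution over outcomes:

$$D(M(x), M(y)) \le d(x, y) \quad \text{for all individuals } x, y,$$

where $d$ is the task-specific similarity metric on individuals and $D$ is a distance between distributions over outcomes (e.g., statistical distance).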
 
Metric – Who Decides?
Assume a task-specific similarity metric: the extent to which two individuals are similar w.r.t. the classification task at hand
Privacy and fairness are context-specific and depend on society’s norms
How can we facilitate informed public discussion (taking into account algorithmic limitations and ML insights)?
Can we learn a good metric? Can we avoid learning past biases?
User control? Not obvious if possible; users need to be informed …

I Was Rejected – Why?
NYC teachers
Simple explanations of complicated classifiers?
Additional risk of gaming?
Books in parents’ home
Adversarial errors in deep learning

False Discovery – Just Getting Worse
“Trouble at the Lab” – The Economist

Accuracy and Privacy Align
Showed how to facilitate adaptive investigations using differential privacy
Reusable holdout
A limit on how much we can squeeze out of the data – for privacy, but also because of the risk of overfitting
Commit to the learning procedure: allows DP, prevents p-hacking (seeing the data)
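
A simplified sketch in the spirit of the reusable holdout (Thresholdout-style; the thresholds, noise scales, and class interface below are illustrative assumptions, not the exact algorithm from the paper):

import numpy as np

class ReusableHoldout:
    """Answer adaptively chosen statistics while touching the holdout set
    as little as possible (a simplified sketch)."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma = threshold, sigma

    def query(self, phi):
        # phi maps one example to a value in [0, 1], e.g., the loss of a
        # candidate model on that example.
        train_est = np.mean([phi(x) for x in self.train])
        holdout_est = np.mean([phi(x) for x in self.holdout])
        # Answer from the training set unless it visibly overfits; only then
        # consult the holdout, and only through a noisy answer, so repeated
        # adaptive queries leak little about the holdout.
        if abs(train_est - holdout_est) > self.threshold + np.random.normal(0, self.sigma):
            return holdout_est + np.random.normal(0, self.sigma)
        return train_est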