The Intersection of Computation and Society: Privacy and Fairness Challenges

Computation and Society: The Case of Privacy and Fairness

Omer Reingold

Stanford CS, April 2017

Collaborators: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Aaron Roth, Guy Rothblum, Salil Vadhan, Rich Zemel, …

CS and Other Disciplines

• First: “Tell me what you do again?” (a.k.a. “I have that problem with my modem …”)
• Then: “We have tons of data, do you have any clever algorithm for us?”
• Now: The power of the computational lens: various natural and social phenomena can be viewed as computations.
– (Recent example: “A Lizard With Scales That Behave Like a Computer Simulation”, NYT 4/12/17, reporting on a Nature article.)
– The age of collaboration!

Big Data + ML Revolution

Computation and Society

With the centrality of algorithms and data, more and more policy questions revolve around computation.

• Here: tradeoffs between privacy, fairness, and economic utility. Other examples:
– Censorship vs. free speech on social platforms,
– Filtering of news (the filter bubble),
– Identifying fake news,
– Net neutrality,
– National security vs. individual freedoms (the San Bernardino cell phone case),
– Loss of jobs due to automation,
– Fear of AI, …

CS can inform public debate but also extend the range of solutions.

Sensitive Information

Digital Footprint: browsing history, social network interactions, location, emails, pictures, levels of physical activity, food consumption

Privacy vs. Secrecy

Private analysis/learning from a corpus of data

What can crypto do for us?
– Encryption, computation on encrypted data
– Secure Function Evaluation
• Secrecy rather than privacy
– Privacy: what (is safe) to compute and share?
– Crypto (Secure Function Evaluation): how to compute?
• Invaluable when the data curator is untrusted (or distributed)

Lots of good research questions on the crypto side, for another talk …

Notions of Privacy - Anonymization

The outcome of a learning algorithm may leak sensitive data

• “Traditionally” (with some legal protections): Anonymization, De-identification, k-Anonymity, …

The President's Council of Advisors on Science and Technology report to the president on big data and privacy:

“Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.”

Industry reaction: nah
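To make the k-anonymity notion named on this slide concrete, here is a minimal sketch of a check that every combination of quasi-identifier values is shared by at least k records. The table, the column names, and the helper names are hypothetical and only for illustration; passing this check does not by itself prevent the re-identification attacks the report describes.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k

# Hypothetical, already-generalized toy table (not from the talk).
records = [
    {"zip": "943**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "943**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "943**", "age": "40-49", "diagnosis": "flu"},
]

# The third record is alone in its (zip, age) group, so the table is not 2-anonymous.
print(is_k_anonymous(records, ["zip", "age"], k=2))
```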

Notions of Privacy - DP

The outcome of a learning algorithm may leak sensitive data

• Recent (decade-old) notions:
– Differential Privacy (DP) [Dwork, McSherry, Nissim, Smith]
– Differential Privacy: (loosely) your increased harm from being in the corpus is small. One motivation: encourages opt-in
– Incredible impact on various disciplines as well as industry (Google, Apple, startups, …).
– Lots of variants: Distributional DP, Pan Privacy, …
– Lots of good questions, for another talk …
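As an illustration of the loose definition above, the following sketch answers a counting query with the standard Laplace mechanism. The slide does not name a mechanism, so this choice, the function names, and the toy data are assumptions. A count changes by at most 1 when one person joins or leaves the corpus, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy for this single query.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(rows, predicate, epsilon, rng=None):
    """Noisy count of rows satisfying `predicate` (epsilon-DP for one counting query)."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    # Sensitivity of a count is 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy corpus (not from the talk).
rows = [{"smoker": True}, {"smoker": False}, {"smoker": True}, {"smoker": True}]
rng = random.Random(0)
print(private_count(rows, lambda r: r["smoker"], epsilon=1.0, rng=rng))
```

Smaller values of epsilon add more noise and so bound the individual's "increased harm" more tightly, at a cost in accuracy.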

DP via Expectation of Privacy

A study on the connection between smoking and cancer compromises the privacy of smokers (even with DP).

No single definition – need to incorporate social choice

• What is a reasonable expectation of privacy?
– Assume I only want to protect Alice
– Allow Alice to erase herself and a few others from the database
– DP provides similar protection simultaneously to everyone
• Any different “protection for individual” implies a different variant of DP
• A way to interface policy-makers and privacy experts

Classification

Advertising, health care, taxation, financial aid

Privacy and Classifiers

Privacy-preserving classifiers (observable outcomes of classification):
• Alice sees a particular ad
• Alice clicks on the ad
• What information is leaked about Alice?

A more challenging scenario, missing even a good definition

Apply Classifier on a Coarse Noisy Version?

Influenced by our definition of fairness (later)

If the coarse version doesn’t distinguish between possible Omers, then sensitive properties may be protected??

Good Definition?

Not as strong as crypto definitions or even DP: information is leaked

Protection: Blend me in with the (surrounding) crowd
• If your surroundings are “normative”, this may imply meaningful protection (and substantiate users’ currently unjustified sense of security).
• Lots of possible failings (as with k-anonymity).
– As strong as the similarity metric

Fairness in Classification

Advertising, health care, taxation, financial aid

Concern: Discrimination

• Population includes minorities
– Ethnic, religious, medical, geographic
– Protected by law, policy, ethics
• A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, …

Discrimination may be subtle!

Credit Application (WSJ 8/4/10)

User visits capitalone.com

Capital One uses tracking information provided by the tracking network [x+1] to personalize offers

Concern: Steering minorities into higher rates (illegal)

A Suggested CS Perspective

• An individual-based notion of fairness – fairness through awareness
• Versatile framework for obtaining and understanding fairness (including fair affirmative action)
• Fairness vs. Privacy: privacy does not imply fairness, but its definitions and techniques are useful

Fairness through Blindness

• Ignore all irrelevant/protected attributes
– e.g., Facebook “sex” & “interested in men/women”
• Point of failure: Redundant encodings
– Machine learning: You don’t need to see the label to be able to predict it
– E.g., redlining
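The redundant-encoding failure can be sketched with a toy example: even if the protected attribute is dropped, a correlated proxy such as a zip code can recover it. The data, column names, and the `proxy_accuracy` helper below are hypothetical; the sketch simply measures how often a majority guess per proxy value recovers the hidden attribute.

```python
from collections import Counter, defaultdict

def proxy_accuracy(rows, proxy, protected):
    """Fraction of rows whose protected value equals the majority value for their proxy value."""
    by_proxy = defaultdict(Counter)
    for row in rows:
        by_proxy[row[proxy]][row[protected]] += 1
    # For each proxy value, guessing the majority group is right this many times.
    correct = sum(counts.most_common(1)[0][1] for counts in by_proxy.values())
    return correct / len(rows)

# Hypothetical segregated neighborhoods: zip "A" is mostly group S, zip "B" mostly group T.
rows = (
    [{"zip": "A", "group": "S"}] * 9 + [{"zip": "A", "group": "T"}] * 1 +
    [{"zip": "B", "group": "S"}] * 1 + [{"zip": "B", "group": "T"}] * 9
)

# A classifier that never sees "group" can still infer it from "zip" for 18 of 20 rows.
print(proxy_accuracy(rows, "zip", "group"))
```

This is the sense in which a "blind" classifier may still redline: the signal it was told to ignore is encoded elsewhere in the features.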

Group Fairness (Statistical Parity)

• Equalize minority S with general population T at the level of outcomes
– Pr[outcome o | S] = Pr[outcome o | T]
• Insufficient as a notion of fairness
– Has some merit, but can be abused
– Example: Advertise burger joint to carnivores in T and vegans in S.
– Example: Self-fulfilling prophecy
– Example: Multiculturalism …
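The parity condition and the burger-joint abuse can be sketched in a few lines. The population below is a hypothetical toy construction, and the helper names are illustrative: the ad reaches exactly half of each group, so Pr[ad | S] = Pr[ad | T], yet within S it is shown only to people who have no use for it.

```python
def outcome_rate(people, group, outcome):
    """Empirical Pr[outcome | group]."""
    members = [p for p in people if p["group"] == group]
    return sum(1 for p in members if p["outcome"] == outcome) / len(members)

def parity_gap(people, outcome, s="S", t="T"):
    """Absolute difference between Pr[outcome | S] and Pr[outcome | T]; 0 means statistical parity."""
    return abs(outcome_rate(people, s, outcome) - outcome_rate(people, t, outcome))

# Hypothetical population: each group has two carnivores and two vegans.
# The burger ad goes to carnivores in T but to vegans in S.
people = (
    [{"group": "T", "diet": "carnivore", "outcome": "ad"}] * 2 +
    [{"group": "T", "diet": "vegan", "outcome": "no ad"}] * 2 +
    [{"group": "S", "diet": "vegan", "outcome": "ad"}] * 2 +
    [{"group": "S", "diet": "carnivore", "outcome": "no ad"}] * 2
)

# Statistical parity holds even though the ad is useless to everyone in S who sees it.
print(parity_gap(people, "ad"))
```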

Lesson: Fairness is task-specific

• Fairness requires understanding of the classification task
– Utility and fairness align
• Cultural understanding of protected groups
– Awareness!
– Secrecy does not imply fairness

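Our Approach: Individual Fairness

Treat similar individuals similarly
• Similar for the purpose of (fairness in) the classification task
• Similar distribution over outcomes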
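One common way to formalize "treat similar individuals similarly" is a Lipschitz condition: the distance between the outcome distributions assigned to x and y should be at most d(x, y), where d is the task-specific metric. The slide does not spell this out, so the total variation distance, the toy credit-score metric, and the two classifiers below are assumptions chosen for illustration.

```python
from itertools import combinations

def total_variation(p, q):
    """Total variation distance between two distributions given as {outcome: probability}."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

def fairness_violations(individuals, classifier, metric, tol=1e-12):
    """Pairs (x, y) whose outcome distributions differ by more than metric(x, y)."""
    return [
        (x, y)
        for x, y in combinations(individuals, 2)
        if total_variation(classifier(x), classifier(y)) > metric(x, y) + tol
    ]

# Hypothetical setup: individuals are credit scores in [0, 100].
def metric(x, y):
    return abs(x - y) / 100.0

def soft_classifier(score):
    # Approval probability grows smoothly with the score.
    return {"approve": score / 100.0, "reject": 1.0 - score / 100.0}

def hard_classifier(score):
    # Deterministic cutoff at 50: nearly identical scores can get opposite outcomes.
    return {"approve": 1.0, "reject": 0.0} if score >= 50 else {"approve": 0.0, "reject": 1.0}

scores = [40, 49, 51, 60]
print(fairness_violations(scores, soft_classifier, metric))
print(fairness_violations(scores, hard_classifier, metric))
```

The smooth classifier satisfies the condition for this metric, while the hard cutoff violates it for every pair that straddles the threshold, including the nearly identical scores 49 and 51.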

Metric – Who Decides?

• Assume task-specific similarity metric
– Extent to which two individuals are similar w.r.t. the classification task at hand
• Privacy and fairness are context-specific and depend on society’s norms
• How can we facilitate informed public discussion (taking into account algorithmic limitations and ML insights)?
• Can we learn a good metric? Can we avoid learning past biases?
• User control?
– Not obvious if possible
– Users need to be informed …

I Was Rejected – Why?

• NYC teachers
• Simple explanations of complicated classifiers?
• Additional risk of gaming?
– Books in parents’ home
– Adversarial errors in deep learning

False Discovery: Just Getting Worse

“Trouble at the Lab” – The Economist

Accuracy and Privacy Align

• Showed how to facilitate adaptive investigations using differential privacy
– Reusable holdout
• Limit on how much we can squeeze data – for privacy but also for the risk of overfitting
• Commit to the learning procedure: allows DP, prevents p-hacking (seeing the data)
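The reusable-holdout idea can be sketched as a simplified, single-query version of a Thresholdout-style rule: the analyst only learns about the holdout set when the training estimate disagrees with it noticeably, and even then only through a noisy answer. This is a simplification under stated assumptions; the full procedure also tracks a budget of such disagreements, and the parameter values and function names here are hypothetical.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def thresholdout(train_value, holdout_value, threshold, sigma, rng):
    """Answer one adaptive query about a statistic (e.g. accuracy of a candidate model).

    If the training and holdout estimates agree up to a noisy threshold, return the
    training estimate, which reveals nothing new about the holdout set.
    Otherwise return the holdout estimate with Laplace noise added.
    """
    gap = abs(train_value - holdout_value)
    if gap > threshold + laplace(sigma, rng):
        return holdout_value + laplace(sigma, rng)
    return train_value

rng = random.Random(0)
# Estimates agree: the training value is released unchanged.
agree = thresholdout(0.61, 0.60, threshold=0.05, sigma=0.001, rng=rng)
# Training estimate is overfit: a noisy holdout value is released instead.
overfit = thresholdout(0.90, 0.60, threshold=0.05, sigma=0.001, rng=rng)
print(agree, overfit)
```

The noise is what ties the holdout's reusability to differential privacy: each answer depends only weakly on any single holdout record, which limits how much an adaptive analyst can overfit to it.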


