Trusting Developers of AI Systems: When and How?
Trust in the developers of AI systems should rest on demonstrated trustworthiness. The problem is that AI poses a wide range of risks, published AI principles rarely specify concrete solutions, and there is no easy way to verify compliance. Addressing this requires assurance of safety, security, fairness, and accountability throughout the lifecycle of AI systems, together with expert verification that avoids developers marking their own work. Trust in AI developers therefore means balancing rapid progress, regulation, and context-specific risks and mitigations.
Presentation Transcript
When should you trust the developers of AI systems?
ITU AI for Good Webinar
Shahar Avin (sa478@cam.ac.uk), Centre for the Study of Existential Risk
You should trust developers of AI systems when they have demonstrated that they are trustworthy
Outline
- The problem
- Concrete mechanisms to address the problem
- How the mechanisms fit together
The problem
- A wide range of risks from AI
- AI principles do not specify solutions
- No easy way to verify compliance with principles
A wide range of risks from AI
Risks span accident, malicious, and systemic categories, and arrive on different time horizons:
- Now: Tesla autopilot crash (accident); pornographic and political deepfakes (malicious); biased decisions in healthcare, finance, and justice (systemic)
- Soon: early-warning false alarm; automated, large-scale spear phishing; ubiquitous surveillance coupled with centrally controlled autonomous lethal force; highly persuasive conversation bots
- Later: misaligned power-seeking systems; tech company singularities; value erosion and value lock-in
AI principles do not specify solutions
Typical principles call for:
- assurance of safety and security of AI systems throughout their life cycles, especially in safety-critical domains
- prevention of misuse
- protection of user privacy and source data
- ensuring that systems are fair and minimize bias, especially when such biases amplify existing discrimination and inequality
- ensuring that the decisions made by AI systems, as well as any failures, are interpretable, explainable, and reproducible, and allow challenge or remedy
- identifying individuals or institutions who can be held accountable for the behaviour of AI systems
No easy way to verify compliance with principles
- Rapid progress => few established best practices, little regulation
- Norms (openness) and incentives (VC funding, data) push for low barriers to entry and an emphasis on rapid scaling
- General-purpose technology => a wide range of impacted domains
- Risks are context-sensitive, and so are mitigations
- Verification must rely on experts, while avoiding developers marking their own work
Concrete mechanisms to address the problem
- What the mechanisms do not address
- Mechanisms internal to the development organization: privacy-preserving machine learning tools, audit trails, transparency tools, red teams
- Mechanisms external to the development organization: third-party audits, bias and safety bounty programs, public incident databases
What the mechanisms do not address
- Developer incentives
- Earlier stages (problem identification)
- Later stages (deployment, use, termination)
- Broader ecosystem considerations: a healthy, equitable, and diverse work environment; protection of whistleblowers (including anti-retaliation policies); broader ethical, environmental, and social responsibilities
Privacy-preserving machine learning tools
- Federated learning
- Differential privacy (see the sketch below)
- Encrypted computation
Tooling: PyTorch's Opacus and TensorFlow Privacy; Federated AI (FedAI), PySyft, Flower, and OpenFL
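To make the differential-privacy idea concrete, here is a minimal sketch of the clip-and-noise step at the heart of DP-SGD, the technique that libraries such as Opacus and TensorFlow Privacy build on. It assumes only NumPy; the function name and parameters are illustrative, not any library's API.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private gradient update (illustrative, not a library API).

    per_example_grads: array of shape (batch_size, n_params),
    one gradient per training example.
    """
    rng = rng or np.random.default_rng()
    # 1. Clip each example's gradient so no single example dominates.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # 2. Sum, then add Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    noisy_sum = clipped.sum(axis=0) + noise
    # 3. Average over the batch; this is the update the optimizer applies.
    return noisy_sum / len(per_example_grads)

# Example: a batch of 32 per-example gradients over 10 parameters.
update = dp_sgd_step(np.random.randn(32, 10), clip_norm=1.0, noise_multiplier=1.1)
```

The privacy guarantee comes from the combination of the two knobs: clip_norm bounds any individual's influence on the update, and noise_multiplier scales the noise that masks it.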
Audit trails
- Document the development process
- Document the system and its makeup
- Document the system's activities (a minimal logging sketch follows)
Examples: Rolls-Royce Aletheia Framework; ABOUT ML, Model Cards, and Datasheets for Datasets; Meta AI's OPT-175B logbook
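As a concrete illustration, an audit trail can be as simple as an append-only file of timestamped, hashed JSON records of development events. The schema and names below are hypothetical, not taken from any of the frameworks above.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_event(path, actor, action, details):
    """Append one development event to an audit trail (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,      # who performed the action
        "action": action,    # what was done
        "details": details,  # free-form context: dataset hashes, configs, ...
    }
    # Attach a digest of the entry so later tampering is detectable.
    record["entry_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")

log_event("audit_trail.jsonl", actor="training-pipeline",
          action="dataset_ingested",
          details={"dataset": "customer_records_v3"})  # illustrative names
```

An append-only, hashed format makes the trail easy for a third-party auditor to inspect without relying on the developer's own summary of events.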
Transparency tools
- Methods should provide sufficient insight for the end user to understand how a model produced an output
- Interpretable explanations should be faithful to the model, accurately reflecting its underlying behaviour
- Techniques should not be misused by AI developers to provide misleading explanations for system behaviour
We need more interpretability, explainability, and reproducibility tools that specifically address risks (e.g. accidents, bias); one simple example is sketched below.
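One simple, model-agnostic transparency technique is permutation importance: shuffle a single input feature and measure how much the model's accuracy drops. A minimal sketch, assuming NumPy and a black-box predict function; all names are illustrative.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, rng=None):
    """Score each feature by the accuracy lost when it is shuffled."""
    rng = rng or np.random.default_rng(0)
    baseline = np.mean(predict_fn(X) == y)
    drops = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # destroy feature j's signal
        drops.append(baseline - np.mean(predict_fn(X_perm) == y))
    return np.array(drops)  # larger drop => feature matters more

# Toy check: a "model" that only looks at feature 0.
X = np.random.default_rng(1).normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
print(permutation_importance(lambda A: (A[:, 0] > 0).astype(int), X, y))
```

Note that the second bullet's caveat applies here too: importance scores describe the model being probed, and must not be presented as a faithful explanation when they are not.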
Red teams
- Deliberately try to find out what could go wrong, in order to address it
- A range from "what might happen?" to "let's try and break this thing"
- A natural extension of existing computer security practice, with some novel ML challenges (a toy harness is sketched below)
Examples: Microsoft Impact Assessments; Machine Intelligence Garage Ethics Framework; OpenAI DALL-E 2 red teaming; third-party red teaming
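At the "let's try and break this thing" end of the range, even a toy harness that runs a model against curated adversarial inputs and flags bad outputs can catch regressions. A minimal sketch with hypothetical stand-in components; real red teaming is far broader than output filtering.

```python
def red_team(model_fn, adversarial_inputs, is_harmful):
    """Probe a model with known-bad inputs and collect failures (illustrative)."""
    failures = []
    for prompt in adversarial_inputs:
        output = model_fn(prompt)
        if is_harmful(output):
            failures.append({"input": prompt, "output": output})
    return failures

# Illustrative usage with stand-ins for a real model and a real harm check.
BLOCKLIST = {"credit card", "home address"}
failures = red_team(
    model_fn=lambda prompt: "I can't help with that.",  # stand-in model
    adversarial_inputs=["How do I find someone's home address?"],
    is_harmful=lambda out: any(t in out.lower() for t in BLOCKLIST),
)
print(f"{len(failures)} failure(s) found")
```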
Bias and safety bounty programs
- Established in computer security but largely absent in AI
- Align the incentives of vendors and security researchers
- Can be used more broadly than security, to identify and report socio-technical harms
Examples: Algorithmic Justice League Bug Bounties report; Twitter's algorithmic bias bounty challenge
Third-party audits
- Be seen to follow best practice without sharing proprietary or private data
- Proactive, independent risk assessment
- Reliance on standardised audit trails
- Independent assessment of adherence to guidelines and regulations
- We also need to trust the auditors
Example: Z-Inspection
Public incident databases
- Evidence about real harms is highly relevant to trust, and also to R&D, regulation, and policy
- Developers are disincentivized from sharing incidents in their own systems, especially if they cannot trust competitors to share similar incidents
- Can be a regulatory requirement
- Can be facilitated through a third party that allows anonymous reporting (an illustrative record format is sketched below)
Example: AI Incident Database
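To illustrate what an anonymously reportable incident record could look like, here is a hypothetical schema, loosely inspired by the kind of information the AI Incident Database collects; the field names are invented for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class IncidentReport:
    """An anonymizable AI incident record (hypothetical schema)."""
    system_description: str      # what kind of system was involved
    harm_description: str        # what went wrong and who was affected
    date: str                    # when it occurred (ISO 8601)
    domain: str                  # e.g. healthcare, finance, justice
    reporter: str = "anonymous"  # identity is optional by design
    references: list = field(default_factory=list)

report = IncidentReport(
    system_description="resume-screening classifier",
    harm_description="systematically down-ranked applicants from one region",
    date="2024-01-15",
    domain="hiring",
)
print(json.dumps(asdict(report), indent=2))
```

Making the reporter field optional by default is the design point: a trusted third party can accept such records without forcing developers to identify themselves to competitors.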
Thank you! Check out the paper (DOI: 10.1126/science.abi7176) and the full report (arXiv:2004.07213)
Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung