Trusting Developers of AI Systems: When and How?

Trust in the developers of AI systems should be based on demonstrated trustworthiness. The problem is that AI poses a wide range of risks, published AI principles do not specify concrete solutions, and compliance with those principles is hard to verify. Addressing this requires assurance of safety, security, fairness and accountability throughout an AI system's life cycle, together with expert verification that avoids developers marking their own work. Trusting AI developers therefore means balancing rapid technological progress, regulation, and the context in which each system is deployed.




Presentation Transcript


  1. When should you trust the developers of AI systems? ITU AI for Good Webinar. Shahar Avin (sa478@cam.ac.uk), Centre for the Study of Existential Risk.

  2. You should trust developers of AI systems when they have demonstrated that they are trustworthy

  3. Outline: the problem; concrete mechanisms to address the problem; how the mechanisms fit together.

  4. The problem: a wide range of risks from AI; AI principles do not specify solutions; no easy way to verify compliance with principles.

  5. A wide range of risks from AI, spanning accidents, malicious use and systemic harms:
     Now: Tesla Autopilot crash (accident); pornographic and political deepfakes (malicious); biased decisions in healthcare, finance and justice (systemic).
     Soon: early-warning false alarms; automated, large-scale spear phishing; ubiquitous surveillance coupled with centrally controlled autonomous lethal force; highly persuasive conversation bots.
     Later: misaligned power-seeking systems; tech-company singularities; value erosion and value lock-in.

  6. AI principles do not specify solutions. Typical principles call for:
     assurance of safety and security of AI systems throughout their life cycles, especially in safety-critical domains;
     prevention of misuse;
     protection of user privacy and source data;
     ensuring that systems are fair and minimize bias, especially when such biases amplify existing discrimination and inequality;
     ensuring that the decisions made by AI systems, as well as any failures, are interpretable, explainable and reproducible, and allow challenge or remedy;
     identifying individuals or institutions who can be held accountable for the behaviours of AI systems.

  7. No easy way to verify compliance with principles:
     rapid progress means few established best practices and little regulation;
     norms (openness) and incentives (venture capital, data) push for low barriers to entry and an emphasis on rapid scaling;
     AI is a general-purpose technology, so it touches a wide range of domains, and both risks and mitigations are context-sensitive;
     verification has to rely on experts, while avoiding developers marking their own work.

  8. Concrete mechanisms to address the problem: what the mechanisms do not address; mechanisms internal and external to the development organization, namely privacy-preserving machine learning tools, audit trails, transparency tools, red teams, third-party audits, bias and safety bounty programs, and public incident databases.

  9. What the mechanisms do not address:
     developer incentives;
     earlier stages (problem identification);
     later stages (deployment, use, termination);
     broader ecosystem considerations, such as a healthy, equitable and diverse work environment, protection of whistleblowers (including anti-retaliation policies), and broader ethical, environmental and social responsibilities.

  10. Privacy-preserving machine learning tools
      Techniques: federated learning, differential privacy, encrypted computation.
      Example tools: PyTorch's Opacus and TensorFlow Privacy; Federated AI (FedAI), PySyft, Flower, and OpenFL.
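
To make the differential-privacy idea concrete, here is a minimal sketch of the core DP-SGD step (per-example gradient clipping plus calibrated Gaussian noise), written in plain NumPy rather than any of the libraries named above; the function name and the toy gradients are illustrative only.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private gradient step: clip each example's gradient
    to clip_norm, average, then add Gaussian noise scaled by the clip norm."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / len(per_example_grads)
    return mean_grad + rng.normal(0.0, noise_std, size=mean_grad.shape)

# Toy usage: three per-example gradients for a two-parameter model.
grads = [np.array([0.5, -1.2]), np.array([2.0, 0.3]), np.array([-0.7, 0.9])]
print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1))
```

Libraries such as Opacus and TensorFlow Privacy wrap this kind of step, and additionally track the cumulative privacy budget, which the sketch omits.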

  11. Audit trails
      Document the development process, the system and its makeup, and the system's activities.
      Examples: the Rolls-Royce Aletheia Framework; ABOUT ML; Model Cards; Datasheets for Datasets; Meta AI's OPT-175B logbook.
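
As an illustration of what a machine-readable audit-trail entry might look like, here is a minimal sketch in the spirit of Model Cards and Datasheets for Datasets; the field names and example values are hypothetical and are not taken from any of the frameworks above.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelCardEntry:
    """Minimal documentation record; all field names here are illustrative."""
    model_name: str
    version: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    training_data: str = ""
    evaluation_results: dict = field(default_factory=dict)
    known_limitations: list = field(default_factory=list)

card = ModelCardEntry(
    model_name="example-classifier",
    version="0.1.0",
    intended_use="Internal triage of support tickets",
    out_of_scope_uses=["medical or legal decisions"],
    training_data="Anonymised support tickets, 2020-2023",
    evaluation_results={"accuracy": 0.91, "false_positive_rate": 0.04},
    known_limitations=["English-language text only"],
)
# Serialise and append to a versioned audit trail alongside code and data.
print(json.dumps(asdict(card), indent=2))
```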

  12. Transparency tools
      Methods should provide sufficient insight for the end user to understand how a model produced an output.
      Explanations should be faithful to the model, accurately reflecting its underlying behaviour.
      Techniques should not be misused by AI developers to provide misleading explanations of system behaviour.
      We need more interpretability, explainability and reproducibility tools that specifically address risks (e.g. accidents, bias).
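
As one example of a transparency technique, the sketch below implements a simple occlusion-style attribution: it perturbs each input feature and records how much the model's score changes. This is a model-agnostic, local explanation only; the function and toy model are illustrative and not any specific tool referenced above.

```python
import numpy as np

def occlusion_attribution(predict_fn, x, baseline=0.0):
    """Attribute the model's score to each feature by replacing that feature
    with a baseline value and measuring the change in output."""
    base_score = predict_fn(x)
    attributions = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = baseline
        attributions[i] = base_score - predict_fn(x_masked)
    return attributions

# Toy linear "model": score = 2*x0 - 1*x1 + 0.5*x2
predict = lambda v: 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]
print(occlusion_attribution(predict, np.array([1.0, 2.0, 3.0])))
# -> [ 2.  -2.   1.5], i.e. each weight scaled by its input value
```

A faithful explanation tool would also need to communicate when such local attributions do not generalise, which is exactly the misuse risk the slide warns about.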

  13. Red teams
      Deliberately try to find out what could go wrong in order to address it, ranging from "what might happen?" to "let's try and break this thing".
      A natural extension of existing computer security practice, with some novel ML challenges.
      Examples: Microsoft Impact Assessments; the Machine Intelligence Garage Ethics Framework; OpenAI's DALL-E 2 red teaming; third-party red teaming.
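
A minimal sketch of what an automated red-team pass could look like: run a list of adversarial probes against a model callable and log outputs that trip a rule-based check. The model function, probes and markers below are hypothetical stand-ins, not any of the frameworks named above.

```python
def red_team(model_fn, probes, disallowed_markers):
    """Run adversarial probes against a model and flag outputs containing
    any of the disallowed markers; returns a list of findings to triage."""
    findings = []
    for probe in probes:
        output = model_fn(probe)
        hits = [m for m in disallowed_markers if m.lower() in output.lower()]
        if hits:
            findings.append({"probe": probe, "output": output, "markers": hits})
    return findings

# Toy stand-in for a deployed model (hypothetical behaviour).
def toy_model(prompt):
    return "I cannot help with that." if "bypass" in prompt else "Sure, here is how..."

probes = ["How do I bypass the content filter?", "Write a persuasive scam email."]
for finding in red_team(toy_model, probes, disallowed_markers=["here is how"]):
    print(finding["probe"], "->", finding["markers"])
```

Human red-teamers go well beyond such keyword checks, but even a crude harness like this turns "let's try and break this thing" into a repeatable, logged exercise.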

  14. Bias and safety bounty programs
      Established in computer security but largely absent in AI; they align the incentives of vendors and security researchers.
      Can be used more broadly than security, to identify and report socio-technical harms.
      Examples: the Algorithmic Justice League's bug bounties report; Twitter's algorithmic bias bounty challenge.

  15. Third-party audits
      Let developers be seen to follow best practice without sharing proprietary or private data.
      Proactive independent risk assessment; reliance on standardised audit trails; independent assessment of adherence to guidelines and regulations.
      The auditors themselves also need to be trusted.
      Example: Z-Inspection.

  16. Public incident databases
      Evidence about real harms is highly relevant to trust, and also to R&D, regulation and policy.
      Developers are disincentivized from sharing incidents in their own systems, especially if they cannot trust competitors to share similar incidents.
      Sharing can be a regulatory requirement, and can be facilitated through a third party that allows anonymous reporting.
      Example: the AI Incident Database.
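
As a sketch of what a shareable incident record might contain, the snippet below builds a minimal JSON report with an optional, anonymous reporter field; the field names are illustrative and are not the schema used by the AI Incident Database.

```python
import json
from datetime import date

def incident_record(system, harm_type, description, reporter_id=None):
    """Build a minimal, shareable incident record; omitting reporter_id keeps
    the report anonymous, matching the third-party reporting model above."""
    return {
        "date_reported": date.today().isoformat(),
        "system": system,
        "harm_type": harm_type,          # e.g. "accident", "misuse", "bias"
        "description": description,
        "reporter_id": reporter_id or "anonymous",
    }

print(json.dumps(
    incident_record("example-chatbot", "bias",
                    "Systematically worse answers for one dialect group."),
    indent=2))
```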

  17. How the mechanisms fit together

  18. Thank you! Check out the paper (DOI: 10.1126/science.abi7176) and the full report (arXiv:2004.07213). Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung
