AI Technical Testing Best Practices Overview

Slide Note

Explore the current structure and key elements of Deliverable 7.2 AI Technical Test Specification presented during the E-meeting. Dive into best practices in AI testing, common test types, and principles of software testing. Understand the nuances of functional, non-functional, black-box, and maintenance testing in the context of artificial intelligence. Discover how to assess and test an AI assessment platform efficiently.

Uploaded by bhar on Sep 20, 2024


Presentation Transcript

FGAI4H-L-051
E-meeting, 19-21 May 2021
Source: Editor 7.2; WG-DAISAM
Title: DEL7.2: AI Technical Test Specification - Progress Review
Purpose: Discussion
Contact: Auss Abbood, Robert Koch Institute, Berlin, Germany
E-mail: abbooda@rki.de
Abstract: This PPT contains the current structure of the Deliverable 7.2 AI technical test specification.

Deliverable: AI Technical Test Specification
Auss Abbood
Robert Koch-Institute, Berlin, Germany
Online, 20th May 2021

Motivation
What are best practices in AI testing that TGs can adapt?
Once we have an assessment platform running, how do we test it?
AI Technical Test Specification - FG-AI4L 3

Background
Contains the state of the art (SOTA) in testing as described by books, the International Software Testing Qualifications Board (ISTQB), and ISO/IEC/IEEE standards
A summary of commonly used terms and principles in software testing
Already slightly filtered for our purpose

Background
Test Type: Explanation
Functional Testing: Tests what the system should do by specifying a precondition, running the code, and comparing the result of the execution with a postcondition. It is applied at every test level, although by acceptance testing most implemented functions should already work. Coverage is a measure of the thoroughness of functional testing.
Non-functional Testing: Tests how well a system performs. This includes usability, performance efficiency, security, and the other quality characteristics listed in ISO/IEC 25010. It can be performed at all test levels. Coverage for non-functional testing means how many of these characteristics were tested.
White-box Testing: Tests the internal structure of a system or its implementation. It is mostly applied in component and system testing. Coverage in this test measures the proportion of code components that have been tested as part of component and [...]
Black-box Testing: In contrast to white-box testing, the software is treated as a black box, with no knowledge of how it achieves its intended functionality. Only the output is compared with the expected output or behaviour. The advantage is that no programming knowledge is required, so black-box testing is well suited to detecting biases that arise when only programmers write and test software. It can be applied at all test levels.
Maintenance Testing: Tests changes to already delivered software for functional and non-functional quality characteristics.
Static Testing: A form of testing that does not execute code but examines the system manually, e.g. through reviews, linters, or formal proofs of the program.
Change-related Testing: Tests whether changes corrected errors (confirmation testing) or caused new ones (regression testing). It can be applied at all test levels.
Destructive Testing: Aims to make the software fail by providing unintended inputs, which tests the robustness of the software. It can be applied at all test levels.
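As an illustration of three of the test types above, the following is a minimal Python sketch, not part of the deliverable. The function risk_score, its scoring rule, and the recorded baseline values are hypothetical stand-ins for a system under test. The functional test checks a postcondition for valid inputs, the regression test compares current outputs with previously recorded ones, and the destructive test feeds unintended inputs and expects a controlled failure.

```python
import math


def risk_score(age, symptom_count):
    """Hypothetical system under test: returns a score in [0, 1]."""
    if not isinstance(age, int) or not isinstance(symptom_count, int):
        raise TypeError("age and symptom_count must be integers")
    if age < 0 or symptom_count < 0:
        raise ValueError("age and symptom_count must be non-negative")
    return min(1.0, 0.01 * age + 0.1 * symptom_count)


def test_functional():
    # Precondition: valid inputs. Postcondition: the expected score.
    assert math.isclose(risk_score(30, 2), 0.5)
    assert risk_score(90, 5) == 1.0  # the score is capped at 1.0


# Hypothetical outputs recorded from an earlier, accepted release.
BASELINE = {(0, 0): 0.0, (30, 2): 0.5, (90, 5): 1.0}


def test_regression():
    # A change must not alter outputs that were previously accepted.
    for inputs, expected in BASELINE.items():
        assert math.isclose(risk_score(*inputs), expected)


def expect_error(error_type, *args):
    """Assert that risk_score rejects args with the given error type."""
    try:
        risk_score(*args)
    except error_type:
        return
    raise AssertionError(f"{args} did not raise {error_type.__name__}")


def test_destructive():
    # Unintended inputs should fail in a controlled way, not silently.
    expect_error(ValueError, -1, 0)
    expect_error(ValueError, 0, -5)
    expect_error(TypeError, None, 1)
    expect_error(TypeError, "40", 1)
    # Extreme but valid inputs must still produce a score in range.
    assert 0.0 <= risk_score(10**9, 10**9) <= 1.0


def run_all():
    """Run every test and return the names of those that passed."""
    tests = [test_functional, test_regression, test_destructive]
    for test in tests:
        test()
    return [test.__name__ for test in tests]
```

All three tests treat risk_score as a black box: they only compare outputs with expectations and never inspect its implementation.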

What does functional testing typically not cover?
Testing will mostly be black-box testing.
What is the difference between black-box software and black-box AI?
Testing should appreciate the connection between data, software, hardware, and AI
MLflow, Docker, Sacred, etc. can help
device-specific properties of produced data
BUT, not all forms of input can be tested (General Principles of Software Validation; Final Guidance for Industry and FDA Staff).
When is our testing done?
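One way to make the connection between data, software, and hardware visible in a test report is to store a description of the environment next to each result. The sketch below uses only the Python standard library; it does not use the API of MLflow, Docker, or Sacred, which offer fuller tracking. The function names, the model_version argument, and the record layout are assumptions made for illustration.

```python
import hashlib
import json
import platform


def fingerprint(records):
    """Return a SHA-256 digest of the records in a canonical JSON form."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def describe_test_run(records, model_version):
    """Collect software, hardware, and data details for one test run."""
    return {
        "python": platform.python_version(),  # software environment
        "os": platform.system(),
        "machine": platform.machine(),        # hardware architecture
        "model_version": model_version,       # hypothetical version label
        "data_sha256": fingerprint(records),  # identifies the exact test data
        "n_records": len(records),
    }


# Hypothetical test data: two records with a device field.
records = [
    {"device": "scanner-A", "value": 0.7},
    {"device": "scanner-B", "value": 0.4},
]
run_info = describe_test_run(records, model_version="0.1.0")
```

If two test runs disagree, comparing their run_info entries shows whether the data, the software environment, or the machine differed.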

What does non-functional testing typically not cover?
Many questions overlap with other deliverables (e.g., 5 or 7.3)
Inadequate input does not necessarily break the AI. How do we test this?
Testing should include tests for biases, data leakage, etc.
Leaderboard probing
Data aggregation or missing data
Vulnerable metrics
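Two of the checks named above, data leakage and bias, can be sketched in a few lines of Python. This is an illustrative sketch under simple assumptions, not the procedure of the deliverable: leakage is reduced to exact duplicates between training and test records (ignoring a hypothetical id field), and bias is reduced to a difference in accuracy between groups. All function and field names are hypothetical.

```python
import hashlib
import json


def record_key(record, ignore=("id",)):
    """Hash a record's content, skipping fields such as identifiers."""
    kept = {k: v for k, v in record.items() if k not in ignore}
    payload = json.dumps(kept, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_leaked_records(train, test):
    """Return test records whose content also appears in the training set."""
    train_keys = {record_key(r) for r in train}
    return [r for r in test if record_key(r) in train_keys]


def accuracy_by_group(labels, predictions, groups):
    """Return the accuracy per group so that gaps become visible."""
    totals, correct = {}, {}
    for label, prediction, group in zip(labels, predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (label == prediction)
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical data: the first test record duplicates a training record.
train = [{"id": 1, "age": 40, "label": 1}, {"id": 2, "age": 50, "label": 0}]
test = [{"id": 9, "age": 40, "label": 1}, {"id": 10, "age": 61, "label": 0}]
leaked = find_leaked_records(train, test)

# Hypothetical predictions for two groups, "a" and "b".
per_group = accuracy_by_group(
    labels=[1, 0, 1, 1],
    predictions=[1, 0, 0, 0],
    groups=["a", "a", "b", "b"],
)
```

Here leaked contains the test record with id 9, and per_group shows perfect accuracy for group "a" and none for group "b". Real leakage is often subtler than exact duplicates, for example several records from the same patient, so a check like this is a lower bound rather than a guarantee.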