On-Orbit Anomaly Research at NASA: Causes and Solutions
On-Orbit Anomaly Research (OOAR) at NASA's IV&V Facility involves studying mishaps related to space missions, identifying anomalies, and improving IV&V processes. The research delves into the causes of anomalies, such as operating system faults, and proximate causes like software deficiencies. Detailed analyses show how anomalies like 64-bit floating point errors can occur and be traced back to incorrect implementations within commercial off-the-shelf (COTS) operating systems.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Joel Abraham* On-Orbit Anomaly Research NASA IV&V Facility Fairmont, WV 4thInternational Workshop on Independent Verification & Validation of Software September 11 - 13, 2012 Morgantown, WV * Intern - West Virginia Wesleyan College; Buckhannon, WV
Agenda On-Orbit Anomaly Research Description of Anomaly Causes of Anomaly Root Cause: Operating System Fault Proximate Cause: FSW Bounds-Checking Deficiency IV&V Observations 2
On-Orbit Anomaly Research (OOAR) at NASA IV&V Facility Study anomalies and mishaps associated with NASA space missions Assist with on-going NASA IV&V analysis of heritage software with past faulty history Help improve NASA IV&V processes 3
Sequential Causes of Anomaly (contd) In summary: ANOMALY 64-bit floating-pt. Vx too big to convert to a 32-bit integer Proximate Cause x component of velocity, Vx, became too large cos( ) and x component of position flipped signs fmod( ,2 ) returned incorrect value, off by Incorrect Implementation of fmod in the (COTS) OS Root Cause 7
Root Cause: Operating System Fault fmod Overview fmod (a , b) returning remainder of a b Example: fmod (16 , 3) = 1 16 = (3 x 5) + 1 Human implementation: Calculate the remainder r by finding the largest integer n in 16 = (3 x n) + r such that: 0 r = 16 (3 x n) < 3 8
Root Cause: Operating System Fault fmod Overview 9
Root Cause: Operating System Fault OS Comparison of Numbers in fmod 10
Root Cause: Operating System Fault OS Comparison of Numbers in fmod 11
Root Cause: Operating System Fault OS Comparison of Numbers in fmod For certain values of slightly larger than odd multiples of 2 , fmod ( , 2 ) failed at the last steps: 12
Proximate Cause: FSW Bounds- Checking Deficiency Requirement: Transform (cast) double-precision floating-point 64- bit value into int16 16-bit signed integer FSW implementation: Conversion performed in two steps: 64-bit floating-point to 32-bit signed integer 32-bit signed integer to 16-bit signed integer 64-bit value: -14900351502.5752239 Requiring a 35-bit signed integer to properly convert 32-bit signed integer not able to hold values less than -2147483648 Failure to transform into 32-bit signed integer 13
Proximate Cause: FSW Bounds- Checking Deficiency No bounds-checking, of values to be typecast, performed by FSW Anomaly could have been prevented if 64-bit floating-point value checked 14
OOAR Observations To prevent similar anomalies in the future: IV&V verification of bounds checking in flight software Apply IV&V to code external to flight software 15
OOAR Observations (contd) Apply IV&V to Code External to Flight Software (FSW) Ex: OS running FSW, an integral element of S/C operation May be the weak link if not analyzed along with FSW Issues with FSW may originate in external code, e.g., fmod bug in OS New software interacting with FSW may warrant IV&V analysis Critical space missions may justify analyzing external code Past problems (e.g., a specific OS) may indicate a need for further assessment. 16
OOAR Observations (contd) Apply IV&V to Code External to FSW (cont d): Challenges Full-cycle analysis may not be possible, e.g., OS already developed Comprehensive testing may be only option Testing cannot catch all bugs Ex.: fmod bug becoming active on the S/C once every two years at two instances a few seconds apart, which may or may not become an issue depending on whether a SW application happens to consume the bug at those instances Bugs may be too subtle to be detected easily (cf. fmod bug) Limited IV&V resources to do both FSW and external software analysis May not be cost-effective to analyze external software with long, successful track records 17
OOAR Observations (contd) IV&V Verification of Bounds Checking in FSW Examples of bounds checking: Array index checking Checking for division by zero Screening for taking the square root of a negative number Bounds checking effective as fault protection in the code Bounds checking of typecasting would have prevented the fmod bug from leading to an anomaly Some compilers may not provide automatic run-time bounds checking. Verification of bounds checking, e.g., array index checking, readily performed by static code analysis tools Relatively easy to manually inspect the code to verify bounds-checking of pre-identified operations, e.g., typecasting, square root, etc. 18
OOAR Observations (contd) IV&V Verification of Bounds Checking in FSW (cont d): Challenges Some bounds checking requires dynamic code analysis tools Logistics of dynamic code analysis complicated Correct configuration of code analysis tools may also require significant time investment 19
Questions? Thank you for your participation! Contact Info: Joel Abraham [Joel.A.Abraham@gmail.com] Joseph Painter [Joseph.D.Painter@ivv.nasa.gov] Koorosh Mirfakhraie [Koorosh.Mirfakhraie@ivv.nasa.gov] Ken Costello [Kenneth.A.Costello@nasa.gov] Sam Cilento [Salvatore.J.Cilento@ivv.nasa.gov] 20