Enhancing Mobile App Testing Strategies for Quality Assurance


Innovative approaches to testing mobile apps are crucial given the dynamic nature of the app market and rising user expectations. This presentation covers guided, stochastic model-based GUI testing, the challenges of testing mobile apps, a simple cookbook app used as a running example, and existing research on random testing/fuzzing. Methods such as model-based testing, evolutionary algorithms, and symbolic execution are explored to improve app quality and ensure a better user experience.





Presentation Transcript


  1. Guided, Stochastic Model-Based GUI Testing of Android Apps. Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, Zhendong Su. ESEC/FSE 2017, Paderborn, Germany. 2024/9/22 1

  2. Mobile Apps
  Mobile apps (Android, iOS, ...) are ubiquitous: 3 million+ apps on Google Play, and 50K+ new apps each month. They are event-centric programs that accept inputs via a Graphical User Interface (GUI), and they run in a complex environment with much interplay: users, fragmentation (different OSes, SDKs, sensors), and other apps. Under time-to-market pressure, developers deliver apps as quickly as possible to compete, so apps may not be thoroughly tested before release: only usage scenarios believed to be important are tested, environment interplay is tested inadequately, and code coverage stays low.

  3. A Simple Cookbook App -- Bites
  Create recipes and fill in cooking methods. Share recipes via SMS or email, and send ingredients via SMS. Interact with a third-party app to manage the shopping list. Add recipes from SMS messages or from files downloaded via the browser.
  Bites: https://code.google.com/archive/p/bites-android/

  4. Mobile Apps
  Mobile apps (Android, iOS, ...) are ubiquitous: 3 million+ apps on Google Play, and 50K+ new apps each month. They are event-centric programs that accept inputs via a Graphical User Interface (GUI), and they run in a complex environment that interplays with users, fragmentation (different OSes, SDKs, sensors), and other apps.
  However, ensuring app quality is challenging: time-to-market pressure, manual testing, testing only the usage scenarios believed to be important, and inadequate consideration of the effects of environment interplay.

  5. Existing Research Work
  Random Testing/Fuzzing: Google Monkey, Dynodroid [FSE'13]
  Symbolic Execution: ACTEve [FSE'12], JPF-Android [SSEN'12]
  Evolutionary (Genetic) Algorithms: EvoDroid [FSE'14], Sapienz [ISSTA'16]
  Model-Based Testing (MBT): GUIRipper [ASE'12], ORBIT [FASE'13], A3E [OOPSLA'13], SwiftHand [OOPSLA'13], PUMA [MobiSys'14], MobiGuitar [IEEE Software'15]
  Other Approaches: MonkeyLab [MSR'15], CrashScope [ICST'16], TrimDroid [ICSE'16]

  6. Challenges for MBT
  Path explosion: Bites, at 1027 LOC, has a model with 21 states and 70 transitions, yielding 6 one-event tests, 36 two-event tests, and 567K three-event tests. Generating and executing exhaustive tests from the model is infeasible and ineffective.
  No app model is available: existing reverse-engineering tools achieve fairly low coverage (only half of the coverage achieved by Monkey) due to incomplete UI exploration.
  Inadequate testing strategies: they consider only UI-level events and neglect system-level events; they target only model-based coverage and neglect code coverage.

  7. Our Approach
  Stoat (Stochastic model App Tester): a guided, stochastic model-based GUI testing approach, implemented as a fully automatic tool for testing/fuzzing Android apps. Given an app as input:
  1. Model Construction: use dynamic/static analysis to learn a stochastic model.
  2. Test Generation: adopt Gibbs sampling to iteratively mutate/refine the model and guide testing toward fruitful regions, validating apps with various user/system-level events.

  8. Evaluation Results
  Subjects: 93 open-source Android apps from F-Droid, and 1661 of the most popular closed-source apps from Google Play.
  Results: models produced by Stoat cover 17~31% more code than those by MobiGuitar and PUMA; Stoat detects 3X more unique crashes than Monkey and Sapienz; Stoat detects 2110 unique, previously unknown crashes in the 1661 Google Play apps.
  F-Droid: https://f-droid.org/packages/  Google Play: https://play.google.com/store/apps?hl=en

  9. Workflow of Our Approach

  10. Test Suite Optimization (Sampling)
  [Figure: the model M_0 is iteratively mutated into M_i, M_{i+1}, ... by perturbing transition probabilities (e.g., p1 = 0.4, p2 = 0.6, p3 = 0.2, ...) over states S0..S4. From each model M_i a test suite T_i is generated and executed, and metric values f_i are computed. If f_{i+1} improves on f_i, M_{i+1} is selected as the model for the next mutation; otherwise, M_{i+1} is discarded with a certain probability and M_i is selected as the model for the next mutation.]

  11. Our Approach: Model Construction (Dynamic Analysis, Static Analysis); Gibbs Sampling

  12. Our Approach: Model Construction (Dynamic Analysis, Static Analysis); Gibbs Sampling

  13. Model Construction
  A stochastic finite state machine M, defined as a 5-tuple M = (Q, Σ, δ, s0, F):
  Q: the set of app states; a state s ∈ Q.
  Σ: the set of input events; an event e ∈ Σ.
  δ: Q × Σ → P(Q × [0, 1]), the probabilistic transition function.
  s0 ∈ Q: the starting app state.
  F ⊆ Q: the set of final states.
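The 5-tuple above can be sketched as a small data structure. This is a minimal illustration, not Stoat's implementation; all names (`State`, `StochasticModel`, `step`) are hypothetical.

```python
import random
from dataclasses import dataclass, field

@dataclass(frozen=True)
class State:
    page_id: str  # abstract app page (widget-hierarchy signature)

@dataclass
class StochasticModel:
    states: set                               # Q
    events: set                               # Sigma
    # delta: maps (state, event) -> list of (next_state, probability)
    delta: dict = field(default_factory=dict)
    s0: State = None                          # starting app state
    finals: set = field(default_factory=set)  # F

    def step(self, s, e):
        """Pick the next state for event e at state s, weighted by probability."""
        targets = self.delta[(s, e)]
        nexts, probs = zip(*targets)
        return random.choices(nexts, weights=probs, k=1)[0]
```

Because δ maps to a *set* of (state, probability) pairs, the same event at the same state can lead to different successors, which is what makes the machine stochastic rather than deterministic.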

  14. App State
  An app state s is abstracted from an app page and represented as a widget hierarchy tree: non-leaf nodes denote layout widgets (e.g., LinearLayout), and leaf nodes denote executable widgets (e.g., Button). When a page's structure (and properties) changes, a new state is created.
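The state abstraction above can be sketched as a structural signature over the widget tree: two pages map to the same state if their tree structures match. The dict-based tree encoding and the `signature` helper are illustrative assumptions, not Stoat's actual representation.

```python
# Serialize a widget tree into a signature of class names and structure only,
# ignoring volatile content such as text, so that minor UI changes do not
# create spurious new states.
def signature(node):
    kids = ",".join(signature(c) for c in node.get("children", []))
    return f'{node["class"]}({kids})'

# Two pages with the same layout but (potentially) different text content
# abstract to the same state.
page_a = {"class": "LinearLayout", "children": [
    {"class": "TextView"}, {"class": "Button"}]}
page_b = {"class": "LinearLayout", "children": [
    {"class": "TextView"}, {"class": "Button"}]}
assert signature(page_a) == signature(page_b)  # same abstract state
```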

  15. (1) An app state is abstracted as a view hierarchy tree. (2) An app state is differentiated from others via this tree.
  [Figure: the view hierarchy tree of a Bites page, rooted at a FrameLayout, with layout widgets (TabHost, TabWidget, LinearLayout, ListView), inexecutable leaf widgets (TextViews such as "Bites", "Recipes", "Ingredients", "Method", "Eggs", "Tomatoes"), and executable leaf widgets (CheckBoxes).]

  16. Dynamic Analysis
  Goal: explore as many app behaviors as possible.
  Case study: the 50 most popular apps across 10 categories from Google Play.
  Three key observations to improve performance: the frequency of event execution, the type of events (UI events vs. navigation events), and the number of subsequent unexercised widgets.

  17. Transition Probability
  A probability value p denotes the selection weight of event e in test generation. p is initially assigned the ratio of e's observed execution count over the total execution count of all events on state s.
  Test generation from the model: start from the entry state and select the next event according to its probability value. The higher an event's probability value, the more likely it is to be selected.
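The probability-weighted walk described above can be sketched as follows. The `model` encoding (a dict from state to a list of `(event, next_state, probability)` triples) and the function name are illustrative assumptions, not Stoat's API.

```python
import random

def generate_test(model, entry_state, length=5, rng=random):
    """Random walk over the stochastic model: from each state, pick the next
    event in proportion to its transition probability."""
    test, state = [], entry_state
    for _ in range(length):
        choices = model.get(state)
        if not choices:
            break  # no outgoing events: end the test early
        probs = [c[2] for c in choices]
        i = rng.choices(range(len(choices)), weights=probs, k=1)[0]
        event, state = choices[i][0], choices[i][1]
        test.append(event)
    return test
```

Events with higher weights are selected more often, so mutating the probabilities (as in the Gibbs-sampling step) directly biases which event sequences get generated.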

  18. Static Analysis
  Static analysis identifies events that are missed by dynamic analysis: events registered on UI widgets (e.g., setOnLongClickListener) and events implemented by overriding class methods (e.g., onCreateOptionsMenu).
  Model compaction: identify structurally different pages as different states and merge similar ones, omitting minor UI changes (e.g., text changes, UI property changes).

  19. A Simple Cookbook App -- Bites

  20. Model of Bites Produced by Stoat

  21. Our Approach: Model Construction (Dynamic Analysis, Static Analysis); Gibbs Sampling (optimization problem, objective function)

  22. Gibbs Sampling
  The Metropolis-Hastings algorithm is one of the Markov chain Monte Carlo (MCMC) methods, a class of algorithms for drawing samples from a desired probability distribution p(x) for which direct sampling is difficult. Gibbs sampling is a special case of the Metropolis-Hastings algorithm, designed to draw samples when p(x) is a joint distribution of multiple random variables.

  23. Gibbs Sampling (Cont.)
  Sampling acceptance ratio: α = min(1, (p(x′) q(x | x′)) / (p(x) q(x′ | x))), which simplifies to α = min(1, p(x′) / p(x)) when the proposal q is a symmetric function.
  In our setting, a sample x is a stochastic model M with a test suite T generated from M; the proposal q is taken as a constant (hence symmetric), and p(x) is defined via an exponential distribution over the objective (optimization) function f, i.e., p(M) ∝ exp(−f(T)).

  24. Guided Test Generation
  We reduce guided testing to an optimization problem: let all transition probabilities be random variables, and draw samples by iteratively mutating them, so that each stochastic model is a sample. This allows samples to be drawn more often from the region of good stochastic models.
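The mutate/evaluate/accept loop described above can be sketched in the spirit of Metropolis-Hastings with a symmetric proposal. This is a simplified illustration: here the objective f is assumed to be maximized, a worse mutant is kept only with probability exp(f_cand − f_cur), and `mutate()` and `evaluate()` are hypothetical stand-ins for Stoat's model mutation and test-suite evaluation.

```python
import math
import random

def mh_search(model, mutate, evaluate, iterations=100, rng=random):
    """Iteratively mutate the model, keeping improvements always and
    regressions only probabilistically, and track the best model seen."""
    current, f_cur = model, evaluate(model)
    best, f_best = current, f_cur
    for _ in range(iterations):
        candidate = mutate(current)
        f_cand = evaluate(candidate)
        # accept improvements; accept regressions with exp(f_cand - f_cur)
        if f_cand >= f_cur or rng.random() < math.exp(f_cand - f_cur):
            current, f_cur = candidate, f_cand
        if f_cur > f_best:
            best, f_best = current, f_cur
    return best, f_best
```

Occasionally accepting worse models is what lets the search escape local optima instead of greedily hill-climbing.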

  25. Objective Function
  By "good", we mean that we favor test suites that achieve high coverage and contain diverse event sequences: such suites trigger more program states and behaviors, and thus increase the chance of detecting bugs.

  26. Objective Function (Cont.)
  Three key metrics:
  Code coverage measures how thoroughly the app code is tested: line coverage for open-source apps, and method coverage for closed-source apps. (Ting Su, Ke Wu, Weikai Miao, Geguang Pu, Jifeng He, Yuting Chen, and Zhendong Su. A Survey on Data-Flow Testing. ACM Comput. Surv., 2017. Hong Zhu, Patrick A. V. Hall, and John H. R. May. Software Unit Test Coverage and Adequacy. ACM Comput. Surv., 1997.)
  Model coverage measures how completely the app model is covered: event coverage of the app model. (Atif M. Memon, Mary Lou Soffa, and Martha E. Pollack. Coverage Criteria for GUI Testing. FSE, 2001.)
  Test diversity measures how diverse the event sequences in the test suite are: convert event sequences into vectors and compute their cosine similarity as diversity. (Borislav Nikolik. Test Diversity. IST, 2006. Qing Xie and Atif M. Memon. Studying the Characteristics of a "Good" GUI Test Suite. ISSRE, 2006.)
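The diversity metric above can be sketched as follows: encode each event sequence as an event-count vector over a shared alphabet, then average the pairwise cosine similarities (lower average similarity means a more diverse suite). The exact vectorization is an assumption; Stoat's encoding may differ.

```python
import math

def to_vector(seq, alphabet):
    """Encode an event sequence as counts over a fixed event alphabet."""
    return [seq.count(e) for e in alphabet]

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_similarity(suite):
    """Average pairwise cosine similarity across all tests in the suite."""
    alphabet = sorted({e for seq in suite for e in seq})
    vecs = [to_vector(s, alphabet) for s in suite]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)
```

Two identical sequences have similarity 1.0, while sequences with disjoint events have similarity 0.0, so an objective that rewards low average similarity pushes the sampler toward suites exercising different event combinations.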

  27. Evaluation
  RQ1: Model Construction. RQ2: Code Coverage. RQ3: Fault Detection. RQ4: Usability and Effectiveness.

  28. RQ1: Model Construction
  Comparison tools: MobiGuitar-Systematic ("M-S"), breadth-first exploration of GUIs; MobiGuitar-Random ("M-R"), random exploration of GUIs; PUMA ("PU"), which sequentially explores GUIs and stops when all app states have been visited; and Stoat ("St"), weighted UI exploration plus static analysis.
  Subjects: 93 open-source apps from F-Droid (68 widely used subjects from previous work and 25 subjects randomly selected from F-Droid).

  29. Results of RQ1
  Stoat covers more app behaviors and produces more complete models: Stoat covers 31% and 17% more code than M-S and M-R, respectively, and 23% more than PUMA.
  Results of model construction: line coverage.

  30. Results of RQ1
  Stoat achieves higher code coverage, but its models remain compact.
  Results of model construction: model complexity.

  31. RQ2: Code Coverage
  Comparison tools: A3E (systematic exploration, "A"), Monkey (random testing, "M"), Sapienz (genetic algorithm, "Sa"), and Stoat (model-based testing, "St").
  Subjects: 93 open-source apps from F-Droid.

  32. Results of RQ2
  A3E, Monkey, Sapienz, and Stoat achieve 25%, 52%, 51%, and 60% line coverage, respectively.
  Results of code coverage, grouped by app size.

  33. RQ3: Fault Detection
  Results of the pair-wise comparison of app crashes detected by Monkey, Sapienz, and Stoat.

  34. Comparison with Monkey/Sapienz
  Finding 1: Stoat is more effective in UI exploration. Sapienz and Stoat take on average 56 and 60 minutes, respectively, to finish the initial phase, and require 45 and 23 minutes, respectively, to reach peak coverage.

  35. Comparison with Monkey/Sapienz
  Finding 2: Stoat is more effective in detecting deep crashes. Sapienz generates new tests by randomly crossing over and mutating sequences; it may produce many "infeasible" ones and is less likely to reach deep code. Stoat guides test generation from an app's behavior model (which captures all possible compositions of events), so it is more likely to generate meaningful and diverse sequences that reveal deep bugs.

  36. Comparison with Monkey/Sapienz
  Finding 3: System-level events can reveal more unexpected crashes. The app Mileage crashed with an IllegalArgumentException when Stoat launched its chart activities and sent them empty intents: it takes the null values directly into database queries without any sanitization.

  37. RQ4: Usability and Effectiveness
  Subjects: the 1661 most popular apps from Google Play.
  Results: Stoat detected 2110 unique, previously unknown crashes in 691 apps; 452 crashes were found during model construction, 1927 during Gibbs sampling, and 269 in both phases.
  Distribution of found app crashes.

  38. RQ4 (Cont.)
  Results: 43 developers have replied that they are investigating our bug reports; 20 of our reported crashes have been confirmed, and 8 have already been fixed.

  39. Conclusion
  Goal: thoroughly test the functionalities of an app and validate the app's behavior by enforcing various user/system-level interactions.
  Proposal: Stoat (Stochastic model App Tester), a guided, stochastic model-based GUI testing approach combining model construction (weighted UI exploration, static analysis) with guided test generation (Gibbs sampling-guided optimization).
  Stoat is available at https://tingsu.github.io/files/stoat.html
