Real-World Concurrency Bugs and Detection Strategies
Explore the complexities of real-world concurrency bugs through a study of 105 bugs from major open-source programs. Learn about bug patterns, manifestation conditions, diagnosing strategies, and fixing methods to improve bug detection and avoidance. Gain insights from methodologies evaluating applications like MySQL, Apache, Mozilla, and OpenOffice, revealing bug sources, classifications, and patterns like atomicity and order violations.
Uploaded on Sep 06, 2024 | 0 Views
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Microsoft Research Faculty Summit 2008 Faculty Summit 2008 Microsoft Research
Learning from Mistakes Real World Concurrency Bug Characteristics Yuanyuan(YY) Zhou Associate Professor University of Illinois, Urbana-Champaign
How to Improve the State-of-Art? Concurrency bug detection What types of bugs are un-solved? How bugs are diagnosed in practice? Concurrent program testing What are the manifestation conditions for concurrency bugs? Concurrent program language design How many mistakes can be avoided What other support do we need? All can benefit from a closer look at real world concurrency bugs need to know need to know
Learning from Mistakes 105 real-world concurrency bugs from 4 large open source programs Study from 4 dimensions Bug patterns Manifestation condition Diagnosing strategy Fixing methods Details in our ASPLOS 08 paper
Methodology: Evaluated Applications MySQL Apache Mozilla OpenOffice Server Server Client GUI Software Type Language C++/C Mainly C C++ C++ LOC (M line) 2 0.3 4 6 Bug DB history 6 years 7 years 10 years 8 years 5
Methodology: Bug Sources MySQL Apache Mozilla OpenOffice Total Non-deadlock 14 13 41 6 74 Deadlock 9 4 16 2 31 Limitations No scientific computing applications No JAVA programs Never-enough bug samples
Non-Deadlock Bug Pattern Classified based on root causes Categories Atomicity violation The desired atomicity of certain code region is violated Order violation The desired order between two (sets of) accesses is flipped Others Pattern Thread 1 Thread 2 X Thread 1 Thread 2 X
Non-Deadlock Bug Pattern Characteristics Implications 50 We should focus on atomicity violation and order violation 40 OpenOffice Mozilla Apache MySQL 30 Bug detection tools for order violation bugs are desired 20 10 0 Atomicity Order Other *There are 3-bug overlap between Atomicity and Order
An Order-Related Bug Example Note that order violations can be fixed by adding locks to ensure atomicity with the previous operation to ensure order. But the root cause is the incorrect assumption about execution order.
Another Example OK Woops!
How to Trigger a Bug? Manifestation Bug manifestation condition A specific execution order among a smallest set of memory accesses The bug is guaranteed to manifest, as long as the condition is satisfied How many threads are involved? How many variables are involved? How many accesses are involved? 11
Single Variable vs. Multiple Variable MySQL Apache Mozilla OpenOffice Findings 60 49 (66%) 40 # of Bugs 25 (34%) 20 0 1 Variable > 1 Variables Implications Single variables are more common The widely-used simplification is reasonable Multi-variable concurrency bugs are non-negligible Techniques to detect multi-variable concurrency bugs are needed 12
Multi-Variable Concurrency Bug Example Thread 1 Thread 2 js_FlushPropertyCache ( ) { js_PropertyCacheFill ( ) { memset ( cache table, 0, SIZE); cache table[indx] = obj; cache } empty = TRUE; cache } empty = FALSE; Control the order among accesses to any one variable can not guarantee the bug manifestation 13
Number of Accesses/Operations Involved Deadlock bugs Non-deadlock bugs MySQL Apache Mozilla OpenOffice MySQL Apache Mozilla OpenOffice 35 25 Double lock 30 20 25 # of Bugs 15 20 7 (9%) 15 10 10 1 (3%) 5 5 0 0 1 acc. 2 acc. 3 acc. 4 acc. >4 acc. 1 acc. 2 acc. 3 acc. 4 acc. >4 acc. Concurrent program testing can focus on small groups of accesses The testing target shrinks from exponential to polynomial 14
Number Threads Involved 101 out of 105 (96%) bugs involve at most two threads Most bugs can be reliably disclosed if we check all possible interleaving between each pair of threads Few bugs cannot Example: Intensive resource competition among many threads causes unexpected delay
How Were Non-Deadlock Bugs Fixed? Fix Adding/changing locks 20 (27%) Condition check Data-structure change 19 (26%) Code switch Other 19 (26%) 20 (27%) 10 (13%) 6 ( 8%) No silver bullet for fixing concurrency bugs. Lock usage information is not enough to fix bugs. Implications
How Were Deadlock Bugs Fixed? Fix Give up resource acquisition Change resource acquisition order 7 (23%) Split the resource to smaller ones 1 ( 3%) Others 19 (61%) 4 (13%) Might introduce non-deadlock bugs Implications We need to pay attention to the correctness of ``fixed deadlock bugs 17
Other Findings Impact of concurrency bugs ~ 70% leads to program crash or hang Reproducing bugs are critical to diagnosis Many examples Programmers lack diagnosis tools No automated diagnosis tools mentioned Most are diagnosed via code review Reproduce bugs are extremely hard and directly determines the diagnosing time 60% 1st-time patches still contain concurrency bugs (old or new) Usefulness and concerns of transactional memory Referto our paper for more details and examples
Summary Bug detection needs to look at order-violation bugs and multi-variable concurrency bugs Testing can target at more realistic interleaving coverage goals Fixing concurrency bugs is not trivial and not easy to get right Support from automated tools is needed Diagnosing tools are needed, especially bug reproducing tools Current bug detection tools are not widely used 19
Some Follow-up Work AVIO: atomicity violation detection [ASPLOS 06] Automatically extracting interleaving invariants to detect violations Multi-variable concurrency bug detection [SOSP 07] Mining variable correlations and then feed them to bug detectors Concurrency interleaving testing coverage criteria [FSE 07]