The Clang Static Analyzer by Vince Bridgers

Using the Clang Static Analyzer
Vince Bridgers

About this tutorial
“Soup to nuts” – Small amount of theory to a practical example
Why Static Analysis?
Static Analysis in Continuous Integration
What is Cross Translation Unit Analysis, and how Z3 can help
Using Clang Static Analysis on an Open Source Project
 
Why tools like Static Analysis? Cost of bugs
Notice most bugs are introduced early in the development process, and are coding and design problems.
Most bugs are found during unit test, where the cost is higher.
The cost of fixing bugs grows exponentially after release.
Conclusion: The earlier bugs are found, and the more bugs are found early in the development process, the lower the cost.
Source: Applied Software Measurement, Capers Jones, 1996

Finding Flaws in Source Code
Compiler diagnostics
Code reviews
“Linting” checks, like Clang-tidy
Static Analysis using Symbolic Execution
Analysis performed by executing the code symbolically through simulation
Dynamic Analysis – Examples include UBSAN, TSAN, and ASAN
Analysis performed by instrumenting and running the code on a real target
Difficult to test the entire program, and all paths – dependent upon test cases
 
Four Pillars of Program Analysis

Compiler diagnostics
  Examples: Clang, gcc, cl
  False positives: No
  Inner workings: Programmatic checks
  Compile and runtime effects: None

Linters, style checkers
  Examples: Lint, clang-tidy, Clang-format, indent, sparse
  False positives: Yes
  Inner workings: Text/AST matching
  Compile and runtime effects: Extra compile step

Static Analysis
  Examples: Cppcheck, gcc 10+, clang
  False positives: Yes
  Inner workings: Symbolic Execution
  Compile and runtime effects: Extra compile step

Dynamic Analysis
  Examples: Valgrind, gcc and clang
  False positives: Not likely, but possible
  Inner workings: Injection of runtime checks, library
  Compile and runtime effects: Extra compile step, extended run times

Typical CI Loop with Automated Analysis
[Figure: CI loop with the stages Code Change, Automated Program Analysis, Manual Code Review, Test, and Ready to commit; arrows labelled "Quick Feedback" and "Report coding errors"]
Syntax, Semantic, and Analysis Checks:
Can analyze properties of code that cannot be tested (coding style)!
Automates and offloads portions of manual code review
Tightens up CI loop for many issues

Finding bugs with the Compiler
Static analysis can find deeper bugs through program analysis techniques – like memory leaks,
buffer overruns, logic errors.
 
 
 
 
#include <stdio.h>
int main(void) {
    printf("%s%lb%d", "unix", 10, 20);
    return 0;
}

$ clang t.c
t.c:3:17: warning: invalid conversion specifier 'b' [-Wformat-invalid-specifier]
    printf("%s%lb%d", "unix", 10, 20);
              ~~^
t.c:3:35: warning: data argument not used by format string [-Wformat-extra-args]
    printf("%s%lb%d", "unix", 10, 20);
           ~~~~~~~~~              ^
2 warnings generated.

Finding bugs with the Analyzer
This example compiles fine – but there are errors here.
Static analysis can find deeper bugs through program analysis techniques
This one is simple, but imagine a large project – thousands of files, millions of lines of code
 
 
 
int function(int b) {
    int a, c;
    switch (b) {
        case 1: a = b / 0; break;
        case 4: c = b - 4;
                a = b/c; break;
    }
    return a;
}

Program Analysis vs Testing
“Ad hoc” Testing usually tests a subset of paths in the program.
Usually “happy paths”
May miss errors
It’s fast, but real coverage can be sparse
Same is true for other testing methods such as Sanitizers
All used together – a useful combination
 
 
 
Program Analysis vs Testing
Program analysis can exhaustively explore all execution paths
Reports errors as traces, or “chains of reasoning”
Downside – doesn’t scale well – path explosion
Path Explosion mitigation techniques …
Bounded model checking – breadth-first search approach
Depth-first search for symbolic execution
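
To put a number on the path-explosion bullet above: a function containing n independent two-way branches in sequence has 2^n distinct paths. The sketch below is only that arithmetic, written in C; `count_paths` is a hypothetical helper name for illustration and is not part of any analyzer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct execution paths through
 * `branches` independent if/else statements placed one after another.
 * Each branch doubles the number of paths, so the result is 2^branches. */
uint64_t count_paths(unsigned branches) {
    assert(branches < 64);          /* keep the shift within 64 bits */
    return UINT64_C(1) << branches;
}
```

Ten such branches already give 1024 paths and forty give more than 10^12, which is why an analyzer bounds its search instead of enumerating every path.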
 
 
 
Clang Static Analyzer (CSA)
The CSA performs context-sensitive, inter-procedural analysis
Designed to be fast to detect common mistakes
Speed comes at the expense of some precision
Normally, clang static analysis works within the boundary of a single translation unit.
With additional steps and configuration, static analysis can use multiple translation units.
 
 
 
Clang Static Analyzer - Symbolic Execution
Finds bugs without running the code
Path sensitive analysis
CFGs used to create exploded graphs of simulated control flows

int function(int b) {
    int a, c;
    switch (b) {
        case 1: a = b / 0; break;
        case 4: c = b - 4;
                a = b/c; break;
    }
    return a;
}

[Figure: exploded graph for function(). switch(b) splits into three paths, each carrying the symbolic state b: $b. default: return of a garbage value. case 1, $b=[1,1]: a=b/0, divide by 0. case 4, $b=[4,4]: c=b-4 gives c: 0, then a=b/c, divide by 0. One node is annotated "Compiler warns here".]
Source: Clang Static Analysis - Gabor Horvath - Meeting C++ 2016
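
The graph for function() can be mimicked with a toy model: one record per path, holding the range known for b on that path, and a check that asks whether the divisor can be zero under that range. This is a sketch for intuition only and does not reflect how CSA is implemented. All names (`struct path`, `check_path`, `function_paths`, `count_findings`) are hypothetical, and the model assumes the divisor has the form scale * b + offset with scale equal to 0 or 1.

```c
#include <assert.h>
#include <stddef.h>

enum finding { FINDING_NONE, FINDING_DIVIDE_BY_ZERO, FINDING_GARBAGE_RETURN };

/* Toy description of one path through function(). */
struct path {
    const char *label;  /* switch arm taken on this path */
    int b_lo, b_hi;     /* constraint: b_lo <= b <= b_hi */
    int divides;        /* 1 if the path divides by (scale * b + offset) */
    int scale, offset;  /* divisor = scale * b + offset, scale is 0 or 1 */
    int assigns_a;      /* 1 if a is written before `return a` */
};

/* Report the first problem visible on a path under its constraint on b. */
enum finding check_path(const struct path *p) {
    if (p->divides) {
        assert(p->scale == 0 || p->scale == 1);
        int lo = p->scale * p->b_lo + p->offset;  /* smallest divisor */
        int hi = p->scale * p->b_hi + p->offset;  /* largest divisor */
        if (lo <= 0 && 0 <= hi)
            return FINDING_DIVIDE_BY_ZERO;
    }
    if (!p->assigns_a)
        return FINDING_GARBAGE_RETURN;
    return FINDING_NONE;
}

/* The three paths of function(). The range on the default path is a
 * placeholder; nothing on that path depends on it. */
static const struct path function_paths[] = {
    { .label = "case 1",  .b_lo = 1, .b_hi = 1, .divides = 1,
      .scale = 0, .offset = 0,  .assigns_a = 1 },
    { .label = "case 4",  .b_lo = 4, .b_hi = 4, .divides = 1,
      .scale = 1, .offset = -4, .assigns_a = 1 },
    { .label = "default", .b_lo = 0, .b_hi = 0, .divides = 0,
      .scale = 0, .offset = 0,  .assigns_a = 0 },
};

/* Number of paths on which check_path() reports a problem. */
size_t count_findings(void) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof function_paths / sizeof function_paths[0]; i++)
        if (check_path(&function_paths[i]) != FINDING_NONE)
            n++;
    return n;
}
```

Each of the three paths yields one finding, matching the three annotations in the figure: two divisions by zero and one garbage return value.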

Using the Clang Static Analyzer - Example 1
Basic example ….
$  clang --analyze div0.c
Runs the analyzer, outputs text report
$  clang --analyze -Xclang -analyzer-output=html -o <output-dir> div0.c
Runs the analyzer on div0.c, outputs an HTML formatted “chain of reasoning” to the
output directory.
cd to <output-dir>, firefox report* &
 
 
 
 
Using the Clang Static Analyzer - Example 2
Basic example ….
$  scan-build -V clang -c div0.c
Runs the analyzer on div0.c, brings up an HTML report
 
 
 
Clang Static Analyzer - Example 1
void f6(int x) {
    int a[4];
    if (x==5) {
        if (a[x] == 123) {}
    }
}
 
$ clang --analyze -Xclang -analyzer-output=html -o somedir check.c
check.c:6:18: warning: The left operand of '==' is a garbage value due to array index out of bounds [core.UndefinedBinaryOperatorResult]
        if (a[x] == 123) {}
            ~~~~ ^
1 warning generated.
 
Intra procedural
Array index out of bounds.
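
One way to make the warning go away is to validate the index before reading. The following is a minimal sketch, not the tutorial's code: `read_checked` and `f6_checked` are hypothetical names, and returning an int from `f6_checked` is an assumption made so the result can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a[x] into *out and returns 1 when x is a
 * valid index for an array of `len` elements; returns 0 otherwise. */
int read_checked(const int *a, size_t len, int x, int *out) {
    assert(a != NULL && out != NULL);
    if (x < 0 || (size_t)x >= len)
        return 0;
    *out = a[x];
    return 1;
}

/* Variant of f6() that never indexes past the end of a[]. The array is
 * zero-initialised so that valid reads do not see garbage either. */
int f6_checked(int x) {
    int a[4] = {0};
    int value;
    if (x == 5 && read_checked(a, 4, x, &value) && value == 123)
        return 1;
    return 0;
}
```

With x equal to 5 the helper rejects the index, so the comparison against 123 is never evaluated.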
 
 
Clang Static Analyzer - Example 2
int foobar() {
    int i;
    int *p = &i;
    return *p;
}
 
Intra procedural
‘i’ declared without an initial value
‘*p’, undefined or garbage value

Clang Static Analyzer - Example 3
#include <stdlib.h>

int process(void *ptr, int cond) {
    if (cond)
        free(ptr);
}

int entry(size_t sz, int cond) {
    void *ptr = malloc(sz);
    if (ptr)
        process(ptr, cond);

    return 0;
}
 
Analysis spans functions – said to be
“inter-procedural”
A Memory leak!
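
The leak occurs on the path where cond is zero: process() returns without freeing and entry() drops the pointer. One possible repair, sketched below, has process() report whether it took ownership so that entry() can free the buffer otherwise. The return-value convention is an assumption made for this sketch; the original process() has no return statement at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees ptr when cond is non-zero. Returns 1 if ptr was freed here,
 * 0 if the caller still owns ptr. */
int process(void *ptr, int cond) {
    assert(ptr != NULL);
    if (cond) {
        free(ptr);
        return 1;
    }
    return 0;
}

int entry(size_t sz, int cond) {
    void *ptr = malloc(sz);
    if (ptr && !process(ptr, cond))
        free(ptr);   /* process() declined ownership, so release here */
    return 0;
}
```

Every path through entry() now releases the allocation exactly once, either inside process() or immediately after it returns.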
 
 
What about analyzing calls to external functions?
These examples were single translation unit only.
In other words, everything was in the same, single source file: “inter-procedural”, but inside a single translation unit.
What if a function calls another function outside of its translation unit?
This is referred to as “Cross Translation Unit” (CTU) analysis.
Examples …
 
 
 
Cross Translation Unit Analysis
CTU gives the analyzer a view across translation units
Avoids false positives caused by lack of information
Helps the analyzer constrain variables during analysis
Main.cpp
int foo();
int main() {
    return 3/foo();
}

Foo.cpp
int foo() {
    return 0;
}

foo() is not known to be 0 without CTU

How does CTU work?
[Figure: CTU flow in two passes. Labels: Source code and JSON Compilation Database (compile_commands.json), Pass 1, CTU Build, Call Graph, Function index, AST Dumps, Pass 2, Analyzer, Analysis results]

Manual CTU
compile_commands.json
[
  {
    "directory": "<root>/examples/ctu",
    "command": "clang++ -c foo.cpp -o foo.o",
    "file": "foo.cpp"
  },
  {
    "directory": "<root>/examples/ctu",
    "command": "clang++ -c main.cpp -o main.o",
    "file": "main.cpp"
  }
]

Mappings implicitly use the compile_commands.json file
Analysis phase uses compile_commands.json to locate the source files.
Source: https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html
 
Manual CTU - Demo
# Generate the AST (or the PCH)
clang++ -emit-ast -o foo.cpp.ast foo.cpp
# Generate the CTU Index file, holds external defs info
clang-extdef-mapping -p . foo.cpp > externalDefMap.txt
 
# Fixup for cpp -> ast, use relative paths
sed -i -e "s/.cpp/.cpp.ast/g" externalDefMap.txt
sed -i -e "s|$(pwd)/||g" externalDefMap.txt
 
# Do the analysis
clang++ --analyze \
    -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
    -Xclang -analyzer-config -Xclang ctu-dir=. \
    -Xclang -analyzer-output=plist-multi-file \
    main.cpp
 
 
 
Using Cross Translation Unit Analysis
scan-build.py within Clang can be used to drive static analysis on projects, but scan-build is not actively maintained for Cross Translation Unit Analysis.
Ericsson’s Open Source CodeChecker tool supports CTU flows
Let’s see an example …

CodeChecker automates this process
# Create a compile.json
CodeChecker log -b "clang main.cpp foo.cpp" -o compile.json

# First, try without CTU
CodeChecker analyze -e default --clean compile.json -o result
CodeChecker parse result

# Add CTU
CodeChecker analyze -e default --ctu --clean compile.json -o result
CodeChecker parse result

# Try with scan-build
scan-build clang main.cpp foo.cpp
 
 
 
 
 
 
Benefits of CTU
2.4x Average
2.1x median
5x peak
Note there are some lost
defects when using CTU
 
 
 
 
See https://llvm.org/devmtg/2017-03//assets/slides/cross_translation_unit_analysis_in_clang_static_analyzer.pdf , https://www.youtube.com/watch?v=7AWgaqvFsgs

CSA Modeling Weaknesses
CSA does a good job modeling program execution, but does have some
weaknesses.
CSA is built for speed, and common cases. The constraint solver gives up on some
complex expressions when they appear with symbolic values.
An example …

Example of unhandled bitwise operations
This program is safe, albeit brittle

unsigned int func(unsigned int a) {
    unsigned int *z = 0;
    if ((a & 1) && ((a & 1) ^ 1))
        return *z; // unreachable
    return 0;
}

$ clang --analyze test.cpp
test.cpp:5:16: warning: Dereference of null pointer (loaded from variable 'z') [core.NullDereference]
        return *z;
               ^~
1 warning generated.

Z3 Refutation, preferred
$ clang --analyze -Xclang -analyzer-config -Xclang crosscheck-with-z3=true test.cpp

Z3 constraint manager, slower
$ clang --analyze -Xclang -analyzer-constraints=z3 func.c

Source: Refuting false bugs in the clang static analyzer, Gadelha https://www.youtube.com/watch?v=SO84AmbWiLA
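
The claim that the dereference is unreachable can be checked directly, because the if condition in func() depends only on the lowest bit of a. The sketch below isolates that condition and counts how often it holds over a range of inputs. It is a brute-force check for illustration, not what Z3 does (Z3 reasons about the constraints symbolically); `guard` and `count_guard_hits` are hypothetical names.

```c
#include <stdint.h>

/* The if condition from func(), isolated so it can be evaluated directly.
 * When (a & 1) is non-zero it equals 1, and 1 ^ 1 is 0, so the second
 * operand is false whenever the first is true. */
int guard(unsigned int a) {
    return (a & 1) && ((a & 1) ^ 1);
}

/* Counts the inputs in [0, limit) for which the guard holds. */
uint32_t count_guard_hits(uint32_t limit) {
    uint32_t hits = 0;
    for (uint32_t a = 0; a < limit; a++)
        if (guard(a))
            hits++;
    return hits;
}
```

The count stays at zero for any limit, which is the infeasibility that the built-in constraint manager fails to establish and that refutation with Z3 can.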
 
Refuting False Positives with Z3
CSA sometimes reports false positives because of limitations in the CSA constraint manager.
Speed comes at the expense of precision: symbolic analysis does not handle some arithmetic and bitwise operations. Z3 can compensate for some of these shortcomings.
CodeChecker enables Z3 by default, if found.
See https://github.com/Z3Prover/z3 . Clang can be compiled to use Z3.

Why not just replace the CSA solver?
The first SMT backend solver (Z3) was implemented in late 2017. It aimed to replace the CSA constraint solver.
This solver was 20 times slower than the built-in solver.
A refutation approach gives us the best of both worlds:
Clang Static Analyzer’s speed for common cases
A chance for a Z3 solver to refute bugs
So, this is the approach for now
 
 
 
Putting it all together
 
How do we use everything we’ve learned to find some real bugs?
Using LLVM/Clang “tip of tree”, compiled with Z3 “tip of tree”
Let’s look at the “bitcoin curve” library (https://github.com/bitcoin-core/secp256k1.git). It’s small enough to demonstrate, and does have some bugs CSA can find
I’ll demonstrate how to run Static Analysis on this code, and the differences in analysis
results using Z3 and Cross Translation Unit Analysis
I’ll also demonstrate using Clang Static Analyzer on a well developed project, gzip
 
 
 
Results & Conclusion
We found some real bugs in the “bitcoin curve” library.
We demonstrated how more bugs can be found, or refuted, using CTU and Z3.
We showed how to make use of Clang tools to find real bugs.
 
 
 
 
References
 
Using scan-build - https://clang-analyzer.llvm.org/scan-build.html
Cross Translation Unit Analysis - https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html
CodeChecker - https://github.com/Ericsson/codechecker
Z3 Refutation in Clang - https://arxiv.org/pdf/1810.12041.pdf
Implementation of CTU in Clang - https://dl.acm.org/doi/pdf/10.1145/3183440.3195041 , https://llvm.org/devmtg/2017-03//assets/slides/cross_translation_unit_analysis_in_clang_static_analyzer.pdf
SMT based refutation of spurious bug reports in CSA - https://www.youtube.com/watch?v=WxzC_kprgP0
“Bitcoin curve” library - https://github.com/bitcoin-core/secp256k1.git
Compile command JSON Specification - https://clang.llvm.org/docs/JSONCompilationDatabase.html
Z3 - https://github.com/Z3Prover/z3
Tutorial Source - https://github.com/vabridgers/LLVM-Virtual-Tutorial-2020.git
 
Thank you for attending!
 
 
Demo notes
 
git clone https://github.com/Z3Prover/z3.git
cd z3; mkdir build; cd build
cmake -G Ninja ../ ; ninja ; sudo ninja install # assumes installed at /usr/local/lib/libz3.so
CodeChecker pulled/installed from https://github.com/Ericsson/CodeChecker.git
Be sure to set “CC_ANALYZERS_FROM_PATH=1”, set PATH to your clang
Bitcoin curve library: git clone https://github.com/bitcoin-core/secp256k1.git
Gzip: https://git.savannah.gnu.org/git/gzip.git
Run scan-build -> “scan-build make”
CodeChecker command notes …
CodeChecker log -b "make" -o compile_commands.json
CodeChecker analyze -e default --clean -j 16 compile_commands.json -o outputdir
CodeChecker analyze -e default --ctu --clean -j 16 compile_commands.json -o outputdir
CodeChecker analyze -e default --ctu --z3-refutation off --clean -j 16 compile_commands.json -o outputdir
CodeChecker parse -e html -o html-output-dir outputdir
 

Explore the benefits of tools like Static Analysis, discover different types of program analysis such as linting checks and Compiler diagnostics, and learn about finding flaws in source code through symbolic and dynamic analysis approaches. The tutorial covers the importance of early bug detection, usage in Continuous Integration, and the impact on development costs.

  • Static Analysis
  • Program Analysis
  • Compiler Diagnostics
  • Source Code Analysis
  • Continuous Integration

Uploaded on Jul 16, 2024 | 2 Views

