Software Development Process

Natural language is a programming language
Michael D. Ernst, UW CSE
Joint work with Arianna Blasi, Juan Caballero, Sergio Delgado Castellanos, Alberto Goffi, Alessandra Gorla, Xi Victoria Lin, Deric Pang, Mauro Pezzè, Irfan Ul Haq, Kevin Vu, Chenglong Wang, Luke Zettlemoyer, and Sai Zhang
Questions about software
How many of you have used software?
How many of you have written software?
What is software?
A sequence of instructions that perform some task
An engineered object amenable to formal analysis
Test cases
Version control history
Issue tracker
Documentation
How should it be analyzed?
[Word-cloud diagram, built up over several slides, of software-development artifacts: Programming, Programs, User stories, Requirements, Specifications, Tests, Version control, Documentation, Variable names, Output strings, Discussions, Architecture, Process, Models, Structure, PL, Issue tracker]
Analysis of a natural object
Machine learning over executions
Version control history analysis
Bug prediction
Upgrade safety
Prioritizing warnings
Program repair
Specifications are needed; tests are available but ignored
Specs are needed.  
Many papers start:
“Given a program and its specification…”
Tests are ignored.  
Formal verification process:
Write the program
Test the program
Verify the program, ignoring testing artifacts
Observation: programmers embed semantic info in tests
Goal: translate tests into specifications
Approach: machine learning over executions
Dynamic detection of likely invariants
Observe values that the program computes
Generalize over them via machine learning
Result: invariants (as in asserts or specifications)
x > abs(y)
x = 16*y + 4*z + 3
array a contains no duplicates
for each node n, n = n.child.parent
graph g is acyclic
Unsound, incomplete, and useful
https://plse.cs.washington.edu/daikon/
[ICSE 1999]
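To make this concrete, here is a minimal sketch in Java (not Daikon itself; the trace and the candidate invariants are invented for illustration): start from a fixed set of candidate properties over two variables and discard every candidate that some observed execution falsifies.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Minimal sketch of likely-invariant detection: candidates that survive
// every observation are reported as likely invariants.
public class LikelyInvariantSketch {
  public static void main(String[] args) {
    // Hypothetical trace of (x, y) values observed at one program point.
    int[][] observations = { {5, -2}, {7, 3}, {16, -9} };

    Map<String, BiPredicate<Integer, Integer>> candidates = new LinkedHashMap<>();
    candidates.put("x > abs(y)", (x, y) -> x > Math.abs(y));
    candidates.put("x > y",      (x, y) -> x > y);
    candidates.put("x == y",     (x, y) -> x.equals(y));

    for (int[] obs : observations) {
      // Drop any candidate falsified by this observation.
      candidates.values().removeIf(inv -> !inv.test(obs[0], obs[1]));
    }
    System.out.println("Likely invariants: " + candidates.keySet());
    // Prints: Likely invariants: [x > abs(y), x > y]
  }
}

Daikon's grammar of candidate invariants and its statistical checks are far richer, but the surviving candidates here play the role of the reported likely invariants, hence “unsound, incomplete, and useful”.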
Applying NLP to software engineering

Problems                      NL sources        NLP techniques
inadequate diagnostics        error messages    document similarity
incorrect operations          variable names    word semantics
missing tests                 code comments     parse trees
unimplemented functionality   user questions    translation

Analyze existing code; generate new code
Applying NLP to software engineering (roadmap repeated): inadequate diagnostics / error messages / document similarity [ISSTA 2015]
Inadequate diagnostic messages
Scenario:
  user supplies a wrong configuration option
--port_num=100.0
Problem:
  software issues an unhelpful error message
“unexpected system failure”
“unable to establish connection”
Hard for end users to diagnose
Goal: detect such problems before shipping the code
Better message: “--port_num should be an integer”
Challenges for proactive detection of inadequate diagnostic messages
How to trigger a configuration error?
How to determine the inadequacy of a diagnostic message?
ConfDiagDetector's solutions
How to trigger a configuration error?  Configuration mutation + run system tests; failed tests indicate triggered errors (we know the root cause).
How to determine the inadequacy of a diagnostic message?  Use an NLP technique to check its semantic meaning: do the diagnostic messages output by failed tests have a similar semantic meaning to the user manual?  (Assumption: a manual, webpage, or man page exists.)
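For illustration only (the option names, declared types, and method below are invented, not ConfDiagDetector's actual interface), configuration mutation amounts to injecting ill-typed or malformed values for a documented option and then running the system tests against each mutated configuration:

import java.util.List;

// Hypothetical sketch of configuration mutation: given an option and its
// documented type, produce values that violate that type. Each mutated
// configuration is then run against the system tests; a failing test means a
// configuration error was triggered, and we know its root cause.
public class ConfigMutationSketch {
  static List<String> mutations(String option, String declaredType) {
    switch (declaredType) {
      case "integer": return List.of(option + "=100.0", option + "=abc", option + "=");
      case "boolean": return List.of(option + "=2", option + "=maybe");
      default:        return List.of(option + "=");
    }
  }

  public static void main(String[] args) {
    // The slide's example: --port_num is documented as an integer.
    System.out.println(mutations("--port_num", "integer"));
  }
}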
When is a message adequate?
 
Contains the mutated option name or value [Keller’08, Yin’11]
Mutated option: --percentage-split
Diagnostic message: the value of percentage-split should be > 0
Has a similar semantic meaning to the manual description
Mutated option: --fnum
Diagnostic message: Number of folds must be greater than 1
User manual description of --fnum: Sets number of folds for cross-validation
 
 
Classical document similarity: TF-IDF + cosine similarity
1. Convert document into a real-valued vector
2. Document similarity = vector cosine similarity
Vector length = dictionary size, values = term frequency (TF)
Example: [2 classical, 8 document, 3 problem, 3 values, …]
Problem:  frequent words swamp important words
Solution:  values = TF x IDF   (inverse document frequency)
IDF = log(total documents / documents with the term)
Problem: does not work well on very short documents
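As a concrete illustration (a minimal sketch, not ConfDiagDetector's implementation; the three toy “documents” reuse phrases from these slides), TF-IDF vectors and cosine similarity can be computed as follows:

import java.util.*;

// TF-IDF weighting plus cosine similarity over word-count vectors.
public class TfIdfSketch {
  static Map<String, Double> tfidf(List<String> doc, List<List<String>> corpus) {
    Map<String, Double> vec = new HashMap<>();
    for (String term : doc) {
      vec.merge(term, 1.0, Double::sum);             // term frequency (TF)
    }
    for (Map.Entry<String, Double> e : vec.entrySet()) {
      long docsWithTerm = corpus.stream().filter(d -> d.contains(e.getKey())).count();
      double idf = Math.log((double) corpus.size() / docsWithTerm);
      e.setValue(e.getValue() * idf);                // TF x IDF
    }
    return vec;
  }

  static double cosine(Map<String, Double> a, Map<String, Double> b) {
    double dot = 0, na = 0, nb = 0;
    for (Map.Entry<String, Double> e : a.entrySet()) {
      dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
      na  += e.getValue() * e.getValue();
    }
    for (double v : b.values()) nb += v * v;
    return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
  }

  public static void main(String[] args) {
    List<String> message = List.of("number", "of", "folds", "must", "be", "greater", "than", "1");
    List<String> manual  = List.of("sets", "number", "of", "folds", "for", "cross-validation");
    List<String> other   = List.of("unexpected", "system", "failure");
    List<List<String>> corpus = List.of(message, manual, other);
    System.out.println(cosine(tfidf(message, corpus), tfidf(manual, corpus)));
  }
}

As the slide notes, this works poorly on very short documents such as one-line diagnostic messages, which motivates the word-level technique on the next slide.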
Text similarity technique [Mihalcea’06]
The documents have similar semantic meanings if many words in them have similar meanings
Example: “The program goes wrong” ≈ “The software fails”
1. Remove all stop words.
2. For each word in the diagnostic message, try to find similar words in the manual.
3. Two sentences are similar if “many” words are similar between them.
Results
Reported 25 missing and 18 inadequate messages
in Weka, JMeter, Jetty, Derby
Validation by 3 programmers:
0% false negative rate
Tool says message is adequate, humans say it is inadequate
2% false positive rate
Tool says message is inadequate, humans say it is adequate
Previous best: 16%
Related work
Configuration error diagnosis techniques
Dynamic tainting [Attariyan’08], static tainting
[Rabkin’11], Chronus [Whitaker’04]
Troubleshooting an exhibited error rather than detecting
inadequate diagnostic messages
Software diagnosability improvement techniques
PeerPressure [Wang’04], RangeFixer [Xiong’12], ConfErr
[Keller’08] and Spex-INJ [Yin’11], EnCore [Zhang’14]
Requires source code, usage history, or OS-level support
Applying NLP to software engineering (roadmap repeated): incorrect operations / variable names / word semantics [WODA 2015]
Undesired variable interactions
int totalPrice;
int itemPrice;
int shippingDistance;
totalPrice = itemPrice + shippingDistance;
The compiler issues no warning
A human can tell the abstract types are different
Idea:
Cluster variables based on usage in program operations
Cluster variables based on words in variable names
Differences indicate bugs or poor variable names
Undesired interactions
[Diagram, built up over several slides: the variables distance, itemPrice, tax_rate, miles, shippingFee, percent_complete; the expression itemPrice + distance; their program types int and float]
Program types don't help
Language indicates the problem
Variable clustering
Cluster based on interactions: operations
Cluster based on language: variable names
Problem: the two clusterings disagree
Actual algorithm:
1. Cluster based on operations
2. Sub-cluster based on names
3. Rank an operation cluster as suspicious if it contains well-defined name sub-clusters
Clustering based on operations
Abstract type inference [ISSTA 2006]
int totalCost(int miles, int price, int tax) {
  int year = 2016;
  if ((miles > 1000) && (year > 2000)) {
    int shippingFee = 10;
    return price + tax + shippingFee;
  } else {
    return price + tax;
  }
}
Clustering based on operations (same code shown again, with the variables and constants highlighted) — Abstract type inference [ISSTA 2006]
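A hedged sketch of the underlying idea (not the ISSTA 2006 algorithm itself; the interaction pairs are read off the totalCost example by hand rather than extracted automatically): treat “interacts in an operation” as an equivalence relation over variables and constants, and compute its classes with union-find.

import java.util.*;

// Union-find over names: two names that appear in the same operation are
// merged into the same abstract-type cluster.
public class AbstractTypeSketch {
  static Map<String, String> parent = new HashMap<>();

  static String find(String x) {
    parent.putIfAbsent(x, x);
    if (!parent.get(x).equals(x)) parent.put(x, find(parent.get(x)));
    return parent.get(x);
  }

  static void union(String a, String b) { parent.put(find(a), find(b)); }

  public static void main(String[] args) {
    union("miles", "1000");          // miles > 1000
    union("year", "2000");           // year > 2000
    union("year", "2016");           // year = 2016
    union("price", "tax");           // price + tax
    union("price", "shippingFee");   // ... + shippingFee
    union("shippingFee", "10");      // shippingFee = 10

    Map<String, List<String>> clusters = new TreeMap<>();
    for (String v : parent.keySet()) {
      clusters.computeIfAbsent(find(v), k -> new ArrayList<>()).add(v);
    }
    clusters.values().forEach(System.out::println);
    // Three clusters: {miles, 1000}, {year, 2016, 2000}, {price, tax, shippingFee, 10}
  }
}

The point of the example: Java gives every one of these variables the same type, int, but the operations partition them into finer abstract types.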
Clustering based on variable names
Compute variable name similarity for var1 and var2
1. Tokenize each variable into dictionary words
in_authskey15 → {“in”, “authentications”, “key”}
Expand abbreviations, best-effort tokenization
2. Compute word similarity
For all w1 in var1 and w2 in var2, use WordNet (or edit distance)
3. Combine word similarity into variable name similarity
maxwordsim(w1, var2) = max over w2 in var2 of wordsim(w1, w2)
varsim(var1, var2) = average over w1 in var1 of maxwordsim(w1, var2)
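Here is a minimal Java sketch of those formulas; wordsim is stood in for by a normalized edit-distance score, since WordNet would require an external library, so the numbers it produces are only illustrative:

import java.util.*;

// varsim / maxwordsim from the slide, with edit distance as a crude wordsim.
public class NameSimilaritySketch {
  // Normalized edit-distance similarity in [0, 1]; an assumption, not the paper's metric.
  static double wordSim(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++)
      for (int j = 1; j <= b.length(); j++)
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                           d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
    return 1.0 - (double) d[a.length()][b.length()] / Math.max(a.length(), b.length());
  }

  static double maxWordSim(String w1, List<String> var2) {
    return var2.stream().mapToDouble(w2 -> wordSim(w1, w2)).max().orElse(0);
  }

  static double varSim(List<String> var1, List<String> var2) {
    return var1.stream().mapToDouble(w1 -> maxWordSim(w1, var2)).average().orElse(0);
  }

  public static void main(String[] args) {
    // Variables already tokenized into dictionary words (step 1 above).
    List<String> itemPrice   = List.of("item", "price");
    List<String> shippingFee = List.of("shipping", "fee");
    List<String> distance    = List.of("distance");
    System.out.println(varSim(itemPrice, shippingFee));
    System.out.println(varSim(itemPrice, distance));
    // With WordNet, "price" and "fee" would be recognized as related words;
    // raw edit distance is a much cruder stand-in.
  }
}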
Results
Ran on grep and Exim mail server
Top-ranked mismatch indicates
an undesired variable interaction in grep
if (depth < delta[tree->label])
   delta[tree->label] = depth;
Loses top 3 bytes of depth
Not exploitable because of guards elsewhere in
program, but not obvious here
Related work
Reusing identifier names is error-prone [Lawrie
2007, Deissenboeck 2010, Arnaoudova 2010]
Identifier naming conventions [Simonyi]
Units of measure [Ada, F#, etc.]
Tokenization of variable names [Lawrie 2010,
Guerrouj 2012]
Applying NLP to software engineering (roadmap repeated): missing tests / code comments / parse trees [ISSTA 2016]
Test oracles (assert statements)
A test consists of
an input (for a unit test, a sequence of calls)
an oracle (an assert statement)
Programmer-written tests: often trivial oracles, or too few tests
Automatic generation of tests: inputs are easy to generate; oracles remain an open challenge
Goal: create test oracles from what programmers already write
Automatic test generation
Code under test:
public class FilterIterator implements Iterator {
  public FilterIterator(Iterator i, Predicate p) {…}
  public Object next() {…}
}
Automatically generated test:
public void test() {
  FilterIterator i = new FilterIterator(null, null);
  i.next();
}
 
Throws NullPointerException!
Did the tool discover a bug?  It could be:
1. Expected behavior
2. Illegal input
3. Implementation bug
The code's documentation says:
/** @throws NullPointerException if either
 *  the iterator or predicate are null */
Automatically generated tests
A test generation tool outputs:
Failing tests – indicates a program bug
Passing tests – useful for regression testing
Without a specification, the tool guesses whether a given behavior is correct
False positives: report a failing test that was due to illegal inputs
False negatives: fail to report a failing test because it might have been due to illegal inputs
Programmers write code comments
Javadoc is standard procedure documentation
/**
 * Checks whether the comparator is now
 * locked against further changes.
 *
 * @throws UnsupportedOperationException
 * if the comparator is locked
 */
protected void checkLocked() {...}
Javadoc comment and assertion
class MyClass {
  ArrayList allFoundSoFar = …;
  boolean canConvert(Object arg) { … }
  /** @throws IllegalArgumentException if the
   *  element is not in the list and is not
   *  convertible.  */
  void myMethod(Object element) { … }
}
 
Condition for exception: myMethod should throw iff
   ( !allFoundSoFar.contains(element)
     && !canConvert(element) )
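As a hedged illustration (this reuses the MyClass example above; the code Toradocu actually generates may differ), that condition can be turned into a runnable oracle by wrapping the call and checking that the exception occurs exactly when the condition holds:

// Illustrative oracle derived from the @throws condition above.
// Assumes the MyClass declaration from this slide; not Toradocu's literal output.
void checkMyMethodOracle(MyClass receiver, Object element) {
  boolean expectException =
      !receiver.allFoundSoFar.contains(element) && !receiver.canConvert(element);
  try {
    receiver.myMethod(element);
    assert !expectException : "expected IllegalArgumentException was not thrown";
  } catch (IllegalArgumentException e) {
    assert expectException : "unexpected IllegalArgumentException";
  }
}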
Nouns = objects, verbs = operations
[Parse-tree diagram: “The element is greater than the current maximum.” is parsed (S → NP VP, …); the noun phrases map to elt and currentMax, the verb/adjective phrase to compareTo() > 0, yielding elt.compareTo(currentMax) > 0]
Text to code: Toradocu algorithm
1. Parse @param, @return, and @throws expressions using the Stanford Parser
Parse tree, grammatical relations, cross-references
Challenges: often not a well-formed sentence; code snippets as nouns/verbs; referents are implicit; assumes coding knowledge
2. Match each subject to a Java element
Pattern matching; lexical similarity to identifiers, types, documentation
3. Match each predicate to a Java element
4. Create assert statement from expressions and methods
Results
Accuracy on 857 Javadoc tags:
97% precision
72% recall
Can tune parameters to favor either metric
Pre-processing and pattern-matching are important
Discovered specification errors
Improving test generation tools:
Reduced false positive test failures in EvoSuite by ≥ 1/3
Also improved Randoop, but by less
Related work
Heuristics
JCrasher, Crash’n’Check [Csallner’04, Csallner’05]
Randoop [Pacheco’07]
Specifications
ASTOOT [Doong’94]
Models, contracts, …
Properties
Cross-checking oracles [Carzaniga’14]
Metamorphic testing [Chen’13]
Symmetric testing [Gotlieb’03]
Natural language documentation
iComment, aComment, @tComment [Tan’07, Tan’11, Tan’12]
Applying NLP to software engineering (roadmap repeated): unimplemented functionality / user questions / translation
Machine translation
English:  “My hovercraft is full of eels.”
Spanish:  “Mi aerodeslizador está lleno de anguilas.”
English:  “Don’t worry.”
Spanish:  “No te preocupes.”
Sequence-to-sequence recurrent neural network translators
[Diagram: the English tokens “My hovercraft is full of eels .” enter at the input layer; starting from <START>, the Spanish tokens “Mi aerodeslizador …” are produced at the output layer, with a hidden layer and an attention mechanism in between]
Input, hidden, and output functions are inferred from training data using probability maximization.
Tellina: text to commands
Training data:  ~5000  ⟨text, command⟩ pairs
Collected manually from webpages, plus cleaning
17 file system utilities, > 200 flags, 9 types of constants
Compound commands: (), &&, ||
Nesting: |, $(), <()
Strings are opaque; no command interpreters (awk, sed)
No bash compound statements (for)
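For a sense of the data (this particular pair is invented for illustration, not drawn from Tellina's training set), a ⟨text, command⟩ pair looks roughly like:
text: “find all .java files under the current directory modified in the last 7 days”
command: find . -name "*.java" -mtime -7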
Results
Accuracy for Tellina’s first output:
Structure of command (without constants):  69%
Full command (with constants):  30%
User experiment:
Tellina makes users 22% more efficient
Even though it rarely gives a perfect command
Qualitative feedback
Most participants wanted to continue using Tellina (5.8/7 Likert scale)
Partially-correct answers were helpful, not too hard to correct
Output bash commands are sometimes non-syntactic or subtly wrong
Needs explanation of meaning of output bash commands
Related work
Neural machine translation
Sequence-to-sequence learning with neural nets
[Sutskever 2014]
Attention mechanism [Luong 2015]
Semantic parsing
Translating natural language to a formal representation
[Zettlemoyer 2007, Pasupat 2016]
Translating natural language to DSLs
If-this-then-that recipes [Quirk 2015]
Regular expressions [Locascio 2016]
Text editing, flight queries [Desai 2016]
Other software engineering projects
Analyzing programs before they are written
Gamification (crowd-sourcing) of verification
Evaluating and improving fault localization
Pluggable type-checking for error prevention
… many more:  systems, synthesis, verification, etc.
                 UW is hiring!  Faculty, postdocs, grad students
Applying NLP to software engineering (roadmap repeated as a summary of problems, NL sources, and NLP techniques)
Analyzing text
iComment [Tan 2007]: pattern matching for null
N-gram models: code completion [Hindle 2011]; predict variable names, whitespace [Allamanis 2014]
Mining variable names [Pollock et al.]
Code → comments [Sridhara 2010]
DARPA Big Mechanism (read cancer papers)
JSNice [Raychev 2015]: learn rules for identifiers and types
Analyzing other artifacts by machine learning over the program
Tests (dynamic invariant detection)
Mining software repositories
Defect prediction
Code completion
Clone detection
… many, many more
Machine learning + software engineering
Software is more than source code
Formal program analysis is useful, but insufficient
Analyze and generate all software artifacts
A rich space for further exploration