Homework Assignment #4 — Defect Detection

In this assignment you will use two different static analysis tools to automatically detect potential defects.

The first static analysis tool is GrammaTech's CodeSonar, which focuses on security issues, as well as memory, resource and concurrency defects. CodeSonar is a commercial tool used in activities such as DO-178B avionics certification; we have obtained an academic license for its use in this class.

The second static analysis tool is Facebook's Infer, which focuses on memory errors, leaks, race conditions, and API issues. Infer is open source.

You may work with a partner for this assignment. If you do, you must use the same partner for all sub-components of this assignment. Use Gradescope's partner selection feature. Only one partner needs to submit the report on Gradescope, but if you both do, nothing fatal happens.

Installing, Compiling, Running and Analyzing Legacy Code

Warning: Infer Is Hard To Run

You should use the setup from HW0 to run Infer.

Many users report that Facebook's Infer tool does not run on the Windows Subsystem for Linux (WSL) or similar shortcuts for using Ubuntu- or Linux-like interfaces. As an optional alternative, headless VirtualBox configurations (instructions) are reported to work very well. Officially, however, the HW0 setup is the supported configuration for the class.

It is your responsibility to download, compile, run and analyze the subject program and associated tools (or to use the precompiled Infer: we recommend the precompiled version, since it is known to work with the HW0 setup). Getting the code and tools to work in some manner is part of the assignment. You can post on the forum for help and compare notes bemoaning various platforms (e.g., Windows vs. macOS vs. Linux). Ultimately, however, it is your responsibility to read the documentation for these programs and tools and use some elbow grease to make them work.

The `lighttpd` webserver

We will make use of the lighttpd webserver (pronounced "lighty"), version 1.4.17, as our primary subject program for this homework. A local mirror copy of lighttpd-1.4.17.tar.gz is available, but you can also get it from the original website. It is about 55,000 lines of code in about 90 files. While somewhat small for this class, some analysis tool licenses have LOC limits or scalability issues, so it was chosen as an indicative compromise.

While not as large or popular as Apache, at various points lighttpd has been used by YouTube, xkcd and Wikimedia. Much like Apache, old versions of it have a number of known security vulnerabilities.

The Common Vulnerabilities and Exposures system is one approach for tracking security vulnerabilities. A CVE is basically a formal description, prepared by security experts, of a software bug that has security implications.

There are at least ten CVEs associated with lighttpd 1.4.17 tracked in various lists (such as cvedetails or mitre). For example, CVE-2014-2324 has the description "Multiple directory traversal vulnerabilities in (1) mod_evhost and (2) mod_simple_vhost in lighttpd before 1.4.35 allow remote attackers to read arbitrary files via a .. (dot dot) in the host name, related to request_check_hostname." You can dig into the information listed in, or linked from, a CVE (or just look at subsequent versions of the program where the bug is fixed!) to track down details. Continuing the above example, mod_evhost refers to source file mod_evhost.c, mod_simple_vhost refers to file mod_simple_vhost.c, and request_check_hostname is in file request.c. You will need such information when evaluating whether or not the tools find these security bugs.
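
To make the directory-traversal description concrete, here is a minimal C sketch of the kind of check that rejects a ".." host name before it is spliced into a filesystem path. The function name `hostname_is_path_safe` is hypothetical: this is not lighttpd's actual `request_check_hostname` code or the patch shipped in 1.4.35, and it assumes (as a virtual-host module would) that the host name later becomes part of a document-root path.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (NOT lighttpd's request_check_hostname): decide
 * whether a Host header value is safe to splice into a document-root path
 * such as "/var/www/<host>/htdocs". Assumes host is NUL-terminated. */
bool hostname_is_path_safe(const char *host) {
    if (host == NULL || host[0] == '\0') {
        return false;               /* empty host: nothing sensible to map */
    }
    if (host[0] == '.') {
        return false;               /* rejects ".", "..", and ".hidden" */
    }
    if (strstr(host, "..") != NULL) {
        return false;               /* "a..b" or "x/../y" could climb upward */
    }
    if (strchr(host, '/') != NULL || strchr(host, '\\') != NULL) {
        return false;               /* path separators never belong in a host */
    }
    return true;
}
```

With this check, `"www.example.com"` is accepted while `".."` and `"a/../etc"` are rejected. When you evaluate the tools against a CVE like this one, it may help to ask whether either tool warns anywhere along the path where an attacker-controlled string reaches a file name without such a check.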

Facebook's `Infer`

The Infer tool is a static analyzer — it detects bugs in programs without running them. The primary website is fbinfer.com.

Unfortunately, some versions of Infer can be obnoxious to build and install, despite their handy installation guide. Also, as noted above, many users report that Infer does not run on Windows Subsystem for Linux (WSL) or similar setups; a headless VirtualBox configuration (instructions) is recommended.

Instead (but see above about "your responsibility"), a precompiled version of Infer that runs on the HW0 setup (Ubuntu 16.04.2 LTS GNU/Linux 4.4.0-34-generic x86_64) is available locally here (warning: 265 MB; you will likely want to use scp to transfer the .tar.gz file to your HW0 setup and unpack it there). Once you have transferred and unpacked it, the main binary can be found at infer-linux64-v0.13.0/infer/bin/infer. You can either use the precompiled version or compile Infer yourself; any version of Infer is acceptable for full credit.

`Infer` on `lighttpd`

Once you have Infer built or downloaded, applying it to lighttpd should be as simple as:

$ sudo apt install make
$ sudo apt install python-minimal
$ cd lighttpd-1.4.17 
$ sh configure
$ /path/to/infer/bin/infer run -- make

That should produce output similar to the following (it is fine if your numbers are very different):

make[1]: Leaving directory '/home/weimer/src/lighttpd-1.4.17'
Found 88 source files to analyze in /home/weimer/src/lighttpd-1.4.17/infer-out
Starting analysis...

legend:
  "F" analyzing a file
  "." analyzing a procedure

FFFFFFFFFF.....F...FF....F..FF.F..F....................................................................................FF.................................................F...........F..................F..................F...........................................................................F....................................................................F........................................................F.......F.................F...............F.......FF.............F...................F.............F.........F...F.................F...................................F............FF.F.....F.......................F.....FF..............F..F........FF..........FF.............FF.......FF.F....F......F......FFF..............F.........F...F......F...........F.......FF..........F.F...........F...F..F.......F..F...F........................F..F.........F....F........F.....F..F..........F............F....F...................F................................................................................................................................................

Found 308 issues

src/joblist.c:19: error: NULL_DEREFERENCE
  pointer `srv->joblist->ptr` last assigned on line 16 could be null and is dereferenced at line 19, column 2.
  17.           }
  18.
  19. >         srv->joblist->ptr[srv->joblist->used++] = con;
  20.
  21.           return 0;

	...

Summary of the reports

      NULL_DEREFERENCE: 145
            DEAD_STORE: 94
           MEMORY_LEAK: 65
         RESOURCE_LEAK: 3
  QUANDARY_TAINT_ERROR: 1

(Before you worry about getting different numbers, double-check the prose above: it is fine to get different numbers. Similarly, it is common for this tool to only report a few "types" of defects: if you only see a few "types" of defects, you are running the tool correctly, even if CodeSonar reports more "types" of defects.) You will have to read through the output carefully and analyze the reported defects. Some will be true positives (i.e., real bugs in the code) and some will be false positives (i.e., spurious warnings that do not correspond to real bugs).
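
To see why a report like the joblist.c one above can be a true positive, here is a minimal, self-contained C sketch of the pattern. It assumes, as the wording "last assigned on line 16 could be null" suggests but without checking lighttpd's source, that line 16 stores the result of an allocator such as realloc, which returns NULL on failure. The struct and function names are hypothetical stand-ins, not lighttpd's definitions; the point is the NULL check before the write.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for lighttpd's joblist: a growable array of
 * connection pointers. Field names ptr/used follow the Infer message;
 * size (the capacity) is an assumption. */
typedef struct {
    void **ptr;
    size_t used;
    size_t size;
} job_list;

/* Append con, growing the array when it is full. Returns 0 on success and
 * -1 if the allocation fails, instead of writing through a NULL pointer. */
int job_list_append_checked(job_list *jl, void *con) {
    if (jl->used == jl->size) {
        size_t new_size = (jl->size == 0) ? 16 : jl->size * 2;
        /* realloc(NULL, n) behaves like malloc(n), so one call covers both cases. */
        void **grown = realloc(jl->ptr, new_size * sizeof *grown);
        if (grown == NULL) {
            return -1;  /* jl->ptr is untouched and still owned by jl */
        }
        jl->ptr = grown;
        jl->size = new_size;
    }
    jl->ptr[jl->used++] = con;  /* safe: ptr is non-NULL and used < size */
    return 0;
}
```

Assigning realloc's result to a temporary, rather than writing `p = realloc(p, n)`, also avoids leaking the old buffer when the call fails. Whether you treat a warning like this as a real bug (allocation can fail) or as low priority (allocation failure is rare in practice) is exactly the kind of judgment your triage should explain.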

`Infer` on `jfreechart`

Running Infer on jfreechart-1.5.0 is similarly direct.

$ cd jfreechart-1.5.0
$ /path/to/infer/bin/infer run -- mvn compile
Capturing in maven mode...
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building JFreeChart 1.5.0

	...

Found 640 source files to analyze in /home/weimer/src/jfreechart-1.5.0/infer-out
Starting analysis...

	...

Found 69 issues

src/main/java/org/jfree/data/xml/DatasetReader.java:73: error: RESOURCE_LEAK
  resource of type `java.io.FileInputStream` acquired to `in` by call to `FileInputStream(...)` at line 72 is not released after line 73.
  71.           throws IOException {
  72.           InputStream in = new FileInputStream(file);
  73. >         return readPieDatasetFromXML(in);
  74.       }

...

Summary of the reports

  THREAD_SAFETY_VIOLATION: 43
         NULL_DEREFERENCE: 22
            RESOURCE_LEAK: 4
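
The RESOURCE_LEAK report above concerns a Java stream that is opened but never closed (in Java, the usual remedy is a try-with-resources statement). To keep one language for examples, the sketch below shows the same defect class in C: a `FILE *` acquired with `fopen` must be released with `fclose` on every path, even after it has been handed to a helper. The function names are hypothetical and do not come from jfreechart or lighttpd.

```c
#include <stdio.h>

/* Hypothetical helper: consume the stream and count its bytes. Like the
 * Java readPieDatasetFromXML(in) call, it reads from the handle but does
 * not close it, so closing remains the caller's job. */
static long count_bytes(FILE *in) {
    long n = 0;
    while (fgetc(in) != EOF) {
        n++;
    }
    return n;
}

/* Open path, delegate to count_bytes, and release the handle on every
 * path. Returns the byte count, or -1 if the file cannot be opened.
 * Dropping the fclose() below would be a C analogue of the reported leak. */
long count_bytes_in_file(const char *path) {
    FILE *in = fopen(path, "rb");
    if (in == NULL) {
        return -1;  /* nothing acquired, so nothing to release */
    }
    long n = count_bytes(in);
    fclose(in);
    return n;
}
```

When triaging leak reports, it can help to ask whether the callee or the caller is supposed to own the handle; a report may be a false positive if ownership is transferred somewhere the analysis cannot see.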

While times will vary, some students have reported that running Infer on jfreechart can take five hours.

You can find Infer's output in the infer-out folder.

GrammaTech's `CodeSonar`

GrammaTech's CodeSonar static analyzer is a commercial (not open source) tool for finding defects in program source code or binaries.

GrammaTech has generously provided an academic license so that students in this class can make use of their tool on a limited basis. This license includes a lines-of-code limit, so we have pre-run the analysis and made the results available for everyone to share (running it is very similar to Infer, see below). Analyzing additional code or trying to subvert this license runs the risk of ruining our relationship with that company and thus preventing me from giving future students this experience in subsequent semesters — doing so is thus a significant academic integrity violation.

CodeSonar's output is designed to be shared among an organization's developers. As a result, the analysis is carried out once and then the reports are made available to everyone via a web interface. In this model the team might work together to triage and prioritize the defect reports, assigning some to one developer and some to another. We have already run CodeSonar for you; all students share the analysis results.

Running CodeSonar is as simple as running Infer: basically, instead of running make directly, one runs make under codesonar analyze. The actual commands are listed below, but you do not run them (we have already run them for you):

$ cd lighttpd-1.4.17 
$ sh configure 
$ # DO NOT: codesonar analyze lighttpd-1.4.17 host:port make

Similarly, a Java project would be analyzed by (do not run this):

$ cd jfreechart-1.5.0/
$ mvn compile
$ # DO NOT: codesonar analyze jfreechart-1.5.0 host:port cs-java-scan src/main/java/

GrammaTech's `CodeSonar` — Report Locations

Warning: CodeSonar Login

If you do not follow the CodeSonar login procedures properly, including having it email you a password, it will appear to almost work but you won't be able to see any warnings.

FAQ: Secure Connection Problem

If your browser warns that it cannot establish a secure connection to the CodeSonar server, use one of these solutions to resolve it:

Carefully retype the URL so that it says http instead of https — remove the "s"! Even if you type http, some browsers will "correct" it to https.
Use Chrome instead of another browser like Firefox. (Some students report success with, horrors, Microsoft Edge.)

The CodeSonar reports can be found at this location, but you must connect through the UM VPN (students report that other VPNs, such as NordVPN, do not work) or be on campus. If you are registered with the course (e.g., via Wolverine Access), click "Forgot Password", enter the username in the email you received, and click the Send Email button. Login information will be emailed to your @umich.edu address.

For example, your instructor would enter weimerw (or weimerx or whatever name CodeSonar assigned; note: no @ or numbers typed in this form field) to have an email with a set-your-password login code sent to weimerw@umich.edu.

(The license agreement places certain restrictions on who can use or see the service.)

Additional Subject Programs

We also make available the CodeSonar analyses of "fan favorites" such as:

cpython-2.7.5 — Python interpreter
curl-7.55.0 — data transfer tool
fastjson-1.2.60 — JSON parser/generator for Java
jsoup-1.11.2 — Java HTML parser
libpng-1.6.34 — portable network graphics library

Remember that you use "make" and friends for C programs, but other build processes, such as "mvn", for other languages like Java.

Note that the report requires you to choose an additional program (such as one of those listed above) and analyze it.

FAQ and Troubleshooting

In this section we detail previous student issues and resolutions:

Question: When I run infer.exe run -- make or infer run -- mvn compile I get errors like InferModules__SqliteUtils.Error or Maven command failed.

Answer: The most common issue is that Infer does not always run well on Windows Subsystem for Linux (WSL) or similar shortcuts to get a Linux- or Ubuntu-like interface on another OS. We strongly recommend a headless VirtualBox setup (instructions).

Question: When I try to run Infer, I get cannot execute binary file: Exec format error.

Answer: One student reports: "Finally got it. Turns out I was using a 32 bit processor (i386) so even when I set up my vm as 64 bit, it couldn’t run any x86-64 binaries. Fixed it by installing a 64 bit vdi." (https://appuals.com/fix-cannot-execute-binary-file-exec-format-error-ubuntu/)

Question: I see Maven command failed: *** mvn compile -P infer-capture when I try to run Infer.

Answer: Some students have seen success with:
```
sudo apt-get install cobertura maven
sudo apt-get install openjdk-8-jdk
```
Others reported that they ended up having to set up an Ubuntu 16.04 VM in VirtualBox.

Question: I see zero active warnings when I look at CodeSonar.

Answer: When more than five users are logged in anonymously, CodeSonar allows you to navigate but not to view any warnings (this is likely a security feature). You can resolve this issue by logging on using the directions above (which involves having CodeSonar mail you a password). You should see 2,082 warnings for cpython, for example.

Written Report

You must write a detailed PDF report reflecting on your experiences with these static analysis defect detection tools. In particular, all of the following are required:

([Reminder] You can include screenshots from anything associated with CodeSonar, even if you had to log in to get it. Yes, please use screenshots from CodeSonar.)
[Framing] Choose either "large software development organization" (e.g., the SQL Server group at Microsoft) or "small software development organization" (e.g., a dozen-person mobile app tech startup) — indicate your choice. You have been asked by your supervisor to evaluate these two tools and prepare a recommendation: which one, if any, should our organization use?
[Setup] In a few sentences, describe your setup experiences with each applicable tool. (Yes, we know you did not directly set up CodeSonar.) This might include dependencies, installing it, runtime, etc.
[1 point for description]
[Usability] In a few sentences, compare and contrast your usability experiences with each tool. This might include locating the reports, navigating the report or documentation website, etc.
[1 point for infer, 1 point for codesonar, 1 point for contrast, 1 point for other details]
[Overall] Compare and contrast the quality and details of the reports generated by Infer and CodeSonar. At a high level, what did each tool do well? How might each tool be improved? Comment on defect report categorizations (e.g., Reliability, NULL_DEREFERENCE, Security, etc.). Did you observe any "duplicate" defect reports (i.e., the same underlying issue was reported in terms of multiple different symptoms) within the same tool? How much overlap did you observe between the issues reported by the two tools? What are the costs (in general, including developer time, monetary cost, risks, training, etc., and anything else mentioned at any point in class) associated with each tool?
[4 points for infer, 4 points for codesonar, 1 point for categories, 1 point for duplicates, 1 point for overlap, 2 points for costs]
[CVE] Choose two of the CVEs associated with lighttpd. For each tool, describe whether or not that tool reported the issue associated with the CVE (or would otherwise have pointed you to it). You should choose one CVE such that at least one tool points out the CVE in some manner (if you find one); then, separately, you should choose one CVE such that at least one tool misses the CVE in some manner (if you find one). Overall, how effective are these tools at finding security defects?
- Students are sometimes anxious about the intended requirement for this aspect of the report. Here is an alternative explanation. If you pick one CVE that Infer finds, and one CVE that CodeSonar finds, and you describe those, you can get full credit. If you pick one CVE that Tool A finds, and one CVE that neither tool finds, and you describe those, you can get full credit. If you only pick one CVE, instead of the two required by the spec, you will not get full credit. If you choose three CVEs, you will probably not get full credit (regretfully, we have limited grading resources).
[2 points for each CVE/tool pairing, 1 point for conclusion]
[lighttpd] Compare and contrast the defect reports produced by the tools for the lighttpd program. Which did you find more useful? Consider false positives, false negatives, and issues that you would consider to have high priority or severity. Include (copy-and-paste, screenshot, etc.) part of one report you found particularly noteworthy (good, bad, complex: your choice) and explain it.
[3 points for compare/contrast, 1 point for inlined report and analysis, 2 points for other insights]
[jfreechart] Compare and contrast the defect reports produced by the tools for the jfreechart program. (6 points, as above.)
[additional] Choose an additional subject program. Compare and contrast the defect reports produced by the tools for that program. (6 points, as above.)
[Conclusion] Conclude your report with an overall recommendation for your supervisor. Identify three important metrics or evaluation criteria and make your recommendation based on them.
[1 point for clear statement of recommendation, 1 point for clear definition of criteria, 4 points for logical support]

The grading staff will select a small number of excerpts from particularly high-quality or instructive reports and share them with the class. If your report is selected you will receive extra credit.

Students are often anxious about a particular length requirement for this report. Unfortunately, some students include large screenshots and others do not, so raw length counts are not as useful as one might hope. Instead, I will say that in HW4 (and HW6, upcoming) we often see varying levels of "insight" or "critical thinking" from students. I know that's the sort of wishy-washy phrasing that students hate to hear ("How can I show insight?"). But some of the questions (e.g., "what does cost mean in this report?") highlight places where some students give one direct answer and some students consider many nuances. Often considering many nuances is a better fit (but note that if you make things too long you lose points for not being concise or readable -- yes, this is tough).

Let us consider an example from the previous homework. Suppose we had asked you whether mutation testing worked or not. Some students might be tempted to respond with something like "Yes, mutation testing worked because it put test suite A ahead of test suite B, and we know A is better than B because it has more statement coverage." That's a decent answer ... but it overlooks the fact that statement coverage is not actually the ground truth. (It is somewhat akin to saying "yes, we know the laser range finder is good because it agrees with my old bent ruler".) Students who give that direct answer get most of the credit, but students who explain that nuance, perhaps finding some other ways to indicate whether mutation testing worked or not, and what that even means, would get the most credit (and will also have longer reports). Students are often concerned about length, but from a grading perspective, the real factor is the insight provided.

Submission

Submit a single PDF report via Gradescope. You must include your name and UM email id (as well as your partner's name and email id, if applicable).

There is no explicit format (e.g., for headings or citations) required. For example, you may either use an essay structure or a point-by-point list of question answers.

FAQ and Troubleshooting

In this section we detail previous student issues and resolutions:

Question: When I try to run infer on lighttpd, it dies when trying to build the first file with an error like:

```
External Error: *** capture command failed:
*** make
*** existed with code 2
Run the command again with `--keep-going` to try and ignore this error.
```

Answer: Some students have reported that carefully running all of the commands, in exactly this sequence, works:

```
wget https://web.eecs.umich.edu/~weimerw/481/hw4/infer-linux64-v0.13.0.tar.gz
wget https://web.eecs.umich.edu/~weimerw/481/hw4/lighttpd-1.4.17.tar.gz

tar xzf infer-linux64-v0.13.0.tar.gz
tar xzf lighttpd-1.4.17.tar.gz

cd lighttpd-1.4.17
sh configure
../infer-linux64-v0.13.0/infer/bin/infer run -- make
```

Question: When I try to run infer, I get some output but then Fatal error: out of memory. What can I do?

Answer: You may need to assign your virtual machine more memory (see HW0 for setup). You may also need to choose a different subject program. Some students have reported this when analyzing cpython; perhaps a different program would work for you.

Question: When I try to run infer on libpng, it dies when trying to build the first file with an error like:
```
External Error: *** capture command failed:
*** make
*** existed with code 2
Run the command again with `--keep-going` to try and ignore this error.
```
Answer: One student reported that carefully installing all of the required build utilities, with exactly this sequence, resolved the issue:
```
sudo apt install make
sudo apt install python-minimal
```

Question: When I try to run infer on a program (e.g., lighttpd), it seems to produce no reports or output when I run infer run -- make. Instead, if I look very carefully at the output, hidden near the bottom is a warning like:
```
** Error running the reporting script: 
```
Answer: You must have your directories set up so that infer/bin/infer is "next to" other files like infer/lib/python/report.py. Infer uses those extra scripts to actually generate human-readable reports. If you tried to copy the infer binary somewhere else, it won't work. Make sure you have all of the components of infer in consistent locations.

Question: I'm not certain why "false positives" and "false negatives" are relevant for comparing the tools. I'm also not certain how we tell if something is a false positive or a false negative. Can you elaborate?

Answer: We can elaborate a bit, but I will note that this aspect of the assignment is assessing your mastery of course concepts. That is, why false positives and false negatives might be important, and how to distinguish between them, are critical software engineering concepts and might come up on the exam as well. You may want to double-check your notes on these, including on the readings. Now for more detail:

Suppose you are able to determine the false positive rate of one tool, or approximate it. For example, suppose you find that Tool #1 produces twice as many false positives as Tool #2. Well, then you might combine that with some of the reading for the class. For example, the FindBugs reading notes "Our ranking and false positive suppression mechanisms are crucial to keeping the displayed warnings relevant and valuable, so that users don’t start ignoring the more recent, important warnings" (among other comments on false alarms), while the Coverity reading notes "False positives do matter. In our experience, more than 30% easily cause problems. People ignore the tool. True bugs get lost in the false. A vicious cycle starts where ..." among other comments on false alarms. You might also check out the Parnin and Orso reading, and so on.
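
If you approximate a false positive rate by hand-triaging a sample of a tool's warnings, the arithmetic is small but easy to get backwards. Here is a minimal C sketch under that assumption (a sample of warnings, each judged a true or false positive); the function names are hypothetical, and the counts in the usage note are illustrative, not results from either tool.

```c
/* Fraction of triaged warnings judged to be false positives:
 * fp / (tp + fp). Returns -1.0 if nothing has been triaged yet. */
double false_positive_rate(unsigned tp, unsigned fp) {
    unsigned triaged = tp + fp;
    if (triaged == 0) {
        return -1.0;
    }
    return (double)fp / (double)triaged;
}

/* Extrapolate a sampled rate to all `total` warnings a tool reported,
 * rounding to the nearest whole warning. Assumes the sample was drawn at
 * random; a hand-picked sample can make this estimate badly biased. */
unsigned estimated_false_positives(double rate, unsigned total) {
    return (unsigned)(rate * (double)total + 0.5);
}
```

For example, if 5 of 20 randomly sampled warnings turn out to be false alarms, `false_positive_rate(15, 5)` is 0.25, and `estimated_false_positives(0.25, 308)` is 77 for a tool that reported 308 issues (the count in the sample lighttpd output above). Comparing such an estimate with the 30% figure from the Coverity reading is one way to connect your measurements to the readings.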

Something similar could be considered for false negatives. To give a prose example rather than a reading list this time, a report might include a claim like: "Many developers will dislike a tool that claims to find Race Conditions but actually misses 99% of them. If the tool has that many false negatives, developers will feel they cannot gain confidence in the quality of the software and will instead turn to other techniques, such as testing, that increase confidence in quality assurance." I'm not saying that is a good or a bad argument, but it is an example of the sort of analytic text or line of reasoning that might be applicable here.

Students often wonder: "How do I know if the tool is missing a bug?" Unfortunately, that's a real challenge. There are at least two ways students usually approach that problem, and both require labor or effort. Similarly, determining if a report is a false alarm usually requires reading it and comprehending the code nearby.

I can't really say much more here without giving away too much of what we expect from you on this part of the assignment, but I can reiterate that soundness and completeness (false positives and false negatives) are significant concepts in EECS 481 and that you should include them, coupled with your knowledge of the human element of such tools, in your assessment of the tools.
