In this assignment you will use tools to automatically create high-coverage test suites for different programs.
You may work with a partner for this assignment. If you do, you must use the same partner for all sub-components of this assignment.
Professors often exhort students to start assignments early. Many students wait until the night before and then complete the assignments anyway. Students thus learn to ignore "start early" suggestions. This is not that sort of suggestion.
The tools in this assignment can take many hours to run. However, once running, they are completely automated. Thus, you can start them running overnight, sleep and ignore them, and wake up to results. (Unfortunately, there is no way to resume an interrupted EvoSuite session without restarting it from the beginning, so be careful about laptop power and the like.)
This means that even though the assignment may not take hours of your active personal attention, you must start it days before the due date to be able to complete it in time.
It is your responsibility to download, compile, and run the subject programs and associated tools. Getting the code to work is part of the assignment. You can post on the forum for help and compare notes bemoaning various platforms (e.g., Windows vs. Mac vs. Linux). Ultimately, however, it is your responsibility to read the documentation for these programs and utilities and use some elbow grease to make them work.
There are two subject programs for this assignment. The programs vary in language, desired test type, and associated tooling.
The first subject program is libpng's pngtest program, unchanged from Homework 1. This reuse has two advantages. First, since you are already familiar with the program, it should not take long to get started. Second, you will be able to compare the test cases produced by the black-box tool to the white-box test cases you made manually.
The associated test input generation tool is American Fuzzy Lop, version 2.52b. A mirror copy of the AFL tarball is here, but you should visit the project webpage for documentation. As the (punny? bunny?) name suggests, it is a fuzz testing tool.
You will use AFL to create a test suite of png images to exercise the pngtest program, starting from all of the png images provided with the original tarball. Note that you can terminate AFL as soon as you reach 510 paths_total — AFL does not stop on its own.
AFL claims that one of its key advantages is ease of use. Just follow along with their quick start guide. Extract the AFL tarball and run "make". Then change back to the libpng directory and re-compile libpng with a configure line like:
$ CC=/REPLACE/THIS/TEXT/WITH/YOUR/PARTICULAR/PATH/TO/afl-gcc/(don't_just_copy_this_in_unchanged) ./configure --disable-shared CFLAGS="-static"
Note that you are not using "coverage" or gcov for this homework assignment. I recommend re-extracting the libpng tarball into a new, separate directory just to make sure that you are not mistakenly leaving around any gcov-instrumented files. We only want AFL instrumentation for this assignment.
After that I recommend making a new subdirectory and copying pngtest and all of the test images (including those in subdirectories) to it. You can either seed AFL with all of the default images or with all of your manually-created test images (from HW1); both are full-credit options. Move those images into a further "testcase_dir" subdirectory and then run something like:
$ /REPLACE/THIS/TEXT/WITH/YOUR/path/to/afl-fuzz -i testcase_dir -o findings_dir -- /path/to/pngtest_(not_.c_nor_.png_but_the_executable_you_built) @@
Note that findings_dir is a new folder you make up: afl-fuzz will put its results there.
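The staging step described above (gathering every test image, including those in subdirectories, into testcase_dir) can be sketched as a small shell function. This is only an illustration: the function name collect_pngs and the example paths are made up for this sketch, and it assumes the destination lies outside the source tree.

```shell
# collect_pngs SRC DEST
# Copy every .png under SRC (including subdirectories) into the flat
# directory DEST and print how many images DEST now holds.
# Assumption: DEST lies outside SRC. Files that share a name overwrite
# one another, so the count can be lower than the number of files found.
collect_pngs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -type f -name '*.png' -exec cp {} "$dest"/ \;
    find "$dest" -type f -name '*.png' | wc -l | tr -d ' '
}

# Example (hypothetical paths):
#   collect_pngs ~/hw2/libpng-src ~/hw2/fuzz/testcase_dir
```

Keeping the pngtest executable one directory above testcase_dir, rather than inside it, avoids feeding the executable to AFL as a test case.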
Note that you must stop afl-fuzz yourself, otherwise it will run forever — it does not stop on its own. Read the Report instructions below for information on the stopping condition and knowing "when you are done".
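Since afl-fuzz never stops by itself, it can help to check progress from a second terminal. The sketch below reads the paths_total counter from the fuzzer_stats file that afl-fuzz keeps in its output directory. The function names are made up for this example, and the "key : value" layout of fuzzer_stats is an assumption based on AFL 2.52b; check your own file if the numbers look wrong.

```shell
# paths_total OUT_DIR
# Print the paths_total value recorded in OUT_DIR/fuzzer_stats.
# Assumes lines of the form "paths_total       : 510".
paths_total() {
    awk -F: '$1 ~ /^paths_total/ { gsub(/[ \t]/, "", $2); print $2 }' \
        "$1/fuzzer_stats"
}

# reached_target OUT_DIR N
# Succeed (exit status 0) once paths_total is at least N.
reached_target() {
    [ "$(paths_total "$1" 2>/dev/null)" -ge "$2" ] 2>/dev/null
}

# Example (run from the directory that holds findings_dir):
#   reached_target findings_dir 510 && echo "safe to stop afl-fuzz"
```

The check only reports progress. Stopping afl-fuzz (for example with Ctrl-C in its own terminal) is still up to you.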
Note also that you can resume afl-fuzz if it is interrupted or stopped in the middle (you don't "lose your work"). When you try to re-run it, it will give you a helpful message like:
To resume the old session, put '-' as the input directory in the command line ('-i -') and try again.
Just follow its directions.
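To make the difference between a fresh run and a resumed run concrete, here is a minimal sketch that only prints the command line it would use. The helper name afl_command is made up for this example. It also assumes that an existing queue/ directory inside the output folder means there is an earlier session to resume.

```shell
# afl_command AFL_FUZZ SEED_DIR OUT_DIR TARGET
# Print (do not run) an afl-fuzz command line. If OUT_DIR already has a
# queue/ directory from an earlier session, use '-i -' to resume it.
afl_command() {
    afl_fuzz=$1
    seed_dir=$2
    out_dir=$3
    target=$4
    if [ -d "$out_dir/queue" ]; then
        seed_dir=-
    fi
    printf '%s -i %s -o %s -- %s @@\n' \
        "$afl_fuzz" "$seed_dir" "$out_dir" "$target"
}

# Example (hypothetical paths):
#   afl_command /path/to/afl-fuzz testcase_dir findings_dir /path/to/pngtest
```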
Note that afl-fuzz will likely abort the first few times you run it and ask you to change some system settings (e.g., echo core | sudo tee /proc/sys/kernel/core_pattern, echo core >/proc/sys/kernel/core_pattern etc.). For example, on Ubuntu systems it often asks twice. Just become root and execute the commands. Note that sudo may not work for some of the commands (e.g., sudo echo core >/proc/sys/kernel/core_pattern will fail because bash will do the > redirection before running sudo so you will not yet have permissions, etc.) — so just become root (e.g., sudo sh) and then execute the commands in a root shell.
The produced test cases are in the findings_dir/queue/ directory (that is, the queue/ subdirectory of whatever output folder you passed to -o). They will not have the .png extension (instead, they will have names like 000154,sr...pos/36,+cov), but you can rename them if you like.
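If you would like .png copies of the generated inputs (for example, to try opening them in an image viewer), one possible approach is to copy them under simpler names and leave AFL's own queue untouched. The function name and the afl_NNNNNN.png naming scheme below are choices made for this sketch, not anything AFL requires.

```shell
# copy_queue_as_png QUEUE_DIR OUT_DIR
# Copy each regular file in QUEUE_DIR to OUT_DIR as afl_000001.png,
# afl_000002.png, ... and print the number of files copied.
# The originals stay in place so an AFL session can still be resumed.
copy_queue_as_png() {
    queue_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    n=0
    for f in "$queue_dir"/*; do
        [ -f "$f" ] || continue   # skip anything that is not a regular file
        n=$((n + 1))
        cp "$f" "$out_dir/$(printf 'afl_%06d.png' "$n")"
    done
    echo "$n"
}

# Example:
#   copy_queue_as_png findings_dir/queue generated_pngs
```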
While AFL is running, read the technical whitepaper to learn about how it works and compare the techniques it uses to the basic theory discussed in class.
The second subject program is jsoup (v 1.11.2), a library for extracting real-world HTML data using DOM, CSS and jquery-like methods. A copy of the version of the source code known to work for this assignment is available here; you can also use git clone https://github.com/jhy/jsoup.git. It involves about 18,000 lines of code spread over 60 files. This program is a bit small for this course, but comes with a rich existing test suite. This existing test suite will serve as a baseline for comparison.
The associated test input (and oracle!) generation tool is EvoSuite, version 1.0.5. Mirror copies of evosuite-1.0.5.jar and evosuite-standalone-runtime-1.0.5.jar are available, but you should visit the project webpage for documentation.
EvoSuite generates unit tests (cf. JUnit) for Java programs.
You can install jsoup and use cobertura to assess the statement and branch coverage of its built-in test suite:
$ unzip jsoup-1.11.2.zip
$ cd jsoup-master/
$ mvn cobertura:cobertura
...
[INFO] Cobertura: Saved information on 253 classes.
Results :
Tests run: 648, Failures: 0, Errors: 0, Skipped: 11
...
[INFO] Cobertura Report generation was successful.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
$ firefox target/site/cobertura/index.html
Note that the supplied test suite is of high quality, with 81% line coverage and 77% branch coverage overall.
Once you have EvoSuite installed you can invoke it on jsoup via:
$ cd jsoup-master
$ $EVOSUITE -criterion branch -target target/classes/
...
* Writing JUnit test case 'ListLinks_ESTest' to evosuite-tests
* Done!
* Computation finished
You can now find the EvoSuite tests in evosuite-tests/org/jsoup. In the report you are asked to compare the coverage of the manually-created test suite to the EvoSuite-created test suite. One simple and common way to do this is by inspecting the evosuite-reports/statistics.csv file. However, you are also welcome to try integrating other coverage tools, such as clover or cobertura, with maven, but this is not required (and may be tricky).
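As a starting point for inspecting evosuite-reports/statistics.csv, the sketch below totals the goals across all classes. It assumes the file has a header row with columns named Total_Goals and Covered_Goals; those column names are an assumption about EvoSuite 1.0.5's default output, so the script looks them up in the header and gives up if they are missing. The function name is made up for this example.

```shell
# summarize_evosuite CSV_FILE
# Sum the Total_Goals and Covered_Goals columns over all rows and print
# the overall ratio. Column positions are taken from the header row.
summarize_evosuite() {
    awk -F, '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Total_Goals" in col) || !("Covered_Goals" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        {
            total   += $(col["Total_Goals"])
            covered += $(col["Covered_Goals"])
        }
        END {
            if (bad) exit 1
            if (total > 0)
                printf "%d/%d goals covered (%.1f%%)\n",
                       covered, total, 100 * covered / total
        }
    ' "$1"
}

# Example:
#   summarize_evosuite evosuite-reports/statistics.csv
```

Keep in mind that EvoSuite's notion of a goal need not match Cobertura's notion of a branch, so the two tools' percentages are not directly interchangeable.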
You must create a written PDF report reflecting on your experiences with automatic test generation. You must include your name and UM email id (as well as your partner's name and email id, if applicable). In particular:
This does not have to be a formal report; you need only answer the questions in the rubric. However, nothing bad happens if you include extra formality (e.g., sections, topic sentences, etc.).
There is no explicit format (e.g., for headings or citations) required. For example, you may either use an essay structure or a point-by-point list of question answers.
The grading staff will select a small number of excerpts from particularly high-quality or instructive reports and share them with the class. If your report is selected you will receive extra credit.
For this assignment the written report is the primary artifact. There are no programmatic artifacts to submit (however, you will need to run the tools to create the required information for the report). (If you are working with a partner, you must select your partner on Gradescope and submit one copy of the report. However, nothing fatal happens if you mistakenly submit two copies.)
This assignment is perhaps a bit different than the usual EECS homework: instead of you, yourself, doing the "main activity" (i.e., creating test suites), you are asked to invoke tools that carry out that activity for you. This sort of automation (from testing to refactoring to documentation, etc.) is indicative of modern industrial software engineering.
Asking you to submit the generated tests is, in some sense, uninteresting (e.g., we could rerun the tool on the grading server if we just wanted tool-generated tests). Instead, you are asked to write a report that includes information and components that you will only have if you used the tools to generate the tests. Writing reports (e.g., for consumption by your manager or team) is also a common activity in modern industrial software engineering.
In this section we detail previous student issues and resolutions:
Question: Using AFL, I get:
ERROR: PROGRAM ABORT : Test case 'xxxxxx/pngtest' is too big (2.25 MB, limit is 1.00 MB)
Answer: You are mistakenly passing the pngtest executable in as a testcase to itself. Try putting your pngtest executable one directory above your testcase_dir. In other words, rather than having it in the same folder as your test images (testcase_dir), put it in the directory that contains testcase_dir, and adjust /path/to/pngtest accordingly.
Question: My AFL session has 0 cycles done but the total paths counter does increment. I am worried.
Answer: Everything is fine. It is entirely possible to complete the assignment with 0 cycles done. (AFL can enumerate quite a few candidate test cases — enough for this assignment — before doing a complete cycle.)
Question: My ssh sessions keep getting disconnected. How can I avoid losing my work from a long-running job?
Question: Using AFL, I get:
[-] SYSTEM ERROR : Unable to create './findings_dir/queue/id:000000,orig:pngbar.png'
Answer: This is apparently a WSL issue, but students running Linux who ran into it were able to fix things by making a new, fresh VM.
Question: Using AFL, I get:
[-] PROGRAM ABORT : Program 'pngtest' not found or not executable
or
[-] PROGRAM ABORT : Program 'pngnow.png' is not an ELF binary
Answer: You need to use the right /path/to/pngtest instead of just pngtest. You must point to the pngtest executable (produced by "make") and not, for example, pngtest.png.
Question: When I try to run AFL, I get:
[-] PROGRAM ABORT : No instrumentation detected
Answer: You are pointing AFL to the wrong pngtest executable. Double-check the instructions near $ CC=/path/to/afl-gcc ./configure --disable-shared CFLAGS="-static", rebuild pngtest using "make", and then point to exactly that executable and not a different one.
Question: When I am running AFL, it gets "stuck" at 163 paths.
Answer: In one instance, the student had forgotten the @@ at the end of the AFL command. Double check the command you are running!
Question: When trying to use AFL on Amazon EC2, I get:
[ec2-user@ip-172-31-19-147 afl-2.52b]$ make
[*] Checking for the ability to compile x86 code...
/bin/sh: cc: command not found
Oops, looks like your compiler can't generate x86 code.
Answer: One student reported resolving this via sudo yum groupinstall "Development Tools".
Question: When I try to compile libpng with AFL, I get:
configure: error: C compiler cannot create executables
Answer: You need to provide the full path to the afl-gcc executable, not just the path to hw2/afl-2.52b/.
Question: Some of the so-called "png" files that AFL produces cannot be opened by my image viewer and may not even be valid "png" files at all!
Answer: Yes, you are correct. (Thought question: why are invalid inputs sometimes good test cases for branch coverage?)
Question: Cobertura suggests that the project has 3,726 branches, but EvoSuite seems to think they sum up to 5,149. What gives?
Answer: Good observation! Everything is fine. Double check the lecture slides. What are some ways in which two tools could disagree about the number of "branches" in the same Java classes?
Question: When trying to run EvoSuite, I get:
-criterion: command not found
Answer: This almost always indicates some sort of typo in your export EVOSUITE=... setup line.
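To see why a bad setup line produces that exact message: if EVOSUITE is empty or unset, the shell expands "$EVOSUITE -criterion branch ..." to just "-criterion branch ..." and then tries to run -criterion as a program. A minimal sketch of a setup line plus a sanity check follows. The jar path is a placeholder you must replace, and check_evosuite is a helper name made up for this example.

```shell
# Placeholder location: replace with the real path to your jar.
EVOSUITE_JAR="/path/to/evosuite-1.0.5.jar"
export EVOSUITE="java -jar $EVOSUITE_JAR"

# check_evosuite
# Report what $EVOSUITE expands to, or fail if it is empty.
check_evosuite() {
    if [ -z "$EVOSUITE" ]; then
        echo "EVOSUITE is empty: the shell would try to run -criterion as a command"
        return 1
    fi
    echo "EVOSUITE expands to: $EVOSUITE"
}

# Example:
#   check_evosuite && $EVOSUITE -criterion branch -target target/classes/
```

Note that an export only lasts for the current shell session, so it has to be repeated (or placed in a startup file) whenever you open a new terminal.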
Question: Can I terminate EvoSuite and resume it later?
Answer: Unfortunately, no. I emailed the author, who indicated that this is not currently possible.
Question: What does "interesting" mean for the report? Similarly, how should we "elaborate" or "reflect"?
Answer: We sympathize with students who are concerned that their grades may not reflect their mastery of the material. Being conscientious is a good trait for CS in general and SE in particular. However, this is not a calculus class. Software engineering involves judgment calls. I am not asking you to compute the derivatives of various polynomials (for which there is one known right answer). You are carrying out activities that are indicative of SE practices.
Suppose you are tasked with evaluating a test generation tool for your company. You are asked to do a pilot study evaluating such a tool and prepare a report for your boss. One of the things the boss wants to know is: "What are the risks associated with using such a tool?" Similarly for the benefits or rewards.
Question: Can I use free cloud computing, like Amazon EC2, for this assignment?
Answer: Sure. Here's what one student had to say:
If you can get over the hump of setting up AWS (pro-tip they have lots of documentation, use google. also here you go), their free-tier EC2 instances can get the AFL job done in a blink. Using their free-tier EC2 Ubuntu instance, I was able to run AFL up to >500 paths in 5 minutes. Setup would probably take less than 30 minutes for a new user. IMO that more than balances the headache of having to run AFL for hours and hours and hours and hours.