Lectures and Details
Monday/Wednesday, 3:00-4:30pm, 1303 EECS
To Be Determined (Monday or Wednesday)
Students are expected to have enough background in software
engineering, programming languages, compilers, security, and
theory to be able to understand research papers from SE and PL
conferences. One previous SE or PL course at the undergraduate or
graduate level should suffice for the motivated student.
This special topics course is a research seminar covering the
intersection of core software engineering topics (e.g., program
comprehension and maintenance) and associated human factors (particularly
those revealed by medical imaging).
The course will focus on active research areas in software engineering.
Special emphasis will be placed on instructor-led in-person discussions
of contextual aspects of papers that may be less evident to students (e.g.,
based on the instructor's first-hand knowledge of the authors or research
situation). These may include:
- The impact or influence the paper had on subsequent work
- Whether some aspects of the paper were accidental or planned
- Why certain reasonable-seeming approaches were not taken
- How difficult certain aspects of the work were to carry out
Such discussions can help students view papers as part of a
collaborative, human activity that takes place over a span of time.
Conversations about how to construct well-designed follow-on work (e.g.,
balancing risk and reward, lifting input assumptions, etc.), at the
level of the graduate Preliminary Exam, will be encouraged.
Software Engineering and Cognition
This class will include papers that explicitly cover human cognitive
aspects of software engineering and programming languages, such as studies
that make use of eye tracking or medical imaging.
It is not expected that incoming students have any expertise in such
cognitive aspects. Instead, relevant cognitive aspects will be discussed
(including in the third paper, below) such that students can interpret
their relevance to computer science.
Graduate Depth Area
As of January 20, 2023, it is confirmed that this course satisfies the
area requirement. (It does not satisfy the breadth requirement.)
Structure and Presentations
For each paper discussion I will choose up to three students at random
to present.
Each selected student will give a five-minute in-person presentation that
at least (1) summarizes the work, (2) lists its strengths, and (3) lists
ways in which it might be improved in a subsequent publication. Including
other information, such as your opinion about the work or its relation to
other projects, is recommended.
The goals of this randomized approach are to encourage all participants to
read the material thoroughly in advance, to provide jumping-off points for
detailed discussions, and to allow me to evaluate participation.
There is no project component to this course; your primary
responsibility is to prepare five minutes of cogent discussion for each
class meeting. Grading will be based on in-person participation.
40% Paper Presentations
In more detail, Paper Presentations will be graded on the summarization
of the work (20%), the enumeration of strengths (30%), and areas and
mechanisms for improvement (50%).
- While paper summarizations are common in many settings, this course
places a particular emphasis on identifying areas for improvement that
may not be mentioned by the authors of a work. For example, a paper may
present an analysis that only works on single-threaded programs, but the
paper may not call out this weakness explicitly. Instead, readers may
note that all of the benchmarks were single-threaded and that the analysis
relies on the values of variables remaining constant. Readers might
identify this area for improvement and also identify a potential
mechanism for overcoming it (e.g., perhaps the analysis could be
augmented with manual annotations about which variables are modified by
other threads, or perhaps a dynamic lockset approach like Eraser could be
used to provide missing information, etc.).
- This identification of paper areas for improvement that are not
explicit but are instead implicit (in the "negative space" of the paper,
as it were) is an important skill, and one that we will practice
together. Similarly, we will work together on identifying how to bring
together topics and ideas from computer science to address such
areas for improvement.
The Discussion component will be assessed by noting when
students contribute to the conversation and analysis of a
paper (outside of their presentations).
- Student contributions will be noted on a three-point scale, with
two points for a typical discussion element, three points for a
more insightful analysis, and one point for a less fruitful
contribution.
- Discussion points are graded on insight and relevance, rather than
on length. A theme of the course is that sometimes a single sentence or
observation can significantly influence how a paper is interpreted. The
focus is not on long-winded reiterations of the main paper points, but
on identifying "interactions" (e.g., "given that they mention in
Section 3 that they are recruiting only students, their use of this
industrial task in Section 5 may not be indicative ...", etc.) or
areas for improvement.
- Particularly insightful contributions will be described as such
by the course staff (e.g., to help students who may not be as familiar
with this sort of course to identify the desired approach).
- Each student can earn at most three points per class meeting. This
is to prevent one student from dominating a discussion while also
encouraging students to speak multiple times per meeting.
- If there are N course meetings, full discussion credit
corresponds to N earned points, distributed in any manner.
- Students can make up the discussion component in Office Hours,
earning up to three points per session.
- The relative simplicity of discussion grading (informally:
check-plus, check, check-minus) also helps the professor track
it without interrupting the flow of the conversation.
The Professionalism component will be assessed in terms of helping
to maintain a welcoming environment for everyone and demonstrating our
shared values. Participants are assumed to be professionals by default, but
may lose points in certain negative circumstances. Examples of
less-professional conduct include:
- Rudely interrupting or speaking over someone else. Instead, we favor
giving speakers time to complete their thoughts. (The professor will
gently move along any stalled or overly-verbose discussion.)
- Suggesting that the quality of a particular paper or statement is in
any way related to the demographics of its authors. Instead, we assess
papers on the merits of their presented ideas and evidence.
- Intentionally using defamatory or belittling phrasing to describe
people or groups. Some of the papers we will read and discuss tackle
practitioner biases or expertise differences. We want to factually
discuss such biases, with a goal of mitigating, rather than perpetuating,
them.
Informally, one sufficient
condition for an "A" grade is to do a good job whenever you are called on
to present a paper, to interject insightful comments into the
discussions of other papers, and to treat others respectfully.
The grading cutoffs are:
- 70% = C-
- 73% = C
- 77% = C+
- 80% = B-
- 83% = B
- 87% = B+
- 90% = A-
- 93% = A
- 98% = A+
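The cutoff table above is a simple threshold lookup. A minimal sketch in
Python (the function name is illustrative, not part of the syllabus):

```python
# Letter-grade cutoffs as listed above, highest first.
CUTOFFS = [(98, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (73, "C"), (70, "C-")]

def letter_grade(percent):
    """Return the highest letter whose cutoff the percentage meets."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "below C-"  # the syllabus does not list cutoffs under 70%

print(letter_grade(94))  # A
print(letter_grade(85))  # B
```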
Read Two Ahead
You are responsible for reading and preparing for the next
two not-yet-discussed papers for any given class meeting.
On average, we will discuss three papers a week (devoting perhaps 50
minutes to each paper). However, some papers may merit more or less time.
It is often possible to find presentation slides or video recordings
associated with a paper.
The next paper to discuss may be shown below a highlighted line. You should
read and prepare at least the next two-to-three papers before each class
meeting.
Software Research and Threats to Validity
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney:
Producing wrong data without doing anything obviously wrong!
ASPLOS 2009: 265-276
Janet Siegmund, Norbert Siegmund, Sven Apel:
Views on Internal and External Validity in Empirical Software Engineering.
ICSE 2015: 9-19
(distinguished paper award)
Neuroscience and Language
Jingyuan E. Chen, Gary H. Glover:
Functional Magnetic Resonance Imaging Methods.
Neuropsychology Review, vol. 25, pages 289-313 (2015)
Ioulia Kovelman, Stephanie A. Baker, Laura-Ann Petitto:
Bilingual and Monolingual Brains Compared: A Functional Magnetic Resonance
Imaging Investigation of Syntactic Processing and a Possible "Neural
Signature" of Bilingualism.
J. Cogn. Neurosci. 20(1): 153-169 (2008)
Software Engineering and the Brain
Janet Siegmund, Christian Kästner, Sven Apel, Chris Parnin, Anja Bethmann,
Thomas Leich, Gunter Saake, André Brechmann:
Understanding understanding source code with functional magnetic resonance
imaging. ICSE 2014: 378-389
Benjamin Floyd, Tyler Santander, Westley Weimer:
Decoding the representation of code in the brain: an fMRI study of code
review and expertise. ICSE 2017: 175-186 (distinguished paper award)
Yu Huang, Xinyu Liu, Ryan Krueger, Tyler Santander, Xiaosu Hu, Kevin Leach,
Westley Weimer:
Distilling neural representations of data structure manipulation using fMRI
and fNIRS. ICSE 2019: 396-407 (distinguished paper award)
Ryan Krueger, Yu Huang, Xinyu Liu, Tyler Santander, Westley Weimer, Kevin
Leach:
Neurological Divide: An fMRI Study of Prose and Code Writing.
International Conference on Software Engineering (ICSE) 2020
Zachary Karas, Andrew Jahn, Westley Weimer, Yu Huang:
Connecting the Dots:
Rethinking the Relationship between Code and Prose Writing with Functional
Connectivity. Foundations of Software Engineering (ESEC/FSE) 2021
Software Expertise and the Brain
Norman Peitek, Annabelle Bergum, Maurice Rekrut, Jonas Mucke, Matthias
Nadig, Chris Parnin, Janet Siegmund, Sven Apel:
Correlates of programmer efficacy and their link to experience: a combined
EEG and eye-tracking study. ESEC/SIGSOFT FSE 2022: 120-131
Ikutani Y, Kubo T, Nishida S, Hata H, Matsumoto K, Ikeda K, Nishimoto S.
Expert Programmers Have Fine-Tuned Cortical Representations of Source Code.
eNeuro. 2021 Jan 28. 8(1): ENEURO.0405-20.2020.
Implications — Code Comprehension
Sarah Fakhoury, Devjeet Roy, Yuzhan Ma, Venera Arnaoudova, Olusola O.
Adesope:
Measuring the impact of lexical and structural inconsistencies on
developers' cognitive load during bug localization. Empir. Softw. Eng.
25(3): 2140-2178 (2020)
Norman Peitek, Sven Apel, Chris Parnin, André Brechmann, Janet Siegmund:
Program Comprehension and Code Complexity Metrics: An fMRI Study. ICSE 2021
(distinguished paper award)
Implications — Code Review and Trust
Tyler J. Ryan, Gene M. Alarcon, Charles Walter, Rose F. Gamble, Sarah A.
Jessup, August A. Capiola, Marc D. Pfahler:
Trust in Automated Software Repair - The Effects of Repair Source,
Transparency, and Programmer Experience on Perceived Trustworthiness and
Trust. HCI (29) 2019: 452-470
Yu Huang, Kevin Leach, Zohreh Sharafi, Nicholas McKay, Tyler Santander,
Westley Weimer:
Biases and Differences in Code Review using Medical Imaging
and Eye-Tracking: Genders, Humans, and Machines. Foundations of Software
Engineering (ESEC/FSE) 2020
Implications — Learning and Teaching
Ryan Shaun Joazeiro de Baker, Sidney K. D'Mello, Ma. Mercedes T. Rodrigo,
Arthur C. Graesser:
Better to be frustrated than bored: The incidence, persistence, and impact
of learners' cognitive-affective states during interactions with three
different computer-based learning environments. Int. J. Hum. Comput. Stud.
68(4): 223-241 (2010)
Naser Al Madi, Cole S. Peterson, Bonita Sharif, Jonathan I. Maletic:
From Novice to Expert: Analysis of Token Level Effects in a Longitudinal
Eye Tracking Study. ICPC 2021: 172-183
Nischal Shrestha, Colton Botta, Titus Barik, Chris Parnin:
Here we go again: why is it difficult for developers to learn another
programming language? Commun. ACM 65(3): 91-99 (2022) (distinguished paper
award)
Madeline Endres, Zachary Karas, Xiaosu Hu, Ioulia Kovelman, Westley Weimer:
Relating Reading, Visualization, and Coding for New Programmers: A
Neuroimaging Study. International Conference on Software Engineering
(ICSE) 2021
Chris Parnin, Alessandro Orso:
Are automated debugging techniques actually helping programmers?
ISSTA 2011: 199-209
Software Engineering and Deep Learning
Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang:
An empirical study on program failures of deep learning jobs. ICSE 2020:
1159-1170 (distinguished paper award)
Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan
Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan:
Problems and Opportunities in Training Deep Learning Software Systems: An
Analysis of Variance. ASE 2020: 771-783 (distinguished paper award)
Program Repair — Techniques and Criticism
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, Westley Weimer:
A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs
for $8 Each.
International Conference on Software Engineering (ICSE) 2012: 3-13
Dongsun Kim, Jaechang Nam, Jaewoo Song, Sunghun Kim:
Automatic patch generation learned from human-written patches.
ICSE 2013: 802-811 (distinguished paper award)
Martin Monperrus:
A critical review of "automatic patch generation learned from
human-written patches": essay on the problem statement and the
evaluation of automatic software repair.
ICSE 2014: 234-242
Fan Long, Martin Rinard:
Staged program repair with condition synthesis.
ESEC/SIGSOFT FSE 2015
Sergey Mechtaev, Jooyong Yi, Abhik Roychoudhury:
Angelix: scalable multiline program patch synthesis via symbolic analysis.
ICSE 2016: 691-701
- OPTIONAL: We are done reading papers formally. You don't have to read
the following papers.
Thomas Durieux, Fernanda Madeiral, Matias Martinez, Rui Abreu:
Empirical review of Java program repair tools: a large-scale experiment on
2,141 bugs and 23,551 repair attempts. ESEC/SIGSOFT FSE 2019: 302-313
(distinguished paper award)
Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg,
Julie Beaulieu, Peter J. Bentley, Samuel Bernard, Guillaume Beslon, David
M. Bryson, Nick Cheney, Patryk Chrabaszcz, Antoine Cully, Stéphane
Doncieux, Fred C. Dyer, Kai Olav Ellefsen, Robert Feldt, Stephan Fischer,
Stephanie Forrest, Antoine Frénoy, Christian Gagné, Léni K. Le Goff, Laura
M. Grabowski, Babak Hodjat, Frank Hutter, Laurent Keller, Carole Knibbe,
Peter Krcah, Richard E. Lenski, Hod Lipson, Robert MacCurdy, Carlos
Maestre, Risto Miikkulainen, Sara Mitri, David E. Moriarty, Jean-Baptiste
Mouret, Anh Nguyen, Charles Ofria, Marc Parizeau, David P. Parsons, Robert
T. Pennock, William F. Punch, Thomas S. Ray, Marc Schoenauer, Eric Schulte,
Karl Sims, Kenneth O. Stanley, François Taddei, Danesh Tarapore, Simon
Thibault, Richard A. Watson, Westley Weimer, Jason Yosinski:
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes
from the Evolutionary Computation and Artificial Life Research
Communities.
Artif. Life 26(2): 274-306 (2020)