Lectures and Details
Monday/Wednesday, 3:00-4:30pm, 1303 EECS
To Be Determined (Monday or Wednesday)
Students are expected to have enough background in software
engineering, programming languages, compilers, security, and
theory to be able to understand research papers from SE and PL
conferences. One previous SE or PL course at the undergraduate or
graduate level should suffice for the motivated student.
This special topics course is a research seminar covering the
intersection of core software engineering topics (e.g., program
comprehension and maintenance) and associated human factors (particularly
those revealed by medical imaging).
The course will focus on active research areas in software engineering.
Special emphasis will be placed on instructor-led in-person discussions
of contextual aspects of papers that may be less evident to students (e.g.,
based on the instructor's first-hand knowledge of the authors or research
situation). These may include:
- The impact or influence the paper had on subsequent work
- Whether some aspects of the paper were accidental or planned
- Why certain reasonable-seeming approaches were not taken
- How difficult certain aspects of the work were to carry out
Such discussions can help students view papers as part of a
collaborative, human activity that takes place over a span of time.
Conversations about how to construct well-designed follow-on work (e.g.,
balancing risk and reward, lifting input assumptions, etc.), at the
level of the graduate Preliminary Exam, will be encouraged.
Software Engineering and Cognition
This class will include papers that explicitly cover human cognitive
aspects of software engineering and programming languages, such as studies
that make use of eye tracking or medical imaging.
It is not expected that incoming students have any expertise in such
cognitive aspects. Instead, relevant cognitive aspects will be discussed
(including in the third paper, below) such that students can interpret
their relevance to computer science.
Graduate Depth Area
As of January 20, 2023, it is confirmed that this course satisfies the
area requirement. (It does not satisfy the breadth requirement.)
Structure and Presentations
For each paper discussion I will choose up to three students at random
to present.
Each selected student will give a five-minute in-person presentation that
at least (1) summarizes the work, (2) lists its strengths, and (3) lists
ways in which it might be improved in a subsequent publication. Including
other information, such as your opinion about the work or its relation to
other projects, is recommended.
The goals of this randomized approach are to encourage all participants to
read the material thoroughly in advance, to provide jumping-off points for
detailed discussions, and to allow me to evaluate participation.
There is no project component to this course; your primary
responsibility is to prepare five minutes of cogent discussion for each
class meeting. Grading will be based on in-person participation.
40% Paper Presentations
In more detail, Paper Presentations will be graded on the summarization
of the work (20%), the enumeration of strengths (30%), and areas and
mechanisms for improvement (50%).
- While paper summarizations are common in many settings, this course
places a particular emphasis on identifying areas for improvement that
may not be mentioned by the authors of a work. For example, a paper may
present an analysis that only works on single-threaded programs, but the
paper may not call out this weakness explicitly. Instead, readers may
note that all of the benchmarks were single-threaded and that the analysis
relies on the values of variables remaining constant. Readers might
identify this area for improvement and also identify a potential
mechanism for overcoming it (e.g., perhaps the analysis could be
augmented with manual annotations about which variables are modified by
other threads, or perhaps a dynamic lockset approach like Eraser could be
used to provide missing information, etc.).
- This identification of paper areas for improvement that are not
explicit but are instead implicit (in the "negative space" of the paper,
as it were) is an important skill, and one that we will practice
together. Similarly, we will work together on identifying how to bring
together topics and ideas from computer science to address such
areas for improvement.
The Discussion component will be assessed by noting when
students contribute to the conversation and analysis of a
paper (outside of their presentations).
- Student contributions will be noted on a three-point scale, with
two points for a typical discussion element, three points for a
more insightful analysis, and one point for a less fruitful
contribution.
- Discussion points are graded on insight and relevance, rather than
on length. A theme of the course is that sometimes a single sentence or
observation can significantly influence how a paper is interpreted. The
focus is not on long-winded reiterations of the main paper points, but
on identifying "interactions" (e.g., "given that they mention in
Section 3 that they are recruiting only students, their use of this
industrial task in Section 5 may not be indicative ...", etc.) or
areas for improvement.
- Particularly insightful contributions will be described as such
by the course staff (e.g., to help students who may not be as familiar
with this sort of course to identify the desired approach).
- Each student can earn at most three points per class meeting. This
is to prevent one student from dominating a discussion while also
encouraging students to speak multiple times per meeting.
- If there are N course meetings, full discussion credit
corresponds to N earned points, distributed in any manner.
- Students can make up the discussion component in Office Hours,
earning up to three points per session.
- The relative simplicity of discussion grading (informally:
check-plus, check, check-minus) also helps the professor track
it without interrupting the flow of the conversation.
The Professionalism component will be assessed in terms of helping
to maintain a welcoming environment for everyone and demonstrating our
shared values. Participants are assumed to be professionals by default, but
may lose points in certain negative circumstances. Examples of
less-professional conduct include:
- Rudely interrupting or speaking over someone else. Instead, we favor
giving speakers time to complete their thoughts. (The professor will
gently move along any stalled or overly-verbose discussion.)
- Suggesting that the quality of a particular paper or statement is in
any way related to the demographics of its authors. Instead, we assess
papers on the merits of their presented ideas and evidence.
- Intentionally using defamatory or belittling phrasing to describe
people or groups. Some of the papers we will read and discuss tackle
practitioner biases or expertise differences. We want to factually
discuss such biases, with a goal of mitigating, rather than perpetuating,
them.
Informally, one sufficient
condition for an "A" grade is to do a good job whenever you are called on
to present a paper, to interject insightful comments into the
discussions of other papers, and to treat others respectfully.
The grading cutoffs are:
- 70% = C-
- 73% = C
- 77% = C+
- 80% = B-
- 83% = B
- 87% = B+
- 90% = A-
- 93% = A
- 98% = A+
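The cutoff table above is a simple threshold lookup. A minimal sketch in
Python (the function name is illustrative, not part of the syllabus):

```python
# Letter-grade cutoffs as listed above, highest first.
CUTOFFS = [(98, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (73, "C"), (70, "C-")]

def letter_grade(percent):
    """Return the highest letter whose cutoff the percentage meets."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "below C-"  # the syllabus does not list cutoffs under 70%

print(letter_grade(94))  # A
print(letter_grade(85))  # B
```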
Read Two Ahead
You are responsible for reading and preparing for the next
two not-yet-discussed papers for any given class meeting.
On average, we will discuss three papers a week (devoting perhaps 50
minutes to each paper). However, some papers may merit more or less time.
It is often possible to find presentation slides or video recordings
associated with a paper.
The next paper to discuss may be shown below a highlighted line. You should
read and prepare at least the next two-to-three papers before each class
meeting.
Software Research and Threats to Validity
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney:
Producing wrong data without doing anything obviously wrong!
ASPLOS 2009: 265-276
Janet Siegmund, Norbert Siegmund, Sven Apel:
Views on Internal and External Validity in Empirical Software Engineering.
ICSE 2015: 9-19
(distinguished paper award)
Neuroscience and Language
Jingyuan E. Chen, Gary H. Glover:
Functional Magnetic Resonance Imaging Methods.
Neuropsychology Review, vol. 25, pages 289-313 (2015)
Ioulia Kovelman, Stephanie A. Baker, Laura-Ann Petitto:
Bilingual and Monolingual Brains Compared: A Functional Magnetic Resonance
Imaging Investigation of Syntactic Processing and a Possible "Neural
Signature" of Bilingualism.
J. Cogn. Neurosci. 20(1): 153-169 (2008)
Software Engineering and the Brain
Janet Siegmund, Christian Kästner, Sven Apel, Chris Parnin, Anja Bethmann,
Thomas Leich, Gunter Saake, André Brechmann:
Understanding understanding source code with functional magnetic resonance
imaging. ICSE 2014: 378-389
Benjamin Floyd, Tyler Santander, Westley Weimer:
Decoding the representation of code in the brain: an fMRI study of code
review and expertise. ICSE 2017: 175-186 (distinguished paper award)
Yu Huang, Xinyu Liu, Ryan Krueger, Tyler Santander, Xiaosu Hu, Kevin Leach,
Westley Weimer:
Distilling neural representations of data structure manipulation using fMRI
and fNIRS. ICSE 2019: 396-407 (distinguished paper award)
Ryan Krueger, Yu Huang, Xinyu Liu, Tyler Santander, Westley Weimer, Kevin
Leach:
Neurological Divide: An fMRI Study of Prose and Code Writing.
International Conference on Software Engineering (ICSE) 2020
Zachary Karas, Andrew Jahn, Westley Weimer, Yu Huang:
Connecting the Dots:
Rethinking the Relationship between Code and Prose Writing with Functional
Connectivity. Foundations of Software Engineering (ESEC/FSE) 2021
Software Expertise and the Brain
Norman Peitek, Annabelle Bergum, Maurice Rekrut, Jonas Mucke, Matthias
Nadig, Chris Parnin, Janet Siegmund, Sven Apel:
Correlates of programmer efficacy and their link to experience: a combined
EEG and eye-tracking study. ESEC/SIGSOFT FSE 2022: 120-131
Ikutani Y, Kubo T, Nishida S, Hata H, Matsumoto K, Ikeda K, Nishimoto S.
Expert Programmers Have Fine-Tuned Cortical Representations of Source Code.
eNeuro. 2021 Jan 28. 8(1): ENEURO.0405-20.2020.
Implications — Code Comprehension
Sarah Fakhoury, Devjeet Roy, Yuzhan Ma, Venera Arnaoudova, Olusola O.
Adesope:
Measuring the impact of lexical and structural inconsistencies on
developers' cognitive load during bug localization. Empir. Softw. Eng.
25(3): 2140-2178 (2020)
Norman Peitek, Sven Apel, Chris Parnin, André Brechmann, Janet Siegmund:
Program Comprehension and Code Complexity Metrics: An fMRI Study. ICSE 2021
(distinguished paper award)
Implications — Code Review and Trust
Tyler J. Ryan, Gene M. Alarcon, Charles Walter, Rose F. Gamble, Sarah A.
Jessup, August A. Capiola, Marc D. Pfahler:
Trust in Automated Software Repair - The Effects of Repair Source,
Transparency, and Programmer Experience on Perceived Trustworthiness and
Trust. HCI (29) 2019: 452-470
Yu Huang, Kevin Leach, Zohreh Sharafi, Nicholas McKay, Tyler Santander,
Westley Weimer:
Biases and Differences in Code Review using Medical Imaging
and Eye-Tracking: Genders, Humans, and Machines. Foundations of Software
Engineering (ESEC/FSE) 2020
Implications — Learning and Teaching
Ryan Shaun Joazeiro de Baker, Sidney K. D'Mello, Ma. Mercedes T. Rodrigo,
Arthur C. Graesser:
Better to be frustrated than bored: The incidence, persistence, and impact
of learners' cognitive-affective states during interactions with three
different computer-based learning environments. Int. J. Hum. Comput. Stud.
68(4): 223-241 (2010)
Naser Al Madi, Cole S. Peterson, Bonita Sharif, Jonathan I. Maletic:
From Novice to Expert: Analysis of Token Level Effects in a Longitudinal
Eye Tracking Study. ICPC 2021: 172-183
Nischal Shrestha, Colton Botta, Titus Barik, Chris Parnin:
Here we go again: why is it difficult for developers to learn another
programming language? Commun. ACM 65(3): 91-99 (2022) (distinguished paper
award)
Madeline Endres, Zachary Karas, Xiaosu Hu, Ioulia Kovelman, Westley Weimer:
Relating Reading, Visualization, and Coding for New Programmers: A
Neuroimaging Study. International Conference on Software Engineering
(ICSE) 2021
Chris Parnin, Alessandro Orso:
Are automated debugging techniques actually helping programmers?
ISSTA 2011: 199-209
Software Engineering and Deep Learning
Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang:
An empirical study on program failures of deep learning jobs. ICSE 2020:
1159-1170 (distinguished paper award)
Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan
Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan:
Problems and Opportunities in Training Deep Learning Software Systems: An
Analysis of Variance. ASE 2020: 771-783 (distinguished paper award)
Program Repair — Techniques and Criticism
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, Westley Weimer:
A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs
for $8 Each.
International Conference on Software Engineering (ICSE) 2012: 3-13
Dongsun Kim, Jaechang Nam, Jaewoo Song, Sunghun Kim:
Automatic patch generation learned from human-written patches.
ICSE 2013: 802-811 (distinguished paper award)
Martin Monperrus:
A critical review of "automatic patch generation learned from
human-written patches": essay on the problem statement and the
evaluation of automatic software repair.
ICSE 2014: 234-242
Fan Long, Martin Rinard:
Staged program repair with condition synthesis.
ESEC/SIGSOFT FSE 2015
Sergey Mechtaev, Jooyong Yi, Abhik Roychoudhury:
Angelix: scalable multiline program patch synthesis via symbolic analysis.
ICSE 2016: 691-701
- OPTIONAL: We are done reading papers formally. You don't have to read
the following papers.
Thomas Durieux, Fernanda Madeiral, Matias Martinez, Rui Abreu:
Empirical review of Java program repair tools: a large-scale experiment on
2,141 bugs and 23,551 repair attempts. ESEC/SIGSOFT FSE 2019: 302-313
(distinguished paper award)
Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg,
Julie Beaulieu, Peter J. Bentley, Samuel Bernard, Guillaume Beslon, David
M. Bryson, Nick Cheney, Patryk Chrabaszcz, Antoine Cully, Stéphane
Doncieux, Fred C. Dyer, Kai Olav Ellefsen, Robert Feldt, Stephan Fischer,
Stephanie Forrest, Antoine Frénoy, Christian Gagné, Léni K. Le Goff, Laura
M. Grabowski, Babak Hodjat, Frank Hutter, Laurent Keller, Carole Knibbe,
Peter Krcah, Richard E. Lenski, Hod Lipson, Robert MacCurdy, Carlos
Maestre, Risto Miikkulainen, Sara Mitri, David E. Moriarty, Jean-Baptiste
Mouret, Anh Nguyen, Charles Ofria, Marc Parizeau, David P. Parsons, Robert
T. Pennock, William F. Punch, Thomas S. Ray, Marc Schoenauer, Eric Schulte,
Karl Sims, Kenneth O. Stanley, François Taddei, Danesh Tarapore, Simon
Thibault, Richard A. Watson, Westley Weimer, Jason Yosinski:
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes
from the Evolutionary Computation and Artificial Life Research
Communities.
Artif. Life 26(2): 274-306 (2020)