Papers and Resources

We firmly believe in reproducible research and want to make our raw materials and de-identified data available to other researchers. We welcome collaborations and encourage other researchers to use our data for new analyses or as baselines.

Paper Resources
Zachary Karas, Andrew Jahn, Westley Weimer, Yu Huang: Connecting the Dots: Rethinking the Relationship between Code and Prose Writing with Functional Connectivity: Foundations of Software Engineering (ESEC/FSE): 2021
Madeline Endres, Zachary Karas, Xiaosu Hu, Ioulia Kovelman, Westley Weimer: Relating Reading, Visualization, and Coding for New Programmers: A Neuroimaging Study: International Conference on Software Engineering (ICSE): (2021)
Zohreh Sharafi, Yu Huang, Kevin Leach, Westley Weimer: Toward an objective measure of developers' cognitive activities: ACM Trans. on Software Engineering and Methodology (TOSEM): Volume 30, Issue 3, pages 1-40 (2021)
  • Eye-tracking Data — raw data, formatted data, fixation and saccade metrics, and time and accuracy data
Zohreh Sharafi, Ian Bertram, Michael Flanagan, Westley Weimer: Eyes on Code: A Study on Developers' Code Navigation Strategies: IEEE Trans. Software Engineering: 10.1109/TSE.2020.3032064 (2021)
  • Analysis Scripts (or via GitHub) — modules and sample script for processing output from the iTrace plugin, particularly when used in tandem with the FLUORITE plugin
  • Replication Package — analysis code, raw data, and processed data necessary to replicate the study
Ian Bertram, Jack Hong, Yu Huang, Westley Weimer, Zohreh Sharafi: Trustworthiness Perceptions in Code Review: An Eye-tracking Study: Empirical Software Engineering and Measurement (ESEM) Emerging Results and Vision Papers: 2020
Yu Huang, Kevin Leach, Zohreh Sharafi, Nicholas McKay, Tyler Santander, Westley Weimer: Biases and Differences in Code Review using Medical Imaging and Eye-Tracking: Genders, Humans, and Machines: Foundations of Software Engineering (ESEC/FSE): 2020
  • IRB Protocol (HUM00161095) — this IRB protocol includes the explicit handling of deception (in which participants are told that a patch is written by one source but it is actually written by another) and the subsequent debriefing
  • Behavioral and Eye Gaze Data — per-participant keystroke, answer and eye gaze data
  • Task Stimuli — task stimuli images and "loading screen showing purported author gender" images
  • Additional Information — other relevant information, such as the mapping file between experiment block IDs and stimuli
  • De-Identified Medical Imaging Data — see below (internal: imaging files at dijkstra:/home/fmridata/fmri-codesynth2 and dijkstra:/home/fmridata/fmri-data-codereview/fmri-scans/ and separate backup)
Ryan Krueger, Yu Huang, Xinyu Liu, Tyler Santander, Westley Weimer, Kevin Leach: Neurological Divide: An fMRI Study of Prose and Code Writing: International Conference on Software Engineering (ICSE): 2020
  • Artifact Repository (or via GitHub) — repository permitting both (1) the reproduction and validation of our results and experimental protocol, as well as (2) the adaptation of this software for similar human studies of software engineering activities (with or without medical imaging)
  • Keystroke and Answer Data — all participant keystroke and answer data
  • IRB Protocol (HUM00138634) — "Understanding Code Synthesis via Functional Magnetic Resonance Imaging". Includes IRB protocol text, email recruitment, fMRI umbrella consent document, fMRI consent template, participant knowledge assessment document, recruitment poster, recruitment script, and safety screening document.
  • De-Identified Medical Imaging Data — see below (internal: imaging files at dijkstra:/home/fmridata/fmri-codesynth and separate backup)
Yu Huang, Xinyu Liu, Ryan Krueger, Tyler Santander, Xiaosu Hu, Kevin Leach, Westley Weimer: Distilling Neural Representations of Data Structure Manipulation using fMRI and fNIRS: International Conference on Software Engineering (ICSE): 2019 (distinguished paper award)
Benjamin Floyd, Tyler Santander, Westley Weimer: Decoding the representation of code in the brain: An fMRI study of code review and expertise: International Conference on Software Engineering (ICSE) 2017: 175-186 (distinguished paper award)

De-Identified fMRI and fNIRS Scans

We are particularly interested in other researchers making use of the de-identified fMRI and fNIRS scan data gathered at our institution. For example, other researchers might carry out different analyses on the same data or use this data as a baseline in subsequent experiments.

Brain scan data is covered by the Health Insurance Portability and Accountability Act (HIPAA).

Researchers making use of this data: