SIESTA 2023 – 3rd International Software Engineering Summer School

Biometrics in Software Engineering – the what, the why, and the how

In the last two decades, researchers have adopted different types of biometric measures to complement traditional research in software engineering. Starting with eye tracking, electrodermal activity (EDA), and electroencephalography (EEG), to name a few, researchers have in the last decade even adopted brain imaging techniques such as functional Near-Infrared Spectroscopy (fNIRS) and functional Magnetic Resonance Imaging (fMRI). Why are we doing this? What devices are used? What data is recorded? What measures are calculated? And how do we analyze, visualize, and make sense of that data? These are the types of questions that we will address in this talk.

Download the slides!

Venera Arnaoudova

Washington State University

Dr. Venera Arnaoudova is an Associate Professor in the School of Electrical Engineering and Computer Science (EECS) at Washington State University (WSU). Her research interest is in the domain of software engineering, and in particular, empirical software engineering, program comprehension, software evolution, and poor and good software practices. Her long-term research goal is to understand how human factors impact the cost and quality of the software that developers produce. In her research, she applies methods from Machine Learning, Natural Language Processing, Machine Translation, Neurocognitive Science, and others to Software Engineering.

Dr. Arnaoudova is part of the editorial board of the Empirical Software Engineering Journal (EMSE) and the Journal of Systems and Software (JSS) and part of the review board of IEEE Transactions on Software Engineering (TSE). She serves or has served as a program committee member for ESEC/FSE, ICSE, ICPC, ICSME, MSR, SANER, and others.

At WSU, Dr. Arnaoudova is the Chair of the Computer Science Curriculum Committee and the Industry Engagement Fellow within the Innovation and Research Engagement Office (IREO) for Computer Science and Software Engineering.

From Code Review to Real-World Insights: Dissecting Empirical Research in Software Engineering

In this talk, we'll take an in-depth look at the award-winning study, 'First come first served: the impact of file position on code review.' The study investigates whether the order of files in code review tools can affect the review outcomes. This interactive talk will provide an extensive explanation of the research methods used in the study. We'll analyze each decision that shaped the study, the reasons behind these decisions, and the knowledge gained from this research approach. We'll explore the research paper's two-part process to demonstrate how the study's methods and their foundational principles can be applied to various empirical software engineering research scenarios. The goal of this session is to equip students with the hands-on experience necessary to plan, carry out, and evaluate their empirical studies, and help them approach real-world software development problems from a rigorous scientific perspective.

First, we'll explain the two-step procedure that analyzed numerous Pull Requests from well-known Java projects on GitHub. We'll clarify how the data was gathered, managed, and understood. We'll also shed light on how to manage large-scale data effectively and control for variables that might distort the results. Then, we'll examine the design and implementation of a controlled experiment that involved 106 participants. We'll focus on elements like the selection of defects, experimental controls, assignment of treatments, and data analysis. We'll thoroughly discuss the process of transforming raw data into meaningful insights.

Download the slides!

Alberto Bacchelli

University of Zurich

Alberto Bacchelli is an associate professor of Empirical Software Engineering in the Department of Informatics in the Faculty of Business, Economics and Informatics at the University of Zurich, Switzerland. He received his bachelor’s and master’s degrees in computer science from the University of Bologna, Italy, and his Ph.D. in Software Engineering from the Università della Svizzera italiana, Switzerland.

His broader research vision is to innovate software engineering through fundamental empirical research and software tools. His goal is to increase our scientific knowledge of today's software development and to design, based on strong empirical evidence and theory, the right tools, languages, and development environments for high-quality software engineering. He received the MSR Ric Holt Early Career Achievement Award 2020 for his seminal contributions to modern code review. He received the 10-year Most Influential Paper award from SANER for his work on extending IDEs with an AI-based recommendation tool. He has received a total of eight Best Paper Awards and ACM SIGSOFT Distinguished Paper Awards from top academic venues in software engineering and computer-supported collaborative work.

The Rise of the Stochastic Parrots for Developers*

As we surf the crest of the latest AI tidal wave, we developers find ourselves savoring our portion of the technological banquet. Enthusiasm is boiling over, with shiny toys such as GitHub Copilot and ChatGPT heralding promises of skyrocketing our productivity. At CodeLounge, we have been using Copilot for a year, while also collecting data about its usage by means of Tako, an IDE telemetry collector & analyzer that we developed. In this talk, starting from this internal data, we will reflect on the impact that such tools have had and might have on the structure and daily activity of a development team. Through various programming and data analysis tasks, we will try to shed some light on the line between promise and reality, connecting the dots to incisive critiques from leading researchers who question the capabilities of Large Language Models in a more general context. Can these parrots be our perfect pirate companions, finally taking up the cudgels against the tedium of our craft, like copy-pasting from Stack Overflow? Which strategies and best practices can we adopt as developers to get work done, and as team leads to structure the development process?

*: An anonymous stochastic parrot may or may not have contributed to this abstract.

Download the slides!

Marco D'Ambros

CodeLounge

Marco D’Ambros is the director of CodeLounge, the center for software research and development of the Software Institute, Università della Svizzera italiana, which combines expertise from academia and industry. After obtaining a PhD in the area of mining software repositories in 2010, Marco worked until 2018 at Palantir Technologies, a leading Silicon Valley data mining firm, helping government organizations and large enterprises make sense of their large and dispersed data, and leading the technical execution of projects around the globe. In 2020, he was awarded the MSR 2010 MIP award for his work on bug prediction.

The Hitchhiker’s Guide to the Ph.D.

This talk will not give the answer to “The Ultimate Question of Life, The Universe, and Everything”. We already know the answer to that question; it is “42”.

However, this talk will answer several other questions that Ph.D. students may ask themselves at various stages of their doctoral studies. Some of these questions are:

How to select a research topic and define a research agenda?
What type of research is (un)common in software engineering?
How should I conduct my research?
Why should I (not) consider working in academia after graduation?

Download the slides!

Andrian (Andi) Marcus

George Mason University

A former Fulbright Scholar, born and raised in Romania, Andrian Marcus is now a Professor in the Department of Computer Science at George Mason University. He obtained his Ph.D. in Computer Science from Kent State University (US), and has prior degrees in Computer Science and European Studies from The University of Memphis (US) and Babes-Bolyai University (Cluj-Napoca, Romania). In 2021 he was named Distinguished Alumnus of the Department of Mathematics and Computer Science at Babes-Bolyai University.

His research interests are in software engineering, focusing on program understanding and software evolution. He is best known for his work on using text retrieval and analysis techniques on software corpora for supporting comprehension during software evolution. Professionally, he is most proud of his outstanding current and past doctoral students and finds mentoring to be the most rewarding part of the academic career. Over time, their joint research earned six Best/Distinguished Paper Awards and seven Most Influential Paper Awards at software engineering conferences.

His professional service includes serving on the Steering Committees of the IEEE International Conference on Software Maintenance and Evolution (ICSME) and of the IEEE Working Conference on Software Visualization (VISSOFT). He was the General Chair and the Program Co-chair of ICSME in 2011 and 2010, respectively, and Program Co-Chair for other conferences (ICPC'09, VISSOFT'13, SANER'17). He currently serves on the editorial board of the Journal of Software: Evolution and Process. He has also served on the editorial board of the IEEE Transactions on Software Engineering (2014-2018) and the Empirical Software Engineering Journal (2010-2021).

The Rise of the Stochastic Parrots for Developers*

As we surf the crest of the latest AI tidal wave, we developers find ourselves savoring our portion of the technological banquet. Enthusiasm is boiling over, with shiny toys such as GitHub Copilot and ChatGPT heralding promises of skyrocketing our productivity. At CodeLounge, we have been using Copilot for a year, while also collecting data about its usage by means of Tako, an IDE telemetry collector & analyzer that we developed. In this talk, starting from this internal data, we will reflect on the impact that such tools have had and might have on the structure and daily activity of a development team. Through various programming and data analysis tasks, we will try to shed some light on the line between promise and reality, connecting the dots to incisive critiques from leading researchers who question the capabilities of Large Language Models in a more general context. Can these parrots be our perfect pirate companions, finally taking up the cudgels against the tedium of our craft, like copy-pasting from Stack Overflow? Which strategies and best practices can we adopt as developers to get work done, and as team leads to structure the development process?

*: An anonymous stochastic parrot may or may not have contributed to this abstract.

Download the slides!

Andrea Mocci

CodeLounge

Andrea Mocci is a Junior Group Leader at CodeLounge, an R&D group headed by Dr. Marco D’Ambros and Prof. Dr. Michele Lanza. His main responsibilities include being the tech lead for CodeLounge’s team and projects, and doing some development, mostly on the backend side, including machine learning and natural language processing. He is passionate about software design, software quality, and functional programming in many flavors and languages. In the past, Andrea was a postdoctoral researcher at USI Lugano and at MIT. He received his B.Sc., M.Sc. and PhD at Politecnico di Milano, where he was advised by Prof. Carlo Ghezzi.

Embarking On a Journey to Conduct Disruptive Research in Software Engineering: Who, What, How

Together, we will take a deep dive and follow a less traveled path through the landscape of research methods in software engineering research. We will explore who our research aims to impact, what kinds of contributions we can expect from our research, and how we can use innovative research methods. Some of the topics we will dive into include design science as a frame for software engineering research, the benefits and challenges of using mixed methods in software engineering, and how to uncover the potential but not always obvious or positive disruptive impacts of novel technologies (such as generative AI and VR) on software engineering practice. After this talk, you should feel more empowered to pursue ambitious and impactful research using innovative research methods.

Download the slides!

Margaret-Anne Storey

University of Victoria

Dr. Margaret-Anne Storey is a Professor of Computer Science at the University of Victoria, Canada, and a Canada Research Chair in Human and Social Aspects of Software Engineering. Together with her students and collaborators, she seeks to understand how software tools, communication media, data visualizations, and social theories can be leveraged to improve how software engineers and knowledge workers explore, understand, analyze, create and share complex information and knowledge. She collaborates extensively with large and small software companies to ensure real-world applicability of her research contributions and tools. She is passionate about improving developer productivity and developer experience using novel and insightful research methods.

AI for SE

In recent years, Artificial Intelligence (AI) has become increasingly popular in the field of Software Engineering (SE), where it has been used to automate and improve various SE tasks. One of the most exciting developments in this area is the use of Large Language Models (LLMs) to solve complex SE problems. In this talk, we will explore the different techniques employed to use LLMs in AI for SE, including finetuning, prompt engineering, prompt augmentation with retrieval, and reinforcement learning with human feedback (RLHF) for SE tasks. We will examine the benefits and limitations of each approach and discuss real-world examples of their applications, such as in Automated Program Repair (APR). By the end of the talk, you will have a better understanding of how LLMs can be used in AI for SE and the potential impact they can have on the industry.

Download the slides!

Michele Tufano

Microsoft

Michele Tufano is a Senior Research Scientist in the Data & AI group at Microsoft. With a focus on automating software engineering tasks, Michele designs, trains, and evaluates models and algorithms for tasks such as Automated Test Generation, Program Repair, Software Maintenance, and more. Currently, Michele works towards improving developers' productivity through AI-based tools that use data to understand code, program semantics, and developers' intentions. Before joining Microsoft, Michele earned a Ph.D. degree at William & Mary. His thesis focused on Neural Machine Translation applied to Software Engineering tasks.

Seeing a Language

Software languages are more than their (in)formal specifications. They are also cultural artefacts that reflect the preferences and idioms of their users. In this session, we will explore some of the challenges and opportunities of seeing a language beyond its grammar. We will start by discussing how to determine the language of a program before parsing it, using various kinds of cues. Then, we will move on to capturing and analysing implicit rules and traditions that govern the style and structure of code, such as naming conventions, interfaces, idioms, implementation patterns, etc. Finally, we will address the issue of measuring and improving the quality of code in a specific language, taking into account its features and best practices. We will contemplate some examples of idiomatic and non-idiomatic code, and some methods for detecting and correcting them. The goal of this session is to demonstrate that seeing a language as a rich and diverse phenomenon can lead to new insights and applications for software engineering.

Download the slides!

Vadim Zaytsev

University of Twente

I am an Associate Professor of software evolution at the University of Twente, working in software analysis, modelling and restructuring since 2004; before that I was a machine code hacker and a railway engineer. My past affiliations include Dutch, Belgian, German and Russian companies and research institutions, as well as volunteer participation in Wikimedia activities. My research interests gravitate towards elicitation of structure in software and improving it by taking advantage of whatever structure is present. At my previous job as a Chief Science Officer, my day-to-day activities involved developing compilers, writing metaprograms and analysing migration projects. My current focus is on doing industrially relevant research from academia, teaching several courses, supervising students and developing prototype software. I am also a Programme Director, managing computer science educational programmes of several levels and specialisations, spanning around 2000 students.

Program

Speakers

Venera Arnaoudova

Alberto Bacchelli

Marco D'Ambros

Andrian (Andi) Marcus

Andrea Mocci

Margaret-Anne Storey

Michele Tufano

Vadim Zaytsev

Students' Talks

Social Events

Evening Meet-Up (Extra Activity)

Hike (Extra Activity)

Reception

Social Dinner

Closing Lunch