SIESTA 2018 International Summer School on Software Engineering

Invited lecturers

The Road Ahead for Mining Software Repositories and Software Analytics

Ahmed E. Hassan

Queen's University,
Kingston, Canada

Big Data With Small Budget in Practice

Marco D'Ambros

CodeLounge,
Lugano, Switzerland

A weak evaluation has killed my research work! What should I do next?

Massimiliano Di Penta

University of Sannio,
Benevento, Italy

Search Based Test-case Generation

Paolo Tonella

USI,
Lugano, Switzerland

Ahmed E. Hassan

Ahmed E. Hassan is the Canada Research Chair in Software Analytics and an NSERC Industrial Research Chair in Software Engineering for Ultra Large Scale (ULS) systems at Queen’s University in Canada. Dr. Hassan spearheaded the organization and creation of the Mining Software Repositories (MSR) conference and its research community. He also serves on the editorial boards of IEEE Transactions on Software Engineering, Springer Journal of Empirical Software Engineering, and PeerJ Computer Science. Early tools and techniques developed by Dr. Hassan’s team are already integrated into products used by millions of users worldwide. Dr. Hassan’s industrial experience includes helping architect the Blackberry wireless platform at Blackberry, and working for IBM Research at the Almaden Research Lab and the Computer Research Lab at Nortel Networks. Dr. Hassan is the named inventor of patents in several jurisdictions around the world, including the United States, Europe, India, Canada, and Japan.

The Road Ahead for Mining Software Repositories and Software Analytics

Source control repositories, bug repositories, archived communications, deployment logs, and code repositories are examples of software repositories that are commonly available for most software projects. The Mining Software Repositories (MSR) field analyzes and cross-links the rich data available in these repositories to uncover interesting and actionable information about software systems. By transforming these static record-keeping repositories into active ones, we can guide decision processes in modern software projects. For example, data in source control repositories, traditionally used to archive code, could be linked with data in bug repositories to help practitioners propagate complex changes and to warn them about risky code based on prior changes and bugs. In this talk, I will present a brief history of the MSR field and discuss several recent achievements and results of using MSR techniques to support software research and practice. I will then discuss the various opportunities and challenges that lie in the road ahead for this important and emerging field. I will also discuss some common pitfalls when performing analytical modelling.
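To make the idea of cross-linking repositories concrete, the following is a minimal sketch, not the speaker's method. It assumes commit messages reference bug reports as `#<id>` and flags files that were touched by several bug-fixing commits. The commit log, the bug identifiers, the function names, and the threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical commit log: (commit message, files touched).
COMMITS = [
    ("Fix #101: null check in parser", ["parser.py"]),
    ("Add CSV export", ["export.py", "cli.py"]),
    ("Fixes #102 and #101 regression", ["parser.py", "lexer.py"]),
    ("Refactor cli", ["cli.py"]),
]

# Hypothetical bug repository: ids of reports confirmed as defects.
BUG_IDS = {101, 102}

# Assumed convention: commit messages mention bug reports as "#<number>".
ISSUE_REF = re.compile(r"#(\d+)")


def bug_fix_counts(commits, bug_ids):
    """Count, per file, how many commits linked to a known bug touched it."""
    counts = Counter()
    for message, files in commits:
        linked = {int(n) for n in ISSUE_REF.findall(message)} & bug_ids
        if linked:
            counts.update(files)
    return counts


def risky_files(commits, bug_ids, threshold=2):
    """Return files touched by at least `threshold` bug-fixing commits."""
    counts = bug_fix_counts(commits, bug_ids)
    return sorted(f for f, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(risky_files(COMMITS, BUG_IDS))
```

In practice the linking step is noisier than a single regular expression suggests, which is one reason the pitfalls of analytical modelling are worth discussing.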

Marco D'Ambros

Marco D'Ambros is the director of CodeLounge, the center for research and development, part of the Software Institute at Università della Svizzera italiana (Lugano, Switzerland). Before joining CodeLounge, Marco worked for more than 5 years as a Forward Deployed Engineer at Palantir Technologies, helping government organizations and large enterprises make sense of their large and dispersed data, and leading the technical execution of projects around the globe. Marco received MSc degrees from both Politecnico di Milano, Italy (cum laude) and the University of Illinois at Chicago. He earned his PhD in software engineering from Università della Svizzera italiana, Switzerland.

Big Data With Small Budget in Practice

You have a large dataset, potentially full of insights that can dramatically help your customer retention or boost your research. However, analyzing (or even ingesting) it on your laptop is impossible, and you don't have a cluster at your disposal, or you have one but setting up Hadoop/Spark & co. will take a significant amount of time that you don't have. In this tutorial we will see how to spin up an arbitrarily large Spark cluster on AWS, how to quickly configure the various components (e.g., network rules, proxies) to use it right away, and which options are available to tune the resources/cost tradeoff. We will then perform an analysis of a large dataset of NYC taxi data, using Zeppelin, a web-based notebook supporting a number of interpreters. During the analysis we will pay attention to the performance implications of different parts of the analysis, discussing best practices for working with large datasets.

Massimiliano Di Penta

Massimiliano Di Penta is an associate professor at the University of Sannio, Italy. His research interests include software maintenance and evolution, mining software repositories, empirical software engineering, and search-based software engineering. He has authored over 250 papers that appeared in international journals, conferences, and workshops. He serves and has served on the organizing and program committees of more than 100 conferences, including ICSE, FSE, ASE, ICSME, and MSR. He is on the editorial board of the Empirical Software Engineering Journal, published by Springer, and of the Journal of Software: Evolution and Process, published by Wiley, and has served on the editorial board of the IEEE Transactions on Software Engineering.

A weak evaluation has killed my research work! What should I do next?

Software engineering is nowadays considered a fairly mature scientific discipline, for which the rigorous application of empirical methods is a key component. Software engineering has similarities with other scientific disciplines such as physics or other engineering fields, but also with the social sciences, since software development is a very human-centric activity. This requires particular attention when selecting and combining empirical evaluation methodologies for different circumstances. By describing patterns and anti-patterns, and following a 'learn-by-example' approach, this tutorial will give an overview of quantitative and qualitative methods suitable for software engineering research. The tutorial will be combined with a laboratory session in which participants will have the opportunity to apply the learned techniques and report the results of their analyses.

Paolo Tonella

Paolo Tonella is Full Professor at the Faculty of Informatics and at the Software Institute of Università della Svizzera Italiana (USI) in Lugano, Switzerland. He is also Honorary Professor at University College London, UK. Until mid-2018 he was Head of Software Engineering at Fondazione Bruno Kessler, Trento, Italy. He received his PhD degree in Software Engineering from the University of Padova in 1999, with the thesis "Code Analysis in Support to Software Maintenance". In 2011 he was awarded the ICSE 2001 MIP (Most Influential Paper) award for his paper "Analysis and Testing of Web Applications". He is the author of "Reverse Engineering of Object Oriented Code", Springer, 2005. He has participated in several industrial and EU projects on software analysis and testing. He has written over 150 peer-reviewed conference/workshop papers and over 50 journal papers. His H-index (according to Google Scholar) is 43. Paolo Tonella was Program Chair of ICSM 2011 and ICPC 2007. He was General Chair of ISSTA 2010 and ICSM 2012, and he is General Chair of SSBSE 2015. Among others, he has served on the program committees of ICSE, FSE, ICSM, ISSTA, ICST, and ICPC. In 2007, Paolo Tonella was ranked among the top-50 Software Engineering scholars in an article published by the Communications of the ACM (vol. 50, n. 6, pp. 81-85, June 2007). He is associate editor of TSE and he is on the editorial boards of EMSE and JSPE. His current research interests include code analysis, web and object oriented testing, and search based test case generation.

Search Based Test-case Generation

Search based algorithms are used to address the problem of automatically generating the test data necessary to ensure a given test adequacy level is met (e.g., branch coverage). Search algorithms resort to a fitness function in order to select the most promising solutions during search space exploration. This makes them quite robust with respect to program properties (e.g., infeasible paths) that are hard to deal with using static analysis. In this talk, I will introduce the basic principles behind the search based algorithms most widely used in software testing. I will describe how they are instantiated to solve the test data generation problem and I will present the extensions required when the program under test is an object oriented program. I will conclude with an overview of recent attempts to combine search based test case generation and dynamic symbolic execution.
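As an illustration of how a fitness function can guide test data generation, here is a minimal sketch that uses hill climbing with a branch-distance fitness to find an input covering one target branch. It is not taken from the talk: the program under test, the function names, the input range, and the neighbourhood are hypothetical choices made for the example.

```python
import random


def under_test(x: int, y: int) -> bool:
    """Hypothetical program under test with one hard-to-hit branch."""
    if x == 2 * y + 7:
        return True  # target branch we want a test input for
    return False


def branch_distance(x: int, y: int) -> int:
    """Fitness: 0 when the target branch is taken, larger when further away."""
    return abs(x - (2 * y + 7))


def hill_climb(seed: int = 0, max_steps: int = 10_000):
    """Search for an input that minimises the branch distance."""
    rng = random.Random(seed)
    x, y = rng.randint(-1000, 1000), rng.randint(-1000, 1000)
    best = branch_distance(x, y)
    for _ in range(max_steps):
        if best == 0:
            break  # target branch covered
        # Explore the immediate neighbourhood of the current input.
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        candidate = min(neighbours, key=lambda p: branch_distance(*p))
        distance = branch_distance(*candidate)
        if distance >= best:
            break  # local optimum; real tools typically restart here
        x, y = candidate
        best = distance
    return (x, y), best


if __name__ == "__main__":
    (x, y), distance = hill_climb(seed=1)
    print(f"input=({x}, {y}) distance={distance} covered={under_test(x, y)}")
```

Random sampling would almost never satisfy `x == 2 * y + 7`, whereas the distance gives the search a gradient to follow. Evolutionary algorithms apply the same idea to a population of candidate inputs.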

Program

Registration
- Time: 8:00 - 8:30
Opening
- Time: 8:30 - 9:00
Massimiliano Di Penta
A weak evaluation has killed my research work! What should I do next?
- Time: 9:00 - 10:30
Coffee Break
- Time: 10:30 - 11:00
Massimiliano Di Penta
A weak evaluation has killed my research work! What should I do next?
- Time: 11:00 - 12:30
Lunch Break
- Time: 12:30 - 14:00
Massimiliano Di Penta
[LAB] A weak evaluation has killed my research work! What should I do next?
- Time: 14:00 - 15:30
Coffee Break
- Time: 15:30 - 16:00
Massimiliano Di Penta
[LAB] A weak evaluation has killed my research work! What should I do next?
- Time: 16:00 - 17:30

Ahmed E. Hassan
The Road Ahead for Mining Software Repositories and Software Analytics
- Time: 9:00 - 10:30
Coffee Break
- Time: 10:30 - 11:00
Ahmed E. Hassan
The Road Ahead for Mining Software Repositories and Software Analytics
- Time: 11:00 - 12:30
Lunch Break
- Time: 12:30 - 13:30
SIESTA Panel
- Time: 13:30 - 14:00
Students
Student Talks
Students' session (15 min. each talk, chair: Csaba Nagy)

Daniela Girardi: Sensing Developers' Emotion Using Biometric Sensors

Pingfan Kong: Fix Recommendation for Crashed Android Apps

Daniel Russo: Socio-Technical Software Engineering: a Quality-Architecture-Process Perspective

Luca Traini: A Multi-objective Framework for Effective Performance Fault Injection

Anna-Katharina Wickert: Automated Rule Inference for Cryptographic API

Nan Yang: Modular software design for high tech systems: context and possible directions
- Time: 14:00 - 15:30
Social Dinner
- Time: 19:30 -

Paolo Tonella
Search Based Test-case Generation
- Time: 9:00 - 10:30
Coffee Break
- Time: 10:30 - 11:00
Paolo Tonella
Search Based Test-case Generation
- Time: 11:00 - 12:30
Lunch Break
- Time: 12:30 - 14:00
Marco D'Ambros
Big Data With Small Budget in Practice
- Time: 14:00 - 15:30
Coffee Break
- Time: 15:30 - 16:00
Marco D'Ambros
Big Data With Small Budget in Practice
- Time: 16:00 - 17:30

A printable version of the program is available here.
