Authors: Ehsan Zabardast - Kwabena Ebo Bennin - Javier Gonzalez Huerta
This package contains the scripts used to conduct the study in the paper Further Investigation of the Survivability of Code Technical Debt Items submitted to the Journal of Software: Evolution and Process.
Specifically, the package includes the following:
- The Python scripts used to collect the raw data extracted from Sonarqube web API.
  - The script is available in 01_sonarqube_data_collection.py
- The Python scripts used to pre-process the raw data extracted from Sonarqube web API.
  - The script is available in 02_survival_data_process.py
- The R scripts used to perform the survival analysis and create survival curves.
  - The script is available in 03_survival_analysis.R
- The pre-processed data from 31 open-source systems from the Apache Foundation that we analyzed in the paper.
The raw data is also available at:
The raw data for the open-source systems is collected via the web API from https://sonarcloud.io
The data used for the analysis in the paper is provided in this git repository under the opensource_data directory.
Each system has three csv files:
- SYSTEM_gitlot.csv for the git log collected from the repository. For more information refer to Git Log for the Repository.
- SYSTEM_issues_data.csv for the raw data extracted from the web API from sonarcloud.
- SYSTEM_survival_data.csv for the processed data before the survival analysis. Note that this file will be generated as a result of 02_survival_data_process.py
Start by collecting the raw data from the web API using 01_sonarqube_data_collection.py.
Note that there are variables in the script that you have to modify. You also need the git log from the git repository before processing the data. Follow the instructions in Git Log for the Repository to fetch the git log data.
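The collection step can be sketched as follows. This is a minimal illustration and not the contents of 01_sonarqube_data_collection.py: it assumes the api/issues/search endpoint of the Sonarqube web API with the componentKeys, p and ps parameters, and the function names, the page size, and the injectable fetch argument are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://sonarcloud.io"  # host named in this README
PAGE_SIZE = 500  # assumed page size; adjust to what your server allows


def build_issues_url(project_key, page, page_size=PAGE_SIZE):
    """Build the URL for one page of issues of a project (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"componentKeys": project_key, "p": page, "ps": page_size}
    )
    return f"{BASE_URL}/api/issues/search?{query}"


def fetch_json(url):
    """Download one page and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collect_issues(project_key, fetch=fetch_json):
    """Request consecutive pages until every reported issue is collected."""
    issues = []
    page = 1
    while True:
        payload = fetch(build_issues_url(project_key, page))
        batch = payload.get("issues", [])
        issues.extend(batch)
        # "total" is assumed to hold the number of matching issues.
        total = payload.get("total", len(issues))
        if not batch or len(issues) >= total:
            return issues
        page += 1
```

Calling collect_issues with a project key would return a list of issue dictionaries that can then be written to SYSTEM_issues_data.csv. The fetch argument exists only so the paging logic can be exercised without network access.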
Once you have obtained the data, you can run 02_survival_data_process.py. Note that, similar to the previous step, you need to change some variables in the script.
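To give an idea of what pre-processing for survival analysis involves, the sketch below turns one issue into a duration and an event flag. It is an assumption about the general shape of such data, not a description of 02_survival_data_process.py: the function name, the use of days as the unit, and the treatment of still-open issues as right-censored at a chosen observation date are all hypothetical.

```python
from datetime import datetime


def to_survival_row(created, closed, observation_end):
    """Return (duration_days, event) for one issue.

    event is 1 if the issue was closed within the observation window and
    0 if it was still open at observation_end (right-censored).
    """
    if closed is not None and closed <= observation_end:
        return (closed - created).days, 1
    return (observation_end - created).days, 0


# Example with made-up dates: one closed issue and one still open.
end = datetime(2020, 3, 1)
closed_row = to_survival_row(datetime(2020, 1, 1), datetime(2020, 1, 31), end)
open_row = to_survival_row(datetime(2020, 1, 1), None, end)
```

Here closed_row is (30, 1) and open_row is (60, 0), since 2020 is a leap year.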
Finally, you can use the generated survival data to run the survival analysis in 03_survival_analysis.R.
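For readers who want to see how a survival curve is derived from such data, here is a minimal Kaplan-Meier estimator in Python. The analysis in this package is done in R, and this README does not state which estimator 03_survival_analysis.R uses, so the choice of Kaplan-Meier and the function name are assumptions made purely for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed time for each item.
    events: 1 if the item was removed at that time, 0 if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for time in sorted(set(durations)):
        removed = sum(
            1 for d, e in zip(durations, events) if d == time and e == 1
        )
        leaving = sum(1 for d in durations if d == time)
        if removed:
            # Multiply by the share of at-risk items that survive this time.
            survival *= 1.0 - removed / at_risk
            curve.append((time, survival))
        # Both removed and censored items leave the risk set afterwards.
        at_risk -= leaving
    return curve


# Made-up example: four items, one of them censored at time 10.
curve = kaplan_meier([5, 10, 10, 20], [1, 1, 0, 1])
```

In this example the curve drops at times 5, 10 and 20 to roughly 0.75, 0.5 and 0.0.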
Using the terminal application, go to the repository's directory on the computer. Use the following command in the repository's directory to fetch the git log in the required format for the script.
git log --all --pretty='format:%H,%ci,%ae'
To save the results as a csv file in a specific directory, use the following command:
git log --all --pretty='format:%H,%ci,%ae' > intended/directory/for/the/data/gitlog.csv
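The resulting file has no header row, and each line holds the commit hash (%H), the committer date (%ci) and the author email (%ae). A minimal sketch of reading it in Python is shown below; the function names and dictionary keys are hypothetical and are not taken from the scripts in this package.

```python
from datetime import datetime

GIT_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # layout produced by %ci


def parse_git_log_line(line):
    """Split one 'hash,date,email' line into a dictionary."""
    commit_hash, commit_date, author_email = line.rstrip("\n").split(",", 2)
    return {
        "hash": commit_hash,
        "date": datetime.strptime(commit_date, GIT_DATE_FORMAT),
        "author_email": author_email,
    }


def read_git_log(path):
    """Parse every non-empty line of a gitlog.csv file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_git_log_line(line) for line in handle if line.strip()]
```

For example, read_git_log("intended/directory/for/the/data/gitlog.csv") would return one dictionary per commit, with the date already converted to a timezone-aware datetime.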
Depending on the version of Sonarqube, the format of the dates might change. You need to adjust the script in 02_survival_data_process.py accordingly.
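One way to cope with varying date formats is to try several candidate formats in turn. The sketch below illustrates the idea. The formats listed are assumptions and may not match what your Sonarqube version returns, and the function name is hypothetical, so check a few values in your own SYSTEM_issues_data.csv before relying on it.

```python
from datetime import datetime

# Assumed candidate formats; extend or reorder to match your data.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",     # e.g. 2021-03-30T14:05:12+0200
    "%Y-%m-%dT%H:%M:%S.%f%z",  # same, with fractional seconds
    "%Y-%m-%d %H:%M:%S",       # e.g. 2021-03-30 14:05:12 (no timezone)
)


def parse_issue_date(text):
    """Parse a date string using the first candidate format that matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Note that values parsed with a timezone and values parsed without one cannot be subtracted from each other in Python, so a single dataset should use one format consistently.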
The manuscript of the paper is submitted to the Journal of Software: Evolution and Process.
Paper: The citation will be updated when published.
Data: Zabardast, Ehsan, Bennin, Kwabena Ebo, & Gonzalez Huerta, Javier. (2021). Raw Data for Further Investigation of the Survivability of Code Technical Debt Items (Version 2.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4650072