The Survival Analysis for Code Technical Debt Items

Authors: Ehsan Zabardast - Kwabena Ebo Bennin - Javier Gonzalez Huerta

Overview

This package contains the scripts used to conduct the study in the paper "Further Investigation of the Survivability of Code Technical Debt Items", submitted to the Journal of Software: Evolution and Process.

Specifically, the package includes the following:

  • The Python script used to collect the raw data from the SonarQube web API.
    • The script is available in 01_sonarqube_data_collection.py
  • The Python script used to pre-process the raw data extracted from the SonarQube web API.
    • The script is available in 02_survival_data_process.py
  • The R script used to perform the survival analysis and create survival curves.
    • The script is available in 03_survival_analysis.R
  • The pre-processed data from 31 open-source systems from the Apache Foundation that we analyzed in the paper.

The raw data is also available on Zenodo: http://doi.org/10.5281/zenodo.4650072

Data

The raw data for the open-source systems was collected via the SonarCloud web API at https://sonarcloud.io
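As a rough illustration of this kind of collection, the sketch below pages through the `api/issues/search` endpoint. It is not the code in 01_sonarqube_data_collection.py. The project key, the helper names, and the stop condition are assumptions made for this example, and parameter names such as `componentKeys` may differ between SonarQube versions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://sonarcloud.io/api/issues/search"


def build_issues_url(project_key, page=1, page_size=500):
    """Build the URL for one page of issues of a project (hypothetical helper)."""
    query = urlencode({"componentKeys": project_key, "p": page, "ps": page_size})
    return f"{BASE_URL}?{query}"


def fetch_json(url):
    """Download a URL and decode its JSON body. Performs a network request."""
    with urlopen(url) as response:
        return json.load(response)


def collect_issues(project_key, fetch=fetch_json, page_size=500):
    """Collect issues page by page until a page comes back short or empty."""
    issues = []
    page = 1
    while True:
        payload = fetch(build_issues_url(project_key, page, page_size))
        batch = payload.get("issues", [])
        issues.extend(batch)
        if len(batch) < page_size:
            return issues
        page += 1
```

Calling `collect_issues` with a real project key performs live HTTP requests. Passing a stub function as `fetch` lets you try the paging logic offline.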

The data analyzed in the paper is provided in this Git repository under the opensource_data directory.

Each system has three csv files:

  • SYSTEM_gitlog.csv for the git log collected from the repository. For more information, refer to Git Log for the Repository.
  • SYSTEM_issues_data.csv for the raw data extracted from the SonarCloud web API.
  • SYSTEM_survival_data.csv for the processed data used as input to the survival analysis. This file is generated by 02_survival_data_process.py.

Instructions

Running the Scripts

Start by collecting the raw data from the web API using 01_sonarqube_data_collection.py. Note that there are variables in the script that you have to modify. You also need the git log from the git repository before processing the data. Follow the instructions in Git Log for the Repository for fetching the git log data.

Once you have obtained the data, you can run 02_survival_data_process.py. As in the previous step, you need to change some variables in the script.
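To give an idea of what pre-processing for survival analysis typically involves, the sketch below turns one issue into a duration and an event flag. It is an assumption-laden illustration, not the logic of 02_survival_data_process.py. The field names `creationDate` and `closeDate`, the date format, and the choice to treat still-open issues as censored at the end of the observation window are all assumptions.

```python
from datetime import datetime, timezone

# Assumed date format, e.g. 2020-05-17T09:30:00+0000
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def to_survival_record(issue, observation_end):
    """Return (duration_days, event_observed) for one issue dictionary.

    An issue without a close date is treated as censored: it is followed
    only until observation_end and its event flag is False.
    """
    created = datetime.strptime(issue["creationDate"], DATE_FORMAT)
    closed_raw = issue.get("closeDate")
    if closed_raw:
        end = datetime.strptime(closed_raw, DATE_FORMAT)
        event_observed = True
    else:
        end = observation_end
        event_observed = False
    duration_days = (end - created).total_seconds() / 86400
    return duration_days, event_observed


# Usage with made-up data; observation_end must be timezone-aware.
window_end = datetime(2020, 1, 31, tzinfo=timezone.utc)
example = {"creationDate": "2020-01-01T00:00:00+0000"}
record = to_survival_record(example, window_end)
```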

Finally, you can use the generated survival data to run the survival analysis in 03_survival_analysis.R
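If you want to sanity-check the survival data before switching to R, a survival curve can be approximated in a few lines of Python. The sketch below uses the Kaplan-Meier estimator, a common choice for survival curves. This README does not state which estimator 03_survival_analysis.R uses, so treat this as an illustration only. The function name and input layout are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time with an observed event.

    durations: time each item was followed
    events: True if the item was removed at that time, False if censored
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        time = records[i][0]
        removed = 0   # items with an observed event at this time
        leaving = 0   # all items leaving the risk set at this time
        while i < len(records) and records[i][0] == time:
            removed += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if removed:
            survival *= 1 - removed / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Usage with made-up durations (in days) and event flags.
curve = kaplan_meier([5, 10, 10, 20], [True, True, False, True])
```

Censored items lower the number at risk without lowering the curve, which is why the event flag must be kept alongside each duration.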

Git Log for the Repository

In a terminal, change to the repository's directory and run the following command to print the git log in the format the script requires:

git log --all --pretty='format:%H,%ci,%ae'

To save the output as a CSV file in a directory of your choice, use the following command:

git log --all --pretty='format:%H,%ci,%ae' > intended/directory/for/the/data/gitlog.csv
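The exported file has one commit per line with three comma-separated fields: the commit hash (`%H`), the committer date (`%ci`), and the author email (`%ae`). The sketch below shows one way to read it back in Python. The function name and the dictionary keys are hypothetical and are not taken from the scripts in this package.

```python
import csv
from datetime import datetime

# %ci prints dates such as 2021-03-01 12:34:56 +0100
GIT_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def read_git_log(path):
    """Read the exported git log into a list of dictionaries."""
    commits = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            commit_hash, committed_at, author_email = row
            commits.append({
                "hash": commit_hash,
                "date": datetime.strptime(committed_at, GIT_DATE_FORMAT),
                "author_email": author_email,
            })
    return commits
```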

Known Issues

Depending on the SonarQube version, the format of the dates might differ. You need to adjust the date handling in 02_survival_data_process.py accordingly.
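One possible way to make date handling more tolerant is to try several formats in turn. The sketch below illustrates the idea. The listed formats are examples chosen for this illustration, not a confirmed list of what any SonarQube version emits, so check them against your own data before relying on them.

```python
from datetime import datetime

# Example formats only; adjust to the dates found in your issue data.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",     # 2020-05-17T09:30:00+0000
    "%Y-%m-%dT%H:%M:%S.%f%z",  # 2020-05-17T09:30:00.000+0000
    "%Y-%m-%d %H:%M:%S",       # 2020-05-17 09:30:00 (no time zone)
)


def parse_issue_date(value, formats=CANDIDATE_FORMATS):
    """Parse a date string with the first format that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")
```

Failing loudly on an unknown format is deliberate: a silently skipped date would distort the computed durations.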

Useful Links

References

The manuscript of the paper has been submitted to the Journal of Software: Evolution and Process.

Paper: The citation will be updated when published.

Data: Zabardast, Ehsan, Bennin, Kwabena Ebo, & Gonzalez Huerta, Javier. (2021). Raw Data for Further Investigation of the Survivability of Code Technical Debt Items (Version 2.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4650072