Usage and example

The package consists of three parts, each serving a different purpose.

1. Overall

The main.py file in the "example" directory contains a complete example of using the package. The following sections describe the usage of each part.

2. commits2sql

"commits2sql" mines a git repository and writes the result to a sqlite file, which is later used for link prediction. The following code section shows how to use this part:

output_path = 'path/to/store/the/sqlite/file'
input_path = 'path/to/the/target/repository'
miner = DataMiner(output_path, input_path)
miner.mining()

The DataMiner class wraps PyDriller so that it can mine a git repository. Its constructor takes two parameters: the path for the expected output and the path of the target repository.

Invoking the mining() method starts the repository mining. The range to mine can be controlled through the method's parameters; for example, start_date and end_date specify the time range.
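
As a rough illustration of what a time range does, the sketch below filters a toy list of commits with two datetime bounds. It does not call the package: the commit tuples and the helper within_range are hypothetical, and both the value type expected by start_date and end_date and the inclusiveness of the bounds are assumptions to check against the package itself.

```python
from datetime import datetime

# Hypothetical bounds; in the package these would be passed as the
# start_date and end_date parameters (value type assumed to be datetime).
start_date = datetime(2020, 1, 1)
end_date = datetime(2020, 12, 31, 23, 59, 59)

# Toy commit data (hash, commit date), invented for illustration.
commits = [
    ("a1", datetime(2019, 12, 30, 10, 0)),
    ("b2", datetime(2020, 3, 15, 9, 30)),
    ("c3", datetime(2021, 1, 2, 14, 0)),
]


def within_range(when, start=None, end=None):
    """Return True if `when` lies inside the range (bounds assumed inclusive)."""
    if start is not None and when < start:
        return False
    if end is not None and when > end:
        return False
    return True


# Keep only the commits whose date falls inside the range.
selected = [commit_hash for commit_hash, when in commits
            if within_range(when, start_date, end_date)]
print(selected)
```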

3. sql2link

"sql2link" implements the CoEv strategy with several optimisations. It predicts traceability links from the sqlite file produced by "commits2sql". The following code section shows how to use it:

db_path = 'path/to/sqlite/file'
predictor = TraceabilityPredictor(db_path)
# Predict without filter
predictor.run(LinkStrategy.COCHANGE, LinkBase.FOR_COMMITS)
# Predict with filter on commits
predictor.run_with_filter(LinkStrategy.COCHANGE, LinkBase.FOR_COMMITS)

The only parameter of the TraceabilityPredictor class is the path to the sqlite file. The class provides two methods: run and run_with_filter. Unlike run, run_with_filter filters out abnormal commits during prediction.

Both methods accept three parameters: strategy, base and parameters. The strategy parameter selects the prediction strategy; the following table lists all strategies supported by the package:

| Strategy | Value | Description |
| --- | --- | --- |
| CoEv | LinkStrategy.COCHANGE | Establishing links by co-change relations in commits |
| Co-Creation | LinkStrategy.COCREATION | Establishing links by co-creation relations in commits |
| Apriori | LinkStrategy.APRIORI | Establishing links using the Apriori algorithm |

The base parameter has two possible values: LinkBase.FOR_COMMITS and LinkBase.FOR_WEEKS. With the first, prediction is based on the co-change relations within each commit; with the second, it is based on the co-changes within each week.

| Base | Value | Description |
| --- | --- | --- |
| week based | LinkBase.FOR_WEEKS | Methods changed in the same weeks are identified as co-changed |
| commit based | LinkBase.FOR_COMMITS | Methods changed in the same commits are identified as co-changed |
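
The difference between the two bases can be sketched with plain Python. This is only an illustration under stated assumptions, not the package's implementation: the change records and the helpers co_changed_pairs, by_commit and by_week are hypothetical, a "week" is assumed to mean an ISO calendar week, and every pair of methods in a group is reported without distinguishing test methods from tested methods.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Toy change records (commit id, commit date, changed method), invented for illustration.
changes = [
    ("c1", date(2021, 3, 1), "FooTest.testAdd"),
    ("c1", date(2021, 3, 1), "Foo.add"),
    ("c2", date(2021, 3, 3), "Foo.remove"),
    ("c3", date(2021, 3, 10), "FooTest.testRemove"),
]


def co_changed_pairs(records, key):
    """Group changed methods by `key` and return every pair that shares a group."""
    groups = defaultdict(set)
    for commit_id, day, method in records:
        groups[key(commit_id, day)].add(method)
    pairs = set()
    for methods in groups.values():
        pairs.update(combinations(sorted(methods), 2))
    return pairs


def by_commit(commit_id, day):
    """Commit-based grouping: one group per commit."""
    return commit_id


def by_week(commit_id, day):
    """Week-based grouping: one group per (ISO year, ISO week)."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


commit_pairs = co_changed_pairs(changes, by_commit)
week_pairs = co_changed_pairs(changes, by_week)
print(len(commit_pairs), len(week_pairs))
```

In this toy data, commits c1 and c2 fall in the same week, so the week-based grouping also pairs Foo.remove with the methods changed in c1, while the commit-based grouping only pairs the two methods changed together in c1.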

The final parameter, parameters, is a dictionary that holds optional settings, including those specific to a strategy. For example, the paths of the test code and the source code can be customised by passing the following dictionary with two key-value pairs:

strategy = LinkStrategy.COCHANGE
base = LinkBase.FOR_COMMITS
parameters = {
    'tested_path': 'path/to/source/codes', 
    'test_path': 'path/to/test/codes'
}
predictor.run(strategy, base, parameters)
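
One plausible reading of these two paths is that they separate test files from tested files by directory. The sketch below illustrates that reading only; the helper classify and the example directory names are hypothetical, and how the package actually interprets 'tested_path' and 'test_path' is an assumption.

```python
from pathlib import PurePosixPath

# Example values only; the directory names are hypothetical.
parameters = {
    'tested_path': 'src/main/java',
    'test_path': 'src/test/java',
}


def classify(file_path, params):
    """Return 'test', 'tested', or None depending on which root contains the file."""
    path = PurePosixPath(file_path)
    for label, key in (('test', 'test_path'), ('tested', 'tested_path')):
        root = PurePosixPath(params[key])
        # A file belongs to a root if the root is the file itself or one of its ancestors.
        if root == path or root in path.parents:
            return label
    return None


print(classify('src/test/java/FooTest.java', parameters))
print(classify('src/main/java/Foo.java', parameters))
```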

4. evaluator4link

"evaluator4link" evaluates the precision and recall of the predicted links. The following code section shows how to use it:

db_path = 'path/to/db'
gt_path = 'path/to/ground/truth'
evaluator = LinkEvaluator(db_path, gt_path)
report = evaluator.precision_recall_and_f1_score_of_strategy('links_commits_based_cochanged')
print(report)

The two parameters of the LinkEvaluator class specify the paths of the database and the ground truth file, respectively. The precision_recall_and_f1_score_of_strategy method calculates the precision, recall and F1 score. Its parameter is the name of a table generated by "sql2link".
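
For reference, precision, recall and F1 score follow their standard definitions over sets of links. The sketch below computes them for toy data; the function precision_recall_f1 and the link pairs are hypothetical, and the format of the report returned by the package may differ.

```python
def precision_recall_f1(predicted, ground_truth):
    """Compute precision, recall and F1 for two collections of links."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy (test method, tested method) links, invented for illustration.
predicted_links = {
    ("FooTest.testAdd", "Foo.add"),
    ("FooTest.testAdd", "Foo.remove"),
    ("FooTest.testRemove", "Foo.remove"),
}
ground_truth_links = {
    ("FooTest.testAdd", "Foo.add"),
    ("FooTest.testRemove", "Foo.remove"),
    ("BarTest.testRun", "Bar.run"),
    ("BarTest.testStop", "Bar.stop"),
}

precision, recall, f1 = precision_recall_f1(predicted_links, ground_truth_links)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted links are correct and two of the four true links are found, so precision is 2/3 and recall is 1/2.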