The package consists of three parts, each serving a different purpose: `commits2sql`, `sql2link` and `evaluator4link`.
The `main.py` file in the `example` directory contains a complete example of how to use the package. The following sections describe the usage of each part.
`commits2sql` mines a Git repository and writes the result to a SQLite file, which is later used for link prediction. The following code shows how to use this part:
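```python
# Import path assumed from the part name; adjust it to the actual package layout.
from commits2sql import DataMiner

output_path = 'path/to/store/the/sqlite/file'
input_path = 'path/to/the/target/repository'
miner = DataMiner(output_path, input_path)
miner.mining()
```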
The `DataMiner` class wraps PyDriller so that it can mine a Git repository. Its constructor takes two parameters: the path where the output SQLite file should be stored and the path to the target repository. Calling the `mining()` method starts mining the repository. The range to be mined can also be restricted through method parameters; for example, a time range can be specified with the `start_date` and `end_date` parameters.
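Because the miner's output is an ordinary SQLite file, it can be inspected with Python's built-in `sqlite3` module. The helper below (`describe_sqlite` is a hypothetical name, not part of the package) lists every table with its row count and makes no assumption about the package's schema; the same check can also help locate the tables that `sql2link` generates.

```python
import sqlite3
from contextlib import closing


def describe_sqlite(db_path):
    """Return a {table_name: row_count} mapping for every table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Table names cannot be bound as query parameters, so quote them explicitly.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

Calling `describe_sqlite(output_path)` after `miner.mining()` prints what was written. Note that `sqlite3.connect` silently creates an empty database when the file does not exist, so a mistyped path yields an empty result rather than an error.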
`sql2link` implements the CoEv strategy with various optimisations. It predicts traceability links from the SQLite file produced by `commits2sql`. The following code shows how to use it:
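```python
# Import path assumed from the part name; adjust it to the actual package layout.
from sql2link import TraceabilityPredictor, LinkStrategy, LinkBase

db_path = 'path/to/sqlite/file'
predictor = TraceabilityPredictor(db_path)
# Predict without filter
predictor.run(LinkStrategy.COCHANGE, LinkBase.FOR_COMMITS)
# Predict with filter on commits
predictor.run_with_filter(LinkStrategy.COCHANGE, LinkBase.FOR_COMMITS)
```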
The `TraceabilityPredictor` constructor takes a single parameter: the path to the SQLite file. The class provides two methods, `run` and `run_with_filter`. The difference is that `run_with_filter` filters out abnormal commits during prediction.
Both methods accept three parameters: `strategy`, `base` and `parameters`. The `strategy` parameter selects the prediction strategy; the following table lists all strategies supported by the package:
| Strategy | Value | Description |
|---|---|---|
| CoEv | `LinkStrategy.COCHANGE` | Establishes links from co-change relations in commits |
| Co-Creation | `LinkStrategy.COCREATION` | Establishes links from co-creation relations in commits |
| Apriori | `LinkStrategy.APRIORI` | Establishes links using the Apriori algorithm |
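The co-change idea behind CoEv can be illustrated with a small, hypothetical sketch in plain Python. It is not the package's implementation: the input format (one collection of changed method names per commit), the `is_test` predicate and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cochange_links(commits, is_test, min_count=1):
    """Count how often each (test method, tested method) pair changes together.

    commits   -- iterable of collections of changed method names, one per commit
    is_test   -- predicate returning True for methods that belong to test code
    min_count -- keep only pairs that co-changed in at least this many commits
    """
    pair_counts = Counter()
    for changed in commits:
        changed = set(changed)
        tests = {m for m in changed if is_test(m)}
        tested = changed - tests
        # Pair every test method with every non-test method of the same commit.
        pair_counts.update(product(sorted(tests), sorted(tested)))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}
```

For example, with the commits `{test_add, add}`, `{test_add, add, sub}` and `{sub}`, and a predicate that treats names starting with `test_` as tests, the pair `(test_add, add)` is counted twice and `(test_add, sub)` once.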
The `base` parameter has two possible values: `LinkBase.FOR_COMMITS` and `LinkBase.FOR_WEEKS`. With the former, prediction is based on the co-change relations within each commit; with the latter, it is based on the co-changes within each week.
| Base | Value | Description |
|---|---|---|
| Week-based | `LinkBase.FOR_WEEKS` | Methods changed in the same week are considered co-changed |
| Commit-based | `LinkBase.FOR_COMMITS` | Methods changed in the same commit are considered co-changed |
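The two bases differ only in how changes are grouped before co-changes are counted. The hypothetical sketch below shows week-based grouping, assuming each commit is available as a `(timestamp, changed methods)` pair; the use of ISO calendar weeks is also an assumption, since the package may define a week differently.

```python
from collections import defaultdict


def group_by_week(changes):
    """Merge per-commit change sets into per-ISO-week change sets.

    changes -- iterable of (commit_datetime, changed_method_names) pairs
    Returns {(iso_year, iso_week): set of methods changed during that week}.
    """
    weeks = defaultdict(set)
    for when, methods in changes:
        iso_year, iso_week, _ = when.isocalendar()
        weeks[(iso_year, iso_week)].update(methods)
    return dict(weeks)
```

In this sketch, a commit on Monday 1 January 2024 and another on Wednesday 3 January 2024 fall into the same group `(2024, 1)`, so their methods count as co-changed in week-based mode even though no single commit touched them all. Each week-level set then plays the role that a single commit's change set plays in commit-based mode.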
The last parameter, `parameters`, is a dictionary of optional settings, including strategy-specific ones. For example, the locations of the source code and the test code can be customised by passing a dictionary with the two keys `tested_path` (source code) and `test_path` (test code):
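```python
# `predictor`, `LinkStrategy` and `LinkBase` as in the previous example
strategy = LinkStrategy.COCHANGE
base = LinkBase.FOR_COMMITS
parameters = {
    'tested_path': 'path/to/source/codes',  # location of the code under test
    'test_path': 'path/to/test/codes'       # location of the test code
}
predictor.run(strategy, base, parameters)
```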
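One plausible reading of `tested_path` and `test_path` is that they act as directory prefixes deciding whether a changed file holds test code or code under test. The hypothetical helper below (`classify_path` is not part of the package) illustrates that reading; the example paths assume a Maven-style layout.

```python
from pathlib import PurePosixPath


def classify_path(file_path, parameters):
    """Return 'test', 'tested' or None for a repository-relative file path.

    Hypothetical helper: treats parameters['test_path'] and
    parameters['tested_path'] as directory prefixes.
    """
    path = PurePosixPath(file_path)
    test_root = PurePosixPath(parameters['test_path'])
    tested_root = PurePosixPath(parameters['tested_path'])
    # Check the test root first, in case it is nested inside the source root.
    if test_root in path.parents:
        return 'test'
    if tested_root in path.parents:
        return 'tested'
    return None
```

With `{'tested_path': 'src/main/java', 'test_path': 'src/test/java'}`, the file `src/test/java/FooTest.java` is classified as `'test'`, `src/main/java/Foo.java` as `'tested'`, and `README.md` as neither.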
`evaluator4link` evaluates the predicted links in terms of precision, recall and F1 score. The following code shows how to use it:
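```python
# Import path assumed from the part name; adjust it to the actual package layout.
from evaluator4link import LinkEvaluator

db_path = 'path/to/db'
gt_path = 'path/to/ground/truth'
evaluator = LinkEvaluator(db_path, gt_path)
report = evaluator.precision_recall_and_f1_score_of_strategy('links_commits_based_cochanged')
print(report)
```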
The two parameters of the `LinkEvaluator` constructor are the path to the database and the path to the ground-truth file, respectively. The `precision_recall_and_f1_score_of_strategy` method calculates the precision, recall and F1 score; its parameter is the name of a table generated by `sql2link`.
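For reference, the three scores follow their standard definitions over sets of links: precision is the share of predicted links that are correct, recall is the share of ground-truth links that were predicted, and F1 is their harmonic mean. The sketch below assumes links are compared as exact `(test, tested)` pairs; how `LinkEvaluator` reads the ground-truth file and matches method names is not shown, and `precision_recall_f1` is a hypothetical name.

```python
def precision_recall_f1(predicted, ground_truth):
    """Score a collection of predicted links against the ground-truth links."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, 3 predicted links of which 2 appear among 4 ground-truth links give a precision of 2/3, a recall of 1/2 and an F1 score of 4/7 (about 0.571).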