LGVISIUM-66: refactor computation of metrics #76
Conversation
I think the computation of the metrics is much clearer now, and I am happy to see that the code became much lighter when it comes to computing the metrics. I added a few minor comments and questions that we can try to address together.
I still have to think a bit about which metrics (micro or macro) are the most relevant for us, but I do tend to agree with what you wrote on JIRA at the moment.
# Conflicts:
#	src/stratigraphy/benchmark/score.py
There are a few minor comments that would be nice to address. Thank you very much for incorporating my previous comments.
        else:
            return 0

    @property
IMO this could be a one-liner:
return self.tp / (self.tp + self.fn) if (self.tp + self.fn) > 0 else 0
That's true, but in my humble opinion this would actually make the code less readable/understandable. That is of course just a subjective opinion.
If you don't feel strongly about this, I would keep the current code style.
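For reference, a minimal sketch of the two styles under discussion (the Metrics class with tp/fp/fn counters is an assumption based on the snippets in this diff, not the exact code of this PR):

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical container for true positive, false positive and false negative counts."""

    tp: int = 0
    fp: int = 0
    fn: int = 0

    # Current style in the PR: explicit branching.
    @property
    def recall(self) -> float:
        """Compute the recall score."""
        if self.tp + self.fn > 0:
            return self.tp / (self.tp + self.fn)
        else:
            return 0

    # Suggested alternative: a single conditional expression.
    @property
    def recall_one_liner(self) -> float:
        """Compute the recall score as a one-liner."""
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) > 0 else 0
```

Both versions behave identically; the choice is purely about readability.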
        else:
            return 0

    @property
Could also be a one-liner.
    def macro_recall(self) -> float:
        """Compute the macro recall score."""
        if self.metrics:
Could also be a one-liner.
    def macro_precision(self) -> float:
        """Compute the macro precision score."""
        if self.metrics:
Could also be a one-liner.
    def macro_f1(self) -> float:
        """Compute the macro F1 score."""
        if self.metrics:
Could also be a one-liner.
"""Keeps track of a particular metrics for all documents in a dataset.""" | ||
|
||
def __init__(self): | ||
self.metrics: dict[str, Metrics] = {} |
Would it make sense for the Metrics object to carry the filename as an attribute? This would make it possible here to only use a list of Metrics objects that already know their filename, and avoid creating the dict just to keep track of the filenames. I ran into the same issue in the refactoring of the metadata metrics and was also not sure about the best solution.
I think I'd like to keep the filename separate from the Metrics class for two reasons:
- Currently, the Metrics class is also used to compute the metrics for the entire dataset (DatasetMetrics.overall_metrics()); in this case there is no specific filename that can be associated with the Metrics object.
- In many places where a Metrics object is instantiated, we currently don't have access to the filename. I don't really want to pass additional parameters around just for this.

That does not mean that we cannot improve the coupling between metrics and filename, but I think it should probably happen at a different level in the code. Maybe we can have a look at it in the context of your refactoring PR?
Yes, I tend to share your opinion. Then all good for me!
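To make the trade-off concrete, a rough sketch of the two designs discussed above (the field names, the FileMetrics class and the body of overall_metrics() are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical per-document counts; the real class also exposes precision/recall/f1."""

    tp: int = 0
    fp: int = 0
    fn: int = 0


# Option A (current design): DatasetMetrics keys the per-document metrics by filename,
# so Metrics itself stays filename-agnostic and can also represent dataset-wide totals.
class DatasetMetrics:
    """Keeps track of a particular metric for all documents in a dataset."""

    def __init__(self):
        self.metrics: dict[str, Metrics] = {}

    def overall_metrics(self) -> Metrics:
        """Pool the counts of all documents into a single Metrics object."""
        return Metrics(
            tp=sum(m.tp for m in self.metrics.values()),
            fp=sum(m.fp for m in self.metrics.values()),
            fn=sum(m.fn for m in self.metrics.values()),
        )


# Option B (discussed but not adopted): each Metrics carries its own filename.
# The aggregate returned by overall_metrics() would then have no meaningful filename,
# and every call site that creates a Metrics object would need to know the filename.
@dataclass
class FileMetrics(Metrics):
    filename: str | None = None
```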
@stijnvermeeren-swisstopo while running the code in the
@dcleres On purpose, as discussed on Teams ;)
My first attempt at making everything a bit more coherent.
Through the refactoring, it became apparent that metrics are not computed very consistently.
For metadata and groundwater, we use the micro average, i.e. we count the TP, FP and FN across all documents, and compute the overall F1, recall and precision metrics based on those counts.
However, for now, I've left all the metrics exactly as they were computed previously. The code in the evaluate method in score.py now at least gives a clear overview of how each metric is computed.
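To illustrate the difference between the two averaging strategies, here is a small, self-contained sketch (the Metrics class and its field names are assumptions for the example, not the code in this PR):

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    tp: int = 0
    fp: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) > 0 else 0


def micro_precision(per_document: dict[str, Metrics]) -> float:
    """Micro average: pool TP/FP over all documents, then compute one score."""
    tp = sum(m.tp for m in per_document.values())
    fp = sum(m.fp for m in per_document.values())
    return tp / (tp + fp) if (tp + fp) > 0 else 0


def macro_precision(per_document: dict[str, Metrics]) -> float:
    """Macro average: compute the score per document, then average the scores."""
    if not per_document:
        return 0
    return sum(m.precision for m in per_document.values()) / len(per_document)


# Example: document A has 9 TP / 1 FP, document B has 0 TP / 1 FP.
docs = {"a.pdf": Metrics(tp=9, fp=1), "b.pdf": Metrics(tp=0, fp=1)}
print(micro_precision(docs))  # 9 / 11 ≈ 0.82 (documents with many predictions dominate)
print(macro_precision(docs))  # (0.9 + 0.0) / 2 = 0.45 (every document counts equally)
```

The micro average weights every individual prediction equally, while the macro average weights every document equally, which is why the two can diverge noticeably on datasets with documents of very different sizes.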