Confidence scores #1349

Open
BramVanroy opened this issue Feb 22, 2024 · 4 comments

@BramVanroy
Contributor

Is your feature request related to a problem? Please describe.
It could be fruitful to have access to the model's confidence in its predictions.

Describe the solution you'd like
In an ideal world, confidence scores would be available for everything (POS, DEP, MORPH, etc.) at the token level. But I am mostly interested in sentence-level measures, e.g. a confidence score for the whole dependency tree.

Describe alternatives you've considered
I have not found alternative parsers that give easy access to this feature.

Additional context
This is a continuation of our discussion on Twitter: https://twitter.com/stanfordnlp/status/1760026653370331647

@AngledLuffa
Collaborator

I do not believe this will make it into the next release, which is imminent, but I will work on it next week or the week after if possible.

@barrelltech

Did this ever get worked on? Is the data available somehow?

@AngledLuffa
Collaborator

So I found a use case for POS confidence on my end, at least.

But I'm just not sure what the interface would look like. Would this be something attached to all tokens? Maybe there would need to be a flag supplied to the pipeline that says "add confidence scores for this model"? I think it is unlikely that most people would have a use for this.
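For concreteness, a purely hypothetical sketch of what an opt-in flag could look like (the `pos_confidence` option and the `upos_confidence` attribute are invented names for illustration, not anything the current pipeline supports):

```python
import stanza

# 'pos_confidence' is a hypothetical opt-in flag, invented for
# illustration -- the current Pipeline does not expose such an option
nlp = stanza.Pipeline('en', processors='tokenize,pos', pos_confidence=True)

doc = nlp('The quick brown fox jumps over the lazy dog.')
for word in doc.sentences[0].words:
    # 'upos_confidence' is likewise an invented attribute name
    print(word.text, word.upos, getattr(word, 'upos_confidence', None))
```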

And then, if that flag is supplied, would the scores become an additional entry in the 10th (MISC) column of the CoNLL-U output and an additional field in the JSON output?
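For example, a token line with a hypothetical confidence entry in the MISC (10th) column might look like this (the `PosConf` key is an invented name; the JSON counterpart would just be an extra per-word field):

```
1	The	the	DET	DT	Definite=Def|PronType=Art	2	det	_	PosConf=0.97
```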

@AngledLuffa
Collaborator

For dependencies, it's an even trickier question. UPOS and XPOS confidence can at least be judged by how close in score the other possible taggings were. (Not sure that works for features, since the model outputs multiple features at once.)

The dependency parser scores each possible arc and label, O(n^2) scores in total, then takes the highest-scoring spanning tree from that complete graph. So what's the sentence confidence score here? The score of that spanning tree over the sum of the scores of all the spanning trees? That could work... I suppose we could also make some sort of local confidence score for a single arc by weighting the score of the chosen arc against all the other candidate arcs for that word.

We'd just have to be aware that there's a very real chance that one bad arc gets chosen somewhere in the sentence in order to allow better arcs to be chosen elsewhere in the sentence.
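As a rough sketch of both ideas (this is not Stanza's actual code; the score-matrix layout and function names are assumptions): the local confidence is a softmax over candidate heads for each word, and the sentence-level confidence is exp(score(tree)) / Z, where Z, the sum of exp-scores over all spanning trees, can be computed with the directed Matrix-Tree theorem (Koo et al. 2007; Smith & Smith 2007):

```python
import numpy as np

def arc_confidences(scores):
    """Local arc confidence: for each word, softmax the scores of all
    candidate heads, then keep the probability of the argmax head.
    scores[h, m] is the score of the arc h -> m; index 0 is the
    artificial root."""
    s = scores.copy()
    np.fill_diagonal(s, -np.inf)            # a word cannot head itself
    s = s[:, 1:]                            # the root takes no incoming arc
    z = s - s.max(axis=0, keepdims=True)    # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return p.max(axis=0)                    # confidence of each chosen head

def tree_log_partition(scores):
    """log Z over ALL spanning trees rooted at 0, via the directed
    Matrix-Tree theorem.  (This counts trees where the root has several
    children; a strict single-root constraint needs the Koo et al.
    variant.)"""
    n = scores.shape[0] - 1
    shift = scores.max()                    # rescale for stability:
    w = np.exp(scores - shift)              # every tree has n arcs, so
                                            # log Z = logdet + n * shift
    np.fill_diagonal(w, 0.0)                # no self-arcs
    w[:, 0] = 0.0                           # no arcs into the root
    laplacian = np.diag(w.sum(axis=0)) - w  # weighted in-degree Laplacian
    _, logdet = np.linalg.slogdet(laplacian[1:, 1:])  # drop root row/col
    return logdet + n * shift

def tree_confidence(scores, heads):
    """P(chosen tree) = exp(score(tree) - log Z), where heads[m-1] is
    the chosen head (in 0..n) of word m, for words numbered 1..n."""
    tree_score = sum(scores[h, m] for m, h in enumerate(heads, start=1))
    return float(np.exp(tree_score - tree_log_partition(scores)))
```

The tree-level number is exactly the "score of that spanning tree over the sum of the scores of all the spanning trees" quantity (for non-projective trees), while the per-arc softmax is the local score that would expose the case where one bad arc is chosen so that better arcs can be chosen elsewhere.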
