Confidence scores #1349

Open
BramVanroy opened this issue Feb 22, 2024 · 4 comments

@BramVanroy
Contributor

Is your feature request related to a problem? Please describe.
It could be fruitful to have access to the model's confidence in its predictions.

Describe the solution you'd like
In an ideal world, confidence scores would be available for everything (POS, DEP, MORPH, etc.) at the token level. But I am mostly interested in sentence-level measures, e.g. a confidence score for the whole dependency tree.

Describe alternatives you've considered
I have not found alternative parsers that give easy access to this feature.

Additional context
This is a continuation of our discussion on Twitter: https://twitter.com/stanfordnlp/status/1760026653370331647

@AngledLuffa
Collaborator

I do not believe this will make it into the next release, which is imminent, but I will work on it next week or the week after if possible.

@barrelltech

Did this ever get worked on? Is the data available somehow?

@AngledLuffa
Collaborator

So I found a use case for POS confidence on my end, at least.

But I'm just not sure what the interface would look like. Would this be something attached to all tokens? Maybe there would need to be a flag supplied to the pipeline that says "add confidence scores for this model"? I think it is unlikely that most people would have a use for this.
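For concreteness, a purely hypothetical sketch of what an opt-in flag could look like (the `pos_confidence` option and the `upos_confidence` attribute are invented names for illustration, not anything the current pipeline supports):

```python
import stanza

# 'pos_confidence' is a hypothetical opt-in flag, invented for
# illustration -- the current Pipeline does not expose such an option
nlp = stanza.Pipeline('en', processors='tokenize,pos', pos_confidence=True)

doc = nlp('The quick brown fox jumps over the lazy dog.')
for word in doc.sentences[0].words:
    # 'upos_confidence' is likewise an invented attribute name
    print(word.text, word.upos, getattr(word, 'upos_confidence', None))
```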

And then, if that flag is supplied, would the scores become an additional entry in the 10th (MISC) column of the CoNLL-U output and an additional field in the JSON output?
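For example, a token line with a hypothetical confidence entry in the MISC (10th) column might look like this (the `PosConf` key is an invented name; the JSON counterpart would just be an extra per-word field):

```
1	The	the	DET	DT	Definite=Def|PronType=Art	2	det	_	PosConf=0.97
```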

@AngledLuffa
Collaborator

For dependencies, it's an even trickier question. UPOS and XPOS confidence can at least be judged by how close in score the other possible taggings were. (Not sure that works for features, since the model outputs multiple features at once.)

The dependency parser scores each possible arc and label, O(n^2) scores in total, then takes the highest-scoring spanning tree from that complete graph. So what's the sentence confidence score here? The score of that spanning tree over the sum of the scores of all the spanning trees? That could work... I suppose we could also make some sort of local confidence score for a single arc by weighting the score of the chosen arc against all the other candidate arcs for that word.

We'd just have to be aware that there's a very real chance that one bad arc gets chosen somewhere in the sentence in order to allow better arcs to be chosen elsewhere in the sentence.
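As a rough sketch of both ideas (this is not Stanza's actual code; the score-matrix layout and function names are assumptions): the local confidence is a softmax over candidate heads for each word, and the sentence-level confidence is exp(score(tree)) / Z, where Z, the sum of exp-scores over all spanning trees, can be computed with the directed Matrix-Tree theorem (Koo et al. 2007; Smith & Smith 2007):

```python
import numpy as np

def arc_confidences(scores):
    """Local arc confidence: for each word, softmax the scores of all
    candidate heads, then keep the probability of the argmax head.
    scores[h, m] is the score of the arc h -> m; index 0 is the
    artificial root."""
    s = scores.copy()
    np.fill_diagonal(s, -np.inf)            # a word cannot head itself
    s = s[:, 1:]                            # the root takes no incoming arc
    z = s - s.max(axis=0, keepdims=True)    # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return p.max(axis=0)                    # confidence of each chosen head

def tree_log_partition(scores):
    """log Z over ALL spanning trees rooted at 0, via the directed
    Matrix-Tree theorem.  (This counts trees where the root has several
    children; a strict single-root constraint needs the Koo et al.
    variant.)"""
    n = scores.shape[0] - 1
    shift = scores.max()                    # rescale for stability:
    w = np.exp(scores - shift)              # every tree has n arcs, so
                                            # log Z = logdet + n * shift
    np.fill_diagonal(w, 0.0)                # no self-arcs
    w[:, 0] = 0.0                           # no arcs into the root
    laplacian = np.diag(w.sum(axis=0)) - w  # weighted in-degree Laplacian
    _, logdet = np.linalg.slogdet(laplacian[1:, 1:])  # drop root row/col
    return logdet + n * shift

def tree_confidence(scores, heads):
    """P(chosen tree) = exp(score(tree) - log Z), where heads[m-1] is
    the chosen head (in 0..n) of word m, for words numbered 1..n."""
    tree_score = sum(scores[h, m] for m, h in enumerate(heads, start=1))
    return float(np.exp(tree_score - tree_log_partition(scores)))
```

The tree-level number is exactly the "score of that spanning tree over the sum of the scores of all the spanning trees" quantity (for non-projective trees), while the per-arc softmax is the local score that would expose the case where one bad arc is chosen so that better arcs can be chosen elsewhere.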
