
Incorporate komi into ratings (particularly for small boards) #46

Closed

Conversation

dexonsmith (Contributor)

Add komi integration to the ratings math. This path can be turned on using:

  • --compute-handicap-via-komi-small for 9x9 and 13x13 boards
  • --compute-handicap-via-komi-19x19 for 19x19 boards

This requires passing extra info to get_handicap_adjustment:

  • komi
  • size
  • rules

Unfortunately, the rules don't seem to be in the database (I'm starting out by hardcoding them to "japanese"). Maybe we can fix that?

Relates to #45
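Roughly, the new flags and the extra per-game info fit together like this (a minimal sketch; the actual option wiring in the goratings analysis scripts may differ):

```python
# Sketch only: illustrative wiring of the new flags, not the exact goratings code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--compute-handicap-via-komi-small", action="store_true",
                    help="use komi/size/rules to compute effective handicap for 9x9 and 13x13 games")
parser.add_argument("--compute-handicap-via-komi-19x19", action="store_true",
                    help="use komi/size/rules to compute effective handicap for 19x19 games")
args = parser.parse_args(["--compute-handicap-via-komi-small"])

# With a flag turned on, get_handicap_adjustment needs these extra values per game:
komi = 0.5            # from the game record
size = 9              # 9, 13, or 19
rules = "japanese"    # hardcoded for now, since rules aren't in the database
print(args, komi, size, rules)
```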

@dexonsmith (Contributor Author)

For now, only analysis/analyze_glicko2_one_game_at_a_time.py has been updated. I don't seem to have access to turn this into a "draft" pull request.

Also, there are some MASSIVE small-board handicaps in there, such as 8 stones on a 9x9 board, which is roughly equivalent to 52 stones on a 19x19 board.

@dexonsmith (Contributor Author)

(I've put most updates in the bug, #45)

Add komi integration to the ratings math (optionally).

- This slightly adjusts 19x19 games, since the "first" handicap stone (dropping
  komi) only gives black a 1/2-rank head start.
- Big changes for 9x9 and 13x13 games.

The effective rank (after adjustment) is bounded between 0 and 39.

This path can be turned on using:

- `--compute-handicap-via-komi-small` for 9x9 and 13x13 boards
- `--compute-handicap-via-komi-19x19` for 19x19 boards
This is more accurate, especially for small boards.

Use `get_handicap_rank_difference` for skipping games. This is a WIP because I've only updated analysis/analyze_glicko2_one_game_at_a_time.py, but it's a problem everywhere.
dexonsmith force-pushed the incorporate-komi-into-ratings branch from 2e260f7 to 2197d9e (December 13, 2023 19:26)
@dexonsmith (Contributor Author)

Did a force-push after a big rebase which cleans up the commit history substantially.

dexonsmith marked this pull request as draft (December 13, 2023 23:01)
@dexonsmith (Contributor Author)

After #49 lands, this will be ready as well. The code is now fairly clean. It still probably needs some iteration (and definitely more analysis).

dexonsmith marked this pull request as ready for review (December 14, 2023 21:50)
dexonsmith requested a review from anoek (December 14, 2023 21:50)
@dexonsmith (Contributor Author)

@anoek, I think this might be ready to merge now. Happy to split this up further into multiple PRs if you'd prefer to review pieces separately. There are still a few things I want to do (e.g., adding configuration options), but I figure those can go in a different PR.

Here's what's in the branch (it doesn't have clean commits in this order, but I could rewrite it this way if that's useful for reviewing):

  • Refactor RatingsMath.get_handicap_adjustment to depend on get_handicap_rank_difference (putting most logic there).
  • Change ignoring-based-on-handicap to use get_handicap_rank_difference, which by default just returns the handicap.
    • TODO: add command-line option to control the threshold (currently ignored if >1).
  • Add skip-based-on-handicap to SkipLogic, relying on get_handicap_rank_difference (see the sketch after this list).
    • TODO: add a command-line option to control the threshold (games are currently ignored if it's >9, which filters out a bunch of large-handicap small-board games). We might want to control this separately for small boards.
  • Add --compute-handicap-via-komi-small and --compute-handicap-via-komi-19x19, which add parameters to get_handicap_rank_difference and get_handicap_adjustment; when turned on, these use the komi, board size, and ruleset to determine the effective handicap rank difference, rather than just returning the handicap parameter untouched.
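To make the skip-based-on-handicap item concrete, here's a minimal sketch of the threshold check (illustrative only; the stand-in get_handicap_rank_difference below just mimics the default behaviour, and the real SkipLogic code may differ):

```python
# Sketch only: illustrative threshold check, not the actual SkipLogic code.

def get_handicap_rank_difference(handicap: int, size: int, komi: float, rules: str) -> float:
    # Stand-in for the real function: with the komi options off it just
    # returns the handicap unchanged.
    return float(handicap)


MAX_HANDICAP_RANK_DIFFERENCE = 9  # TODO: make this a command-line option


def should_skip_for_handicap(handicap: int, size: int, komi: float, rules: str) -> bool:
    rank_difference = get_handicap_rank_difference(handicap, size, komi, rules)
    return rank_difference > MAX_HANDICAP_RANK_DIFFERENCE


# False with the default path; the komi-based path would compute a much larger
# difference for an 8-stone 9x9 game and skip it.
print(should_skip_for_handicap(8, 9, 0.5, "japanese"))
```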

@dexonsmith (Contributor Author)

(Most of the "noise" in the diff is from adding parameters to get_handicap_adjustment.)


def get_handicap_adjustment(rating: float, handicap: int, size: int, komi: float, rules: str) -> float:
    rank_difference = get_handicap_rank_difference(handicap, size, komi, rules)
    effective_rank = min(39, max(0, rating_to_rank(rating) + rank_difference))
anoek (Member)

I'm trying to ponder here; I think not clamping the rank between 30k and 9d for this was intentional (but not necessarily correct, that decision is certainly open to discussion).

My gut says that by clamping this we're going to introduce some unnecessary noise. There are legitimately terrible players (including purposefully terrible beginner bots), and there are some legitimately >9d "players" (think every strong KataGo bot). I call out the bots specifically because they have a whole lot of games and can actually move the needle on where a lot of players' ratings land. While we clamp those for display to users, that's more out of convention and UX than a good mathematical reason. For the purposes of doing good ratings, I think it does a bit of harm to pretend that everyone is squished within those ranks.

Again, open to discussion if you have a good reason to clamp them.

dexonsmith (Contributor Author)

The main reasoning here was practical: to avoid a MathDomainError (I can't remember if it was in rank_to_rating or in the subsequent ratings update calculation). I did NOT play around with the bounds of the clamping... maybe -10 (40k) to 45 would be sufficient to avoid the MathDomainError. Maybe these clamp values could be parameterized? Or maybe we should rely on SkipLogic to preclude the need for this?

In terms of handling players with true ranks outside the UX range, I'm not sure how reasonable the results can be if the range (output) of rating_to_rank and the domain (input) of rank_to_rating don't match. But that's the kind of data I think I was seeing (after applying the small-board handicap adjustment).

Note that I saw the math come up with an effective handicap as high as 52, from an 8-stone 9x9 game early in OGS history (not sure there was only one...). (Maybe that error predated the SkipLogic change and the clamping wouldn't be necessary now...)

BTW, IMO:

  • For the rank_to_rating/rating_to_rank calculations, we should choose a domain and range for ranks that match. Either we clamp things outside of that, or we don't rate games that handicaps put outside of that, or something.
  • For UX, we can choose a smaller set of ranks to clamp to. So rating_to_rank might have a player down at 0 or -5, but the UX would show 25k (not 30k or 35k). (See the sketch below.)

Not sure how that matches what we have right now...
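To illustrate the split I have in mind (a sketch only; the bounds are illustrative, reusing the -10..45 guess from above and a 25k..9d display range, not what goratings uses today):

```python
# Sketch only: separate the "math" clamp (keep ranks where rank_to_rating /
# rating_to_rank are well defined) from the "UX" clamp (what's shown to users).
MATH_MIN_RANK, MATH_MAX_RANK = -10.0, 45.0   # e.g. 40k up to well above 9d
UX_MIN_RANK, UX_MAX_RANK = 5.0, 38.0         # 25k through 9d, display only


def clamp_rank_for_math(rank: float) -> float:
    return min(MATH_MAX_RANK, max(MATH_MIN_RANK, rank))


def clamp_rank_for_display(rank: float) -> float:
    return min(UX_MAX_RANK, max(UX_MIN_RANK, rank))


print(clamp_rank_for_math(-5.0), clamp_rank_for_display(-5.0))  # -5.0 (35k) vs 5.0 (25k)
```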

(Feels a bit like it would be worthwhile to consider more of these changes in isolation... maybe I should split up this PR more after all... not today though)

dexonsmith (Contributor Author)

> The main reasoning here was practical: to avoid a MathDomainError (I can't remember if it was in rank_to_rating or in the subsequent ratings update calculation).

Don't have much time today, but my initial attempts to reproduce this (by commenting out the mitigation and the new SkipLogic code) haven't been successful...

dexonsmith (Contributor Author)

Converted back to a draft for now. Since I can't reproduce the MathDomainError anymore, it seems better to remove the clamping, and if we add it back it should be parameterized.

dexonsmith (Contributor Author)

Aha, found it: when I temporarily reverted the clamping, I introduced a bug in get_handicap_adjustment, which is why I couldn't reproduce it.

The domain error is in TallyGameAnalytics:

Processing OGS data
   2,739,321 /   30,811,698 games processed.  192.3s remaining
Traceback (most recent call last):
  File "/Users/dexonsmith/Repos/online-go/goratings/analysis/analyze_glicko2_one_game_at_a_time.py", line 99, in <module>
    tally.add_glicko2_analytics(analytics)
  File "/Users/dexonsmith/Repos/online-go/goratings/analysis/util/TallyGameAnalytics.py", line 102, in add_glicko2_analytics
    self.prediction_cost[size][speed][rank][handicap] += - math.log(result.expected_win_rate if black_won else 1 - result.expected_win_rate)
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: math domain error

Anyway, I still think the options can land without fixing the MathDomainError, and then we can decide whether to fix it with clamping or to update TallyGameAnalytics to handle an expected_win_rate of 0 (which I assume is what's causing the domain error). Sending a new PR soon.
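For the second option, the guard could be as simple as something like this (a sketch, not the actual TallyGameAnalytics code; the epsilon value is an arbitrary choice):

```python
import math

# Sketch only: guard the prediction-cost log against an expected win rate of
# exactly 0 or 1; the epsilon here is arbitrary.
EPSILON = 1e-10


def prediction_cost(expected_win_rate: float, black_won: bool) -> float:
    p = expected_win_rate if black_won else 1.0 - expected_win_rate
    return -math.log(max(p, EPSILON))


print(prediction_cost(0.0, True))   # large but finite, instead of a math domain error
```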

anoek (Member)

Makes sense, though honestly if we're hitting the domain error it's because our rating code is busted somewhere, so having it blow up isn't necessarily a bad thing (although having a more appropriate error would probably be better in an ideal world).

Fix an off-by-one bug in the value of extra moves in
get_handicap_rank_difference, and rewrite the math using extra variables so
it's easier to visually verify.

The bug was in the calculation of the number of extra moves. The correct value is:

    num_extra_moves = handicap - 1 if handicap > 1 else 0

but the old code forgot the `- 1`.
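In context, the corrected calculation sits in something shaped roughly like this (a sketch only; the komi and board-size weighting here is an assumption for illustration, not the exact math in the PR):

```python
# Sketch only: the weights and scaling are assumptions, not the PR's exact formula.
def get_handicap_rank_difference(handicap: int, size: int, komi: float, rules: str) -> float:
    full_komi = 6.5 if rules == "japanese" else 7.5   # assumed "even game" komi
    points_per_rank = 13.0                            # rough point value of one 19x19 rank
    # Handicap 0 or 1 means no extra black moves on the board; the corrected value:
    num_extra_moves = handicap - 1 if handicap > 1 else 0
    # "Stone 1" is black keeping the komi (about half a rank on 19x19)...
    komi_value = (full_komi - komi) / points_per_rank if handicap >= 1 else 0.0
    # ...and each extra move counts for more on smaller boards.
    area_scale = (19 * 19 - 1) / (size * size - 1)
    return komi_value + num_extra_moves * area_scale


print(get_handicap_rank_difference(8, 9, 0.5, "japanese"))   # a large small-board value
```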
dexonsmith marked this pull request as draft (December 16, 2023 17:32)
@dexonsmith (Contributor Author)

Closing in favour of #51, which is a bit cleaner.
