
Incorporate komi into ratings (particularly for small boards) #46

Closed

Conversation

dexonsmith (Contributor)

Add komi integration to the ratings math. This path can be turned on using:

  • --compute-handicap-via-komi-small for 9x9 and 13x13 boards
  • --compute-handicap-via-komi-19x19 for 19x19 boards

This requires passing extra info to get_handicap_adjustment:

  • komi
  • size
  • rules

Unfortunately, the rules don't seem to be in the database (I'm starting out by hardcoding them to "japanese"). Maybe we can fix that?

Relates to #45
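Roughly, the new flags and the extra per-game info fit together like this (a minimal sketch; the actual option wiring in the goratings analysis scripts may differ):

```python
# Sketch only: illustrative wiring of the new flags, not the exact goratings code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--compute-handicap-via-komi-small", action="store_true",
                    help="use komi/size/rules to compute effective handicap for 9x9 and 13x13 games")
parser.add_argument("--compute-handicap-via-komi-19x19", action="store_true",
                    help="use komi/size/rules to compute effective handicap for 19x19 games")
args = parser.parse_args(["--compute-handicap-via-komi-small"])

# With a flag turned on, get_handicap_adjustment needs these extra values per game:
komi = 0.5            # from the game record
size = 9              # 9, 13, or 19
rules = "japanese"    # hardcoded for now, since rules aren't in the database
print(args, komi, size, rules)
```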

@dexonsmith (Contributor Author)

For now, only analysis/analyze_glicko2_one_game_at_a_time.py has been updated. I don't seem to have access to turn this into a "draft" pull request.

Also, there are some MASSIVE small-board handicaps in there, such as 8 stones on a 9x9 board, which is roughly equivalent to 52 stones on a 19x19 board.

@dexonsmith (Contributor Author)

(I've put most updates in the bug, #45)

Add komi integration to the ratings math (optionally).

- This slightly adjusts 19x19 games, since the "first" handicap stone (dropping
  komi) only gives black a 1/2-rank head start.
- Big changes for 9x9 and 13x13 games.

The effective rank (after adjustment) is bounded between 0 and 39.

This path can be turned on using:

- `--compute-handicap-via-komi-small` for 9x9 and 13x13 boards
- `--compute-handicap-via-komi-19x19` for 19x19 boards
This is more accurate, especially for small boards.

Use `get_handicap_rank_difference` for skipping games. This is a WIP because I've only updated analysis/analyze_glicko2_one_game_at_a_time.py, but it's a problem everywhere.
dexonsmith force-pushed the incorporate-komi-into-ratings branch from 2e260f7 to 2197d9e (December 13, 2023 19:26)
@dexonsmith (Contributor Author)

Did a force-push after a big rebase which cleans up the commit history substantially.

dexonsmith marked this pull request as draft (December 13, 2023 23:01)
@dexonsmith (Contributor Author)

After #49 lands, this will be ready as well. The code is now fairly clean. It still probably needs some iteration (and definitely more analysis).

dexonsmith marked this pull request as ready for review (December 14, 2023 21:50)
dexonsmith requested a review from anoek (December 14, 2023 21:50)
@dexonsmith (Contributor Author)

@anoek, I think this might be ready to merge now. Happy to split this up further into multiple PRs if you'd prefer to review pieces separately. There are still a few things I want to do (e.g., adding configuration options), but I figure those can go in a different PR.

Here's what's in the branch (it doesn't have clean commits in this order, but I could rewrite it this way if that's useful for reviewing):

  • Refactor RatingsMath.get_handicap_adjustment to depend on get_handicap_rank_difference (putting most logic there).
  • Change ignoring-based-on-handicap to use get_handicap_rank_difference, which by default just returns the handicap.
    • TODO: add command-line option to control the threshold (currently ignored if >1).
  • Add skip-based-on-handicap to SkipLogic, relying on get_handicap_rank_difference (see the sketch after this list).
    • TODO: add a command-line option to control the threshold (games are currently ignored if it's >9, which filters out a bunch of large-handicap small-board games). We might want to control this separately for small boards.
  • Add --compute-handicap-via-komi-small and --compute-handicap-via-komi-19x19, which add parameters to get_handicap_rank_difference and get_handicap_adjustment; when turned on, these use the komi, board size, and ruleset to determine the effective handicap rank difference, rather than just returning the handicap parameter untouched.
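To make the skip-based-on-handicap item concrete, here's a minimal sketch of the threshold check (illustrative only; the stand-in get_handicap_rank_difference below just mimics the default behaviour, and the real SkipLogic code may differ):

```python
# Sketch only: illustrative threshold check, not the actual SkipLogic code.

def get_handicap_rank_difference(handicap: int, size: int, komi: float, rules: str) -> float:
    # Stand-in for the real function: with the komi options off it just
    # returns the handicap unchanged.
    return float(handicap)


MAX_HANDICAP_RANK_DIFFERENCE = 9  # TODO: make this a command-line option


def should_skip_for_handicap(handicap: int, size: int, komi: float, rules: str) -> bool:
    rank_difference = get_handicap_rank_difference(handicap, size, komi, rules)
    return rank_difference > MAX_HANDICAP_RANK_DIFFERENCE


# False with the default path; the komi-based path would compute a much larger
# difference for an 8-stone 9x9 game and skip it.
print(should_skip_for_handicap(8, 9, 0.5, "japanese"))
```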

@dexonsmith (Contributor Author)

(Most of the "noise" in the diff is from adding parameters to get_handicap_adjustment.)


def get_handicap_adjustment(rating: float, handicap: int, size: int, komi: float, rules: str) -> float:
    rank_difference = get_handicap_rank_difference(handicap, size, komi, rules)
    effective_rank = min(39, max(0, rating_to_rank(rating) + rank_difference))
anoek (Member)

I'm trying to ponder here; I think not clamping the rank between 30k and 9d for this was intentional (but not necessarily correct, that decision is certainly open to discussion).

My gut says that by clamping this we're going to introduce some unnecessary noise. There are legitimately terrible players (including purposefully terrible beginner bots), and there are some legitimately >9d "players" (think every strong KataGo bot). I call out the bots specifically because they have a whole lot of games and can actually move the needle on where a lot of players' ratings land. While we clamp those for display to users, that's more out of convention and UX than a good mathematical reason. For the purposes of doing good ratings, I think it does a bit of harm to pretend that everyone is squished within those ranks.

Again, open to discussion if you have a good reason to clamp them.

dexonsmith (Contributor Author)

The main reasoning here was practical: to avoid a MathDomainError (I can't remember if it was in rank_to_rating or in the subsequent ratings update calculation). I did NOT play around with the bounds of the clamping... maybe -10 (40k) to 45 would be sufficient to avoid the MathDomainError. Maybe these clamp values could be parameterized? Or maybe we should rely on SkipLogic to preclude the need for this?

In terms of handling players with true ranks outside the UX range, I'm not sure how reasonable the results can be if the range (output) of rating_to_rank and the domain (input) of rank_to_rating don't match. But that's the kind of data I think I was seeing (after applying the small-board handicap adjustment).

Note that I saw the math come up with an effective handicap as high as 52, from an 8-stone 9x9 game early in OGS history (not sure there was only one...). (Maybe that error predated the SkipLogic change and the clamping wouldn't be necessary now...)

BTW, IMO:

  • For the rank_to_rating/rating_to_rank calculations, we should choose a domain and range for ranks that match. Either we clamp things outside of that, or we don't rate games that handicaps put outside of that, or something.
  • For UX, we can choose a smaller set of ranks to clamp to. So rating_to_rank might have a player down at 0 or -5, but the UX would show 25k (not 30k or 35k). (See the sketch below.)

Not sure how that matches what we have right now...
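To illustrate the split I have in mind (a sketch only; the bounds are illustrative, reusing the -10..45 guess from above and a 25k..9d display range, not what goratings uses today):

```python
# Sketch only: separate the "math" clamp (keep ranks where rank_to_rating /
# rating_to_rank are well defined) from the "UX" clamp (what's shown to users).
MATH_MIN_RANK, MATH_MAX_RANK = -10.0, 45.0   # e.g. 40k up to well above 9d
UX_MIN_RANK, UX_MAX_RANK = 5.0, 38.0         # 25k through 9d, display only


def clamp_rank_for_math(rank: float) -> float:
    return min(MATH_MAX_RANK, max(MATH_MIN_RANK, rank))


def clamp_rank_for_display(rank: float) -> float:
    return min(UX_MAX_RANK, max(UX_MIN_RANK, rank))


print(clamp_rank_for_math(-5.0), clamp_rank_for_display(-5.0))  # -5.0 (35k) vs 5.0 (25k)
```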

(Feels a bit like it would be worthwhile to consider more of these changes in isolation... maybe I should split up this PR more after all... not today though)

dexonsmith (Contributor Author)

> The main reasoning here was practical: to avoid a MathDomainError (I can't remember if it was in rank_to_rating or in the subsequent ratings update calculation).

Don't have much time today, but my initial attempts to reproduce this (by commenting out the mitigation and the new SkipLogic code) haven't been successful...

dexonsmith (Contributor Author)

Converted back to a draft for now. Since I can't reproduce the MathDomainError anymore, it seems better to remove the clamping, and if we add it back it should be parameterized.

dexonsmith (Contributor Author)

Aha, found it: when I temporarily reverted the clamping, I introduced a bug in get_handicap_adjustment, which is why I couldn't reproduce it.

The domain error is in TallyGameAnalytics:

Processing OGS data
   2,739,321 /   30,811,698 games processed.  192.3s remaining
Traceback (most recent call last):
  File "/Users/dexonsmith/Repos/online-go/goratings/analysis/analyze_glicko2_one_game_at_a_time.py", line 99, in <module>
    tally.add_glicko2_analytics(analytics)
  File "/Users/dexonsmith/Repos/online-go/goratings/analysis/util/TallyGameAnalytics.py", line 102, in add_glicko2_analytics
    self.prediction_cost[size][speed][rank][handicap] += - math.log(result.expected_win_rate if black_won else 1 - result.expected_win_rate)
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: math domain error

Anyway, I still think the options can land without fixing the MathDomainError, and then we can decide whether to fix it with clamping or to update TallyGameAnalytics to handle an expected_win_rate of 0 (which I assume is what's causing the domain error). Sending a new PR soon.
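For the second option, the guard could be as simple as something like this (a sketch, not the actual TallyGameAnalytics code; the epsilon value is an arbitrary choice):

```python
import math

# Sketch only: guard the prediction-cost log against an expected win rate of
# exactly 0 or 1; the epsilon here is arbitrary.
EPSILON = 1e-10


def prediction_cost(expected_win_rate: float, black_won: bool) -> float:
    p = expected_win_rate if black_won else 1.0 - expected_win_rate
    return -math.log(max(p, EPSILON))


print(prediction_cost(0.0, True))   # large but finite, instead of a math domain error
```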

anoek (Member)

Makes sense, though honestly if we're hitting the domain error it's because our rating code is busted somewhere, so having it blow up isn't necessarily a bad thing (although having a more appropriate error would probably be better in an ideal world).

Fix an off-by-one bug in the value of extra moves in
get_handicap_rank_difference, and rewrite the math using extra variables so
it's easier to visually verify.

The bug was in the calculation of the number of extra moves. The correct value is:

    num_extra_moves = handicap - 1 if handicap > 1 else 0

but the old code forgot the `- 1`.
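In context, the corrected calculation sits in something shaped roughly like this (a sketch only; the komi and board-size weighting here is an assumption for illustration, not the exact math in the PR):

```python
# Sketch only: the weights and scaling are assumptions, not the PR's exact formula.
def get_handicap_rank_difference(handicap: int, size: int, komi: float, rules: str) -> float:
    full_komi = 6.5 if rules == "japanese" else 7.5   # assumed "even game" komi
    points_per_rank = 13.0                            # rough point value of one 19x19 rank
    # Handicap 0 or 1 means no extra black moves on the board; the corrected value:
    num_extra_moves = handicap - 1 if handicap > 1 else 0
    # "Stone 1" is black keeping the komi (about half a rank on 19x19)...
    komi_value = (full_komi - komi) / points_per_rank if handicap >= 1 else 0.0
    # ...and each extra move counts for more on smaller boards.
    area_scale = (19 * 19 - 1) / (size * size - 1)
    return komi_value + num_extra_moves * area_scale


print(get_handicap_rank_difference(8, 9, 0.5, "japanese"))   # a large small-board value
```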
dexonsmith marked this pull request as draft (December 16, 2023 17:32)
@dexonsmith (Contributor Author)

Closing in favour of #51, which is a bit cleaner.
