
Proposal for a “Delta Compression + Neural Network” Approach in Nine Men’s Morris #4

Open
calcitem opened this issue Jan 12, 2025 · 1 comment

Comments

@calcitem
Contributor

Dear Mr. Gévay and Mr. Danner,

I hope this message finds you well. I have been greatly inspired by your work on Nine Men’s Morris, Morabaraba, and Lasker Morris, particularly regarding extended and ultrastrong solutions. Building on the foundations you established, I am developing a preliminary research plan to integrate perfect database knowledge with traditional Alpha-Beta search in a more streamlined way, ultimately aiming to train a lightweight neural network for efficient, near-optimal play.

Below is the detailed design of my intended approach. While none of these steps have been implemented yet, I would be grateful for any guidance or suggestions you might share.


1. Overall Goal

Create a “delta-compressed” subset of the tablebase that captures only those states in which Alpha-Beta search fails to replicate the optimal solution. This compressed database should be dramatically smaller yet still ensure overall near-perfect decision-making.


2. Methodology in Detail

2.1 Segmenting the Database

  1. Phased or Subset Handling:

    • We plan to split the full tablebase by game phase (e.g., opening/placing, midgame/moving, endgame/flying) or other meaningful partitions (piece counts, forced lines, etc.).
    • By working on smaller subsets, we can systematically test and refine our methods before extending them to the entire database.
  2. Sampling Strategies:

    • For each subset, we may combine random sampling for broad coverage and stratified sampling to focus on higher-complexity states.
    • This ensures we do not overlook “rare but critical” positions.
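
A minimal sketch of how the two sampling modes could be combined (Python; `states` is assumed to be a list of hashable state encodings, and `complexity_of` is a hypothetical scoring function such as mobility or the number of threatened mills):

```python
import random
from collections import defaultdict

def sample_states(states, complexity_of, n_random=10_000, n_per_bucket=2_000, seed=0):
    """Combine uniform random sampling with stratified sampling by complexity."""
    rng = random.Random(seed)

    # Broad coverage: a plain uniform sample over the subset.
    uniform = rng.sample(states, min(n_random, len(states)))

    # Stratified part: bucket states by a complexity score and sample each bucket
    # separately, so rare high-complexity positions are not drowned out by trivial ones.
    buckets = defaultdict(list)
    for s in states:
        buckets[complexity_of(s)].append(s)

    stratified = []
    for bucket in buckets.values():
        stratified.extend(rng.sample(bucket, min(n_per_bucket, len(bucket))))

    # De-duplicate while preserving order.
    seen, combined = set(), []
    for s in uniform + stratified:
        if s not in seen:
            seen.add(s)
            combined.append(s)
    return combined
```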

2.2 Alpha-Beta Search Validation

  1. Running Alpha-Beta:

    • We will use a minimax framework with iterative deepening, heuristic evaluation, and basic pruning.
    • The search depth or time budget can be adjusted per subset to balance thoroughness and computation.
  2. Mismatch Detection:

    • For every selected state (s), we compare the Alpha-Beta best move (and outcome, if applicable) with the tablebase result.
    • If Alpha-Beta matches the database, we need not store additional information.
    • If Alpha-Beta differs, we mark (s) as a conflict and record essential data (optimal move, value, or short continuation).
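
A minimal sketch of the conflict-detection loop (Python; `alphabeta_best_move` and `tablebase_lookup` stand in for whatever interfaces the real engine and database expose):

```python
def find_conflicts(sampled_states, alphabeta_best_move, tablebase_lookup):
    """Build the delta entries: states where Alpha-Beta disagrees with the tablebase.

    alphabeta_best_move(s) -> (move, value)           # heuristic search result
    tablebase_lookup(s)    -> (optimal_moves, value)  # set of optimal moves + true value
    """
    delta = {}
    for s in sampled_states:
        ab_move, ab_value = alphabeta_best_move(s)
        tb_moves, tb_value = tablebase_lookup(s)

        # Conflict if the search picks a non-optimal move or misjudges the value.
        if ab_move not in tb_moves or ab_value != tb_value:
            # Store one optimal move plus the true value; an optional forced-line
            # continuation could be added as a third field.
            delta[s] = (next(iter(tb_moves)), tb_value)
    return delta
```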

2.3 Building the “Delta” Compressed Database

  1. Core Principle:

    • Only states where Alpha-Beta conflicts with the tablebase are retained in the “delta” set.
    • Each entry contains:
      • State ID (e.g., Zobrist hash or another representation).
      • Correct move/value according to the perfect database.
      • Optional follow-up for forced lines, ensuring correct transitions in deeper sequences.
  2. Iterative Construction:

    • We plan to add entries to the delta set incrementally, continuing until the main search engine rarely encounters undiscovered discrepancies.
  3. Use in Gameplay:

    • During a live match, the system would run Alpha-Beta search. For a state (s):
      • If (s) is not in the delta set, Alpha-Beta is presumed correct.
      • If (s) is in the delta set, we override Alpha-Beta with the tablebase’s recommended move.
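
A minimal sketch of this override rule during play (Python; `state_key` could be a Zobrist hash or any canonical encoding, and is hypothetical here):

```python
def choose_move(state, delta_db, alphabeta_best_move, state_key):
    """Move selection during a live game, following the override rule above."""
    key = state_key(state)
    if key in delta_db:
        # Alpha-Beta is known to fail here: play the tablebase's move instead.
        optimal_move, _value = delta_db[key]
        return optimal_move
    # Otherwise Alpha-Beta is presumed correct for this state.
    move, _value = alphabeta_best_move(state)
    return move
```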

2.4 Neural Network Training

  1. Data Sources:

    • Consistent States (where Alpha-Beta aligns with the tablebase): used to train on “typical” or “straightforward” positions.
    • Conflict States (from the delta table): used to highlight trickier scenarios in which Alpha-Beta needed correction.
  2. Network Structure:

    • Potentially a convolutional or graph-based architecture to encode Nine Men’s Morris boards (24 points, occupant info, etc.).
    • Outputs could be a value (win/draw/loss or numerical) and/or a policy (move probabilities).
  3. Training Workflow:

    • Supervised Phase: We label each sampled position with the tablebase’s best move or value.
    • Iterative Refinement: If the network remains uncertain in certain sub-regions, we may sample more states or refine heuristics for those areas, always referencing the delta table for ground truth.
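
A minimal sketch of the kind of network we have in mind (PyTorch; the input encoding, move-index space, and layer sizes are all assumptions, not a settled design):

```python
import torch
import torch.nn as nn

class MorrisNet(nn.Module):
    """Tiny value + policy network over a flat board encoding.

    Input: 24 points x 3 planes (own piece / opponent piece / empty) plus a few
    scalar features (pieces in hand, phase flags), flattened into one vector.
    Outputs: a scalar value in [-1, 1] and policy logits over a fixed move space.
    """
    def __init__(self, n_inputs=24 * 3 + 4, n_moves=24 * 24, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)
        self.policy_head = nn.Linear(hidden, n_moves)

    def forward(self, x):
        h = self.trunk(x)
        return torch.tanh(self.value_head(h)), self.policy_head(h)

# Supervised phase: the tablebase value labels the value head (e.g., MSE loss)
# and the tablebase best move labels the policy head (cross-entropy loss).
```

A graph-based encoder over the 24-point adjacency graph would be the natural alternative; the MLP above is only the simplest baseline.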

3. Potential Refinements

  1. Symmetry and Transformation:
    • Exploit rotational/reflective symmetries to reduce storage and training data repetition; a small sketch follows this list.
  2. Further Distillation & Self-Play:
    • After the initial supervised stage, it could be beneficial to integrate minimal self-play or a reinforcement step, referencing the delta table whenever the network encounters new or uncertain positions.
  3. Coverage vs. Table Size:
    • We aim to keep the compressed database small, but must ensure that all critical lines—especially forced sequences—are captured.
  4. Practical Heuristics:
    • If certain states are extremely deep or complex, we plan to refine our Alpha-Beta heuristics or incrementally increase search depth to catch hidden traps.
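
A minimal sketch of the symmetry reduction mentioned in item 1 above (Python; the (ring, index) coordinate convention is an assumption and must be adapted to the engine's actual point numbering):

```python
from itertools import product

# Each point is (ring, idx): ring 0 = outer, 1 = middle, 2 = inner square,
# idx 0..7 walking around the square (alternating corners and midpoints).
POINTS = [(r, i) for r in range(3) for i in range(8)]

def rotate90(p):          # rotate the whole board by 90 degrees
    r, i = p
    return (r, (i + 2) % 8)

def reflect(p):           # mirror across the axis through idx 0 and idx 4
    r, i = p
    return (r, (-i) % 8)

def swap_rings(p):        # the inner and outer rings can also be exchanged
    r, i = p
    return (2 - r, i)

def all_symmetries():
    """Return the 16 board symmetries as point-to-point mappings."""
    syms = []
    for k, do_reflect, do_swap in product(range(4), (False, True), (False, True)):
        def f(p, k=k, do_reflect=do_reflect, do_swap=do_swap):
            for _ in range(k):
                p = rotate90(p)
            if do_reflect:
                p = reflect(p)
            if do_swap:
                p = swap_rings(p)
            return p
        syms.append({p: f(p) for p in POINTS})
    return syms

def canonical(board):
    """Pick the lexicographically smallest equivalent position as the canonical one.

    `board` maps each point to its occupant (0 = empty, 1 = own, 2 = opponent).
    """
    return min(tuple(board[sym[p]] for p in POINTS) for sym in all_symmetries())
```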

4. Invitation for Your Guidance

As we have not yet begun implementation, your expertise could help us refine the initial strategy and prioritize our efforts. We welcome any broad or specific suggestions, such as:

  • Ensuring comprehensive coverage of intricate states without over-expanding the delta table.
  • Efficient indexing (e.g., hashing strategies) to guarantee fast lookups in both the tablebase and the compressed data; a short hashing sketch follows this list.
  • Integrating knowledge of specialized lines (e.g., punishing suboptimal moves) in the style of ultrastrong solutions.
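
On the indexing point above, the baseline we currently have in mind is plain Zobrist hashing (Python; the table layout and seed are arbitrary, and collisions would still need handling, e.g., by storing a second hash or the full canonical state alongside each delta entry):

```python
import random

rng = random.Random(20250112)  # fixed seed so the keys are identical across machines

N_POINTS = 24
# One random 64-bit key per (point, color) pair; color 0 = white, 1 = black.
ZOBRIST = [[rng.getrandbits(64) for _ in range(2)] for _ in range(N_POINTS)]
SIDE_TO_MOVE = rng.getrandbits(64)

def zobrist_hash(occupants, white_to_move):
    """occupants: dict point -> 0 (white) or 1 (black); empty points are omitted.

    Pieces still in hand would also need to be folded into the key
    during the placing phase.
    """
    h = 0
    for point, color in occupants.items():
        h ^= ZOBRIST[point][color]
    if white_to_move:
        h ^= SIDE_TO_MOVE
    return h

# Incremental update when a piece of `color` moves from point a to point b:
#   h ^= ZOBRIST[a][color] ^ ZOBRIST[b][color] ^ SIDE_TO_MOVE
```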

We would be honored to incorporate any insights or recommendations you might share.

Thank you for taking the time to review this proposal. Your research has been a significant inspiration, and I look forward to any thoughts or advice you may be willing to offer.

Warm regards,

Calcitem

@ggevay
Owner

ggevay commented Jan 13, 2025

Hello!

These are interesting ideas!

One potential difficulty that I can see is that if there is a time limit for the Alpha-Beta search, then it's not deterministic: even if it finds an optimal move in a certain game state on your machine, it might not find an optimal move in the same game state on a user's machine. To circumvent this problem, you could use a node limit instead of a time limit: during searches, you could keep track of the number of nodes explored in the game tree and stop the iterative deepening when this number reaches a limit. This has the drawback that the program will spend a different amount of time on different computers, but it might be worth it. You might be able to set the limit low enough that it plays reasonably fast on most computers.
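
A rough sketch of what I mean by a node limit (Python; the `state` methods and the evaluation are placeholders, and the search is written in negamax form, so `evaluate()` is from the side to move's point of view):

```python
class NodeBudgetExceeded(Exception):
    pass

class Searcher:
    """Iterative deepening with a node budget instead of a wall-clock budget,
    so the result is reproducible on every machine."""

    def __init__(self, node_limit=200_000):
        self.node_limit = node_limit
        self.nodes = 0

    def search(self, state):
        self.nodes = 0
        best = None
        for depth in range(1, 64):
            try:
                best = self.alphabeta(state, depth, float("-inf"), float("inf"))
            except NodeBudgetExceeded:
                break  # keep the result of the last fully completed depth
        return best  # (value, move), or None if even depth 1 was aborted

    def alphabeta(self, state, depth, alpha, beta):
        self.nodes += 1
        if self.nodes > self.node_limit:
            raise NodeBudgetExceeded()
        if depth == 0 or state.is_terminal():
            return state.evaluate(), None
        best_value, best_move = float("-inf"), None
        for move in state.legal_moves():
            child_value, _ = self.alphabeta(state.play(move), depth - 1, -beta, -alpha)
            child_value = -child_value
            if child_value > best_value:
                best_value, best_move = child_value, move
            alpha = max(alpha, best_value)
            if alpha >= beta:
                break  # beta cutoff
        return best_value, best_move
```

The move ordering also has to be deterministic (no randomized tie-breaking), otherwise the node limit alone won't make the result reproducible.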

Another difficulty could be finding a representative sample of important game states: If you play games with your program, it might not wander into many game states that human users or other programs would wander into. One thing you could do is to play games with many different engine configurations, and maybe even with different tweaks to the evaluation function.

In win/loss positions, will you consider optimality with taking into account the number of moves to the end of the game (depth-to-win)? If yes, then the Alpha-Beta search will have a tougher time finding an optimal move: finding the quickest way to win is often harder than just finding any win. This means that not taking depth-to-win into account could make the delta database much smaller. However, ignoring depth-to-win (i.e., just going with any winning move) can cause the program to get into a cycle in which every position is a winning position, yet the program never actually wins because it keeps going around the cycle.

A middle ground between ignoring depth-to-win entirely and taking it into account at every win/loss position could be to pick just one position of such a cycle of winning positions that the Alpha-Beta runs into, and include only that one position in the delta database, so that the cycle is broken. This would prevent the program from running around in such cycles instead of winning, while keeping the database much smaller than simply taking depth-to-win into account when determining optimality in all win/loss positions.
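
A rough sketch of this cycle-breaking idea (Python; all the callables are placeholders, and `tablebase_lookup` is assumed to return the value from the side to move's point of view):

```python
def break_winning_cycles(start_state, alphabeta_best_move, tablebase_lookup,
                         delta_db, state_key, max_plies=500):
    """Play the engine forward from a winning position; whenever a position
    repeats while still being a tablebase win, insert that single position
    into the delta database so the cycle is broken."""
    visited = set()
    state = start_state
    for _ in range(max_plies):
        if state.is_terminal():
            break
        key = state_key(state)
        tb_moves, tb_value = tablebase_lookup(state)
        if key in visited and tb_value == "win":
            # The engine has come back to a position it already reached once:
            # override just this one position; the rest of the line stays heuristic.
            delta_db[key] = (next(iter(tb_moves)), tb_value)
        visited.add(key)

        if key in delta_db:
            move = delta_db[key][0]
        else:
            move, _ = alphabeta_best_move(state)
        state = state.play(move)
    return delta_db
```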

I'm curious to see how your research turns out! It will be interesting to see statistics about how often the Alpha-Beta search can find an optimal move.

One more thing: if the delta database fits in RAM, you could use it not just to look up the value of the game state that is on the board (i.e., the root of a potential Alpha-Beta search), but to look up every game state that you encounter during the Alpha-Beta search, and then skip exploring a state if it's present in the delta database as a win/loss. (If it's only present as a draw, it's less useful for the Alpha-Beta search: it gives just a very loose bound on the result the search would return, somewhere between the best loss and the worst win.) It's hard to estimate how big of a boost this would give to the Alpha-Beta search, but it seems like an interesting experiment at the very least.
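
Sketched as a probe at the top of the Alpha-Beta recursion (Python; the score constant and interfaces are placeholders):

```python
WIN_SCORE = 10_000  # hypothetical, chosen to lie outside the heuristic evaluation range

def probe_delta(state, delta_db, state_key):
    """Return an exact score for win/loss entries, None otherwise.

    Draw entries are skipped: as noted above, a draw only bounds the search
    result loosely (somewhere between the best loss and the worst win), so it
    is not used as a hard cutoff here.
    """
    entry = delta_db.get(state_key(state))
    if entry is None:
        return None
    _move, value = entry
    if value == "win":
        return WIN_SCORE
    if value == "loss":
        return -WIN_SCORE
    return None

# At the top of alphabeta(state, depth, alpha, beta):
#     score = probe_delta(state, delta_db, state_key)
#     if score is not None:
#         return score, None   # trust the stored value, skip the whole subtree
```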
