Recommendation for CCC Engines #214

Open
AndyGrant opened this issue Jun 14, 2024 · 0 comments
AndyGrant commented Jun 14, 2024

To all CCC engine authors,

Due to the nature of the CCC events running on 250 threads, I would recommend the following to everyone. My reasons follow below.

  1. As soon as your engine knows it's processing a UCI "go ..." command, you should set the start time for your internal timers. You should not wait until you've allocated all your data, or until you've spawned all your helper threads, etc.
  2. You may want to consider using a threading solution where your helper threads sit dormant until there is work to do, instead of having to { allocate, start, run, terminate } your helpers on every search. There is non-negligible overhead associated with the latter. But please be careful that your helper threads are not slamming the CPU when they are SUPPOSED to be sleeping.
  3. For CCC, TCEC, and your own sanity, you should always report a final "info ..." line right before you report the best move.

It should be noted that not all of these things are done in Ethereal, so I'm not suggesting that anyone look at Ethereal as an example of how to do it.


The reason for [1] is that in the Cutechess logs for CCC, I'm seeing a discrepancy between the engine's own understanding of the elapsed time (as per the final "info ... time" output) and the Cutechess timestamps for sending the initial go command and receiving that info line. For some engines this gap looks to be only a few milliseconds (Torch, Stockfish, Arasan, Revenge, Akimbo, Igel, to name a few). But others appear to have excessive gaps, sometimes multiple hundreds of milliseconds (BlackMarlin, Equisetum, Minic, Willow, to name a few).
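
A minimal sketch of point [1] in C, assuming a POSIX system with clock_gettime. The names SearchLimits, now_ms, handle_uci_line, and elapsed_ms are hypothetical and not taken from any engine mentioned here. The only point being illustrated is that the timestamp is taken the moment the "go" token is recognised, before any parsing, allocation, or thread wake-up.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical record of per-search timing state. */
typedef struct {
    int64_t start_ms; /* timestamp taken when "go" was recognised */
} SearchLimits;

/* Monotonic clock in milliseconds. */
static int64_t now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Returns 1 if the line is a "go" command, 0 otherwise.
   The start time is stamped before anything else happens. */
static int handle_uci_line(const char *line, SearchLimits *limits) {
    int is_go = strncmp(line, "go", 2) == 0 &&
                (line[2] == '\0' || line[2] == ' ' || line[2] == '\n');
    if (!is_go)
        return 0;

    limits->start_ms = now_ms(); /* first action on "go" */

    /* ... parse wtime/btime/movetime, wake helpers, start searching ... */
    return 1;
}

/* Time used so far, measured from the arrival of "go". */
static int64_t elapsed_ms(const SearchLimits *limits) {
    return now_ms() - limits->start_ms;
}
```

With this arrangement, any time spent on setup after the "go" command is counted against the engine's own clock, so its reported "time" should stay close to what the GUI measures.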


The reason for [2] is related to [1]. Generally speaking, it should not take that much time to start up your search once you see the "go" command. I imagine that, for the engines with excessive gaps, most of that time stems from having to allocate and start all threads on every search. This chips into your time by a bit, but it can also cause some weird bugs.

Let's take an imaginary engine, Weiss. Weiss' search startup looks like this:

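// Helpers are created and launched first ...
for (int i = 1; i < num_threads; i++)
    create_and_start_search_thread();
// ... and only then does the main thread begin its own search.
start_search_thread(0);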

Weiss' main thread will create and start all of the helper threads before starting its own search. Let's say that this takes 500ms. Now let's imagine that Weiss was told it only had 520ms on the clock. As soon as Weiss finishes depth 1, it will check the clock, realize that it is about to flag, and stop searching. This might lead to you playing a move only searched to depth 1, although this is avoided if you employ the concept of thread voting.
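
One way to get the dormant helpers described in [2] is a pool of long-lived threads that block on a condition variable between searches. The sketch below uses POSIX threads. HelperPool and the pool_* functions are hypothetical names, and the searches_run counter is only a stand-in for real search work. Because the helpers block in pthread_cond_wait rather than spinning, they should not consume CPU while idle.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_HELPERS 8 /* arbitrary cap for this sketch */

/* Hypothetical pool of helper threads created once and reused. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;        /* signalled on new search or quit */
    pthread_cond_t  idle;        /* signalled when the last helper finishes */
    pthread_t       threads[MAX_HELPERS];
    int             count;
    unsigned        generation;  /* incremented once per search */
    int             running;     /* helpers still busy with this search */
    bool            quit;
    long            searches_run; /* stand-in for real search output */
} HelperPool;

static void *helper_main(void *arg) {
    HelperPool *p = arg;
    unsigned seen = 0;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        /* Sleep (no busy-waiting) until a new search or shutdown. */
        while (!p->quit && p->generation == seen)
            pthread_cond_wait(&p->wake, &p->lock);
        if (p->quit)
            break;
        seen = p->generation;
        pthread_mutex_unlock(&p->lock);

        /* ... a real engine would run its search here ... */

        pthread_mutex_lock(&p->lock);
        p->searches_run++;
        if (--p->running == 0)
            pthread_cond_signal(&p->idle);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Create the helpers once, e.g. when the thread count is configured. */
static int pool_init(HelperPool *p, int count) {
    if (count < 0 || count > MAX_HELPERS)
        return -1;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    pthread_cond_init(&p->idle, NULL);
    p->count = 0; p->generation = 0; p->running = 0;
    p->quit = false; p->searches_run = 0;
    for (int i = 0; i < count; i++) {
        if (pthread_create(&p->threads[i], NULL, helper_main, p) != 0)
            return -1;
        p->count++;
    }
    return 0;
}

/* Per search: no thread creation, just a wake-up. */
static void pool_start_search(HelperPool *p) {
    pthread_mutex_lock(&p->lock);
    p->generation++;
    p->running = p->count;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
}

/* Block until every helper has finished the current search. */
static void pool_wait(HelperPool *p) {
    pthread_mutex_lock(&p->lock);
    while (p->running > 0)
        pthread_cond_wait(&p->idle, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Stop and join the helpers, e.g. on "quit". */
static void pool_shutdown(HelperPool *p) {
    pthread_mutex_lock(&p->lock);
    p->quit = true;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->count; i++)
        pthread_join(p->threads[i], NULL);
}
```

In this sketch the per-search cost is one lock and one broadcast, so the main thread can begin its own search almost immediately after the "go" command. Older toolchains may need the -pthread flag when compiling.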


The reason for [3] is mostly that it is just good practice. For most engines, most of the time, [3] probably seems like a guarantee. You finish a depth, you report it, you check the clock, you decide to stop, you print the bestmove. But imagine the following scenario instead, where you can suddenly get a large gap in time between the last info line and the bestmove report.

Common Case:
You are using only 1 search thread.
You finish depth 14, report the info, and you decide you want to continue searching.
While you are searching, you realize you are getting too close to flagging, or whatever concept your engine has of "max time" to spend.
You abort your depth 15 search in the middle.
You don't print anything, since you aborted the search.
You report the bestmove.

In Torch, I make sure that if the search ended in some weird way...

  • Decided to stop in the middle of a search due to time
  • Decided to stop in the middle of a search due to another thread saying we're done
  • Stopped in the middle of a search due to hitting the "go nodes" limit that we were sent
  • etc.

Then I make sure to report another info line.
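
As a rough illustration of [3], the sketch below always emits one last info line immediately before bestmove, no matter how the search ended. It is not Torch's code. SearchReport and write_final_report are hypothetical names, and the choice to report the last completed depth together with the up-to-date node count and elapsed time is an assumption made for this example.

```c
#include <stdio.h>

/* Hypothetical snapshot of the search state at the moment it stops. */
typedef struct {
    int         depth;    /* last fully completed depth */
    int         score_cp; /* score from that depth, in centipawns */
    long long   nodes;    /* total nodes, including any aborted iteration */
    long long   time_ms;  /* elapsed time measured from the "go" command */
    const char *pv;       /* principal variation from the completed depth */
} SearchReport;

/* Formats a final info line followed directly by bestmove.
   Returns the snprintf result: the number of characters needed. */
static int write_final_report(char *buf, size_t size,
                              const SearchReport *r, const char *bestmove) {
    return snprintf(buf, size,
                    "info depth %d score cp %d nodes %lld time %lld pv %s\n"
                    "bestmove %s\n",
                    r->depth, r->score_cp, r->nodes, r->time_ms, r->pv,
                    bestmove);
}
```

An engine would call this at every exit path of the search, then write the buffer to stdout and flush it, so that the "time" in the last info line matches the moment bestmove is sent.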

mhouppin added a commit to mhouppin/stash-bot that referenced this issue Jul 3, 2024
…180)

This rules out the first bullet point in
AndyGrant/Ethereal#214, where Stash would previously
start the timer from the main search thread after starting all the worker
threads. We now instead start the timer from the UCI thread, after ensuring the
possible previous search has completed.

As a side-effect, this should allow us to handle stricter time constraints and
machine loads without burning the clock (most notably on noobpwnftw's machines
on Grantnet and the 256-core server at CCC), so the Move Overhead default value
is reduced back to its historical 30ms. No timeouts were observed during testing
for now, but I might adjust the overhead later if I start observing time losses
again.

Passed non-regression STC:

Elo   | 4.92 +- 4.39 (95%)
SPRT  | 8.0+0.08s Threads=1 Hash=16MB
LLR   | 2.95 (-2.94, 2.94) [-4.00, 1.00]
Games | N: 6494 W: 1272 L: 1180 D: 4042
Penta | [56, 593, 1872, 655, 71]
http://chess.grantnet.us/test/37474/

Bench: 3,844,164