Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple clients in parallel #383

Closed
nwatson22 opened this issue Nov 22, 2023 · 9 comments · Fixed by #427
Closed

Support multiple clients in parallel #383

nwatson22 opened this issue Nov 22, 2023 · 9 comments · Fixed by #427
Assignees
Labels

Comments

@nwatson22
Copy link
Member

Currently the kore-rpc-booster server is able to handle clients connecting to it and making requests from different processes, but at some point there is a bottleneck where only one connection at a time can make progress, preventing any speedup from using multiple clients to make independent requests in parallel to one server. From what I understood, there are at least 2 issues to be dealt with for this limitation to be overcome: opening multiple z3 processes, and the fact that the servers contain state from add_module, because of which @tothtamas28 had said we might consider different ways of handling the modules that get used in the other request types, e.g. sessions or cookies.

@tothtamas28
Copy link

tothtamas28 commented Nov 22, 2023

A few design alternatives for add-module:

  1. Store nothing on server, send modules on each request ("cookies")
  2. Store modules on server, send modules IDs on each request ("sessions")
    1. IDs are external (provided by the user as part of the request)
    2. IDs are internal (generated by the server and sent in the response)

The advantage of Option 2.ii (compared to 2.i) is that it prevents name clashes between concurrent clients. As an extra, requests in that case can be idempotent, i.e. replaying a request results in the same ID.


A few notes from a discussion with @ehildenb:

  • We need the full flexibility of modules to leave room for later extensions.
  • Unless ID computation logic is part of the interface, (i) has the advantage that users can easily run different KORE-RPC implementations in parallel with the same requests (e.g. Maude and Booster).
  • For human interaction (i) is more convenient.
  • On the other hand (ii) provides better compartmentalization of state, i.e. there's no way for clients to clash on module names.
  • In case (ii) if ID computation is content-based (e.g. normalize module, then compute hash), that gives idempotency for free.
  • Maybe let the user optionally provide the module name and refer to a module by it?

@tothtamas28
Copy link

tothtamas28 commented Nov 22, 2023

We should also consider sending structured data as opposed to KORE text like add-module does currently. This might as well be implemented in a new endpoint to avoid a breaking change.

Example for Option 2.ii (session with internal ID):

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "add-rules",
  "params": {
    "rules": [
      ["<rule0_lhs>", "<rule0_rhs>"],
      ["<rule1_lhs>", "<rule1_rhs>"],
      "...",
      ["<ruleN_lhs>", "<ruleN_rhs>"]
    ]
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "id":  "47d3a45f-8aa3-41f0-9111-2fb940423e9b"
  }
}

execute and other endpoints take an optional list of rule set IDs:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "execute",
  "params": {
    "max-depth": 4,
    "...": "...",
    "log-timing": true,
    "rules": ["47d3a45f-8aa3-41f0-9111-2fb940423e9b"]
  }
}

@tothtamas28
Copy link

@dwightguth, @ehildenb, @PetarMax

@goodlyrottenapple
Copy link
Contributor

@ehildenb
Copy link
Member

Some notes (mostly covered by Tamas above):

  • Different backends will have different restrictions on what they allow in new modules. The Haskell backend may not allow new sorts/syntax, but the Maude backend won't care, for example. So the transport shouldn't care about what the user is sending (only rules, no syntax, etc...), should just allow full modules. Later, we may have uses for declaring new sorts/syntax on the fly as we learn about things during symbolic execution (new constants that are contract specific, etc...).
  • A use case for CSE (compositional symbolic execution) will be to create enclosing modules for deeper nested functions (and the rules we learn for them by doing symbolic execution), then import those enclosing modules selectively for proving larger functions. For example, we have two functions test_1 and test_2, where test_1 calls g, f, and h, and test_2 calls g, h, and i. Then we want to symbolically execute all of g, f, h, i, and make modules RULES-G, RULES-F, RULES-H, RULES-I. Then we make module DEPENDENCIES-TEST-1 imports RULES-G imports RULES-F imports RULE-H endmodule, and symbolically execute test_1 in that context. Similarly, we make a module DEPENDENCIES-TEST-2, which imports the appropriate sub-modules for executing test_2 in. So we want modules specifically because they give us modularity.
  • User may have multiple backends started up, each which they have sent all the modules. Those backends will likely return different unique IDs for the modules they store. So the user will probably have an easier time referring to the module they want to execute in by name, rather than by unique ID, in order to dispatch queries to the correct backend. So I think we should at least allow both.

@tothtamas28
Copy link

tothtamas28 commented Nov 29, 2023

Proposal:

  • Endpoint add-module should return a unique ID for the added module.
  • A field modules: list[string] should be added to execute and other endpoints.
  • A field name-as-id: boolean (or similar) should be added to the add-module endpoint.
    • This indicates that the client would like to have the option to refer to the given module by name.
    • If the field is set to true, an error response should be sent if a module with the same name has been added earlier with "name-as-id": true.
    • If the value is not given or is false, the module name should be disregarded.
    • Once the module field gets JSON structure, the field can be deprecated, since the presence of an explicit name field can indicate the same.
  • ID computation should be part of the interface so that different backends can return the same ID.
    • The ID may be content-based. This gives idempotency for free.
  • The set of valid IDs should be disjoint from the set of valid module names.
    • This ensures that add-module with "name-as-id": true does not break a subsequent request with "name-as-id": false
    • Until the module field gets JSON structure, a convention should be worked out so that import works with an ID.

@jberthold
Copy link
Member

Backend parts:

  • build with threaded runtime system
  • prevent users from overwriting added modules
  • do this in kore-rpc-booster only for now.

@goodlyrottenapple
Copy link
Contributor

goodlyrottenapple commented Dec 11, 2023

#414 and runtimeverification/haskell-backend#3702 implement a very simple version of concurrent back-ends, which should be safe for multiple clients to connect to without module clashes. This is done by confiding state to the session, which means that every client that connect gets a session where they can add modules and execute within said context. The session is destroyed when the client disconnects. The upside to this is that there is very little change necessary on both the client and server side. The downside is some possible duplication of add module requests, if the pyk front-end maintains several connections which should all have the same state.

If this implementation is unsatisfactory and we want to go with the full module IDs proposal, we need to clarify how exactly we distinguish between module names and IDs and will have to modify the kore parser and internalisation procedures, which will be a somewhat more involved change to both code bases.

@tothtamas28
Copy link

tothtamas28 commented Dec 12, 2023

As a simplifying assumption, to avoid modifications in the parser, we can let generated IDs be valid module names for now. We'll probably move from KORE text to JSON in the module field eventually, so we can still revisit this assumption then.

Would this make implementation simpler?

goodlyrottenapple added a commit to runtimeverification/haskell-backend that referenced this issue Dec 14, 2023
Implements the proposal from
runtimeverification/hs-backend-booster#383

Namely:

`add-module`
-------------

* now returns the id of a given module, of the form
`m<sha256_of_given_module>`
* computes the SHA256 hash of the unparsed module string and saves the
internalised module under this key
* takes an optional `name-as-id: true` argument which implements the
previous behaviour of `add-module` (still returns the new id), i.e. adds
the module to the module map under the module name as well as the id.
* if the same module is added twice with `name-as-id: true`, the second
request will fail with a `Duplicate module` error
* if the same module is sent twice with `name-as-id: false` or without
`name-as-id`, the second request is idempotent and will succeed

As discussed on the issue, the IDs and original names are not disjoint
for ease of implementation. the `m` pre-pended to the hash is necessary
to make the name a valid kore identifier.

I have also not added the `modules` field to other methods as I'm unsure
whether this should be replacing the current `module` parameter, which
would be a backwards incompatible change to the API that would break a
lot of our integration tests, or if it should be an additional field,
@tothtamas28?

---------

Co-authored-by: github-actions <[email protected]>
goodlyrottenapple added a commit that referenced this issue Jan 24, 2024
Fixes #383

---------

Co-authored-by: devops <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Jost Berthold <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants