Allow sqlite to be accessed from different thread #714
Hi @tkrabel-db, thanks for making this PR. I'm not sure that exposing this attribute to the editor-side API is the best way to fix this issue. The editor side of the integration (i.e. python-lsp-server in this instance) ideally shouldn't have to care about rope's internal concurrency requirements. Whether or not an API call touches the database is not part of the rope API, and it would complicate the API if only some calls could be made safely from multiple threads, especially with nothing documenting what is or is not safe. I did a little reading on this. From my understanding, with the appropriate compile-time options, sqlite3 actually does support multithreading and For the highest compatibility, we would need to implement our own serialization mechanism to proxy all calls into a single worker thread. This means every API would need to queue requests to a single worker thread. This is doable, but unless a lot of modern OSes/IDEs ship a sqlite3 for Python compiled with multithreading disabled, it seems like an unnecessarily complicated solution. If we drop support for configurations where sqlite3 is not compiled with multithreading support, then instead of exposing I'll need to do a bit more thinking and research on this.
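For anyone who wants to check what their own Python build reports, here is a minimal sketch using only the standard library `sqlite3` module. It prints the DB-API thread-safety level and shows the default same-thread check (`check_same_thread=True`) rejecting use from a second thread. The helper name `use_from_other_thread` is made up for this illustration and is not part of rope.

```python
import sqlite3
import threading

# DB-API thread-safety level reported by the sqlite3 module (0, 1, or 3).
# Before Python 3.11 this attribute was hard-coded to 1.
print("sqlite3.threadsafety =", sqlite3.threadsafety)


def use_from_other_thread(connection):
    """Hypothetical helper: use `connection` from a new thread.

    Returns the ProgrammingError raised by the same-thread check, or None.
    """
    errors = []

    def worker():
        try:
            connection.execute("SELECT 1").fetchone()
        except sqlite3.ProgrammingError as exc:
            errors.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return errors[0] if errors else None


# check_same_thread defaults to True, so cross-thread use is rejected.
conn = sqlite3.connect(":memory:")
error = use_from_other_thread(conn)
print("cross-thread use raised:", type(error).__name__)
conn.close()
```

The check is enforced by the Python module itself, so it fires regardless of how the underlying SQLite library was compiled.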
@lieryan thanks for the investigation and the detailed explanation. I am short on time myself right now, but I wanted to add some data points that may help. Also, I wanted to see if we can solve the issue on
With threadsafety = 1, yes, they'll need to have one AutoImport per thread. But another thing to be careful about is the in-memory database: IIUC, whenever you create an unnamed in-memory database in sqlite, that creates a new, empty database. So one AutoImport per thread would mean additional memory usage and additional scanning time for each thread, which would be undesirable. If we want to recommend one AutoImport per thread as the best way to do multithreading, we'll likely want to change AutoImport to use an explicitly named in-memory database so that they all connect to the same database. Creating a named in-memory database seems to be quite simple using a filename like "file:memdb1?mode=memory&cache=shared", if we name the database with something that uniquely identifies the rope Project, maybe the hash of I think I would have preferred one AutoImport per thread over allowing AutoImport to be used from multiple threads; it'll be a much simpler threading model to use and understand, and I think the performance or memory usage difference from creating multiple AutoImport instances is likely going to be negligible.
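The difference between unnamed and named in-memory databases can be seen with a short standard-library sketch. It uses the `file:memdb1?mode=memory&cache=shared` filename from the comment above; the table name and the inserted value are placeholders, and this is not how rope's AutoImport is actually wired up. It also assumes the underlying SQLite build has shared-cache support enabled.

```python
import sqlite3

# URI from the comment above; "memdb1" is just an example database name.
SHARED_URI = "file:memdb1?mode=memory&cache=shared"

# Two unnamed in-memory connections: each one gets its own empty database.
private_a = sqlite3.connect(":memory:")
private_b = sqlite3.connect(":memory:")
private_a.execute("CREATE TABLE names (name TEXT)")
tables_seen_by_b = private_b.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print("unnamed, tables visible to second connection:", tables_seen_by_b)

# Two connections to the same named, shared-cache in-memory database.
# uri=True is required so that the filename is parsed as a URI.
shared_a = sqlite3.connect(SHARED_URI, uri=True)
shared_b = sqlite3.connect(SHARED_URI, uri=True)
shared_a.execute("CREATE TABLE names (name TEXT)")
shared_a.execute("INSERT INTO names VALUES ('os.path')")
shared_a.commit()
rows_seen_by_b = shared_b.execute("SELECT name FROM names").fetchall()
print("named, rows visible to second connection:", rows_seen_by_b)
```

A named in-memory database only lives as long as at least one connection to it stays open, so something would need to hold a connection for the lifetime of the project.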
Hm, I'm also thinking through this right now, and I am wondering: what is the problem with exposing the
I was checking multi-threading behaviour on Python 3.11, where I think From my POV, it should be up to the user of a sqlite db to determine if they run into thread safety issues or not, but at least they should be able to turn checks off if they deem their use case safe. Right now, In case you are open to add
Can we confirm that we're talking about the same thing here? If by "users" you meant end users, I don't think end users should have to care about the implementation details of rope; 99% of them aren't going to have the contextual understanding to know whether they should or shouldn't use this flag. If instead by users you meant authors of editors/LSP servers, then I can see the merit of exposing the flag, though I still don't think it's the best way to solve the issue. If we require each thread to create its own AutoImport, all we need to do to handle concurrency is use standard sqlite transactions; OTOH, if we reuse a single connection in multiple threads, we will also need to implement a way to control access to the connection. Calling AutoImport from multiple threads will open up a whole new class of failure scenarios. While individual database operations should work fine with a thread-safe sqlite, if there are operations that depend on the atomicity of a sequence of reads and writes, and so require a transaction, reusing the same connection for multiple threads will not really make sense anyway. If rope were the end application and had full control over how the connection is used from different threads, this might be a-ok, but as rope is a library that can be called from many different clients we don't control, trying to get every implementor to handle concurrency correctly is going to be tricky. On the other hand, the concurrency issues that can happen under a one-AutoImport-per-thread rule are mostly going to be the same as the concurrency issues caused by having multiple instances of rope running, for example if you have multiple editors/IDEs open, or if you run multiple instances of pylsp for the same project. This is a use case we necessarily have to support anyway, so it doesn't create any new failure scenarios.
Also, while the sqlite authors state that sqlite is threadsafe with the appropriate compile options, they also say "Threads are evil. Avoid them." and recommend avoiding threading with sqlite. I would be inclined to heed the warning.
Ok, I think I have a better idea now. Rather than one AutoImport per thread, I think we can make it so that AutoImport keeps the sqlite connection in a thread-local. If the editor application always calls rope from a single thread, this has negligible impact for them; everything will continue to work as it currently does. But if they call AutoImport methods from multiple threads, AutoImport will initialise an sqlite connection for each thread. In either case, the editor still uses a single instance of AutoImport, so there's minimal change needed in the editor-side code, but each thread will internally be served by its own sqlite connection, which avoids needing multithreaded access to the same connection.
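The thread-local idea can be sketched with the standard library alone. The class `ThreadLocalConnections`, its `connection` property, and the database name `threadlocal_demo` are hypothetical names for this illustration; this is not rope's actual AutoImport code, only the general pattern of lazily creating one sqlite connection per calling thread behind a single shared object.

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Hypothetical helper: gives each calling thread its own connection."""

    def __init__(self, database, **connect_kwargs):
        self._database = database
        self._connect_kwargs = connect_kwargs
        self._local = threading.local()  # attributes are per-thread

    @property
    def connection(self):
        # Create the connection lazily, the first time a thread asks for it.
        conn = getattr(self._local, "connection", None)
        if conn is None:
            conn = sqlite3.connect(self._database, **self._connect_kwargs)
            self._local.connection = conn
        return conn


# One shared object; a named in-memory database so all threads see the same data.
store = ThreadLocalConnections(
    "file:threadlocal_demo?mode=memory&cache=shared", uri=True
)
main_conn = store.connection
main_conn.execute("CREATE TABLE names (name TEXT)")
main_conn.execute("INSERT INTO names VALUES ('json.dumps')")
main_conn.commit()

results = {}


def worker():
    conn = store.connection  # a new connection owned by this thread
    results["same_object_within_thread"] = conn is store.connection
    results["distinct_from_main"] = conn is not main_conn
    results["rows"] = conn.execute("SELECT name FROM names").fetchall()
    conn.close()  # closed in the thread that created it


thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

Because each connection is only ever used by the thread that created it, the default same-thread check can stay enabled, and the caller keeps a single object regardless of how many threads it uses.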
By "users" I mean everyone who uses the library, but I think this will most likely be application developers who want to expose rope's features to an end user, so you can say I mean application / editor developers.
Noted.
I like that approach! It's clean, i.e. the APIs won't change, and we handle the "complexity" of ensuring one connection per thread in the background. I opened PR #720
@tkrabel IIUC, with the one connection per thread, changing
@lieryan that is correct. Thanks for cleaning this up! :)
Description
Remove checking if sqlite db is accessed from a different thread.
Fixes #713