Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow sqlite to be accessed from different thread #714

Closed
wants to merge 5 commits into from

Conversation

tkrabel-db
Copy link

@tkrabel-db tkrabel-db commented Oct 23, 2023

Description

Remove checking if sqlite db is accessed from a different thread.

Fixes #713

Checklist (delete if not relevant):

  • I have added tests that prove my fix is effective or that my feature works
  • I have updated CHANGELOG.md
  • I have made corresponding changes to user documentation for new features
  • I have made corresponding changes to library documentation for API changes

@lieryan
Copy link
Member

lieryan commented Oct 26, 2023

Hi @tkrabel-db, thanks for making this PR. I'm not quite sure that exposing this attribute to the editor-side API would be the best way to fix this issue. The editor-side of the integration (i.e. python-lsp-server in this instance) ideally shouldn't have to care about the internal concurrency requirement of rope. Whether or not an API calls the database or not is not part of the rope API, it'll complicate the API if only some calls can be made safely from multiple threads and especially with nothing documenting what is or is not safe.

I did a little bit of reading on this, from my understanding, with the appropriate compile-time options, sqlite3 actually does support multithreading and check_same_thread=False just bypasses a cpython-level check? According to https://discuss.python.org/t/is-sqlite3-threadsafety-the-same-thing-as-sqlite3-threadsafe-from-the-c-library/11463, Python 3.11 seems to expose this compile time option now using the DBAPI 2.0 attribute sqlite3.threadsafety but it should have already been safe to use sqlite3 from multiple threads in earlier versions of python if you do the compile options detection yourself.

For the highest compatibility, we would need to implement our own serialization mechanism to proxy all calls into a single worker thread. This means every API would need to queue requests to a single worker thread. This is doable, but unless a lot of modern OSes/IDEs that have compiled sqlite3 for Python with multithreading disabled, it seems like it'll be unnecessarily complicated solution.

If we drop support for configuration where sqlite3 is not compiled with multithreading support, then instead of exposing check_same_thread to python-lsp-server, rope could instead set a value for check_same_thread after checking the compile time options. I think this would simplify supporting multithreading. This means that if an editor wants to use rope with a Python without sqlite3 multithreading support, it's their responsibility to wrap autoimport in a single worker thread.

I'll need to have a bit more thinking and research on this.

@tkrabel
Copy link
Contributor

tkrabel commented Oct 28, 2023

@lieryan thanks for the investigation and the detailed explanation. I am short on time myself right now, but I wanted to add some data points that may help. Also, I wanted to see if we can solve the issue on python-lsp-server with minimal effort.

sqlite3.threadsafety on python 3.8 returns 1, which means that the module can be shared across threads, but not connections. IIUC, AutoImport is the main entry point for any CRUD operation, and it holds one connection to the data base. So if a user wanted to use rope with threadsafety = 1, she'd need to have one AutoImport object per thread, right?

@lieryan
Copy link
Member

lieryan commented Oct 28, 2023

With threadsafety = 1, yes, they'll need to have one AutoImport per thread. But another thing to be careful about is when using in-memory database, IIUC, whenever you create an unnamed in memory database in sqlite, that creates a new, empty database. So one AutoImport per thread will mean that additional memory usage and additional scanning time for each thread. That would be undesirable.

If we want to recommend one AutoImport per thread as the best way to multithreading, we'll likely want to change AutoImport to use an explicitly named in-memory database so that they'll all connect to the same database. Creating named in memory database seems to be quite simple using a filename like so "file:memdb1?mode=memory&cache=shared", if we name the database with something that uniquely identify the rope Project, maybe the hash of id(project) or the path to the project root, I believe that should work.

I think I would have preferred one AutoImport per thread better than allowing AutoImport to be used from multiple threads, it'll be a much simpler threading model to use and understand, and I think the performance or memory usage difference with creating multiple AutoImport is likely going to be negligible.

@tkrabel
Copy link
Contributor

tkrabel commented Oct 28, 2023

Hm, I'm also thinking through this right now, and I am wondering: what is the problem exposing the check_same_thread to the user? We pick the default value which makes sure the behaviour doesn't change, and we could inform the user to only change it at their own risk. Or am I missing something from the links you provided?

@tkrabel
Copy link
Contributor

tkrabel commented Oct 28, 2023

I was checking multi-threading behaviour on python 3.11, where sqlite3.threadsafety = 3, essentially telling that threads can share the module, connectors, and cursors. However, even from python 3.11, I can still replicate the error in #713.

I think check_same_thread=False needs to be set to avoid that error, independent of what sqlite3.threadsafety says. That said, I interpret sqlite3.threadsafety to say if the OSes python version can guarantee threadsafety, while check_same_thread is an extra safeguard by sqlite3 to prevent users from shooting themselves in the foot.

From my POV, it should be up to the user of a sqlite db to determine if they run into thread safety issues or not, but at least they should be able to turn checks off if they deem their use case safe. Write now, AutoImport blocks users from that, forcing them to proxy request through a single worker thread. I won't push back if that is your intention, but pls let me know so.

In case you are open to add check_same_thread to AutoImport, we could add a warning to check_same_thread, saying that users should know what they do if they turn it off. WDYT @lieryan ?

@lieryan
Copy link
Member

lieryan commented Oct 29, 2023

Can we confirm that we're talking about the same thing here? If by "users" you meant end users, I don't think end users should have to care about the implementation details of rope, 99% of them isn't going to have the contextual understanding to know whether they should or shouldn't use this flag. If instead by users you meant authors of editors/lsp servers, then I can see the merit for exposing the flag, though I still don't think it's the best way to solve the issue.

If we require each thread to create one AutoImport, all we need to do to handle concurrency is just to use standard sqlite transactions; OTOH, if we reuse a single connection in multiple threads, we will also need to implement a way to control access to the connection. Calling AutoImport from multiple threads will open up a whole new class of failure scenarios.

While doing individual database operations should work fine with thread safe sqlite, if there are operations that depends on the atomicity of a sequence of reads and writes and which will require using transaction, reusing the same connection for multiple threads will not really make sense anyway.

If rope is the end application and have full control over how the connection is used from different threads, this might be a-ok, but as rope is a library that can be called from many different clients we don't control, trying to get every implementor to handle concurrency correctly is going to be tricky.

On the other hand, the concurrency issues that can happen if we require one AutoImport per thread rule is mostly going to be the same as the concurrency issues caused by having multiple instances of rope running. For example if you have multiple editors/IDE, or if you run multiple instances of pylsp for the same project. This is a use case we necessarily have to support anyway, so it doesn't create any new failure scenarios.

Also, the sqlite authors stated that while sqlite is threadsafe with the appropriate compile options, they also stated that "Threads are evil. Avoid them." and recommends avoiding threading with sqlite. I would be inclined to heed to the warning.

@lieryan
Copy link
Member

lieryan commented Oct 29, 2023

Ok, I think I have a better idea now. Rather than one AutoImport per thread, I think we can make it so that AutoImport will keep the sqlite connection in a threadlocal. If the editor application always calls rope from a single thread, this has negligible impact for them, everything will continue to work as they currently do. But if they call AutoImport methods from multiple threads, AutoImport will initialise an sqlite connection for each thread. In either case, the editor still uses a single instance of AutoImport so there's minimal change needed on the editor side code, but each threads will internally be served by their own sqlite connection, which avoids needing multi threading access to the same connection.

@tkrabel
Copy link
Contributor

tkrabel commented Oct 29, 2023

Can we confirm that we're talking about the same thing here?

By "users" I mean everyone that uses that library, but I think this mostly likely will be application developers that want to expose rope's features to an end user, so you can say I mean application / editor developers.

Also, the sqlite authors stated that while sqlite is threadsafe with the appropriate compile options, they also stated that "Threads are evil. Avoid them." and recommends avoiding threading with sqlite. I would be inclined to heed to the warning.

Noted.

@tkrabel
Copy link
Contributor

tkrabel commented Oct 29, 2023

Rather than one AutoImport per thread, I think we can make it so that AutoImport will keep the sqlite connection in a threadlocal

I like that approach! It's clean, i.e. the APIs won't change, and we handle the "complexity" of ensuring one connection per thread in the background.

I opened a PR #720

@lieryan
Copy link
Member

lieryan commented Nov 4, 2023

@tkrabel IIUC, with the one connection per thread, changing check_same_thread should no longer be needed, correct? I'm closing this PR under that assumption, please let me know/reopen if otherwise.

@lieryan lieryan closed this Nov 4, 2023
@tkrabel-db
Copy link
Author

@lieryan that is correct. Thanks for cleaning this up! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calling Autoimport.connection.execute from differnt thread causes error
3 participants