Understanding dataset objects and long running queries #2229
Replies: 3 comments 1 reply
-
If this is a regular view, and things are not customized per request, your best approach would be a background job at a given frequency that does the query you want, and stores the information in a table/cache, and have the results use that table/cache. You can produce a background job to store the results somewhere, and have your webpage poll until the results are ready. That's a possible solution if the users are OK with waiting a long time for the results. Alternatively, the background job could send the results to the user (via email), assuming you have the user's email address and are comfortable with that approach. If you can speed up the query, that would help. COUNT DISTINCT is in general difficult to optimize for, so you may want some database denormalization to maintain a counter cache (which is refreshed by a background job or a trigger), assuming such an approach will work. |
Beta Was this translation helpful? Give feedback.
-
This is very helpful, thank you! |
Beta Was this translation helpful? Give feedback.
-
It sounds like you're using limit/offset pagination (called many other things as well), which is always going to be slow for large datasets, since it will have to count the rows of the query (there are 10k total items), and also find rows on earlier pages (the last page of 100 items will have to find and sort the first 9900 items to know what the last 100 are). You can consider using cursor-based pagination instead, which isn't as flexible but is extremely fast and doesn't slow down as the dataset gets larger. Ie, instead of There are lots of good articles about limit/offset vs. cursor-based pagination but it's been a while since I've reviewed them so I can't recommend a specific one. |
Beta Was this translation helpful? Give feedback.
-
I'm struggling with long running queries (I'm on Heroku so everything gets killed after 30 seconds). I added pagination which helps a lot, but on huge databases COUNT DISTINCT queries are timing out.
Right now I have a
@results
object which is of class:Sequel::Postgres::Dataset
.I'm currently decomposing this in the view using:
Would you recommend doing something like
results = @results.all
in my controller and kicking it to a background job if it takes more than 20 seconds? Is there a more "sequel friendly" way to handle this?Beta Was this translation helpful? Give feedback.
All reactions