Understanding dataset objects and long running queries #2229

leenyburger · 2024-10-03T17:33:07Z

I'm struggling with long-running queries (I'm on Heroku, so any request gets killed after 30 seconds). I added pagination, which helps a lot, but on huge databases COUNT DISTINCT queries are still timing out.

Right now I have a @results object of class Sequel::Postgres::Dataset.
I'm currently unpacking it in the view using:

<%= render "components/table", headers: @results.columns do %>
  <% @results.each do |row| %>
    <%= render "components/table_row", values: row.values, headers: @results.columns %>
  <% end %>
<% end %>

Would you recommend doing something like
results = @results.all in my controller and kicking it to a background job if it takes more than 20 seconds? Is there a more "sequel friendly" way to handle this?

jeremyevans · 2024-10-03T18:03:56Z

Maintainer

If this is a regular view, and things are not customized per request, your best approach would be a background job at a given frequency that does the query you want, and stores the information in a table/cache, and have the results use that table/cache.

If the results are customized per request, you can instead enqueue a background job that stores the results somewhere, and have your webpage poll until the results are ready. That's a possible solution if the users are OK with waiting a long time for the results. Alternatively, the background job could send the results to the user (via email), assuming you have the user's email address and are comfortable with that approach.

If you can speed up the query, that would help. COUNT DISTINCT is in general difficult to optimize for, so you may want some database denormalization to maintain a counter cache (which is refreshed by a background job or a trigger), assuming such an approach will work.
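A minimal sketch of the enqueue-and-poll idea above, using only the Ruby standard library. Everything named here (`ReportStore`, `enqueue_report`, `report_status`) is hypothetical and not part of Sequel. `Thread.new` stands in for a real job queue, and the in-memory hash stands in for a table or cache that both the web and worker processes can reach.

```ruby
require "securerandom"

# Hypothetical in-memory store. A real app would use a database table or a
# cache shared between the web process and the worker process.
class ReportStore
  def initialize
    @mutex = Mutex.new
    @reports = {}
  end

  def write(id, attrs)
    @mutex.synchronize { @reports[id] = attrs }
  end

  def read(id)
    @mutex.synchronize { @reports[id] }
  end
end

STORE = ReportStore.new

# Stand-in for enqueueing a job: records a pending report, runs the slow
# query (the block) off the request path, and returns a token right away.
# Thread.new is used only so the sketch runs on its own.
def enqueue_report(store = STORE, &slow_query)
  id = SecureRandom.hex(8)
  store.write(id, { status: :pending })
  worker = Thread.new do
    begin
      store.write(id, { status: :done, rows: slow_query.call })
    rescue StandardError => e
      store.write(id, { status: :failed, error: e.message })
    end
  end
  [id, worker]
end

# What a polling endpoint could return for a given token.
def report_status(id, store = STORE)
  store.read(id) || { status: :unknown }
end

# Usage: in a controller the block would be something like `@results.all`.
token, worker = enqueue_report { [{ id: 1, name: "example" }] }
worker.join # only for the demo; a web request would return the token instead
report_status(token)[:status] # :done once the job has finished
```

The request that starts the report returns only the token, so it finishes well inside the 30-second limit; the page then polls `report_status` until the status is no longer `:pending`.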

leenyburger · 2024-10-03T19:42:19Z

Author

This is very helpful, thank you!

rgalanakis · 2024-10-03T20:55:24Z

It sounds like you're using limit/offset pagination (called many other things as well), which is always going to be slow for large datasets: it has to count the rows of the query (to report that there are, say, 10k total items), and it also has to find the rows on earlier pages (the last page of 100 items has to find and sort the first 9,900 items to know which 100 come last). You can consider using cursor-based pagination instead, which isn't as flexible but is extremely fast and doesn't slow down as the dataset gets larger.

I.e., instead of SELECT * FROM mytable WHERE user_id = 10 ORDER BY id LIMIT 100 OFFSET 9900 (which has to find and sort 10,000 items), cursor-based pagination does SELECT * FROM mytable WHERE user_id = 10 AND id > 123456 ORDER BY id LIMIT 100 (the lower-bound id, 123456, was the highest id on the previous page). Both queries need the same ORDER BY for the cursor to line up with the page boundaries.
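To make the cursor logic concrete, here is a small sketch that pages through an in-memory array standing in for `mytable`. The `keyset_page` helper and the sample rows are made up for illustration. With Sequel, the equivalent page would presumably be built from `where`, `order`, and `limit` on the dataset, as noted in the comment, assuming a `DB` connection object.

```ruby
# Stand-in for `mytable`; ids are deliberately not contiguous.
ROWS = (1..50).map { |i| { id: i * 3, user_id: i.odd? ? 10 : 11 } }.freeze

# Returns one page plus the cursor for the next call (nil when exhausted).
# Roughly what this Sequel chain would ask the database to do (assumed DB):
#   DB[:mytable].where(user_id: 10).where { id > after_id }
#               .order(:id).limit(per_page).all
def keyset_page(rows, user_id:, after_id: nil, per_page: 10)
  matching = rows.select do |row|
    row[:user_id] == user_id && (after_id.nil? || row[:id] > after_id)
  end
  page = matching.sort_by { |row| row[:id] }.first(per_page)
  # A short page means there is nothing left to fetch.
  next_cursor = page.size == per_page ? page.last[:id] : nil
  [page, next_cursor]
end

# Walk every page for user 10, carrying the cursor forward each time.
cursor = nil
pages = []
loop do
  page, cursor = keyset_page(ROWS, user_id: 10, after_id: cursor)
  pages << page unless page.empty?
  break if cursor.nil?
end
```

Each call needs only the last id it handed out, never a page number, which is why the cost per page stays flat when the database can satisfy `id > cursor` from an index. The trade-off is that you can only step to the next page, not jump to page 57.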

There are lots of good articles about limit/offset vs. cursor-based pagination but it's been a while since I've reviewed them so I can't recommend a specific one.

1 reply

leenyburger Oct 3, 2024
Author

ahhh I didn't know that. I'll do some research - ty!
The software isn't really designed for someone to paginate through hundreds of pages, so I might just make them download a .csv for large datasets.
Maybe I can .count the records and prompt the user to find out what they really want.
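One caveat with counting first is that the count itself can be the slow query. A possible sketch, assuming a made-up cutoff of 500 rows: look at no more than cutoff + 1 rows, which is enough to decide between paginating and offering a .csv. The names `INLINE_LIMIT`, `too_large_for_page?`, and `delivery_mode` are hypothetical; with Sequel, something like `dataset.limit(INLINE_LIMIT + 1).count` should bound the work in a similar way, though that is an assumption worth checking against the generated SQL.

```ruby
INLINE_LIMIT = 500 # hypothetical cutoff, not from the thread

# Looks at no more than limit + 1 rows, so the check stays cheap even when
# the full result set is enormous.
def too_large_for_page?(rows, limit: INLINE_LIMIT)
  rows.lazy.first(limit + 1).size > limit
end

# Decides how to hand the results to the user.
def delivery_mode(rows, limit: INLINE_LIMIT)
  too_large_for_page?(rows, limit: limit) ? :offer_csv : :paginate
end

small = (1..120).map { |i| { id: i } }
huge  = (1..Float::INFINITY).lazy.map { |i| { id: i } } # never fully enumerated

delivery_mode(small) # :paginate
delivery_mode(huge)  # :offer_csv
```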
