added multicolumn aggregation to DBA and improved three essential parts which suffer from many chunks #1069

Merged: 4 commits into dev on Nov 21, 2024

Conversation

s3inlc (Member) commented on Jun 20, 2024

This PR solves some of the loading/handling issues on systems with a large number of tasks or agents:

  • Until now, chunk creation took a single global lock file, no matter which task the chunk was being created for. Locking per task is sufficient, because creating two chunks for two different tasks at the same time poses no problem.
  • For active tasks with a very large number of chunks, chunk assignment on the server can take so long that the agent assumes the connection has timed out (30s). This can leave all agents stuck in a loop of requesting chunks/tasks while the server can no longer respond fast enough, because for an active task it has to loop over all of that task's chunks, which takes longer the more chunks there are. The changes in TaskUtils solve this: instead of looping over all existing chunks, it loops only over the non-completed ones and uses SUM in SQL to determine the other needed values.
  • In general, there are many places in the code where multiple queries sum/count/max over columns (or, even worse and in most of those places, the entries are loaded and looped over in code). With many chunks, the biggest issue was in the getTaskInfo() function, which is used by the current old UI and caused very long loading times on the tasks page whenever tasks with many chunks were listed. To tackle this, the DBA was extended slightly so that it can sum/count/max/... over multiple columns in a single query (as long as the where conditions are the same for all of them). For getTaskInfo() specifically, this made loading roughly 5-6 times faster.
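To illustrate the per-task locking from the first point, here is a minimal sketch in Python rather than the project's PHP. The lock directory, the file naming, and the `task_lock` helper are assumptions made for illustration and are not taken from the actual change. It relies on `fcntl.flock`, so it only works on Unix-like systems.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock directory; the real project keeps its lock files elsewhere.
LOCK_DIR = tempfile.gettempdir()


def lock_path(task_id: int) -> str:
    """Return a lock file path that is unique per task (naming is an assumption)."""
    return os.path.join(LOCK_DIR, f"chunking.{task_id}.lock")


@contextmanager
def task_lock(task_id: int):
    """Hold an exclusive lock for one task while a chunk is created for it."""
    with open(lock_path(task_id), "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks only callers using the same task
        try:
            yield handle
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Chunk creation for two different tasks uses two different lock files,
# so neither call waits on the other.
with task_lock(1):
    with task_lock(2):
        pass  # chunk creation for task 2 would happen here
```

With a single global lock file, the inner `with` would have had to wait for the outer one. With one file per task, only requests for the same task are serialized.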

The DBA function change has already been applied to the getChunkInfo() function as well, but there are likely many other places where using either a single aggregation function or the newly added multicolumn aggregation could yield additional speedups.
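The multicolumn aggregation idea can be sketched as follows, using Python and SQLite instead of the actual DBA. The table layout, the column names, and the `multi_aggregate` helper are hypothetical. The point is only that several aggregates sharing one WHERE condition can come back from a single query instead of one query per value.

```python
import sqlite3

ALLOWED_FUNCTIONS = {"SUM", "COUNT", "MAX", "MIN"}


def multi_aggregate(conn, table, aggregations, where_sql, params):
    """Run several aggregates over different columns in one query.

    aggregations is a list of (function, column) pairs. Identifiers are
    validated because they are interpolated into the SQL text.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    parts = []
    for func, column in aggregations:
        if func not in ALLOWED_FUNCTIONS or not column.isidentifier():
            raise ValueError(f"invalid aggregation: {func}({column})")
        parts.append(f"{func}({column})")
    sql = f"SELECT {', '.join(parts)} FROM {table} WHERE {where_sql}"
    return conn.execute(sql, params).fetchone()


# Hypothetical chunk table with made-up values, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunk (chunkId INTEGER PRIMARY KEY, taskId INTEGER, "
    "progress INTEGER, cracked INTEGER, dispatchTime INTEGER)"
)
conn.executemany(
    "INSERT INTO chunk VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 100, 2, 10), (2, 1, 200, 0, 20), (3, 1, 50, 5, 30), (4, 2, 999, 9, 40)],
)

# One round trip instead of four separate queries (or a loop over all rows).
result = multi_aggregate(
    conn,
    "chunk",
    [("SUM", "progress"), ("SUM", "cracked"), ("COUNT", "chunkId"), ("MAX", "dispatchTime")],
    "taskId = ?",
    (1,),
)
print(result)  # (350, 7, 3, 30)
```

The restriction mentioned above shows up directly in the helper's signature: there is a single `where_sql`, so all aggregates must share the same condition.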

zyronix and others added 3 commits April 22, 2024 10:48
Adding loops to scan through lines to support importing hashes longer then 1024 bytes
@s3inlc s3inlc requested a review from zyronix June 20, 2024 12:08
@s3inlc s3inlc changed the base branch from master to dev September 27, 2024 13:46
@s3inlc s3inlc requested review from jessevz and removed request for zyronix September 27, 2024 13:47
@jessevz jessevz self-assigned this Oct 14, 2024
@s3inlc s3inlc requested a review from jessevz November 20, 2024 10:41
@jessevz jessevz merged commit fe16f39 into dev Nov 21, 2024
1 check passed