[META] Indexing and agent management UI #410

msm-cert · 2024-09-29T19:34:18Z

This is a very ambitious project, planned as the main feature of the v1.5 release (and the last mquery feature).

The idea is to make it possible to configure and manage every (well, almost every) aspect of mquery from the web UI, especially indexing, which has been the most frequently requested feature since mquery was created.

To do this, we'll have to heavily redesign... a lot of things in the UI. In return, I hope, we'll get a more user-friendly interface overall.

Another thing we have to do is add an mquery "manager" agent to ursadb. Unfortunately, this is unavoidable, for the reasons described below. This comment is a meta-comment for tracking the index management UI; I'll add more detailed tasks later.

msm-cert · 2024-09-29T19:46:57Z

I'll just use screenshots from draw.io, sorry about that.

First of all, we need a new kind of agent, because regular agents are not guaranteed to run on the same machine as ursadb. Running mquery workers on a different machine than ursadb has been supported from the beginning (and, AFAIK, that's what most production deployments do). So the architecture will look like this (grey rectangles represent separate machines):

To achieve this we need to:

Create a new type of agent, called "ursadb manager". Implementation-wise it may be a regular rq daemon, but it must listen to a separate queue (with index management tasks)
Build a managed-ursadb Docker image in mquery (an ursadb Docker image that starts both ursadb and the ursadb manager agent)
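A minimal sketch of the point above, assuming the manager is started with rq's standard `rq worker --url <redis-url> <queue>` command line. The `<group>:manager` queue naming and both helper functions are hypothetical; the only difference from a regular agent is which queue the worker listens to.

```python
from typing import List


def manager_queue_name(agent_group: str) -> str:
    """Hypothetical name of the index-management queue for an agent group."""
    return f"{agent_group}:manager"


def manager_worker_argv(agent_group: str, redis_url: str) -> List[str]:
    """Command line for a plain rq worker that only serves the manager queue.

    Regular agents keep consuming their usual queue; the manager is an
    ordinary rq worker pointed at a different one.
    """
    return ["rq", "worker", "--url", redis_url, manager_queue_name(agent_group)]
```

In a managed-ursadb image, a command like this would run next to ursadb itself.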

Technically we don't need to support "unmanaged" ursadb instances after this release, but it shouldn't be hard, so I don't see why not.

msm-cert · 2024-09-29T20:02:30Z

So now the interface. I've sketched some rough ideas in draw.io. I'm not overly attached to them, but they contain most of the things I think are important, divided into subpages in a logical way, while also making sure mquery stays flexible (see above for various weird configurations we support).

The main status page contains the configuration view from the current mquery, but also extends it with a manageable list of agent groups (instead of the flat list of agent groups we had previously).

While designing this, I've realised that we also need to implement queues (otherwise automatic reindexing, which would send many small index requests, may create an extremely inefficient database made of lots of tiny datasets). This is also necessary to support S3 indexing.

So yeah, indexing queue feature:

A list of tuples (timestamp, agent group name, file path) in the database
A way to add to this list by making an HTTP POST to mquery web
An "index queue" command for the agent, which will index all files from a given queue (this should be easy to implement: just send an ursadb command via the socket)

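The queue itself could be as simple as one table. Below is a sketch using stdlib `sqlite3` in place of mquery's real database; the `index_queue` table, the helper names and the string quoting are assumptions. It also assumes ursadb accepts several quoted paths in a single `index "..." "...";` command, and it leaves out actually sending the command over ursadb's ZeroMQ socket.

```python
import sqlite3
import time
from typing import Iterable, List, Optional


def open_queue(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) the hypothetical index_queue table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " timestamp REAL NOT NULL,"
        " agent_group TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, agent_group: str,
            paths: Iterable[str], now: Optional[float] = None) -> None:
    """Add (timestamp, agent group, path) tuples; an HTTP POST handler would call this."""
    ts = time.time() if now is None else now
    conn.executemany(
        "INSERT INTO index_queue (timestamp, agent_group, path) VALUES (?, ?, ?)",
        [(ts, agent_group, path) for path in paths],
    )
    conn.commit()


def take_batch(conn: sqlite3.Connection, agent_group: str, limit: int) -> List[str]:
    """Remove and return up to `limit` of the oldest queued paths for a group."""
    rows = conn.execute(
        "SELECT id, path FROM index_queue WHERE agent_group = ?"
        " ORDER BY timestamp, id LIMIT ?",
        (agent_group, limit),
    ).fetchall()
    conn.executemany(
        "DELETE FROM index_queue WHERE id = ?", [(row_id,) for row_id, _ in rows]
    )
    conn.commit()
    return [path for _, path in rows]


def quote(path: str) -> str:
    # Assumption: ursadb strings are double-quoted with backslash escapes.
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def index_command(paths: List[str]) -> str:
    """Build one ursadb `index` command for a whole batch (assumed syntax)."""
    if not paths:
        raise ValueError("nothing to index")
    return "index " + " ".join(quote(p) for p in paths) + ";"
```

Taking a whole batch at once means ursadb gets one index request per batch instead of one per file, which is the point of having the queue.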
msm-cert · 2024-09-29T20:15:25Z

The agent page. It exposes most of the things from the old status page, plus:

A list of configured storage locations (for the storage management feature, of course). This includes adding them (adding probably needs a modal UI, to get the name of the new storage location?)
I think it would be neat to show a list of connected agents (even though it's not super important)

The "unmanaged agent group" page should be the same, since "managed agent groups" are just agent groups with one (and exactly one) manager active. But it's entirely possible that a group has some sources configured while its management agent is temporarily down.

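The rule for managed groups can be stated as a small status function. This is only a sketch: `AgentGroup`, `GroupState` and the extra `INVALID` state for groups with more than one active manager are hypothetical, not existing mquery types.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class GroupState(Enum):
    MANAGED = "managed"            # exactly one manager is active
    MANAGER_DOWN = "manager down"  # sources configured, but no active manager
    UNMANAGED = "unmanaged"        # no manager and nothing for one to manage
    INVALID = "invalid"            # hypothetical: more than one active manager


@dataclass
class AgentGroup:
    name: str
    active_managers: int = 0
    sources: List[str] = field(default_factory=list)


def group_state(group: AgentGroup) -> GroupState:
    """Classify a group; the same page can render every state."""
    if group.active_managers == 1:
        return GroupState.MANAGED
    if group.active_managers > 1:
        return GroupState.INVALID
    if group.sources:
        return GroupState.MANAGER_DOWN
    return GroupState.UNMANAGED
```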
msm-cert · 2024-09-29T20:23:08Z

Finally, the storage page. I prepared two sketches while thinking about this. First, for S3 storage it would look like this:

Here, indexing files with the S3Plugin goes, very roughly, like this:

"Hey ursadb manager, index files /s3/alpha and /s3/beta"
The ursadb manager executes all filter plugins in order. S3Plugin takes the path /s3/alpha and downloads the file "alpha" from a configured bucket into /s3/alpha, then does the same for /s3/beta. The /s3/ base directory is just an example; it could just as well be /mnt/samples.
Finally, the ursadb manager sends the index command to ursadb, asking it to index the downloaded files
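The plugin chain described above might look roughly like this. `FilterPlugin`, `S3Plugin` and `prepare_files` are hypothetical names; the download function is injected so the sketch runs without S3 credentials.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Protocol

# (bucket, key, destination path)
Downloader = Callable[[str, str, str], None]


class FilterPlugin(Protocol):
    def filter(self, paths: List[str]) -> List[str]:
        """Prepare files for indexing and return the paths to pass on."""
        ...


class S3Plugin:
    """Hypothetical plugin: fetch <base_dir>/<key> from a bucket before indexing."""

    def __init__(self, bucket: str, base_dir: str, download: Downloader) -> None:
        self.bucket = bucket
        self.base_dir = PurePosixPath(base_dir)
        self.download = download

    def filter(self, paths: List[str]) -> List[str]:
        for path in paths:
            local = PurePosixPath(path)
            # Paths outside base_dir are not ours; leave them for other plugins.
            if local.is_relative_to(self.base_dir):
                key = local.relative_to(self.base_dir).as_posix()
                self.download(self.bucket, key, path)
        return paths


def prepare_files(paths: List[str], plugins: List[FilterPlugin]) -> List[str]:
    """Run every filter plugin in order; the result is what ursadb should index."""
    for plugin in plugins:
        paths = plugin.filter(paths)
    return paths
```

With boto3, `boto3.client("s3").download_file` takes `(Bucket, Key, Filename)` in that order, so it could be passed as `download` directly; the destination directory (here /s3/) has to exist beforehand.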

Sadly, we also need a sub-UI to add indexing rules and plugins.

The process is divided into collecting and indexing:

scanning a file source adds files to the queue
alternatively, files can be added to the queue via a web endpoint
scanning runs periodically (as configured) or on user request
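Collecting from a local directory source could be sketched as a walk that skips files already queued. The function name and the `already_queued` set are assumptions; in practice the set would come from the queue table, and files added through the web endpoint would skip this step and go straight into the queue.

```python
import os
from typing import List, Set


def scan_directory_source(root: str, already_queued: Set[str]) -> List[str]:
    """Return files under `root` that are not queued yet, in a stable order."""
    new_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend deterministically
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if path not in already_queued:
                new_files.append(path)
    return new_files
```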

Indexing sends an ursadb index request, and happens in one of three cases:

the number of files in the queue is larger than the index threshold
the oldest file in the queue has waited for more than the max queue wait time
the user sends an indexing request manually
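The three indexing triggers translate into a small predicate that a periodic job (for example one scheduled with rq-scheduler, as suggested below) could evaluate for each queue. `QueueSettings` and its field names are hypothetical; both comparisons are strict, following "larger than" and "more than" above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueSettings:
    index_threshold: int   # index once more than this many files are queued
    max_queue_wait: float  # seconds the oldest queued file may wait


def should_index(
    queued: int,
    oldest_timestamp: Optional[float],
    now: float,
    settings: QueueSettings,
    manual: bool = False,
) -> bool:
    """Decide whether the manager should send an index request for a queue."""
    if queued == 0:
        return False  # nothing to index, even on a manual request
    if manual:
        return True
    if queued > settings.index_threshold:
        return True
    return (
        oldest_timestamp is not None
        and now - oldest_timestamp > settings.max_queue_wait
    )
```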

It's unclear how to implement the scheduling functionality, but I think the work itself can be handled by the manager agent, and the scheduling by rq-scheduler.

msm-cert · 2024-09-29T20:39:38Z

Finally, this is how this UI could look for a local directory:

In this case we have two indexing rules and two plugins, because we want to index small and medium files separately.

This will cause a small performance regression compared to the current index.py script (because the file tree needs to be traversed twice), but indexing is the slowest operation anyway.
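One hypothetical way to express the "small" and "medium" rules is as size ranges, with each rule doing its own directory walk. That per-rule walk is what makes the tree get traversed twice; a single walk that sorts files into all rules at once would avoid it, at the cost of coupling the rules together. The `SizeRule` type and any concrete byte limits are assumptions.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class SizeRule:
    name: str
    min_size: int  # inclusive, in bytes
    max_size: int  # exclusive, in bytes


def files_for_rule(root: str, rule: SizeRule) -> List[str]:
    """Walk the whole tree once for this rule and keep files in its size range."""
    matched = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if rule.min_size <= os.path.getsize(path) < rule.max_size:
                matched.append(path)
    return matched
```

For example, `files_for_rule("/mnt/samples", SizeRule("small", 0, 1 << 20))` would list the files under 1 MiB.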

msm-cert mentioned this issue in [Meta] Release v1.5 #398 (Open) on Sep 30, 2024
