[META] Indexing and agent management UI #410

Open
msm-cert opened this issue Sep 29, 2024 · 5 comments

@msm-cert
Member

msm-cert commented Sep 29, 2024

This is a very ambitious project, planned as a main feature of the v1.5 release (and the last mquery feature).

The idea is to make it possible to configure and manage every (well, almost every) aspect of mquery from the web UI. Especially indexing, which has been the most frequently requested feature since mquery was created.

To do this we'll have to heavily redesign... a lot of things in the UI. In return we'll get a more user-friendly interface overall, I hope.

Another thing we have to do is to add an mquery "manager" agent to ursadb. This is unfortunately unavoidable, for reasons described below. This comment is a meta-comment for tracking the index management UI. I'll add more detailed tasks later.

@msm-cert
Member Author

I'll just use screenshots from draw.io, sorry about that.

First of all, we need a new kind of agent, because regular agents are not guaranteed to be on the same machine as ursadb. Running mquery workers on a different machine than ursadb has been supported from the beginning (and, AFAIK, is used in production in most cases). So the architecture will look like this (grey rectangles represent separate machines):

image

To achieve this we need to:

  • Create a new type of agent, called "ursadb manager". Implementation-wise it may be a regular rq daemon, but it must listen to another queue (with index management tasks)
  • Build a managed-ursadb docker image in mquery (a ursadb docker image that will start ursadb and ursadb manager agent)

Technically we don't need to support "unmanaged" ursadb instances after this release, but it shouldn't be hard so I don't see why not.

@msm-cert
Member Author

msm-cert commented Sep 29, 2024

So now the interface. I've sketched some rough ideas in draw.io. I'm not overly attached to them, but they contain most of the things I think are important, divided into subpages in a logical way, while also making sure mquery stays flexible (see above for various weird configurations we support).

image

This main status page contains the configuration view of the current mquery, but also extends it with a manageable list of agent groups (instead of a flat list of agent groups, like previously).

While designing this I've realised that we also need to implement queues (otherwise automatic reindexing may create an extremely inefficient database). This is also necessary to support s3 indexing.

So yeah, indexing queue feature:

  • A list of tuples (timestamp, agent group name, file path) in the database
  • A way to add to this list by making an HTTP POST to mquery web
  • "Index queue" command to the agent, that will index all files from a given queue (this should be easy to implement - just send a ursadb command via socket)
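
A rough sketch of the queue described above, to make the data model concrete. Everything here is an assumption for illustration: the table and function names are hypothetical, `sqlite3` only stands in for whatever database mquery ends up using, and the `index "path" ...;` command format is an assumed ursadb syntax that should be checked against the ursadb docs.

```python
import sqlite3
import time

# Hypothetical schema: one row per file waiting to be indexed,
# i.e. the (timestamp, agent group name, file path) tuple.
SCHEMA = """
CREATE TABLE IF NOT EXISTS index_queue (
    queued_at REAL NOT NULL,
    agent_group TEXT NOT NULL,
    file_path TEXT NOT NULL
)
"""


def open_queue(path=":memory:"):
    """Open (and create if needed) the queue table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, agent_group, file_path, now=None):
    """Add one file to the queue (what an HTTP POST handler could call)."""
    queued_at = time.time() if now is None else now
    conn.execute(
        "INSERT INTO index_queue VALUES (?, ?, ?)",
        (queued_at, agent_group, file_path),
    )
    conn.commit()


def drain(conn, agent_group):
    """Remove and return all queued paths for one agent group, oldest first."""
    rows = conn.execute(
        "SELECT rowid, file_path FROM index_queue "
        "WHERE agent_group = ? ORDER BY queued_at, rowid",
        (agent_group,),
    ).fetchall()
    conn.executemany(
        "DELETE FROM index_queue WHERE rowid = ?", [(row[0],) for row in rows]
    )
    conn.commit()
    return [row[1] for row in rows]


def build_index_command(paths):
    """Build one index command for all paths (assumed ursadb syntax)."""
    for path in paths:
        if '"' in path:
            # Quoting rules are not specified here, so refuse instead of guessing.
            raise ValueError(f"unsupported character in path: {path!r}")
    return "index " + " ".join(f'"{path}"' for path in paths) + ";"
```

The "index queue" command then boils down to `build_index_command(drain(conn, group))`, with the resulting string sent to ursadb over its socket.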

@msm-cert
Copy link
Member Author

msm-cert commented Sep 29, 2024

The agent page. Exposes most of the things from the old status page, and:

  • A list of configured storage locations (of course, for the storage management feature). This includes adding them (adding probably needs a modal UI, to get the name of a new storage location?)
  • I think it would be neat to show a list of connected agents (even though it's not super important)

image

"Unmanaged agent group" page should be the same, since "managed agent groups" are just agent groups with one (and exactly one) manager active. But it's entirely possible that a group has some sources configured, but management agent is temporarily down.

@msm-cert
Copy link
Member Author

msm-cert commented Sep 29, 2024

Finally, the storage page. I've prepared two sketches when thinking about this. First, for s3 storage it would look like this:

image

Here, indexing files with the S3Plugin goes very roughly like this:

  1. "Hey ursadb manager, index files /s3/alpha and /s3/beta"
  2. Ursadb manager executes all filter plugins in order. S3Plugin takes path /s3/alpha, downloads file "alpha" from a configured bucket into /s3/alpha, and /s3/beta similarly. The /s3/ base directory is just an example; it could just as well be /mnt/samples.
  3. Ursadb manager finally sends the index command to ursadb, and asks to index downloaded files
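
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `filter` method, the `run_plugins` helper and the injected `download(key, local_path)` callable are hypothetical names, not the actual plugin API, and dropping paths outside the base directory is a choice made just for this sketch.

```python
class S3Plugin:
    """Hypothetical filter plugin: fetches a file from a bucket to a local path."""

    def __init__(self, base_dir, download):
        # `download(key, local_path)` is injected so the sketch has no S3
        # dependency; a real plugin would wrap an S3 client here.
        self.base_dir = base_dir.rstrip("/")
        self.download = download

    def filter(self, path):
        prefix = self.base_dir + "/"
        if not path.startswith(prefix):
            return None  # assumption: paths outside the base dir are dropped
        key = path[len(prefix):]  # "/s3/alpha" -> "alpha"
        self.download(key, path)
        return path


def run_plugins(plugins, paths):
    """Pass every path through all plugins in order; None drops the file."""
    ready = []
    for path in paths:
        current = path
        for plugin in plugins:
            current = plugin.filter(current)
            if current is None:
                break
        if current is not None:
            ready.append(current)
    return ready
```

The list returned by `run_plugins` is what the manager would then hand to ursadb in the final index command.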

Sadly we also need a sub-UI to add indexing rules and plugins.

The process is divided into collecting and indexing:

  • scanning a file source adds files to the queue
  • or adding files via a web endpoint adds files to the queue
  • scanning is executed periodically (as configured) or on user request

Indexing sends a ursadb index request, and happens in one of three cases:

  • number of files in the queue is larger than index threshold
  • the oldest file in the queue has been waiting longer than the max queue wait time
  • user sends indexing request manually
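
The three trigger conditions can be written down as a single predicate. A minimal sketch, with hypothetical parameter names; skipping the request when the queue is empty (even on a manual request) is an assumption of this sketch, not something decided above.

```python
def should_index(queue_size, oldest_queued_at, now,
                 index_threshold, max_queue_wait, manual_request=False):
    """Decide whether an index request should be sent to ursadb now.

    Timestamps and max_queue_wait are in seconds. oldest_queued_at is only
    read when the queue is non-empty.
    """
    if queue_size == 0:
        return False  # assumption: nothing to index, even on manual request
    if manual_request:
        return True  # user sends indexing request manually
    if queue_size > index_threshold:
        return True  # more files queued than the index threshold
    # the oldest file has waited longer than the max queue wait time
    return now - oldest_queued_at > max_queue_wait
```

A scheduler (rq-scheduler or anything else) would only need to call this periodically and enqueue the actual work for the manager agent when it returns `True`.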

It's unclear how to implement the scheduling functionality, but I think the work can be handled by the manager agent, and the scheduling by rq-scheduler.

@msm-cert
Member Author

Finally, this is how this UI could look for a local directory:

image

In this case we have two indexing rules and two plugins, because we want to index small and medium files separately.

This will cause a small performance regression in comparison to the current index.py script (because a file tree needs to be traversed twice), but the slowest operation is indexing anyway.
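
To illustrate where the double traversal comes from, here is a rough sketch that models each indexing rule as a size range. The rule names, the size limits and the `collect` / `collect_per_rule` helpers are all hypothetical; the real rules and plugins may be shaped quite differently.

```python
import os

# Hypothetical rules: (name, min_size, max_size) in bytes, max exclusive.
RULES = [
    ("small", 0, 1024 * 1024),
    ("medium", 1024 * 1024, 50 * 1024 * 1024),
]


def collect(root, min_size, max_size):
    """Walk the tree once and return files whose size is in [min_size, max_size)."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if min_size <= os.path.getsize(full_path) < max_size:
                matched.append(full_path)
    return sorted(matched)


def collect_per_rule(root, rules):
    """Run one full traversal per rule, so two rules walk the tree twice."""
    return {name: collect(root, low, high) for name, low, high in rules}
```

With `RULES` as above, `collect_per_rule("/mnt/samples", RULES)` would return one file list per rule, each of which could go into its own queue.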
