Implement CLI support for filtering out rows during output #1098

Merged: 3 commits, Feb 22, 2024


Member

@ikelos ikelos commented Feb 8, 2024

This pull request adds support for filtering output rows by column value, using the following format:

--filters [+-][columnname,]pattern[!] --filters [+-][column2,]pattern2[!]

This allows for multiple filters. Each filter can specify an optional column name, which is matched case-insensitively; if no column name is supplied, all columns are checked for a match, and if the column cannot be found it's treated as if no column name was given. If the value is prefixed with a -, matching rows are excluded; otherwise the pattern must be found for a row to be included. The pattern is used as a substring unless it ends in !, in which case it's treated as a regular expression.
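To make the format concrete, here is a minimal Python sketch of how one such filter string could be parsed and applied to a row. It is an illustration only, not the code from this PR: RowFilter and parse_filter are hypothetical names, rows are modelled as plain dicts of strings, pattern matching is assumed to be case sensitive, and an unknown column is assumed to fall back to searching every column.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RowFilter:
    """Hypothetical container for one parsed --filters value."""

    exclude: bool           # True when the value was prefixed with "-"
    column: Optional[str]   # None means "check every column"
    pattern: str
    is_regex: bool          # True when the value ended in "!"

    def matches(self, row: Dict[str, str]) -> bool:
        """Return True if the pattern is found in the selected column(s)."""
        lookup = {name.lower(): value for name, value in row.items()}
        if self.column is not None and self.column.lower() in lookup:
            values = [lookup[self.column.lower()]]
        else:
            # Assumption: a missing or unknown column means all columns.
            values = list(lookup.values())
        if self.is_regex:
            return any(re.search(self.pattern, value) for value in values)
        return any(self.pattern in value for value in values)

    def keep(self, row: Dict[str, str]) -> bool:
        """Apply include/exclude semantics for this single filter."""
        found = self.matches(row)
        return not found if self.exclude else found


def parse_filter(spec: str) -> RowFilter:
    """Parse one value of the form [+-][columnname,]pattern[!]."""
    exclude = spec.startswith("-")
    if spec.startswith(("+", "-")):
        spec = spec[1:]
    is_regex = spec.endswith("!")
    if is_regex:
        spec = spec[:-1]
    column, sep, pattern = spec.partition(",")
    if not sep:
        # No comma: the whole value is the pattern and no column is named.
        return RowFilter(exclude, None, spec, is_regex)
    return RowFilter(exclude, column, pattern, is_regex)
```

For example, parse_filter("+ImageFileName,services.exe") yields an include filter on the ImageFileName column, while parse_filter("-imagefilename,^svc.*!") yields an exclude filter that uses a regular expression.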

Some things we could improve:

  • test cases
  • currently it should return any row that matches one of the filters; we might want to allow conjunction of filters as well as disjunction (AND rather than OR)
  • improve warning for invalid/typos in column names
  • Better support for typed output of column data (currently tries to convert using a format string; if that fails, the column is ignored). This will require rejigging/unifying how columns are rendered to text and will likely result in more of a performance impact than filtering currently has.
  • Implement filtering for more than just quick and pretty output renderers (such as CSV, json)
  • Terminology, so that it's clear whether a filter is things to throw away or things to keep.
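The difference between disjunction and conjunction mentioned in the list above can be sketched in a few lines of Python. This is a hedged illustration rather than the PR's implementation: substring and select are hypothetical helpers, and rows are modelled as dicts of strings.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def substring(column: str, pattern: str) -> Predicate:
    """Build a predicate that looks for pattern inside one column."""
    return lambda row: pattern in row.get(column, "")


def select(rows: Iterable[Row], predicates: List[Predicate],
           require_all: bool = False) -> List[Row]:
    """Keep rows matching any predicate (OR) or every predicate (AND)."""
    combine = all if require_all else any
    return [row for row in rows if combine(p(row) for p in predicates)]


rows = [
    {"Name": "svchost.exe", "PID": "2964"},
    {"Name": "svchost.exe", "PID": "800"},
    {"Name": "lsass.exe", "PID": "2964"},
]
predicates = [substring("Name", "svchost"), substring("PID", "2964")]

either = select(rows, predicates)                   # OR: all three rows match
both = select(rows, predicates, require_all=True)   # AND: only the first row
```

One design point to keep in mind: with an empty filter list, any() selects nothing while all() selects everything, so a real implementation would need to decide what "no filters" means.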

@ikelos ikelos marked this pull request as draft February 8, 2024 23:59
@ikelos ikelos marked this pull request as ready for review February 18, 2024 22:03

adiego8 commented Feb 19, 2024

I was testing this branch, trying to filter windows.pslist output by name, e.g. windows.pslist --filters +ImageFileName,service.exe. Is that the correct syntax? I tried many other filters, but the result is always nothing or some error, so I was not able to make it work. Could you please provide an example of a filter for some plugin? Or let me know if I am missing something. Thanks!

Member Author

ikelos commented Feb 19, 2024

Hiya, just to check: --filters needs to go before windows.pslist because it's a UI parameter, not a plugin parameter. I've just tested it on an image I've got, and vol.py --filters +ImageFileName,services.exe -f image.img windows.pslist returns a single entry. In your line you ask it to filter on service.exe; did you mean services.exe (plural)? If there is no row with service.exe you'll get no results, as you're finding. You could also try --filters +ImageFileName,service, since as a substring match, service.exe or services.exe would match...
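The ordering requirement is typical of command lines built from a top-level parser plus per-plugin subcommands. The sketch below uses Python's standard argparse to show the general behaviour; it is an assumption-laden illustration, not Volatility's actual CLI code, and build_parser is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: UI options on top, plugin options per subcommand."""
    parser = argparse.ArgumentParser(prog="vol.py")
    # UI-level options belong to the top-level parser.
    parser.add_argument("--filters", action="append", default=[])
    parser.add_argument("-f", "--file")
    # Each plugin gets its own sub-parser with its own options.
    subparsers = parser.add_subparsers(dest="plugin")
    pslist = subparsers.add_parser("windows.pslist")
    pslist.add_argument("--pid", type=int, nargs="*")
    return parser


# UI option placed before the plugin name: parsed by the top-level parser.
args = build_parser().parse_args(
    ["--filters", "+ImageFileName,services.exe", "-f", "image.img", "windows.pslist"]
)

# UI option placed after the plugin name: the sub-parser does not know it,
# so argparse reports an error and exits.
try:
    build_parser().parse_args(["windows.pslist", "--filters", "+ImageFileName,services.exe"])
    misplaced_ok = True
except SystemExit:
    misplaced_ok = False
```

In this sketch the first invocation populates args.filters and args.plugin, while the second one is rejected because --filters is only defined on the top-level parser.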


adiego8 commented Feb 20, 2024

Got it. I tested it and it works; it does resolve #1077.
I meant service.exe as a placeholder for "some service", but as you mentioned it does work with services.exe.
My impression is that the user experience of filtering is different compared to vol2, but, if correctly documented, this could be useful. Thanks!

Member Author

ikelos commented Feb 20, 2024

Yep, we're not out to duplicate how we did things in vol 2, we're out to produce a properly thought out and designed system that allows people to do what they need to do. Having worked on vol 2, I think I'm allowed to say that a number of the design decisions were driven by users and speed of solution, rather than proper planning or consideration of how best to achieve the goals required. ;) I'm glad this is useful and resolves #1077. I'm kind of happier with this PR than with #1081, so this will probably land sooner, hopefully in a few days to give people an opportunity to test it... :)

@ikelos ikelos merged commit 887747e into develop Feb 22, 2024
26 checks passed
@ikelos ikelos deleted the feature/output-filter branch February 22, 2024 11:00
Contributor

atcuno commented Mar 12, 2024

Hey @ikelos,

From testing the new filtering mechanism, I believe that all plugins that support --dump will also need to support in-plugin filtering of data and not just the output filtering of rows. In particular, because the current filtering implementation lives in the UI (--filters), the extraction plugins produce a significant amount of data that is not wanted or requested by the investigator.

For example, the following shows --filters being used on the name column for svchost inside of one PID.

$ rm *dmp
$ python3 vol.py --filters +name,svchost  -q -f sample.lime windows.dlllist --dump --pid 2964
Volatility 3 Framework 2.7.0
PID      Process            Base    Size      Name  Path     LoadTime        File output
2964    svchost.exe     0x7ff7fbb40000          0x10000          svchost.exe     C:\Windows\system32\svchost.exe  2024-03-04 20:36:24.000000             pid.2964.svchost.exe.0x21030e03c80.0x7ff7fbb40000.dmp
$ ls *dmp | wc -l
49

Before running the plugin, I cleared all previous files ending in .dmp. I then ran the plugin, and as can be seen, only one line of output is shown on the command line, but 49 files were produced since the filtering is just on the text output and not the operations of the plugin. Expanded to all processes, this would produce thousands of files that were not requested, wasting significant processing time and disk space, and it would also be very confusing for users.

A similar, but worse, situation occurs with the dumpfiles plugin. The following invocation shows just wanting to extract mapped ntdll.dll files.

$ python3 vol.py --filters +name,ntdll.dll  -q -f sample.lime windows.dumpfiles
Volatility 3 Framework 2.7.0
Cache  FileObject        FileName         Result
ImageSectionObject   0xde08009cb0a0        ntdll.dll            file.0xde08009cb0a0.0xde08009fcd20.ImageSectionObject.ntdll.dll.img
ImageSectionObject   0xde08009d6a30        ntdll.dll            file.0xde08009d6a30.0xde08009fd010.ImageSectionObject.ntdll.dll.img^CTraceback (most recent call last):
$ ls file* | wc -l
135

I let the plugin run for a bit and then pressed Ctrl+C to end it early. As shown, two copies of ntdll were found and shown on the command line, but 135 files in total had already been extracted, and likely over a thousand would have been if I had let the plugin finish.

Given that we have seen Windows samples from busy systems with 300-500+ processes and 1000s of cached files, having the plugins do all the extra work of processing all the data, even when the user requests only a subset, is going to break workflows for many of our users as the disk space usage and performance will be unacceptable. Particularly with one of the main benefits of Volatility 3 being the automated profile detection, many users are already taking advantage of it to mass scan dozens or hundreds of memory samples across their environment for triage.

Given what you said above about "If there are specific cases where the plugin could significantly change its behaviour (reducing the time required to scan, or processing the data to carry out the filtering) then we could still add those as parameters as well as the generic UI filtering mechanism.", I think any plugin that supports --dump will need to have options for filtering inside the plugin to ensure only the requested data is worked on and extracted. The question would then be whether these plugins would look for the --filters options to dictate what work is done or look for other options, such as --name on its own, to decide what data to process.
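The difference between filtering rendered rows and filtering inside the plugin can be reduced to a small Python sketch. Everything here is hypothetical and simplified: generate_rows stands in for a plugin's row generator, appending to a list stands in for writing a dump file, and the wanted callback stands in for whatever in-plugin option would be chosen.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, str]


def generate_rows(modules: List[str], dumped: List[str],
                  wanted: Optional[Callable[[str], bool]] = None) -> Iterator[Row]:
    """Hypothetical plugin generator: 'dumps' each module, then yields a row."""
    for name in modules:
        if wanted is not None and not wanted(name):
            continue  # in-plugin filter: skip before any work is done
        dumped.append(name + ".dmp")  # stands in for writing a file to disk
        yield {"Name": name, "File output": name + ".dmp"}


modules = ["svchost.exe", "ntdll.dll", "kernel32.dll"]

# Output-only filtering: every module is dumped, one row is displayed.
dumped_ui: List[str] = []
shown_ui = [row for row in generate_rows(modules, dumped_ui)
            if "svchost" in row["Name"]]

# In-plugin filtering: only the requested module is dumped and displayed.
dumped_plugin: List[str] = []
shown_plugin = list(generate_rows(modules, dumped_plugin,
                                  wanted=lambda name: "svchost" in name))
```

Both approaches display the same single row, but the first leaves three dump files behind while the second leaves only one, which mirrors the 1 row versus 49 files result reported above.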

@ikelos ikelos restored the feature/output-filter branch May 15, 2024 20:09
@ikelos ikelos deleted the feature/output-filter branch May 15, 2024 20:10
3 participants