Implement CLI support for filtering out rows during output #1098

ikelos · 2024-02-08T23:57:10Z

This pull request adds support for filtering output rows by column value, using the following format:

--filters [+-][columnname,]pattern[!] --filters [+-][column2,]pattern2[!]

This allows for multiple filters. Each filter can specify an optional column name, which is case insensitive. If no column name is supplied, all columns are checked for a match; if the named column cannot be found, the filter is treated as having no column name. If the filter is prefixed with a -, matching rows are excluded; otherwise the pattern must be found for a row to be included. The pattern is matched as a substring unless it ends in !, in which case it is treated as a regular expression.
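To make the format concrete, here is a minimal Python sketch of how one such filter string could be parsed and matched against a row. It is not the code from this PR: `ParsedFilter`, `parse_filter` and `matches` are hypothetical names, and it assumes that when the column name is unknown the whole text (including the comma) is kept as the pattern.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class ParsedFilter:
    exclude: bool           # True when the filter started with "-"
    column: Optional[str]   # None means "check every column"
    pattern: str
    is_regex: bool          # True when the pattern ended with "!"


def parse_filter(text: str, columns: Sequence[str]) -> ParsedFilter:
    """Parse one [+-][columnname,]pattern[!] string (hypothetical helper)."""
    exclude = text.startswith("-")
    if text.startswith(("+", "-")):
        text = text[1:]
    is_regex = text.endswith("!")
    if is_regex:
        text = text[:-1]
    column = None
    name, sep, rest = text.partition(",")
    if sep:
        # Column names are compared case-insensitively.
        lookup = {c.lower(): c for c in columns}
        if name.lower() in lookup:
            column, text = lookup[name.lower()], rest
        # Assumption: an unknown column leaves the full text as the pattern.
    return ParsedFilter(exclude, column, text, is_regex)


def matches(flt: ParsedFilter, row: Mapping[str, object]) -> bool:
    """Report whether the pattern is found; the caller applies include/exclude."""
    values = [row[flt.column]] if flt.column else list(row.values())
    if flt.is_regex:
        return any(re.search(flt.pattern, str(v)) for v in values)
    return any(flt.pattern in str(v) for v in values)
```

With columns `["PID", "ImageFileName"]`, parsing `+imagefilename,services.exe` resolves the column to `ImageFileName` and matches a row whose image name is `services.exe`.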

Some things we could improve:

Test cases
Currently this returns any row that matches one of the filters; we might want to allow conjunction of filters as well as disjunction (and rather than or)
Improve the warning for invalid or mistyped column names
Better support for typed output of column data (currently it tries to convert using a format string, and if that fails the column is ignored). This will require rejigging/unifying how columns are rendered to text and will likely result in more of a performance impact than filtering currently has.
Implement filtering for more than just the quick and pretty output renderers (such as CSV, JSON)
Terminology, so that it's clear whether a filter describes things to throw away or things to keep.
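The disjunction/conjunction point in the list above can be sketched as a single switch between `any` and `all`. This is only an illustration under assumed names (`keep_row`, `require_all`), not the PR's implementation; the "no filters keeps every row" branch mirrors the intent of the "Ensure no filters still returns results" commit.

```python
from typing import Callable, Iterable, Mapping

Row = Mapping[str, str]
Predicate = Callable[[Row], bool]


def keep_row(row: Row, predicates: Iterable[Predicate], require_all: bool = False) -> bool:
    """Decide whether a row is kept (hypothetical helper).

    require_all=False: keep the row if ANY filter matches (disjunction).
    require_all=True:  keep the row only if ALL filters match (conjunction).
    """
    predicates = list(predicates)
    if not predicates:
        return True  # with no filters, every row is kept
    combine = all if require_all else any
    return combine(p(row) for p in predicates)


row = {"PID": "4", "Name": "svchost.exe"}
filters = [lambda r: "svchost" in r["Name"], lambda r: r["PID"] == "99"]
kept_with_or = keep_row(row, filters)
kept_with_and = keep_row(row, filters, require_all=True)
```

Here the row is kept under "or" because the name filter matches, but dropped under "and" because the PID filter does not.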

adiego8 · 2024-02-19T17:25:32Z

I was testing this branch, trying to filter windows.pslist output by name, e.g. windows.pslist --filters +ImageFileName,service.exe. Is that the correct syntax? I tried many other filters, but the result is always nothing or some error, so I was not able to make it work. Could you please provide an example of a filter for some plugin? Or let me know if I am missing something. Thanks!

ikelos · 2024-02-19T23:37:37Z

Hiya, just to check: --filters needs to go before windows.pslist because it's a UI parameter, not a plugin parameter. I've just tested it on an image I've got, and vol.py --filters +ImageFileName,services.exe -f image.img windows.pslist returns a single entry. In your line you ask it to filter on service.exe; did you mean services.exe (plural)? If there is no row containing service.exe you'll get no results, as you're finding. You could also try --filters +ImageFileName,service, since with a substring match both service.exe and services.exe would match...
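The substring behaviour described here can be checked in plain Python, independent of Volatility. The process names below are made-up sample data, and the anchored regular expression only illustrates the kind of pattern the trailing ! form would treat as a regex.

```python
import re

# Made-up sample values for an ImageFileName column.
names = ["services.exe", "svchost.exe", "service.exe"]

# Substring match: "service" is found inside both service.exe and services.exe.
substring_hits = [n for n in names if "service" in n]

# Substring match on the full name: "service.exe" is not a substring of "services.exe".
full_name_hits = [n for n in names if "service.exe" in n]

# Anchored regular expression: matches only the exact name.
regex_hits = [n for n in names if re.search(r"^service\.exe$", n)]
```

On an image that only contains services.exe, the second form returns nothing, which is the empty result reported above.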

adiego8 · 2024-02-20T15:36:32Z

Got it, I tested it and it works; it does resolve #1077.
I meant service.exe as a placeholder for "some service", but as you mentioned it does work with services.exe.
My impression is that the user experience of filtering is different compared to vol2, but if it is correctly documented, this could be useful. Thanks!

ikelos · 2024-02-20T20:55:53Z

Yep, we're not out to duplicate how we did things in vol 2, we're out to produce a properly thought out and designed system that allows people to do what they need to do. Having worked on vol 2, I think I'm allowed to say that a number of the design decisions were driven by users and speed of solution, rather than proper planning or consideration of how best to achieve the goals required. ;) I'm glad this is useful and resolves #1077. I'm kind of happier with this PR than with #1081, so this will probably land sooner, hopefully in a few days to give people an opportunity to test it... :)

atcuno · 2024-03-12T18:46:11Z

Hey @ikelos,

From testing the new filtering mechanism, I believe that all plugins that support --dump will also need to support in-plugin filtering of data, not just output filtering of rows. In particular, because the current filtering implementation lives in the UI (--filters), the extraction plugins produce a significant amount of data that the investigator neither wanted nor requested.

For example, the following shows --filters being used on the name column for svchost inside of one PID.

$ rm *dmp
$ python3 vol.py --filters +name,svchost  -q -f sample.lime windows.dlllist --dump --pid 2964
Volatility 3 Framework 2.7.0
PID      Process            Base    Size      Name  Path     LoadTime        File output
2964    svchost.exe     0x7ff7fbb40000          0x10000          svchost.exe     C:\Windows\system32\svchost.exe  2024-03-04 20:36:24.000000             pid.2964.svchost.exe.0x21030e03c80.0x7ff7fbb40000.dmp
$ ls *dmp | wc -l
49

Before running the plugin, I cleared all previous files ending in .dmp. I then ran the plugin, and as can be seen, only one line of output is shown on the command line, but 49 files were produced since the filtering is just on the text output and not the operations of the plugin. Expanded to all processes, this would produce thousands of files that were not requested, wasting significant processing time and disk space, and it would also be very confusing for users.
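The mismatch between one displayed row and 49 files can be reproduced in a toy model. The sketch below is not Volatility code: `dll_rows` and `render` are hypothetical stand-ins for a plugin generator and an output renderer, and appending to a list stands in for writing a .dmp file.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, str]  # (pid, module name, dump file name)


def dll_rows(modules: Iterable[Tuple[int, str]], dumped: List[str]) -> Iterator[Row]:
    """Hypothetical plugin generator: dumps every module, then yields its row."""
    for pid, name in modules:
        file_name = f"pid.{pid}.{name}.dmp"
        dumped.append(file_name)  # stands in for writing the file to disk
        yield (pid, name, file_name)


def render(rows: Iterable[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Hypothetical renderer: filtering happens only when rows are displayed."""
    return [row for row in rows if keep(row)]


modules = [(2964, "svchost.exe"), (2964, "ntdll.dll"), (2964, "kernel32.dll")]
dumped: List[str] = []
shown = render(dll_rows(modules, dumped), lambda row: "svchost" in row[1])
```

After this runs, `shown` holds one row while `dumped` holds three file names, because the renderer sees each row only after the generator has already done the dump work.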

A similar, but worse, situation occurs with the dumpfiles plugin. The following invocation attempts to extract only the mapped ntdll.dll files.

$ python3 vol.py --filters +name,ntdll.dll  -q -f sample.lime windows.dumpfiles
Volatility 3 Framework 2.7.0
Cache  FileObject        FileName         Result
ImageSectionObject   0xde08009cb0a0        ntdll.dll            file.0xde08009cb0a0.0xde08009fcd20.ImageSectionObject.ntdll.dll.img
ImageSectionObject   0xde08009d6a30        ntdll.dll            file.0xde08009d6a30.0xde08009fd010.ImageSectionObject.ntdll.dll.img^CTraceback (most recent call last):
$ ls file* | wc -l
135

I let the plugin run for a bit and then pressed ctrl+c to end it early. As shown, two copies of ntdll were found and shown on the command line, but 135 files in total had already been extracted, and likely over a thousand would have been if I had let the plugin finish.

Given that we have seen Windows samples from busy systems with 300-500+ processes and 1000s of cached files, having the plugins do all the extra work of processing all the data, even when the user requests only a subset, is going to break workflows for many of our users, as the disk space usage and performance will be unacceptable. This matters particularly because one of the main benefits of Volatility 3 is automated profile detection, and many users already take advantage of it to mass-scan dozens or hundreds of memory samples across their environment for triage.

Given what you said above ("If there are specific cases where the plugin could significantly change its behaviour (reducing the time required to scan, or processing the data to carry out the filtering) then we could still add those as parameters as well as the generic UI filtering mechanism."), I think any plugin that supports --dump will need to have options for filtering inside the plugin, to ensure only the requested data is worked on and extracted. The question would then be whether these plugins should look at the --filters options to dictate what work is done, or look for other options, such as a standalone --name, to decide what data to process.
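One way to picture the in-plugin alternative is to apply the predicate before any extraction work happens. This is a sketch only: `dll_rows_filtered` and its `name_contains` parameter are hypothetical and do not correspond to an existing Volatility option, and appending to a list again stands in for writing a file.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str, str]  # (pid, module name, dump file name)


def dll_rows_filtered(
    modules: Iterable[Tuple[int, str]],
    dumped: List[str],
    name_contains: Optional[str] = None,
) -> Iterator[Row]:
    """Hypothetical plugin generator that filters BEFORE dumping."""
    for pid, name in modules:
        if name_contains is not None and name_contains not in name:
            continue  # skipped modules cost no dump time and no disk space
        file_name = f"pid.{pid}.{name}.dmp"
        dumped.append(file_name)  # stands in for writing the file to disk
        yield (pid, name, file_name)


modules = [(2964, "svchost.exe"), (2964, "ntdll.dll"), (2964, "kernel32.dll")]
dumped_files: List[str] = []
rows = list(dll_rows_filtered(modules, dumped_files, name_contains="svchost"))
```

In this version both the displayed rows and the extracted files are limited to the single requested module; whether the predicate comes from --filters or from a dedicated plugin option is the open design question.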

ikelos added 2 commits February 8, 2024 23:44

CLI: Add support for filtering lines from output

63a325c

CLI: Don't forget the core text filter class

d8d3e15

ikelos marked this pull request as draft February 8, 2024 23:59

CLI: Ensure no filters still returns results

865f004

ikelos marked this pull request as ready for review February 18, 2024 22:03

This was referenced Feb 18, 2024

Windows: Add --name option to pslist plugin #1077

Open

Windows: Add filtering by type to handles. #1072

Open

ikelos merged commit 887747e into develop Feb 22, 2024
26 checks passed

ikelos deleted the feature/output-filter branch February 22, 2024 11:00

ikelos restored the feature/output-filter branch May 15, 2024 20:09

ikelos deleted the feature/output-filter branch May 15, 2024 20:10
