Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new plugin: windows.notepad.Notepad #1082

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

its5Q
Copy link

@its5Q its5Q commented Jan 11, 2024

Adds a new plugin that is an alternative to volatility2's notepad plugin. Instead of parsing heap structures, it uses simple memory pattern matching to find potential places where the displayed text might reside. It can produce false positives, but I've tried to reduce them to a minimum by filtering strings shorter or equal to 8 characters, and using a table of unicode characters seen in CommonCrawl to remove strings that contain unicode characters that are technically valid, but didn't appear even once in the wild web.

Partially addresses issue #710.

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got to say, I'm pretty uncomfortable about this plugin? I wouldn't ever add it to framework/plugins. It feels as though it's entirely heuristic based without any formal forensic analysis or backing. It might work occasionally but I'm concerned about adding it to what is otherwise a forensic framework that may be relied upon for evidence. It also feels as though this could be applied to programs other than notepad and still return partial results? Is this notepad specific or just a general text search through program VADs?

At the moment this is still in limbo and I'm not sure whether it'll ever get added, I need to think over it and discuss it with the others a bit I'm afraid...

proc_id = "Unknown"
try:
proc_id = proc.UniqueProcessId
proc_layer_name = proc.add_process_layer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer proc_layer_name being given a bogus default initalization, much like proc_id above, although proc_id above should initialize to a number or None. To support the unknown name, change the debug message to be proc_id or 'Unknown' although note that if proc_id is then 0 this will result in the layer being referred to as Unknown.

if private and protection == 'PAGE_READWRITE' and vad_size < 16777216 and tag in ('VadS', 'Vadm'):
try:
data = proc_layer.read(vad_start, vad_size, pad=True)
if not data:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're padding, so this will always return a binary string of vad_size, so this check should'nt ever fire unless vad_size is 0, which you could check before attempting the read, so this doesn't seem useful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, that makes sense. It was part of the code from the VadInfo plugin that read the VADs in chunks. I've removed the chunking and forgot to this check, so it wasn't intentional, and even if it would fire, it would break out of the loop over the VAD list, which I certaintly didn't intend to do.

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to add this first time round, but this much data doesn't belong in the program code, it should be in auxiliary file. Also pointing out the code where it came from, where the code is just another arbitrary program doesn't really act as evidence that this is a suitable or accurate means of identifying appropriate strings...


# I need this huge set for filtering out garbage strings, as I rely on a memory pattern to find the displayed text
# Taken from https://github.com/glmcdona/binary2strings/blob/02f0c983aeede7ed9acbee96a43f9142dc7ba404/src/binary2strings.hpp#L57
COMMONCRAWL_CHARS = {0x20, 0x65, 0x61, 0x69, 0x6f, 0x74, 0x6e, 0x72, 0x73, 0xa, 0x6c, 0x64, 0x63, 0x75, 0x68, 0x6d,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This large a chunk of data doesn't belong in the plugin source, it should be an additional data file.

@its5Q
Copy link
Author

its5Q commented Jan 16, 2024

I've got to say, I'm pretty uncomfortable about this plugin? I wouldn't ever add it to framework/plugins. It feels as though it's entirely heuristic based without any formal forensic analysis or backing. It might work occasionally but I'm concerned about adding it to what is otherwise a forensic framework that may be relied upon for evidence. It also feels as though this could be applied to programs other than notepad and still return partial results? Is this notepad specific or just a general text search through program VADs?

At the moment this is still in limbo and I'm not sure whether it'll ever get added, I need to think over it and discuss it with the others a bit I'm afraid...

Yeah, totally understandable, I agree with you. Being based exclusively on heuristics of where the searched text might be, this is not the best approach for a forensics framework that aims to produce reliable output. I plan on studying NT and LFH heap internals so maybe I could figure something out with that, but for now I think it would be best to make this PR a draft. I'm very much a beginner in writing code to work with memory dumps, and I'm eager to learn, so I hope I don't abandon this idea :D

@its5Q its5Q marked this pull request as draft January 16, 2024 05:52
@ikelos
Copy link
Member

ikelos commented Jan 16, 2024

That sounds like the best course of action, that way people can find and use the code if they want it, but it's clearly not part of the main codebase. Good luck with the heap support! I believe the volatility 2 notepad plugin only works up to a particular version of windows, but if you can find one that legitimately reads the right structures to get the actual contents then we'd love to have it included as a core plugin... Thanks very much for your contribution, we're happy to leave it here in draft for people to find and we look forward to seeing what you come up with in the future! 5:D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants