Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deficient intel filtering #57

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

shawn-davis
Copy link
Collaborator

Filters out CVE's that don't have enough intel from being passed into the agent to avoid issues involving labelling containers vulnerable to non-existent CVE's.

Closes #56

shawn-davis and others added 14 commits November 20, 2024 17:55
…m prompt or the path to a text file containing the prompt. This can be done for the agent system prompt, the checklist creation prompt, the summary prompt, and the justification prompt.
Fixed Llama 3.1 bug with prompt parameter being passed in get_client.
…ompt_config

Added prompt parameter for LLM models to allow for passing in a custo…
…or client creation to ensure compatibility with OpenAI.
…Adjusted the output to account for this filtering.
@shawn-davis shawn-davis added the enhancement New feature or request label Jan 17, 2025
@shawn-davis shawn-davis self-assigned this Jan 17, 2025
Copy link
Collaborator

@ashsong-nv ashsong-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the overall approach of filtering out CVEs with missing intel. Just have some questions on whether we can simplify the approach.

Comment on lines +107 to +113
checklist=[
ChecklistItemOutput(
input="Gather intel for the CVE.",
response=
"There is insufficient intel available to determine vulnerability. This is either due to the CVE not existing or there is not enough gathered intel for the agent to make an informed decision.",
intermediate_steps=None)
],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently these placeholder checklist items are only conditionally displayed when the agent is skipped, which has caused confusion for some users. We recently got a question about whether the SBOM check only happens when the "Check SBOM and dependencies for vulnerability" checklist item is shown.

Do you recall why we decided to include the placeholder checklist item? I'm wondering if we could safely omit it for consistency? This would also make it clearer that no checklist was generated by the model.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put those in to keep the output consistent. Makes parsing easier and ensures that we don't have a bunch of empty fields.

@@ -97,6 +97,24 @@ def _get_placeholder_output(vuln_id: str) -> AgentMorpheusEngineOutput:
justification=JUSTIFICATION)


def _get_deficient_intel_output(vuln_id: str) -> AgentMorpheusEngineOutput:
SUMMARY = "There is insufficient intel available to determine vulnerability. This is either due to the CVE not existing or there is not enough gathered intel for the agent to make an informed decision."
JUSTIFICATION = JustificationOutput(label="insufficient_intel",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use uncertain as the justification label? Using a standardized value in the CVEJustifyNode would prevent any downstream impacts of unexpected labels.

Suggested change
JUSTIFICATION = JustificationOutput(label="insufficient_intel",
JUSTIFICATION = JustificationOutput(label="uncertain",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could label it whatever we want. I thought it was helpful to distinguish between the LLM couldn't come up with a definitive label vs. the LLM didn't even look at this CVE (likely because it doesn't actually exist).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I do see the value of the label as well. I thought about backing into this info from the summary or the intel object, but this will take more work later on.

My main concern is about breaking any output contracts. Let's double check if the team has any concerns with adding a new justification label. If not, I'm good with it. We did also make another breaking change of adding a newline to the output, so we'll likely need to bump our version anyway.

src/cve/data_models/cve_intel.py Show resolved Hide resolved
src/cve/data_models/cve_intel.py Outdated Show resolved Hide resolved
src/cve/pipeline/pipeline.py Outdated Show resolved Hide resolved
Comment on lines +223 to +236
"""
Logic to determine if the CVE has sufficient intel and can be passed to the agent.
Returns
-------
bool
True if enough intel has been found for the CVE
"""
sufficiency = False
for field_name, field in self.model_fields.items():
if isinstance(getattr(self, field_name), IntelSource):
if not getattr(self, field_name) is None:
sufficiency = getattr(self, field_name).intel_sufficient or sufficiency
return sufficiency
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that adding an intel_sufficient property to each intel class, each with different definitions adds a lot of complexity, maybe more than we need to address the invalid CVE case.

What do you think of the simpler approach to just check if each intel object is None? Unfortunately this will require some changes to the intel retrieval to write a None value when the CVE doesn't exist. It seems like we do this for GHSA and NVD, but not the other sources.

        "intel": [
            {
                "vuln_id": "CVE-0000-00000",
                "ghsa": null,
                "nvd": null,
                "rhsa": {
                    "bugzilla": {
                        "description": null,
                        "id": null,
                        "url": null
                    },
                    "details": null,
                    "statement": null,
                    "package_state": null,
                    "upstream_fix": null,
                    "cvss3": null
                },
                "ubuntu": {
                    "description": null,
                    "notes": null,
                    "notices": null,
                    "priority": null,
                    "ubuntu_description": null,
                    "impact": null
                },
                "epss": {
                    "epss": null,
                    "percentile": null,
                    "date": null
                },
                "sufficient_intel": false
            }
        ],

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was important to have direct control over the exact definition of sufficient detail. If the only source was EPSS (granted, an extremely unlikely scenario) that would not be sufficient for the agent to make an informed call on the validity. It also allows for explicit adjustment if the intel sources change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter CVE's without adequate intel
2 participants