Deficient intel filtering #57

shawn-davis · 2025-01-17T18:47:39Z

Filters out CVEs that lack sufficient intel before they are passed to the agent, to avoid labelling containers as vulnerable to non-existent CVEs.

Closes #56
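
A minimal sketch of the filtering step described above, assuming a hypothetical `CveRecord` with a boolean `intel_sufficient` flag. The names are illustrative and are not the PR's actual classes. Records without enough intel are routed around the agent rather than dropped, so they can still receive a placeholder output.

```python
from dataclasses import dataclass


@dataclass
class CveRecord:
    """Hypothetical stand-in for a CVE and the result of its intel check."""
    vuln_id: str
    intel_sufficient: bool


def split_by_intel(records: list[CveRecord]) -> tuple[list[CveRecord], list[CveRecord]]:
    """Return (records to send to the agent, records to skip)."""
    to_agent = [r for r in records if r.intel_sufficient]
    skipped = [r for r in records if not r.intel_sufficient]
    return to_agent, skipped
```

Keeping the skipped records in a separate list, instead of discarding them, lets a later stage still emit an entry for every requested CVE.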

ashsong-nv

I agree with the overall approach of filtering out CVEs with missing intel. Just have some questions on whether we can simplify the approach.

ashsong-nv · 2025-01-22T16:24:56Z

src/cve/stages/convert_to_output_object.py

+        checklist=[
+            ChecklistItemOutput(
+                input="Gather intel for the CVE.",
+                response=
+                "There is insufficient intel available to determine vulnerability. This is either due to the CVE not existing or there is not enough gathered intel for the agent to make an informed decision.",
+                intermediate_steps=None)
+        ],


Currently these placeholder checklist items are displayed only when the agent is skipped, which has confused some users. We recently got a question about whether the SBOM check only happens when the "Check SBOM and dependencies for vulnerability" checklist item is shown.

Do you recall why we decided to include the placeholder checklist item? I'm wondering if we could safely omit it for consistency? This would also make it clearer that no checklist was generated by the model.

I put those in to keep the output consistent. Makes parsing easier and ensures that we don't have a bunch of empty fields.
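
To illustrate the parsing trade-off in this thread, here is a sketch that assumes a simplified output shape (a plain dict with a `checklist` list of dicts, not the real `AgentMorpheusEngineOutput`). The helper name is hypothetical. With a placeholder item, a consumer can always read the first response; without one, it has to guard against an empty list.

```python
from typing import Optional


def first_checklist_response(output: dict) -> Optional[str]:
    """Return the first checklist response, or None if no checklist was produced."""
    checklist = output.get("checklist") or []
    if not checklist:
        # No checklist was generated (for example, the agent was skipped).
        return None
    return checklist[0].get("response")
```

Either convention works as long as consumers know which one applies; the guard above tolerates both.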

ashsong-nv · 2025-01-22T21:46:06Z

src/cve/stages/convert_to_output_object.py

@@ -97,6 +97,24 @@ def _get_placeholder_output(vuln_id: str) -> AgentMorpheusEngineOutput:
        justification=JUSTIFICATION)


+def _get_deficient_intel_output(vuln_id: str) -> AgentMorpheusEngineOutput:
+    SUMMARY = "There is insufficient intel available to determine vulnerability. This is either due to the CVE not existing or there is not enough gathered intel for the agent to make an informed decision."
+    JUSTIFICATION = JustificationOutput(label="insufficient_intel",


Could we use uncertain as the justification label? Using a standardized value in the CVEJustifyNode would prevent any downstream impacts of unexpected labels.

Suggested change

JUSTIFICATION = JustificationOutput(label="insufficient_intel",

JUSTIFICATION = JustificationOutput(label="uncertain",

We could label it whatever we want. I thought it was helpful to distinguish between the case where the LLM couldn't come up with a definitive label and the case where the LLM didn't even look at the CVE (likely because it doesn't actually exist).

Yeah, I do see the value of the label as well. I thought about backing into this info from the summary or the intel object, but this will take more work later on.

My main concern is about breaking any output contracts. Let's double check if the team has any concerns with adding a new justification label. If not, I'm good with it. We did also make another breaking change of adding a newline to the output, so we'll likely need to bump our version anyway.
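
One way to limit the impact of a new label on existing consumers is sketched below. It assumes a consumer that knows a fixed set of labels and falls back to `uncertain` for anything else. The function name and the idea of a configurable known-label set are assumptions for illustration, not part of this PR.

```python
def normalize_label(label: str,
                    known_labels: frozenset[str],
                    fallback: str = "uncertain") -> str:
    """Return the label unchanged if the consumer knows it, else the fallback."""
    return label if label in known_labels else fallback
```

A consumer that has not been updated would pass only its existing labels and see `uncertain`; an updated consumer would add `insufficient_intel` to its known set and keep the distinction.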

src/cve/data_models/cve_intel.py

src/cve/pipeline/pipeline.py

ashsong-nv · 2025-01-22T22:43:05Z

src/cve/data_models/cve_intel.py

+        """
+        Logic to determine if the CVE has sufficient intel and can be passed to the agent.
+
+        Returns
+        -------
+        bool
+            True if enough intel has been found for the CVE
+        """
+        sufficiency = False
+        for field_name, field in self.model_fields.items():
+            if isinstance(getattr(self, field_name), IntelSource):
+                if not getattr(self, field_name) is None:
+                    sufficiency = getattr(self, field_name).intel_sufficient or sufficiency
+        return sufficiency


I feel that adding an intel_sufficient property to each intel class, each with a different definition, adds a lot of complexity, maybe more than we need to address the invalid CVE case.

What do you think of the simpler approach of just checking whether each intel object is None? Unfortunately this will require some changes to the intel retrieval to write a None value when the CVE doesn't exist. It seems like we do this for GHSA and NVD, but not the other sources.

"intel": [
  {
    "vuln_id": "CVE-0000-00000",
    "ghsa": null,
    "nvd": null,
    "rhsa": {
      "bugzilla": { "description": null, "id": null, "url": null },
      "details": null,
      "statement": null,
      "package_state": null,
      "upstream_fix": null,
      "cvss3": null
    },
    "ubuntu": {
      "description": null,
      "notes": null,
      "notices": null,
      "priority": null,
      "ubuntu_description": null,
      "impact": null
    },
    "epss": { "epss": null, "percentile": null, "date": null },
    "sufficient_intel": false
  }
],
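
A sketch of the simpler check proposed here, written against the JSON shape above rather than the real intel classes. Because some sources come back as objects whose fields are all null instead of as a plain null, the hypothetical helper below treats both forms as "no intel". The source names are taken from the example; the function names are assumptions.

```python
SOURCES = ("ghsa", "nvd", "rhsa", "ubuntu", "epss")


def is_empty(value) -> bool:
    """True for None, or for containers whose contents are all empty."""
    if value is None:
        return True
    if isinstance(value, dict):
        return all(is_empty(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(is_empty(v) for v in value)
    # Any other value (string, number, bool) counts as present.
    return False


def has_any_intel(entry: dict) -> bool:
    """True if at least one intel source holds a non-null value."""
    return any(not is_empty(entry.get(name)) for name in SOURCES)
```

If the retrieval step were changed to write None for every missing source, `is_empty` would reduce to a plain `is None` test.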

I thought it was important to have direct control over the exact definition of sufficient detail. If the only source was EPSS (granted, an extremely unlikely scenario) that would not be sufficient for the agent to make an informed call on the validity. It also allows for explicit adjustment if the intel sources change.
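
The per-source approach argued for here can be sketched as follows. This uses standard-library dataclasses instead of the project's pydantic models, and the field names on `NvdIntel` and `EpssIntel` are assumptions chosen for illustration. Each source defines its own notion of sufficiency, and EPSS never counts on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NvdIntel:
    description: Optional[str] = None  # hypothetical field

    @property
    def intel_sufficient(self) -> bool:
        # Assumed rule: a non-empty description is enough for the agent.
        return bool(self.description)


@dataclass
class EpssIntel:
    epss: Optional[float] = None

    @property
    def intel_sufficient(self) -> bool:
        # A score alone does not describe the vulnerability.
        return False


@dataclass
class CveIntel:
    nvd: Optional[NvdIntel] = None
    epss: Optional[EpssIntel] = None

    @property
    def intel_sufficient(self) -> bool:
        """True if any present source reports sufficient intel."""
        sources = (self.nvd, self.epss)
        return any(s is not None and s.intel_sufficient for s in sources)
```

Changing what counts as sufficient for a source then means editing only that source's property.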

shawn-davis and others added 14 commits November 20, 2024 17:55

Added prompt parameter for LLM models to allow for passing in a custo…

37f1035

…m prompt or the path to a text file containing the prompt. This can be done for the agent system prompt, the checklist creation prompt, the summary prompt, and the justification prompt.

Update src/cve/pipeline/engine.py

854ebb1

Co-authored-by: Ashley Song <[email protected]>

Update src/cve/pipeline/engine.py

bb9449e

Co-authored-by: Ashley Song <[email protected]>

Changed up prompt config. Made some changes for consistency.

1e0e222

Fixed Llama 3.1 bug with prompt parameter being passed in get_client.

Added prompt parameter to the configuration file reference in the REA…

1f8622d

…DME.

Removed append_checklist_intel to simplify and streamline usage of pr…

5680afa

…ompting config.

Update src/cve/data_models/config.py

fc5d091

Co-authored-by: Ashley Song <[email protected]>

Update src/cve/data_models/config.py

6a7892f

Co-authored-by: Ashley Song <[email protected]>

Merge pull request NVIDIA-AI-Blueprints#16 from shawn-davis/sdavis_pr…

cbc6d9c

…ompt_config Added prompt parameter for LLM models to allow for passing in a custo…

Problem with agent prompt variable and missing some by_alias=True f…

a5da341

…or client creation to ensure compatibility with OpenAI.

Errant paren problem.

0911478

Started adjusting the intel data model to track intel deficiency to u…

d382eb7

…se in filtering.

Added 'intel_sufficient' flag to intel pydantic model for filtering. …

d3a83fa

…Adjusted the output to account for this filtering.

Added docstring for new intel fields and properties.

3cafe92

shawn-davis added the enhancement label Jan 17, 2025

shawn-davis requested a review from ashsong-nv January 17, 2025 18:47

shawn-davis self-assigned this Jan 17, 2025

Merge branch 'main' into deficient_intel_filtering

a4bb8b1

ashsong-nv reviewed Jan 22, 2025

shawn-davis and others added 2 commits January 22, 2025 17:51

Update src/cve/data_models/cve_intel.py

ca2baa4

Co-authored-by: Ashley Song <[email protected]>

Update src/cve/pipeline/pipeline.py

ec507bf

Co-authored-by: Ashley Song <[email protected]>
