Add datadog certifier #2366
base: main
As a general comment, I wonder if we want to call it something more specific than "DataDog". "DataDog Malicious Packages DataSet" is unwieldy, but I'm concerned that something in the future might pull from DataDog proper and find the name already taken. I don't have any great ideas, and this may not be worth worrying about right now, but I wanted to raise it.
Yeah, that is a solid point. If DataDog eventually spins out other datasets, I can see how that might cause some confusion. The data itself mostly comes from GuardDog, but I think not exclusively. Maybe we can go with something like
Branch updated from 6be472b to d2f86e2.
Thanks @robert-cronin! Sorry for the delay. We will review this soon!
No problem, thanks @pxp928!
This is a super cool addition. I wasn't aware of this dataset, but this is a really nice implementation and a great example of how easily another data source can be added (or at least you made it look easy!). Any feedback on how to make this easier, or any particular friction you ran into, would be super helpful as well. Thanks so much for yet another great contribution! 🙌
opt(d)
}

if err := d.fetchManifests(); err != nil {
It looks like the manifests are fetched once, on initialization. Given that the dataset is updated regularly, it would be helpful to refresh the manifests at some frequency. Is this feasible?
Yes, I will add it in!
if pkgInput.Namespace != nil && *pkgInput.Namespace != "" {
	namespace := strings.TrimPrefix(*pkgInput.Namespace, "@")
	namespace = strings.TrimPrefix(namespace, "%40")
	fullName = "@" + namespace + "/" + pkgInput.Name
It looks like packages in the dataset don't always start with "@". Could we add a check here, after the trim, to see whether the namespace had the prefix, and add the "@" only if a prefix was actually trimmed?
Certainly, I've added something to this effect 👍
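One way the check could look, sketched as a standalone helper. `npmFullName` is a hypothetical name, and the `*string` namespace only mirrors the `pkgInput.Namespace` field in the snippet above. It records whether an `@` (or its percent-encoded form `%40`) was actually trimmed and adds the `@` back only in that case. `strings.CutPrefix` requires Go 1.20 or later.

```go
package main

import "strings"

// npmFullName rebuilds the dataset's package key from a namespace and name.
// Hypothetical helper: "@" is re-added only when the namespace actually
// carried "@" (or its percent-encoded form "%40"), instead of unconditionally.
func npmFullName(namespace *string, name string) string {
	if namespace == nil || *namespace == "" {
		return name
	}
	ns := *namespace
	hadAt := false
	if trimmed, ok := strings.CutPrefix(ns, "@"); ok {
		ns, hadAt = trimmed, true
	} else if trimmed, ok := strings.CutPrefix(ns, "%40"); ok {
		ns, hadAt = trimmed, true
	}
	if hadAt {
		return "@" + ns + "/" + name
	}
	// No scope marker was present, so keep the namespace as-is.
	return ns + "/" + name
}
```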
}

// NewDatadogMalwareCertifier initializes the Datadog Malicious Software Packages certifier
func NewDatadogMalwareCertifier(ctx context.Context, assemblerFunc assemblerFuncType, opts ...CertifierOption) (certifier.Certifier, error) {
Could you add a bit of documentation here on what the Datadog Malicious Software Packages dataset is, for those who are not familiar with it?
In addition, could you add some details on:
- The predicates added to the graph
- The recommended interval (considering that the current certifier will generate a CertifyBad on every run, indefinitely)
- Any caveats (see the comment on periodically fetching the manifests)
Sure thing, I'll add some documentation along these lines
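Related to the interval question: one possible way to stop the certifier from emitting the same CertifyBad on every run is to remember which pURLs it has already reported. The sketch below keeps that record in memory; `seenSet` and `filterNew` are hypothetical names. The record is lost on restart, so a more durable option would be to check the graph for an existing CertifyBad before ingesting, which is not shown here.

```go
package main

import "sync"

// seenSet is a hypothetical in-memory record of pURLs already certified bad,
// so a periodic run only emits CertifyBad for newly listed packages.
type seenSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{seen: make(map[string]struct{})}
}

// filterNew returns the pURLs not reported before, in input order, and
// records them. Duplicates within the same batch are collapsed too.
func (s *seenSet) filterNew(purls []string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	var fresh []string
	for _, p := range purls {
		if _, ok := s.seen[p]; ok {
			continue
		}
		s.seen[p] = struct{}{}
		fresh = append(fresh, p)
	}
	return fresh
}
```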
Thanks @lumjjb! I really appreciate your encouraging words 😃 In terms of friction, I think there are some options for improving the scalability of adding new data sources, be they collectors or certifiers. One idea is to define a common interface that every collector or certifier must implement, and then have a registrar similar to how the backend works today, in the spirit of deduplication. There is also some common logic across the certifiers and collectors, such as initialising NATS, calling the ingestion flow, and setting up emitters. I'm not sure how much of that will change in v2.0, but it might be worth looking into.
Signed-off-by: robert-cronin <[email protected]>
Branch updated from d2f86e2 to 4aebd31.
Hello @lumjjb! Your suggestions have been implemented and all outstanding comments addressed. If you have any other suggestions, let me know.
Description of the PR
Fixes #2345
I am not sure whether there is a need for a parser or attestation, since we're just ingesting CertifyBad for a particular pURL. If there is a need to represent the source information in a predicate, I'd be happy to try and figure out how to add that.
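For reference, a small sketch of how an npm package's pURL is formed under the package-url specification: the scope is the namespace, with its leading `@` percent-encoded as `%40`, which is presumably why the namespace handling earlier in this conversation also trims `%40`. `npmPURL` is a hypothetical helper, and it assumes the name and version contain no other characters that would need percent-encoding.

```go
package main

import "strings"

// npmPURL builds a package URL for an npm package. The npm scope goes in the
// namespace with "@" percent-encoded as "%40". Assumption: name and version
// need no further percent-encoding.
func npmPURL(scope, name, version string) string {
	var b strings.Builder
	b.WriteString("pkg:npm/")
	if scope != "" {
		// Accept the scope with or without its leading "@".
		b.WriteString("%40" + strings.TrimPrefix(scope, "@") + "/")
	}
	b.WriteString(name)
	if version != "" {
		b.WriteString("@" + version)
	}
	return b.String()
}
```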
PR Checklist
- … `-s` flag to `git commit`.
- … `make generate` has been run
- … `make generate` has been run
- … `make generate` has been run
- … `collectsub` protobuf has been changed, `make proto` has been run