Merge pull request #138 from DataDog/ikretz/no-versions
Clarify meaning of empty version list
Showing 1 changed file with 11 additions and 4 deletions.
@@ -12,17 +12,18 @@ Current ecosystems:
 
 ## Usage
 
-Malicious samples are available under the **[samples/](samples/)** folder and compressed as an encrypted ZIP file with the password `infected`. The date indicated as part of the file name is the
-discovery date, not necessarily the package publication date.
+Malicious samples are available under the **[samples/](samples/)** folder and compressed as an encrypted ZIP file with the password `infected`. The date indicated as part of the file name is the discovery date, not necessarily the package publication date.
 
 You can use the script [extract.sh](./samples/pypi/extract.sh) to automatically extract all the samples to perform local analysis on them. Alternatively, you can extract a single sample using:
 
 ```
-$ unzip -o -P infected samples/pypi/2023-03-20-pydefender-v1.0.0.zip -d /tmp/
-Archive: samples/pypi/2023-03-20-pydefender-v1.0.0.zip
+$ unzip -o -P infected samples/pypi/pydefender/1.0.0/2023-03-20-pydefender-v1.0.0.zip -d /tmp/
+Archive: samples/pypi/pydefender/1.0.0/2023-03-20-pydefender-v1.0.0.zip
 creating: /tmp/2023-03-20-pydefender-v1.0.0/
 ```
 
+Each [samples/](samples/) subdirectory contains a `manifest.json` file that identifies the packages, and the versions of those packages, that comprise the samples collected for each ecosystem. You can use these files to quickly search the dataset for particular samples.
+
 ## License
 
 This dataset is released under the Apache-2.0 license. You're welcome to use it with attribution.
@@ -63,6 +64,12 @@ We will be regularly adding new packages to the dataset.
 
 Every single software package included in this dataset has been manually triaged by a human.
 
+### What if the `manifest.json` entry for a package has an empty version list?
+
+Around 250 packages in the PyPI subset do not have any affected versions listed in their `manifest.json` entries. These cases are holdovers from the earliest days of the project before version information was attached to the sample names.
+
+If you intend to use this dataset to screen packages for known-maliciousness, then **all** versions of packages with empty version lists should be considered malicious.
+
 ### How are you clustering these packages?
 
 At the time, we did not make available the clustering algorithm we use internally to group similar samples and ease analysis. If you have interest, please reach out at [email protected]
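
The screening rule added in this change (an empty version list means every version of the package is treated as malicious) can be sketched in Python. This is only an illustration, not code from the repository. It assumes each `manifest.json` is a JSON object mapping a package name to a list of version strings. That layout is a guess based on the README text and should be checked against the real files. The names `load_manifest` and `is_known_malicious`, and the package `some-old-sample`, are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Load a manifest.json file.

    Assumed shape (not confirmed by the README): {"package-name": ["1.0.0", ...]}.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


def is_known_malicious(manifest, name, version):
    """Return True if this package version should be treated as malicious."""
    if name not in manifest:
        # The package is not part of the dataset for this ecosystem.
        return False
    versions = manifest[name]
    # An empty version list means all versions are considered malicious.
    if not versions:
        return True
    return version in versions


# Hypothetical in-memory manifest used in place of a real file.
example = {"pydefender": ["1.0.0"], "some-old-sample": []}

print(is_known_malicious(example, "pydefender", "1.0.0"))
print(is_known_malicious(example, "some-old-sample", "9.9.9"))
```

With a real checkout, `load_manifest` would be pointed at the `manifest.json` inside the relevant `samples/` subdirectory. The lookup is an exact string match, so any name or version normalization your tooling needs would have to happen before calling it.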