Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing cpe data split for cpe fields containing colon #1142

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Bluebottle-new
Copy link

The fields for projects whose vendor and/or project fields contain a colon, like e.g.: perl:archive-tar or perl:convert-asn1, are not correctly split from the given URI binding.

Currently fields with escaped colons like archive::tar_project are also split at the colon. With this pull request, the escaped colons are ignored and the URI binding is only split at unescaped colons.

@AMDmi3
Copy link
Member

AMDmi3 commented Apr 12, 2021

This doesn't seem to handle escaped backslash correctly:

>>> import re
>>> re.split(r"(?<!\\):", 'foo:bar')
['foo', 'bar']
>>> re.split(r"(?<!\\):", 'foo\\:bar')
['foo\\:bar']
>>> re.split(r"(?<!\\):", 'foo\\\\:bar')
['foo\\\\:bar']  # expected ['foo\\\\', 'bar']

We already have proper CPE splitting code in vulnupdater:
https://github.com/repology/repology-vulnupdater/blob/8d8174768b97fc9ca1226a353ac5164b10dce050/vulnupdater/cpe.py#L36-L52
IMO it would be better to add it here as a plain function in repology/parsers/cpe.py and use in the parsing.

@AMDmi3
Copy link
Member

AMDmi3 commented Apr 13, 2021

IMO it would be better to add it here as a plain function in repology/parsers/cpe.py and use in the parsing.

Fancy doing this? Also should be nice to have an example of misparsing to make sure it's fixed afterwards. I guess the issue manifests as some incorrect problem?

@Bluebottle-new
Copy link
Author

You are right, my change fixes the escaped colon but ignores the escaped backslash, and using a function like mentioned above would be more suitable.

I might be inclined to try implementing that function, although it might require some further changes in the gentoo parser, as the program uses fixed fields for the cpe information:

pkg.add_cpe(cpe[2], cpe[3])

which works for the currently used CPE2.2 format but would fail for the CPE2.3 format.

It would probably be a good idea to extend the parser to support both CPE URI formats. Although, going forward, I would also include support for the WFN format, which makes field assignments much easier and would allow the definition of cpe information in three ways:

cpe:/a:microsoft:internet_explorer:8.0.6001:beta
cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*
wfn:[part="o",vendor="microsoft",product="windows_vista",version="6\.0", update="sp1",edition=NA,language=NA,sw_edition="home_premium", target_sw=NA,target_hw="x64",other=NA]

@AMDmi3
Copy link
Member

AMDmi3 commented Apr 13, 2021

It depends on which format Gentoo (and other repos) currently uses. As far as I remember, we only need CPE string parsing for Gentoo (other repos with CPE info available provide it as separate fields), and Gentoo only uses 2.2 format. 2.3 format can be supported with the same splitting code with just an extra condition which determines the format by the starting components.

I would not over-engineer by adding support for wfn which is not used anywhere, but if we need it at some point, it would make sense to switch to third party module such as cpe or cpe_utils.

@Bluebottle-new
Copy link
Author

I can't speak for Gentoo, but at Liguros we currently provide the cpe information in the 2.2 format, which is the format we picked up from Gentoo. Switching over to 2.3 format would be a matter of a few minutes, as the information is generated automatically in our update pipeline. Thus we don't mind if we deliver the information in 2.2 or 2.3 format; even changing it to a more suitable format would not be out of the question.

For now I would try to change the gentoo parser, adding the splitting code, with the possibility to parse 2.2 or 2.3 format.

@Bluebottle-new
Copy link
Author

OK, it seems that I finally satisfied the checks :)

I have added cpe.py with the parse function. gentoo.py is now using that function and should now support cpe information in 2.2 and 2.3 format.

A test file with some test cases was added as well.

Copy link
Member

@AMDmi3 AMDmi3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be more convenient to have cpe_parse do the format distinguishing?
I'd make it return a dataclass of parsed cpe fields, which can be constructed out of slice of fields list, e.g.

@dataclass
class CPE:
    vendor: str = '*'
    product: str = '*'
    ...
 
def cpe_parse(cpe_str: str) -> CPE:
    ...
    if res[1] == '2.3':
        return CPE(res[3:])
    else:
        return CPE(res[2:])

@Bluebottle-new
Copy link
Author

OK, with my little knowledge of python, I think I was able to add the parse as part of a data class.

Seems to work for the parser, although I am not so sure about the test cases.

@Bluebottle-new
Copy link
Author

Hmm, the check currently fails in dependencies with:

 Unable to locate package postgresql-server-dev-13

Could you have a look at it?

@Bluebottle-new Bluebottle-new requested a review from AMDmi3 May 19, 2021 19:14
@AMDmi3 AMDmi3 force-pushed the master branch 8 times, most recently from ba3fcf1 to 0e8c7b8 Compare June 29, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants