Mapping from Pom.xml to SPDX-id. #4

Open
michelescarlato opened this issue Jul 5, 2021 · 4 comments
Comments

@michelescarlato
Member

@tmortagne Dear Thomas,

I am sorry I did not make this clear earlier: I still don't have a function that maps the free-form license names used in the Pom.xml to SPDX ids (SPDX is a format for documenting software license information, widely recognized as an open standard).

I implemented a similar function that maps ScanCode license names to SPDX ids, but it does not help in this specific case.
I will keep you updated as I develop this mapping function for the Java Maven project use case.

Sincerely,
Michele

@tmortagne
Member

Note that this is not really Maven-specific; it's just that Maven has no specification for the license name format. It's probably the same problem in various other contexts (I think it's the same for Debian packages, for example).

@zvr

zvr commented Jul 5, 2021

We (the SPDX project) are working with the Maven folks (and the PyPI folks and a number of other communities) to help them adopt the SPDX short license identifiers.
Of course, not all existing artifacts will be automatically updated, even when the Maven spec mentions SPDX...

@tmortagne
Member

> Of course, not all existing artifacts will be automatically updated, even when the Maven spec mentions SPDX...

Yes, Maven Central will probably never be fully SPDX-based because of the existing artifacts. And even if Sonatype starts to require it for new projects, getting other repositories to do the same is going to be quite a challenge...

@michelescarlato
Member Author

I prepared an LCV lite version that handles the Java Maven use case.

To cope with the mapping issue, I merged two different techniques:

  1. Static mapping, which uses the spdx-id.csv file to compare aliases with SPDX ids.

---> If any of you want to contribute to this file, feel free to do so; please take great care not to insert a duplicate alias, because it would cause an error. Before being pushed, the CSV is processed with awk '!a[$0]++' spdx-id_merged.csv > spdx_id.csv, which removes exact duplicate lines.

  2. Dynamic mapping, which splits a verbose license name into words and matches each word against two lists: one of license names and one of license versions. It aims to produce an output that matches one of the aliases in the spdx-id.csv file or, for some particular licenses, to assign the SPDX id for the matched license directly.
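
For anyone contributing to the alias file, the duplicate-alias concern mentioned for the static mapping can be checked locally before pushing. This is only a minimal Python sketch: it assumes a two-column layout of alias,SPDX id (the real column layout is not shown in this thread), and the function names `dedupe_lines` and `find_conflicting_aliases` are hypothetical. `dedupe_lines` mirrors what awk '!a[$0]++' does, and `find_conflicting_aliases` reports the case the awk step cannot catch: the same alias appearing on two different lines.

```python
import csv
import io


def dedupe_lines(lines):
    """Keep the first occurrence of each exact line, like awk '!a[$0]++'."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def find_conflicting_aliases(lines):
    """Return aliases found on more than one row (assumed layout: alias,spdx_id)."""
    first_seen = {}
    conflicts = []
    for row in csv.reader(io.StringIO("\n".join(lines))):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        alias, spdx_id = row[0].strip(), row[1].strip()
        if alias in first_seen and alias not in conflicts:
            conflicts.append(alias)
        first_seen.setdefault(alias, spdx_id)
    return conflicts


# Illustrative rows only, not taken from the real file.
sample = [
    '"Apache License, Version 2.0",Apache-2.0',
    "MIT License,MIT",
    '"Apache License, Version 2.0",Apache-2.0',  # exact duplicate line
    "MIT License,MIT-0",  # same alias mapped to a different id
]
unique = dedupe_lines(sample)
conflicts = find_conflicting_aliases(unique)
print(len(unique), conflicts)
```

Exact duplicate lines disappear in the first step, while an alias that maps to two different ids survives it and is only flagged by the second check.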

ConvertToSPDX() merges these two techniques to convert a verbose license name into an SPDX id.
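
To make the combination concrete, here is a rough Python sketch of how a static lookup and a word-based dynamic step could be chained. It is not the actual LCV code: the alias table, the two word lists, and the lowercase function names are all illustrative assumptions, and the real ConvertToSPDX() may order or combine the steps differently.

```python
import re

# Illustrative alias table (alias -> SPDX id); the real one lives in the CSV file.
ALIASES = {
    "Apache 2.0": "Apache-2.0",
    "MIT": "MIT",
}

# Illustrative word lists for the dynamic step.
LICENSE_NAMES = {"apache": "Apache", "mit": "MIT"}
LICENSE_VERSIONS = {"2": "2.0", "2.0": "2.0"}


def static_mapping(name):
    """Exact alias lookup; returns None when the alias is unknown."""
    return ALIASES.get(name)


def dynamic_mapping(verbose_name):
    """Split a verbose name into words and rebuild a short alias candidate."""
    words = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)*", verbose_name.lower())
    name = next((LICENSE_NAMES[w] for w in words if w in LICENSE_NAMES), None)
    version = next(
        (LICENSE_VERSIONS[w] for w in words if w in LICENSE_VERSIONS), None
    )
    if name is None:
        return None
    return f"{name} {version}" if version else name


def convert_to_spdx(verbose_name):
    """Try the static table first, then the dynamic candidate; None if no match."""
    spdx_id = static_mapping(verbose_name)
    if spdx_id is not None:
        return spdx_id
    candidate = dynamic_mapping(verbose_name)
    if candidate is None:
        return None
    return static_mapping(candidate)


print(convert_to_spdx("The Apache Software License, Version 2.0"))
print(convert_to_spdx("MIT License"))
print(convert_to_spdx("Some Proprietary Terms"))
```

In this sketch the dynamic step never produces an SPDX id on its own; it only produces a candidate alias that is looked up in the static table again, which keeps the CSV file as the single source of ids.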
This function is called on each license (inbound or outbound) POSTed through the REST APIs. Please use Postman, at least at the beginning, to get familiar with the syntax, which now uses a semicolon (";") to separate inbound licenses: many aliases contain a comma, which caused the LCV endpoint to interpret them as two licenses instead of a single alias.
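
The reason for the separator change can be illustrated with a small Python sketch. The helper name `split_inbound_licenses` and the sample string are hypothetical; the actual request format of the LCV endpoint is not shown in this thread.

```python
def split_inbound_licenses(raw):
    """Split a semicolon-separated list of license names, trimming whitespace."""
    return [part.strip() for part in raw.split(";") if part.strip()]


# Illustrative input: the first alias contains a comma.
raw = "Apache License, Version 2.0; MIT License"

by_semicolon = split_inbound_licenses(raw)
by_comma = [part.strip() for part in raw.split(",")]

print(by_semicolon)
print(by_comma)
```

Splitting on the semicolon keeps "Apache License, Version 2.0" as one alias, while splitting on the comma cuts it in the middle.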

Both techniques rely on two more functions, IsInAliases() and IsAnSPDX(), each returning a boolean: true if a license is an alias or an SPDX id, respectively. These checks are used within the three previously mentioned functions to avoid lookups on aliases or SPDX ids that cannot be matched (which would cause the well-known key error, resulting in a 500 error, now avoided).
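
As a rough illustration of the guard idea, the Python sketch below checks membership before any dictionary lookup, so an unknown name yields None instead of raising a key error. The data, the lowercase function names, and the `resolve` helper are assumptions made for this example, not the actual LCV implementation.

```python
# Illustrative data only.
ALIASES = {"Apache License 2.0": "Apache-2.0"}  # alias -> SPDX id
SPDX_IDS = {"Apache-2.0", "MIT"}  # known SPDX ids


def is_in_aliases(name):
    """True if the name is a known alias."""
    return name in ALIASES


def is_an_spdx(name):
    """True if the name is already a known SPDX id."""
    return name in SPDX_IDS


def resolve(name):
    """Return an SPDX id, or None when the name cannot be matched."""
    if is_an_spdx(name):
        return name
    if is_in_aliases(name):
        return ALIASES[name]  # safe: membership was checked first
    return None  # the caller decides how to report an unmatched license


print(resolve("MIT"))
print(resolve("Apache License 2.0"))
print(resolve("Unknown License"))
```

With the checks in place, an unmatched license becomes an ordinary return value that the caller can handle, rather than an unhandled exception inside the request handler.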

Thank you in advance for your time and help,
Sincerely,
Michele

3 participants