Add utility to generate graph manifest for KG-Hub #2

Merged: 34 commits merged on Feb 7, 2022.

Commits:
d0a4038 Init make_kg_manifest utility (caufieldjh, Jan 24, 2022)
4f9d456 Get list of all bucket object keys (caufieldjh, Jan 24, 2022)
85cf012 Get bucket keys resembling graphs (caufieldjh, Jan 24, 2022)
0b2506e Add datasets linkml schema python classes (caufieldjh, Jan 24, 2022)
ab46a7b Add LinkML classes; load objects as LinkML class (caufieldjh, Jan 25, 2022)
7f1b52a Write manifest.yaml (caufieldjh, Jan 26, 2022)
5bfc9c3 Outpath option (caufieldjh, Jan 26, 2022)
857c592 Tweaks, bugfixes, more metadata (caufieldjh, Jan 26, 2022)
05a9757 Individual files are DataResource (caufieldjh, Jan 26, 2022)
97d286a Minor changes and comments (caufieldjh, Jan 26, 2022)
924d00f More versions; ignore irrelevant dirs (caufieldjh, Jan 27, 2022)
f391d38 Much more metadata for KG-OBO (caufieldjh, Jan 27, 2022)
7cd4cd3 Change license to short label (caufieldjh, Jan 27, 2022)
629e027 More descriptions for KG projects (caufieldjh, Jan 27, 2022)
7ec9a75 Add requirements, as utilities have dependencies (caufieldjh, Jan 31, 2022)
8936aed Add monarch; initial validation function (caufieldjh, Jan 31, 2022)
1677082 Expand validation function (caufieldjh, Jan 31, 2022)
d87209d Use cached OBO Foundry yaml (caufieldjh, Jan 31, 2022)
e489274 Validate build dates (caufieldjh, Jan 31, 2022)
dce997d Refactoring (caufieldjh, Jan 31, 2022)
16313ba Validate directory structure for builds (caufieldjh, Feb 1, 2022)
f8a57cf Setup and comments (caufieldjh, Feb 1, 2022)
62ad831 Pass results of validator to create_dataset_objects (caufieldjh, Feb 1, 2022)
02eafa5 Add kgx to reqs (caufieldjh, Feb 2, 2022)
035e861 Pass validation status to conforms_to in MANIFEST (caufieldjh, Feb 2, 2022)
bc5725b Validate node and edge files with KGX (caufieldjh, Feb 2, 2022)
d587750 Bugfix for error output (caufieldjh, Feb 2, 2022)
9ea24eb More bugfixes for validation (caufieldjh, Feb 3, 2022)
96b2baf Validate node/edge file names (caufieldjh, Feb 3, 2022)
fb98bc9 Setup for previous manifest parsing (caufieldjh, Feb 4, 2022)
e88dd66 Load previous Manifest (caufieldjh, Feb 4, 2022)
9c00aa8 Don't validate unchanged projects/builds (caufieldjh, Feb 7, 2022)
6aa31e0 Add previous entries to output Manifest (caufieldjh, Feb 7, 2022)
ded64d9 Add link checking function (caufieldjh, Feb 7, 2022)

Expand validation function
caufieldjh committed Jan 31, 2022 · Verified (signed with the committer's verified signature)
commit 1677082a0c2f7456aaaa48db058ac90cd0552afc

utils/make_kg_manifest.py: 26 changes (21 additions, 5 deletions)
@@ -13,6 +13,7 @@
 file and directory structure, format, and content.
 """
 
+from distutils.command.build import build
 import boto3
 import botocore.exceptions
 import botocore.errorfactory
@@ -110,23 +111,38 @@ def validate_projects(keys: list) -> None:
     * Graph tar.gz files contain only node and edge list
     * Files are, in fact, tsvs in KGX format.
     All output is to STDOUT.
+    This assumes everything in the root of the project
+    directory is a build, but valid builds must
+    meet the above criteria.
     :param keys: list of object keys, as strings
     """
 
-    project_keys = {}
+    project_contents = {}
+
     for project_name in PROJECTS:
-        project_keys[project_name] = []
+        project_contents[project_name] = {"objects":[],
+                                          "builds": [],
+                                          "valid builds":[]}
         print(f"Validating {project_name}...")
         for keyname in keys:
             try:
                 project_dirname = (keyname.split("/"))[0]
-                if project_dirname == project_name:
-                    project_keys[project_name].append(keyname)
+                if project_dirname == project_name: # This is the target project
+                    project_contents[project_name]["objects"].append(keyname)
+
+                    # Now iterate through builds, validating in the process
+                    build_name = (keyname.split("/"))[1]
+                    if build_name not in project_contents[project_name]["builds"] and \
+                        build_name not in ["index.html", "current"]:
+                        project_contents[project_name]["builds"].append(build_name)
+
             except IndexError:
                 pass
 
         print(f"The project {project_name} contains:")
-        print(f"\t{len(project_keys[project_name])} objects")
+        for object_type in project_contents[project_name]:
+            object_count = len(project_contents[project_name][object_type])
+            print(f"\t{object_count} {object_type}")
 
 def get_graph_file_keys(keys: list):
     """Given a list of keys, returns a list of those
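
The grouping step that this commit adds to `validate_projects` can be exercised without S3 access. The following is a minimal, self-contained sketch of that step, assuming object keys shaped like `<project>/<build>/<file>` as the diff implies. The helper name `summarize_projects`, the `projects` parameter (standing in for the module-level `PROJECTS`), and the sample keys and build names are all hypothetical. The sketch also swaps the `try`/`except IndexError` for an explicit length check, which yields the same grouping.

```python
from typing import Dict, List

# Second-level names that the diff skips when collecting builds.
NON_BUILD_NAMES = ("index.html", "current")


def summarize_projects(keys: List[str],
                       projects: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group object keys by project and collect build names.

    Hypothetical helper mirroring validate_projects: a key such as
    "kg-obo/20220101/kg-obo.tar.gz" belongs to project "kg-obo" and
    build "20220101".
    """
    contents: Dict[str, Dict[str, List[str]]] = {}
    for project_name in projects:
        contents[project_name] = {"objects": [], "builds": [], "valid builds": []}
        for keyname in keys:
            parts = keyname.split("/")
            if parts[0] != project_name:
                continue  # key belongs to another project or the bucket root
            contents[project_name]["objects"].append(keyname)
            # A key with no second path component has no build name; the
            # original code reaches the same result by catching IndexError.
            if len(parts) < 2:
                continue
            build_name = parts[1]
            if (build_name not in contents[project_name]["builds"]
                    and build_name not in NON_BUILD_NAMES):
                contents[project_name]["builds"].append(build_name)
    return contents


# Hypothetical bucket listing, for illustration only.
keys = [
    "index.html",
    "kg-obo/index.html",
    "kg-obo/current/kg-obo.tar.gz",
    "kg-obo/20220101/kg-obo.tar.gz",
    "kg-obo/20220101/index.html",
    "kg-obo/20220115/kg-obo.tar.gz",
]

summary = summarize_projects(keys, ["kg-obo"])
print("The project kg-obo contains:")
for object_type, items in summary["kg-obo"].items():
    print(f"\t{len(items)} {object_type}")
# Prints 5 objects, 2 builds (20220101 and 20220115), and 0 valid builds.
```

In this commit, as in the sketch, the "valid builds" list is created but never filled, so its count is always 0.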