Preset glossary term #12330

Closed
wants to merge 59 commits into from
59 commits
93a4a53
split ownership pr
enosodigie Nov 27, 2024
66b2257
parseowner fix - not working
enosodigie Dec 13, 2024
b084b7b
skip fetching for dataset when datasource_id is null
llance Dec 16, 2024
aca7830
fix for local variable 'datasource_urn' referenced before assignment
llance Dec 16, 2024
2bceb46
lint
llance Dec 17, 2024
4419996
review comments
enosodigie Dec 18, 2024
e73da8e
Merge pull request #19 from llance/review-add-ownership-preset
llance Jan 3, 2025
9156a84
preset glossary term
llance Jan 13, 2025
09b371c
fix(ingest): consistent fingerprint for sql parsing aggregator (#12239)
mayurinehate Jan 3, 2025
6c1b017
docs(queries_v2): set use_queries_v2 to true in snowflake_recipe.yml …
gabe-lyons Jan 3, 2025
e150251
feat(ingest/gc): truncate query usage statistics aspect (#12268)
anshbansal Jan 5, 2025
f8ca98b
fix(ingest/tableau): retry on auth error for special case (#12264)
mayurinehate Jan 6, 2025
c146922
fix(ingest/gc): infinite loop query entities (#12274)
anshbansal Jan 6, 2025
9c72e74
fix(ingest/snowflake): use fast query fingerprint for lineage (#12275)
mayurinehate Jan 6, 2025
c551785
fix(spark): Finegrained lineage is emitted on the DataJob and not on …
treff7es Jan 6, 2025
22eca1e
docs(tableau): clarify docs around tableau permissions (#12270)
hsheth2 Jan 6, 2025
7d1dd2e
feat(ingest): enable `EnsureAspectSizeProcessor` for all sources (#12…
hsheth2 Jan 6, 2025
393491d
fix(ingestion/classifier): temporary measure to avoid deadlocks for c…
skrydal Jan 6, 2025
c927056
feat(ingest/datahub): use stream_results with mysql (#12278)
hsheth2 Jan 6, 2025
6629617
ci: fix shellcheck warnings, update actions (#12281)
anshbansal Jan 7, 2025
ffe693c
docs(business attribute): clarify support (#12260)
skrydal Jan 7, 2025
1d32dcf
fix(airflow): fix tests with Airflow 2.4 (#12279)
hsheth2 Jan 7, 2025
0043494
fix(ingest): better correctness on the emitter -> graph conversion (#…
hsheth2 Jan 7, 2025
9ad5cec
feat(ingest): configurable query generation in combined sources (#12284)
hsheth2 Jan 8, 2025
664c835
fix(javaEntityClient): correct config parameter (#12287)
david-leifker Jan 8, 2025
dc96d9b
ci: upload test coverage to codecov (#12291)
anshbansal Jan 8, 2025
3c2a277
log(elastic/index builder): add est time remaining (#12280)
anshbansal Jan 8, 2025
2a790a8
fix(ingest/glue): don't fail on profile (#12288)
anshbansal Jan 8, 2025
a6de4fe
fix(ingest/gc): also query data process instance (#12292)
anshbansal Jan 8, 2025
a96d353
fix(cli): correct url ending with acryl.io:8080 (#12289)
anshbansal Jan 8, 2025
a60a9d0
dev: add pre-commit hooks installed by default (#12293)
anshbansal Jan 8, 2025
699efab
fix(ingest/file-backed-collections): Properly set _use_sqlite_on_conf…
asikowitz Jan 8, 2025
4956543
fix(doc): make folder_path_pattern usage more clear (#12298)
kevinkarchacryl Jan 8, 2025
5ad7c5f
dev: fix pre-commit passing filenames incorrectly (#12304)
anshbansal Jan 9, 2025
c6b262d
feat(sdk): structured properties - add support for listing (#12283)
shirshanka Jan 9, 2025
2f9481f
chore(tableau): set ingestion stage report and perftimers (#12234)
sgomezvillamor Jan 9, 2025
0499392
chore(version): bump jdbc drivers (#12301)
david-leifker Jan 9, 2025
3eaf5b8
build(coverage): fix carry-forward coverage (#12306)
chakru-r Jan 9, 2025
b600c06
chore(deps): Migrate EOL vulnerability of javax.mail to jakarta.mail …
pankajmahato-visa Jan 9, 2025
f455b31
chore(alpine): bump alpine images 3.21 (#12302)
david-leifker Jan 9, 2025
4e809da
feat(ingest/datahub): support dropping duplicate schema fields (#12308)
hsheth2 Jan 9, 2025
565334c
feat(ci): add manual trigger for full build (#12307)
chakru-r Jan 10, 2025
5508e27
fix(ci): make upload-artifact name unique (#12312)
chakru-r Jan 10, 2025
84eb978
fix(ingestion/s3): groupby group-splitting issue (#12254)
eagle-25 Jan 10, 2025
2f8ab97
feat(graphql): adds container aspect for dataflow and datajob entitie…
sgomezvillamor Jan 10, 2025
1fff8b3
docs(ingest/glue): add permissions for glue (#12290)
anshbansal Jan 10, 2025
d4947c4
fix(ingest/gc): add delete limit execution request (#12313)
anshbansal Jan 10, 2025
5b2ffe2
chore(deps): Migrate CVE-2024-52046 with severity >= 9 (severity = 9.…
pankajmahato-visa Jan 10, 2025
dbd6092
fix(ci): fix artifact upload name (#12319)
chakru-r Jan 10, 2025
673e65b
feat(sdk): support urns in other urn constructors (#12311)
hsheth2 Jan 10, 2025
0c6b254
fix(ingest): improve error reporting in `emit_all` (#12309)
hsheth2 Jan 10, 2025
a69ccad
docs(ingest): refactor docgen process (#12300)
hsheth2 Jan 10, 2025
61919fb
fix(dockerfile) Remove all references to jetty from the docker file (…
ryota-cloud Jan 10, 2025
4e55957
docs(notification): docs on platform notifications and multiple chann…
ethan-cartwright Jan 10, 2025
c40c15c
fix(cli/delete): prevent duplicates in delete message (#12323)
hsheth2 Jan 13, 2025
0726436
feat(ingestion/iceberg): Improve iceberg connector logging (#12317)
skrydal Jan 13, 2025
db60405
fix(header): prevent clickjack/iframing (#12328)
david-leifker Jan 13, 2025
25161fc
Revert "split ownership pr"
llance Jan 13, 2025
2d089aa
merge
llance Jan 13, 2025
65 changes: 65 additions & 0 deletions .github/.codecov.yml
@@ -0,0 +1,65 @@
comment:
  layout: "header, files, footer" # remove "new" from "header" and "footer"
  hide_project_coverage: true # set to false
  require_changes: false # if true: only post the comment if coverage changes

codecov:
  #due to ci-optimization, reports for modules that have not changed may be quite old
  max_report_age: off

flag_management:
  default_rules: # the rules that will be followed for any flag added, generally
    carryforward: true
    statuses:
      - type: project
        target: auto
        threshold: 0% #Not enforcing project coverage yet.
      - type: patch
        target: 90%
  individual_flags: # exceptions to the default rules above, stated flag by flag
    - name: frontend
      paths:
        - "datahub-frontend/**"
        - "datahub-web-react/**"
    - name: backend
      paths:
        - "metadata-models/**"
        - "datahub-upgrade/**"
        - "entity-registry/**"
        - "li-utils/**"
        - "metadata-auth/**"
        - "metadata-dao-impl/**"
        - "metadata-events/**"
        - "metadata-jobs/**"
        - "metadata-service/**"
        - "metadata-utils/**"
        - "metadata-operation-context/**"
        - "datahub-graphql-core/**"
    - name: metadata-io
      paths:
        - "metadata-io/**"
    - name: ingestion
      paths:
        - "metadata-ingestion/**"
    - name: ingestion-airflow
      paths:
        - "metadata-ingestion-modules/airflow-plugin/**"
    - name: ingestion-dagster
      paths:
        - "metadata-ingestion-modules/dagster-plugin/**"
    - name: ingestion-gx-plugin
      paths:
        - "metadata-ingestion-modules/gx-plugin/**"
    - name: ingestion-prefect
      paths:
        - "metadata-ingestion-modules/prefect-plugin/**"
coverage:
  status:
    project:
      default:
        target: 0% # no threshold enforcement yet
        only_pulls: true
    patch:
      default:
        target: 90% # for new code added in the patch
        only_pulls: true
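To make the flag layout above easier to reason about, here is a minimal sketch of how a changed file could be mapped to a coverage flag. It copies only a subset of the `paths` entries from the config. The helper name `flags_for` is hypothetical, and treating `dir/**` as a plain directory-prefix match is an assumption; it does not reproduce Codecov's actual glob matcher.

```python
# Subset of the flag-to-path mapping from .github/.codecov.yml.
FLAG_PATHS = {
    "frontend": ["datahub-frontend/**", "datahub-web-react/**"],
    "metadata-io": ["metadata-io/**"],
    "ingestion": ["metadata-ingestion/**"],
    "ingestion-airflow": ["metadata-ingestion-modules/airflow-plugin/**"],
}


def flags_for(path: str) -> list[str]:
    """Return the flags whose path patterns cover `path` (prefix match only)."""
    matched = []
    for flag, patterns in FLAG_PATHS.items():
        # Assumption: "dir/**" means "anything under dir/".
        if any(path.startswith(p.removesuffix("**")) for p in patterns):
            matched.append(flag)
    return matched


if __name__ == "__main__":
    for changed in [
        "metadata-ingestion/src/datahub/cli.py",
        "metadata-ingestion-modules/airflow-plugin/setup.py",
        "docs/README.md",
    ]:
        print(changed, "->", flags_for(changed))
```

Because every pattern ends in `/**`, `metadata-ingestion/**` does not swallow the `metadata-ingestion-modules/...` plugins; each plugin keeps its own flag.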
15 changes: 12 additions & 3 deletions .github/actions/ci-optimization/action.yml
@@ -13,16 +13,16 @@ outputs:
     value: ${{ steps.filter.outputs.frontend == 'false' && steps.filter.outputs.ingestion == 'false' && steps.filter.outputs.backend == 'true' }}
   backend-change:
     description: "Backend code has changed"
-    value: ${{ steps.filter.outputs.backend == 'true' }}
+    value: ${{ steps.filter.outputs.backend == 'true' || steps.trigger.outputs.trigger == 'manual' }}
   ingestion-change:
     description: "Ingestion code has changed"
-    value: ${{ steps.filter.outputs.ingestion == 'true' }}
+    value: ${{ steps.filter.outputs.ingestion == 'true' || steps.trigger.outputs.trigger == 'manual' }}
   ingestion-base-change:
     description: "Ingestion base image docker image has changed"
     value: ${{ steps.filter.outputs.ingestion-base == 'true' }}
   frontend-change:
     description: "Frontend code has changed"
-    value: ${{ steps.filter.outputs.frontend == 'true' }}
+    value: ${{ steps.filter.outputs.frontend == 'true' || steps.trigger.outputs.trigger == 'manual' }}
   docker-change:
     description: "Docker code has changed"
     value: ${{ steps.filter.outputs.docker == 'true' }}
@@ -44,6 +44,15 @@ outputs:
 runs:
   using: "composite"
   steps:
+    - name: Check trigger type
+      id: trigger # Add an ID to reference this step
+      shell: bash
+      run: |
+        if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
+          echo "trigger=manual" >> $GITHUB_OUTPUT
+        else
+          echo "trigger=pr" >> $GITHUB_OUTPUT
+        fi
     - uses: dorny/paths-filter@v3
       id: filter
       with:
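The effect of the new `trigger` step on the action outputs can be sketched in Python. This is only a model of the expressions visible in the hunks above: the function name `change_outputs` and the dictionary shape are hypothetical, and outputs that the diff does not show are left out.

```python
def change_outputs(event_name: str, filter_outputs: dict[str, str]) -> dict[str, bool]:
    """Model the *-change outputs visible in this diff.

    `filter_outputs` stands in for steps.filter.outputs (string values
    "true"/"false", as produced by the paths-filter step).
    """
    # Mirrors the "Check trigger type" step.
    trigger = "manual" if event_name == "workflow_dispatch" else "pr"
    manual = trigger == "manual"

    def changed(key: str) -> bool:
        return filter_outputs.get(key) == "true"

    return {
        # A manual run forces these three to true.
        "backend-change": changed("backend") or manual,
        "ingestion-change": changed("ingestion") or manual,
        "frontend-change": changed("frontend") or manual,
        # Unchanged by this diff: still driven by the path filter alone.
        "ingestion-base-change": changed("ingestion-base"),
        "docker-change": changed("docker"),
    }


if __name__ == "__main__":
    nothing_changed = {k: "false" for k in ("backend", "ingestion", "frontend", "docker")}
    print(change_outputs("workflow_dispatch", nothing_changed))
    print(change_outputs("pull_request", nothing_changed))
```

In this model a `workflow_dispatch` run behaves as a full build for backend, ingestion, and frontend even when no matching paths changed, while `docker-change` and `ingestion-base-change` still depend only on the filter.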
279 changes: 279 additions & 0 deletions .github/scripts/generate_pre_commit.py
@@ -0,0 +1,279 @@
"""Generate pre-commit hooks for Java and Python projects.

This script scans a repository for Java and Python projects and generates appropriate
pre-commit hooks for linting and formatting. It also merges in additional hooks from
an override file.
"""

import os
from dataclasses import dataclass
from enum import Enum, auto
from pathlib import Path
import datetime

import yaml


class ProjectType(Enum):
    """Types of projects supported for hook generation."""

    JAVA = auto()
    PYTHON = auto()


@dataclass
class Project:
    """Represents a project found in the repository."""

    path: str
    type: ProjectType

    @property
    def gradle_path(self) -> str:
        """Convert path to Gradle task format."""
        return ":" + self.path.replace("/", ":")

    @property
    def project_id(self) -> str:
        """Generate a unique identifier for the project."""
        return self.path.replace("/", "-").replace(".", "-")


class ProjectFinder:
    """Find Java and Python projects in a repository."""

    JAVA_PATTERNS = [
        "plugins.hasPlugin('java')",
        "apply plugin: 'java'",
        "id 'java'",
        "id 'java-library'",
        "plugins.hasPlugin('java-library')",
        "apply plugin: 'java-library'",
        "plugins.hasPlugin('pegasus')",
        "org.springframework.boot",
    ]

    EXCLUDED_DIRS = {".git", "build", "node_modules", ".tox", "venv"}
    SOURCE_EXTENSIONS = {".java", ".kt", ".groovy"}

    def __init__(self, root_dir: str):
        self.root_path = Path(root_dir)

    def find_all_projects(self) -> list[Project]:
        """Find all Java and Python projects in the repository."""
        java_projects = self._find_java_projects()
        python_projects = self._find_python_projects()

        all_projects = []
        all_projects.extend(
            Project(path=p, type=ProjectType.JAVA) for p in java_projects
        )
        all_projects.extend(
            Project(path=p, type=ProjectType.PYTHON) for p in python_projects
        )

        return sorted(all_projects, key=lambda p: p.path)

    def _find_java_projects(self) -> set[str]:
        """Find all Java projects by checking build.gradle files."""
        java_projects = set()

        # Search both build.gradle and build.gradle.kts
        for pattern in ["build.gradle", "build.gradle.kts"]:
            for gradle_file in self.root_path.rglob(pattern):
                if self._should_skip_directory(gradle_file.parent):
                    continue

                if self._is_java_project(gradle_file):
                    java_projects.add(self._get_relative_path(gradle_file.parent))

        return {
            p
            for p in java_projects
            if "buildSrc" not in p and "spark-smoke-test" not in p and p != "."
        }

    def _find_python_projects(self) -> set[str]:
        """Find all Python projects by checking for setup.py or pyproject.toml."""
        python_projects = set()

        for file_name in ["setup.py", "pyproject.toml"]:
            for path in self.root_path.rglob(file_name):
                if self._should_skip_directory(path.parent):
                    continue

                rel_path = self._get_relative_path(path.parent)
                if "examples" not in rel_path:
                    python_projects.add(rel_path)

        return python_projects

    def _should_skip_directory(self, path: Path) -> bool:
        """Check if directory should be skipped."""
        return any(
            part in self.EXCLUDED_DIRS or part.startswith(".") for part in path.parts
        )

    def _is_java_project(self, gradle_file: Path) -> bool:
        """Check if a Gradle file represents a Java project."""
        try:
            content = gradle_file.read_text()
            has_java_plugin = any(pattern in content for pattern in self.JAVA_PATTERNS)

            if has_java_plugin:
                # Verify presence of source files
                return any(
                    list(gradle_file.parent.rglob(f"*{ext}"))
                    for ext in self.SOURCE_EXTENSIONS
                )
            return False

        except Exception as e:
            print(f"Warning: Error reading {gradle_file}: {e}")
            return False

    def _get_relative_path(self, path: Path) -> str:
        """Get relative path from root, normalized with forward slashes."""
        return str(path.relative_to(self.root_path)).replace("\\", "/")


class HookGenerator:
    """Generate pre-commit hooks for projects."""

    def __init__(self, projects: list[Project], override_file: str = None):
        self.projects = projects
        self.override_file = override_file

    def generate_config(self) -> dict:
        """Generate the complete pre-commit config."""
        hooks = []

        for project in self.projects:
            if project.type == ProjectType.PYTHON:
                hooks.append(self._generate_lint_fix_hook(project))
            else:  # ProjectType.JAVA
                hooks.append(self._generate_spotless_hook(project))

        config = {"repos": [{"repo": "local", "hooks": hooks}]}

        # Merge override hooks if they exist
        if self.override_file and os.path.exists(self.override_file):
            try:
                with open(self.override_file, 'r') as f:
                    override_config = yaml.safe_load(f)

                if override_config and 'repos' in override_config:
                    for override_repo in override_config['repos']:
                        matching_repo = next(
                            (repo for repo in config['repos']
                             if repo['repo'] == override_repo['repo']),
                            None
                        )

                        if matching_repo:
                            matching_repo['hooks'].extend(override_repo.get('hooks', []))
                        else:
                            config['repos'].append(override_repo)

                    print(f"Merged additional hooks from {self.override_file}")
            except Exception as e:
                print(f"Warning: Error reading override file {self.override_file}: {e}")

        return config

    def _generate_lint_fix_hook(self, project: Project) -> dict:
        """Generate a lint-fix hook for Python projects."""
        return {
            "id": f"{project.project_id}-lint-fix",
            "name": f"{project.path} Lint Fix",
            "entry": f"./gradlew {project.gradle_path}:lintFix",
            "language": "system",
            "files": f"^{project.path}/.*\\.py$",
            "pass_filenames": False,
        }

    def _generate_spotless_hook(self, project: Project) -> dict:
        """Generate a spotless hook for Java projects."""
        return {
            "id": f"{project.project_id}-spotless",
            "name": f"{project.path} Spotless Apply",
            "entry": f"./gradlew {project.gradle_path}:spotlessApply",
            "language": "system",
            "files": f"^{project.path}/.*\\.java$",
            "pass_filenames": False,
        }


class PrecommitDumper(yaml.Dumper):
    """Custom YAML dumper that maintains proper indentation."""

    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)


def write_yaml_with_spaces(file_path: str, data: dict):
    """Write YAML file with extra spacing between hooks and a timestamp header."""
    with open(file_path, "w") as f:
        # Add timestamp header
        current_time = datetime.datetime.now(datetime.timezone.utc)
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S %Z")
        header = f"# Auto-generated by .github/scripts/generate_pre_commit.py at {formatted_time}\n"
        f.write(header)
        # The next two header lines have no placeholders, so plain strings suffice.
        header = "# Do not edit this file directly. Run the script to regenerate.\n"
        f.write(header)
        header = "# Add additional hooks in .github/scripts/pre-commit-override.yaml\n"
        f.write(header)

        # Write the YAML content
        yaml_str = yaml.dump(
            data, Dumper=PrecommitDumper, sort_keys=False, default_flow_style=False
        )

        # Add extra newline between hooks
        lines = yaml_str.split("\n")
        result = []
        in_hook = False

        for line in lines:
            if line.strip().startswith("- id:"):
                if in_hook:  # If we were already in a hook, add extra newline
                    result.append("")
                in_hook = True
            elif not line.strip() and in_hook:
                in_hook = False

            result.append(line)

        f.write("\n".join(result))


def main():
    root_dir = os.path.abspath(os.curdir)
    override_file = ".github/scripts/pre-commit-override.yaml"

    # Find projects
    finder = ProjectFinder(root_dir)
    projects = finder.find_all_projects()

    # Print summary
    print("Found projects:")
    print("\nJava projects:")
    for project in projects:
        if project.type == ProjectType.JAVA:
            print(f"  - {project.path}")

    print("\nPython projects:")
    for project in projects:
        if project.type == ProjectType.PYTHON:
            print(f"  - {project.path}")

    # Generate and write config
    generator = HookGenerator(projects, override_file)
    config = generator.generate_config()
    write_yaml_with_spaces(".pre-commit-config.yaml", config)

    print("\nGenerated .pre-commit-config.yaml")


if __name__ == "__main__":
    main()
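As a worked example of what the script produces for a single project, the sketch below repeats the path-to-hook mapping as free functions (the script itself uses properties on `Project` and methods on `HookGenerator`). The function names and the sample file `src/plugin.py` are illustrative assumptions; the project path is one of the directories listed in `.github/.codecov.yml`.

```python
import re


def gradle_path(path: str) -> str:
    """Directory path -> Gradle project path, e.g. a/b -> :a:b."""
    return ":" + path.replace("/", ":")


def project_id(path: str) -> str:
    """Directory path -> hook id prefix (slashes and dots become dashes)."""
    return path.replace("/", "-").replace(".", "-")


def lint_fix_hook(path: str) -> dict:
    """Same fields as the Python lint-fix hook generated by the script."""
    return {
        "id": f"{project_id(path)}-lint-fix",
        "name": f"{path} Lint Fix",
        "entry": f"./gradlew {gradle_path(path)}:lintFix",
        "language": "system",
        "files": f"^{path}/.*\\.py$",
        "pass_filenames": False,
    }


if __name__ == "__main__":
    hook = lint_fix_hook("metadata-ingestion-modules/airflow-plugin")
    print(hook["id"])
    print(hook["entry"])
    # The `files` regex decides which staged files trigger the hook.
    sample = "metadata-ingestion-modules/airflow-plugin/src/plugin.py"
    print(bool(re.search(hook["files"], sample)))
```

Note that the project path is interpolated into the `files` regex without escaping, so a directory name containing `.` would match slightly more than intended; none of the paths shown in this PR appear to be affected.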
9 changes: 9 additions & 0 deletions .github/scripts/pre-commit-override.yaml
@@ -0,0 +1,9 @@
repos:
  - repo: local
    hooks:
      - id: smoke-test-cypress-lint-fix
        name: smoke-test cypress Lint Fix
        entry: ./gradlew :smoke-test:cypressLintFix
        language: system
        files: ^smoke-test/tests/cypress/.*$
        pass_filenames: false
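How this override file ends up in the generated config can be sketched as follows. The function `merge_override` is a hypothetical standalone version of the merge loop in `HookGenerator.generate_config`, operating on already-parsed dictionaries instead of reading YAML from disk; the generated hook id and the second repo URL are made-up placeholders.

```python
def merge_override(config: dict, override: dict) -> dict:
    """Merge override repos into config, matching on the `repo` key."""
    for override_repo in override.get("repos", []):
        matching = next(
            (r for r in config["repos"] if r["repo"] == override_repo["repo"]),
            None,
        )
        if matching is not None:
            # Same repo (e.g. "local"): append the extra hooks after the generated ones.
            matching["hooks"].extend(override_repo.get("hooks", []))
        else:
            # Unknown repo: add the whole entry.
            config["repos"].append(override_repo)
    return config


if __name__ == "__main__":
    generated = {"repos": [{"repo": "local", "hooks": [{"id": "example-lint-fix"}]}]}
    override = {
        "repos": [
            {"repo": "local", "hooks": [{"id": "smoke-test-cypress-lint-fix"}]},
            # Placeholder repo, only to exercise the "no match" branch.
            {"repo": "https://example.invalid/hooks", "hooks": [{"id": "extra"}]},
        ]
    }
    merged = merge_override(generated, override)
    print([h["id"] for h in merged["repos"][0]["hooks"]])
    print([r["repo"] for r in merged["repos"]])
```

Since the override above declares `repo: local`, its cypress hook is appended to the same `local` repo entry that holds the generated hooks, after them in order.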