Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(View): enabling view in SemanticLayerSchema #1532

Closed
wants to merge 5 commits into from

Conversation

scaliseraoul
Copy link

@scaliseraoul scaliseraoul commented Jan 20, 2025


Important

Adds view support to SemanticLayerSchema with new constraints, validation logic, and documentation updates.

  • Behavior:
    • Adds view support in SemanticLayerSchema with constraints for column and relation formats.
    • Enforces mutual exclusivity between table and view in Source.
    • Validates column uniqueness and format in views.
  • Models:
    • Adds Relation class to define relationships in views.
    • Updates Source class to include view attribute.
  • Validation:
    • Adds check_columns_relations() in SemanticLayerSchema for view-specific validation.
    • Updates validate_type_and_fields() in Source for view constraints.
  • Documentation:
    • Updates semantic-layer.mdx with view configuration and constraints.
  • Tests:
    • Adds tests for view-specific validation in test_semantic_layer_schema.py.

This description was created by Ellipsis for cf142de. It will automatically update as commits are pushed.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 20, 2025
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to cf142de in 2 minutes and 14 seconds

More details
  • Looked at 338 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. tests/unit_tests/dataframe/test_semantic_layer_schema.py:113
  • Draft comment:
    The view field should be a boolean value, not a string. Use True instead of 'true'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    While technically the comment is correct that True would be more proper than "true", the code is working as intended since the validation layer handles the type conversion. The test passes with the current implementation. Making this change would be a minor style improvement but doesn't affect functionality.
    The comment points out a real style issue. Using proper boolean types is generally better practice than string representations of booleans.
    However, since the validation layer already handles this gracefully and the tests pass, this is a very minor style issue that doesn't impact functionality or readability significantly.
    Delete the comment. While technically correct, it's a minor style suggestion that doesn't materially improve the code, and the current implementation works correctly.
2. pandasai/data_loader/semantic_layer_schema.py:93
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
                raise ValueError("A schema can only define either 'table' or 'view', not both.")
  • Reason this comment was not posted:
    Marked as duplicate.
3. pandasai/data_loader/semantic_layer_schema.py:95
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
                raise ValueError("A schema must define either 'table' or 'view'.")
  • Reason this comment was not posted:
    Marked as duplicate.
4. pandasai/data_loader/semantic_layer_schema.py:156
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
            raise ValueError("Column names must be unique. Duplicate column names detected.")
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The original error message is already clear and grammatically correct. The suggested change is extremely minor - changing "names found" to "column names detected". This doesn't make the message meaningfully clearer or fix any actual problems. The suggestion feels like personal preference rather than a meaningful improvement.
    Perhaps in some contexts, being explicit about "column names" rather than just "names" could prevent ambiguity. The word "detected" may be more precise than "found".
    While those points are valid, the original message is already sufficiently clear in context - we're clearly talking about column names from the surrounding code, and "found" vs "detected" is purely stylistic.
    This comment should be removed as it suggests an extremely minor rewording that doesn't meaningfully improve clarity or fix any actual issues with the error message.
5. pandasai/data_loader/semantic_layer_schema.py:175
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
                    "All column names in a view must follow the format '[table].[column]'."
  • Reason this comment was not posted:
    Marked as duplicate.
6. pandasai/data_loader/semantic_layer_schema.py:183
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
                    "All 'from' and 'to' parameters in relations must follow the format '[table].[column]'."
  • Reason this comment was not posted:
    Marked as duplicate.
7. pandasai/data_loader/semantic_layer_schema.py:191
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
                    f"No relations defined for the following tables: {uncovered_tables}."
  • Reason this comment was not posted:
    Marked as duplicate.
8. pandasai/data_loader/semantic_layer_schema.py:195
  • Draft comment:
    The error message here can be improved for clarity. Consider using:
            "All column names in a table must not include a table prefix."
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_GNrarDy3W8RIJwod


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -120,6 +140,61 @@ class SemanticLayerSchema(BaseModel):
None, description="Frequency of dataset updates."
)

@model_validator(mode="after")
def check_columns_relations(self):
column_re_check = r"^[a-zA-Z_]+\.[a-zA-Z_]+$"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex ^[a-zA-Z_]+\.[a-zA-Z_]+$ does not account for numeric characters in table or column names. Consider updating it to ^[a-zA-Z0-9_]+\.[a-zA-Z0-9_]+$ to allow numeric characters.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellipsis-dev can you commit the changes?

pandasai/data_loader/semantic_layer_schema.py Outdated Show resolved Hide resolved
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
3. **Relationships for Views**:
- Each table referenced in `columns` must have at least one relationship defined in `relations`.
- Relationships must specify `from` and `to` attributes in the `[table].[column]` format.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well documented 💯

gventuri pushed a commit that referenced this pull request Jan 20, 2025
…icLayerSchema (#1534)

* fix(VirtualDataframe): fixing virtual dataframe name conflict

* feature(View): enabling view in SemanticLayerSchema

* Update pandasai/data_loader/semantic_layer_schema.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* address comments left by @scaliseraoul on #1532 (feature(View): enabling view in SemanticLayerSchema);

---------

Co-authored-by: Raoul <[email protected]>
Co-authored-by: Raoul Scalise <[email protected]>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 20, 2025
@gventuri gventuri closed this Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants