SQL/PGQ or even GQL support #13545

Open
gsvgit opened this issue Nov 24, 2024 · 4 comments
Labels
enhancement New feature or request

Comments


gsvgit commented Nov 24, 2024

Is your feature request related to a problem or challenge?

The SQL standard was recently extended with property graph query features (SQL/PGQ); see the ISO standard and its theoretical foundations. I wonder whether DataFusion could be extended with PGQ support.

Describe the solution you'd like

All parts should be extended. The most nontrivial part is the interplay between traditional SQL and graph analysis (path-related evaluation). While it is possible to store a graph in columnar storage (e.g. Apache Arrow), this may be inefficient for path-related queries, even though it is quite efficient for analytical queries over vertex attributes. So dedicated path indexes may be required. Moreover, in some cases it may be a good idea to store the graph topology separately in a specialized format (e.g. a sparse adjacency matrix, as FalkorDB does).
On the other hand, even if the graph is stored in columnar storage, linear-algebra primitives can be useful for path querying (see DuckPGQ: Efficient Property Graph Queries in an analytical RDBMS).

So the logical and physical plans should not only provide graph-specific operators, but also support choosing between data representations.
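
To make the linear-algebra view concrete, here is a minimal sketch in plain Python. It is an illustration only, not DataFusion, FalkorDB, or DuckPGQ code: the topology is kept as a sparse boolean adjacency matrix (a mapping from row index to the set of non-zero columns), and reachability is computed by repeatedly multiplying a boolean frontier vector by that matrix. The names `build_sparse_adjacency`, `vxm`, and `reachable` are hypothetical.

```python
from collections import defaultdict


def build_sparse_adjacency(edges):
    """Sparse boolean adjacency matrix: row index -> set of non-zero columns."""
    rows = defaultdict(set)
    for src, dst in edges:
        rows[src].add(dst)
    return rows


def vxm(frontier, adjacency):
    """Boolean vector-matrix product: vertices exactly one hop from the frontier."""
    result = set()
    for vertex in frontier:
        result |= adjacency.get(vertex, set())
    return result


def reachable(adjacency, source):
    """Vertices reachable from `source` by a path of one or more edges."""
    visited = set()
    frontier = {source}
    while frontier:
        # Mask out already-visited vertices so the loop terminates on cycles.
        frontier = vxm(frontier, adjacency) - visited
        visited |= frontier
    return visited


edges = [(0, 1), (1, 2), (2, 3), (4, 0)]
adjacency = build_sparse_adjacency(edges)
print(sorted(reachable(adjacency, 0)))
```

A real engine would use a compressed format such as CSR and a GraphBLAS-style library for the product; the point here is only that the path operator touches the topology and never the vertex attributes.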

Describe alternatives you've considered

No response

Additional context

Could something like this project be used as the physical layer for linear algebra?

Possible theoretical foundations.

It may be a first step to support GQL.

I'm interested in designing and developing such a system, but I'm aware that such an extension of DataFusion may amount to rebuilding the system. So I want to discuss this direction: should we extend DataFusion or create a new independent system?

gsvgit added the enhancement label Nov 24, 2024

gsvgit commented Dec 12, 2024

Possible steps.

  • Extend the SQL parser with PGQ support. Independent task. Technical.
  • Express PGQ in terms of existing operations. As shown in GQL and SQL/PGQ: Theoretical Models and Expressive Power, core PGQ can be translated to standard relational algebra (plus recursion). This may not be optimal in terms of performance, but it can serve as a baseline for analysis and further optimization. Technical.
  • Introduce advanced techniques, for example using linear algebra as the physical implementation of some operations, with a corresponding extension of the logical plan. This is a research topic.

So the first and second steps can be done as technical tasks, and after that the possible directions should be discussed.
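
As a rough illustration of the second step, the sketch below evaluates a one-or-more-hops path pattern with a plain recursive CTE. It runs on SQLite through Python's standard `sqlite3` module purely for convenience; it is not DataFusion code, the `person`/`knows` schema is invented for the example, and the PGQ query shown in the comment is an approximation of the standard's syntax, not something verified against a parser.

```python
import sqlite3

# Hypothetical SQL/PGQ query that this sketch stands in for (syntax approximate):
#   SELECT * FROM GRAPH_TABLE (social
#     MATCH (a:Person)-[:knows]->+(b:Person)
#     COLUMNS (a.name AS src, b.name AS dst))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER, dst INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")

# Relational translation: base case = single edges, recursive case = extend a
# known path by one more edge. UNION (not UNION ALL) removes duplicates, which
# also guarantees termination on cyclic graphs.
REACH_SQL = """
WITH RECURSIVE reach(src, dst) AS (
    SELECT src, dst FROM knows
    UNION
    SELECT r.src, k.dst FROM reach AS r JOIN knows AS k ON r.dst = k.src
)
SELECT a.name, b.name
FROM reach
JOIN person AS a ON a.id = reach.src
JOIN person AS b ON b.id = reach.dst
ORDER BY a.name, b.name
"""

pairs = conn.execute(REACH_SQL).fetchall()
print(pairs)
```

A translation like this gives a correctness baseline that later, more specialized physical operators can be compared against.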

Additionally, for discussion: TenSQL: An SQL Database Built on GraphBLAS shows that using linear algebra at the physical level can improve the performance of some SQL queries in some cases.

@georgiy-belyanin

As mentioned before, SQL/PGQ could be expressed with relational algebra operations plus recursion.

However, as mentioned in the issue on recursive CTEs (#462), they tend to be quite slow. Implementing a graph query language on top of this feature may require performance improvements so that the consecutive JOINs execute in reasonable time.
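
To show where those consecutive JOINs come from, here is a small sketch of semi-naive evaluation of a transitive closure in plain Python. It is a generic textbook-style illustration, not a description of how DataFusion executes recursive CTEs; the function name and the chain graph are made up for the example.

```python
def transitive_closure_seminaive(edges):
    """Return (closure, rounds): all (src, dst) pairs joined by a path,
    and how many join rounds were needed to reach the fixpoint."""
    # Hash index on the edge relation, playing the role of the join build side.
    by_src = {}
    for src, dst in edges:
        by_src.setdefault(src, set()).add(dst)

    closure = set(edges)
    delta = set(edges)  # pairs discovered in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # One JOIN per round: only the newly found pairs are extended by an edge.
        joined = {(s, d2) for (s, d) in delta for d2 in by_src.get(d, ())}
        delta = joined - closure
        closure |= delta
    return closure, rounds


# A simple chain 0 -> 1 -> 2 -> 3 -> 4 -> 5 (hypothetical data).
chain = [(i, i + 1) for i in range(5)]
closure, rounds = transitive_closure_seminaive(chain)
print(len(closure), rounds)
```

The number of rounds grows with the longest path, and each round is a join plus a deduplication step, which is why the cost of a single iteration matters so much for graph workloads.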


gsvgit commented Dec 12, 2024

Implementing one of the graph querying languages using this feature may require performance improvements to make consequent JOINs execute in decent time.

The approach in On the Optimization of Recursive Relational Queries: Application to Graph Queries may be a good way to optimize recursive queries, at least in the context of graph querying.


gsvgit commented Jan 16, 2025

My presentation on this topic for the DataFusion community meeting: SemyonGrigorev_DataFusion_PGQ.pdf
