[C++] Implement ODBC driver "wrapper" using FlightSQL #30622

Open
asfimport opened this issue Dec 15, 2021 · 19 comments

@asfimport
Collaborator

asfimport commented Dec 15, 2021

The ODBC analogue to ARROW-7744

Reporter: Wes McKinney / @wesm

Related issues:

Note: This issue was originally created as ARROW-15111. Please see the migration documentation for further details.

@asfimport
Collaborator Author

David Li / @lidavidm:
For reference, see https://github.com/dremio/warpdrive, which was recently released (GPL2, though).

@spencerwilson

spencerwilson commented Oct 20, 2023

Possibly related to dremio/warpdrive:

@alinaliBQ

https://github.com/dremio/flightsql-odbc should be Apache 2.0-licensed. My understanding is that it is a data source client that works with Apache Arrow Flight SQL, so one should be able to develop an ODBC driver on top of this data source client.

Would cpp/src/arrow/flight/sql be an appropriate subdirectory for the ODBC driver? James and I think it looks like the right place, but please let us know if the proposal makes sense.
cc @jduo

@alinaliBQ

Hello @wesm and @lidavidm, would either of you mind taking a look and letting me know whether cpp/src/arrow/flight/sql is an appropriate subdirectory for the ODBC driver?
Thank you.

@lidavidm
Member

lidavidm commented Dec 4, 2023

@alinaliBQ how about cpp/src/flightsql_odbc or similar?

@alinaliBQ

alinaliBQ commented Dec 5, 2023

@lidavidm Sure, cpp/src/flightsql_odbc makes sense to me. Ty.
I have another question. We are looking to build a new ODBC driver for Flight SQL that can be part of the Arrow project. It would reuse parts of the Amazon Timestream ODBC driver and of the Flight SQL ODBC driver, flightsql-odbc (written by Dremio), both of which are open source and Apache 2.0-licensed. Are there any questions or concerns about using those drivers?

@lidavidm
Member

lidavidm commented Dec 5, 2023

Do you mean that you plan to import or copy large chunks of one or both projects, or do you mean that you plan to use them as dependencies? If the former, I think depending on how much is copied we may have to think about IP clearance, but it's not clear to me what the threshold is.

@alinaliBQ

We're planning the former, so large chunks of both projects will be used. We're still in the design stage, so things may change later. The current plan is that flightsql-odbc will be used mostly as-is, apart from changes to conform to the Arrow coding guidelines. From the Amazon Timestream driver, only the ODBC function entry-point code will be used, adapted to call into flightsql-odbc classes.

@lidavidm
Copy link
Member

lidavidm commented Dec 5, 2023

Ok. I would encourage you to submit a PR ASAP even if it is not complete so we can do as much development as possible in Apache repos. The guidelines state IP clearance is needed when most of the development is done outside of Apache repos, so submitting a single large PR (as with the JDBC driver) might mean we want to do an IP clearance to be safe; submitting PRs here early and often would help avoid that.

In any case, flightsql-odbc is probably enough code that we might want to do IP clearance anyway. However, please discuss this on dev@ so others can chime in.

@alinaliBQ

alinaliBQ commented Dec 5, 2023

We were thinking of submitting the PR early as well. Our initial plan is to submit the PR when the driver is able to connect the Flight SQL ODBC driver, with irrelevant code pruned. James has let me know that the community has indicated the PR for Timestream ODBC + flightsql-odbc can be sent even if the driver doesn't compile, since its purpose is to start the IP scanning process, so we can go with that. I have also written to dev@ to ask for others' opinions on this matter: https://lists.apache.org/thread/t1r3pntpzoxdncgoj5f581hxyyl19bkl.

@laurentgo
Collaborator

> Possibly related to dremio/warpdrive:
>
> * https://docs.dremio.com/current/sonar/client-applications/drivers/arrow-flight-sql-odbc-driver
>   * the linked [driver download page](https://www.dremio.com/drivers/odbc/) indicates that its license is some version of LGPL, but I can't find a link to its source code

The driver is a combination of ASL 2.0 and LGPL. The LGPL license is available at https://github.com/dremio/warpdrive/blob/master/license.txt

@alinaliBQ

> The driver is a combination of ASL 2.0 and LGPL. The LGPL license is available at https://github.com/dremio/warpdrive/blob/master/license.txt

Thank you @laurentgo for explaining.

Just to clarify, we will not be using any LGPL code from warpdrive (https://github.com/dremio/warpdrive) to develop the driver. We plan to make a Flight SQL ODBC driver that is fully ASL 2.0-licensed so we can contribute it back to the Apache community.

@alinaliBQ

Hi @lidavidm, our team's implementation is currently being done in our own Arrow fork. Do you know of any Arrow community members who would be interested in reviewing our incremental PRs for the ODBC driver? If so, we could assign them as reviewers so they know which PRs to look at. We would appreciate additional pairs of eyes from the community. Please let me know if you have any questions.

@kou
Member

kou commented Dec 14, 2023

Could you open a PR to apache/arrow instead of your fork?

@wesm
Member

wesm commented Dec 14, 2023

We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)?

@kou
Member

kou commented Dec 14, 2023

It seems that you already have many changes.
Can we break them down into small pieces and proceed step by step, as we did when implementing the Google Cloud Storage and Azure Blob Storage file systems?

@kou
Member

kou commented Dec 14, 2023

> We could create a branch on apache/arrow so that PRs do not have to go into the main branch (in case things are unstable)?

We can do it, but we may not need to, because this module will have its own build option, such as ARROW_FLIGHT_SQL_ODBC, that is OFF by default. If this module isn't built by default, we don't need to worry about its stability.

We require IP clearance for this, right? We can't merge the first PR into apache/arrow until the clearance is complete.
I think we should avoid developing outside of apache/arrow as much as possible, so we should focus on the IP clearance rather than on development for now.

@lidavidm
Member

I agree with Kou. We can consider a branch if needed but since this should be reasonably fenced off from the rest of the codebase, it should be OK to just have it on main.

@alinaliBQ, for your original question, please tag me to start with, and we can pull in more people as needed.

@devozerov
Member

Our team has some experience using Dremio's ODBC driver to connect to a custom Arrow Flight endpoint (a Trino fork with Arrow Flight SQL support). We were also considering taking Dremio's Apache 2.0 code as a base and building a fully-fledged driver. I am very happy that some of this work is already being done in the Arrow community. If needed, we can help with the review.
