Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Arrow Flight SQL ODBC driver #40939

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jbonofre
Copy link
Member

@jbonofre jbonofre commented Apr 2, 2024

Rationale for this change

An ODBC driver uses the Open Database Connectivity (ODBC) interface that allow applications to access data in DBMS like environment using SQL.
As Arrow Flight provides JDBC and ADBC drivers, similarly Arrow Flight can provide ODBC driver.

What changes are included in this PR?

This PR adds an ODBC driver implementation.

Are these changes tested?

This ODBC driver is coming from a production system (SGA) that is tested/ran for a while.

Are there any user-facing changes?

No change, but new user option to use Arrow Flight.

@jbonofre jbonofre requested a review from lidavidm as a code owner April 2, 2024 05:59
@jbonofre jbonofre marked this pull request as draft April 2, 2024 05:59
Copy link

github-actions bot commented Apr 2, 2024

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@jbonofre
Copy link
Member Author

jbonofre commented Apr 2, 2024

@ianmcook @lidavidm @assignUser as discussed during the community meetings, here's the ODBC driver PR draft.

This PR includes:

  • sources files with legal fixes (I'm checking whoami source dual license/MIT, eventually alternatives)
  • build convenient files (like Brewfile)

I'm still working on:

  • finalize Arrow update (the original code was using Arrow 9.x)
  • finalize the build and cleanup the CMakeLists.txt files

I will also create a GH Issue to track this addition.

Copy link
Member

@jduo jduo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @jbonofre !

Some of the code in odbc_impl should be updated to use the same naming conventions as the arrow project (eg ODBCConnection -> odbc_connection).

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Need to add the license for whereami to the Arrow project itself.

Are the separate brewfile and vcpkg.json necessary or can this utilize the ones already at the top-level?

Should the top-level CMakeLists now include the driver build? Or is that on hold as part of the donation process?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 4, 2024
void Create();

/**
* @copedoc ignite::odbc::system::ui::CustomWindow::OnCreate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@assignUser
Copy link
Member

finalize Arrow update (the original code was using Arrow 9.x)

I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

@jbonofre
Copy link
Member Author

jbonofre commented Apr 5, 2024

Hi @jduo !

Some of the code in odbc_impl should be updated to use the same naming conventions as the arrow project (eg ODBCConnection -> odbc_connection).

Yes, I'm doing that (that's why the PR is still a draft).

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Yes, locally I fixed/updated it, but I'm looking for alternative.

Need to add the license for whereami to the Arrow project itself.

I'm checking if we can't find an alternative, else I will update NOTICE file.

Are the separate brewfile and vcpkg.json necessary or can this utilize the ones already at the top-level?

Yes, that's the plan, I'm doing in two steps: fixing the build and integrating in the Arrow ecosystem.

Should the top-level CMakeLists now include the driver build? Or is that on hold as part of the donation process?

Agree, same the same as for brewfile/vcpkg.json: I'm doing that in two steps.

I will push new changes to this PR and "remove" the draft state as soon as it's clean to review (build ok, etc).

@jbonofre
Copy link
Member Author

jbonofre commented Apr 5, 2024

finalize Arrow update (the original code was using Arrow 9.x)

I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

No worries, I'm working on it :) I will keep you posted when good for review (when I will remove the draft flag).

@jduo
Copy link
Member

jduo commented Apr 10, 2024

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Yes, locally I fixed/updated it, but I'm looking for alternative.

There was another group that did some work to build a full driver from flightsql-odbc. I haven't seen it though, but they've mentioned it here:
#30622 (comment)

@lidavidm
Copy link
Member

lidavidm commented Jul 2, 2024

Following up here - do you want any help?

@jduo
Copy link
Member

jduo commented Jul 16, 2024

@jbonofre checking in on this. Were you still continuing work on this? Did you need help?
Thanks

@jbonofre
Copy link
Member Author

@jduo yes, I'm back on it (sorry I was busy on Iceberg and ASF stuff). Let me do a new rebase/build/update and I will ping you for review/help.

Thanks ! And sorry again for delay :/

@jbonofre
Copy link
Member Author

jbonofre commented Sep 3, 2024

@lidavidm @jduo hey guys. I'm very very sorry to have been quiet for long. Between vacation and ASF tasks, I was not able to be active enough on Arrow. I'm now back and resuming my work on Arrow, including ODBC.

@lidavidm
Copy link
Member

lidavidm commented Sep 3, 2024

No worries! Glad to see you back around :)

@rpourzand
Copy link

Hey @jbonofre we're excited about the AFS ODBC driver! We want to start using it as soon as we can for a project we're working on and was wondering whether you had a sense of timing where it might be ready to be in someone's hands for development. We would be happy to be early adopters :)

@jduo
Copy link
Member

jduo commented Sep 4, 2024

Thanks @jbonofre . We're around if you need assistance moving this forward.

@jduo
Copy link
Member

jduo commented Oct 1, 2024

Hi @jbonofre , just checking in on if we can help to move this forward.

@jbonofre
Copy link
Member Author

jbonofre commented Oct 2, 2024

Hey @jduo
I started the recase. I will push this.
Would you have time for a Quick sync ?
Thanks !

@aiguofer
Copy link
Contributor

aiguofer commented Oct 2, 2024

I'm curious, does this version of the driver allow passing arbitrary parameters? I tried using the Dremio ODBC driver but it can't seem to be able to set arbitrary parameters. I have the following in a custom PowerBI connector:

        // This record contains all of the connection string properties we
        // will set for this ODBC driver. The 'Driver' field is required for
        // all ODBC connections. Other properties will vary between ODBC drivers,
        // but generally take Server and Database properties. Note that
        // credential related properties will be set separately.
        ConnectionString = [
            Driver = "Arrow Flight SQL ODBC Driver",
            host = server,
            port = 443,
            environmentId = "200532"
        ],

But i get the following error: ODBC: ERROR [HY000] [Apache Arrow][Flight SQL] (100) Flight returned invalid argument error, with message: environmentId is a required connection parameter

For our use-case we need to be able to pass various parameters as extra headers, which is supported by both ADBC and JDBC.

@jduo
Copy link
Member

jduo commented Oct 2, 2024

I'm curious, does this version of the driver allow passing arbitrary parameters? I tried using the Dremio ODBC driver but it can't seem to be able to set arbitrary parameters. I have the following in a custom PowerBI connector:

        // This record contains all of the connection string properties we
        // will set for this ODBC driver. The 'Driver' field is required for
        // all ODBC connections. Other properties will vary between ODBC drivers,
        // but generally take Server and Database properties. Note that
        // credential related properties will be set separately.
        ConnectionString = [
            Driver = "Arrow Flight SQL ODBC Driver",
            host = server,
            port = 443,
            environmentId = "200532"
        ],

But i get the following error: ODBC: ERROR [HY000] [Apache Arrow][Flight SQL] (100) Flight returned invalid argument error, with message: environmentId is a required connection parameter

For our use-case we need to be able to pass various parameters as extra headers, which is supported by both ADBC and JDBC.

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this) -- is your server looking for environmentId case-insensitively?

@jduo
Copy link
Member

jduo commented Oct 2, 2024

Hey @jduo I started the recase. I will push this. Would you have time for a Quick sync ? Thanks !

Hi @jbonofre , yes we can have a chat. Will send meeting info offline.

@aiguofer
Copy link
Contributor

aiguofer commented Oct 2, 2024

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this) -- is your server looking for environmentId case-insensitively?

Yeah it's case-insensitive. I also tried setting environmentid in all lower-case and still get the same error

@aiguofer
Copy link
Contributor

aiguofer commented Oct 2, 2024

Interesting, I do see:

TEST(PopulateCallOptionsTest, GenericOption) {
  FlightSqlConnection connection(odbcabstraction::V_3);
  connection.SetClosed(false);

  Connection::ConnPropertyMap properties;
  properties["Foo"] = "Bar";
  auto options = connection.PopulateCallOptions(properties);
  auto headers = options.headers;
  ASSERT_EQ(1, headers.size());

  // Header name must be lower-case because gRPC will crash if it is not lower-case.
  ASSERT_EQ("foo", headers[0].first);

  // Header value should preserve case.
  ASSERT_EQ("Bar", headers[0].second);
}

Maybe there's a bug with call options not getting set correctly for the HANDSHAKE request? I remember that being an issue with the JDBC driver: #33946

@CurtHagenlocher
Copy link
Contributor

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this)

This is actually part of the HTTP/2 spec.

@aiguofer
Copy link
Contributor

aiguofer commented Oct 2, 2024

Ok, I think I know what it is... PopulateCallOptions gets called AFTER Authenticate in FlightSqlConnection::Connect. We'd probably need that before.

@jbonofre jbonofre force-pushed the FLIGHT_SQL_ODBC branch 2 times, most recently from 4dd325d to 7667e55 Compare October 28, 2024 13:06
@jbonofre
Copy link
Member Author

@aiguofer good call. Let me update the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants