From 5d52362f24a7d4079478371a6f5d37880f63b427 Mon Sep 17 00:00:00 2001
From: David Li
Date: Sat, 25 May 2024 11:41:49 +0900
Subject: [PATCH 01/26] WIP: [Format] Add Other canonical extension type
---
docs/source/format/CanonicalExtensions.rst | 53 ++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index c258f889dc6ac..0ed2874ddfe89 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -283,6 +283,59 @@ UUID
A specific UUID version is not required or guaranteed. This extension represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
+Other
+=====
+
+Other represents a type or array that one Arrow-based system received from an
+external (likely non-Arrow) system, but cannot interpret itself. In this
+case, the Other type explicitly communicates the name and presence of a field
+to downstream clients.
+
+For example:
+
+* A Flight SQL service may support connecting external databases. In this
+ case, its catalog (``GetTables`` etc.) should reflect the names and types of
+ tables in external databases. These tables may support types it does not
+ recognize. Instead of erroring or silently dropping columns from the
+ catalog, it can use the Other[NA] type to report that a column exists with a
+ particular name and type name in the external database; the Other type lets
+ clients know that the column is not supported, but still exists.
+
+* The ADBC PostgreSQL driver, because of how the PostgreSQL wire protocol
+ works, may get bytes for a field whose type it does not recognize (say, a
+ geospatial type). It can still return the bytes to the application which
+ may be able to parse the data itself. In that case, it can use the
+ Other[binary] type to return the column data. The Other type differentiates
+ the column from actual binary columns.
+
+Extension parameters:
+
+* Extension name: ``arrow.other``.
+
+* The storage type of this extension is any type. If there is no underlying
+ data, the storage type should be NA. If there is data (because the system
+ got bytes or some other data it does not know how to interpret), the storage
+ type should preferably be binary or fixed-size binary, but may be any type.
+
+* Extension type parameters:
+
+ * **type_name** = the name of the unknown type in the external system.
+
+* Description of the serialization:
+
+ A valid JSON object containing the parameters as fields. In the future,
+ additional fields may be added, but all fields current and future are never
+ required to interpret the array.
+
+ For example:
+
+ - The PostgreSQL ``polygon`` type may be represented as Other[binary] with
+ metadata ``{"type_name": "polygon"}``.
+ - The PostgreSQL ``point`` type may be represented as
+ Other[fixed_size_binary[16]] with metadata ``{"type_name": "point"}``.
+ - A Flight SQL service may return an array type as Other[NA] with metadata
+ ``{"type_name": "ARRAY"}``.
+
=========================
Community Extension Types
=========================
From d3cd22f376463122a2ae4a11ac288cacdfa67e1a Mon Sep 17 00:00:00 2001
From: David Li
Date: Tue, 28 May 2024 00:37:56 -0400
Subject: [PATCH 02/26] Feedback
---
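Note on the serialization: the metadata examples in this patch are JSON
objects with ``type_name`` and ``vendor_name`` string fields. A minimal,
stdlib-only sketch of producing such a string is below. The class and method
names (``OtherMetadataSketch``, ``quote``, ``serialize``) are hypothetical and
are not part of Arrow; a real implementation would more likely use a JSON
library.

```java
// Hypothetical helper, not part of Arrow: builds the extension metadata string
// in the shape shown in the documentation examples.
final class OtherMetadataSketch {
  private OtherMetadataSketch() {}

  /** Quote a string as a JSON string literal, escaping what JSON requires. */
  static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':
          sb.append("\\\"");
          break;
        case '\\':
          sb.append("\\\\");
          break;
        case '\n':
          sb.append("\\n");
          break;
        case '\r':
          sb.append("\\r");
          break;
        case '\t':
          sb.append("\\t");
          break;
        default:
          if (c < 0x20) {
            // Remaining control characters must be written as \\uXXXX escapes.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  /** Serialize the two documented parameters as a JSON object. */
  static String serialize(String typeName, String vendorName) {
    return "{\"type_name\": " + quote(typeName) + ", \"vendor_name\": " + quote(vendorName) + "}";
  }

  public static void main(String[] args) {
    // Prints {"type_name": "polygon", "vendor_name": "PostgreSQL"}
    System.out.println(serialize("polygon", "PostgreSQL"));
  }
}
```

Because the documentation says the fields are never required to interpret the
array, a reader of this metadata should tolerate missing or additional fields.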
docs/source/format/CanonicalExtensions.rst | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index 0ed2874ddfe89..d632748a7a06f 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -297,9 +297,9 @@ For example:
case, its catalog (``GetTables`` etc.) should reflect the names and types of
tables in external databases. These tables may support types it does not
recognize. Instead of erroring or silently dropping columns from the
- catalog, it can use the Other[NA] type to report that a column exists with a
- particular name and type name in the external database; the Other type lets
- clients know that the column is not supported, but still exists.
+ catalog, it can use the Other[Null] type to report that a column exists with
+ a particular name and type name in the external database; the Other type
+ lets clients know that the column is not supported, but still exists.
* The ADBC PostgreSQL driver, because of how the PostgreSQL wire protocol
works, may get bytes for a field whose type it does not recognize (say, a
@@ -320,6 +320,7 @@ Extension parameters:
* Extension type parameters:
* **type_name** = the name of the unknown type in the external system.
+ * **vendor_name** = the name of the external system.
* Description of the serialization:
@@ -330,11 +331,12 @@ Extension parameters:
For example:
- The PostgreSQL ``polygon`` type may be represented as Other[binary] with
- metadata ``{"type_name": "polygon"}``.
+ metadata ``{"type_name": "polygon", "vendor_name": "PostgreSQL"}``.
- The PostgreSQL ``point`` type may be represented as
- Other[fixed_size_binary[16]] with metadata ``{"type_name": "point"}``.
- - A Flight SQL service may return an array type as Other[NA] with metadata
- ``{"type_name": "ARRAY"}``.
+ Other[fixed_size_binary[16]] with metadata
+ ``{"type_name": "point", "vendor_name": "PostgreSQL"}``.
+ - A Flight SQL service may return an array type as Other[Null] with metadata
+ ``{"type_name": "varray", "vendor_name": "Oracle"}``.
=========================
Community Extension Types
From b486b3607fa4fae41195098da0a07e635e03c36f Mon Sep 17 00:00:00 2001
From: David Li
Date: Wed, 29 May 2024 01:08:16 -0400
Subject: [PATCH 03/26] Update example
---
docs/source/format/CanonicalExtensions.rst | 34 ++++++++++++++++------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index d632748a7a06f..ff3821be08d00 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -295,11 +295,11 @@ For example:
* A Flight SQL service may support connecting external databases. In this
case, its catalog (``GetTables`` etc.) should reflect the names and types of
- tables in external databases. These tables may support types it does not
- recognize. Instead of erroring or silently dropping columns from the
- catalog, it can use the Other[Null] type to report that a column exists with
- a particular name and type name in the external database; the Other type
- lets clients know that the column is not supported, but still exists.
+ tables in external databases. But those external systems may support types
+ it does not recognize. Instead of erroring or silently dropping columns
+ from the catalog, it can use the Other[Null] type to report that a column
+ exists with a particular name and type name in the external database; this
+ lets clients know that a column exists, but is not supported.
* The ADBC PostgreSQL driver, because of how the PostgreSQL wire protocol
works, may get bytes for a field whose type it does not recognize (say, a
@@ -308,12 +308,20 @@ For example:
Other[binary] type to return the column data. The Other type differentiates
the column from actual binary columns.
+Of course, the intermediate system *could* implement a custom extension type
+for these example types. But there is no way in general that every type can
+be known in advance. In such cases, the Other type allows the system to
+explicitly note that it does not support some type or field, without silently
+losing data or sending irrelevant errors. It could also pretend to support
+the types by making up extension types on the fly. But this misleads
+downstream systems who cannot tell if the type is supported or not.
+
Extension parameters:
* Extension name: ``arrow.other``.
* The storage type of this extension is any type. If there is no underlying
- data, the storage type should be NA. If there is data (because the system
+ data, the storage type should be Null. If there is data (because the system
got bytes or some other data it does not know how to interpret), the storage
type should preferably be binary or fixed-size binary, but may be any type.
@@ -332,12 +340,20 @@ Extension parameters:
- The PostgreSQL ``polygon`` type may be represented as Other[binary] with
metadata ``{"type_name": "polygon", "vendor_name": "PostgreSQL"}``.
- - The PostgreSQL ``point`` type may be represented as
- Other[fixed_size_binary[16]] with metadata
- ``{"type_name": "point", "vendor_name": "PostgreSQL"}``.
+ - The PostGIS ``geometry`` type may be represented as Other[binary] with
+ metadata ``{"type_name": "geometry", "vendor_name": "PostGIS"}``.
- A Flight SQL service may return an array type as Other[Null] with metadata
``{"type_name": "varray", "vendor_name": "Oracle"}``.
+ Applications **should not** try to make conventions around vendor_name and
+ type_name. In other words, if there is an Other type that multiple systems
+ want to support, instead of agreeing on using particular parameters of the
+ Other type they should create a formal extension type. The parameters of
+ the Other type are primarily meant for human operators to understand what
+ type was not supported. Applications may choose to interpret these fields
+ regardless but should be prepared for breakage (if for example the type
+ becomes formally supported).
+
=========================
Community Extension Types
=========================
From 194bec6e09701262587f39f0310b3fa72c8c6a43 Mon Sep 17 00:00:00 2001
From: David Li
Date: Wed, 12 Jun 2024 21:19:16 -0400
Subject: [PATCH 04/26] revise
---
docs/source/format/CanonicalExtensions.rst | 161 ++++++++++++++-------
1 file changed, 108 insertions(+), 53 deletions(-)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index ff3821be08d00..e67ee6fd6f4b6 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -283,47 +283,22 @@ UUID
A specific UUID version is not required or guaranteed. This extension represents
UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
-Other
-=====
-
-Other represents a type or array that one Arrow-based system received from an
-external (likely non-Arrow) system, but cannot interpret itself. In this
-case, the Other type explicitly communicates the name and presence of a field
-to downstream clients.
-
-For example:
-
-* A Flight SQL service may support connecting external databases. In this
- case, its catalog (``GetTables`` etc.) should reflect the names and types of
- tables in external databases. But those external systems may support types
- it does not recognize. Instead of erroring or silently dropping columns
- from the catalog, it can use the Other[Null] type to report that a column
- exists with a particular name and type name in the external database; this
- lets clients know that a column exists, but is not supported.
-
-* The ADBC PostgreSQL driver, because of how the PostgreSQL wire protocol
- works, may get bytes for a field whose type it does not recognize (say, a
- geospatial type). It can still return the bytes to the application which
- may be able to parse the data itself. In that case, it can use the
- Other[binary] type to return the column data. The Other type differentiates
- the column from actual binary columns.
-
-Of course, the intermediate system *could* implement a custom extension type
-for these example types. But there is no way in general that every type can
-be known in advance. In such cases, the Other type allows the system to
-explicitly note that it does not support some type or field, without silently
-losing data or sending irrelevant errors. It could also pretend to support
-the types by making up extension types on the fly. But this misleads
-downstream systems who cannot tell if the type is supported or not.
+Unknown
+=======
+
+Unknown represents a type or array that an Arrow-based system received from an
+external (often non-Arrow) system, which it cannot interpret itself or did not
+have support for in advance. In this case, it can pass on Unknown to its own
+clients to communicate that a field exists, but that it cannot interpret the
+field or data.
Extension parameters:
-* Extension name: ``arrow.other``.
+* Extension name: ``arrow.unknown``.
* The storage type of this extension is any type. If there is no underlying
- data, the storage type should be Null. If there is data (because the system
- got bytes or some other data it does not know how to interpret), the storage
- type should preferably be binary or fixed-size binary, but may be any type.
+ data, the storage type should be Null. If there is data, the storage type
+ should preferably be binary or fixed-size binary, but may be any type.
* Extension type parameters:
@@ -336,23 +311,103 @@ Extension parameters:
additional fields may be added, but all fields current and future are never
required to interpret the array.
- For example:
-
- - The PostgreSQL ``polygon`` type may be represented as Other[binary] with
- metadata ``{"type_name": "polygon", "vendor_name": "PostgreSQL"}``.
- - The PostGIS ``geometry`` type may be represented as Other[binary] with
- metadata ``{"type_name": "geometry", "vendor_name": "PostGIS"}``.
- - A Flight SQL service may return an array type as Other[Null] with metadata
- ``{"type_name": "varray", "vendor_name": "Oracle"}``.
-
- Applications **should not** try to make conventions around vendor_name and
- type_name. In other words, if there is an Other type that multiple systems
- want to support, instead of agreeing on using particular parameters of the
- Other type they should create a formal extension type. The parameters of
- the Other type are primarily meant for human operators to understand what
- type was not supported. Applications may choose to interpret these fields
- regardless but should be prepared for breakage (if for example the type
- becomes formally supported).
+Examples:
+
+* Consider a Flight SQL service that supports connecting external databases.
+ Its clients may request the names and types of columns of tables in those
+ databases, but then there may be types that the Flight SQL service does not
+ recognize, due to lack of support or because those systems have their own
+ extensions or user-defined types.
+
+ The Flight SQL service can use the Unknown[Null] type to report that a
+ column exists with a particular name and type name in the external database.
+ This lets clients know that a column exists, but is not supported. Null is
+ used as the storage type here because only schemas are involved.
+
+ The client would presumably not be able to query such columns from the
+ Flight SQL service, but there may be other columns in the table that it
+ could query, or it could prepare a query that references the unknown column
+ in an expression and produces a result that *is* supported. The Unknown
+ type is a better experience than erroring or silently dropping columns from
+ the catalog.
+
+ An example of the extension metadata would be::
+
+ {"type_name": "varray", "vendor_name": "Oracle"}
+
+* The ADBC PostgreSQL driver may get bytes for a field whose type it does not
+ recognize. This is because of how PostgreSQL and its wire protocol work:
+ the driver will always get bytes for fields and must implement support for
+ all potential types to interpret those bytes. But the driver cannot know
+ about all types in advance, as there may be extensions (e.g. PostGIS for
+ geospatial functionality).
+
+ Because the driver still has the raw bytes, it can use Unknown[Binary] to
+ return those bytes to the application, which may be able to parse the data
+ itself. Unknown differentiates the column from an actual binary column.
+
+ An example of the extension metadata would be::
+
+ {"type_name": "geometry", "vendor_name": "PostGIS"}
+
+* The ADBC PostgreSQL driver may also get bytes for a field whose type it can
+ only partially recognize. For example, PostgreSQL supports `composite types
+ `_ that ascribe new
+ semantics to existing types, somewhat like Arrow extension types.
+
+ The driver would be able to parse the underlying type in this case.
+ However, the driver may still with to use the Unknown type. Consider the
+ example in the PostgreSQL documentation above of a ``complex`` type. Just
+ mapping the type to a plain Arrow ``struct`` type would lose the semantics
+ of that custom type. In this case, the driver can use Unknown[Struct]. The
+ driver would never actually be able to directly support the type in this
+ example, since these types are defined by database administrators, not by
+ the developers.
+
+ An example of the extension metadata would be::
+
+ {"type_name": "database_name.schema_name.complex", "vendor_name": "PostgreSQL"}
+
+* The JDBC adapter in the Arrow Java libraries converts JDBC result sets into
+ Arrow arrays, and can also derive Arrow schemas from result sets. JDBC,
+ however, allows drivers to return `arbitrary Java objects
+ `_.
+
+ Currently, the JDBC adapter simply errors, making usage of the adapter a
+ minefield where results are all-or-nothing, even if an application just
+ wants to fetch a schema. Instead, the driver could use Unknown[Null] as a
+ placeholder during schema conversion, only erroring if the application tries
+ to fetch the actual data. That way, clients could at least introspect
+ tables and queries to decide whether it can proceed to query the data, or
+ only query certain columns.
+
+ An example of the extension metadata would be::
+
+ {"type_name": "OTHER", "vendor_name": "JDBC driver name"}
+
+Of course, the intermediate system *could* implement custom extension types in
+these cases. But there is no way that every type can be known in advance, as
+discussed specifically for each example. In such cases, the Unknown type
+allows the system to explicitly note that it does not support some type or
+field, without silently losing data or sending irrelevant errors.
+
+Another option would be to pretend to support the type and make up new
+extension types on the fly. But this misleads downstream systems who cannot
+tell if the type is truly supported or not by the intermediate Arrow
+application (the Flight SQL service, or the JDBC adapter, etc.), which
+particularly matters for the Flight SQL and JDBC examples.
+
+Applications **should not** make conventions around vendor_name and type_name.
+In other words, if there is a type that multiple systems want to support, they
+should create a formal extension type. They *should not* try to agree on
+particular parameters of the Unknown type to recognize. These parameters are
+primarily meant for human end users to understand what type was not supported.
+Of course, applications may choose to interpret these fields regardless but
+must be prepared for breakage (if for example the type becomes formally
+supported with a custom extension type in a later software revision).
+
+Unknown is not about file formats. Considerations such as JSON or other file
+formats, or MIME types, are irrelevant.
=========================
Community Extension Types
From 811590ba434c0dc1af2f3041f55d9325dd72da91 Mon Sep 17 00:00:00 2001
From: David Li
Date: Fri, 14 Jun 2024 03:48:44 -0400
Subject: [PATCH 05/26] Revise
---
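Note on the rationale: the Flight SQL example argues that a client can tell
from Unknown which columns it should not try to query directly. A minimal
sketch of that client-side check follows. It assumes the schema is modelled as
a plain map from column name to field metadata, which is a simplification and
not the Arrow Java ``Schema`` API. The class and method names are
hypothetical. ``ARROW:extension:name`` and ``ARROW:extension:metadata`` are
the standard Arrow field metadata keys for extension types, and
``arrow.unknown`` is the extension name used in this series.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper, not part of Arrow.
final class QueryableColumnsSketch {
  static final String EXTENSION_NAME_KEY = "ARROW:extension:name";
  static final String EXTENSION_METADATA_KEY = "ARROW:extension:metadata";
  static final String UNKNOWN_EXTENSION_NAME = "arrow.unknown";

  private QueryableColumnsSketch() {}

  /** Return the columns that the intermediate system did not mark as Unknown. */
  static List<String> supportedColumns(Map<String, Map<String, String>> schema) {
    List<String> supported = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> field : schema.entrySet()) {
      String extensionName = field.getValue().get(EXTENSION_NAME_KEY);
      if (!UNKNOWN_EXTENSION_NAME.equals(extensionName)) {
        supported.add(field.getKey());
      }
    }
    return supported;
  }

  /** Example catalog: two ordinary columns and one PostGIS column. */
  static Map<String, Map<String, String>> exampleSchema() {
    Map<String, Map<String, String>> schema = new LinkedHashMap<>();
    schema.put("id", Collections.<String, String>emptyMap());
    Map<String, String> geom = new LinkedHashMap<>();
    geom.put(EXTENSION_NAME_KEY, UNKNOWN_EXTENSION_NAME);
    geom.put(
        EXTENSION_METADATA_KEY, "{\"type_name\": \"geometry\", \"vendor_name\": \"PostGIS\"}");
    schema.put("geom", geom);
    schema.put("name", Collections.<String, String>emptyMap());
    return schema;
  }

  public static void main(String[] args) {
    // Prints [id, name]
    System.out.println(supportedColumns(exampleSchema()));
  }
}
```

The check keys only on the extension name. In line with the rationale text,
it does not interpret ``type_name`` or ``vendor_name``.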
docs/source/format/CanonicalExtensions.rst | 115 ++++++++++++---------
1 file changed, 69 insertions(+), 46 deletions(-)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index e67ee6fd6f4b6..2415eaa6afc9a 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -311,6 +311,43 @@ Extension parameters:
additional fields may be added, but all fields current and future are never
required to interpret the array.
+Rationale
+---------
+
+Arrow systems often wrap non-Arrow systems, and so they must be prepared to
+handle data types and data that don't have an equivalent Arrow type. A hard
+error is not useful to clients. A client may still want to know of the
+existence of a field, or the types of other, supported fields, without getting
+an error just because of an unrecognized type in one column. Similarly,
+dropping unsupported fields/columns is also a poor solution.
+
+Of course, the Arrow system can use extension types. But it cannot have an
+extension type prepared for every possible type in advance; the non-Arrow
+system can have its own extension mechanisms. It could "make up" a fresh
+extension type on the fly. But this misleads downstream systems, which cannot
+tell if the type is truly supported or not by the intermediate Arrow
+application.
+
+The Unknown type is superior in all cases. Because it explicitly means that
+the *intermediate* system does not support a type, it can be used to
+explicitly declare an unsupported field or column, without silently losing
+data or sending irrelevant errors. In other words: if an Arrow system
+encounters a non-Arrow type it was not prepared to handle at runtime, it can
+use Unknown to pass the type along to an Arrow client.
+
+Applications **should not** make conventions around vendor_name and type_name.
+If there is a type that multiple systems want to support, they should create a
+formal extension type. They *should not* try to agree to use particular
+parameters of the Unknown type. These parameters are meant for human end
+users to understand what type was not supported. Of course, applications may
+interpret these fields regardless but must be prepared for breakage (if for
+example the type becomes formally supported with a custom extension type in a
+later software revision).
+
+Unknown is not about file formats. Considerations such as JSON or other file
+formats, or MIME types, are irrelevant, and Unknown should not be used as a
+generic container for file format data (XML/JSON/etc.).
+
Examples:
* Consider a Flight SQL service that supports connecting external databases.
@@ -325,11 +362,13 @@ Examples:
used as the storage type here because only schemas are involved.
The client would presumably not be able to query such columns from the
- Flight SQL service, but there may be other columns in the table that it
- could query, or it could prepare a query that references the unknown column
- in an expression and produces a result that *is* supported. The Unknown
- type is a better experience than erroring or silently dropping columns from
- the catalog.
+ service, but there may be other columns that it could query, or it could
+ prepare a query that references the unknown column in an expression and
+ produces a result that *is* supported. The server could make up an
+ extension type on the fly, but then the client wouldn't be able to tell if
+ it can try to query the column or not, while with Unknown, it knows the
+ column is unsupported. So as discussed above, Unknown is superior to all
+ alternatives.
An example of the extension metadata would be::
@@ -337,14 +376,15 @@ Examples:
* The ADBC PostgreSQL driver may get bytes for a field whose type it does not
recognize. This is because of how PostgreSQL and its wire protocol work:
- the driver will always get bytes for fields and must implement support for
- all potential types to interpret those bytes. But the driver cannot know
- about all types in advance, as there may be extensions (e.g. PostGIS for
- geospatial functionality).
+ values come from the server as length-prefixed bytes, so the driver will
+ always have bytes for fields and needs to know how to parse them. But the
+ driver cannot know about all types in advance, as there may be extensions
+ (e.g. PostGIS for geospatial functionality).
Because the driver still has the raw bytes, it can use Unknown[Binary] to
- return those bytes to the application, which may be able to parse the data
- itself. Unknown differentiates the column from an actual binary column.
+ still return those bytes to the application, which may be able to parse the
+ data itself. Unknown differentiates the column from an actual binary
+ column and makes it clear that the value is unparsed.
An example of the extension metadata would be::
@@ -355,14 +395,21 @@ Examples:
`_ that ascribe new
semantics to existing types, somewhat like Arrow extension types.
- The driver would be able to parse the underlying type in this case.
- However, the driver may still with to use the Unknown type. Consider the
- example in the PostgreSQL documentation above of a ``complex`` type. Just
+ The driver would be able to parse the underlying bytes in this case.
+ However, the driver may still want to use the Unknown type. Consider the
+ example in the PostgreSQL documentation above of a ``complex`` type. Simply
mapping the type to a plain Arrow ``struct`` type would lose the semantics
- of that custom type. In this case, the driver can use Unknown[Struct]. The
- driver would never actually be able to directly support the type in this
- example, since these types are defined by database administrators, not by
- the developers.
+ of that custom type, much as it would be undesirable for an Arrow system to
+ handle every extension type by dropping the extension metadata.
+ Meanwhile, dynamically generating an extension type would also be wrong
+ semantically: for instance, there may be an actual extension type that
+ should be used.
+
+ Instead, the driver can use Unknown[Struct] to pass on the composite type
+ info. The driver would never actually be able to directly support the type
+ in this example, since these types are defined by database administrators,
+ not by the developers, and the driver developers can never know about all
+ these possibilities.
An example of the extension metadata would be::
@@ -373,42 +420,18 @@ Examples:
however, allows drivers to return `arbitrary Java objects
`_.
- Currently, the JDBC adapter simply errors, making usage of the adapter a
- minefield where results are all-or-nothing, even if an application just
- wants to fetch a schema. Instead, the driver could use Unknown[Null] as a
+ Without the extension type, the JDBC adapter would simply error, making the
+ adapter a minefield where results are all-or-nothing, even if an application
+ just wants a schema. Instead, the driver could use Unknown[Null] as a
placeholder during schema conversion, only erroring if the application tries
to fetch the actual data. That way, clients could at least introspect
- tables and queries to decide whether it can proceed to query the data, or
+ tables and queries to decide whether they can proceed to fetch the data, or
only query certain columns.
An example of the extension metadata would be::
{"type_name": "OTHER", "vendor_name": "JDBC driver name"}
-Of course, the intermediate system *could* implement custom extension types in
-these cases. But there is no way that every type can be known in advance, as
-discussed specifically for each example. In such cases, the Unknown type
-allows the system to explicitly note that it does not support some type or
-field, without silently losing data or sending irrelevant errors.
-
-Another option would be to pretend to support the type and make up new
-extension types on the fly. But this misleads downstream systems who cannot
-tell if the type is truly supported or not by the intermediate Arrow
-application (the Flight SQL service, or the JDBC adapter, etc.), which
-particularly matters for the Flight SQL and JDBC examples.
-
-Applications **should not** make conventions around vendor_name and type_name.
-In other words, if there is a type that multiple systems want to support, they
-should create a formal extension type. They *should not* try to agree on
-particular parameters of the Unknown type to recognize. These parameters are
-primarily meant for human end users to understand what type was not supported.
-Of course, applications may choose to interpret these fields regardless but
-must be prepared for breakage (if for example the type becomes formally
-supported with a custom extension type in a later software revision).
-
-Unknown is not about file formats. Considerations such as JSON or other file
-formats, or MIME types, are irrelevant.
-
=========================
Community Extension Types
=========================
From f8e0905cd98f71c73725d6e547c2a6a59bf7058a Mon Sep 17 00:00:00 2001
From: David Li
Date: Sun, 16 Jun 2024 20:58:05 -0400
Subject: [PATCH 06/26] Revise (2)
---
docs/source/format/CanonicalExtensions.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index 2415eaa6afc9a..40b4bc9ee31d7 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -298,7 +298,7 @@ Extension parameters:
* The storage type of this extension is any type. If there is no underlying
data, the storage type should be Null. If there is data, the storage type
- should preferably be binary or fixed-size binary, but may be any type.
+ may be any type.
* Extension type parameters:
From 767cbf578f5a2b236740469d26ee44aa03087143 Mon Sep 17 00:00:00 2001
From: David Li
Date: Thu, 20 Jun 2024 03:28:28 -0400
Subject: [PATCH 07/26] Implement Java extension type
---
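Note on ``reportUnsupportedTypesAsUnknown``: the wrapper added in this patch
turns an ``UnsupportedOperationException`` from the wrapped converter into an
Unknown type with Null storage. The stdlib-only sketch below shows the same
pattern in isolation. ``FieldInfo`` and ``TypeStub`` are hypothetical
stand-ins for ``JdbcFieldInfo`` and ``ArrowType`` so that the example compiles
without Arrow on the classpath, and the vendor name ``"H2"`` is only an
illustrative value.

```java
import java.sql.Types;
import java.util.function.Function;

// Hypothetical, Arrow-free illustration of the fallback-wrapping pattern.
final class UnknownFallbackSketch {
  private UnknownFallbackSketch() {}

  /** Stand-in for JdbcFieldInfo: the JDBC type code and the vendor type name. */
  static final class FieldInfo {
    final int jdbcType;
    final String typeName;

    FieldInfo(int jdbcType, String typeName) {
      this.jdbcType = jdbcType;
      this.typeName = typeName;
    }
  }

  /** Stand-in for ArrowType; {@code unknown} marks the fallback case. */
  static final class TypeStub {
    final String name;
    final boolean unknown;

    TypeStub(String name, boolean unknown) {
      this.name = name;
      this.unknown = unknown;
    }
  }

  /** A strict converter that throws for anything it does not map. */
  static TypeStub strictConverter(FieldInfo info) {
    switch (info.jdbcType) {
      case Types.INTEGER:
        return new TypeStub("Int(32, true)", false);
      case Types.VARCHAR:
        return new TypeStub("Utf8", false);
      default:
        throw new UnsupportedOperationException("Unmapped JDBC type: " + info.jdbcType);
    }
  }

  /** Wrap a converter so that unsupported types are reported instead of thrown. */
  static Function<FieldInfo, TypeStub> reportUnsupportedAsUnknown(
      Function<FieldInfo, TypeStub> converter, String vendorName) {
    return info -> {
      try {
        return converter.apply(info);
      } catch (UnsupportedOperationException e) {
        // Null storage: only the schema is described, no data is carried.
        return new TypeStub("Unknown[Null](" + info.typeName + ", " + vendorName + ")", true);
      }
    };
  }

  public static void main(String[] args) {
    Function<FieldInfo, TypeStub> lenient =
        reportUnsupportedAsUnknown(UnknownFallbackSketch::strictConverter, "H2");
    System.out.println(lenient.apply(new FieldInfo(Types.INTEGER, "INTEGER")).name);
    System.out.println(lenient.apply(new FieldInfo(Types.OTHER, "GEOMETRY")).name);
  }
}
```

Only ``UnsupportedOperationException`` is caught, so any other failure in the
wrapped converter still propagates to the caller.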
.../jdbc/JdbcToArrowConfigBuilder.java | 2 +
.../arrow/adapter/jdbc/JdbcToArrowUtils.java | 22 +-
.../jdbc/h2/JdbcToArrowDataTypesTest.java | 52 +++
.../InvalidExtensionMetadataException.java | 28 ++
.../arrow/vector/extension/UnknownType.java | 384 ++++++++++++++++++
.../arrow/vector/extension/UnknownVector.java | 54 +++
.../vector/TestUnknownExtensionType.java | 133 ++++++
7 files changed, 674 insertions(+), 1 deletion(-)
create mode 100644 java/vector/src/main/java/org/apache/arrow/vector/extension/InvalidExtensionMetadataException.java
create mode 100644 java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownType.java
create mode 100644 java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownVector.java
create mode 100644 java/vector/src/test/java/org/apache/arrow/vector/TestUnknownExtensionType.java
diff --git a/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.java b/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.java
index 783a373c6d0a7..ea9ffe55d334a 100644
--- a/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.java
+++ b/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.java
@@ -211,6 +211,8 @@ public JdbcToArrowConfigBuilder setTargetBatchSize(int targetBatchSize) {
*
*
Defaults to wrapping {@link JdbcToArrowUtils#getArrowTypeFromJdbcType(JdbcFieldInfo,
* Calendar)}.
+ *
+ * @see JdbcToArrowUtils#reportUnsupportedTypesAsUnknown(Function, String)
*/
public JdbcToArrowConfigBuilder setJdbcToArrowTypeConverter(
Function<JdbcFieldInfo, ArrowType> jdbcToArrowTypeConverter) {
diff --git a/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java b/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
index 8397d4c9e0dc4..c08a36fea2ea0 100644
--- a/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
+++ b/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java
@@ -18,6 +18,7 @@
import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
+import static org.apache.arrow.vector.types.Types.MinorType;
import java.io.IOException;
import java.math.RoundingMode;
@@ -37,6 +38,7 @@
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
+import java.util.function.Function;
import org.apache.arrow.adapter.jdbc.consumer.ArrayConsumer;
import org.apache.arrow.adapter.jdbc.consumer.BigIntConsumer;
import org.apache.arrow.adapter.jdbc.consumer.BinaryConsumer;
@@ -80,6 +82,7 @@
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.MapVector;
+import org.apache.arrow.vector.extension.UnknownType;
import org.apache.arrow.vector.types.DateUnit;
import org.apache.arrow.vector.types.TimeUnit;
import org.apache.arrow.vector.types.pojo.ArrowType;
@@ -216,11 +219,28 @@ public static ArrowType getArrowTypeFromJdbcType(
case Types.STRUCT:
return new ArrowType.Struct();
default:
- // no-op, shouldn't get here
throw new UnsupportedOperationException("Unmapped JDBC type: " + fieldInfo.getJdbcType());
}
}
+ /**
+ * Wrap a JDBC to Arrow type converter such that {@link UnsupportedOperationException} becomes
+ * {@link org.apache.arrow.vector.extension.UnknownType}.
+ *
+ * @param typeConverter The type converter to wrap.
+ * @param vendorName The database name to report as the Unknown type's vendor name.
+ */
+ public static Function<JdbcFieldInfo, ArrowType> reportUnsupportedTypesAsUnknown(
+ Function<JdbcFieldInfo, ArrowType> typeConverter, String vendorName) {
+ return (final JdbcFieldInfo fieldInfo) -> {
+ try {
+ return typeConverter.apply(fieldInfo);
+ } catch (UnsupportedOperationException e) {
+ return new UnknownType(MinorType.NULL.getType(), fieldInfo.getTypeName(), vendorName);
+ }
+ };
+ }
+
/**
* Create Arrow {@link Schema} object for the given JDBC {@link java.sql.ResultSetMetaData}.
*
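Note on the wrapper added in JdbcToArrowUtils above: the pattern is a decorator over a type-converter function that turns an "unsupported" failure into a placeholder result carrying the vendor's type name. The following is a minimal, stdlib-only sketch of that pattern, not the Arrow implementation. `ResolvedType`, `strictConvert`, and `reportUnsupportedAsUnknown` are hypothetical stand-ins (a plain `String` plays the role of `JdbcFieldInfo`), chosen so the example runs without Arrow on the classpath.

```java
import java.util.function.Function;

/** Sketch only: every type here is a hypothetical stand-in, not an Arrow class. */
final class UnknownFallbackSketch {
  /** Stand-in for ArrowType; typeName and vendorName are set only for unknown types. */
  static final class ResolvedType {
    final String storage;
    final String typeName;
    final String vendorName;

    ResolvedType(String storage, String typeName, String vendorName) {
      this.storage = storage;
      this.typeName = typeName;
      this.vendorName = vendorName;
    }

    boolean isUnknown() {
      return typeName != null;
    }
  }

  /** Illustrative base converter that only understands INTEGER. */
  static ResolvedType strictConvert(String jdbcTypeName) {
    if ("INTEGER".equals(jdbcTypeName)) {
      return new ResolvedType("int32", null, null);
    }
    throw new UnsupportedOperationException("Unmapped type: " + jdbcTypeName);
  }

  /** Wrap a converter so unsupported types are reported rather than rejected. */
  static Function<String, ResolvedType> reportUnsupportedAsUnknown(
      Function<String, ResolvedType> converter, String vendorName) {
    return jdbcTypeName -> {
      try {
        return converter.apply(jdbcTypeName);
      } catch (UnsupportedOperationException e) {
        // Null storage: only the column's existence and type name are conveyed.
        return new ResolvedType("null", jdbcTypeName, vendorName);
      }
    };
  }

  public static void main(String[] args) {
    Function<String, ResolvedType> lenient =
        reportUnsupportedAsUnknown(UnknownFallbackSketch::strictConvert, "H2");
    System.out.println(lenient.apply("INTEGER").isUnknown());
    System.out.println(lenient.apply("GEOMETRY").isUnknown());
  }
}
```

Catching only `UnsupportedOperationException` keeps other failures (for example, a genuine bug in the wrapped converter) visible to the caller, which appears to be the intent of the patch as well.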
diff --git a/java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowDataTypesTest.java b/java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowDataTypesTest.java
index 5537e1acba2bc..d8a7e44c4cd4e 100644
--- a/java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowDataTypesTest.java
+++ b/java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowDataTypesTest.java
@@ -32,19 +32,28 @@
import static org.apache.arrow.adapter.jdbc.JdbcToArrowTestHelper.assertTinyIntVectorValues;
import static org.apache.arrow.adapter.jdbc.JdbcToArrowTestHelper.assertVarBinaryVectorValues;
import static org.apache.arrow.adapter.jdbc.JdbcToArrowTestHelper.assertVarcharVectorValues;
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertEquals;
import java.io.IOException;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
+import java.sql.Statement;
import java.util.Arrays;
import java.util.Calendar;
+import java.util.List;
+import java.util.function.Function;
import java.util.stream.Stream;
import org.apache.arrow.adapter.jdbc.AbstractJdbcToArrowTest;
+import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.adapter.jdbc.JdbcToArrowTestHelper;
import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
import org.apache.arrow.adapter.jdbc.Table;
+import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.BitVector;
@@ -62,7 +71,12 @@
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.complex.ListVector;
+import org.apache.arrow.vector.extension.UnknownType;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
+import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;
@@ -189,6 +203,44 @@ public void testJdbcSchemaMetadata(Table table) throws SQLException, ClassNotFou
JdbcToArrowTestHelper.assertFieldMetadataMatchesResultSetMetadata(rsmd, schema);
}
+ @Test
+ void testUnknownType() throws SQLException, ClassNotFoundException {
+ try (BufferAllocator allocator = new RootAllocator()) {
+ String url = "jdbc:h2:mem:JdbcToArrowTest";
+ String driver = "org.h2.Driver";
+ Class.forName(driver);
+ conn = DriverManager.getConnection(url);
+ try (Statement stmt = conn.createStatement()) {
+ stmt.executeUpdate("CREATE TABLE unknowntype (a GEOMETRY, b INT)");
+ }
+
+ String query = "SELECT * FROM unknowntype";
+ Calendar calendar = Calendar.getInstance();
+ Function<JdbcFieldInfo, ArrowType> typeConverter =
+ (field) -> JdbcToArrowUtils.getArrowTypeFromJdbcType(field, calendar);
+ JdbcToArrowConfig config =
+ new JdbcToArrowConfigBuilder()
+ .setAllocator(allocator)
+ .setJdbcToArrowTypeConverter(
+ JdbcToArrowUtils.reportUnsupportedTypesAsUnknown(typeConverter, "H2"))
+ .build();
+ Schema schema;
+ try (Statement stmt = conn.createStatement();
+ ResultSet rs = stmt.executeQuery(query)) {
+ schema =
+ assertDoesNotThrow(() -> JdbcToArrowUtils.jdbcToArrowSchema(rs.getMetaData(), config));
+ }
+
+ Schema expected =
+ new Schema(
+ List.of(
+ Field.nullable(
+ "A", new UnknownType(Types.MinorType.NULL.getType(), "GEOMETRY", "H2")),
+ Field.nullable("B", Types.MinorType.INT.getType())));
+ assertEquals(expected, schema);
+ }
+ }
+
/**
* This method calls the assert methods for various DataSets.
*
diff --git a/java/vector/src/main/java/org/apache/arrow/vector/extension/InvalidExtensionMetadataException.java b/java/vector/src/main/java/org/apache/arrow/vector/extension/InvalidExtensionMetadataException.java
new file mode 100644
index 0000000000000..aaf1b4c7fa39c
--- /dev/null
+++ b/java/vector/src/main/java/org/apache/arrow/vector/extension/InvalidExtensionMetadataException.java
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.arrow.vector.extension;
+
+/** Thrown when an extension type's serialized metadata is malformed or missing fields. */
+public class InvalidExtensionMetadataException extends RuntimeException {
+ public InvalidExtensionMetadataException(String message) {
+ super(message);
+ }
+
+ public InvalidExtensionMetadataException(String message, Throwable cause) {
+ super(message, cause);
+ }
+}
diff --git a/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownType.java b/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownType.java
new file mode 100644
index 0000000000000..09759e15d88e9
--- /dev/null
+++ b/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownType.java
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.arrow.vector.extension;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+import java.util.Collections;
+import java.util.Objects;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.BitVector;
+import org.apache.arrow.vector.DateDayVector;
+import org.apache.arrow.vector.DateMilliVector;
+import org.apache.arrow.vector.Decimal256Vector;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.DurationVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.FixedSizeBinaryVector;
+import org.apache.arrow.vector.Float2Vector;
+import org.apache.arrow.vector.Float4Vector;
+import org.apache.arrow.vector.Float8Vector;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.IntervalDayVector;
+import org.apache.arrow.vector.IntervalMonthDayNanoVector;
+import org.apache.arrow.vector.IntervalYearVector;
+import org.apache.arrow.vector.LargeVarBinaryVector;
+import org.apache.arrow.vector.LargeVarCharVector;
+import org.apache.arrow.vector.NullVector;
+import org.apache.arrow.vector.TimeMicroVector;
+import org.apache.arrow.vector.TimeMilliVector;
+import org.apache.arrow.vector.TimeNanoVector;
+import org.apache.arrow.vector.TimeSecVector;
+import org.apache.arrow.vector.TimeStampMicroTZVector;
+import org.apache.arrow.vector.TimeStampMicroVector;
+import org.apache.arrow.vector.TimeStampMilliTZVector;
+import org.apache.arrow.vector.TimeStampMilliVector;
+import org.apache.arrow.vector.TimeStampNanoTZVector;
+import org.apache.arrow.vector.TimeStampNanoVector;
+import org.apache.arrow.vector.TimeStampSecTZVector;
+import org.apache.arrow.vector.TimeStampSecVector;
+import org.apache.arrow.vector.VarBinaryVector;
+import org.apache.arrow.vector.VarCharVector;
+import org.apache.arrow.vector.ViewVarBinaryVector;
+import org.apache.arrow.vector.ViewVarCharVector;
+import org.apache.arrow.vector.types.Types;
+import org.apache.arrow.vector.types.pojo.ArrowType;
+import org.apache.arrow.vector.types.pojo.ExtensionTypeRegistry;
+import org.apache.arrow.vector.types.pojo.Field;
+import org.apache.arrow.vector.types.pojo.FieldType;
+
+/** A type received from an external system that this Arrow-based system cannot interpret. */
+public class UnknownType extends ArrowType.ExtensionType {
+ private static final AtomicBoolean registered = new AtomicBoolean(false);
+ public static final String EXTENSION_NAME = "arrow.unknown";
+ private final ArrowType storageType;
+ private final String typeName;
+ private final String vendorName;
+
+ /** Register the extension type so it can be used globally. */
+ public static void ensureRegistered() {
+ if (!registered.getAndSet(true)) {
+ // The values don't matter, we just need an instance
+ ExtensionTypeRegistry.register(new UnknownType(Types.MinorType.NULL.getType(), "", ""));
+ }
+ }
+
+ /**
+ * Create a new type instance.
+ *
+ * @param storageType The underlying Arrow type.
+ * @param typeName The name of the unknown type.
+ * @param vendorName The name of the originating system of the unknown type.
+ */
+ public UnknownType(ArrowType storageType, String typeName, String vendorName) {
+ this.storageType = Objects.requireNonNull(storageType, "storageType");
+ this.typeName = Objects.requireNonNull(typeName, "typeName");
+ this.vendorName = Objects.requireNonNull(vendorName, "vendorName");
+ }
+
+ @Override
+ public ArrowType storageType() {
+ return storageType;
+ }
+
+ public String typeName() {
+ return typeName;
+ }
+
+ public String vendorName() {
+ return vendorName;
+ }
+
+ @Override
+ public String extensionName() {
+ return EXTENSION_NAME;
+ }
+
+ @Override
+ public boolean extensionEquals(ExtensionType other) {
+ return other != null
+ && EXTENSION_NAME.equals(other.extensionName())
+ && other instanceof UnknownType
+ && storageType.equals(other.storageType())
+ && typeName.equals(((UnknownType) other).typeName())
+ && vendorName.equals(((UnknownType) other).vendorName());
+ }
+
+ @Override
+ public String serialize() {
+ ObjectMapper mapper = new ObjectMapper();
+ ObjectNode object = mapper.createObjectNode();
+ object.put("type_name", typeName);
+ object.put("vendor_name", vendorName);
+ try {
+ return mapper.writeValueAsString(object);
+ } catch (JsonProcessingException e) {
+ throw new RuntimeException("Could not serialize " + this, e);
+ }
+ }
+
+ @Override
+ public ArrowType deserialize(ArrowType storageType, String serializedData) {
+ ObjectMapper mapper = new ObjectMapper();
+ JsonNode object;
+ try {
+ object = mapper.readTree(serializedData);
+ } catch (JsonProcessingException e) {
+ throw new InvalidExtensionMetadataException("Extension metadata is invalid", e);
+ }
+ JsonNode typeName = object.get("type_name");
+ JsonNode vendorName = object.get("vendor_name");
+ if (typeName == null) {
+ throw new InvalidExtensionMetadataException("typeName is missing");
+ }
+ if (vendorName == null) {
+ throw new InvalidExtensionMetadataException("vendorName is missing");
+ }
+ if (!typeName.isTextual()) {
+ throw new InvalidExtensionMetadataException("typeName should be string, was " + typeName);
+ }
+ if (!vendorName.isTextual()) {
+ throw new InvalidExtensionMetadataException("vendorName should be string, was " + vendorName);
+ }
+ return new UnknownType(storageType, typeName.asText(), vendorName.asText());
+ }
+
+ @Override
+ public FieldVector getNewVector(String name, FieldType fieldType, BufferAllocator allocator) {
+ // XXX: fieldType is supposed to be the extension type
+ final Field field = new Field(name, fieldType, Collections.emptyList());
+ final FieldVector underlyingVector =
+ storageType.accept(new UnderlyingVectorTypeVisitor(name, allocator));
+ return new UnknownVector(field, allocator, underlyingVector);
+ }
+
+ @Override
+ public int hashCode() {
+ return Objects.hash(super.hashCode(), storageType, typeName, vendorName);
+ }
+
+ @Override
+ public String toString() {
+ return "UnknownType("
+ + storageType
+ + ", typeName='"
+ + typeName
+ + '\''
+ + ", vendorName='"
+ + vendorName
+ + '\''
+ + ')';
+ }
+
+ private static class UnderlyingVectorTypeVisitor implements ArrowTypeVisitor<FieldVector> {
+ private final String name;
+ private final BufferAllocator allocator;
+
+ UnderlyingVectorTypeVisitor(String name, BufferAllocator allocator) {
+ this.name = name;
+ this.allocator = allocator;
+ }
+
+ @Override
+ public FieldVector visit(Null type) {
+ return new NullVector(name);
+ }
+
+ private RuntimeException unsupported(ArrowType type) {
+ return new UnsupportedOperationException(
+ "UnknownType#getNewVector is not supported for storage type: " + type);
+ }
+
+ @Override
+ public FieldVector visit(Struct type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(List type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(LargeList type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(FixedSizeList type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Union type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Map type) {
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Int type) {
+ return new IntVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(FloatingPoint type) {
+ switch (type.getPrecision()) {
+ case HALF:
+ return new Float2Vector(name, allocator);
+ case SINGLE:
+ return new Float4Vector(name, allocator);
+ case DOUBLE:
+ return new Float8Vector(name, allocator);
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Utf8 type) {
+ return new VarCharVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(Utf8View type) {
+ return new ViewVarCharVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(LargeUtf8 type) {
+ return new LargeVarCharVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(Binary type) {
+ return new VarBinaryVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(BinaryView type) {
+ return new ViewVarBinaryVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(LargeBinary type) {
+ return new LargeVarBinaryVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(FixedSizeBinary type) {
+ return new FixedSizeBinaryVector(Field.nullable(name, type), allocator);
+ }
+
+ @Override
+ public FieldVector visit(Bool type) {
+ return new BitVector(name, allocator);
+ }
+
+ @Override
+ public FieldVector visit(Decimal type) {
+ if (type.getBitWidth() == 128) {
+ return new DecimalVector(Field.nullable(name, type), allocator);
+ } else if (type.getBitWidth() == 256) {
+ return new Decimal256Vector(Field.nullable(name, type), allocator);
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Date type) {
+ switch (type.getUnit()) {
+ case DAY:
+ return new DateDayVector(name, allocator);
+ case MILLISECOND:
+ return new DateMilliVector(name, allocator);
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Time type) {
+ switch (type.getUnit()) {
+ case SECOND:
+ return new TimeSecVector(name, allocator);
+ case MILLISECOND:
+ return new TimeMilliVector(name, allocator);
+ case MICROSECOND:
+ return new TimeMicroVector(name, allocator);
+ case NANOSECOND:
+ return new TimeNanoVector(name, allocator);
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Timestamp type) {
+ if (type.getTimezone() == null || type.getTimezone().isEmpty()) {
+ switch (type.getUnit()) {
+ case SECOND:
+ return new TimeStampSecVector(Field.nullable(name, type), allocator);
+ case MILLISECOND:
+ return new TimeStampMilliVector(Field.nullable(name, type), allocator);
+ case MICROSECOND:
+ return new TimeStampMicroVector(Field.nullable(name, type), allocator);
+ case NANOSECOND:
+ return new TimeStampNanoVector(Field.nullable(name, type), allocator);
+ }
+ } else {
+ switch (type.getUnit()) {
+ case SECOND:
+ return new TimeStampSecTZVector(Field.nullable(name, type), allocator);
+ case MILLISECOND:
+ return new TimeStampMilliTZVector(Field.nullable(name, type), allocator);
+ case MICROSECOND:
+ return new TimeStampMicroTZVector(Field.nullable(name, type), allocator);
+ case NANOSECOND:
+ return new TimeStampNanoTZVector(Field.nullable(name, type), allocator);
+ }
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Interval type) {
+ switch (type.getUnit()) {
+ case YEAR_MONTH:
+ return new IntervalYearVector(name, allocator);
+ case DAY_TIME:
+ return new IntervalDayVector(name, allocator);
+ case MONTH_DAY_NANO:
+ return new IntervalMonthDayNanoVector(name, allocator);
+ }
+ throw unsupported(type);
+ }
+
+ @Override
+ public FieldVector visit(Duration type) {
+ return new DurationVector(Field.nullable(name, type), allocator);
+ }
+
+ @Override
+ public FieldVector visit(ListView type) {
+ throw unsupported(type);
+ }
+ }
+}
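Note on `UnknownType#deserialize` above: the metadata is a JSON object with two required string fields, `type_name` and `vendor_name`, and each one is checked for presence and then for type. The sketch below illustrates that validation using only the standard library. A `Map<String, Object>` stands in for the parsed JSON object, and `InvalidMetadata`, `requireString`, and `parse` are hypothetical names, not part of the patch. Unlike the patch, the sketch checks each field fully before moving to the next, and it treats a null value the same as a missing key.

```java
import java.util.Map;

/** Sketch only: a Map stands in for parsed JSON; all names here are hypothetical. */
final class UnknownMetadataSketch {
  /** Stand-in for InvalidExtensionMetadataException. */
  static final class InvalidMetadata extends RuntimeException {
    InvalidMetadata(String message) {
      super(message);
    }
  }

  /** Fetch a required string field, rejecting missing or non-string values. */
  static String requireString(Map<String, Object> metadata, String key) {
    Object value = metadata.get(key);
    if (value == null) {
      throw new InvalidMetadata(key + " is missing");
    }
    if (!(value instanceof String)) {
      throw new InvalidMetadata(key + " should be string, was " + value);
    }
    return (String) value;
  }

  /** Validate both parameters and return them as {typeName, vendorName}. */
  static String[] parse(Map<String, Object> metadata) {
    String typeName = requireString(metadata, "type_name");
    String vendorName = requireString(metadata, "vendor_name");
    return new String[] {typeName, vendorName};
  }

  public static void main(String[] args) {
    String[] parsed = parse(Map.of("type_name", "GEOMETRY", "vendor_name", "H2"));
    System.out.println(parsed[0] + " from " + parsed[1]);
  }
}
```

Reporting the JSON key (`type_name`) rather than the Java field name in the error message is a choice made for this sketch; it points the reader at the exact key to fix in the serialized metadata.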
diff --git a/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownVector.java b/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownVector.java
new file mode 100644
index 0000000000000..f211de0d97fe4
--- /dev/null
+++ b/java/vector/src/main/java/org/apache/arrow/vector/extension/UnknownVector.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.arrow.vector.extension;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.util.hash.ArrowBufHasher;
+import org.apache.arrow.vector.ExtensionTypeVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.arrow.vector.ValueIterableVector;
+import org.apache.arrow.vector.types.pojo.Field;
+
+public class UnknownVector extends ExtensionTypeVector<FieldVector>
+ implements ValueIterableVector