-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Snapshot 123 is not an ancestor of 456 when running DELETE query on Trino (parallel writes) #10013
Comments
@paul-bormans-pcgw : since PyIceberg is involved, I assume the catalog interactions were done via the Iceberg REST API. Could you please confirm this (especially with Trino)? |
Hello @dimas-b, You are correct; iceberg connector configuration is as follows:
This is btw the Nessie generated config:
Paul |
Looks related/similar to #9969 |
Please confirm, you mean that I should run this DELETE query using catalog type="nessie"? (and not over type="rest")? |
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
The current behavior of Nessie's Iceberg REST is to return only the most recent Iceberg snapshot. However, this seems to conflict with some Iceberg operations, which are not only maintenance operations, but rather related to "merge on read" / (equality) deletes. This change changes Nessie's behavior by returning older snapshots from a load-table and update-table operations. Register-table operation however do not change, because only the latest snapshot is actually imported. The behavior does change by returning an error if the table to be registered has more than 1 snapshots. Fixes projectnessie#10013 Fixes projectnessie#9969
What happened
Every so many seconds we append new data to a Table using pyIceberg (using Arrow tables).
On the background we run some Table maintenance operations; specifically removing old data is important (since we have limited space available). We do this using Trino:
Now this query takes a few minutes to complete but in the end it fails to create a new snapshot:
I'm sure it's related to the parallel write operations, because when I stop the ingest activities (Trino is then the only remaining writer); the query succeeds.
I know with use of Nessie you do not have table history (can't see older snapshots) since Nessie maintains these are catalog level instead; i'm wondering if this has anything to do with my finding?
Or are there additional iceberg configuration to set?
Can I extend the query to claim exclusive write access to the Table?
How to reproduce it
Nessie server type (docker/uber-jar/built from source) and version
docker: quay.io/projectnessie/nessie:0.100.1
Client type (Ex: UI/Spark/pynessie ...) and version
PyIceberg + Trino
Additional information
No response
The text was updated successfully, but these errors were encountered: