Support Writing Iceberg metadata #21

Open
zhousun opened this issue Nov 8, 2024 · 4 comments

@zhousun
Contributor

zhousun commented Nov 8, 2024

What feature are you requesting?

Currently, pg_mooncake writes Delta metadata to the object store. It would be ideal to also write an Iceberg version.

Why are you requesting this feature?

To fit better into the Iceberg ecosystem.

What is your proposed implementation for this feature?

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        fn DeltaInit();

        fn DeltaCreateTable(
            table_name: &CxxString,
            path: &CxxString,
            options: &CxxString,
            column_names: &CxxVector<CxxString>,
            column_types: &CxxVector<CxxString>,
        ) -> Result<()>;

        fn DeltaModifyFiles(
            path: &CxxString,
            options: &CxxString,
            file_paths: &CxxVector<CxxString>,
            file_sizes: &CxxVector<i64>,
            is_add_files: &CxxVector<i8>,
        ) -> Result<()>;
    }
}



Implement the corresponding functions for Iceberg, with the same semantics.
Catalog support is more important for Iceberg than for Delta, but for an MVP it is fine to just write to the filesystem.
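
To make "same semantics" concrete, here is a minimal sketch of what an Iceberg counterpart of `DeltaModifyFiles` might enforce. Everything in it is an assumption for illustration: the names `IcebergTableState` and `iceberg_modify_files` are hypothetical, plain Rust slices stand in for the `CxxVector` arguments, the state lives in memory rather than in Iceberg metadata files, and a nonzero `is_add_files` entry is assumed to mean "add" while zero means "remove".

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a table's live data files (path -> size in bytes).
/// Hypothetical: a real implementation would persist Iceberg metadata instead.
#[derive(Debug, Default)]
pub struct IcebergTableState {
    pub data_files: BTreeMap<String, i64>,
}

/// Hypothetical counterpart of `DeltaModifyFiles`.
/// The three slices are parallel; `is_add_files[i] != 0` is assumed to mean
/// "add file i", and `0` is assumed to mean "remove file i".
pub fn iceberg_modify_files(
    state: &mut IcebergTableState,
    file_paths: &[String],
    file_sizes: &[i64],
    is_add_files: &[i8],
) -> Result<(), String> {
    if file_paths.len() != file_sizes.len() || file_paths.len() != is_add_files.len() {
        return Err("file_paths, file_sizes and is_add_files must have equal length".into());
    }

    // Apply the changes to a copy so that a failed call leaves the state untouched.
    let mut next = state.data_files.clone();
    for ((path, size), is_add) in file_paths.iter().zip(file_sizes).zip(is_add_files) {
        if *is_add != 0 {
            next.insert(path.clone(), *size);
        } else if next.remove(path).is_none() {
            return Err(format!("cannot remove unknown file: {path}"));
        }
    }

    // All changes validated: publish them together as one new table state.
    state.data_files = next;
    Ok(())
}
```

The all-or-nothing update is a design choice of this sketch, picked because one call would naturally map to one commit; the issue itself does not specify failure behavior.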

@zhousun zhousun added the feature label Nov 8, 2024
@dpxcc dpxcc added this to the tbd milestone Nov 13, 2024
@umeshkacha

Is there any ETA for this?

@dpxcc
Contributor

dpxcc commented Jan 3, 2025

It's on our shortlist, but we don't have a concrete ETA yet.

The challenges are:

  1. There isn't a good library for writing Iceberg metadata, except by spinning up a JVM. We're hoping that iceberg-rust will soon make this easier so we can adopt it. Alternatively, we could manually write the metadata ourselves since we only use a small subset of the Iceberg spec, though that's not ideal.
  2. Writing Iceberg metadata is more expensive than Delta Lake because it's based on snapshots. This would require batched writes and compaction to deliver a better user experience.
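
The batching idea in the second point could look roughly like the sketch below: file-level changes are buffered and handed out as one batch, so that many small writes would become a single snapshot commit rather than one commit each. The types `FileChange` and `SnapshotBatcher` and the size-based threshold are hypothetical choices for this illustration, not anything pg_mooncake or iceberg-rust is known to provide, and the actual metadata write is left out.

```rust
/// One pending file-level change (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct FileChange {
    pub path: String,
    pub size: i64,
    pub is_add: bool,
}

/// Buffers changes and releases them in batches, one batch per commit.
#[derive(Debug)]
pub struct SnapshotBatcher {
    max_pending: usize,
    pending: Vec<FileChange>,
    /// Number of batches handed out so far, i.e. snapshots that would be written.
    pub snapshots_committed: usize,
}

impl SnapshotBatcher {
    /// `max_pending` is the number of buffered changes that triggers a commit
    /// (clamped to at least 1).
    pub fn new(max_pending: usize) -> Self {
        SnapshotBatcher {
            max_pending: max_pending.max(1),
            pending: Vec::new(),
            snapshots_committed: 0,
        }
    }

    /// Queue a change. Returns the batch to commit once the threshold is reached.
    pub fn push(&mut self, change: FileChange) -> Option<Vec<FileChange>> {
        self.pending.push(change);
        if self.pending.len() >= self.max_pending {
            self.flush()
        } else {
            None
        }
    }

    /// Hand out whatever is buffered (for example on a timer or at shutdown).
    /// Returns `None` when there is nothing to commit.
    pub fn flush(&mut self) -> Option<Vec<FileChange>> {
        if self.pending.is_empty() {
            return None;
        }
        self.snapshots_committed += 1;
        Some(std::mem::take(&mut self.pending))
    }
}
```

A caller would write one snapshot for each batch returned by `push` or `flush`. Compaction, the other half of the point above, is not covered here.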

@zhousun
Contributor Author

zhousun commented Jan 17, 2025

@umeshkacha
Another big difference is that Iceberg usually requires a catalog to work, which makes the setup a lot more complex for a regular Postgres user.
This is changing with S3 Table Bucket, and I am working on a prototype with it.

@umeshkacha

umeshkacha commented Jan 19, 2025 via email


3 participants