Depending on the number and sizes of the compression blocks, performing compression/decompression on the CPU can be faster than on the GPU. The difference is more pronounced for compression. libcudf should expose this as a run-time option. Ideally, the implementation would dynamically select how the operation is performed based on the parameters and the system.
The implementation should transparently dispatch between kernel and host operations to avoid having all readers/writers depend on this feature. For optimal performance, the host path should use a thread pool.
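As a rough illustration of the dispatch idea above, here is a minimal sketch in standard C++. Everything in it is hypothetical and not part of libcudf: the `select_engine` heuristic, its default threshold of 128 blocks, and the `run_on_host` helper are assumptions made for the example. The heuristic looks only at the block count, and short-lived worker threads stand in for a real thread pool.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using block    = std::vector<std::uint8_t>;
using codec_fn = std::function<block(block const&)>;  // hypothetical per-block (de)compressor

enum class engine { HOST, DEVICE };

// Hypothetical heuristic: a small number of blocks is assumed to favor the host.
// The threshold is an arbitrary placeholder, not a measured value.
engine select_engine(std::size_t num_blocks, std::size_t host_block_limit = 128)
{
  return num_blocks <= host_block_limit ? engine::HOST : engine::DEVICE;
}

// Host path: worker threads pull block indices from a shared atomic counter,
// so each block is processed exactly once and results keep their input order.
std::vector<block> run_on_host(std::vector<block> const& inputs,
                               codec_fn const& codec,
                               unsigned num_threads)
{
  std::vector<block> outputs(inputs.size());
  std::atomic<std::size_t> next{0};

  auto worker = [&] {
    for (std::size_t i = next.fetch_add(1); i < inputs.size(); i = next.fetch_add(1)) {
      outputs[i] = codec(inputs[i]);  // each index is written by a single thread
    }
  };

  num_threads = std::max(1u, num_threads);
  std::vector<std::thread> pool;
  pool.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) { pool.emplace_back(worker); }
  for (auto& th : pool) { th.join(); }
  return outputs;
}
```

A caller would check `select_engine(inputs.size())` and either call `run_on_host` or launch the existing device kernels, so readers and writers would not need to know which path was taken. A production heuristic would presumably also consider block sizes and the available hardware.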
Compression
#17656 implements opt-in host-side compression. It is limited to the GZIP format, as that is the only host-side compression currently available in libcudf. We need to look into ways to add support for other widely used formats, namely:
Snappy: we already have a 300-line snap kernel, so an uncomplicated C++ implementation should be easy to maintain.
Zstandard: use libzstd as a (runtime) dependency - need to evaluate feasibility.
The current implementation of host compression copies the compressed data back to the device. This made it easy to integrate the API, but adds unnecessary H2D (and later D2H) copies. Writers do not further process the compressed chunks apart from two steps:
Select between compression and the original chunks (currently selects the smaller one);
Compact the selected (potential) mix of chunks into contiguous chunks/streams.
If we move these steps from the writers to the compression API, we can return host data and avoid the round-trip. Significant changes to the writers are required to make this work.
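The two steps listed above could be done entirely on the host along the following lines. This is only a sketch under stated assumptions: `compacted_chunks` and `select_and_compact` are hypothetical names, the two inputs are assumed to have the same number of chunks, and ties are resolved in favor of the original chunk, which is a choice made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using chunk = std::vector<std::uint8_t>;

// Hypothetical result type: selected chunks stored back to back in host memory.
struct compacted_chunks {
  std::vector<std::uint8_t> data;    // contiguous bytes of all selected chunks
  std::vector<std::size_t> offsets;  // chunk i occupies [offsets[i], offsets[i + 1])
  std::vector<bool> is_compressed;   // true if the compressed version was selected
};

// Assumes original.size() == compressed.size().
compacted_chunks select_and_compact(std::vector<chunk> const& original,
                                    std::vector<chunk> const& compressed)
{
  compacted_chunks out;
  out.offsets.reserve(original.size() + 1);
  out.offsets.push_back(0);

  for (std::size_t i = 0; i < original.size(); ++i) {
    // Step 1: select the smaller of the two versions (original wins a tie).
    bool const use_compressed = compressed[i].size() < original[i].size();
    chunk const& chosen       = use_compressed ? compressed[i] : original[i];

    // Step 2: append the selected chunk to the contiguous output buffer.
    out.data.insert(out.data.end(), chosen.begin(), chosen.end());
    out.offsets.push_back(out.data.size());
    out.is_compressed.push_back(use_compressed);
  }
  return out;
}
```

Returning a single host buffer plus offsets would let a writer consume the result directly, which is what removes the device round-trip in this scenario.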
Decompression
In libcudf we currently have support for GZIP, ZLIB, and Snappy host decompression.
Changes similar to #17656 could make these available through a generic decompression API.
Here, the data location challenge is reversed: we want to avoid ingesting the data to the device, only to copy it back to the host when using host decompression. To resolve this, we can combine ingest and decompression into a higher-level abstraction. [TODO] Document how this abstraction could impact ingest as well (e.g. coalescing reads).