
Add SourcePage interface for delayed materialization of ConnectorSourceData #24011

Open

wants to merge 19 commits into master from dain/source-page
Conversation

@dain (Member) commented Nov 3, 2024

Description

This adds a new interface to the SPI, SourcePage, which will be the eventual replacement for Page in ConnectorPageSource. Since SourcePage is an interface, it allows the connector to know directly when columns are being accessed.

Additionally, SourcePage is not intended to be thread safe, so it can be mutable. Specifically, the interface contains the method:

void selectPositions(int[] positions, int offset, int size);

This reduces the positions that will be returned from the SourcePage, and since this is a mutation operation, the connector knows that only the specified positions can be accessed. This allows data sources to skip unnecessary reads.
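For orientation, here is a minimal sketch of the interface shape being discussed. Only getPage(), getPositionCount(), and selectPositions(...) are quoted in this thread; the per-column getBlock(channel) accessor is an assumption, included to show how a connector can observe exactly which columns are accessed.

import io.trino.spi.Page;
import io.trino.spi.block.Block;

// Sketch only -- not necessarily the exact SPI shape. getBlock(channel) is an
// assumed per-column accessor; the other methods appear in this thread.
public interface SourcePage
{
    // number of positions currently selected in this page
    int getPositionCount();

    // materializes and returns all columns for the selected positions
    Page getPage();

    // assumed accessor: loads a single column on demand, which is what lets
    // the connector know when a column is actually read
    Block getBlock(int channel);

    // mutation: restricts this page to the given positions, so the underlying
    // reader may skip everything else on subsequent reads
    void selectPositions(int[] positions, int offset, int size);
}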

This is based on #24062, so ignore the first three commits. The first commit in this PR is "Move Iceberg reader early exit checks to start of method".

Additional Changes

Add TransformConnectorPageSource

This utility class in Hive is used by all object store connectors to transform the raw data from file format readers into the final form needed for the query. Specifically, this class has methods for remapping columns, adding constant values, transforming blocks, and, most importantly, dereferencing fields. TransformConnectorPageSource has replaced the custom adapters in ORC and Parquet.
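As a rough illustration of the kind of work this class consolidates, here is a simplified sketch (not the actual TransformConnectorPageSource API) of two of the transforms it provides, remapping reader channels and adding a constant column:

import io.trino.spi.Page;
import io.trino.spi.block.Block;
import io.trino.spi.block.RunLengthEncodedBlock;

// Illustration only: remap raw reader channels into query order and append a
// constant column (e.g. a partition value). The real class also handles block
// transforms and field dereferencing.
final class PageTransformExample
{
    private PageTransformExample() {}

    static Page transform(Page rawPage, int[] channelMapping, Block constantValue)
    {
        int positionCount = rawPage.getPositionCount();
        Block[] output = new Block[channelMapping.length + 1];
        for (int i = 0; i < channelMapping.length; i++) {
            output[i] = rawPage.getBlock(channelMapping[i]);
        }
        // constantValue must be a single-position block; RLE expands it
        output[channelMapping.length] = RunLengthEncodedBlock.create(constantValue, positionCount);
        return new Page(positionCount, output);
    }
}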

Removal of Hive, Iceberg, Hudi, and Delta ConnectorPageSource

All of these implementations were doing simple transforms, and they have been replaced with TransformConnectorPageSource.

Removal of ReaderColumns and ReaderPageSource

With the introduction of TransformConnectorPageSource, the existing code for managing field dereference pushdown is no longer needed. All places where these classes were used have been updated to use TransformConnectorPageSource instead. This has the added benefit of simplifying the code by consolidating the multiple layers of transforms into a single place that creates the transformer, which is much easier to read.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## SPI
* Add `SourcePage` interface and `ConnectorPageSource.getNextSourcePage()`. ({issue}`24011`)
* Deprecate `ConnectorPageSource.getNextPage()` for removal. ({issue}`24011`)

@cla-bot bot added the cla-signed label Nov 3, 2024
@github-actions bot added the hudi, iceberg, delta-lake, hive, bigquery, and mongodb connector labels Nov 3, 2024
@dain force-pushed the dain/source-page branch 13 times, most recently from e789d3e to 8c4733c on November 9, 2024 21:41
@dain force-pushed the dain/source-page branch 2 times, most recently from 2aea253 to 48fd73c on November 10, 2024 02:44
@dain changed the title from "[WIP] add SourcePage interface for delayed materialization of ConnectorSourceData" to "Add SourcePage interface for delayed materialization of ConnectorSourceData" Nov 10, 2024
@dain marked this pull request as ready for review November 11, 2024 00:17
/**
 * Gets all data.
 */
Page getPage();
Member

Could we call it getLoadedPage to make it more obvious that this method will load the underlying data?

Member Author

I have been thinking we should call this getAllColumns and generally use the term column instead of Block or Page.

Anyway, the next PR after this removes lazy loading entirely, so I don't really want to use that term in the codebase for a while.

 * and {@link Page#getPositions(int[], int, int)} where possible, as this allows
 * the underlying reader to filter positions on subsequent reads.
 */
void selectPositions(int[] positions, int offset, int size);
Member

This is forcing the selected positions to be a positions list; why not use SelectedPositions here instead of int[] positions, to allow ranges to be passed where that is cheaper?
I expect that for file format readers it will be more efficient to decode/skip batches of positions rather than making that decision at the granularity of each row.
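For context, SelectedPositions (currently in io.trino.operator.project, so outside the SPI) can represent the selection either as an explicit positions list or as a contiguous range, which is what would let a reader decode or skip whole batches. Roughly:

import io.trino.operator.project.SelectedPositions;

// The two representations: a range can be decoded or skipped as a whole batch,
// while an explicit list forces per-row decisions.
SelectedPositions list = SelectedPositions.positionsList(new int[] {3, 7, 42}, 0, 3);
SelectedPositions range = SelectedPositions.positionsRange(100, 1024); // positions [100, 1124)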

Member

I think we will be able to use this API more easily within the new columnar filter evaluation if it takes SelectedPositions as input; otherwise we'd always need to convert to a positions list.

Member Author

SelectedPositions isn't in the SPI. We could move it there, but I wasn't sure that was something we wanted.

Generally the APIs for SourcePage were created directly from Page with all unnecessary functions removed. Later in the development process I made selected positions a mutation operation and ended up with this API.

Member

I understand it's not in the SPI today, but can we consider moving it there, given my rationale above? Or do you prefer deferring that to a future PR?
Also, does selectPositions necessarily have to be a mutation operation? Why not return a new Page?

Member Author

I would prefer to delay that to a future PR. We also need to decide if we want to have selected positions or just a selectRange method. I don't have strong feelings either way.

As for why selectPositions is a mutation: that has to do with the desire to allow readers to skip data. If it is not a mutation operation, the reader is not free to skip positions, because the original object still exists. We could make it create a new object and at the same time destroy the original, but that seems worse in practice.

@Override
public void selectPositions(int[] positions, int offset, int size)
{
    page = page.getPositions(positions, offset, size);
}
Member

Is converting to dictionary blocks (which happens internally in getPositions) always a good idea?
I think dictionary blocks created this way won't benefit from dictionary processing optimizations and will have the overhead of dictionary look-ups along with higher memory usage, compared to blocks created from copyPositions.
Also, most of the dictionary optimizations around re-using work done on the dictionary are based on a reference check on the dictionary in DictionaryBlock, so we might need to think about how to avoid breaking that optimization when getPositions/copyPositions on the original DictionaryBlock produced by the page source changes the dictionary reference.

Member Author

I understand all of that. The code in this PR doesn't try to make significant performance changes like this. I think we could look at making the change you mention, but I think it requires a lot more thought and performance analysis. Or said another way, this is what our code already does today.
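For reference, both options already exist on Block in the SPI; the trade-off under discussion, in a small sketch:

import io.trino.spi.block.Block;

// getPositions returns a dictionary-style view over the original block: cheap
// to create, but it retains the full original data and pays a lookup per read.
// copyPositions materializes a compact copy up front.
static Block selectAsView(Block block, int[] positions)
{
    return block.getPositions(positions, 0, positions.length);
}

static Block selectAsCopy(Block block, int[] positions)
{
    return block.copyPositions(positions, 0, positions.length);
}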

@@ -213,10 +211,6 @@ public CheckpointEntryIterator(
         HiveColumnHandle column = buildColumnHandle(field, checkpointSchemaManager, this.metadataEntry, this.protocolEntry, addStatsMinMaxColumnFilter).toHiveColumnHandle();
         columnsBuilder.add(column);
         disjunctDomainsBuilder.add(buildTupleDomainColumnHandle(field, column));
         if (field == ADD) {
Member

cc: @ebyhr @findinpath for this commit

Member Author

Please read the commit message for more details. This code was challenging to figure out (hours in a debugger), but I think I figured out the intent.

/**
 * Gets the number of positions in the page.
 */
int getPositionCount();


With SourcePage now being mutable, there's a potential issue where you might retrieve the positionCount, but then another operation (like calling selectPositions) alters the source, causing the positions to no longer align with the current state.

Member Author

Yes. There are lots of scenarios where you can get in trouble. The interface is single threaded, so there should be no worries about external actors modifying the contents. The interface design is a compromise between simple usability and performance.
I considered designs where selecting positions resulted in a new object, but that has the problem that it does not allow the reading code to skip data, because the original object still exists and someone may decide to use it.
Users of this interface need to be aware of what they are doing, and if they don't want to deal with things changing they can simply materialize the whole page.


On the other hand, can we leverage this behavior and avoid setting the position count until we materialize the Page?
For example, when a page needs to return as many rows as possible while keeping the total size under 1MB, determining the number of positions is straightforward if the page contains only fixed-size columns. However, if it includes non-fixed-size columns, the number of rows must be estimated, typically using a worst-case scenario.
If we don't need to commit to a positionCount for the SourcePage, this problem can be solved.

Member Author

@dain commented Nov 19, 2024

Maybe. The implementation could delay the position count determination until this method is called... but the position count is needed when any block is fetched, so I'm not sure this will help as much as you think. The most common scenario will be:

  1. execute filter - load one or more blocks and filter
  2. select filtered positions - reduce page to a set of positions
  3. project remaining blocks - load the remaining blocks for the selected positions

or there is no filtering so all blocks just get loaded. Either way, the first piece of information you need is the number of positions to return.
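Sketched in code, assuming the hypothetical getBlock accessor from the interface sketch above (evaluateFilter is a stand-in for real predicate evaluation):

import io.trino.spi.block.Block;

// Sketch of the filter -> select -> project flow described above.
static Block[] filterThenProject(SourcePage page, int filterChannel, int[] projectChannels)
{
    // 1. execute filter: load only the filter column and evaluate the predicate
    Block filterBlock = page.getBlock(filterChannel);
    int[] selected = evaluateFilter(filterBlock);

    // 2. select filtered positions: after this mutation the reader knows the
    //    unselected positions will never be read and can skip them
    page.selectPositions(selected, 0, selected.length);

    // 3. project remaining blocks: only the selected positions are loaded
    Block[] projected = new Block[projectChannels.length];
    for (int i = 0; i < projectChannels.length; i++) {
        projected[i] = page.getBlock(projectChannels[i]);
    }
    return projected;
}

// hypothetical stand-in: a real implementation would return only the
// positions where the predicate passes
static int[] evaluateFilter(Block filterBlock)
{
    int[] all = new int[filterBlock.getPositionCount()];
    for (int i = 0; i < all.length; i++) {
        all[i] = i;
    }
    return all;
}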

@dain force-pushed the dain/source-page branch from 1481ad5 to 21a3527 on December 3, 2024 22:07
@dain requested a review from raunaqmorarka December 3, 2024 22:07
Comment on lines +592 to +591
Map<String, String> partitionValues = addReader.getMap(stringMap, "partitionValues");
Map<String, Optional<String>> canonicalPartitionValues = canonicalizePartitionValues(partitionValues);
if (!partitionConstraint.isAll() && !partitionMatchesPredicate(canonicalPartitionValues, partitionConstraint.getDomains().orElseThrow())) {
    return null;
}
Member

Can we move this above the "Materialize from Parquet the information needed to build the AddEntry instance" part?
I think the idea here was to avoid materializing AddEntry-related info from Parquet when we can prune the partition based on partitionValues.

Member Author

I think I understand what you are saying, but no, this can't move, and I don't think it would matter. It cannot move because this code is using the addReader variable, which is created on the line before this and uses the addEntryRow variable read from the addBlock. Or said another way, the previous block creates all of the data used in this block. As for why this doesn't matter: all of this work is to remove the whole concept of lazy blocks from Trino, which means that when you have a Block it is always materialized. The PR for that change is queued waiting for this PR.

Member

I get that we want to remove the LazyBlock construct to simplify code, but does that mean the feature itself is going away and there will be no other way to achieve the same outcome of lazy materialization? (By the way, engines that don't have it want to have it: https://issues.apache.org/jira/browse/SPARK-42256.)
As for this specific code, this was implemented as an optimization in #19795 and it is impactful for Delta Lake query planning.
fyi @findinpath @ebyhr

Member Author

@raunaqmorarka I did some more digging. The current code always materializes the blocks. It happens because the code is calling addBlock.isNull, and to compute that the block must be loaded.

BTW this pattern is super common in our codebase. Folks go out of their way to try to delay materialization.
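A short sketch of why such guards don't help, assuming the block arrives as a LazyBlock (which loads its delegate on the first data access, including isNull):

import io.trino.spi.block.Block;

// Illustration: any data access on a lazy block, including a null check,
// forces the underlying data to load, so isNull cannot be used to decide
// whether materialization is worth avoiding.
static boolean nullCheckForcesLoad(Block addBlock, int position)
{
    return addBlock.isNull(position); // a LazyBlock materializes here
}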

@@ -188,7 +187,7 @@ public Optional<ReaderPageSource> createPageSource(
         if (readerColumns.isPresent()) {
             readerColumnHandles = readerColumns.get().get().stream()
                     .map(HiveColumnHandle.class::cast)
-                    .collect(toUnmodifiableList());
+                    .toList();
Member

collect(toImmutableList())

Member Author

This was changed in a later commit... I'd prefer not to hunt this down in the 20 commits.

void update()
{
    long newProcessedBytes = page.getSizeInBytes();
    processedBytesConsumer.accept(newProcessedBytes - localProcessedBytes);


When the new page is smaller than the one before the update, the value passed to processedBytesConsumer will decrease, and could even reach 0 when all rows were deleted. Does that make sense, or should we use something like Math.abs(newProcessedBytes - localProcessedBytes)?

Member Author

@shlomi-alfasi I don't follow. page.getSizeInBytes() is a value that should only increase. It represents the loaded size of the page, and you can't "unload" data from a page. processedBytesConsumer is an accumulator, so we need deltas. If there is a buggy page that reduces the value, I don't want to try to mask that over here, because it could end up with an ever increasing value (think of the sequence 100, 0, 100, 0...).
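The accounting being defended, as a small self-contained sketch; the field and the update signature are assumptions based on the snippet above:

import java.util.function.LongConsumer;

// Delta accounting for a monotonically growing page size. localProcessedBytes
// tracks the last reported value, so each update() reports only newly loaded
// bytes. If getSizeInBytes() could shrink, Math.abs would double-count
// (100, 0, 100, 0 would report 400 instead of 100).
final class ProcessedBytesTracker
{
    private final LongConsumer processedBytesConsumer;
    private long localProcessedBytes;

    ProcessedBytesTracker(LongConsumer processedBytesConsumer)
    {
        this.processedBytesConsumer = processedBytesConsumer;
    }

    void update(long newProcessedBytes)
    {
        processedBytesConsumer.accept(newProcessedBytes - localProcessedBytes);
        localProcessedBytes = newProcessedBytes;
    }
}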

@shlomi-alfasi left a comment

LGTM

public interface SourcePage
{
    /**
     * Creates a new SourcePage from the specified block.


nit: fix comment

Member Author

I'll fix this after this PR. I'm going to rename the methods also.

@dain force-pushed the dain/source-page branch from 730a642 to ac3a994 on December 7, 2024 21:00
dain added 19 commits December 7, 2024 17:22
Instead of monitoring for lazy block loading, the page size can be
checked after state changes in SFP.
This data source can be used to transform raw file output to the shape
required for the query.
Make BucketAdapter and BucketValidator top level classes
The AddFileEntryExtractor was relying on a side effect of the Parquet
reader that merged columns with the same name and different fields into
a base column. The proper way is to use a dereference projection, but
this is not needed here. Instead this code only needs one base column
with the correct field names.

With this change CheckpointFieldExtractor only needs a single block.
Rename variables to match actual meaning
Set useOrcColumnNames when ORC full acid is used
Simplify code structure and fix typos in docs
@dain force-pushed the dain/source-page branch from ac3a994 to 0634b4a on December 8, 2024 01:22
github-actions bot commented Jan 1, 2025

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions bot added the stale label Jan 1, 2025
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions bot closed this Jan 23, 2025
@martint reopened this Jan 24, 2025
Labels

bigquery, cla-signed, delta-lake, hive, hudi, iceberg, mongodb, stale