[Rust] add substrait for flink and be compatible for other engines #454

Merged: 7 commits, Apr 7, 2024

Conversation

mag1c1an1
Contributor

add flink expression to substrait

add more functions

add more tests

add base schema for namedscan, substrait type to arrow type

compatibility

switch to java8

mag1c1an1 force-pushed the sub branch 3 times, most recently from fdfad6b to f06e75f, on March 21, 2024 06:02
@@ -235,6 +235,7 @@ public DynamicTableSource copy() {
lsts.projectedFields = this.projectedFields;
lsts.remainingPartitions = this.remainingPartitions;
lsts.filter = this.filter;
lsts.filter = this.filter;

Duplicate line: lsts.filter is assigned twice.

@@ -52,15 +55,18 @@ public LakeSoulSource(TableId tableId,
List<String> pkColumns,
Map<String, String> optionParams,
@Nullable List<Map<String, String>> remainingPartitions,
- @Nullable FilterPredicate filter) {
+ @Nullable FilterPredicate filterStr,

improper name 'filterStr'

@@ -129,7 +133,11 @@ private void initializeReader() throws IOException {
}

if (filter != null) {

Will the two kinds of filter conflict with each other?

Contributor Author

No, filterStr is always null in the code above. It is only used to debug differences between the datafusion::expr results produced by the two kinds of filter.

return Tuple2.of(SupportsFilterPushDown.Result.of(accepted, remaining), planToProto(filter));
}

static Schema toArrowSchema(String tableSchema) {

This is no different from Schema.fromJSON.

import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubstraitUtil {

Moving this to lakesoul-io-java may be better.

*
* @param plan Filter{}
*/
public void addFilterProto(Plan plan) {

Refer to com.dmetasoul.lakesoul.meta.jnr.NativeMetadataJavaClient#executeInsert.

@@ -255,20 +255,36 @@ pub async fn prune_filter_and_execute(
df: DataFrame,
request_schema: SchemaRef,
filter_str: Vec<String>,

use Vec as input directly

mag1c1an1 force-pushed the sub branch 3 times, most recently from 21bd868 to b7104da, on March 21, 2024 15:58
@mag1c1an1
Contributor Author

Rebased from main; please review the dependencies.

Ceng23333 left a comment

Please provide some Flink test cases.

@@ -73,7 +74,8 @@ public class LakeSoulOneSplitRecordsReader implements RecordsWithSplitIds<RowDat
// arrow batch -> row, with requested schema
private ArrowReader curArrowReaderRequestedSchema;

- private final FilterPredicate filter;
+ private final FilterPredicate _filterPredicate;

Why use a leading underscore in the variable name?

return Tuple2.of(SupportsFilterPushDown.Result.of(accepted, remaining), planToProto(filter));
}

public static Expression doTransform(ResolvedExpression flinkExpression, Schema arrow_schema) {

arrow_schema should use a camelCase name in Java (e.g. arrowSchema).

}
return ExpressionCreator.binary(nullable, b);
}
case TINYINT:

Use an integer type with the exact bit width.
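
One way to read this comment is that each Flink integer type should map to a literal of the same width instead of being widened. The following is a minimal sketch under that assumption. `IntKind` and `ExactWidthLiteral` are hypothetical names for illustration and are not part of Flink, substrait-java, or this PR; only the type names i8, i16, i32 and i64 come from the Substrait specification.

```java
import java.util.Objects;

/** Hypothetical stand-in for the integer cases of a Flink logical type switch. */
enum IntKind {
    TINYINT(8), SMALLINT(16), INTEGER(32), BIGINT(64);

    final int bits;

    IntKind(int bits) {
        this.bits = bits;
    }
}

/** Illustrative only: keeps the declared width instead of widening to 64 bits. */
final class ExactWidthLiteral {
    final IntKind kind;
    final long value;

    private ExactWidthLiteral(IntKind kind, long value) {
        this.kind = kind;
        this.value = value;
    }

    /** Rejects values outside the signed range of the declared width. */
    static ExactWidthLiteral of(IntKind kind, long value) {
        Objects.requireNonNull(kind, "kind");
        if (kind.bits < 64) {
            long max = (1L << (kind.bits - 1)) - 1;
            long min = -max - 1;
            if (value < min || value > max) {
                throw new IllegalArgumentException(
                        value + " does not fit in " + kind.bits + " bits");
            }
        }
        return new ExactWidthLiteral(kind, value);
    }

    /** Name of the matching fixed-width Substrait type, e.g. "i8" for TINYINT. */
    String substraitTypeName() {
        return "i" + kind.bits;
    }
}
```

With this shape, `ExactWidthLiteral.of(IntKind.TINYINT, 128)` fails fast instead of silently producing a wider literal.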

}
return ExpressionCreator.fp64(nullable, d);
}
case DATE: {

Any unit test for date/timestamp case?
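
A unit test for these cases could pin down the epoch arithmetic on its own, without a Flink runtime. The sketch below assumes the literal is encoded as days since 1970-01-01 for dates and as microseconds since the epoch for time-zone-free timestamps, which is how the Substrait specification describes its date and timestamp types. `TemporalLiterals` and its methods are hypothetical names, not code from this PR.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper showing the conversions a date/timestamp test could check. */
final class TemporalLiterals {
    private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    private TemporalLiterals() {
    }

    /** Days since 1970-01-01; throws ArithmeticException if it overflows an int. */
    static int daysSinceEpoch(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    /** Microseconds since 1970-01-01T00:00, with no time zone applied. */
    static long microsSinceEpoch(LocalDateTime timestamp) {
        return ChronoUnit.MICROS.between(EPOCH, timestamp);
    }
}
```

For example, 2021-02-01 is day 18659, and 2021-02-01T10:40 corresponds to epoch second 1612176000, the value used in a commented-out query in this PR's tests.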

mag1c1an1 force-pushed the sub branch 5 times, most recently from 2873000 to 1e49766, on March 29, 2024 09:55
Signed-off-by: mag1c1an1 <[email protected]>

add flink expression to substrait

Signed-off-by: mag1c1an1 <[email protected]>

add more functions

Signed-off-by: mag1c1an1 <[email protected]>

add more tests

Signed-off-by: mag1c1an1 <[email protected]>

add base schema for namedscan, substrait type to arrow type

Signed-off-by: mag1c1an1 <[email protected]>

compatibility

Signed-off-by: mag1c1an1 <[email protected]>

switch to java8

Signed-off-by: mag1c1an1 <[email protected]>

before apply cargo fix

Signed-off-by: mag1c1an1 <[email protected]>

cargo clippy && cargo fmt

Signed-off-by: mag1c1an1 <[email protected]>

fix ci

Signed-off-by: mag1c1an1 <[email protected]>

rebase

Signed-off-by: mag1c1an1 <[email protected]>

refactor

Signed-off-by: mag1c1an1 <[email protected]>
import java.util.stream.Stream;

public class SubstraitUtil {
public static final SimpleExtension.ExtensionCollection Se;

Replace 'Se' with a recognizable name.


public class SubstraitUtil {
public static final SimpleExtension.ExtensionCollection Se;
public static final SubstraitBuilder Builder;

Rename static constants using the UPPERCASE_UNDERSCORE format.

@@ -466,6 +466,8 @@ SPDX-License-Identifier: Apache-2.0
<include>com.google.code.gson:gson</include>
<include>dev.failsafe:failsafe</include>
<include>com.google.protobuf:protobuf-java</include>
<!--substrait-->

Try removing org.apache.parquet:parquet-column from pom

createLakeSoulSourceTableWithDateType(createTableEnv);
// not supported
// String testSql = "select * from type_info where modifyTime=TO_TIMESTAMP_LTZ(1612176000,0)";
String testSql = "select * from type_info " +

Mark unsupported data types with a comment.


public class SubstraitTest extends AbstractTestBase {

private final String BATCH_TYPE = "batch";

Add tests that filter on a hash column and on a range column.

xuchen-plus merged commit 499de72 into lakesoul-io:main on Apr 7, 2024
17 checks passed
3 participants