Add two new methods in ScalarFunction: return_type_from_args and is_nullable_from_args_nullable
#14094
base: main
Signed-off-by: Jay Zhan <[email protected]>
pub fn is_nullable(&self, args: &[Expr], schema: &dyn ExprSchema) -> bool {
    self.inner.is_nullable(args, schema)
}

pub fn is_nullable_from_args_nullable(&self, args_nullables: &[bool]) -> bool {
Remove the Expr dependency.
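To illustrate what deciding nullability from argument nullability alone could look like, here is a minimal sketch. The trait NullableFromArgs and the two structs are hypothetical stand-ins written for this example; they are not DataFusion types, and only the method name and signature are taken from the diff above.

```rust
/// Hypothetical stand-in for a scalar function (not DataFusion's real trait).
trait NullableFromArgs {
    /// Decide output nullability from each argument's nullability alone,
    /// without looking at any `Expr`.
    fn is_nullable_from_args_nullable(&self, args_nullables: &[bool]) -> bool;
}

/// Assumed example: a function whose output is null whenever any input is null.
struct PropagatesNull;

impl NullableFromArgs for PropagatesNull {
    fn is_nullable_from_args_nullable(&self, args_nullables: &[bool]) -> bool {
        // Nullable as soon as one argument is nullable.
        args_nullables.iter().any(|&n| n)
    }
}

/// Assumed example: a coalesce-like function that returns the first non-null input.
struct CoalesceLike;

impl NullableFromArgs for CoalesceLike {
    fn is_nullable_from_args_nullable(&self, args_nullables: &[bool]) -> bool {
        // Nullable only if every argument is nullable.
        args_nullables.iter().all(|&n| n)
    }
}
```

Both implementations need only a slice of booleans, which is the point of dropping the Expr dependency: the caller can compute those flags at whatever stage they become available.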
/// The data types of the arguments to the function
pub arg_types: &'a [DataType],
/// The Utf8 arguments to the function; if an expression is not Utf8, its entry is an empty string
pub arguments: &'a [String],
better name 🤔 ?
Would it be possible to unify the argument handling so that both return type and nullability are returned the same?
I wonder if it would somehow be possible to add the input nullable information here too 🤔
I am also not sure about only supporting string args; that is likely a regression in behavior for some users (for example, maybe they look for constant integers as well).
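One way to address the string-only concern would be to pass optional constants instead of strings. The sketch below is only an illustration under that assumption: Constant is a hypothetical, reduced stand-in for a scalar value type, and ConstantArgs and constant_int are names invented for this example, not part of the PR.

```rust
/// Hypothetical stand-in for a scalar value type, reduced to two variants.
#[derive(Debug, Clone, PartialEq)]
enum Constant {
    Utf8(String),
    Int64(i64),
}

/// Hypothetical argument bundle: one slot per function argument,
/// `None` when that argument is not a literal.
struct ConstantArgs<'a> {
    constants: &'a [Option<Constant>],
}

/// Read a constant integer at position `idx`, if the argument exists
/// and is an integer literal.
fn constant_int(args: &ConstantArgs<'_>, idx: usize) -> Option<i64> {
    match args.constants.get(idx)? {
        Some(Constant::Int64(v)) => Some(*v),
        _ => None,
    }
}
```

With this shape, a non-literal argument is represented by None rather than an empty string, so an implementation can tell "not a constant" apart from "the constant is the empty string".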
@@ -86,22 +87,36 @@ impl ScalarUDFImpl for ArrowCastFunc {
    }

    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
If this change looks good, we can deprecate this too
let name_column = &chunk[0];
let name = match name_column {
    ColumnarValue::Scalar(ScalarValue::Utf8(Some(name_scalar))) => name_scalar,
    _ => return exec_err!("named_struct even arguments must be string literals, got {name_column:?} instead at position {}", i * 2)
name_column outputs a less readable array in this change; remove it for now.
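The check in the snippet above (every even-positioned argument must be a string literal naming a field) can be sketched in isolation as follows. Arg and field_names are hypothetical names for this example; the real code matches on ColumnarValue and ScalarValue and reports errors through exec_err!, which this sketch replaces with a plain String error.

```rust
/// Hypothetical stand-in for a function argument: either a string literal
/// or anything else (column, non-string literal, ...).
#[derive(Debug)]
enum Arg {
    Utf8Literal(String),
    Other,
}

/// Collect the field names from (name, value) argument pairs.
/// Fails if the argument count is odd or a name slot is not a string literal.
fn field_names(args: &[Arg]) -> Result<Vec<&str>, String> {
    if args.len() % 2 != 0 {
        return Err(format!("expected an even number of arguments, got {}", args.len()));
    }
    args.chunks_exact(2)
        .enumerate()
        .map(|(i, chunk)| match &chunk[0] {
            Arg::Utf8Literal(name) => Ok(name.as_str()),
            other => Err(format!(
                "named_struct even arguments must be string literals, got {other:?} instead at position {}",
                i * 2
            )),
        })
        .collect()
}
```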
As stated in #13717 (comment), this new method doesn't necessarily simplify anything. Can you please fill "Rationale for this change"? What problem are we solving?
Thanks @jayzhan211 -- I think this is a step in the right direction, but I am worried it just makes the API more complicated (adds as many functions as it deprecates)
Challenge: Exprs / constants
It seems to me one challenge is that different information is known for computing return types at different points in the plan (e.g. sometimes we have Expr and sometimes we don't).
What would you think about making this more explicit in ReturnTypeArgs by making it an enum:
#[derive(Debug)]
pub enum ReturnTypeArgs<'a> {
    /// Information known at logical planning time.
    /// Note you can get the type and nullability for each arg
    /// using the specified ExprSchema
    Planning {
        args: &'a [Expr],
        schema: &'a dyn ExprSchema,
    },
    /// Information known during execution
    Execution {
        /// The data types of the arguments to the function
        arg_types: &'a [DataType],
        /// The nullability of the arguments to the function
        arg_nullability: &'a [bool],
    },
}
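To make the two-phase idea concrete, here is a self-contained sketch of how an implementation might consume such an enum. DataType, Expr, and ExprSchema below are simplified, hypothetical stand-ins (the real ones live in arrow and DataFusion and are far richer), and the resolved helper is invented for this example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for arrow's DataType (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Simplified stand-in for a logical expression (assumption for this sketch).
#[derive(Debug)]
enum Expr {
    Column(String),
    LiteralUtf8(String),
}

/// Simplified stand-in for a schema lookup (assumption for this sketch).
trait ExprSchema {
    /// Type and nullability of a named column, if it exists.
    fn field(&self, name: &str) -> Option<(DataType, bool)>;
}

impl ExprSchema for HashMap<String, (DataType, bool)> {
    fn field(&self, name: &str) -> Option<(DataType, bool)> {
        self.get(name).cloned()
    }
}

enum ReturnTypeArgs<'a> {
    /// Information known at logical planning time
    Planning { args: &'a [Expr], schema: &'a dyn ExprSchema },
    /// Information known during execution
    Execution { arg_types: &'a [DataType], arg_nullability: &'a [bool] },
}

impl ReturnTypeArgs<'_> {
    /// Resolve per-argument (type, nullable) pairs regardless of the phase.
    /// Returns `None` if a planning-time column is missing from the schema.
    fn resolved(&self) -> Option<Vec<(DataType, bool)>> {
        match self {
            ReturnTypeArgs::Planning { args, schema } => args
                .iter()
                .map(|e| match e {
                    Expr::Column(name) => schema.field(name),
                    // A non-null string literal: Utf8 and never null.
                    Expr::LiteralUtf8(_) => Some((DataType::Utf8, false)),
                })
                .collect(),
            ReturnTypeArgs::Execution { arg_types, arg_nullability } => Some(
                arg_types
                    .iter()
                    .cloned()
                    .zip(arg_nullability.iter().copied())
                    .collect(),
            ),
        }
    }
}
```

The helper shows one consequence of the design: both variants can be reduced to the same per-argument information, while only the Planning variant still gives access to the expressions themselves (for example, to inspect literals).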
Challenge: Multiple APIs (Nullability and return type)
It is somewhat awkward to have two functions, one for nullability and one for return type. Also, I can imagine that the nullability calculation depends on the input types of the arguments too (not just the input nullability). I wonder if we can combine them into a single API:
Maybe something like
/// Information about the output of the function,
/// including the data type and nullability:
struct ReturnTypeInfo {
    data_type: DataType,
    nullable: bool,
}
trait ScalarUDFImpl {
    /// Returns the output data type and nullability for the given arguments
    fn return_type_from_args(&self, args: ReturnTypeArgs) -> Result<ReturnTypeInfo>;
}
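As a rough illustration of the combined API, the sketch below implements it for a cast-like function whose return type is named by a string-literal second argument. Everything here is a hypothetical stand-in: DataType is reduced to two variants, ReturnTypeArgs is a flat struct with an invented string_literals field, errors are plain Strings, and the rule that output nullability follows the first argument is an assumption of this example rather than documented behavior of any real function.

```rust
/// Simplified stand-in for arrow's DataType (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Output data type and nullability, returned together.
#[derive(Debug, PartialEq)]
struct ReturnTypeInfo {
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical argument bundle for this sketch.
struct ReturnTypeArgs<'a> {
    arg_types: &'a [DataType],
    arg_nullability: &'a [bool],
    /// One slot per argument; `Some` only for string literals.
    string_literals: &'a [Option<String>],
}

/// Hypothetical trait with a single combined method.
trait ScalarUdfSketch {
    fn return_type_from_args(&self, args: &ReturnTypeArgs<'_>) -> Result<ReturnTypeInfo, String>;
}

struct CastLike;

impl ScalarUdfSketch for CastLike {
    fn return_type_from_args(&self, args: &ReturnTypeArgs<'_>) -> Result<ReturnTypeInfo, String> {
        if args.arg_types.len() != 2 {
            return Err(format!("expected 2 arguments, got {}", args.arg_types.len()));
        }
        // The target type must be given as a string literal in position 1.
        let target = match args.string_literals.get(1) {
            Some(Some(s)) => s.as_str(),
            _ => return Err("second argument must be a string literal".to_string()),
        };
        let data_type = match target {
            "Int64" => DataType::Int64,
            "Utf8" => DataType::Utf8,
            other => return Err(format!("unsupported type name: {other}")),
        };
        // Assumption: the output is nullable exactly when the input value is.
        let nullable = args.arg_nullability.first().copied().unwrap_or(true);
        Ok(ReturnTypeInfo { data_type, nullable })
    }
}
```

Returning both pieces from one call means the type and nullability logic share the same inputs, so a nullability rule that depends on argument types does not need a second method.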
Great, I also want this too.
Yes, it might be, since I assume we don't really need If the constant integer is the only concern,
#[derive(Debug)]
pub enum ReturnTypeArgs<'a> {
    /// Information known at logical planning time.
    /// Note you can get the type and nullability for each arg
    /// using the specified ExprSchema
    Planning {
        args: &'a [Expr],
        schema: &'a dyn ExprSchema,
    },
    /// Information known during execution
    Execution {
        /// The data types of the arguments to the function
        arg_types: &'a [DataType],
        /// The nullability of the arguments to the function
        arg_nullability: &'a [bool],
    },
}
One good thing in this PR is that we don't need Do we really need
Which issue does this PR close?
Part of #13717
Rationale for this change
return_type_from_args has fewer dependencies on Expr itself and relies instead on the computed properties of Expr and Schema, including data_type and nullability.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
TODO
Combine return_type_from_args and is_nullable_from_args_nullable