Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataValue{Any} is not converted to missing #109

Closed
davidanthoff opened this issue Jul 9, 2019 · 6 comments · Fixed by #110
Closed

DataValue{Any} is not converted to missing #109

davidanthoff opened this issue Jul 9, 2019 · 6 comments · Fixed by #110

Comments

@davidanthoff
Copy link
Collaborator

This is with v0.2.8:

julia> DataFrame([(a=DataValue{Any}(), b=DataValue{Int}())])
1×2 DataFrame
│ Row │ a   │ b       │
│     │ Any │ Int64⍰  │
├─────┼─────┼─────────┤
│ 1#NA │ missing │

Here column a should be of type Union{Any,Missing}, not Any, and the value in (1,1) should be missing, not NA.

@davidanthoff
Copy link
Collaborator Author

Actually, Vector{Union{Any,Missing}} is not a thing, so the type of that vector is fine as Vector{Any}, but NA values should still show up as missing.

@quinnj
Copy link
Member

quinnj commented Jul 9, 2019

Hmmm........I'm not sure what we can really do here. If the column type is DataValue{Any} then the nondatavaluetype is Any, but I'm not sure if we want to define:

Base.convert(::Type{Any}, x::DataValue{T}) where {T} = isna(x) ? missing : unsafe_get(x)

For one, I think convert methods to Any are discouraged, and two, I'm not even sure it makes sense: if you convert to Any, then the input argument itself should just be returned, since that's valid. So as weird as it might seem, I think having the final column type be Any and a value of DataValue{Any}() is technically correct. I think it's more on library/source authors to try to minimize cases where DataValue{Any} is a valid value (or tbh, it should probably just use Any as the column type, and use missing directly, since that's a case that's not going to be inference friendly anyway).

@nalimilan
Copy link
Member

Maybe we should use another function than convert which would unwrap DataValue when the destination type is Any? It would call convert for other destination types.

@quinnj
Copy link
Member

quinnj commented Jul 11, 2019

Well, we used to have unwrap, but @davidanthoff and I were hoping to not need it in favor of using convert. @davidanthoff, any thoughts here?

@davidanthoff
Copy link
Collaborator Author

This is tricky... I agree we shouldn't define the convert function for this case, that would be too broad...

I do think we'll come across DataValue{Any} a fair bit, mostly from ExcelFiles.jl: heterogeneous columns with missing values are a common and valid thing in Excel files, and that does seem the best way to represent them.

In TableTraitsUtils I now just special cased this, but that is of course more difficult without taking a direct dependency on DataValues.jl...

But maybe there is a way to detect this case even with just the existing infrastructure? The core problem seems to be that we can't represent the equivalent of DataValue{Any} as a union, but maybe nondatavaluetype could return for example Union{Some{Any},Mising} for just that case. Tables.jl would have to special case that situation then. And then it could use get(val, missing to do the actual conversion (which should work because that is a base function).

@nalimilan
Copy link
Member

But maybe there is a way to detect this case even with just the existing infrastructure? The core problem seems to be that we can't represent the equivalent of DataValue{Any} as a union, but maybe nondatavaluetype could return for example Union{Some{Any},Mising} for just that case. Tables.jl would have to special case that situation then. And then it could use get(val, missing to do the actual conversion (which should work because that is a base function).

What's the advantage over returning Any? In both cases you need to special-case that situation anyway.

quinnj added a commit that referenced this issue Jul 12, 2019
…he normal convert-to-nondatavaluetype paradigm. Here, we call nondatavaluetype, and if the result is a different type, we know the value we're dealing w/ is a DataValue. We then call a new method, unwrap, if it's a DataValue, which has a special overload for Any. If it's not a DataValue type, no conversion is needed. Fixes #109
quinnj added a commit that referenced this issue Jul 15, 2019
…he normal convert-to-nondatavaluetype paradigm. Here, we call nondatavaluetype, and if the result is a different type, we know the value we're dealing w/ is a DataValue. We then call a new method, unwrap, if it's a DataValue, which has a special overload for Any. If it's not a DataValue type, no conversion is needed. Fixes #109 (#110)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants