-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38045: [C#] Add flight dictionary support #38046
Conversation
|
@westonpace you're listed as codeowner, could you take a look please? |
I can try to take a look at this over the US holiday; I've avoided it so far because it changes code I'm not familiar with. |
(still working my way up to this; it looks like dictionaries are problematic in general for C# IPC and so I want to understand what's going wrong there before looking at a Flight-specific dictionary batch issue) |
@@ -24,13 +24,13 @@ class DictionaryMemo | |||
{ | |||
private readonly Dictionary<long, IArrowArray> _idToDictionary; | |||
private readonly Dictionary<long, IArrowType> _idToValueType; | |||
private readonly Dictionary<Field, long> _fieldToId; | |||
private readonly Dictionary<string, long> _fieldToId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is technically correct because it ignores the fact that e.g. two nested structs could have dictionary-encoded fields with the same name and these would now collide. Obviously the existing implementation isn't great either in that it implicitly relies on reference equality (which I assume is what causes problems for Flight). I don't have any immediate suggestions though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. When I have time to get back to this I will write a test for this and look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check out my attempt at an implementation at https://github.com/CurtHagenlocher/arrow/tree/dev/curth/FlightDictionaries. It changes the write side of flight server so that the same schema is used to collect the dictionaries for each batch. This continues to let reference equality work for calculating dictionary IDs. I also refactored the "Post" changes to be (what I think is) a little cleaner.
Disclaimer: I am not very familiar with the Flight code or protocol, but your changed tests are passing.
FYI @alexwilcoxson-rel I made substantial changes to dictionaries in #39146 because they weren't working for files or in-memory data. I don't think there's much overlap between that change and this one; the sole exception is probably the hooks to use for tracking dictionary writes. These may need to be reconciled. |
@alexwilcoxson-rel , are you still interested in completing this PR? Should I pursue the work I started in my branch? |
Hi @CurtHagenlocher apologies for not following up, we have had other priorities come up and needing dictionary support for flight dotnet is on our backlog. I will close this and feel free to pick anything up on your end. Once we get to that ticket on our end I will check in. |
Rationale for this change
This PR adds initial support for Dictionary arrays over Arrow flight in C# server and client.
What changes are included in this PR?
Updated FlightDataStream and Flight record batch reader to support dictionaries. Updated ArrowStreamWriter to support resending dictionaries, only enabled for Flight use case. Updated FlightTests.
Are these changes tested?
Updated FlightTests to include DictionaryArrays
Are there any user-facing changes?
No