Assume this snippet, which creates a downsampling query and writes the result back to a new downsampling bucket:
start_ts= ...
end_ts= ...
tags_to_preserve = ['currency', 'marketPlace']  # NOTE: the error is hidden in this line!
query = f"""
SELECT
  date_bin_wallclock(INTERVAL '1 hour', tz(time, 'Europe/Berlin')) AS time,
  SUM(price) as price,
  {', '.join(tags_to_preserve)}
FROM "marketdata"
WHERE time >= timestamp '{start_ts}' AND time <= timestamp '{end_ts}'
GROUP BY 1, {', '.join(tags_to_preserve)}
"""
table = client_for_original_data_bucket.query(query=query, language="sql")
data_frame=table.to_pandas()
data_frame=data_frame.sort_values(by="time")
client_with_bucket_to_write_downsampled_data_to.write(
record=data_frame,
data_frame_measurement_name="marketdata",
data_frame_timestamp_column="time",
data_frame_tag_columns=tags_to_preserve,
)
Expected behavior
I would expect the above code snippet to work.
Actual behavior
The snippet fails with the following error:
pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Error while planning query: Schema error: No field named marketplace. Valid fields are marketdata.price, marketdata.currency, marketdata."marketPlace", marketdata.time.. gRPC client debug context: UNKNOWN
As you can see the reason is the tag called marketPlace. The client ignores the case of the string and changes it to "marketplace" for the query - which is a tag that does not exist and results in an error.
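Quoting the tag in the list fixes the query, but the write then fails, because tags_to_preserve = ['currency', '"marketPlace"'] contains the column name "marketPlace" (including the quotes), while only the column marketPlace exists in the data frame.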
So the solution here is pretty funny:
Add quotes around every tag in the list of tags before executing the query
Remove all those quotes again when passing them as column names to the client because otherwise it won't be able to identify the columns.
This means the following adjustments are needed in code that injects tags into a query string:
...
tags_to_preserve = ['currency', 'marketPlace']
# prepare tags to be inserted into query string
tags_to_preserve = [f'"{tag}"' for tag in tags_to_preserve]
...
# execute query
...
# prepare tags to be used as column accessors by influx client
tags_to_preserve = [tag.replace('"', '') for tag in tags_to_preserve]
...
# write data
...
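The two-step workaround above can be avoided by keeping the raw tag names untouched and quoting them only while the SQL text is built. The following is a minimal sketch, not part of the influxdb3 client: `quote_identifier` and `build_downsampling_query` are hypothetical helper names, the timestamps are placeholder values, and doubling embedded double quotes follows the standard SQL rule for quoted identifiers, which the query engine is assumed to honour.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes so its case is preserved in SQL."""
    # Standard SQL escapes a double quote inside a quoted identifier by doubling it.
    return '"' + name.replace('"', '""') + '"'


def build_downsampling_query(measurement, tags, start_ts, end_ts):
    """Build the hourly downsampling query with every tag quoted (hypothetical helper)."""
    quoted_tags = ", ".join(quote_identifier(tag) for tag in tags)
    return (
        "SELECT date_bin_wallclock(INTERVAL '1 hour', tz(time, 'Europe/Berlin')) AS time, "
        f"SUM(price) AS price, {quoted_tags} "
        f"FROM {quote_identifier(measurement)} "
        f"WHERE time >= timestamp '{start_ts}' AND time <= timestamp '{end_ts}' "
        f"GROUP BY 1, {quoted_tags}"
    )


# Raw names: pass these unchanged as data_frame_tag_columns when writing.
tags_to_preserve = ["currency", "marketPlace"]

# Placeholder timestamps, only for illustration.
query = build_downsampling_query(
    "marketdata", tags_to_preserve, "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"
)
print(query)
```

With this approach the list never has to be mutated, so the same `tags_to_preserve` can be used for both the query and the write.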
So overall we have two issues:
I was not aware that tags are case sensitive, and especially not that the influxdb3 Python client ignores case except for tags that are wrapped in double quotes (is any of this documented anywhere? I couldn't find it). This behaviour is very annoying in cases like the one above, and I feel the client should be able to deal with this on its own (or at least be consistent about it so that subsequent actions remain compatible).
Supplying a tag column that does not exist results in an unhandled error.
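Until the client reports this more clearly, the second issue can be caught on the caller's side before the write is attempted. The sketch below is an assumption about how such a guard might look, not existing client behaviour: `missing_tag_columns` and `check_tag_columns` are hypothetical helpers, and the column list stands in for `data_frame.columns`.

```python
def missing_tag_columns(columns, tag_columns):
    """Return the tag names that are not present among the data frame columns."""
    available = set(columns)
    return [tag for tag in tag_columns if tag not in available]


def check_tag_columns(columns, tag_columns):
    """Raise a descriptive error before writing if any tag column is absent."""
    missing = missing_tag_columns(columns, tag_columns)
    if missing:
        raise KeyError(
            f"Tag columns not found in data frame: {missing}; "
            f"available columns: {list(columns)}"
        )


# Column names as reported by data_frame.columns in this issue.
columns = ["time", "price", "currency", "marketPlace"]

# Raw tag names match the data frame, so this passes silently.
check_tag_columns(columns, ["currency", "marketPlace"])

# The quoted name used for the query does not match any column.
try:
    check_tag_columns(columns, ["currency", '"marketPlace"'])
except KeyError as error:
    print(error)
```

Calling `check_tag_columns(data_frame.columns, tags_to_preserve)` right before the write would turn the opaque server-side timeout into an immediate, readable error.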
Additional info
No response
Let's try to fix the above code like this:
tags_to_preserve = ['currency', '"marketPlace"']
By adding double quotes around the case-sensitive tag, the query works!
Well...except it doesn't. Now we get another error:
Reason: Internal Server Error
HTTP response body: {"code":"internal error","message":"dml handler error: rejected write: Timeout expired (the operation was cancelled)"}
Besides the fact that this error message is not helpful, the reason is the resulting data frame.
If we run
data_frame.columns
it returns this:
Index(['time', 'price', 'currency', 'marketPlace'], dtype='object')
See the issue? Now the last line of the code does not work anymore, because tags_to_preserve = ['currency', '"marketPlace"'] contains the column name "marketPlace" (with the quotes), but only the column marketPlace exists in the data frame.