Tag query/write inconsistent quotes with case sensitive tags and unhandled error #116

Open
DataCerealz opened this issue Jan 13, 2025 · 0 comments
Labels
bug Something isn't working


DataCerealz commented Jan 13, 2025

Specifications

  • Client Version: 0.10.0
  • InfluxDB Version: 3
  • Platform: Cloud

Code sample to reproduce problem

Assume this snippet, which runs a downsampling query and writes the result back to a new downsampling bucket:

start_ts = ...
end_ts = ...

tags_to_preserve = ['currency', 'marketPlace']  # NOTE: the error is hidden in this line!

query = f"""
    SELECT 
        date_bin_wallclock(INTERVAL '1 hour', tz(time, 'Europe/Berlin')) AS time,
        SUM(price) as price,
        {', '.join(tags_to_preserve)}
    FROM
        "marketdata"
    WHERE
        time >= timestamp '{start_ts}' AND time <= timestamp '{end_ts}'
    GROUP BY 
        1, {', '.join(tags_to_preserve)}
    """

table = client_for_original_data_bucket.query(query=query, language="sql")
data_frame = table.to_pandas()
data_frame = data_frame.sort_values(by="time")

client_with_bucket_to_write_downsampled_data_to.write(
    record=data_frame,
    data_frame_measurement_name="marketdata",
    data_frame_timestamp_column="time",
    data_frame_tag_columns=tags_to_preserve,
)

Expected behavior

I would expect the above code snippet to work.

Actual behavior

The query in the snippet fails with this error:

pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Error while planning query: Schema error: No field named marketplace. Valid fields are marketdata.price, marketdata.currency, marketdata."marketPlace", marketdata.time.. gRPC client debug context: UNKNOWN

As you can see, the reason is the tag called marketPlace. The client ignores the case of the string and changes it to "marketplace" for the query, which is a tag that does not exist, so the query fails.
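
The lowercasing seen in the error message matches the usual SQL rule that unquoted identifiers are case-folded while double-quoted ones are kept as written. Below is a rough model of that rule, assuming it is what applies here. fold_identifier is a hypothetical function for illustration only, not client or server code.

```python
def fold_identifier(identifier: str) -> str:
    """Model the common SQL rule for identifier case handling.

    Hypothetical illustration: double-quoted identifiers keep their case,
    everything else is folded to lowercase.
    """
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        # Strip the surrounding quotes and unescape doubled quotes.
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


unquoted = fold_identifier("marketPlace")    # folded, no longer matches the tag
quoted = fold_identifier('"marketPlace"')    # case preserved, matches the tag
```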

Let's try to fix the above code like this:

...

tags_to_preserve = ['currency', '"marketPlace"']

...

By adding double quotes around the case sensitive tag, the query works!
Well...except it doesn't. Now we get another error:

Reason: Internal Server Error
HTTP response body: {"code":"internal error","message":"dml handler error: rejected write: Timeout expired (the operation was cancelled)"}

Apart from the error message being unhelpful, the actual cause lies in the column names of the resulting data frame.

If we run data_frame.columns it returns this: Index(['time', 'price', 'currency', 'marketPlace'], dtype='object')

See the issue? Now the last line of the code does not work anymore:

client_with_bucket_to_write_downsampled_data_to.write(
    record=data_frame,
    data_frame_measurement_name="marketdata",
    data_frame_timestamp_column="time",
    data_frame_tag_columns=tags_to_preserve,
)

because tags_to_preserve = ['currency', '"marketPlace"'] contains the column name "marketPlace" but only column marketPlace exists in the data frame.

So the solution here is pretty funny:

  1. Add quotes around every tag in the list of tags before executing the query.
  2. Remove all those quotes again before passing the tags as column names to the client, because otherwise it won't be able to identify the columns.

This means that code which injects tags into a query string needs the following adjustments:

...
tags_to_preserve = ['currency', 'marketPlace']

# prepare tags to be inserted into query string
tags_to_preserve = [f'"{tag}"' for tag in tags_to_preserve]

...
# execute query
...

# prepare tags to be used as column accessors by influx client
tags_to_preserve = [tag.replace('"', '') for tag in tags_to_preserve]

...
# write data
...
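
The same workaround can be written without overwriting tags_to_preserve, so the plain names stay available for the write call. This is a minimal sketch that assumes nothing about the client beyond what is shown above. quote_identifier is a hypothetical helper, not part of the influxdb3 Python client.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes for use in a SQL string.

    Hypothetical helper; embedded double quotes are doubled, which is the
    standard SQL escaping rule for quoted identifiers.
    """
    return '"' + name.replace('"', '""') + '"'


# Plain names: these match the data frame columns and are what the write call needs.
tags_to_preserve = ["currency", "marketPlace"]

# Quoted names: only used when building the query string.
quoted_tags = ", ".join(quote_identifier(tag) for tag in tags_to_preserve)

select_clause = f"SUM(price) AS price, {quoted_tags}"
group_by_clause = f"1, {quoted_tags}"
```

With this split, the query string gets quoted_tags and data_frame_tag_columns gets the untouched tags_to_preserve.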

So overall we have two issues:

  • I was not aware that tags are case sensitive, and especially that the influxdb3 Python client ignores case except for tags that are wrapped in double quotes (is any of this documented anywhere? I couldn't find it). This behaviour is very annoying in cases like the one above, and I feel the client should be able to deal with this on its own (or at least be consistent about it so subsequent actions remain compatible).
  • Supplying a tag column that does not exist results in an unhandled error.
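
For the second issue, a check before the write call would surface the mismatch early instead of ending in the timeout error. This is a minimal sketch, not existing client behaviour. missing_tag_columns is a hypothetical helper, and the column list is copied from the data_frame.columns output above.

```python
from typing import Iterable, List


def missing_tag_columns(columns: Iterable[str], tag_columns: Iterable[str]) -> List[str]:
    """Return the tag names that are not present among the given column names."""
    available = set(columns)
    return [tag for tag in tag_columns if tag not in available]


# Column names as reported by data_frame.columns in the example above.
columns = ["time", "price", "currency", "marketPlace"]

# The quoted tag name does not match any column, so it is reported as missing.
missing = missing_tag_columns(columns, ["currency", '"marketPlace"'])
if missing:
    print(f"Tag columns not found in data frame: {missing}")
```

In real code, data_frame.columns could be passed as the first argument, and raising a ValueError instead of printing would stop the write before it reaches the server.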

Additional info

No response

@DataCerealz DataCerealz added the bug Something isn't working label Jan 13, 2025