Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Weather data has duplicate index values, rows and object column types #669

Open
psychemedia opened this issue Dec 20, 2024 · 4 comments
Labels
bug Something isn't working good first issue Good for newcomers
Milestone

Comments

@psychemedia
Copy link

psychemedia commented Dec 20, 2024

Describe the issue:

The dataframe that contains weather data:

  • may contain duplicate rows;
  • may contain rows with duplicate index values
  • has columns datatypes as "object" even though columns contain uniformly typed elements (eg all np.float, boolean or Timedelta values)

Reproduce the code example:

import fastf1
fastf1.set_log_level('ERROR')

fp1 = fastf1.get_session(2024, 'Bahrain', 'FP1')
fp1.load(laps=True)

w = fp1.laps.get_weather_data()

w.dtypes, w.index.value_counts()

"""
# Temporary hackfix
import pandas as pd
w.drop_duplicates(inplace=True)
w.sort_values(by='Time', inplace=True)
w.reset_index(drop=True, inplace=True)

for c in w.columns:
    if c=='Time':
        w[c] = pd.to_timedelta(w[c])
    else:
        w[c] = w[c].astype(type(w[c].iloc[0]))
"""

Error message:

(Time             object
 AirTemp          object
 Humidity         object
 Pressure         object
 Rainfall         object
 TrackTemp        object
 WindDirection    object
 WindSpeed        object
 dtype: object,
 66    14
 68    12
 18    12
...
@theOehrly
Copy link
Owner

The relevant code just picks the first weather data value for each lap and concatenates all the values. If there are multiple drivers, laps can of course take place simultaneously and the same weather data info can be returned multiple times. I probably didn't consider the case of calling this function on a Laps object that contains multiple drivers when writing this.

Your "hacky" fix seems pretty OK, to be honest. Except that I'd prefer to not reindex. This way, the data stays a slice of Session.weather_data. But drop duplicates and sort by time seems like the way to go here.

Disregarding my opinion here, what did you expect the function to return? Maybe we need to reconsider this and change the behaviour to better match expectations. I'd like to hear your opinion on that.

In general, I think this would be a relatively simple fix. So once we agree on what should be done here, I'm open to a pull request for this :)

@theOehrly theOehrly added bug Something isn't working good first issue Good for newcomers labels Dec 20, 2024
@theOehrly theOehrly added this to the v3.5.0 milestone Dec 20, 2024
@psychemedia
Copy link
Author

My possible use case was a quick weather chart under a chart showing when cars were out on track during a practice session.

The reindex was just to clean things up a bit. One of the major issues with the returned df as is was the duped index values.

@theOehrly
Copy link
Owner

My possible use case was a quick weather chart under a chart showing when cars were out on track during a practice session.

With this approach, there will be gaps in the weather chart whenever no cars are on track. Is this what you intended to achieve?

I'm not sure yet, if this discussion is relevant to the bug itself, but I'm trying to understand the usage of this function a bit more. The bug is easy enough to understand 😅

@psychemedia
Copy link
Author

Re: gaps.. probably not.. (I've only just started exploring how I can use the package, but that's often the time when I'm likely to use things in a "not meant to be used like that" way...;-)

( The way I use data is to look for stories. So the gappy data is useful for talking about weather when a driver is on track, eg on a long run, but maybe not so useful to explain eg why there are no cars running (tho' if it starts raining and everyone pits that's a clue!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants