[BUG] Weather data has duplicate index values, rows and object column types #669

psychemedia · 2024-12-20T02:47:51Z

Describe the issue:

The dataframe that contains weather data:

may contain duplicate rows;
may contain rows with duplicate index values
has columns datatypes as "object" even though columns contain uniformly typed elements (eg all np.float, boolean or Timedelta values)

Reproduce the code example:

import fastf1
fastf1.set_log_level('ERROR')

fp1 = fastf1.get_session(2024, 'Bahrain', 'FP1')
fp1.load(laps=True)

w = fp1.laps.get_weather_data()

w.dtypes, w.index.value_counts()

"""
# Temporary hackfix
import pandas as pd
w.drop_duplicates(inplace=True)
w.sort_values(by='Time', inplace=True)
w.reset_index(drop=True, inplace=True)

for c in w.columns:
    if c=='Time':
        w[c] = pd.to_timedelta(w[c])
    else:
        w[c] = w[c].astype(type(w[c].iloc[0]))
"""

Error message:

(Time             object
 AirTemp          object
 Humidity         object
 Pressure         object
 Rainfall         object
 TrackTemp        object
 WindDirection    object
 WindSpeed        object
 dtype: object,
 66    14
 68    12
 18    12
...

The text was updated successfully, but these errors were encountered:

theOehrly · 2024-12-20T18:30:31Z

The relevant code just picks the first weather data value for each lap and concatenates all the values. If there are multiple drivers, laps can of course take place simultaneously and the same weather data info can be returned multiple times. I probably didn't consider the case of calling this function on a Laps object that contains multiple drivers when writing this.

Your "hacky" fix seems pretty OK, to be honest. Except that I'd prefer to not reindex. This way, the data stays a slice of Session.weather_data. But drop duplicates and sort by time seems like the way to go here.

Disregarding my opinion here, what did you expect the function to return? Maybe we need to reconsider this and change the behaviour to better match expectations. I'd like to hear your opinion on that.

In general, I think this would be a relatively simple fix. So once we agree on what should be done here, I'm open to a pull request for this :)

psychemedia · 2024-12-20T20:54:55Z

My possible use case was a quick weather chart under a chart showing when cars were out on track during a practice session.

The reindex was just to clean things up a bit. One of the major issues with the returned df as is was the duped index values.

theOehrly · 2024-12-21T20:56:24Z

My possible use case was a quick weather chart under a chart showing when cars were out on track during a practice session.

With this approach, there will be gaps in the weather chart whenever no cars are on track. Is this what you intended to achieve?

I'm not sure yet, if this discussion is relevant to the bug itself, but I'm trying to understand the usage of this function a bit more. The bug is easy enough to understand 😅

psychemedia · 2024-12-22T10:41:28Z

Re: gaps.. probably not.. (I've only just started exploring how I can use the package, but that's often the time when I'm likely to use things in a "not meant to be used like that" way...;-)

( The way I use data is to look for stories. So the gappy data is useful for talking about weather when a driver is on track, eg on a long run, but maybe not so useful to explain eg why there are no cars running (tho' if it starts raining and everyone pits that's a clue!)

theOehrly · 2025-02-01T14:09:04Z

I have thought about this some more and about potential use cases (and looked at my own documentation😅). I now think that it's best to leave this as it is. There are advantages to having the duplicates, and there are advantages not having them. For example, the documentation shows an example where weather data is concatenated with timing data such that there is one weather data value for each lap. This requires the duplicates to exist.
The intention of this function is/was/will be to return one value for each lap. When laps overlap, there may be duplicate weather data values in the returned data frame. Users can drop them themselves if they need to (as you did).

Changing this behaviour would be an API change that breaks backwards compatibility. I no longer think this is a bug. And I don't see a compelling enough reason to change the API behaviour here. Specifically because there are reasons for both ways to be useful.

Instead, I'll clarify the documentation to make this behaviour more clear so that users aren't surprised or confused by it.

theOehrly added bug Something isn't working good first issue Good for newcomers labels Dec 20, 2024

theOehrly added this to the v3.5.0 milestone Dec 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Weather data has duplicate index values, rows and object column types #669

[BUG] Weather data has duplicate index values, rows and object column types #669

psychemedia commented Dec 20, 2024 •

edited

Loading

theOehrly commented Dec 20, 2024

psychemedia commented Dec 20, 2024

theOehrly commented Dec 21, 2024

psychemedia commented Dec 22, 2024

theOehrly commented Feb 1, 2025

[BUG] Weather data has duplicate index values, rows and object column types #669

[BUG] Weather data has duplicate index values, rows and object column types #669

Comments

psychemedia commented Dec 20, 2024 • edited Loading

Describe the issue:

Reproduce the code example:

Error message:

theOehrly commented Dec 20, 2024

psychemedia commented Dec 20, 2024

theOehrly commented Dec 21, 2024

psychemedia commented Dec 22, 2024

theOehrly commented Feb 1, 2025

psychemedia commented Dec 20, 2024 •

edited

Loading