
[ENH] Include underlying compound information #332

Open
Casper-Guo opened this issue Mar 15, 2023 · 18 comments
Labels
blocked:ergast enhancement New feature or request

@Casper-Guo
Contributor

Proposed new feature or change:

Opening this issue because a quick search doesn't turn up any prior discussion.

Per the documentation for the Compound column of the Laps object:

the actual underlying compounds C1 to C5 are not differentiated

This information is critical for comparing tyre performance across races. Although it may be redundant/repetitive at the Laps level, it would still be good to have it as a part of either Session or Event.

My own investigation has not turned up a good way to fetch this information automatically.

@theOehrly
Owner

One of the main reasons why I haven't worked on this is the lack of a good source. As you say, there does not seem to be a good way to fetch that information automatically. The only reliable way I have found so far is to keep an eye on the Pirelli Motorsports Twitter feed before each race. I need to check whether there are any FIA documents about this.
But it looks like this would be a manual process before each race weekend. Alternatively, one could automatically process the images from the Twitter feed. But as interesting as that task would be, developing that feature would be quite a waste of time.

@Casper-Guo
Contributor Author

Casper-Guo commented Mar 21, 2023

The Pirelli Motorsports Twitter will definitely be a hard source to use. I doubt their tweets will be formatted in a set way each week and it would be challenging to automatically extract information from their graphics.

I have somehow never thought of using FIA documents for this. I took a quick look, and the compounds are included in the event notes for each race under the Pirelli preview (example).

These are already uploaded to Twitter automatically (sample Tweet). Since these are well-formatted PDFs, I am guessing there are tools available to parse the information. I will look into this more when I get around to it.

@theOehrly
Owner

theOehrly commented Mar 21, 2023

I have never tried to automatically parse a PDF so I can't judge whether it's easier to extract data from a table in there versus a table in an image.
I am still wondering whether it's actually worth automating this, or whether it's better to just have a very simple manual process. The effort to automate this completely is relatively high. On the other hand, I don't really want to be responsible for doing this task before every race weekend, even if it is simple.

@theOehrly theOehrly added the enhancement New feature or request label Apr 5, 2023
@theOehrly theOehrly added this to the v3.1.0 milestone Apr 5, 2023
@harningle
Contributor

harningle commented Jul 1, 2023

It's fairly easy to parse FIA documents and get the tyre compounds, but I'm not sure whether and how we should add this to the package. Manually searching for the compounds seems much easier to me. Or should we build our own API for this?

Minimal example (I don't think it works for all races): https://github.com/harningle/Fast-F1/blob/dev-compound/fastf1/compound/docparser.py
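Once the text has been extracted from a PDF (for example with a PDF text-extraction library), finding the nominations can come down to pattern matching. A minimal sketch, assuming the extracted text places each label (HARD, MEDIUM, SOFT) within a few non-word characters of its C-number; `parse_compounds` and the regex are hypothetical and are not the code in the linked docparser.py. The pattern accepts C0 to C6 as a margin beyond C1 to C5.

```python
import re

# Assumption: each dry-tyre label is followed within a few non-word
# characters by its C-number, e.g. "HARD C1" or "Soft: C3".
# Real FIA/Pirelli documents may be laid out differently.
COMPOUND_PATTERN = re.compile(
    r"\b(HARD|MEDIUM|SOFT)\b\W{0,3}(C[0-6])\b", re.IGNORECASE
)


def parse_compounds(text: str) -> dict:
    """Map each dry-tyre label to its underlying compound, e.g. {"HARD": "C1"}.

    Only the first match per label is kept; later mentions are ignored.
    """
    found = {}
    for label, compound in COMPOUND_PATTERN.findall(text):
        found.setdefault(label.upper(), compound.upper())
    return found


if __name__ == "__main__":
    # Made-up sample text, not taken from a real document.
    print(parse_compounds("Nominations: HARD C1 / MEDIUM C2 / SOFT C3"))
```

Because a regex like this silently returns fewer labels when a document's layout changes, a caller would probably want to check that all three labels were found before trusting the result.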

@theOehrly
Owner

I want to add something like this. The way I'd prefer is to automate the parsing of the documents and build our own API server that serves the data through a public API.
I might get around to that later this summer.

I don't really want to do processing like this in FastF1 itself, because it is inefficient, likely to cause rate-limit issues, and makes it more difficult to fix incorrectly parsed results.
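The split described here (parse once on a server, let FastF1 fetch the result) can be illustrated with a tiny JSON endpoint. A minimal sketch using only Python's standard-library `wsgiref`; the `/compounds/<season>/<round>` route, the `COMPOUNDS` layout and its placeholder values are all hypothetical, and the thread later leans towards Cloudflare Workers rather than a Python server.

```python
import json
from wsgiref.simple_server import make_server

# Placeholder data for illustration only: season -> round -> label -> compound.
# A real server would load this from the parsed FIA documents.
COMPOUNDS = {
    "2099": {"1": {"HARD": "C2", "MEDIUM": "C3", "SOFT": "C4"}},
}


def _respond(start_response, status, payload):
    """Encode `payload` as JSON and send it with the given status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def app(environ, start_response):
    """WSGI app answering /compounds/<season>/<round> (hypothetical route)."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 3 and parts[0] == "compounds":
        data = COMPOUNDS.get(parts[1], {}).get(parts[2])
        if data is not None:
            return _respond(start_response, "200 OK", data)
    return _respond(start_response, "404 Not Found", {"error": "not found"})


if __name__ == "__main__":
    # Serve locally; try http://127.0.0.1:8000/compounds/2099/1
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

With this shape, the client only makes one small request per event instead of downloading and parsing PDFs, and correcting a mis-parsed event becomes a data fix on the server rather than a FastF1 release.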

@harningle
Contributor

Yes, makes sense! I've never built an API server but would like to learn and get involved. Let me know if there is anything I can help with! Btw, Oracle has pretty good lifetime-free virtual instances, and we may be able to host the API there.

@theOehrly theOehrly modified the milestones: v3.1.0, v3.2.0 Aug 29, 2023
@harningle
Contributor

I've parsed the event notes and Pirelli previews for all races since 2019 to get the tyre compounds: https://github.com/harningle/fia-doc/blob/main/tyres.json. The race names in the JSON can be matched with EventName here. Happy to revise the format or set up an API for this!
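Matching on EventName means the per-lap compound labels can be translated into underlying compounds with a simple lookup. A minimal sketch under an assumed file layout (season, then EventName, then label to C-number); the real tyres.json may be structured differently, the values below are placeholders, and `event_mapping` and `underlying_compounds` are hypothetical helpers.

```python
import json
from typing import Dict, List, Optional

# Hypothetical layout (the real tyres.json may differ):
# {"<season>": {"<EventName>": {"HARD": "C2", "MEDIUM": "C3", "SOFT": "C4"}}}
SAMPLE = '{"2099": {"Example Grand Prix": {"HARD": "C2", "MEDIUM": "C3", "SOFT": "C4"}}}'


def event_mapping(data: dict, season: int, event_name: str) -> Dict[str, str]:
    """Return the label -> compound mapping for one event, or {} if unknown."""
    return data.get(str(season), {}).get(event_name, {})


def underlying_compounds(labels: List[str],
                         mapping: Dict[str, str]) -> List[Optional[str]]:
    """Translate per-lap labels (like the Laps Compound column) to C-numbers.

    Labels without an entry, such as INTERMEDIATE or WET, become None.
    """
    return [mapping.get(label) for label in labels]


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    mapping = event_mapping(data, 2099, "Example Grand Prix")
    print(underlying_compounds(["SOFT", "HARD", "INTERMEDIATE"], mapping))
```

With a loaded FastF1 session, the event name would come from `session.event["EventName"]` and the labels from `session.laps["Compound"]`; pandas `Series.map(mapping)` performs the same translation on the whole column, producing NaN for unmapped labels.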

@theOehrly
Owner

Nice, especially that it's possible to parse all those races with the same script. That's very promising.

I'm planning to set up an API server anyway but this is still very much a work in progress. But it's needed for multiple reasons and I think the point has come where it's actually reasonable to do.

The current list of things to do to make this work is the following:

  • set up general FastF1 API server (currently leaning towards Cloudflare Workers with Cloudflare D1 Database)
  • automate running this script (and others) (how, where, ... is still to be determined)
  • implement in FastF1

The API server is the most blocking part here right now. But I'm kind of working on it. It's new territory for me, though. Therefore, it's going a bit slowly and I need to play around with some stuff and see how it works.

@Casper-Guo
Contributor Author

Thoughts on how to bring 2018 compounds into the fold? I have the data in TOML, but I am thinking more about how to make the code/API general enough to anticipate future compound changes.

@harningle
Contributor

harningle commented Sep 8, 2023

The FIA has all documents in PDF format in its archive, going back to 2012. However, the formats differ: my current script can handle 2018 with a simple revision, while I can't find compound info in any document for 2014. For historical data, it may be easier to do it manually once. If we want to go even further back in time, next year the FIA will have a public E-library hosting all documents from the very beginning.

No idea about potential future changes, though.

@theOehrly
Owner

theOehrly commented Sep 9, 2023

So, after a bit of thinking, I came up with something like this for the required data structure, if we want to support all past and future events.

{
  'season': <int>,
  // the season year
  
  'events': [
  // an array of events
    {
      'round': <int>,
      // the round number
      
      'eventKey': <int, optional>,
      // the eventKey that is used in the F1 livetiming API, only exists for current events
      
      'compoundSpecifications': {
      // an object that lists all possible compounds for this event
      // each compound gets an id that it can be referred to by
        compoundId <any>: {
          'manufacturer': <str>,
          // name of the tyre manufacturer

          'compoundName': <str>,
          // name of the compound, e.g. C1, C2, ...
          
          'simpleName': <str, optional>
          // "simple" names i.e. HARD, MEDIUM, ...
          // this name may change per event
        }
      },
      
      'compounds': {
      // this object maps an array of compound ids to each constructor
      // (preferably use ergast constructor ids)
        constructorId <int>: [
          {
            'compoundId': <compoundId>,
            
            'availableCount': <int>
            // number of available tyre sets of this compound
            // could be interesting if we manage to get the data
          },
          ...
        ]
      }
    },
  ]
}
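One way to check that the proposed structure hangs together is to express it as typed records. A sketch using Python dataclasses, with the proposal's keys converted to snake_case; the class names and the `compounds_for` helper are hypothetical, and constructor ids are typed as `str` here (Ergast constructor ids such as "mercedes" are strings) rather than `<int>`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CompoundSpec:
    manufacturer: str                  # tyre manufacturer
    compound_name: str                 # e.g. "C3"
    simple_name: Optional[str] = None  # e.g. "MEDIUM"; may change per event


@dataclass
class CompoundAllocation:
    compound_id: str                       # key into Event.compound_specifications
    available_count: Optional[int] = None  # number of sets, if known


@dataclass
class Event:
    round: int
    compound_specifications: Dict[str, CompoundSpec]
    compounds: Dict[str, List[CompoundAllocation]]  # constructor id -> allocations
    event_key: Optional[int] = None                  # livetiming key, current events only

    def compounds_for(self, constructor_id: str) -> List[CompoundSpec]:
        """Resolve one team's compound ids to their full specifications."""
        return [self.compound_specifications[a.compound_id]
                for a in self.compounds.get(constructor_id, [])]


@dataclass
class Season:
    season: int
    events: List[Event] = field(default_factory=list)
```

The ids keep each specification in one place per event; a team's list only stores references plus per-team extras such as the set count.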

This should make it possible to handle all reasonable cases that I can think of right now, all of which may change on a per-event basis, such as

  • one manufacturer with one set of compounds for all teams
  • multiple manufacturers, each with one set of compounds for their teams
  • a custom mix of compounds from multiple manufacturers per team

Instead of using compound ids, we could also just directly include the compound information in the mapping for each team. That would make for lots of redundant data (which probably doesn't matter too much) but less complexity.
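The trade-off can be made concrete: converting the id-based layout into the inlined one is a small, mechanical step, so the stored format could stay normalized while consumers receive the redundant but simpler view, or vice versa. A sketch working on plain dicts with the proposal's camelCase keys; `inline_compounds` is a hypothetical helper and the sample values are placeholders.

```python
from typing import Dict, List


def inline_compounds(event: dict) -> Dict[str, List[dict]]:
    """Denormalize one event: replace each compoundId with a copy of its spec.

    `event` uses the id-based keys from the proposal ("compoundSpecifications",
    "compounds", "compoundId"), as plain dicts parsed from JSON.
    """
    specs = event["compoundSpecifications"]
    inlined = {}
    for constructor_id, allocations in event["compounds"].items():
        inlined[constructor_id] = [
            # merge a copy of the full spec with per-team extras (e.g. availableCount)
            {**specs[a["compoundId"]],
             **{k: v for k, v in a.items() if k != "compoundId"}}
            for a in allocations
        ]
    return inlined
```

Each team entry then carries its own copy of the specification, which is exactly the redundancy mentioned above.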

Annoyingly, this looks pretty complicated, much more complicated than I'd like it to be. But supporting various compounds, manufacturers, compound sets, and names that change per event is complicated, I guess.
If anybody can come up with something simpler, please suggest it.

@theOehrly
Owner

@harningle this tyre data stuff might get somewhat delayed, considering that #445 has popped up. I want to integrate it into the potential Ergast successor, to avoid having many different systems. That whole project will hopefully get more traction starting in October. In case you are interested in helping out there, I could really use some help from people who know about relational databases and how to build an API server, even if it's just a bit of consulting.

@Casper-Guo
Contributor Author

Commenting to say I would like to be in the loop as well. I have done some DB and SQL work, but building an API server will be a good learning opportunity.

@harningle
Contributor

I'm in! I work frequently with SQL and APIs in my job, but have never built one. Happy to learn and contribute.

@theOehrly theOehrly modified the milestones: v3.2.0, future Jan 5, 2024
@marcll

marcll commented Jun 10, 2024

Hello @theOehrly, this might already be an old topic, but I just published something that might be useful.

I have created a general-purpose parser for FIA documents that uses LLMs and text summarizers. By running the race documents through a large language model, it can extract tyre compound information as well as penalties and decisions for given races.

I wanted to share the repo (https://github.com/marcll/f1-fia-doc-parser) in case it is useful and the problem and use case are still relevant.

Thanks for your amazing work!

@psychemedia

For 2024 at least, there was a "Pirelli preview" pdf doc with tyre information provided in the following format:

[Image: tyre information excerpt from the Pirelli preview]

Example: https://www.fia.com/sites/default/files/decision-document/2024%20Abu%20Dhabi%20Grand%20Prix%20-%20Event%20Notes%20-%20Pirelli%20Preview.pdf

@Casper-Guo
Contributor Author

Yes, someone has already explored parsing that PDF document (repo).

I haven't really kept up with the development of the Ergast replacement API, so I cannot say whether this functionality will be incorporated and, if so, when.

@harningle
Contributor

Yes, we are aware of this and have tried something already: https://github.com/jolpica/jolpica-f1 and https://github.com/harningle/fia-doc/blob/main/parse_event_note.py, as @Casper-Guo pointed out above.

The top priority now is to achieve full parity with Ergast. Ergast does not have the underlying compounds behind the tyres, which means jolpica (the new API after the sunset of Ergast) won't have them any time soon. I can't promise anything about the timing, as I still have data correctness checks to do for jolpica before continuing with the compound info parsing. I'm aiming for late February to get the parser working for compounds.

5 participants