[ENH] Include underlying compound information #332
Comments
One of the main reasons why I haven't worked on this is the lack of a good source. As you say, there does not seem to be a good way to fetch that information automatically. The only reliable way I have found so far is to keep an eye on the Pirelli Motorsports Twitter feed before each race. I need to check whether there are any FIA documents about this.
The Pirelli Motorsports Twitter feed will definitely be a hard source to use. I doubt their tweets will be formatted in a set way each week, and it would be challenging to automatically extract information from their graphics. I had somehow never thought of using FIA documents for this. I took a quick look and the compounds are included in the event notes for each race, under the Pirelli preview (example). These are already uploaded to Twitter automatically (sample tweet). Since these are well-formatted PDFs, I am guessing there are tools available to parse the information. I will look into this more when I get around to it.
I have never tried to automatically parse a PDF, so I can't judge whether it's easier to extract data from a table in there versus a table in an image.
It's fairly easy to parse FIA documents and get the tyre compounds, but I'm not sure if and how we should add this to the package. Manually searching for the compounds seems much easier to me. Or should we build our own API for this? Minimal example (I don't think it works for all races): https://github.com/harningle/Fast-F1/blob/dev-compound/fastf1/compound/docparser.py
I want to add something like this. My preferred approach is to automate the parsing of the documents and build our own API server that serves the data through a public API. I don't really want to do processing like this in FastF1 itself, because it is inefficient, likely to cause rate limit issues, and makes it more difficult to fix incorrectly parsed results.
Yes, makes sense! I've never built an API server but would like to learn and get involved. Let me know if there is anything I can help with! Btw, Oracle has pretty good lifetime free virtual instances and we may be able to build the API there.
I've parsed the event notes and Pirelli preview for all races since 2019 to get the tyre compounds: https://github.com/harningle/fia-doc/blob/main/tyres.json. The race name in the JSON can be matched with
Nice, especially that it's possible to parse all those races with the same script. That's very promising. I'm planning to set up an API server anyway but this is still very much a work in progress. But it's needed for multiple reasons and I think the point has come where it's actually reasonable to do. The current list of things to do to make this work is the following:
The API server is the most blocking part here right now. But I'm kind of working on it. It's new territory for me, though. Therefore, it's going a bit slowly and I need to play around with some stuff and see how it works.
Thoughts on how to bring 2018 compounds into the fold? I have the data in TOML, but I am thinking more about how to make the code/API general enough to anticipate future compound changes.
The FIA has all documents in PDF format in their archive, all the way back to 2012. However, the formats differ: my current script can handle 2018 with a simple revision, while I can't find compound info in any document for 2014. For historical data, maybe it's easier to do it manually once. If we want to go even further back in time, next year the FIA will have a public E-library, which will host all documents since the first days. I have no idea about potential changes in the future, though.
So, after a bit of thinking, I came up with something like this for the required data structure, if we want to support all past and future events.
{
    'season': <int>,
    // the season year
    'events': [
        // an array of events
        {
            'round': <int>,
            // the round number
            'eventKey': <int, optional>,
            // the eventKey that is used in the F1 livetiming API, only exists for current events
            'compoundSpecifications': {
                // an object that lists all possible compounds for this event
                // each compound gets an id that it can be referred to by
                compoundId <any>: {
                    'manufacturer': <str>,
                    // name of the tyre manufacturer
                    'compoundName': <str>,
                    // the compound name, e.g. C1, C2, ...
                    'simpleName': <str, optional>
                    // "simple" name, i.e. HARD, MEDIUM, ...
                    // this name may change per event
                }
            },
            'compounds': {
                // this object maps an array of compound ids to each constructor
                // (preferably use ergast constructor ids)
                constructorId <int>: [
                    {
                        'compoundId': <compoundId>,
                        'availableCount': <int>
                        // number of available tyre sets of this compound
                        // could be interesting if we manage to get the data
                    },
                    ...
                ]
            }
        },
    ]
}
This should make it possible to handle all reasonable cases that I can think of right now, like
Instead of using compound ids, we could also just directly include the compound information in the mapping for each team. That would make for lots of redundant data (which probably doesn't matter too much) but less complexity. Annoyingly, this looks pretty complicated. Much more complicated than I'd like it to be. But supporting various compounds, manufacturers, and compound sets and names that change per event is complicated, I guess.
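For illustration, the proposed structure can be sketched as plain Python dictionaries, together with a small helper that resolves a constructor's compound ids against the per-event specifications. This is only a sketch under assumptions: the compound ids ("a", "b", "c"), the constructor id 1, the counts, and the helper name resolve_compounds are all hypothetical, and the values are placeholders rather than a real allocation.

```python
from typing import Any, Dict, List

# Placeholder data that follows the proposed schema. The ids, counts and
# the allocation itself are illustrative assumptions, not real data.
season: Dict[str, Any] = {
    "season": 2023,
    "events": [
        {
            "round": 1,
            "compoundSpecifications": {
                "a": {"manufacturer": "Pirelli", "compoundName": "C1", "simpleName": "HARD"},
                "b": {"manufacturer": "Pirelli", "compoundName": "C2", "simpleName": "MEDIUM"},
                "c": {"manufacturer": "Pirelli", "compoundName": "C3", "simpleName": "SOFT"},
            },
            "compounds": {
                # hypothetical constructor id
                1: [
                    {"compoundId": "a", "availableCount": 2},
                    {"compoundId": "b", "availableCount": 3},
                    {"compoundId": "c", "availableCount": 8},
                ],
            },
        }
    ],
}


def resolve_compounds(event: Dict[str, Any], constructor_id: int) -> List[Dict[str, Any]]:
    """Join a constructor's compound ids with the event's compound specifications."""
    specs = event["compoundSpecifications"]
    resolved = []
    for entry in event["compounds"][constructor_id]:
        # Copy the specification so the shared definition is not mutated.
        merged = dict(specs[entry["compoundId"]])
        if "availableCount" in entry:
            merged["availableCount"] = entry["availableCount"]
        resolved.append(merged)
    return resolved


if __name__ == "__main__":
    for compound in resolve_compounds(season["events"][0], 1):
        print(compound)
```

The list returned by the helper is effectively the denormalized alternative mentioned above: each team entry carries the full compound information, at the cost of repeating it for every constructor.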
@harningle this tyre data stuff might get somewhat delayed, considering that #445 has popped up. I want to integrate it into the potential Ergast successor, so that we don't end up with many different systems. That whole project will hopefully get more traction starting in October. In case you are interested in helping out there, I could really use some help from people who know about relational databases and how to build an API server. Even if it's just a bit of consulting.
Commenting to say I would like to be in the loop as well. I have done some DB and SQL work, but building an API server will be a good learning opportunity.
I'm in! I work frequently with SQL and APIs in my job, but have never built anything. Happy to learn and contribute.
Hello @theOehrly, this might already be an old topic, but I just published something that might be useful. I have created a general-purpose parser for FIA documents that works with LLMs and text summarizers. It can extract tyre compound information, as well as penalties and decisions for given races, by running the race documents through a large language model. I wanted to share the repo (https://github.com/marcll/f1-fia-doc-parser) in case it is useful and the problem and use case are still relevant. Thanks for your amazing work!
Yes, someone has already explored parsing from that PDF doc (repo). I haven't really kept up with the development of the Ergast replacement API, so I cannot say whether this functionality will be incorporated, and if so, when.
Yes, we are aware of this and have tried something already: https://github.com/jolpica/jolpica-f1 and https://github.com/harningle/fia-doc/blob/main/parse_event_note.py, as @Casper-Guo pointed out above. The top priority now is to achieve full parity with Ergast. Ergast does not have the underlying compounds behind the tyres, which means jolpica (the new API after the sunset of Ergast) won't have it very soon. I can't promise anything about the timing, as I still have data correctness checks to do for jolpica before continuing with this compound info parsing. I'm aiming at late Feb for getting the parser working for compounds.
Proposed new feature or change:
Opening this issue because a quick search doesn't turn up any prior discussion.
Per documentation for Laps object Compound column: This information is critical for comparing tyre performance across races. Although it may be redundant/repetitive at the Laps level, it would still be good to have it as a part of either Session or Event. My own investigation has not turned up a good way to fetch this information automatically.
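As a rough illustration of the request: the same simple name (for example SOFT) can refer to a different underlying compound at each event, so a per-event lookup is needed before laps from different races can be compared. The sketch below assumes a hand-maintained mapping keyed by season and round. The name ALLOCATIONS, the helper underlying_compound, and the values are hypothetical placeholders, not real allocations and not part of FastF1.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup: (season, round) -> simple name -> underlying compound.
# The values are placeholders for illustration only.
ALLOCATIONS: Dict[Tuple[int, int], Dict[str, str]] = {
    (2023, 1): {"HARD": "C1", "MEDIUM": "C2", "SOFT": "C3"},
    (2023, 2): {"HARD": "C2", "MEDIUM": "C3", "SOFT": "C4"},
}


def underlying_compound(season: int, round_number: int, compound: str) -> Optional[str]:
    """Return the underlying compound for a simple name, or None if unknown.

    Names without an entry in the mapping (for example wet-weather tyres,
    or an event that has not been added yet) yield None.
    """
    event = ALLOCATIONS.get((season, round_number), {})
    return event.get(compound.upper())


if __name__ == "__main__":
    # The same simple name resolves differently per event.
    print(underlying_compound(2023, 1, "SOFT"))
    print(underlying_compound(2023, 2, "SOFT"))
```

With a lookup like this attached to a Session or Event, the Compound value of each lap could be translated once per event rather than being repeated on every lap.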