Substitutions, players temporarily leaving pitch in Kloppy EventData model and dataframe #362

Open
ussf-jbekkers opened this issue Nov 7, 2024 · 7 comments

Comments

@ussf-jbekkers

Before 3.16.0, the Opta parser specifically used two individual events, PLAYER_ON and PLAYER_OFF, to describe a substitution.

Version 3.16.0 (not yet officially released) introduces a breaking change by adding the SUBSTITUTION event, which combines the PLAYER_ON and PLAYER_OFF events of a substitution into a single event. However, PLAYER_ON and PLAYER_OFF events can also relate to players temporarily leaving the pitch, as shared by @DriesDeprest in #361.

Some ideas for the Kloppy EventData model, as discussed by @DriesDeprest, @probberechts and me:

  • Make PLAYER_OFF and PLAYER_ON more specific by renaming them to PLAYER_RETIRES and PLAYER_RETURNS, respectively, for any event that is unrelated to a substitution.
  • In case of a substitution, use the SUBSTITUTION event and assign the replacement_player the Id of the player coming on. (This is implemented in a commit in #333.)
  • Explicitly deprecate PLAYER_ON and PLAYER_OFF, as they are confusing and ambiguous, similar to how we did away with the HOME_AWAY orientation by explicitly renaming it to STATIC_HOME_AWAY.

Idea for the implementation in the dataframe representation:

  • Set the replacement_player in the receiving_player field for the SUBSTITUTION event.
  • Remove GENERIC:player off and GENERIC:player on in favor of GENERIC:player retires and GENERIC:player returns events, respectively.

Let me know if I forgot anything or if anyone has any suggestions on this!

@probberechts
Contributor

probberechts commented Nov 7, 2024

There are two general types of events that are related to players going off the pitch and returning on the pitch.

  1. First, there are substitutions. Substitutions are final. A player cannot return after he has been substituted.
  2. Second, there are various reasons for which a player temporarily has to leave the pitch. Typically, these are related to injury treatments.

Hence, I would argue that we need at least two different event types.
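
To illustrate why the two cases behave differently, here is a minimal sketch of tracking who is on the pitch. It is not kloppy code: `Kind`, `PitchState`, and `apply` are hypothetical names made up for this example, and players are plain strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Hypothetical event kinds, not kloppy's EventType enum.
    SUBSTITUTION = "substitution"  # final: the outgoing player cannot return
    PLAYER_OFF = "player_off"      # temporary, e.g. injury treatment
    PLAYER_ON = "player_on"        # return after a PLAYER_OFF


@dataclass
class PitchState:
    on_pitch: set = field(default_factory=set)
    substituted: set = field(default_factory=set)

    def apply(self, kind, player, replacement=None):
        if kind is Kind.SUBSTITUTION:
            # The outgoing player is removed for good.
            self.on_pitch.discard(player)
            self.substituted.add(player)
            if replacement is not None:
                self.on_pitch.add(replacement)
        elif kind is Kind.PLAYER_OFF:
            # Temporary absence: the player may come back later.
            self.on_pitch.discard(player)
        elif kind is Kind.PLAYER_ON:
            if player in self.substituted:
                raise ValueError(f"{player} was substituted and cannot return")
            self.on_pitch.add(player)
```

With a single ambiguous "off" event, the `substituted` set could not be maintained, which is the practical reason for keeping the two event types apart.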

Substitutions
First, my take on how to do the substitution events. For most event types, there is a main actor and a secondary actor. E.g., for a pass, the main actor is the player who executes the pass and the secondary actor is the player who receives the pass. This is not the case for substitutions: both players involved are main actors. Therefore, I think a substitution should be implemented as:

class SubstitutionEvent(Event):
    player_in: Player | None
    player_out: Player

However, this does not work well with kloppy's current event data model. The SubstitutionEvent would have an empty player attribute.

The alternative is to split it into two event types:

class SubstitutionInEvent(Event):
    """A player comes on the pitch after a substitution."""
    
class SubstitutionOutEvent(Event):
    """A player goes off the pitch after a substitution."""

Additionally, this has the advantage that it automatically works well with the dataframe representation. In the object-oriented format, I don't really care whether a substitution is a single object or two objects, but I think that in the dataframe representation, it should definitely be split into two rows. Adding the replacement player in the receiving_player column is not a good solution!
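
A rough sketch of what "automatically works well" could look like: with two event types, every event has exactly one player, so each event flattens to one row with the same columns. The classes below are simplified stand-ins, not kloppy's `Event` hierarchy, and the field names are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass
class SubstitutionIn:
    # Simplified stand-in for the proposed SubstitutionInEvent.
    timestamp: float
    team_id: str
    player_id: str
    event_type: str = "SUBSTITUTION_IN"


@dataclass
class SubstitutionOut:
    # Simplified stand-in for the proposed SubstitutionOutEvent.
    timestamp: float
    team_id: str
    player_id: str
    event_type: str = "SUBSTITUTION_OUT"


def to_rows(events):
    """Flatten events to one dict per event; all rows share the same keys."""
    return [asdict(event) for event in events]


rows = to_rows([
    SubstitutionOut(timestamp=4140.0, team_id="home", player_id="5678"),
    SubstitutionIn(timestamp=4140.0, team_id="home", player_id="1234"),
])
```

No extra column is needed for the second player; the price is that the link between the two rows is only implicit (same team and timestamp).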

Temporarily leaving the pitch

I think the main issue here is the naming of the event types. I agree that people could easily and mistakenly assume that these event types are related to substitutions, especially if they are familiar with Opta data.

As an alternative to "Player on", I like "Player returns". However, I do not have any ideas for a better name for "Player Off".

Maybe it is not really needed to change these names and we can solve it with proper documentation.


For reference, this is how kloppy and the main data providers currently do it:

kloppy

  • EventType.SUBSTITUTION: a player is substituted (and can no longer come on the pitch after it)
  • EventType.PLAYER_OFF: a player goes out of the pitch without a substitution (typically for/after an injury treatment)
  • EventType.PLAYER_ON: a player returns to the pitch after a PLAYER_OFF event.

StatsBomb

  • 19/Substitution: A player is substituted off the field for various reasons.
  • 27/Player off: A player goes/ is carried out of the pitch without a substitution.
  • 26/Player on: A player returns to the pitch after a Player Off event.

Opta

  • 18/Player off: Player is substituted off.
  • 19/Player on: Player comes on as a substitute.
  • 20/Player retired: Player is forced to leave the pitch due to injury and the team have no substitutions left.
  • 21/Player returns: Player comes back on the pitch.
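
The Opta type IDs listed above can be summarized in a small lookup. This is only an illustration of the definitions in this comment, not the mapping used by kloppy's Opta parser; the names `OPTA_PLAYER_EVENT_TYPES` and `is_substitution` are hypothetical.

```python
# type id -> (Opta name, whether the event is part of a substitution)
OPTA_PLAYER_EVENT_TYPES = {
    18: ("Player off", True),       # substituted off
    19: ("Player on", True),        # comes on as a substitute
    20: ("Player retired", False),  # leaves injured, no substitutions left
    21: ("Player returns", False),  # comes back on the pitch
}


def is_substitution(type_id):
    """Return True if the Opta type id describes one half of a substitution."""
    _name, part_of_substitution = OPTA_PLAYER_EVENT_TYPES[type_id]
    return part_of_substitution
```

Note that Opta's "Player off"/"Player on" are substitution events, whereas the identically named StatsBomb and kloppy events are not, which is the source of the naming confusion.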

Wyscout
Wyscout does not have events for substitutions. Substitutions are provided as part of the match meta data as a list [{"minute": 69, "playerIn": 1234, "playerOut": 5678}, ...]. Wyscout does not have events / annotations for players temporarily leaving the pitch.
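
As a sketch of how such a metadata list could be turned into event-like rows: only the keys `minute`, `playerIn`, and `playerOut` come from the example above; the function name, the `team_id` parameter, and the output columns are assumptions for illustration.

```python
def substitutions_to_rows(substitutions, team_id):
    """Expand each metadata entry into an 'out' row and an 'in' row."""
    rows = []
    for sub in sorted(substitutions, key=lambda s: s["minute"]):
        rows.append({
            "minute": sub["minute"],
            "team_id": team_id,
            "event_type": "SUBSTITUTION_OUT",
            "player_id": sub["playerOut"],
        })
        rows.append({
            "minute": sub["minute"],
            "team_id": team_id,
            "event_type": "SUBSTITUTION_IN",
            "player_id": sub["playerIn"],
        })
    return rows


example = substitutions_to_rows(
    [{"minute": 69, "playerIn": 1234, "playerOut": 5678}], team_id="home"
)
```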

@DriesDeprest
Contributor

Thanks for clearly laying it out @probberechts.

But why would it be a problem to keep the substitution event as

@dataclass(repr=False)
@docstring_inherit_attributes(Event)
class SubstitutionEvent(Event):
    """
    SubstitutionEvent

    Attributes:
        event_type (EventType): `EventType.SUBSTITUTION` (See [`EventType`][kloppy.domain.models.event.EventType])
        event_name (str): `"substitution"`,
        replacement_player (Player): See [`Player`][kloppy.domain.models.common.Player]
    """

    replacement_player: Player
    position: Optional[PositionType] = None

    event_type: EventType = EventType.SUBSTITUTION
    event_name: str = "substitution"

where the player is the sub off and the replacement_player is the sub on? Then the SubstitutionEvent class can still properly inherit from the Event class and splitting it up in two events is thus not required.

@DriesDeprest

This comment was marked as resolved.

@probberechts
Contributor

probberechts commented Nov 7, 2024

> But why would it be a problem to keep the substitution event as ...

I wouldn't say it is a problem per se. I just think it would be more elegant if we split it up for various reasons.

First, it automatically would work well with the dataframe representation.

Second, the player attribute is typically used to refer to the player that executes the event. For a substitution, its interpretation is not obvious. You can only understand it by looking at the other attributes of the event. If we used a single event type, I would prefer attributes named player_in and player_out that describe the (equally important) role of each player in the event.

Third, the position attribute is the position of the replacement_player. That is counter-intuitive since in other event types, the additional attributes are mostly related to the player.

This is how I would approach it if I were starting from scratch. Whether it's worth making these changes is a different question. Personally, I don't use the substitution events, so it's not something I’m particularly invested in. 🤷

> Also, I think you've made a small mistake wrt Opta definitions

Thanks, I've edited my previous comment.

@DriesDeprest
Contributor

Okay, thanks for explaining.

I'm also in doubt whether it's worth making these changes.

Maybe we can strike a balance between limiting refactoring and breaking changes and still supporting a valid dataframe representation by adding a "replacement_player" column to the dataframe representation?

I personally only need the positions of the Player objects to be correct. But these are only properly set if there are Substitution events in the EventDataset. I thus don't use the SubstitutionEvent objects directly.
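
A minimal sketch of that single-row idea, using plain dicts instead of kloppy's actual dataframe transformer: the column names and the `to_row` helper are assumptions for illustration.

```python
COLUMNS = ["event_type", "timestamp", "player_id", "replacement_player_id"]


def to_row(event_type, timestamp, player_id, replacement_player_id=None):
    """Build one row; replacement_player_id is only set for substitutions."""
    return dict(zip(
        COLUMNS,
        [event_type, timestamp, player_id, replacement_player_id],
    ))


table = [
    to_row("PASS", 4100.0, "5678"),
    to_row("SUBSTITUTION", 4140.0, "5678", replacement_player_id="1234"),
]
```

The substitution stays a single row and a single event, at the cost of a column that is empty for every other event type.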

@UnravelSports
Contributor

This has now been fixed by #361 right?

@probberechts
Contributor

> This has now been fixed by #361 right?

No. This is a discussion thread about two questions:

  1. Do we want to stick to a single SubstitutionEvent or split it into a SubstitutionInEvent and a SubstitutionOutEvent?
  2. How should we convert a substitution to a dataframe? Do we put it in a single row and add a column for the second player involved, or do we split it into two rows?

4 participants