It is a Maven project based on Spark 2.4.7, Scala 2.11.12 and Java 1.8 (OpenJDK recommended).
outcomes:
- HADOOP_HOME
- SPARK_HOME
- JAVA_HOME
- Run `PreparationPreInterview`
- output:
| club | player_name | position |
|---|---|---|
| Sporting CP | Coates | Defender |
| Benfica | Diogo Gonçalves | Midfielder |
| ... | | |
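As a quick sanity check before running the project, the snippet below reports which of the three environment variables listed above are unset. It is only a sketch: the object name `EnvCheck` and its `missing` helper are hypothetical and not part of the project, and it assumes that all three variables need to be defined.

```scala
object EnvCheck {
  // Variable names taken from the list in this README
  val required: Seq[String] = Seq("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME")

  // Returns the required names that are absent or blank in the given environment map
  def missing(env: Map[String, String]): Seq[String] =
    required.filterNot(name => env.get(name).exists(_.trim.nonEmpty))

  def main(args: Array[String]): Unit = {
    val absent = missing(sys.env)
    if (absent.isEmpty) println("All required environment variables are set.")
    else println(s"Missing environment variables: ${absent.mkString(", ")}")
  }
}
```

Passing the environment as a plain `Map` keeps the check easy to exercise without touching the real process environment.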
The dataset is composed of 2 JSON files that act as a dictionary containing each player's basic information. In addition, there are 5 CSV files that contain player stats for different areas of the game.
Although slightly modified, it is based on the UCL (UEFA Champions League) 21/22 dataset.
- `club`: String
- `player_name`: String
- `position`: String

- `club`: String
- `player_name`: String
- `position`: String

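The three-field layout shared by the two JSON dictionaries could be declared as an explicit Spark schema, roughly as sketched below. The object name `PlayerDictionary` and the `path` parameter are placeholders, since the actual file names are not given here, and nothing in this sketch is taken from the project's own code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object PlayerDictionary {
  // Explicit schema matching the three String fields listed above
  val schema: StructType = StructType(Seq(
    StructField("club", StringType, nullable = true),
    StructField("player_name", StringType, nullable = true),
    StructField("position", StringType, nullable = true)
  ))

  // `path` is a placeholder: pass the location of one of the JSON files
  def read(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(schema).json(path)
}
```

By default Spark expects one JSON object per line. If the files instead hold a single multi-line JSON document, the reader's `multiLine` option would have to be enabled.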
- `player_name`: String
- `club`: String
- `position`: String
- `assists`: Integer
- `corner_taken`: Integer
- `offsides`: Integer
- `dribbles`: Integer

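A CSV file with the columns listed above could be loaded with a typed schema along these lines. This is a sketch under assumptions: the object name `StatsCsvExample` and the `path` parameter are hypothetical, and the file is assumed to have a header row. The other CSV files would follow the same pattern with their own field lists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object StatsCsvExample {
  // Field order mirrors the column list above
  val schema: StructType = StructType(Seq(
    StructField("player_name", StringType),
    StructField("club", StringType),
    StructField("position", StringType),
    StructField("assists", IntegerType),
    StructField("corner_taken", IntegerType),
    StructField("offsides", IntegerType),
    StructField("dribbles", IntegerType)
  ))

  // Assumes the CSV has a header row; `path` is a placeholder for the file location
  def read(spark: SparkSession, path: String): DataFrame =
    spark.read.option("header", "true").schema(schema).csv(path)
}
```

With an explicit schema, Spark maps CSV columns by position, so the field order has to match the file.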
- `player_name`: String
- `club`: String
- `position`: String
- `pass_accuracy`: Double
- `pass_attempted`: Integer
- `pass_completed`: Integer
- `cross_accuracy`: Integer
- `cross_attempted`: Integer
- `cross_complted`: Integer
- `freekicks_taken`: Integer

- `player_name`: String
- `club`: String
- `position`: String
- `saved`: Integer
- `conceded`: Integer
- `saved_penalties`: Integer
- `cleansheets`: Integer
- `punches_made`: Integer

- `player_name`: String
- `club`: String
- `position`: String
- `goals`: Integer
- `right_foot`: Integer
- `left_foot`: Integer
- `headers`: Integer
- `others`: Integer
- `inside_area`: Integer
- `outside_areas`: Integer
- `penalties`: Integer

- `player_name`: String
- `club`: String
- `position`: String
- `minutes_played`: Integer
- `match_played`: Integer
- `goals`: Integer
- `assists`: Integer
- `distance_covered`: String
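Because `distance_covered` is typed as a String, any numeric use of it needs a conversion step. The helper below is one possible sketch: `DistanceParser` is a hypothetical name, and it assumes that values which do not parse as numbers (for example a placeholder such as `-`) should be treated as missing rather than as errors.

```scala
import scala.util.Try

object DistanceParser {
  // Returns Some(value) when the trimmed text parses as a Double, None otherwise
  // (null, blank, and non-numeric inputs all map to None)
  def parse(raw: String): Option[Double] =
    Option(raw).map(_.trim).filter(_.nonEmpty).flatMap(s => Try(s.toDouble).toOption)
}
```

On a DataFrame, `col("distance_covered").cast("double")` gives a similar result in Spark 2.4, yielding null for values that cannot be converted.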