albertols/king-corteenr

DB Interview Spark Scala

This is a Maven project based on Spark 2.4.7, Scala 2.11.12, and Java 1.8 (OpenJDK recommended).

How to run Spark locally on Windows

  1. Follow the installation guide: https://www.knowledgehut.com/blog/big-data/how-to-install-apache-spark-on-windows

Outcome: the following environment variables are set:

  • HADOOP_HOME
  • SPARK_HOME
  • JAVA_HOME
  2. Run PreparationPreInterview
  3. Expected output:
club player_name position
Sporting CP Coates Defender
Benfica Diogo Gonçalves Midfielder
...
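
Before running PreparationPreInterview, it can help to confirm that the three environment variables from step 1 are visible to the JVM. The following is a minimal sketch in plain Scala; the object name EnvCheck and its helper are hypothetical and are not part of this repository.

```scala
object EnvCheck {
  // Variables the setup guide is expected to leave defined.
  val Required: Seq[String] = Seq("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME")

  // Returns the required variables that are absent or blank in `env`.
  def missing(env: Map[String, String]): Seq[String] =
    Required.filter(name => env.get(name).forall(_.trim.isEmpty))

  def main(args: Array[String]): Unit = {
    val absent = missing(sys.env)
    if (absent.isEmpty) println("All required variables are set.")
    else println(s"Missing: ${absent.mkString(", ")}")
  }
}
```

Taking the environment as a Map parameter keeps the check easy to exercise without touching the real system settings.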

Dataset Model

The dataset consists of 2 JSON files that act as a dictionary holding each player's basic information, plus 5 CSV files with player statistics covering different areas of the game.

It is based on the UEFA Champions League (UCL) 21/22 dataset, slightly modified.

players1.json

  • club: String
  • player_name: String
  • position: String

players2.json

  • club: String
  • player_name: String
  • position: String
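
Since players1.json and players2.json share the same three fields, they can be loaded and combined into a single DataFrame. The sketch below is only an illustration and not necessarily what PreparationPreInterview does: the object name PlayersPreview and the data/ paths are assumptions, as is the one-JSON-object-per-line layout.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlayersPreview {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("players-preview")
      .master("local[*]") // run inside the local JVM, no cluster needed
      .getOrCreate()

    // Hypothetical locations: adjust to where the repository keeps its data.
    val paths = Seq("data/players1.json", "data/players2.json")

    // spark.read.json expects one JSON object per line by default;
    // add .option("multiLine", "true") if each file is a single JSON document.
    val players: DataFrame = paths
      .map(path => spark.read.json(path))
      .reduce(_ unionByName _) // match columns by name, not by position
      .select("club", "player_name", "position")

    players.show(5, false) // first 5 rows, without truncating long values
    spark.stop()
  }
}
```

unionByName is used instead of union because JSON schema inference may order the columns differently from the order listed here.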

attacking.csv

  • player_name: String
  • club: String
  • position: String
  • assists: Integer
  • corner_taken: Integer
  • offsides: Integer
  • dribbles: Integer

distr.csv

  • player_name: String
  • club: String
  • position: String
  • pass_accuracy: Double
  • pass_attempted: Integer
  • pass_completed: Integer
  • cross_accuracy: Integer
  • cross_attempted: Integer
  • cross_complted: Integer
  • freekicks_taken: Integer

goalkeeping.csv

  • player_name: String
  • club: String
  • position: String
  • saved: Integer
  • conceded: Integer
  • saved_penalties: Integer
  • cleansheets: Integer
  • punches_made: Integer

goals.csv

  • player_name: String
  • club: String
  • position: String
  • goals: Integer
  • right_foot: Integer
  • left_foot: Integer
  • headers: Integer
  • others: Integer
  • inside_area: Integer
  • outside_areas: Integer
  • penalties: Integer

key_stats.csv

  • player_name: String
  • club: String
  • position: String
  • minutes_played: Integer
  • match_played: Integer
  • goals: Integer
  • assists: Integer
  • distance_covered: String
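
The key_stats.csv columns map naturally onto a Scala case class. Because distance_covered is typed as String, using it numerically requires a parsing step. The sketch below assumes the column may hold text that is not a valid number; the class name KeyStats and the helper distanceAsDouble are hypothetical.

```scala
import scala.util.Try

// One row of key_stats.csv; field names mirror the column names listed above.
final case class KeyStats(
  player_name: String,
  club: String,
  position: String,
  minutes_played: Int,
  match_played: Int,
  goals: Int,
  assists: Int,
  distance_covered: String
) {
  // Numeric view of distance_covered; None when the text is not a number.
  def distanceAsDouble: Option[Double] =
    Try(distance_covered.trim.toDouble).toOption
}
```

If the CSV is loaded into a DataFrame with matching column names and types, a case class shaped like this could also serve as the target of Spark's as[KeyStats] conversion.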

About

Scala and UCL DataFrame exercises
