Gregory Kanevsky edited this page Apr 5, 2017 · 13 revisions

Baseball dataset

The baseball dataset is a subset of Lahman's Baseball Database, which is maintained by Sean Lahman. A detailed master dictionary can be found on GitHub here.

To run the baseball examples, download the zip archive here, extract the data and scripts it contains into a directory on your machine, and then run the command:

load_baseball_data.sh -h your_host_name -p port -d your_db_name -U username -w password

Here your_db_name is the Aster database hosted at your_host_name:port, and username and password are the credentials used to access it.
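
If you load the data more than once, it can help to keep the connection settings in one place. The following is only a minimal sketch, not part of the downloaded scripts: the variable names and the helper functions loader_command and run_loader are hypothetical, and it assumes the loader script sits in the current directory, is executable, and accepts exactly the flags shown above.

```shell
#!/bin/sh
# Hypothetical connection settings; replace every value with your own.
ASTER_HOST="your_host_name"
ASTER_PORT="port"
ASTER_DB="your_db_name"
ASTER_USER="username"
ASTER_PASSWORD="password"

# Print the loader command line for the given script name,
# using the same flags as the command shown above.
loader_command() {
    printf '%s -h %s -p %s -d %s -U %s -w %s\n' \
        "$1" "$ASTER_HOST" "$ASTER_PORT" "$ASTER_DB" "$ASTER_USER" "$ASTER_PASSWORD"
}

# Run the loader only if the script was actually extracted into
# the current directory (assumed to be executable).
run_loader() {
    if [ ! -f "./$1" ]; then
        echo "error: $1 not found; extract the zip archive here first" >&2
        return 1
    fi
    "./$1" -h "$ASTER_HOST" -p "$ASTER_PORT" -d "$ASTER_DB" \
        -U "$ASTER_USER" -w "$ASTER_PASSWORD"
}

# Preview the command without touching the database.
loader_command load_baseball_data.sh
```

Once the preview looks right, calling run_loader load_baseball_data.sh from the extraction directory would start the actual load.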

The baseball dataset is suitable for the following types of analysis (non-exhaustive list):

  • basic exploratory analysis
  • correlation and regression analysis
  • multivariate analysis
  • affinity analysis (market basket analysis)
  • any type of baseball statistical analysis based on detailed player and team season stats

Dallas Open Data dataset

To run the Dallas examples, download the zip archive here, extract it, and run:

load_dallas_data.sh -h your_host_name -p port -d your_db_name -U username -w password

Now you can run the examples in toaster_demo.R.
