Demo and Examples
The baseball dataset is a subset of Lahman's Baseball Database, maintained by Sean Lahman. A detailed master dictionary can be found on GitHub here.
To run the baseball examples, download the zip here and extract the data and scripts it contains into a directory on your machine; then run the command:
load_baseball_data.sh -h your_host_name -p port -d your_db_name -U username -w password
where the Aster database your_db_name is hosted at your_host_name:port and is accessed with username and password.
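As a minimal sketch of how the placeholders in the command above map to concrete values, the Python snippet below assembles the argument list and prints it. The host, port, database name, and credentials are hypothetical examples, and the `./` prefix assumes the script sits in the current directory; only the flags `-h`, `-p`, `-d`, `-U`, and `-w` come from the command shown above.

```python
import shlex

def build_load_command(script, host, port, db, user, password):
    """Build the argument list for one of the load scripts.

    Flags mirror the command line shown in this README; nothing is executed here.
    """
    return [
        f"./{script}",       # assumption: script is in the current directory
        "-h", host,
        "-p", str(port),
        "-d", db,
        "-U", user,
        "-w", password,
    ]

# Hypothetical connection values, for illustration only.
cmd = build_load_command(
    "load_baseball_data.sh", "aster.example.com", 2406, "demo_db", "demo_user", "secret"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a password contains spaces or special characters.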
The baseball dataset is suitable for the following types of analysis (non-exhaustive list):
- basic exploratory analysis
- correlation and regression analysis
- multivariate analysis
- affinity analysis (market basket analysis)
- any type of baseball statistical analysis based on detailed player and team season stats
To run the Dallas examples, download the zip here, extract it, and run:
load_dallas_data.sh -h your_host_name -p port -d your_db_name -U username -w password
Now you can run the examples in toaster_demo.R.
Dallas Open Data consists of two independent subsets:
- Building Inspection Permits (Building_Inspection_Master_Permits.csv)
- Police Reports (Police_Bulk_Data.csv and Bulk_Police_Narrative.csv)
Dallas building permits are stored in a single table, Building_Inspection_Master_Permits.csv, covering the period from January 2, 2012 to May 31, 2014 (91,682 permits in total). Each permit record contains:
- Permit No (unique id)
- Permit Type
- Issued (date)
- Mapsco code
- Contractor (text field with name, address, etc.)
- Value (US$)
- Area code
- Work Description (text)
- Land Use
- Address
- Geo Location
- Zip code
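The record layout above lends itself to simple exploratory aggregation. The following is a rough sketch that totals permit value by permit type using only the Python standard library. The sample rows are made up, and the snippet assumes that the CSV header names match the field names listed above and that Value is a plain number; the real file may differ on both counts.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; header names and formats are assumptions.
SAMPLE_PERMITS = """Permit No,Permit Type,Issued,Value,Zip code
P-0001,Electrical,2012-01-02,1500,75201
P-0002,Plumbing,2012-01-03,800,75204
P-0003,Electrical,2012-01-05,2500,75201
"""

def total_value_by_type(permits_file):
    """Sum the Value column for each distinct Permit Type."""
    totals = defaultdict(float)
    for row in csv.DictReader(permits_file):
        totals[row["Permit Type"]] += float(row["Value"])
    return dict(totals)

totals = total_value_by_type(io.StringIO(SAMPLE_PERMITS))
for permit_type, value in sorted(totals.items()):
    print(f"{permit_type}: {value:.2f}")
```

To try it on the real data, replace the `io.StringIO(...)` argument with an open file handle for Building_Inspection_Master_Permits.csv and adjust the column names to match its header.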
Building permit data are recommended for the following types of analysis (non-exhaustive list):
- geo analytics
- clustering
- text analytics
- exploratory analysis
- trend and time series analysis
- financial analytics
Dallas police reports consist of two files:
- Police_Bulk_Data.csv: main table with each record containing a police report
- Bulk_Police_Narrative.csv: supplementing table with written narratives for the reports
Police_Bulk_Data.csv is a wide table in which each record is a unique police report (81,018 reports in total) with the following attributes:
- offenseservicenumber: unique identifier
- offensedate: date of offense report
- offensereporteddate: date when offense reported
- offensedescription: non-codified description of offense
- offensestarttime and offensestoptime: offense start and end times
- offensetimedispatched: offense dispatch time
- offensebeat: offense location code
- offensereportingarea: reporting area of offense
- offensename: name of the offender
- offenserace: race of the offender
- offensegender: gender of the offender
- offenseage: age of the offender
- offenseblock: street block of the offense
- offensedirection: street direction
- offensestreet: street name
- offenseapartment: apartment number
- offensecity: city
- offensezip: zipcode of the offense
- offensebusinessblock, offensebusinessdirection, offensebusinessstreet, offensebusinesscity: street information for businesses
- offensepropertyattackcode: ?
- offensepremises: type (with code sometimes) of premises of the offense
- offensemethodofoffense: method of offense (text)
- offenseweather: weather conditions at the time of the offense
- offensefamilyviolence: Y or N, whether the offense qualified as family violence
- offensegangacitivty: Y or N, whether the offense qualified as gang activity
- offensereportofficerbadge1: 1st reporting officer
- offensereportingofficerbadge2: 2nd reporting officer (if any)
- offensestatus: status of report
Bulk_Police_Narrative.csv complements the main police report table above with text narratives for some of the reports:
- offenseservicenumber: unique identifier of the offense, used to join the police tables together
- offensenarrative: text written by the reporting police officer that describes the report
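Since narratives exist for only some reports, joining the two police tables is naturally a left join on offenseservicenumber. The sketch below shows one way to do this with the Python standard library. The sample rows are invented for illustration, and it assumes the CSV headers match the attribute names listed above and that each report has at most one narrative.

```python
import csv
import io

# Invented sample rows; the real files contain many more columns and records.
REPORTS_CSV = """offenseservicenumber,offensedescription,offensezip
0001-A,THEFT,75201
0002-B,BURGLARY,75204
"""

NARRATIVES_CSV = """offenseservicenumber,offensenarrative
0001-A,"Complainant stated that a bicycle was taken."
"""

def left_join_narratives(reports_file, narratives_file):
    """Attach offensenarrative to every report; missing narratives become ''."""
    narratives = {
        row["offenseservicenumber"]: row["offensenarrative"]
        for row in csv.DictReader(narratives_file)
    }
    joined = []
    for report in csv.DictReader(reports_file):
        record = dict(report)
        record["offensenarrative"] = narratives.get(record["offenseservicenumber"], "")
        joined.append(record)
    return joined

joined = left_join_narratives(io.StringIO(REPORTS_CSV), io.StringIO(NARRATIVES_CSV))
for record in joined:
    has_text = "yes" if record["offensenarrative"] else "no"
    print(record["offenseservicenumber"], "narrative:", has_text)
```

Keeping every report, including those without a narrative, preserves the full report count for exploratory and trend analysis, while the narrative column stays available for text analytics where it exists.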
Dallas police reports data are recommended for the following types of analysis (non-exhaustive list):
- text analytics
- clustering
- exploratory analysis
- trend and time series analysis
- predictive modeling