This package is no longer functional and is not being maintained, because the California Open Data Portal (data.ca.gov) has moved from the DKAN platform to the CKAN platform. If you'd like to use R to interface with the current CKAN version of the data portal, a great option is the ckanr package.
This package defines an R function called read_dkan()
which provides an interface for datastore API of any DKAN-based data portal, such as California's Open Data Portal at data.ca.gov. The read_dkan()
function retrieves data specified by the query parameters (described below), and returns the data formatted in an R data frame (which can be used for analysis within R, or written to an external file, such as a .csv file).
This section describes how to install the package, and how to use it to construct a query of a DKAN-based data portal via the datastore API.
To install the package, run the following lines of code:
install.packages('devtools')
devtools::install_github('daltare/dkanTools')
library(dkanTools)
Alternatively, you can save the read_dkan.R file (in the R folder) to your computer, then run the following code (NOTE: this assumes the file is saved in your home directory; to save the file in a different location, replace ~ with the path to the location where you saved the file):
source('~/read_dkan.R')
The arguments to the function correspond to certain parameters of the DKAN datastore API. A description of the DKAN datastore API and its parameters is available here.
The possible arguments to the read_dkan()
function include:
base_URL
(optional): The base URL for the data portal (defaults to: https://data.ca.gov)resource_id
(required): An alphanumeric code representing a particular data resource (e.g., a731c980-9477-4ec7-bcfc-6d0cce00306c). It can be found on the data.ca.gov portal, by clicking on the Data API link within a data resource's Preview page (here's an example).filter_fields
(optional): A list of the fields that will be used as a filter. If filtering on multiple fields, enter the field names using thec()
function (e.g.,c(field1, field2)
)filter_values
(optional): This argument must be entered as a list, where each element of the list corresponds to a givenfilter_field
, and each element can have multiple items (e.g.,list(c('Element1_Item1', 'Element1_Item2'), c('Element2_Item1','Element2_Item2')
specifies a 2 element list with 2 items in each element, and each element corresponds to the respective field(s) entered to thefilter_fields
argument. Note that if a given filter value contains a comma (,
), the filter may not work, and thequery
field may need to be used instead.fields
(optional): This specifies the fields (i.e., columns) of the dataset to return. If left blank, all of the dataset's fields will be returned.query
(optional): A fulltext search across all fields (i.e, this returns all records where the query text is found in any field).sort_field
(optional): A field to sort (i.e. order) the records returned by the query, in either ascending or descending order.sort_direction
(optional): The method (i.e., direction) for sorting the givensort_field
, either ascending or descending. Enter 'asc' (i.e., ascending) or 'desc' (i.e., descending).max_records
(optional): The maximum number of records to return (if not entered, all records satisfyting the input conditions will be returned).
The following examples illustrate how to use the read_dkan()
to download data from the California Open Data Portal via the datastore API (in each example, the results are stored in an object called dkan_data
):
dkan_data <- read_dkan(resource_id = 'a731c980-9477-4ec7-bcfc-6d0cce00306c', filter_fields = c('PWSID', 'Stage_Invoked'), filter_values = list(c('CA3010037'), c('Stage 1')), fields = c('Supplier_Name', 'PWSID', '2013_Production_Reported', 'Stage_Invoked'))
The above example returns records from the dataset specificed by resource ID a731c980-9477-4ec7-bcfc-6d0cce00306c where the PWSID value is CA3010037 AND the Stage_Invoked value is Stage 1, then it returns only 4 colums of data ('Supplier_Name', 'PWSID', '2013_Production_Reported', and 'Stage_Invoked').
dkan_data <- read_dkan(resource_id = 'a731c980-9477-4ec7-bcfc-6d0cce00306c', query = 'American Canyon, City of', sort_field = 'Reporting_Month', sort_direction = 'asc')
The above example returns records from the dataset specificed by resource ID a731c980-9477-4ec7-bcfc-6d0cce00306c where the text American Canyon, City of is found within any field, and the records that are returned are sorted on the Reporting_Month field in ascending order.
The results from these examples would be returned to an R data frame stored in the variable called dkan_data
. To write this data to a text document called Output.csv in your home directory, use the R code below (you can change the location of the file by replacing the ~
with a new path, and you can change the filename by changing Output.csv to a different name):
write.csv(dkan_data, file = '~/Output.csv', row.names = FALSE)
An example application built with this function to access data via the datastore API is avaiable here.