forked from datacarpentry/R-dplyr-ecology-archived
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-working-with-data.Rmd
78 lines (62 loc) · 2.71 KB
/
02-working-with-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
---
layout: topic
title: Starting with data
minutes: 20
---
```{r, echo=FALSE, purl=FALSE}
knitr::opts_chunk$set(results='hide', fig.path='img/r-lesson-')
```
> ## Learning Objectives
>
> * download the mammals data from Figshare (CSV files) and load the data in memory
> using the survey table (`surveys.csv`) as an example
> * explore the structure and the content of the data in R
> * understand what are factors and how to manipulate them
# Presentation of the Survey Data
```{r, echo=FALSE, purl=TRUE}
# Presentation of the survey data
```
We are studying the species and weight and hindfoot length of animals caught in plots in our study
area. The dataset is stored as a `.csv` files in an (online repository)[http://figshare.com/articles/Portal_Project_Teaching_Database/1314459]: each row
holds information for a single animal, and the columns represent `record_id`
`month`, `day`, `year`, `plot_id`, `species_id` (a 2 letter code, see the
`species.csv` file for correspondance), `sex` ("M" for males and "F" for
females), `hindfoot_length` (in millimeters), and `weight` (in grams).
This is an example of what the survey dataset looks like:
"63","8","19","1977","3","DM","M","35","40"
"64","8","19","1977","7","DM","M","37","48"
"65","8","19","1977","4","DM","F","34","29"
"66","8","19","1977","4","DM","F","35","46"
"67","8","19","1977","7","DM","M","35","36"
To load our survey data, we need to locate the `surveys.csv` file. We will use
`read.csv()` to load into memory (as a `data.frame`) the content of the CSV
file.
* download [the data file`"surveys.csv"`](http://files.figshare.com/1919744/surveys.csv) from Figshare
* put it in directory `"data"` within your working directory for these exercises
```{r download-data, echo=FALSE, purl=TRUE}
if (!file.exists("data/")) {
dir.create("data")
}
if (!file.exists("data/surveys.csv")) {
download.file("http://files.figshare.com/1919744/surveys.csv",
"data/surveys.csv")
}
if (!file.exists("data/species.csv")) {
download.file("http://files.figshare.com/1919741/species.csv",
"data/species.csv")
}
```
Next, read the `surveys.csv` into memory using `read.csv()`:
```{r demo-data-load, eval=TRUE, purl=FALSE}
surveys <- read.csv('data/surveys.csv')
```
__At this point, make sure all participants have the data loaded__
This statement doesn't produce any output because assignment doesn't display
anything. If we want to check that our data has been loaded, we can print the
variable's value: `surveys`
Wow... that was a lot of output. At least it means the data loaded
properly. Let's check the top (the first 6 lines) of this `data.frame` using the
function `head()`:
```{r, results='show', purl=FALSE}
head(surveys)
```