Commit

minor text updates + typos
hdolinh committed Sep 14, 2023
1 parent 552fc12 commit 5c16563
Showing 1 changed file with 14 additions and 15 deletions.
29 changes: 14 additions & 15 deletions materials/sections/clean-wrangle-data.qmd
@@ -106,7 +106,6 @@ First, open a new Quarto document. Delete everything below the setup chunk, and

```{r}
#| message: false
#| warning: false
library(dplyr)
library(tidyr)
@@ -119,7 +118,7 @@ library(readr)

## A note on loading packages

-You may have noticed the following warning messages pop up when you ran your library chunk.
+You may have noticed the following messages pop up when you ran your library chunk.

```
Attaching package: ‘dplyr’
@@ -133,7 +132,7 @@ The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
```

-These are important warnings. They are letting you know that certain functions from the `stats` and `base` packages (which are loaded by default when you start R) are masked by *different functions* with the same name in the `dplyr` package. It turns out, the order that you load the packages in matters. Since we loaded `dplyr` after `stats`, R will assume that if you call `filter()`, you mean the `dplyr` version unless you specify otherwise.
+These are important messages. They are letting you know that certain functions from the `stats` and `base` packages (which are loaded by default when you start R) are masked by *different functions* with the same name in the `dplyr` package. It turns out, the order that you load the packages in matters. Since we loaded `dplyr` after `stats`, R will assume that if you call `filter()`, you mean the `dplyr` version unless you specify otherwise.

Being specific about which version of `filter()`, for example, you call is easy.
To explicitly call a function by its unambiguous name, we use the syntax `package_name::function_name(...)`.
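
As a minimal sketch of the `package_name::function_name(...)` syntax described above, the chunk below calls both versions of `filter()` side by side. The data frame `df` and the names `big_rows` and `smoothed` are hypothetical and used only for illustration; they are not part of the lesson data.

```r
library(dplyr)

# Hypothetical example data (not from the lesson)
df <- data.frame(x = c(1, 5, 10))

# dplyr's filter(): keep only the rows that satisfy a condition
big_rows <- dplyr::filter(df, x > 3)

# stats' filter(): apply a linear filter (here a 2-point moving average)
# to a numeric series; this is a completely different operation
smoothed <- stats::filter(df$x, rep(1 / 2, 2))
```

Writing the package name explicitly makes the call unambiguous regardless of the order in which the packages were loaded.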
@@ -145,7 +144,7 @@ So, if we wanted to call the `stats` version of `filter()` in this Rmarkdown doc
::: callout-note
## Note

-Warnings are important, but we might not want them in our final document. After you have read the packages in, **adjust the chunk settings in your library chunk** to suppress warnings and messages by adding `#| warning: false`.
+Messages and warnings are important, but we might not want them in our final document. After you have read the packages in, **adjust the chunk settings in your library chunk** to suppress warnings and messages by adding `#| message: false` or `#| warning: false`. Both of these chunk options, when set to false, prevent messages or warnings from appearing in the rendered file.
:::

Now that we have introduced some data wrangling libraries, let's get the data that we are going to use for this lesson.
@@ -194,7 +193,7 @@ Before we get too much further, spend a minute or two outlining your RMarkdown d
:::

## Data exploration
-Similar to what we did in our [Intro to Quarto](https://learning.nceas.ucsb.edu/2023-04-coreR/session_03.html) lesson, it is good practice to skim through the data you just read in.
+Similar to what we did in our [Intro to Literate Analysis](https://learning.nceas.ucsb.edu/2023-09-ucsb-faculty/session_06.html) lesson, it is good practice to skim through the data you just read in.
Doing so is important to make sure the data is read as you were expecting and to familiarize yourself with the data.

Some of the basic ways to explore your data are:
@@ -575,10 +574,10 @@ chinook_see <- catch_long %>%

## Sorting your data using `arrange()`

-The `arrange()` function is used to sort the rows of a `data.frame`. Two common case to use `arrange()` are:
+The `arrange()` function is used to sort the rows of a `data.frame`. Two common cases to use `arrange()` are:

- To calculate a cumulative sum (with `cumsum()`) so row order matters
-- To display a table (like in an `.Rmd` document) in sorted order
+- To display a table (like in a `.qmd` document) in sorted order
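
The first use case in the list above can be sketched as follows. This is only an illustration under stated assumptions: `catch_df`, its `year` and `catch` columns, and `catch_cumulative` are hypothetical names and values, not the lesson's data.

```r
library(dplyr)

# Hypothetical yearly catch values, deliberately out of order
catch_df <- data.frame(year = c(2002, 2000, 2001),
                       catch = c(30, 10, 20))

catch_cumulative <- catch_df %>%
    arrange(year) %>%                         # sort first so the running total follows time
    mutate(cumulative_catch = cumsum(catch))  # cumsum() depends on row order
```

Without the `arrange()` step, `cumsum()` would accumulate the values in whatever order the rows happen to be stored.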

Let's re-calculate mean catch by region, and then `arrange()` the output by mean catch:

@@ -605,18 +604,18 @@ head(mean_region)

## Splitting a column using `separate()` and `unite()`

-The `separate()` function allow us to easily split a single column into numerous. Its complement, the `unite()` function, allows ys to combine multiple columns into a single one.
+The `separate()` function allows us to easily split a single column into several. Its complement, the `unite()` function, allows us to combine multiple columns into a single one.

This can come in really handy when we need to split a column into two pieces by a consistent separator (like a dash).

Let's make a new `data.frame` with fake data to illustrate this. Here we have a set of site identification codes with information about the island where the site is (the first 3 letters) and a site number (the 3 numbers). If we want to group and summarize by island, we need a column with just the island information.

```{r}
sites_df <- data.frame(site = c("HAW-101",
-"HAW-103",
-"OAH-320",
-"OAH-219",
-"MAI-039"))
+"HAW-103",
+"OAH-320",
+"OAH-219",
+"MAU-039"))
sites_df %>%
separate(site, c("island", "site_number"), "-")
@@ -693,13 +692,13 @@ head(mean_region)
```
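
Returning to the `separate()` and `unite()` pair described earlier, here is a minimal round-trip sketch. It reuses a shortened version of the fake site codes; `split_df` and `rejoined_df` are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(tidyr)

# Shortened version of the fake site codes used in the lesson
sites_df <- data.frame(site = c("HAW-101", "OAH-320", "MAU-039"),
                       stringsAsFactors = FALSE)

# Split "HAW-101" into island = "HAW" and site_number = "101"
split_df <- sites_df %>%
    separate(site, c("island", "site_number"), "-")

# Glue the two columns back into a single `site` column
rejoined_df <- split_df %>%
    unite(site, island, site_number, sep = "-")
```

Because `separate()` keeps the pieces as character strings by default, leading zeros such as the one in `039` survive the round trip.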


-We have completed our lesson on Cleaning and Wrangling data. Before we break, let's practice our Github workflow.
+We have completed our lesson on Cleaning and Wrangling data. Before we break, let's practice our Git workflow.

::: callout-tip
## Steps

-1. Save the `.Rmd` you have been working on for this lesson.
-2. Knit the R Markdown file. This is a way to test everything in your code is working.
+1. Save the `.qmd` you have been working on for this lesson.
+2. Render the Quarto file. This is a way to test that everything in your code is working.
3. ```Stage > Commit > Pull > Push```
:::
