block011_write-your-own-function-03.html

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />


<title>Write your own R functions, part 3</title>

<script src="libs/jquery-1.11.0/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link href="libs/bootstrap-2.3.2/css/united.min.css" rel="stylesheet" />
<link href="libs/bootstrap-2.3.2/css/bootstrap-responsive.min.css" rel="stylesheet" />
<script src="libs/bootstrap-2.3.2/js/bootstrap.min.js"></script>

<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
      href="libs/highlight/default.css"
      type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
  pre:not([class]) {
    background-color: white;
  }
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
   window.setTimeout(function() {
      hljs.initHighlighting();
   }, 0);
}
</script>


<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />

</head>

<body>

<style type = "text/css">
.main-container {
  max-width: 940px;
  margin-left: auto;
  margin-right: auto;
}
</style>
<div class="container-fluid main-container">

<header>
  <div class="nav">
    <a class="nav-logo" href="index.html">
      <img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
    </a>
    <ul>
      <li class="home"><a href="index.html">Home</a></li>
      <li class="faq"><a href="faq.html">FAQ</a></li>
      <li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
      <li class="topics"><a href="topics.html">Topics</a></li>
      <li class="people"><a href="people.html">People</a></li>
    </ul>
  </div>
</header>

<div id="header">
<h1 class="title">Write your own R functions, part 3</h1>
</div>

<div id="TOC">
<ul>
<li><a href="#where-were-we-where-are-we-going">Where were we? Where are we going?</a></li>
<li><a href="#load-the-gapminder-data">Load the Gapminder data</a></li>
<li><a href="#load-assertthat-and-our-qdiff-function">Load assertthat and our qdiff function</a></li>
<li><a href="#be-proactive-about-nas">Be proactive about <code>NA</code>s</a></li>
<li><a href="#the-useful-but-mysterious-...-argument">The useful but mysterious <code>...</code> argument</a></li>
<li><a href="#use-testthat-for-formal-unit-tests">Use <code>testthat</code> for formal unit tests</a></li>
<li><a href="#resources">Resources</a></li>
</ul>
</div>

<div id="where-were-we-where-are-we-going" class="section level3">
<h3>Where were we? Where are we going?</h3>
<p>In <a href="block011_write-your-own-function-02.html">part 2</a> we generalized our first R function so it could take the difference between any two quantiles of a numeric vector. We also set default values for the underlying probabilities, so that, by default, we compute the max minus the min.</p>
<p>In this part, we tackle <code>NA</code>s, the special argument <code>...</code> and formal testing.</p>
</div>
<div id="load-the-gapminder-data" class="section level3">
<h3>Load the Gapminder data</h3>
<p>As usual, load the Gapminder excerpt.</p>
<pre class="r"><code>gDat &lt;- read.delim(&quot;gapminderDataFiveYear.txt&quot;)
str(gDat)
## &#39;data.frame&#39;:    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels &quot;Afghanistan&quot;,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels &quot;Africa&quot;,&quot;Americas&quot;,..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
## or do this if the file isn&#39;t lying around already
## gd_url &lt;- &quot;http://tiny.cc/gapminder&quot;
## gDat &lt;- read.delim(gd_url)</code></pre>
</div>
<div id="load-assertthat-and-our-qdiff-function" class="section level3">
<h3>Load assertthat and our qdiff function</h3>
<p>We’ll keep using <code>assert_that()</code> to check that <code>x</code> is numeric and we’ll want our previous function around as a baseline.</p>
<pre class="r"><code>library(assertthat)
qdiff4 &lt;- function(x, probs = c(0, 1)) {
  assert_that(is.numeric(x))
  the_quantiles &lt;- quantile(x, probs)
  return(max(the_quantiles) - min(the_quantiles))
}</code></pre>
</div>
<div id="be-proactive-about-nas" class="section level3">
<h3>Be proactive about <code>NA</code>s</h3>
<p>I am being gentle by letting you practice with the Gapminder data. In real life, you will be plagued by missing data. If you are lucky, it will be properly indicated by the special value <code>NA</code>. Many built-in R functions have an <code>na.rm =</code> argument through which you can specify how you want to handle <code>NA</code>s. Typically the default value is <code>na.rm = FALSE</code> and typical default behavior is to either let <code>NA</code>s propagate or to raise an error. Let’s see how <code>quantile()</code> handles <code>NA</code>s:</p>
<pre class="r"><code>z &lt;- gDat$lifeExp
z[3] &lt;- NA
quantile(gDat$lifeExp)
##      0%     25%     50%     75%    100% 
## 23.5990 48.1980 60.7125 70.8455 82.6030
quantile(z)
## Error: missing values and NaN&#39;s not allowed if &#39;na.rm&#39; is FALSE
quantile(z, na.rm = TRUE)
##     0%    25%    50%    75%   100% 
## 23.599 48.228 60.765 70.846 82.603</code></pre>
<p>So <code>quantile()</code> simply will not operate in the presence of <code>NA</code>s unless <code>na.rm = TRUE</code>. How shall we modify our function?</p>
<p>If we wanted to hardwire <code>na.rm = TRUE</code>, we could. Focus on our call to <code>quantile()</code> inside our function definition.</p>
<pre class="r"><code>qdiff5 &lt;- function(x, probs = c(0, 1)) {
  assert_that(is.numeric(x))
  the_quantiles &lt;- quantile(x, probs, na.rm = TRUE)
  return(max(the_quantiles) - min(the_quantiles))
}
qdiff5(gDat$lifeExp)
## [1] 59.004
qdiff5(z)
## [1] 59.004</code></pre>
<p>This works but it is dangerous to invert the default behavior of a well-known built-in function and to provide the user with no way to override this.</p>
<p>We could add an <code>na.rm =</code> argument to our own function. We might even enforce our preferred default – but at least we’re giving the user a way to control the behavior around <code>NA</code>s.</p>
<pre class="r"><code>qdiff6 &lt;- function(x, probs = c(0, 1), na.rm = TRUE) {
  assert_that(is.numeric(x))
  the_quantiles &lt;- quantile(x, probs, na.rm = na.rm)
  return(max(the_quantiles) - min(the_quantiles))
}
qdiff6(gDat$lifeExp)
## [1] 59.004
qdiff6(z)
## [1] 59.004
qdiff6(z, na.rm = FALSE)
## Error: missing values and NaN&#39;s not allowed if &#39;na.rm&#39; is FALSE</code></pre>
</div>
<div id="the-useful-but-mysterious-...-argument" class="section level3">
<h3>The useful but mysterious <code>...</code> argument</h3>
<p>You probably could have lived a long and happy life without knowing there are at least 9 different algorithms for computing quantiles. <a href="http://www.rdocumentation.org/packages/stats/functions/quantile">Go read about the <code>type</code> argument</a> of <code>quantile()</code>. TLDR: If a quantile is not unambiguously equal to an observed data point, you must somehow average two data points. You can weight this average different ways, depending on the rest of the data, and <code>type =</code> controls this.</p>
<p>Let’s say we want to give the user of our function the ability to specify how the quantiles are computed, but we want to accomplish with as little fuss as possible. In fact, we don’t even want to clutter our function’s interface with this! This calls for the very special <code>...</code> argument.</p>
<pre class="r"><code>qdiff7 &lt;- function(x, probs = c(0, 1), na.rm = TRUE, ...) {
  the_quantiles &lt;- quantile(x = x, probs = probs, na.rm = na.rm, ...)
  return(max(the_quantiles) - min(the_quantiles))
}</code></pre>
<p>The practical significance of the <code>type =</code> argument is virtually nonexistent, so we can’t demo with the Gapminder data. Thanks to <a href="https://twitter.com/wrathematics">@wrathematics</a>, here’s a small example where we can (barely) detect a difference due to <code>type</code>.</p>
<pre class="r"><code>set.seed(1234)
z &lt;- rnorm(10)
quantile(z, type = 1)
##         0%        25%        50%        75%       100% 
## -2.3456977 -0.8900378 -0.5644520  0.4291247  1.0844412
quantile(z, type = 4)
##        0%       25%       50%       75%      100% 
## -2.345698 -1.048552 -0.564452  0.353277  1.084441
all.equal(quantile(z, type = 1), quantile(z, type = 4))
## [1] &quot;Mean relative difference: 0.1776594&quot;</code></pre>
<p>Now we can call our function, requesting that quantiles be computed in different ways.</p>
<pre class="r"><code>qdiff7(z, probs = c(0.25, 0.75), type = 1)
## [1] 1.319163
qdiff7(z, probs = c(0.25, 0.75), type = 4)
## [1] 1.401829</code></pre>
<p>While the difference may be subtle, <strong>it’s there</strong>. Marvel at the fact that we have passed <code>type = 1</code> through to <code>quantile()</code> <em>even though it was not a formal argument of our own function</em>.</p>
<p>The special argument <code>...</code> is very useful when you want the ability to pass arbitrary arguments down to another function, but without constantly expanding the formal arguments to your function. This leaves you with a less cluttered function definition and gives you future flexibility to specify these arguments only when you need to.</p>
<p>You will also encounter the <code>...</code> argument in many built-in functions – read up <a href="http://www.rdocumentation.org/packages/base/functions/c">on <code>c()</code></a> or <a href="http://www.rdocumentation.org/packages/base/functions/list"><code>list()</code></a> – and now you have a better sense of what it means. It is not a breezy “and so on and so forth.”</p>
</div>
<div id="use-testthat-for-formal-unit-tests" class="section level3">
<h3>Use <code>testthat</code> for formal unit tests</h3>
<p>Until now, we’ve relied on informal tests of our evolving function. If you are going to use a function alot, especially if it is part of a package, it is wise to use formal unit tests.</p>
<p>The <a href="https://github.com/hadley/testthat"><code>testthat</code> package</a> provides excellent facilities for this, with a distinct emphasis on automated unit testing of entire packages. However, we can take it out for a test drive even with our one measly function.</p>
<p>We will construct a test with <code>test_that()</code> and, within it, we put one or more <em>expectations</em> that check actual against expected results. You simply harden your informal, interactive tests into formal unit tests. Here are some examples of tests and indicative expectations.</p>
<pre class="r"><code>library(testthat)
test_that(&#39;invalid args are detected&#39;, {
  expect_error(qdiff7(&quot;eggplants are purple&quot;))
  expect_error(qdiff7(iris))
  })
test_that(&#39;NA handling works&#39;, {
  expect_error(qdiff7(c(1:5, NA), na.rm = FALSE))
  expect_equal(qdiff7(c(1:5, NA)), 4)
})</code></pre>
<p>No news is good news! Let’s see what test failure would look like. Let’s revert to a version of our function that does no <code>NA</code> handling, then test for proper <code>NA</code> handling. We can watch it fail.</p>
<pre class="r"><code>qdiff_no_NA &lt;- function(x, probs = c(0, 1)) {
  the_quantiles &lt;- quantile(x = x, probs = probs)
  return(max(the_quantiles) - min(the_quantiles))
}
test_that(&#39;NA handling works&#39;, {
  expect_that(qdiff_no_NA(c(1:5, NA)), equals(4))
})
## Error: Test failed: &#39;NA handling works&#39;
## Not expected: missing values and NaN&#39;s not allowed if &#39;na.rm&#39; is FALSE
## 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls)
## 2: eval(code, new_test_environment)
## 3: eval(expr, envir, enclos)
## 4: expect_that(qdiff_no_NA(c(1:5, NA)), equals(4)) at &lt;text&gt;:6
## 5: condition(object)
## 6: compare(expected, actual, ...)
## 7: compare.default(expected, actual, ...)
## 8: all.equal(x, y, ...)
## 9: all.equal.numeric(x, y, ...)
## 10: attr.all.equal(target, current, tolerance = tolerance, scale = scale, ...)
## 11: mode(current)
## 12: qdiff_no_NA(c(1:5, NA))
## 13: quantile(x = x, probs = probs) at &lt;text&gt;:2
## 14: quantile.default(x = x, probs = probs)
## 15: stop(&quot;missing values and NaN&#39;s not allowed if &#39;na.rm&#39; is FALSE&quot;).</code></pre>
<p>Similar to the advice to use <code>assertthat</code> in data analytical scripts, I recommend you use <code>testthat</code> to monitor the behavior of functions you (or others) will use often. If your tests cover the function’s important behavior, then you can edit the internals freely. You’ll rest easy in the knowledge that, if you broke anything important, the tests will fail and alert you to the problem.</p>
<!--
### Return to `dplyr` SKIP THIS IN FAVOR OF PLYR


```r
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
gtbl <- gDat %>% tbl_df
by_continent <- gtbl %>%
  group_by(continent)
by_continent %>% do(data.frame(max(.$lifeExp)))
## Source: local data frame [5 x 2]
## Groups: continent
## 
##   continent max...lifeExp.
## 1    Africa         76.442
## 2  Americas         80.653
## 3      Asia         82.603
## 4    Europe         81.757
## 5   Oceania         81.235
```

### other content

match.arg()

defaulting to NULL then checking is.null() and take it from there

-->

</div>
<div id="resources" class="section level3">
<h3>Resources</h3>
<p>Packages</p>
<ul>
<li><a href="https://github.com/hadley/assertthat"><code>assertthat</code> package</a></li>
<li><a href="https://github.com/smbache/ensurer"><code>ensurer</code> package</a></li>
<li><a href="https://github.com/hadley/testthat"><code>testthat</code> package</a></li>
</ul>
<p>Hadley Wickham’s forthcoming book <a href="http://adv-r.had.co.nz">Advanced R</a></p>
<ul>
<li>Section on <a href="http://adv-r.had.co.nz/Exceptions-Debugging.html#defensive-programming">defensive programming</a></li>
</ul>
<p>Hadley Wickham’s forthcoming book <a href="http://r-pkgs.had.co.nz">R packages</a></p>
<ul>
<li><a href="http://r-pkgs.had.co.nz/tests.html">Testing chapter</a></li>
</ul>
</div>

<div class="footer">
This work is licensed under the  <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>

</div>

<script>

// add bootstrap table styles to pandoc tables
$(document).ready(function () {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});

</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>