-
Notifications
You must be signed in to change notification settings - Fork 43
/
Copy path16.Rmd
1431 lines (1127 loc) · 65 KB
/
16.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
```{r, echo = FALSE, cachse = FALSE}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 100)
```
# Metric-Predicted Variable on One or Two Groups
> In the context of the generalized linear model (GLM) introduced in the previous chapter, this chapter's situation involves the most trivial cases of the linear core of the GLM, as indicated in the left cells of Table 15.1 (p. 434), with a link function that is the identity along with a normal distribution for describing noise in the data, as indicated in the first row of Table 15.2 (p. 443). We will explore options for the prior distribution on parameters of the normal distribution, and methods for Bayesian estimation of the parameters. We will also consider alternative noise distributions for describing data that have outliers. [@kruschkeDoingBayesianData2015, pp. 449--450]
Although I agree this chapter covers the "most trivial cases of the linear core of the GLM", Kruschke's underselling himself a bit, here. In addition to "trivial" Gaussian models, Kruschke went well beyond and introduced robust Student's $t$ modeling. It's a testament to Kruschke's rigorous approach that he did so so early in the text. IMO, we could use more robust Student's $t$ models in the social sciences. So heed well, friends.
## Estimating the mean and standard deviation of a normal distribution
The Gaussian probability density function follows the form
$$p(y | \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \exp \left (-\frac{1}{2} \frac{(y - \mu)^2}{\sigma^2} \right ),$$
where the two parameters to estimate are $\mu$ (i.e., the mean) and $\sigma$ (i.e., the standard deviation). If you prefer to think in terms of $\sigma^2$, that's the variance. In case is wasn't clear, $\pi$ is the actual number $\pi$, not a parameter to be estimated.
We'll divide Figure 16.1 into data and plot steps. I came up with the primary data like so:
```{r, warning = F, message = F}
library(tidyverse)
sequence_length <- 100
d <-
crossing(y = seq(from = 50, to = 150, length.out = sequence_length),
mu = c(87.8, 100, 112),
sigma = c(7.35, 12.2, 18.4)) %>%
mutate(density = dnorm(y, mean = mu, sd = sigma),
mu = factor(mu, labels = str_c("mu==", c(87.8, 100, 112))),
sigma = factor(sigma, labels = str_c("sigma==", c(7.35, 12.2, 18.4))))
head(d)
```
Instead of putting the coordinates for the three data points in our `d` tibble, I just threw them into their own tibble in the `geom_point()` function.
Okay, let's talk color and theme. For this chapter, we'll take our color palette from the [**beyonce** package](https://github.com/dill/beyonce) [@R-beyonce]. As one might guess, the **beyonce** package provides an array of palettes based on pictures of [Beyoncé](https://www.beyonce.com/). The origins of the palettes come from [https://beyoncepalettes.tumblr.com/](https://beyoncepalettes.tumblr.com/). Our palette will be #126.
```{r, warning = F, message = F, fig.width = 3, fig.height = 1}
library(beyonce)
beyonce_palette(126)
bp <- beyonce_palette(126)[]
bp
```
Our overall theme will be based on the default `ggplot2::theme_grey()`.
```{r}
theme_set(
theme_grey() +
theme(text = element_text(color = "white"),
axis.text = element_text(color = beyonce_palette(126)[5]),
axis.ticks = element_line(color = beyonce_palette(126)[5]),
legend.background = element_blank(),
legend.box.background = element_rect(fill = beyonce_palette(126)[5],
color = "transparent"),
legend.key = element_rect(fill = beyonce_palette(126)[5],
color = "transparent"),
legend.text = element_text(color = beyonce_palette(126)[1]),
legend.title = element_text(color = beyonce_palette(126)[1]),
panel.background = element_rect(fill = beyonce_palette(126)[5],
color = beyonce_palette(126)[5]),
panel.grid = element_blank(),
plot.background = element_rect(fill = beyonce_palette(126)[1],
color = beyonce_palette(126)[1]),
strip.background = element_rect(fill = beyonce_palette(126)[4]),
strip.text = element_text(color = beyonce_palette(126)[1]))
)
```
Here's Figure 16.1.
```{r}
d %>%
ggplot(aes(x = y)) +
geom_area(aes(y = density),
fill = bp[2]) +
geom_vline(xintercept = c(85, 100, 115),
linetype = 3, color = bp[5]) +
geom_point(data = tibble(y = c(85, 100, 115)),
aes(y = 0.002),
size = 2, color = bp[6]) +
scale_y_continuous(expression(italic(p)(italic(y)*"|"*mu*", "*sigma)),
expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
ggtitle("Competing Gaussian likelihoods given the same data") +
coord_cartesian(xlim = c(60, 140)) +
facet_grid(sigma ~ mu, labeller = label_parsed)
```
Here's how you might compute the $p(D | \mu, \sigma)$ values and identify which combination of $\mu$ and $\sigma$ returns the maximum value for the data set.
```{r, message = F, warning = F}
crossing(y = c(85, 100, 115),
mu = c(87.8, 100, 112),
sigma = c(7.35, 12.2, 18.4)) %>%
mutate(density = dnorm(y, mean = mu, sd = sigma)) %>%
group_by(mu, sigma) %>%
summarise(`p(D|mu, sigma)` = prod(density)) %>%
ungroup() %>%
mutate(maximum = `p(D|mu, sigma)` == max(`p(D|mu, sigma)`))
```
### ~~Solution by mathematical analysis~~ Heads up on precision.
Not much for us, here. But we might reiterate that sometimes we talk about the *precision* (see page 453), which is the reciprocal of the variance (i.e., $\frac{1}{\sigma^2}$). As we'll see, the **brms** package doesn't use priors parameterized in terms of precision. But JAGS does, which means we'll need to be able to translate Kruschke's precision-laden JAGS code into $\sigma$-oriented **brms** code in many of the remaining chapters. Proceed with caution.
### Approximation by ~~MCMC in JAGS~~ HMC in brms.
Let's load and `glimpse()` at the `TwoGroupIQ.csv` data.
```{r, message = F}
my_data <- read_csv("data.R/TwoGroupIQ.csv")
glimpse(my_data)
```
The data file included values from two groups.
```{r}
my_data %>%
distinct(Group)
```
Kruschke clarified that for the following example, "the data are IQ (intelligence quotient) scores from a group of people who have consumed a 'smart drug'" (p. 456). That means we'll want to subset the data.
```{r}
my_data <-
my_data %>%
filter(Group == "Smart Drug")
```
It's a good idea to take a look at the data before modeling.
```{r, fig.width = 4, fig.height = 3}
my_data %>%
ggplot(aes(x = Score)) +
geom_histogram(fill = bp[2], binwidth = 5) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
ggtitle("The smart-drug group")
```
Here are the mean and standard deviation of the `Score` values.
```{r}
(mean_y <- mean(my_data$Score))
(sd_y <- sd(my_data$Score))
```
The values of those sample statistics will come in handy in just a bit. But first, let's load **brms**.
```{r, message = F, warning = F}
library(brms)
```
If we want to pass user-defined values into our `brm()` prior code, we'll need to define them first in using the `brms::stanvar()` function.
```{r}
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y")
```
It's been a while since we used `stanvar()`, so we should review. Though we've saved the object as `stanvars`, you could name it whatever you want. The main trick is to tell **brms** about your values with the `stanvars` argument within `brm()`.
Kruschke mentioned that the "the conventional noncommittal gamma prior [for the precision] has shape and rate constants that are close to zero, such as Sh = 0.01 and R = 0.01" (p. 456). Here's what that looks like.
```{r, fig.width = 6, fig.height = 3, warning = F}
tibble(x = seq(from = 0, to = 12, by = 0.025)) %>%
mutate(d = dgamma(x, shape = 0.01, rate = 0.01)) %>%
ggplot(aes(x = x, y = d)) +
geom_area(fill = bp[3]) +
geom_vline(xintercept = 1 / sd_y^2, linetype = 2, color = bp[5]) +
labs(subtitle = "The grey density in the background is the conventional gamma prior for precision.\nThe dashed vertical line is our precision value.") +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
coord_cartesian(xlim = c(0, 8),
ylim = c(0, 0.35))
```
The thing is, with **brms** we typically estimate $\sigma$ rather than precision. Though gamma is also a feasible prior distribution for $\sigma$, we won't use it here. But we won't be using Kruschke's uniform prior, either. The Stan team [discourages uniform priors for variance parameters](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations), such as our $\sigma$. I'm not going to get into the details of why, but you've got that hyperlink above and the good old [Stan user's guide](https://mc-stan.org/docs/2_29/stan-users-guide/index.html) [@standevelopmentteamStanUserGuide2022] if you'd like to dive deeper.
Here we'll use the half normal. By "half normal," we mean that the mean is zero and it's bounded from zero to positive infinity--no negative $\sigma$ values for us! By the "half normal," we also mean to suggest that smaller values are more credible than those approaching infinity. When working with unstandardized data, an easy default for a weakly-regularizing half normal is to set the $\sigma$ hyperparameter (i.e., *S*) to the standard deviation of the criterion variable (i.e., $s_Y$). Here's that that looks like for this example.
```{r, fig.width = 6, fig.height = 3}
tibble(x = seq(from = 0, to = 110, by = 0.1)) %>%
mutate(d = dnorm(x, mean = 0, sd = sd_y)) %>%
ggplot(aes(x = x, y = d)) +
geom_area(fill = bp[3]) +
geom_vline(xintercept = sd_y, linetype = 2, color = bp[5]) +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
labs(subtitle = "The gray density in the background is the half-normal prior for sigma.\nThe dashed vertical line is our 'sd_y' value.") +
coord_cartesian(xlim = c(0, 100))
```
This prior isn't quite as non-committal as the conventional gamma prior for precision. However, it will discourage the HMC algorithm from exploring $\sigma$ values much larger than two or three times the standard deviation in the data themselves. In practice, I've found it to have a minimal influence on the posterior. If you'd like to make it even less committal, try setting that $\sigma$ hyperparameter to some multiple of $s_Y$ like $2 \times s_Y$ or $10 \times s_Y$. Compare this to Kruschke's recommendations for setting a noncommittal uniform prior for $\sigma$. When using the uniform distribution, $\operatorname{Uniform}(L, H)$,
> we will set the high value $H$ of the uniform prior on $\sigma$ to a huge multiple of the standard deviation in the data, and set the low value $L$ to a tiny fraction of the standard deviation in the data. Again, this means that the prior is vague no matter what the scale of the data happens to be. (p. 455)
On page 456, Kruschke gave an example of such a uniform prior with the code snip `dunif( sdY/1000 , sdY*1000 )`. Here's what that would look like with our data.
```{r, fig.width = 6, fig.height = 3}
tibble(x = 0:(sd_y * 1000)) %>%
mutate(d = dunif(x, min = sd_y / 1000, max = sd_y * 1000)) %>%
ggplot(aes(x = x, y = d)) +
geom_area(fill = bp[3]) +
geom_vline(xintercept = sd_y, linetype = 2, color = bp[5]) +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
labs(subtitle = "The gray density in the background is Kruschke's uniform prior for sigma.\nThe dashed vertical line is our 'sd_y' value.") +
coord_cartesian()
```
That's really noncommittal. I'll stick with my half normal. You do you. Kruschke had this to say about the prior for the mean:
> In this application we seek broad priors relative to typical data, so that the priors have minimal influence on the posterior. One way to discover the constants is by asking an expert in the domain being studied. But in lieu of that, we will use the data themselves to tell us what the typical scale of the data is. We will set $M$ to the mean of the data, and set $S$ to a huge multiple (e.g., $100$) of the standard deviation of the data. This way, no matter what the scale of the data is, the prior will be vague. (p. 455)
In case you're not following along closely in the text, we often use the normal distribution for the intercept and slope parameters in a simple regression model. By $M$ and $S$, Kruschke was referring to the $\mu$ and $\sigma$ parameters of the normal prior for our intercept. Here's what that prior looks like in this data example.
```{r, fig.width = 6, fig.height = 3}
tibble(x = seq(from = -10000, to = 10000, by = 10)) %>%
mutate(d = dnorm(x, mean = mean_y, sd = sd_y * 100)) %>%
ggplot(aes(x = x, y = d)) +
geom_area(fill = bp[3]) +
geom_vline(xintercept = mean_y, linetype = 2, color = bp[5]) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
labs(subtitle = "The brown density in the background is the normal prior for mu.\nThe dashed vertical line is our 'mean_y' value.")
```
Yep, Kruschke's right. That is one noncommittal prior given our data. We could tighten that up by an order of magnitude and still have little influence on the posterior. Now we've decided on our parameterization ($\sigma$, not $\tau$) and our priors (half-normal, not uniform or gamma), we are ready to make our version of the model diagram in Figure 16.2.
```{r, fig.width = 3.9, fig.height = 3.5, message = F}
library(patchwork)
# normal density
p1 <-
tibble(x = seq(from = -3, to = 3, by = .1)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = bp[6]) +
annotate(geom = "text",
x = 0, y = .2,
label = "normal",
size = 7, color = bp[5]) +
annotate(geom = "text",
x = c(0, 1.5), y = .6,
label = c("italic(M)", "italic(S)[mu]"),
size = 7, color = bp[5], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(linewidth = 0.5, color = bp[4]))
# half-normal density
p2 <-
tibble(x = seq(from = 0, to = 3, by = .01),
d = (dnorm(x)) / max(dnorm(x))) %>%
ggplot(aes(x = x, y = d)) +
geom_area(fill = bp[6]) +
annotate(geom = "text",
x = 1.5, y = .2,
label = "half-normal",
size = 7, color = bp[5]) +
annotate(geom = "text",
x = 1.5, y = .6,
label = "0*','*~italic(S)[sigma]",
size = 7, color = bp[5], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(linewidth = 0.5, color = bp[4]))
## two annotated arrows
# save our custom arrow settings
my_arrow <- arrow(angle = 20, length = unit(0.35, "cm"), type = "closed")
p3 <-
tibble(x = c(.43, 1.5),
y = c(1, 1),
xend = c(.43, .8),
yend = c(.2, .2)) %>%
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = bp[4]) +
annotate(geom = "text",
x = c(.3, 1), y = .6,
label = "'~'",
size = 10, color = bp[5], family = "Times", parse = T) +
xlim(0, 2) +
theme_void()
# a second normal density
p4 <-
tibble(x = seq(from = -3, to = 3, by = .1)) %>%
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = bp[6]) +
annotate(geom = "text",
x = 0, y = .2,
label = "normal",
size = 7, color = bp[5]) +
annotate(geom = "text",
x = c(0, 1.5), y = .6,
label = c("mu", "sigma"),
size = 7, color = bp[5], family = "Times", parse = T) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(linewidth = 0.5, color = bp[4]))
# the final annotated arrow
p5 <-
tibble(x = c(.375, .625),
y = c(1/3, 1/3),
label = c("'~'", "italic(i)")) %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = c(10, 7), color = bp[5], parse = T, family = "Times") +
geom_segment(x = .5, xend = .5,
y = 1, yend = 0,
color = bp[4], arrow = my_arrow) +
xlim(0, 1) +
theme_void()
# some text
p6 <-
tibble(x = .5,
y = .5,
label = "italic(y[i])") %>%
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = 7, color = bp[5], parse = T, family = "Times") +
xlim(0, 1) +
theme_void()
# define the layout
layout <- c(
area(t = 1, b = 2, l = 1, r = 2),
area(t = 1, b = 2, l = 3, r = 4),
area(t = 4, b = 5, l = 1, r = 2),
area(t = 3, b = 4, l = 1, r = 4),
area(t = 6, b = 6, l = 1, r = 2),
area(t = 7, b = 7, l = 1, r = 2)
)
# combine and plot!
(p1 + p2 + p4 + p3 + p5 + p6) +
plot_layout(design = layout) &
ylim(0, 1) &
theme(plot.margin = margin(0, 5.5, 0, 5.5))
```
Two things about the notation in our diagram: Because we have two $\sigma$ hyperparameters, we've denoted the one for the prior on $\mu$ as $S_\mu$ and the one for the prior on $\sigma$ as $S_\sigma$. Also, note that we fixed the $\mu$ hyperparameter for half-normal prior to zero. This won't always be the case, but it's so common within the **brms** ecosystem that I'm going to express it this way throughout most of this ebook. This is our default.
Here's how to put these priors to use with **brms**.
```{r fit16.1}
fit16.1 <-
brm(data = my_data,
family = gaussian,
Score ~ 1,
prior = c(prior(normal(mean_y, sd_y * 100), class = Intercept),
prior(normal(0, sd_y), class = sigma)),
chains = 4, cores = 4,
stanvars = stanvars,
seed = 16,
file = "fits/fit16.01")
```
To be more explicit, the `stanvars = stanvars` argument at the bottom of our code is what allowed us to define our intercept prior as `normal(mean_y, sd_y * 100)` instead of requiring us to type in the parameters as `normal(107.8413, 25.4452 * 100)`. The same basic point goes for our $\sigma$ prior.
Also, notice our prior code for $\sigma$, `prior(normal(0, sd_y), class = sigma)`. Nowhere in there did we actually say we wanted a half normal as opposed to a typical normal. That's because the **brms** default is to set the lower bound for priors of `class = sigma` to zero. There's no need for us to fiddle with it.
Let's examine the chains.
```{r, fig.width = 8, fig.height = 2.5}
plot(fit16.1, widths = c(2, 3))
```
They look good! The model summary looks sensible, too.
```{r}
print(fit16.1)
```
Compare those values with `mean_y` and `sd_y`.
```{r}
mean_y
sd_y
```
Good times. Let's extract the posterior draws and save them in a data frame `draws`.
```{r}
draws <- as_draws_df(fit16.1)
```
Here's the leg work required to make our version of the three histograms in Figure 16.3.
```{r, fig.width = 8, fig.height = 2, message = F, warning = F}
# we'll need this for `stat_pointinterval()`
library(tidybayes)
# we'll use this to mark off the ROPEs as white strips in the background
rope <-
tibble(name = c("Mean", "Standard Deviation", "Effect Size"),
xmin = c(99, 14, -0.1),
xmax = c(101, 16, 0.1))
# annotate the ROPE
text <-
tibble(x = 0,
y = 0.98,
label = "ROPE",
name = "Effect Size")
# here are the primary data
draws %>%
transmute(Mean = b_Intercept,
`Standard Deviation` = sigma) %>%
mutate(`Effect Size` = (Mean - 100) / `Standard Deviation`) %>%
pivot_longer(everything()) %>%
# the plot
ggplot() +
geom_rect(data = rope,
aes(xmin = xmin, xmax = xmax,
ymin = -Inf, ymax = Inf),
color = "transparent", fill = "white") +
stat_histinterval(aes(x = value, y = 0),
point_interval = mode_hdi, .width = .95,
fill = bp[2], color = bp[6],
normalize = "panels") +
geom_text(data = text,
aes(x = x, y = y, label = label),
size = 2.5, color = bp[4]) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(NULL) +
facet_wrap(~ name, scales = "free", ncol = 3)
```
If we wanted those exact 95% HDIs, we'd execute this.
```{r, warning = F}
draws %>%
transmute(Mean = b_Intercept,
`Standard Deviation` = sigma) %>%
mutate(`Effect Size` = (Mean - 100) / `Standard Deviation`) %>%
pivot_longer(everything()) %>%
group_by(name) %>%
mode_hdi(value)
```
For the next part, we should look at the structure of the posterior draws, `draws`.
```{r}
head(draws)
```
By default, `head()` returned six rows, each one corresponding to the credible parameter values from a given posterior draw. Following our model equation $\text{Score}_i \sim N(\mu, \sigma)$, we might reformat the first two columns as:
1. `Score` ~ $N$(`r round(data.frame(draws)[1, 1], 3)`, `r round(data.frame(draws)[1, 2], 3)`)
2. `Score` ~ $N$(`r round(data.frame(draws)[2, 1], 3)`, `r round(data.frame(draws)[2, 2], 3)`)
3. `Score` ~ $N$(`r round(data.frame(draws)[3, 1], 3)`, `r round(data.frame(draws)[3, 2], 3)`)
4. `Score` ~ $N$(`r round(data.frame(draws)[4, 1], 3)`, `r round(data.frame(draws)[4, 2], 3)`)
5. `Score` ~ $N$(`r round(data.frame(draws)[5, 1], 3)`, `r round(data.frame(draws)[5, 2], 3)`)
6. `Score` ~ $N$(`r round(data.frame(draws)[6, 1], 3)`, `r round(data.frame(draws)[6, 2], 3)`)
Each row of `draws` yields a full model equation with which we might credibly describe the data--or at least as credibly as we can within the limits of the model. We can give voice to a subset of these credible distributions with our version of the upper right panel of Figure 16.3.
Before I show that plotting code, it might make sense to slow down on the preparatory data wrangling steps. There are several ways to overlay multiple posterior predictive density lines like those in our upcoming plots. We'll practice a few over the next few chapters. For the method we'll use in this chapter, it's handy to first determine how many you'd like. Here we'll follow Kruschke and choose 63, which we'll save as `n_lines`.
```{r}
# how many credible density lines would you like?
n_lines <- 63
```
Now we've got our `n_lines` value, we'll use it to subset the rows in `draws` with the `slice()` function. We'll then subset the columns and use `expand_grid()` to include a sequence of `Score` values to correspond to the formula implied in each of the remaining rows of `draws`. Notice how we also kept the `.draw` index in the game. That will help us with the plot in a bit. But the main event is how we used `Score`, `b_Intercept`, and `sigma` as the input for the arguments in the `dnorm()`. The output is a column of the corresponding density values.
```{r, warning = F}
draws <-
draws %>%
slice(1:n_lines) %>%
select(.draw, b_Intercept, sigma) %>%
expand_grid(Score = seq(from = 40, to = 250, by = 1)) %>%
mutate(density = dnorm(x = Score, mean = b_Intercept, sd = sigma))
str(draws)
```
Note that after using `expand_grid()`, we have a rather long data frame. Anyway, we're ready to plot.
```{r, fig.width = 3, fig.height = 2.25, warning = F, message = F}
draws %>%
ggplot(aes(x = Score)) +
geom_histogram(data = my_data,
aes(y = after_stat(density)),
fill = bp[2],
linewidth = 0.2, binwidth = 5, boundary = 0) +
geom_line(aes(y = density, group = .draw),
linewidth = 1/4, alpha = 1/3, color = bp[6]) +
scale_x_continuous("y", limits = c(50, 210)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
ggtitle("Data with Post. Pred.")
```
Note the `after_stat(density)` argument in the `geom_histogram()` function. That's what rescaled the histogram to the density metric. If you leave that part out, all the density lines will drop to the bottom of the plot. Also, did you see how we used `.draw` to group the density lines within the `geom_line()` function? That's why we kept that information. Without that `group = .draw` argument, the resulting lines are a mess.
Kruschke pointed out this last plot
> constitutes a form of posterior-predictive check, by which we check whether the model appears to be a reasonable description of the data. With such a small amount of data, it is difficult to visually assess whether normality is badly violated, but there appears to be a hint that the normal model is straining to accommodate some outliers: The peak of the data protrudes prominently above the normal curves, and there are gaps under the shoulders of the normal curves. (p. 458)
We can perform a similar posterior-predictive check with the `brms::pp_check()` function. By default, it will return 10 simulated density lines. Like we did above, we'll increase that by setting the `ndraws` argument to our `n_lines` value.
```{r, fig.width = 4, fig.height = 2.25, warning = F, message = F}
library(bayesplot)
color_scheme_set(scheme = bp[c(3, 1, 2, 5, 4, 6)])
pp_check(fit16.1, ndraws = n_lines)
```
In principle, we didn't need to load the **bayesplot** package to use the `brms::pp_check()` function. But doing so gave us access to the `bayesplot::color_scheme_set()`, which allowed us to apply the colors from our color palette to the plot.
Before we move on, we should talk a little about effect sizes, which we all but glossed over in our code.
> *Effect size* is simply the amount of change induced by the treatment relative to the standard deviation: $(\mu - 100) / \sigma$. In other words, the effect size is the "standardized" change... A conventionally "small" effect size in psychological research is $0.2$ [@cohenStatisticalPowerAnalysis1988], and the ROPE limits are set at half that size for purposed of illustration. (p. 457, *emphasis* in the original).
Another way to describe this kind of effect size is as a *standardized mean difference*. In addition to the seminal work by Cohen, you might brush up on effect sizes with Kelley and Preacher's [-@kelley2012effect] [*On effect size*](https://www3.nd.edu/~kkelley/publications/articles/Kelley_and_Preacher_Psychological_Methods_2012.pdf).
## Outliers and robust estimation: The $t$ distribution
Here's the code for our version of Figure 16.4.
```{r, fig.width = 5.5, fig.height = 3.5}
# wrangle
crossing(nu = c(Inf, 4, 2, 1),
y = seq(from = -8, to = 8, length.out = 500)) %>%
mutate(density = dt(x = y, df = nu)) %>%
# this line is unnecessary, but will help with the plot legend
mutate(nu = factor(nu, levels = c("Inf", "4", "2", "1"))) %>%
# plot
ggplot(aes(x = y, y = density, group = nu, color = nu)) +
geom_line() +
scale_color_manual(expression(paste(italic(t)[nu])), values = bp[c(6, 3:1)]) +
scale_y_continuous(expression(p(y)), expand = expansion(mult = c(0, 0.05))) +
coord_cartesian(xlim = c(-6, 6)) +
theme(legend.position = c(.92, .75))
```
> Although the $t$ distribution is usually conceived as a sampling distribution for the NHST $t$ test, we will use it instead as a convenient descriptive model of data with outliers... Outliers are simply data values that fall unusually far from a model's expected value. Real data often contain outliers relative to a normal distribution. Sometimes the anomalous values can be attributed to extraneous influences that can be explicitly identified, in which case the affected data values can be corrected or removed. But usually we have no way of knowing whether a suspected outlying value was caused by an extraneous influence, or is a genuine representation of the target being measured. Instead of deleting suspected outliers from the data according to some arbitrary criterion, we retain all the data but use a noise distribution that is less affected by outliers than is the normal distribution. (p. 459)
Here's Figure 16.5.a.
```{r, fig.width = 6, fig.height = 3}
tibble(y = seq(from = -10, to = 20, length.out = 1e3)) %>%
ggplot(aes(x = y)) +
geom_area(aes(y = dnorm(y, mean = 2.5, sd = 5.73)),
fill = bp[2], alpha = 1/2) +
geom_area(aes(y = metRology::dt.scaled(y, df = 1.14, mean = .12, sd = 1.47)),
fill = bp[2], alpha = 1/2) +
geom_vline(xintercept = c(.12, 2.5), color = bp[5], linetype = 3) +
annotate(geom = "point",
x = c(-2:2, 15), y = 0.002,
size = 2, color = bp[6]) +
annotate(geom = "text",
x = c(1, 4), y = c(0.2, 0.08),
label = c("italic(t)", "italic(N)"),
color = bp[1], parse = T) +
scale_y_continuous(expression(p(y)), expand = expansion(mult = c(0, 0.05))) +
ggtitle("Maximum Likelihood Estimates") +
coord_cartesian(xlim = c(-5, 15))
```
I'm not aware that we have the data for the bottom panel of Figure 16.5. However, we can simulate similar data with the `rt.scaled()` function from the [**metRology** package](https://sourceforge.net/projects/metrology/) [@R-metRology].
```{r, fig.width = 6, fig.height = 3}
set.seed(145)
# simulate the data
d <- tibble(y = metRology::rt.scaled(n = 177, df = 2.63, mean = 1.11, sd = 0.15))
# plot
tibble(y = seq(from = -3, to = 12, length.out = 1e3)) %>%
ggplot(aes(y)) +
geom_histogram(data = d,
aes(y = after_stat(density)),
fill = bp[3],
linewidth = 0.2, binwidth = .1) +
geom_line(aes(y = dnorm(y, mean = 1.16, sd = 0.63)),
color = bp[2]) +
geom_line(aes(y = metRology::dt.scaled(y, df = 2.63, mean = 1.11, sd = 0.15)),
color = bp[2]) +
annotate(geom = "text",
x = c(1.5, 1.9), y = c(1.5, 0.6),
label = c("italic(t)", "italic(N)"),
color = bp[1], parse = T) +
scale_x_continuous(breaks = seq(from = -2, to = 10, by = 2)) +
scale_y_continuous(expression(p(y)), expand = expansion(mult = c(0, 0.05))) +
ggtitle("Maximum Likelihood Estimates") +
coord_cartesian(xlim = c(-1.5, 10.25))
```
In case you were curious, this is how I selected the seed for the plot. Run the code yourself to get a sense of how it works.
```{r, fig.width = 2, fig.height = 8, eval = F}
# in the R Notebook code block settings, I used: fig.width = 2, fig.height = 8
t_maker <- function(seed) {
set.seed(seed)
tibble(y = metRology::rt.scaled(n = 177, df = 2.63, mean = 1.11, sd = 0.15)) %>%
summarise(min = min(y),
max = max(y)) %>%
mutate(spread = max - min)
}
tibble(seed = 1:200) %>%
mutate(t = map(seed, t_maker)) %>%
unnest(t) %>%
ggplot(aes(x = reorder(seed, spread), ymin = min, ymax = max)) +
geom_hline(yintercept = 0, color = "white") +
geom_linerange() +
coord_flip()
```
> It is important to understand that the scale parameter $\sigma$ in the $t$ distribution is not the standard deviation of the distribution. (Recall that the standard deviation is the square root of the variance, which is the expected value of the squared deviation from the mean, as defined back in Equation 4.8, p. 86.) The standard deviation is actually larger than $\sigma$ because of the heavy tails... While this value of the scale parameter is not the standard deviation of the distribution, it does have an intuitive relation to the spread of the data. Just as the range $\pm \sigma$ covers the middle $68\%$ of a *normal* distribution, the range $\pm \sigma$ covers the middle $58\%$ of a $t$ distribution when $\nu = 2$, and the middle $50\%$ when $\nu = 1$. These areas are illustrated in the left column of Figure 16.6. The right column of Figure 16.6 shows the width under the middle of a $t$ distribution that is needed to span $68.27\%$ of the distribution, which is the area under a normal distribution for $\sigma = \pm 1$. (pp. 459--461, *emphasis* in the original)
Speaking of which, here's the code for the left column for Figure 16.6.
```{r}
# the primary data
d <-
crossing(y = seq(from = -8, to = 8, length.out = 1e3),
nu = c(Inf, 5, 2, 1)) %>%
mutate(label = str_c("nu == ", nu) %>%
factor(., levels = c("nu == Inf", "nu == 5", "nu == 2", "nu == 1")))
# the subplot
p1 <-
d %>%
ggplot(aes(x = y)) +
geom_area(aes(y = dt(y, df = nu)),
fill = bp[2]) +
geom_area(data = . %>% filter(y >= -1 & y <= 1),
aes(y = dt(y, df = nu)),
fill = bp[1]) +
# note how this function has its own data
geom_text(data = tibble(
y = 0,
text = c("68%", "64%", "58%", "50%"),
label = factor(c("nu == Inf", "nu == 5", "nu == 2", "nu == 1"))),
aes(y = .175, label = text),
color = "white") +
scale_y_continuous(expression(p(y)), expand = expansion(mult = c(0, 0.05)), breaks = c(0, .2, .4)) +
labs(subtitle = "Shaded from y = - 1 to y = 1") +
coord_cartesian(xlim = c(-6, 6)) +
facet_wrap(~ label, ncol = 1, labeller = label_parsed)
```
Here's the code for the right column.
```{r}
# the primary data
d <-
tibble(nu = c(Inf, 5, 2, 1),
ymin = c(-1, -1.11, -1.32, -1.84)) %>%
mutate(ymax = -ymin) %>%
expand_grid(y = seq(from = -8, to = 8, length.out = 1e3)) %>%
mutate(label = factor(str_c("nu==", nu),
levels = str_c("nu==", c(Inf, 5, 2, 1))))
# the subplot
p2 <-
d %>%
ggplot(aes(x = y)) +
geom_area(aes(y = dt(y, df = nu)),
fill = bp[2]) +
geom_area(data = . %>%
# notice our `filter()` code has changed
filter(y >= ymin & y <= ymax),
aes(y = dt(y, df = nu)),
fill = bp[1]) +
annotate(geom = "text",
x = 0, y = .175,
label = "68%", color = "white") +
scale_y_continuous(expression(p(y)), expand = expansion(mult = c(0, 0.05)), breaks = c(0, .2, .4)) +
labs(subtitle = "Shaded for the middle 68.27%") +
coord_cartesian(xlim = c(-6, 6)) +
facet_wrap(~ label, ncol = 1, labeller = label_parsed)
```
You may have noticed that we just pulled the values in the `ymin` column directly from Kruschke's version of the figure on page 461. If you'd like a better understanding of where those values came from, you can reproduce them with the `qt()` function.
```{r}
qt(p = (1 - .6827) / 2,
df = c(Inf, 5, 2, 1)) %>%
round(digits = 2)
```
Anyway, let's bind the two ggplots together with the **patchwork** package to make the full version of Figure 16.6.
```{r, fig.width = 7, fig.height = 6, warning = F, message = F}
p1 + p2
```
> The use of a heavy-tailed distribution is often called *robust estimation* because the estimated value of the central tendency is stable, that is, "robust," against outliers. The $t$ distribution is useful as a likelihood function for modeling outliers at the level of observed data. But the $t$ distribution is also useful for modeling outliers at higher levels in a hierarchical prior. We will encounter several applications. (p. 462, *emphasis* in the original)
### Using the $t$ distribution in ~~JAGS~~ brms.
It's easy to use Student's $t$ in **brms**. Just make sure to set `family = student`. By default, **brms** already sets the lower bound for $\nu$ to 1. But we do still need to use 1/29. To get a sense, let's simulate exponential data using the `rexp()` function. Like Kruschke covered in the text (p. 462), the `rexp()` function has one parameter, `rate`, which is the reciprocal of the mean. Here we'll set the mean to 29.
```{r, message = F}
n_draws <- 1e7
mu <- 29
set.seed(16)
tibble(y = rexp(n = n_draws, rate = 1 / mu)) %>%
mutate(y_at_least_1 = ifelse(y < 1, NA, y)) %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value, na.rm = T))
```
The simulation showed that when we define the exponential rate as 1/29 and use the typical boundary at 0, the mean of our samples converges to 29. But when we only consider the samples of 1 or greater, the mean converges to 30. Thus, our $\operatorname{Exponential}(1/29)$ prior with a boundary at 1 is how we get a shifted exponential distribution when we use it as our prior for $\nu$ in **brms**. Just make sure to remember that if you want the mean to be 30, you'll need to specify the rate of 1/29.
Also, Stan will bark if you simply try to set that exponential prior with the code `prior(exponential(1/29), class = nu)`:
> DIAGNOSTIC(S) FROM PARSER:
Info: integer division implicitly rounds to integer. Found int division: 1 / 29
Positive values rounded down, negative values rounded up or down in platform-dependent way.
To avoid this, just do the division beforehand and save the results with `stanvar()`.
```{r}
stanvars <-
stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y") +
stanvar(1/29, name = "one_over_twentynine")
```
Here's the `brm()` code. Note that we set the prior for our new $\nu$ parameter by specifying `class = nu` within the last `prior()` line.
```{r fit16.2}
fit16.2 <-
brm(data = my_data,
family = student,
Score ~ 1,
prior = c(prior(normal(mean_y, sd_y * 100), class = Intercept),
prior(normal(0, sd_y), class = sigma),
prior(exponential(one_over_twentynine), class = nu)),
chains = 4, cores = 4,
stanvars = stanvars,
seed = 16,
file = "fits/fit16.02")
```
We can make the shifted exponential distribution (i.e., Figure 16.7) with simple addition.
```{r, fig.width = 5, fig.height = 4, message = F, warning = F}
# how many draws would you like?
n_draws <- 1e6
# here are the data
d <-
tibble(exp = rexp(n_draws, rate = 1/29)) %>%
transmute(exp_plus_1 = exp + 1,
log_10_exp_plus_1 = log10(exp + 1))
# this is the plot in the top panel
p1 <-
d %>%
ggplot(aes(x = exp_plus_1)) +
geom_histogram(fill = bp[2],
linewidth = 0.2, binwidth = 5, boundary = 1) +
stat_pointinterval(aes(y = 0),
point_interval = mode_hdi, .width = .95, color = bp[6]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = expression(exponential(lambda==29)~shifted~+1),
x = expression(nu)) +
coord_cartesian(xlim = c(0, 150))
# the bottom panel plot
p2 <-
d %>%
ggplot(aes(x = log_10_exp_plus_1)) +
geom_histogram(fill = bp[2],
linewidth = 0.2, binwidth = .1, boundary = 0) +
stat_pointinterval(aes(y = 0),
point_interval = mode_hdi, .width = .95, color = bp[6]) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(log10(nu))) +
coord_cartesian(xlim = c(0, 2.5))
# bind them together
(p1 / p2) & scale_x_continuous(expand = expansion(mult = c(0, 0.05)))
```
Here are the scatter plots of Figure 16.8.
```{r, fig.width = 4.5, fig.height = 4}
pairs(fit16.2,
off_diag_args = list(size = 1/3, alpha = 1/3))
```
I'm not aware of an easy way to use `log10(nu)` instead of `nu` with `brms::pairs()`. However, you can get those plots with the `as_draws_df()` function and a little wrangling.
```{r, fig.width = 2.5, fig.height = 4, warning = F}
draws <- as_draws_df(fit16.2)
draws %>%
mutate(`log10(nu)` = log10(nu)) %>%
rename(mu = b_Intercept) %>%
select(mu, sigma, `log10(nu)`) %>%
pivot_longer(-`log10(nu)`) %>%
ggplot(aes(x = `log10(nu)`, y = value)) +
geom_point(color = bp[6], size = 1/3, alpha = 1/3) +
labs(x = expression(log10(nu)),
y = NULL) +
facet_grid(name ~ ., scales = "free", switch = "y", labeller = label_parsed)
```
If you want the Pearson's correlation coefficients, you can use base **R** `cor()`.
```{r, warning = F}
draws %>%
mutate(`log10(nu)` = log10(nu)) %>%
select(b_Intercept, sigma, `log10(nu)`) %>%
cor() %>%
round(digits = 3)
```
The correlations among our parameters are a similar magnitude as those Kruschke presented in the text. Here are four of the panels for Figure 16.9.
```{r, fig.width = 6, fig.height = 4, warning = F}
# we'll use this to mark off the ROPEs as white strips in the background
rope <-
tibble(name = c("Mean", "Scale", "Effect Size"),
xmin = c(99, 14, -.1),
xmax = c(101, 16, .1))
# here are the primary data
draws %>%
transmute(Mean = b_Intercept,
Scale = sigma,
Normality = log10(nu)) %>%
mutate(`Effect Size` = (Mean - 100) / Scale) %>%
pivot_longer(everything()) %>%
# the plot
ggplot() +
geom_rect(data = rope,
aes(xmin = xmin, xmax = xmax,
ymin = -Inf, ymax = Inf),
color = "transparent", fill = "white") +
stat_histinterval(aes(x = value, y = 0),
point_interval = mode_hdi, .width = .95,
fill = bp[2], color = bp[6],
normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab(NULL) +
facet_wrap(~ name, scales = "free", ncol = 2)
```
For the final panel of Figure 16.9, we'll make our $t$ lines in much the same way we did, earlier. But last time, we just took the first $\mu$ and $\sigma$ values from the first 63 rows of the `post` tibble. This time we'll use `dplyr::slice_sample()` to take *random draws* from the `post` rows instead. We tell `slice_sample()` how many draws we'd like with the `n` argument.
In addition to the change in our row selection strategy, this time we'll slightly amend the code within the last `mutate()` line. Since we'd like to work with the $t$ distribution, we specified `metRology::dt.scaled()` function instead of `dnorm()`.
```{r, fig.width = 3, fig.height = 2.25}
# how many credible density lines would you like?
n_lines <- 63
# setting the seed makes the results from `slice_sample()` reproducible
set.seed(16)
# wrangle
draws %>%
slice_sample(n = n_lines) %>%
expand_grid(Score = seq(from = 40, to = 250, by = 1)) %>%
mutate(density = metRology::dt.scaled(x = Score, df = nu, mean = b_Intercept, sd = sigma)) %>%
# plot
ggplot(aes(x = Score)) +
geom_histogram(data = my_data,
aes(y = after_stat(density)),
fill = bp[2],
linewidth = 0.2, binwidth = 5, boundary = 0) +
geom_line(aes(y = density, group = .draw),
linewidth = 1/3, alpha = 1/3, color = bp[6]) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05))) +
labs(title = "Data with Post. Pred.",
x = "y") +
coord_cartesian(xlim = c(50, 210))
```
Much like Kruschke mused in the text, this plot
> shows that the posterior predictive $t$ distributions appear to describe the data better than the normal distribution in Figure 16.3, insofar as the data histogram does not poke out at the mode and the gaps under the shoulders are smaller. (p. 464)
In case you were wondering, here's the model `summary()`.
```{r}
summary(fit16.2)
```
It's easy to miss how
> $\sigma$ in the robust estimate is much smaller than in the normal estimate. What we had interpreted as increased standard deviation induced by the smart drug might be better described as increased outliers. Both of these differences, that is, $\mu$ more tightly estimated and $\sigma$ smaller in magnitude, are a result of there being outliers in the data. The only way a normal distribution can accommodate the outliers is to use a large value for $\sigma$. In turn, that leads to "slop" in the estimate of $\mu$ because there is a wider range of $\mu$ values that reasonably fit the data when the standard deviation is large. (p. 464)
We can use the `brms::VarCorr()` function to pull the summary statistics for $\sigma$ from both models.
```{r}
VarCorr(fit16.1)$residual__$sd
VarCorr(fit16.2)$residual__$sd
```
It is indeed the case that estimate for $\sigma$ is smaller in the $t$ model. That smaller $\sigma$ resulted in a more precise estimate for $\mu$, as can be seen in the 'Est.Error' columns from the `fixef()` output.
```{r}
fixef(fit16.1)
fixef(fit16.2)
```
Here that is in a coefficient plot using `tidybayes::stat_interval()`.
```{r, fig.width = 6, fig.height = 1.25, warning = F}
bind_rows(as_draws_df(fit16.1) %>% select(b_Intercept),
as_draws_df(fit16.2) %>% select(b_Intercept)) %>%
mutate(fit = rep(c("fit16.1", "fit16.2"), each = n() / 2)) %>%
ggplot(aes(x = b_Intercept, y = fit)) +
stat_interval(point_interval = mode_hdi, .width = c(.5, .8, .95)) +
scale_color_manual("HDI", values = c(bp[c(4, 3, 1)]),
labels = c("95%", "80%", "50%")) +
labs(x = expression(mu),
y = NULL) +
theme(legend.key.size = unit(0.45, "cm"))
```
### Using the $t$ distribution in Stan.
Kruschke expressed concern about high autocorrelations in the chains of his JAGS model. Here are the results of our Stan/**brms** attempt.
```{r, fig.width = 6, fig.height = 4, message = F, warning = F}
# rearrange the bayesplot color scheme
color_scheme_set(scheme = bp[c(6, 2, 2, 2, 1, 2)])
draws %>%
mutate(chain = .chain) %>%
mcmc_acf(pars = vars(b_Intercept:nu), lags = 35)
```
For all three parameters, the autocorrelations were near zero by lag 3 or 4. Not bad. The $N_\textit{eff}/N$ ratios are okay.
```{r, fig.width = 6, fig.height = 1.25}
# rearrange the bayesplot color scheme again
color_scheme_set(scheme = bp[c(2:1, 4:3, 5:6)])
neff_ratio(fit16.2) %>%
mcmc_neff(size = 2.5) +
yaxis_text(hjust = 0)
```
The trace plots look fine.
```{r, fig.width = 8, fig.height = 4}
# rearrange the bayesplot color scheme one more time
color_scheme_set(scheme = c("white", bp[c(2, 2, 2, 6, 3)]))
plot(fit16.2, widths = c(2, 3))
```
The values for `nu` are pretty skewed, but hopefully it makes sense to you why that might be the case. Here are the overlaid density plots.
```{r, fig.width = 8, fig.height = 2}
draws %>%
mutate(chain = .chain) %>%
mcmc_dens_overlay(pars = vars(b_Intercept:nu))
```
The $\widehat R$ values are right where we like them.
```{r}
rhat(fit16.2)[1:3]
```
If you peer into the contents of a `brm()` fit object (e.g., `fit16.2 %>% str()`), you'll discover it contains the Stan code. Here it is for our `fit16.2`.