Merge pull request #137 from dbosak01/main
Added Paired T-Test
statasaurus authored Jan 30, 2024
2 parents 0268e66 + fcc1a37 commit 46bf241
Showing 9 changed files with 319 additions and 7 deletions.
47 changes: 47 additions & 0 deletions Comp/r-sas_ttest_Paired.qmd
@@ -0,0 +1,47 @@
---
title: "R vs SAS Paired T-Test"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(procs)
```

# Paired t-test Comparison

The following table shows the types of Paired t-test analysis, the capabilities of each language, and whether or not the results from each language match.

| Analysis | Supported in R | Supported in SAS | Results Match | Notes |
|---------------|---------------|---------------|---------------|---------------|
| Paired t-test, normal data | [Yes](../R/ttest_Paired.html#normal) | [Yes](../SAS/ttest_Paired.html#normal) | [Yes](#normal) | In Base R, use `paired = TRUE` in the `t.test()` function |
| Paired t-test, lognormal data | [Maybe](../R/ttest_Paired.html#lognormal) | [Yes](../SAS/ttest_Paired.html#lognormal) | [NA](#lognormal) | May be supported by the **EnvStats** package |

## Comparison Results

### Normal Data {#normal}

Here is a table of comparison values between `t.test()`, `proc_ttest()`, and SAS `PROC TTEST`:

| Statistic | t.test() | proc_ttest() | PROC TTEST | Match | Notes |
|--------------------|----------|--------------|------------|-------|-------|
| Degrees of Freedom | 11 | 11 | 11 | Yes | |
| t value | -1.089648 | -1.089648 | -1.089648 | Yes | |
| p value | 0.2992 | 0.2992 | 0.2992 | Yes | |
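
As a minimal sketch of how the `t.test()` column above could be reproduced, the block below runs the Base R test on the blood pressure data from the R page and pulls the three compared statistics out of the returned `htest` object. The variable names `before`, `after`, and `comparison` are illustrative and not part of the original pages.

```r
# Data copied from the R paired t-test page (SBPbefore / SBPafter)
before <- c(120, 124, 130, 118, 140, 128, 140, 135, 126, 130, 126, 127)
after  <- c(128, 131, 131, 127, 132, 125, 141, 137, 118, 132, 129, 135)

# Paired t-test in Base R
res <- t.test(before, after, paired = TRUE)

# Collect the statistics that appear in the comparison table
comparison <- data.frame(
  statistic = c("Degrees of Freedom", "t value", "p value"),
  value     = c(unname(res$parameter), unname(res$statistic), res$p.value)
)
print(comparison)
```

Working from the `htest` elements (`parameter`, `statistic`, `p.value`) rather than the printed output avoids rounding differences when the values are compared with SAS.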

### Lognormal Data {#lognormal}

Since there is currently no known support for a lognormal paired t-test in R, this comparison is not applicable.

# Summary and Recommendation

For normal data, the R paired t-test capabilities are comparable to SAS. Comparisons between SAS and R show identical results for the datasets tried. The **procs** package `proc_ttest()` function is very similar to SAS in both syntax and output. `proc_ttest()` also supports BY groups, whereas `t.test()` does not.
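
Although `t.test()` has no by-group argument, a by-group analysis can be approximated in Base R by splitting the data first. The sketch below is only an illustration: the `site` column is hypothetical and was added purely to create two groups; it is not part of the original data.

```r
# Blood pressure data from the R page, plus a HYPOTHETICAL grouping column
pressure <- data.frame(
  SBPbefore = c(120, 124, 130, 118, 140, 128, 140, 135, 126, 130, 126, 127),
  SBPafter  = c(128, 131, 131, 127, 132, 125, 141, 137, 118, 132, 129, 135),
  site      = rep(c("A", "B"), each = 6)  # hypothetical, for illustration only
)

# Run one paired t-test per group and keep the key statistics
by_results <- lapply(split(pressure, pressure$site), function(g) {
  res <- t.test(g$SBPbefore, g$SBPafter, paired = TRUE)
  data.frame(
    df = unname(res$parameter),
    t  = unname(res$statistic),
    p  = res$p.value
  )
})

# One row per group
do.call(rbind, by_results)
```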

The lognormal version of the t-test does not appear to be supported in the **stats** or **procs** packages. It may be supported in the **EnvStats** package. More exploration is needed to determine whether this package produces the expected results, and whether those results match SAS.

# References

R `t.test()` documentation: <https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test>

R `proc_ttest()` documentation: <https://procs.r-sassy.org/reference/proc_ttest.html>

SAS `PROC TTEST` Paired analysis documentation: <https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_ttest_syntax08.htm>
80 changes: 80 additions & 0 deletions R/ttest_Paired.qmd
@@ -0,0 +1,80 @@
---
title: "Paired t-test"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# **Paired t-test in R**

The Paired t-test is used when two samples are naturally correlated, for example measurements taken on the same subjects before and after a procedure. The test computes the difference within each pair and compares the mean of those differences to a given number (usually zero) that represents the null hypothesis. For a Paired t-test, the number of observations in each sample must be equal.

In R, a Paired t-test can be performed using the Base R `t.test()` function from the **stats** package or the `proc_ttest()` function from the **procs** package.

## Normal Data {#normal}

By default, the R paired t-test functions assume normality in the data and use a classic Student's t-test.

### Data Used

The following data was used in this example.

```{r eval=TRUE, echo = TRUE}
# Create sample data
pressure <- tibble::tribble(
  ~SBPbefore, ~SBPafter,
  120, 128,
  124, 131,
  130, 131,
  118, 127,
  140, 132,
  128, 125,
  140, 141,
  135, 137,
  126, 118,
  130, 132,
  126, 129,
  127, 135
)
```

### Base R

#### Code

The following code was used to perform the paired t-test in Base R.

```{r eval=TRUE, echo = TRUE}
# Perform t-test
t.test(pressure$SBPbefore, pressure$SBPafter, paired = TRUE)
```
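
As a cross-check on what `paired = TRUE` does, the following sketch recomputes the test by hand from the paired differences and confirms that it agrees with both the paired call and a one-sample `t.test()` on the differences. The variable names are illustrative; the data are the same values as in `pressure` above.

```r
# Same values as the pressure data set, as plain vectors
before <- c(120, 124, 130, 118, 140, 128, 140, 135, 126, 130, 126, 127)
after  <- c(128, 131, 131, 127, 132, 125, 141, 137, 118, 132, 129, 135)

# Paired differences
d <- before - after
n <- length(d)

# t statistic: mean difference divided by its standard error
t_stat  <- mean(d) / (sd(d) / sqrt(n))
df      <- n - 1
p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

# The same test expressed two other ways
res_paired <- t.test(before, after, paired = TRUE)
res_one    <- t.test(d, mu = 0)

c(t = t_stat, df = df, p = p_value)
```

All three approaches should give the same t value, degrees of freedom, and p-value, since a paired t-test is a one-sample t-test on the within-pair differences.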

### Procs Package

#### Code

The following code from the **procs** package was used to perform a paired t-test.

```{r eval=TRUE, echo = TRUE, message=FALSE, warning=FALSE}
library(procs)
# Perform t-test
proc_ttest(pressure,
paired = "SBPbefore*SBPafter")
```

Viewer Output:

```{r, echo=FALSE, fig.align='center', out.width="50%"}
knitr::include_graphics("../images/ttest/paired_rtest1.png")
```

## Lognormal Data {#lognormal}

Neither the Base R `t.test()` function nor the **procs** `proc_ttest()` function has an option for lognormal data.

One possibility may be the `tTestLnormAltPower()` function from the **EnvStats** package, although that function computes power for a t-test on lognormal data rather than the test itself. This package has not been evaluated yet.
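
Another possibility, sketched here as an assumption rather than an evaluated method, is to log-transform the data and run an ordinary t-test on the log differences, then back-transform to obtain a geometric mean ratio. The block below uses the AUC data from the SAS page and the same equivalence bounds (0.8, 1.25). It has not been compared against the SAS `DIST=LOGNORMAL` output, so agreement with SAS should not be assumed.

```r
# AUC data copied from the SAS page (TestAUC / RefAUC pairs)
test_auc <- c(103.4, 59.92, 68.17, 94.54, 69.48, 72.17,
              74.37, 84.44, 96.74, 94.26, 48.52, 95.68)
ref_auc  <- c(90.11, 77.71, 77.71, 97.51, 58.21, 101.3,
              79.84, 96.06, 89.30, 97.22, 61.62, 85.80)

# Work on the log scale: difference of logs = log of the ratio
log_ratio <- log(test_auc) - log(ref_auc)

# 90% confidence interval on the log scale, back-transformed to a ratio
res <- t.test(log_ratio, conf.level = 0.90)
gmr <- exp(unname(res$estimate))   # geometric mean ratio
ci  <- exp(as.numeric(res$conf.int))

# Two one-sided tests (TOST) against the equivalence bounds
p_lower <- t.test(log_ratio, mu = log(0.8),  alternative = "greater")$p.value
p_upper <- t.test(log_ratio, mu = log(1.25), alternative = "less")$p.value
p_tost  <- max(p_lower, p_upper)

list(gmr = gmr, ci90 = ci, p_lower = p_lower, p_upper = p_upper, p_tost = p_tost)
```

With a 0.05 significance level, equivalence is concluded when `p_tost` is below 0.05, which corresponds to the 90% confidence interval lying entirely inside the bounds.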
8 changes: 4 additions & 4 deletions SAS/ttest.qmd → SAS/ttest_2Sample.qmd
@@ -1,13 +1,13 @@
---
title: "Students t-test"
title: "Independent Two-Sample t-test"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

### **Independant Two-Sample t-test in SAS**
### **Independent Two-Sample t-test in SAS**

The null hypothesis of the Independent Samples t-test is that the means of the two populations are equal.

@@ -37,7 +37,7 @@ Here the t-value is --0.70, degrees of freedom is 30 and P value is 0.4912 which

Note: Before entering straight into the t-test we need to check whether the assumptions (like the equality of variance, the observations should be independent, observations should be normally distributed) are met or not. If normality is not satisfied, we may consider using a suitable non-parametric test.

1. Normality: You can check for data to be normally distributed by plotting a histogram of the data by treatment. Alternatively, you can use the Shapiro-Wilk test or the Kolmogorov-Smirnov test. If the test is <0.05 and your sample is quite small then this suggests you should not use the t-test. However, if your sample in each treatment group is large (say >30 in each group), then you do not need to rely so heavily on the assumption that the data have an underlying normal distribution in order to apply the two-sample t-test. This is where plotting the data using histograms can help to support investigation into the normality assumption. We have checked the normality of the observations using the code below. Here for both the treatment groups we have P value greater than 0.05 (Shapiro-Wilk test is used), therefore the normality assumption is there for our data.
1. Normality: You can check whether the data are normally distributed by plotting a histogram of the data by treatment. Alternatively, you can use the Shapiro-Wilk test or the Kolmogorov-Smirnov test. If the test p-value is \<0.05 and your sample is quite small, this suggests you should not use the t-test. However, if the sample in each treatment group is large (say \>30 in each group), you do not need to rely so heavily on the assumption that the data have an underlying normal distribution in order to apply the two-sample t-test. This is where plotting the data using histograms can help to support investigation into the normality assumption. We checked the normality of the observations using the code below. For both treatment groups the Shapiro-Wilk p-value is greater than 0.05, so there is no evidence against the normality assumption for our data.

```{r}
#| eval: false
@@ -65,7 +65,7 @@ knitr::include_graphics("../images/ttest/trt_sas.png")
knitr::include_graphics("../images/ttest/placb_sas.png")
```

2. Homogeneity of variance (or Equality of variance): Homogeniety of variance will be tested by default in PROC TTEST itself by Folded F-test. In our case the P values is 0.6981 which is greater than 0.05. So we accept the null hypothesis of F-test, i.e. variances are same. Then we will consider the pooled method for t-test. If the F test is statistically significant (p<0.05), then the pooled t-test may give erroneous results. In this instance, if it is believed that the population variances may truly differ, then the Satterthwaite (unequal variances) analysis results should be used. These are provided in the SAS output alongside the Pooled results as default.
2. Homogeneity of variance (or Equality of variance): Homogeneity of variance is tested by default in PROC TTEST by the Folded F-test. In our case the p-value is 0.6981, which is greater than 0.05, so we do not reject the null hypothesis of the F-test, i.e. that the variances are equal, and we use the pooled method for the t-test. If the F-test is statistically significant (p\<0.05), then the pooled t-test may give erroneous results. In this instance, if it is believed that the population variances may truly differ, then the Satterthwaite (unequal variances) analysis results should be used. These are provided in the SAS output alongside the Pooled results by default.

Output:

82 changes: 82 additions & 0 deletions SAS/ttest_Paired.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
---
title: "Paired t-test"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# **Paired t-test in SAS**

The Paired t-test is used when two samples are naturally correlated, for example measurements taken on the same subjects before and after a procedure. The test computes the difference within each pair and compares the mean of those differences to a given number (usually zero) that represents the null hypothesis. For a Paired t-test, the number of observations in each sample must be equal.

In SAS, a Paired t-test is typically performed using PROC TTEST.

## Normal Data {#normal}

By default, SAS PROC TTEST assumes normality in the data and uses a classic Student's t-test.

### Data Used

The following data was used in this example.

```
data pressure;
input SBPbefore SBPafter @@;
datalines;
120 128 124 131 130 131 118 127
140 132 128 125 140 141 135 137
126 118 130 132 126 129 127 135
;
```

### Code

The following code was used to test the comparison of two paired samples of Systolic Blood Pressure before and after a procedure.

```
proc ttest data=pressure;
paired SBPbefore*SBPafter;
run;
```

Output:

```{r, echo=FALSE, fig.align='center', out.width="50%"}
knitr::include_graphics("../images/ttest/paired_test1.png")
```

## Lognormal Data {#lognormal}

The SAS paired t-test also supports analysis of lognormal data. Here is the data used for the lognormal analysis.

### Data

```
data auc;
input TestAUC RefAUC @@;
datalines;
103.4 90.11 59.92 77.71 68.17 77.71 94.54 97.51
69.48 58.21 72.17 101.3 74.37 79.84 84.44 96.06
96.74 89.30 94.26 97.22 48.52 61.62 95.68 85.80
;
```

### Code

For cases where the data is lognormal, SAS offers the "DIST" option to choose between a normal and a lognormal distribution. The procedure also offers the TOST option to specify the equivalence bounds.

```
proc ttest data=auc dist=lognormal tost(0.8, 1.25);
paired TestAUC*RefAUC;
run;
```

Output:

```{r, echo=FALSE, fig.align='center', out.width="70%"}
knitr::include_graphics("../images/ttest/paired_test2.png")
```

As can be seen in the figure above, the lognormal variation of the TTEST procedure offers additional results for geometric mean, coefficient of variation, and TOST equivalence analysis. The output also includes multiple p-values.
7 changes: 4 additions & 3 deletions data/stat_method_tbl.csv
@@ -1,8 +1,9 @@
method_grp,method_subgrp,r_links,sas_links,comparison_links
method_grp,method_subgrp,r_links,sas_links,comparison_links
Summary Statistics,Rounding,[R](R/rounding),[SAS](SAS/rounding),[R vs SAS](Comp/r-sas_rounding)
Summary Statistics,Summary statistics,[R](R/summary-stats),[SAS](SAS/summary-stats),[R vs SAS](Comp/r-sas-summary-stats)
General Linear Models,Students t-test,,[SAS](SAS/ttest),
General Linear Models,Paired t-test,,,
General Linear Models,One Sample t-test,,,
General Linear Models,Paired t-test,[R](R/ttest_Paired),[SAS](SAS/ttest_Paired),[R vs SAS](Comp/r-sas_ttest_Paired)
General Linear Models,Two Sample t-test,,[SAS](SAS/ttest_2Sample),
General Linear Models,ANOVA,[R](R/anova),[SAS](SAS/anova),[R vs SAS](Comp/r-sas_anova)
General Linear Models,ANCOVA,[R](R/ancova),,
General Linear Models,MANOVA,[R](R/manova),[SAS](SAS/manova),[R vs SAS](Comp/r-sas_manova)
Binary file added images/ttest/paired_rtest1.png
Binary file added images/ttest/paired_test1.png
Binary file added images/ttest/paired_test2.png