Merge pull request #369 from yannickvandendijck/tobit-regression
Tobit regression - added R-SAS Comparison
statasaurus authored Jan 2, 2025
2 parents be62d36 + 6141b3a commit e509147
Showing 2 changed files with 52 additions and 2 deletions.
47 changes: 47 additions & 0 deletions Comp/r-sas_tobit.qmd
@@ -0,0 +1,47 @@
---
title: "R vs SAS Tobit Regression"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Tobit Regression Comparison

The following table shows the types of Tobit regression analysis, the capabilities of each language, and whether the results from each language match.

| Analysis | Supported in R | Supported in SAS | Results Match | Notes |
|----------|----------------|------------------|---------------|-------|
| Tobit Regression (normally distributed data assumption) | [Yes](../R/tobit%20regression.html) | [Yes](../SAS/tobit%20regression%20SAS.html) | Yes | The results from `censReg::censReg` and `survival::survreg` match the SAS `PROC LIFEREG` results |

## Comparison Results

### Normally distributed data assumption

The table below compares the results of the R functions `censReg::censReg`, `survival::survreg`, `VGAM::vglm`, and SAS `PROC LIFEREG` for the dataset used.
It reports the statistics for the treatment effect (the difference between groups A and B, B-A) together with the estimate of $\sigma$. All numbers are rounded to 4 decimal places.

| Statistic | censReg() | survreg() | vglm() | LIFEREG | Match | Notes |
|--------------------|------------|-----------|--------|---------|-------|-------|
| Treatment effect | 1.8225 | 1.8225 | 1.8226 | 1.8225 | Yes | see below |
| Standard error | 0.8061 | 0.8061 | 0.7942 | 0.8061 | Yes | see below |
| p-value | 0.0238 | 0.0238 | 0.0217 | 0.0238 | Yes | see below |
| 95% CI (Wald-based) | 0.2427 ; 3.4024 | 0.2427 ; 3.4024 | 0.2661 ; 3.3791 | 0.2427 ; 3.4024 | Yes | see below |
| $\sigma$ | 1.7316 | 1.7316 | 1.7317 | 1.7316 | Yes | see below |


Note: The results of `VGAM::vglm()` differ slightly because it uses an iteratively reweighted least squares (IRLS) algorithm for estimation.
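
The Wald-based p-value and confidence interval in the table follow directly from the estimate and its standard error. As a minimal sketch (not part of the original analysis), the R code below recomputes them from the rounded `survreg()` values shown above. Because the inputs are already rounded to 4 decimal places, the last digit may differ from the table.

```r
# Rounded values taken from the survreg() column of the table above
est <- 1.8225   # treatment effect (B - A)
se  <- 0.8061   # standard error

# Two-sided p-value from the Wald Z-test
z <- est / se
p_value <- 2 * pnorm(-abs(z))

# 95% Wald confidence interval: estimate +/- z_{0.975} * SE
ci <- est + c(-1, 1) * qnorm(0.975) * se

round(p_value, 4)
round(ci, 4)
```

Substituting the `vglm()` estimate and standard error gives the slightly narrower interval reported in that column.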


# Summary and Recommendation

The comparison between SAS `PROC LIFEREG` and the R functions `censReg::censReg` and `survival::survreg` shows identical results for the dataset tried.

Historically, and still typically, the Tobit model is based on the assumption of normally distributed data. SAS `PROC LIFEREG` and R `survival::survreg` also support several other distributional assumptions: *weibull*, *exponential*, *gaussian*, *logistic*, *lognormal* and *loglogistic* for `survival::survreg`, and *EXPONENTIAL*, *GAMMA*, *LLOGISTIC*, *LOGISTIC*, *LOGNORMAL*, *NORMAL* and *WEIBULL* for `PROC LIFEREG`.
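
As a rough illustration of switching the distributional assumption in `survival::survreg`, the sketch below fits the same left-censored model with `dist = "gaussian"` (the classical Tobit model) and `dist = "logistic"`. The data are simulated and the censoring limit `lloq` is hypothetical; neither comes from the dataset used in the comparison above.

```r
library(survival)

# Simulated data, for illustration only (not the dataset used in the comparison)
set.seed(123)
n   <- 60
arm <- factor(rep(c("A", "B"), each = n / 2))
latent <- 1 + 2 * (arm == "B") + rnorm(n, sd = 1.5)

# Hypothetical lower limit: values below it are left-censored at the limit
lloq     <- 0.5
y        <- pmax(latent, lloq)
observed <- as.integer(latent > lloq)  # 1 = observed, 0 = left-censored
dat <- data.frame(y, observed, arm)

# Classical Tobit model: normally distributed errors
fit_gauss <- survreg(Surv(y, observed, type = "left") ~ arm,
                     data = dat, dist = "gaussian")

# Same model with a different distributional assumption
fit_logis <- survreg(Surv(y, observed, type = "left") ~ arm,
                     data = dat, dist = "logistic")

coef(fit_gauss)
fit_gauss$scale   # estimate of sigma under the normal assumption
coef(fit_logis)
```

Only the `dist` argument changes between the two fits; the censoring is described entirely by the `Surv()` response.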

# References

Breen, R. (1996). Regression models. SAGE Publications, Inc. https://doi.org/10.4135/9781412985611

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36. https://doi.org/10.2307/1907382
7 changes: 5 additions & 2 deletions R/tobit regression.qmd
@@ -115,13 +115,16 @@ summary(res_survreg)
lsm = emmeans(res_survreg, specs = trt.vs.ctrl ~ ARM)
lsm$emmeans
-# Difference between groups (Wald CIs)
+# Difference between groups
lsm_contrast = broom::tidy(lsm$contrasts, conf.int=TRUE, conf.level=0.95)
gt(lsm_contrast) %>%
fmt_number(decimals = 3)
+# Wald-based CIs
+round(stats::confint(res_survreg, level = 0.95)[2,], 3)
```

-The output provides an estimate of difference between groups A and B (B-A), namely 1.823 (se=0.806). The presented p-value is a two-sided p-value based on the Z-test. The output also provides an estimate for $log(\sigma) = 0.549$. Using the `emmeans` package/function least square means and contrast can be easily obtained. The confidence intervals and p-value is based on the t-test.
+The output provides an estimate of the difference between groups A and B (B-A), namely 1.823 (SE = 0.806). The presented p-value is a two-sided p-value based on the Z-test. The output also provides an estimate of $\log(\sigma) = 0.549$. Using the `emmeans` package, least-squares means and contrasts can be obtained easily. The confidence intervals and p-value from `emmeans` are based on the t-test. Wald-based confidence intervals can be obtained with the `stats::confint` function.
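
To make the Wald-based interval concrete, here is a minimal sketch that builds it by hand from the coefficient and its standard error and also calls `stats::confint` on the same fit. The data are simulated and the censoring limit `lloq` is hypothetical, so the numbers will not reproduce those quoted above. The last line back-transforms the reported $\log(\sigma) = 0.549$ to the $\sigma$ scale.

```r
library(survival)

# Simulated left-censored data, for illustration only
set.seed(42)
n   <- 60
ARM <- factor(rep(c("A", "B"), each = n / 2))
latent <- 1 + 2 * (ARM == "B") + rnorm(n, sd = 1.5)
lloq <- 0.5                        # hypothetical censoring limit
dat <- data.frame(ARM,
                  y = pmax(latent, lloq),
                  observed = as.integer(latent > lloq))

fit <- survreg(Surv(y, observed, type = "left") ~ ARM,
               data = dat, dist = "gaussian")

# Wald interval by hand: estimate +/- z_{0.975} * SE
est <- coef(fit)[["ARMB"]]
se  <- sqrt(diag(vcov(fit)))[["ARMB"]]
wald_manual <- est + c(-1, 1) * qnorm(0.975) * se

# Interval for the same coefficient via confint()
wald_confint <- stats::confint(fit, level = 0.95)["ARMB", ]

round(wald_manual, 3)
round(wald_confint, 3)

# Back-transform the log(sigma) value reported in the text above
sigma_from_log <- exp(0.549)
round(sigma_from_log, 3)
```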


### vglm
