From eb396036ed1a88953e95f945cd5cd5a7fb8176bb Mon Sep 17 00:00:00 2001 From: statasaurus <55274484+statasaurus@users.noreply.github.com> Date: Wed, 7 Feb 2024 09:34:54 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20PSIAIMS/?= =?UTF-8?q?CAMIS@f08dae0c23790b6d5d519170e6fd2034e6ca18e9=20=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Comp/r-sas_cmh.html | 106 +++--- R/ancova.html | 636 +++++++++++++++---------------- R/anova.html | 2 +- R/cmh.html | 106 +++--- SAS/anova.html | 2 +- SAS/linear-regression.html | 2 +- data-info/sas_disease.html | 740 ------------------------------------- index.html | 106 +++--- minutes/index.html | 26 +- search.json | 611 +++++++++++++++--------------- 10 files changed, 795 insertions(+), 1542 deletions(-) delete mode 100644 data-info/sas_disease.html diff --git a/Comp/r-sas_cmh.html b/Comp/r-sas_cmh.html index 95f83c76..ce66c43e 100644 --- a/Comp/r-sas_cmh.html +++ b/Comp/r-sas_cmh.html @@ -330,23 +330,23 @@

CMH Statistics

As can be seen, there are two schemata where R does not provide any results:

-
- diff --git a/R/ancova.html b/R/ancova.html index 3044fab0..ef108c43 100644 --- a/R/ancova.html +++ b/R/ancova.html @@ -253,23 +253,23 @@

The Model

model_tidy <- model_ancova %>% tidy()
model_glance %>% gt()
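For context, a hedged sketch of the upstream objects these fragments rely on (the data frame and variable names below are illustrative, not necessarily the dataset used on this page):

library(dplyr)
library(broom)
library(gt)

# illustrative ANCOVA data: post-treatment score, baseline score, treatment group
df_trial <- data.frame(
  post = rnorm(30, mean = 10),
  pre  = rnorm(30, mean = 9),
  drug = factor(rep(c("A", "D", "F"), each = 10))
)

model_ancova <- lm(post ~ drug + pre, data = df_trial)   # ANCOVA: treatment adjusted for baseline
model_glance <- model_ancova %>% glance()                # model-level diagnostics
model_tidy   <- model_ancova %>% tidy()                  # term-level estimates
model_glance %>% gt()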
-
- @@ -733,23 +733,23 @@

The Model

model_tidy %>% gt()
-
- @@ -1227,23 +1227,23 @@

The Model

add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq))
model_table %>% gt()
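Continuing the illustrative sketch above, the Total row is appended to the tidied ANOVA table with tibble::add_row() before rendering (a hedged illustration of where model_table comes from):

library(tibble)   # provides add_row()

model_table <- model_ancova %>%
  anova() %>%
  tidy() %>%
  add_row(term = "Total", df = sum(.$df), sumsq = sum(.$sumsq))   # overall df and sum of squares

model_table %>% gt()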
-
- @@ -1726,23 +1726,23 @@

Type 1

get_anova_table() %>% gt()
-
- @@ -2219,23 +2219,23 @@

Type 2

get_anova_table() %>% gt()
-
- @@ -2712,23 +2712,23 @@

Type 3

get_anova_table() %>% gt()
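These fragments come from an rstatix workflow; below is a hedged sketch of a full Type 3 call, reusing the illustrative data from the earlier sketch (Type 1 and Type 2 follow the same pattern with type = 1 or type = 2):

library(rstatix)

df_trial %>%
  rstatix::anova_test(post ~ drug + pre, type = 3, detailed = TRUE) %>%
  rstatix::get_anova_table() %>%
  gt()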
-
- diff --git a/R/anova.html b/R/anova.html index 2ede8dd2..5cfd4acc 100644 --- a/R/anova.html +++ b/R/anova.html @@ -208,7 +208,7 @@

ANOVA

Getting Started

-

To demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation (reference). For more information/to investigate this data go here

+

To demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation.
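A hedged sketch of the structure of df_disease follows (the response values here are simulated purely to illustrate the layout; the real values come from the SAS documentation example):

set.seed(123)
df_disease <- expand.grid(
  drug    = factor(1:4),
  disease = factor(1:3),
  rep     = 1:6
)[, c("drug", "disease")]                                  # 72 rows: 4 drugs x 3 diseases x 6 replicates
df_disease$y <- round(rnorm(nrow(df_disease), mean = 19, sd = 12))
df_disease$y[sample(nrow(df_disease), 14)] <- NA           # the SAS example has 14 missing responses
str(df_disease)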

The Model

diff --git a/R/cmh.html b/R/cmh.html index 795d34b5..a0b4277d 100644 --- a/R/cmh.html +++ b/R/cmh.html @@ -222,23 +222,23 @@

Available R packages

We did not find any R package that delivers all the same measures as SAS at once. Therefore, we tried out multiple packages:
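As one hedged illustration, base R's stats::mantelhaen.test() covers the 2 x 2 x K case but, like the other packages tried, does not reproduce all three SAS alternative-hypothesis statistics at once (the counts below are made up for illustration and are not the data used on this page):

counts <- c(10, 5, 7, 8,    # stratum S1: treatment x response
            12, 6, 9, 9)    # stratum S2: treatment x response
tab <- array(counts, dim = c(2, 2, 2),
             dimnames = list(trt  = c("A", "B"),
                             resp = c("Yes", "No"),
                             site = c("S1", "S2")))
mantelhaen.test(tab, correct = FALSE)   # CMH test of conditional independence across strata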

-
- diff --git a/SAS/anova.html b/SAS/anova.html index 3049ffcb..2ce667bb 100644 --- a/SAS/anova.html +++ b/SAS/anova.html @@ -207,7 +207,7 @@

linear-models

Getting Started

-

To demonstrate the various types of sums of squares, we'll create a data frame called df_disease taken from the SAS documentation (reference). For more information/to investigate this data go here

+

To demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation.

The Model

diff --git a/SAS/linear-regression.html b/SAS/linear-regression.html index d375b48e..1d54ec56 100644 --- a/SAS/linear-regression.html +++ b/SAS/linear-regression.html @@ -198,7 +198,7 @@

Linear Regression

Published
-

6 February, 2024

+

7 February, 2024

diff --git a/data-info/sas_disease.html b/data-info/sas_disease.html deleted file mode 100644 index 807adb6c..00000000 --- a/data-info/sas_disease.html +++ /dev/null @@ -1,740 +0,0 @@ - - - - - - - - - -SAS Disease - - - - - - - - - - - - - - - - - - - - - - - - - - - -

SAS Disease


To demonstrate the various types of sums of squares, we’ll create a data frame called `df_disease` taken from the SAS documentation (__reference__). The summary of the data is shown.

-

The general summary of the data is as follows

-
-
summary(df_disease)
-
-
 drug   disease       y        
- 1:18   1:24    Min.   :-6.00  
- 2:18   2:24    1st Qu.: 9.00  
- 3:18   3:24    Median :21.00  
- 4:18           Mean   :18.88  
-                3rd Qu.:28.00  
-                Max.   :44.00  
-                NA's   :14     
-
-
skim(df_disease)
-
Data summary

Name                       df_disease
Number of rows             72
Number of columns          3
_______________________
Column type frequency:
  factor                   2
  numeric                  1
________________________
Group variables            None
-

Variable type: factor

skim_variable   n_missing   complete_rate   ordered   n_unique   top_counts
drug            0           1               FALSE     4          1: 18, 2: 18, 3: 18, 4: 18
disease         0           1               FALSE     3          1: 24, 2: 24, 3: 24
-

Variable type: numeric

skim_variable   n_missing   complete_rate   mean    sd     p0   p25   p50   p75   p100   hist
y               14          0.81            18.88   12.8   -6   9     21    28    44     ▅▆▆▇▂
- - - - - \ No newline at end of file diff --git a/index.html b/index.html index e0efb4b3..ef5237d7 100644 --- a/index.html +++ b/index.html @@ -185,23 +185,23 @@

Repository

The repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and description of any discrepancies.

-
- diff --git a/minutes/index.html b/minutes/index.html index b4b0f4ba..8fa20164 100644 --- a/minutes/index.html +++ b/minutes/index.html @@ -229,7 +229,7 @@

Meeting Minutes


 
diff --git a/search.json b/search.json index 8f3208bd..cec75b13 100644 --- a/search.json +++ b/search.json @@ -130,7 +130,7 @@ "href": "R/anova.html", "title": "ANOVA", "section": "", - "text": "Getting Started\nTo demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation (reference). For more information/to investigate this data go here\n\n\nThe Model\nFor this example, we’re testing for a significant difference in stem_length using ANOVA. In R, we’re using lm() to run the ANOVA, and then using broom::glance() and broom::tidy() to view the results in a table format.\n\nlm_model <- lm(y ~ drug + disease + drug*disease, df_disease)\n\nThe glance function gives us a summary of the model diagnostic values.\n\nlm_model %>% \n glance() %>% \n pivot_longer(everything())\n\n# A tibble: 12 × 2\n name value\n <chr> <dbl>\n 1 r.squared 0.456 \n 2 adj.r.squared 0.326 \n 3 sigma 10.5 \n 4 statistic 3.51 \n 5 p.value 0.00130\n 6 df 11 \n 7 logLik -212. \n 8 AIC 450. \n 9 BIC 477. \n10 deviance 5081. \n11 df.residual 46 \n12 nobs 58 \n\n\nThe tidy function gives a summary of the model results.\n\nlm_model %>% tidy()\n\n# A tibble: 12 × 5\n term estimate std.error statistic p.value\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 (Intercept) 29.3 4.29 6.84 0.0000000160\n 2 drug2 -1.33 6.36 -0.210 0.835 \n 3 drug3 -13 7.43 -1.75 0.0869 \n 4 drug4 -15.7 6.36 -2.47 0.0172 \n 5 disease2 -1.08 6.78 -0.160 0.874 \n 6 disease3 -8.93 6.36 -1.40 0.167 \n 7 drug2:disease2 6.58 9.78 0.673 0.504 \n 8 drug3:disease2 -10.9 10.2 -1.06 0.295 \n 9 drug4:disease2 0.317 9.30 0.0340 0.973 \n10 drug2:disease3 -0.900 9.00 -0.100 0.921 \n11 drug3:disease3 1.10 10.2 0.107 0.915 \n12 drug4:disease3 9.53 9.20 1.04 0.306 \n\n\n\n\nThe Results\nYou’ll see that R print the individual results for each level of the drug and disease interaction. We can get the combined F table in R using the anova() function on the model object.\n\nlm_model %>% \n anova() %>% \n tidy() %>% \n kable()\n\n\n\n\nterm\ndf\nsumsq\nmeansq\nstatistic\np.value\n\n\n\n\ndrug\n3\n3133.2385\n1044.4128\n9.455761\n0.0000558\n\n\ndisease\n2\n418.8337\n209.4169\n1.895990\n0.1617201\n\n\ndrug:disease\n6\n707.2663\n117.8777\n1.067225\n0.3958458\n\n\nResiduals\n46\n5080.8167\n110.4525\nNA\nNA\n\n\n\n\n\nWe can add a Total row, by using add_row and calculating the sum of the degrees of freedom and sum of squares.\n\nlm_model %>%\n anova() %>%\n tidy() %>%\n add_row(term = \"Total\", df = sum(.$df), sumsq = sum(.$sumsq)) %>% \n kable()\n\n\n\n\nterm\ndf\nsumsq\nmeansq\nstatistic\np.value\n\n\n\n\ndrug\n3\n3133.2385\n1044.4128\n9.455761\n0.0000558\n\n\ndisease\n2\n418.8337\n209.4169\n1.895990\n0.1617201\n\n\ndrug:disease\n6\n707.2663\n117.8777\n1.067225\n0.3958458\n\n\nResiduals\n46\n5080.8167\n110.4525\nNA\nNA\n\n\nTotal\n57\n9340.1552\nNA\nNA\nNA\n\n\n\n\n\n\n\nSums of Squares Tables\nUnfortunately, it is not easy to get the various types of sums of squares calculations in using functions from base R. However, the rstatix package offers a solution to produce these various sums of squares tables. For each type, you supply the original dataset and model to the. 
anova_test function, then specify the ttype and se detailed = TRUE.\n\nType I\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 1, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nDFn\nDFd\nSSn\nSSd\nF\np\np<.05\nges\n\n\n\n\ndrug\n3\n46\n3133.239\n5080.817\n9.456\n5.58e-05\n*\n0.381\n\n\ndisease\n2\n46\n418.834\n5080.817\n1.896\n1.62e-01\n\n0.076\n\n\ndrug:disease\n6\n46\n707.266\n5080.817\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType II\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 2, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nSSn\nSSd\nDFn\nDFd\nF\np\np<.05\nges\n\n\n\n\ndrug\n3063.433\n5080.817\n3\n46\n9.245\n6.75e-05\n*\n0.376\n\n\ndisease\n418.834\n5080.817\n2\n46\n1.896\n1.62e-01\n\n0.076\n\n\ndrug:disease\n707.266\n5080.817\n6\n46\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType III\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 3, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nSSn\nSSd\nDFn\nDFd\nF\np\np<.05\nges\n\n\n\n\n(Intercept)\n20037.613\n5080.817\n1\n46\n181.414\n0.00e+00\n*\n0.798\n\n\ndrug\n2997.472\n5080.817\n3\n46\n9.046\n8.09e-05\n*\n0.371\n\n\ndisease\n415.873\n5080.817\n2\n46\n1.883\n1.64e-01\n\n0.076\n\n\ndrug:disease\n707.266\n5080.817\n6\n46\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType IV\nIn R there is no equivalent operation to the Type IV sums of squares calculation in SAS." + "text": "Getting Started\nTo demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation.\n\n\nThe Model\nFor this example, we’re testing for a significant difference in stem_length using ANOVA. In R, we’re using lm() to run the ANOVA, and then using broom::glance() and broom::tidy() to view the results in a table format.\n\nlm_model <- lm(y ~ drug + disease + drug*disease, df_disease)\n\nThe glance function gives us a summary of the model diagnostic values.\n\nlm_model %>% \n glance() %>% \n pivot_longer(everything())\n\n# A tibble: 12 × 2\n name value\n <chr> <dbl>\n 1 r.squared 0.456 \n 2 adj.r.squared 0.326 \n 3 sigma 10.5 \n 4 statistic 3.51 \n 5 p.value 0.00130\n 6 df 11 \n 7 logLik -212. \n 8 AIC 450. \n 9 BIC 477. \n10 deviance 5081. \n11 df.residual 46 \n12 nobs 58 \n\n\nThe tidy function gives a summary of the model results.\n\nlm_model %>% tidy()\n\n# A tibble: 12 × 5\n term estimate std.error statistic p.value\n <chr> <dbl> <dbl> <dbl> <dbl>\n 1 (Intercept) 29.3 4.29 6.84 0.0000000160\n 2 drug2 -1.33 6.36 -0.210 0.835 \n 3 drug3 -13 7.43 -1.75 0.0869 \n 4 drug4 -15.7 6.36 -2.47 0.0172 \n 5 disease2 -1.08 6.78 -0.160 0.874 \n 6 disease3 -8.93 6.36 -1.40 0.167 \n 7 drug2:disease2 6.58 9.78 0.673 0.504 \n 8 drug3:disease2 -10.9 10.2 -1.06 0.295 \n 9 drug4:disease2 0.317 9.30 0.0340 0.973 \n10 drug2:disease3 -0.900 9.00 -0.100 0.921 \n11 drug3:disease3 1.10 10.2 0.107 0.915 \n12 drug4:disease3 9.53 9.20 1.04 0.306 \n\n\n\n\nThe Results\nYou’ll see that R print the individual results for each level of the drug and disease interaction. 
We can get the combined F table in R using the anova() function on the model object.\n\nlm_model %>% \n anova() %>% \n tidy() %>% \n kable()\n\n\n\n\nterm\ndf\nsumsq\nmeansq\nstatistic\np.value\n\n\n\n\ndrug\n3\n3133.2385\n1044.4128\n9.455761\n0.0000558\n\n\ndisease\n2\n418.8337\n209.4169\n1.895990\n0.1617201\n\n\ndrug:disease\n6\n707.2663\n117.8777\n1.067225\n0.3958458\n\n\nResiduals\n46\n5080.8167\n110.4525\nNA\nNA\n\n\n\n\n\nWe can add a Total row, by using add_row and calculating the sum of the degrees of freedom and sum of squares.\n\nlm_model %>%\n anova() %>%\n tidy() %>%\n add_row(term = \"Total\", df = sum(.$df), sumsq = sum(.$sumsq)) %>% \n kable()\n\n\n\n\nterm\ndf\nsumsq\nmeansq\nstatistic\np.value\n\n\n\n\ndrug\n3\n3133.2385\n1044.4128\n9.455761\n0.0000558\n\n\ndisease\n2\n418.8337\n209.4169\n1.895990\n0.1617201\n\n\ndrug:disease\n6\n707.2663\n117.8777\n1.067225\n0.3958458\n\n\nResiduals\n46\n5080.8167\n110.4525\nNA\nNA\n\n\nTotal\n57\n9340.1552\nNA\nNA\nNA\n\n\n\n\n\n\n\nSums of Squares Tables\nUnfortunately, it is not easy to get the various types of sums of squares calculations in using functions from base R. However, the rstatix package offers a solution to produce these various sums of squares tables. For each type, you supply the original dataset and model to the. anova_test function, then specify the ttype and se detailed = TRUE.\n\nType I\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 1, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nDFn\nDFd\nSSn\nSSd\nF\np\np<.05\nges\n\n\n\n\ndrug\n3\n46\n3133.239\n5080.817\n9.456\n5.58e-05\n*\n0.381\n\n\ndisease\n2\n46\n418.834\n5080.817\n1.896\n1.62e-01\n\n0.076\n\n\ndrug:disease\n6\n46\n707.266\n5080.817\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType II\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 2, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nSSn\nSSd\nDFn\nDFd\nF\np\np<.05\nges\n\n\n\n\ndrug\n3063.433\n5080.817\n3\n46\n9.245\n6.75e-05\n*\n0.376\n\n\ndisease\n418.834\n5080.817\n2\n46\n1.896\n1.62e-01\n\n0.076\n\n\ndrug:disease\n707.266\n5080.817\n6\n46\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType III\n\ndf_disease %>% \n rstatix::anova_test(\n y ~ drug + disease + drug*disease, \n type = 3, \n detailed = TRUE) %>% \n rstatix::get_anova_table() %>% \n kable()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEffect\nSSn\nSSd\nDFn\nDFd\nF\np\np<.05\nges\n\n\n\n\n(Intercept)\n20037.613\n5080.817\n1\n46\n181.414\n0.00e+00\n*\n0.798\n\n\ndrug\n2997.472\n5080.817\n3\n46\n9.046\n8.09e-05\n*\n0.371\n\n\ndisease\n415.873\n5080.817\n2\n46\n1.883\n1.64e-01\n\n0.076\n\n\ndrug:disease\n707.266\n5080.817\n6\n46\n1.067\n3.96e-01\n\n0.122\n\n\n\n\n\n\n\nType IV\nIn R there is no equivalent operation to the Type IV sums of squares calculation in SAS." }, { "objectID": "minutes/posts/20Nov2023.html", @@ -224,242 +224,221 @@ "text": "Conference Programme\nWe plan to showcase the CAMIS project at a number of conferences throughout 2023 and 2024. 
See below the list of conferences the CAMIS team will be at and please come say hello to us !\n\n\n\n\n\n\n\n\n\n\n\nConference\n2023/2024 Planning dates\n2023 Date & Location\n2023 Main Contact\nAlso attending\nDetails\n\n\n\n\nJSM (ASA conference)\nAbstract Feb 2024\n\nLeon Shih\n\n\n\n\nPHUSE US Connect\nTBC for 2024\n5-8 March 2023 Orlando, Florida\nSoma Sekhar\n\nPresentation\n\n\nDISS (Duke industry statistics symposium)\nTBC for 2024\n29-31st March 2023 Virtual\nMolly MacDiarmid (2023)\n\nPoster\n\n\nPSDM(Pharmaceutical statistics and data management)\n\n19 Apr 2023 Netherlands\n\n\n\n\n\nIASCT (ConSPIC - conference for statistics and programming in clinical research)\nAbstract submission 14 Mar-3rd Apr\n4-6 May 2023 Bengaluru, India\nHarshal Khanolkar\n\nTalk. and/or poster\n\n\nSociety of Clinical Trials (SCT\n\n21-24 May 2023\nMichael Kane\n\n\n\n\nuse R\nTBC\nJune 2024?\n\n\n\n\n\nPSI 2023 Conference\nTalk submission Nov.\nPoster submission Feb.\n11-14 June 2023 Hammersmith London West, England\nMartin Brown\nChristina Fillmore\nLyn Taylor\nMolly Macdiarmid\nMartin Brown\nAiming Yang\nOral & poster submission completed\n\n\nDIA 2023 Global Annual Meeting\n\n25-29 June 2023 Boston MA, USA\n\n\n\n\n\nJoint statistical meeting (JSM)\nFeb 2024 submission for next year\n5-10 Aug 2023 Toronto, Ontario, Canada\n\n\n\n\n\nISCB Conference\n\n27-31 Aug 2023 Milan-Italy\n\n\n\n\n\nRSS conference\nAbstract by 6th April\n4-7 sept 2023 Harrogate, England\nLyn Taylor\n\nConfirmed oral presentation\n\n\nPHUSE/FDA Quarterly meeting\nSeptember 13 (10:00 EDT/15:00 BST)\nWG can present their work, share their progress, and request any FDA \nsupport\nLyn Taylor\n\n30 min presentation\n\n\nPHUSE CSS\n15th June Abstract open, register by 30th june\n(TBC 2024 DVOST breakout sessions)\nSept 18-20, Maryland USA\nSoma Sekhar\nVikash Jain\nAditee Dani\nPoster\n\n\nASA Bio pharmaceutical Section Regulatory-industry Statistics Workshop\n\n27-29 Sept 2023 Rockville, Maryland, USA\n\n\n\n\n\nEASD 2023 - European Association for study of diabetes\n\n02-06 Oct 2023 Hamberg Germany\n\n\n\n\n\nSESUG (South East SAS user group)\n\n17-19 Oct 2023\nBrian Varney\n\n\n\n\nPHUSE EU Connect 2023\n\n5-8 November 2023 ICC Birmingham, England\nJayashree vendanayagam\n\nPresenting on shiny App for regulatory submission (will include CAMIS advert)\n\n\nR in Pharma\n\n\nNov Virtual\nBrian Varney/ Christina Fillmore?\n\nForm open. Christina to speak to Brian.\n\n\nPOSIT conf.\nInvite only\nSeptember - Chicago\nJulianne Manitz & Doug Kelkhoff\n\nR Validation Hub team will include a slide for us. (Juliane Manitz/Doug Kelkhoff)\n\n\nPHUSE (Single day events) SDEs\n\nMississauga\nJune 8th\nJayashree vendanayagam\n\nPresenting on shiny App for regulatory submission (will include CAMIS advert)\n\n\nPHUSE (Single day events) SDEs\n\nNew York (Oct 16th)\nAiming Yang\n\nEmailed host to have poster/ talk/ advert" }, { - "objectID": "SAS/kruskal_wallis.html", - "href": "SAS/kruskal_wallis.html", - "title": "Kruskal Wallis SAS", + "objectID": "SAS/ttest_1Sample.html", + "href": "SAS/ttest_1Sample.html", + "title": "One Sample t-test", "section": "", - "text": "The Kruskal-Wallis test is a non-parametric equivalent to the one-way ANOVA. For this example, the data used is a subset of R’s datasets::iris, testing for difference in sepal width between species of flower. 
This data was subset in R and input manually to SAS with a data step.\n\ndata iris_sub;\n input Species $ Sepal_Width;\n datalines;\nsetosa 3.4\nsetosa 3.0\nsetosa 3.4\nsetosa 3.2\nsetosa 3.5\nsetosa 3.1\nversicolor 2.7\nversicolor 2.9\nversicolor 2.7\nversicolor 2.6\nversicolor 2.5\nversicolor 2.5\nvirginica 3.0\nvirginica 3.0\nvirginica 3.1\nvirginica 3.8\nvirginica 2.7\nvirginica 3.3\n;\nrun;" + "text": "In SAS, a one sample t-test is usually performed using PROC TTEST. The one sample t-test compares the mean of the sample to a provided null hypothesis, called “h0”. The h0 value is provided as an option. By default, the h0 value is zero (0). Running the procedure produces a set of results that suggest whether or not the null hypothesis should be rejected.\n\n\nThe following data was used in this example.\n data read;\n input score count @@;\n datalines;\n 40 2 47 2 52 2 26 1 19 2\n 25 2 35 4 39 1 26 1 48 1\n 14 2 22 1 42 1 34 2 33 2\n 18 1 15 1 29 1 41 2 44 1\n 51 1 43 1 27 2 46 2 28 1\n 49 1 31 1 28 1 54 1 45 1\n ;\n\n\n\nBy default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following code was used to test the comparison of a reading scores against a baseline hypothesis value of 30:\n proc ttest data=read h0=30;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe SAS one sample t-test also supports lognormal analysis for a one sample t-test.\n\n\nUsing the same data as above, we will set the “DIST” option to “lognormal” to perform this analysis:\n proc ttest data=read h0=30 dist=lognormal;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the one sample TTEST provides results for geometric mean, coefficient of variation, and 95% confidence limits for the coefficient of variation." }, { - "objectID": "SAS/kruskal_wallis.html#introduction", - "href": "SAS/kruskal_wallis.html#introduction", - "title": "Kruskal Wallis SAS", + "objectID": "SAS/ttest_1Sample.html#normal", + "href": "SAS/ttest_1Sample.html#normal", + "title": "One Sample t-test", "section": "", - "text": "The Kruskal-Wallis test is a non-parametric equivalent to the one-way ANOVA. For this example, the data used is a subset of R’s datasets::iris, testing for difference in sepal width between species of flower. This data was subset in R and input manually to SAS with a data step.\n\ndata iris_sub;\n input Species $ Sepal_Width;\n datalines;\nsetosa 3.4\nsetosa 3.0\nsetosa 3.4\nsetosa 3.2\nsetosa 3.5\nsetosa 3.1\nversicolor 2.7\nversicolor 2.9\nversicolor 2.7\nversicolor 2.6\nversicolor 2.5\nversicolor 2.5\nvirginica 3.0\nvirginica 3.0\nvirginica 3.1\nvirginica 3.8\nvirginica 2.7\nvirginica 3.3\n;\nrun;" - }, - { - "objectID": "SAS/kruskal_wallis.html#implementing-kruskal-wallis-in-sas", - "href": "SAS/kruskal_wallis.html#implementing-kruskal-wallis-in-sas", - "title": "Kruskal Wallis SAS", - "section": "Implementing Kruskal-Wallis in SAS", - "text": "Implementing Kruskal-Wallis in SAS\nThe Kruskal-Wallis test can be implemented in SAS using the NPAR1WAY procedure with WILCOXON option. Below, the test is defined with the indicator variable (Species) by the CLASS statement, and the response variable (Sepal_Width) by the VAR statement. Adding the EXACT statement outputs the exact p-value in addition to the asymptotic result. 
The null hypothesis is that the samples are from identical populations.\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;" - }, - { - "objectID": "SAS/kruskal_wallis.html#results", - "href": "SAS/kruskal_wallis.html#results", - "title": "Kruskal Wallis SAS", - "section": "Results", - "text": "Results\n\n\n\n\n\n\n\n\n\nAs seen above, SAS outputs a table of Wilcoxon Scores for Sepal_Width by each Species including (per group): the number (N); the sum of scores; the expected sum of scores under the null hypothesis; the standard deviation under the null hypothesis, and the observed mean score. The table also includes a footnote to specify that ties were handled by using the average score.\nA table of the test results gives the Kruskal-Wallis rank sum statistic (10.922), the degrees of freedom (2), and the asymptotic p-value of the test (0.0042), and the exact p-value (0.0008). Therefore, the difference in population medians is statistically significant at the 5% level." + "text": "By default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following code was used to test the comparison of a reading scores against a baseline hypothesis value of 30:\n proc ttest data=read h0=30;\n var score;\n run;\nOutput:" }, { - "objectID": "SAS/mcnemar.html", - "href": "SAS/mcnemar.html", - "title": "McNemar’s test in SAS", + "objectID": "SAS/ttest_1Sample.html#lognormal", + "href": "SAS/ttest_1Sample.html#lognormal", + "title": "One Sample t-test", "section": "", - "text": "Performing McNemar’s test in SAS\nTo demonstrate McNemar’s test in SAS, data concerning the presence or absence of cold symptoms was used. The symptoms were recorded by the same children at the age of 12 and 14. A total of 2638 participants were involved.\n\nUsing PROC FREQ\nTesting for a significant difference in cold symptoms between ages, using McNemar’s test in SAS, can be performed as below. The AGREE option is stated within the FREQ procedure to produce agreement tests and measures, including McNemar’s test.\n\nproc freq data=colds;\n tables age12*age14 / agree;\nrun;\n\n\n\nResults\n\n\n\n\n\n\n\n\n\nSAS outputs the tabulated data for proportions, the McNemar’s Chi-square statistic, and the Kappa coefficient with 95% confidence limits. There is no continuity correction used and no option to include this." + "text": "The SAS one sample t-test also supports lognormal analysis for a one sample t-test.\n\n\nUsing the same data as above, we will set the “DIST” option to “lognormal” to perform this analysis:\n proc ttest data=read h0=30 dist=lognormal;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the one sample TTEST provides results for geometric mean, coefficient of variation, and 95% confidence limits for the coefficient of variation." }, { - "objectID": "SAS/mmrm.html", - "href": "SAS/mmrm.html", - "title": "MMRM in SAS", + "objectID": "SAS/linear-regression.html", + "href": "SAS/linear-regression.html", + "title": "Linear Regression", "section": "", - "text": "Mixed Models\n\nFitting the MMRM in SAS\nIn SAS the following code was used (assessments at avisitn=0 should also be removed from the response variable):\n\nproc mixed data=adlbh;\n where base ne . 
and avisitn not in (., 99);\n class usubjid trtpn(ref=\"0\") avisitn;\n by paramcd param;\n model chg=base trtpn avisitn trtpn*avisitn / solution cl alpha=0.05 ddfm=KR;\n repeated avisitn/subject=usubjid type=&covar;\n lsmeans trtpn * avisitn / diff cl slice=avisitn;\n lsmeans trtpn / diff cl;\nrun;\n\nwhere the macro variable covar could be UN, CS or AR(1). The results were stored in .csv files that were post-processed in R and compared with the results from R." + "text": "To demonstrate the use of linear regression we examine a dataset that illustrates the relationship between Height and Weight in a group of 237 teen-aged boys and girls. The dataset is available at (../data/htwt.csv) and is imported to sas using proc import procedure.\n\nDescriptive Statistics\nThe first step is to obtain the simple descriptive statistics for the numeric variables of htwt data, and one-way frequencies for categorical variables. This is accomplished by employing proc means and proc freq procedures There are 237 participants who are from 13.9 to 25 years old. It is a cross-sectional study, with each participant having one observation. We can use this data set to examine the relationship of participants’ height to their age and sex.\n\nproc means data=htwt;\nrun;\n\n Descriptive Statistics for HTWT Data Set \n The MEANS Procedure\n\nVariable Label N Mean Std Dev Minimum Maximum\n-----------------------------------------------------------------------------\nAGE AGE 237 16.4430380 1.8425767 13.9000000 25.0000000\nHEIGHT HEIGHT 237 61.3645570 3.9454019 50.5000000 72.0000000\nWEIGHT WEIGHT 237 101.3080169 19.4406980 50.5000000 171.5000000\n----------------------------------------------------------------------------\n\n\nproc freq data=htwt;\ntables sex;\nrun;\n\n Oneway Frequency Tabulation for Sex for HTWT Data Set \n The FREQ Procedure\n\n Cumulative Cumulative\nSEX Frequency Percent Frequency Percent\n-------------------------------------------------------------\nf 111 46.84 111 46.84\nm 126 53.16 237 100.00\n\nIn order to create a regression model to demonstrate the relationship between age and height for females, we first need to create a flag variable identifying females and an interaction variable between age and female gender flag.\n\ndata htwt2;\n set htwt;\n if sex=\"f\" then female=1;\n if sex=\"m\" then female=0; \n \n *model to demonstrate interaction between female gender and age;\n fem_age = female * age; \nrun;\n\n\n\nRegression Analysis\nNext, we fit a regression model, representing the relationships between gender, age, height and the interaction variable created in the datastep above. We again use a where statement to restrict the analysis to those who are less than or equal to 19 years old. We use the clb option to get a 95% confidence interval for each of the parameters in the model. 
The model that we are fitting is height = b0 + b1 x female + b2 x age + b3 x fem_age + e\n\nproc reg data=htwt2;\n where age <=19;\n model height = female age fem_age / clb;\nrun; quit;\n\n Number of Observations Read 219\n Number of Observations Used 219\n\n Analysis of Variance\n\n Sum of Mean\n Source DF Squares Square F Value Pr > F\n Model 3 1432.63813 477.54604 60.93 <.0001\n Error 215 1684.95730 7.83701\n Corrected Total 218 3117.59543\n\n Root MSE 2.79947 R-Square 0.4595\n Dependent Mean 61.00457 Adj R-Sq 0.4520\n Coeff Var 4.58895\n\nWe examine the parameter estimates in the output below.\n\n Parameter Estimates\n Parameter Standard\n Variable DF Estimate Error t Value Pr > |t| 95% Confidence Limits\n Intercept 1 28.88281 2.87343 10.05 <.0001 23.21911 34.54650\n female 1 13.61231 4.01916 3.39 0.0008 5.69031 21.53432\n AGE 1 2.03130 0.17764 11.44 <.0001 1.68117 2.38144\n fem_age 1 -0.92943 0.24782 -3.75 0.0002 -1.41791 -0.44096\n\nFrom the parameter estimates table the coefficients b0,b1,b2,b3 are estimated as b0=28.88 b1=13.61 b2=2.03 b3=-0.92942\nThe resulting regression model for height, age and gender based on the available data is height=28.88281 + 13.61231 x female + 2.03130 x age -0.92943 x fem_age" }, { - "objectID": "SAS/survival.html", - "href": "SAS/survival.html", - "title": "Survival Analysis Using SAS", + "objectID": "SAS/ttest_Paired.html", + "href": "SAS/ttest_Paired.html", + "title": "Paired t-test", "section": "", - "text": "The most commonly used survival analysis methods in clinical trials include:\nAdditionally, other methods for analyzing time-to-event data are available, such as:\nWhile these models may be explored in a separate document, this particular document focuses solely on the three most prevalent methods: KM estimators, log-rank test and Cox PH model." - }, - { - "objectID": "SAS/survival.html#example-data", - "href": "SAS/survival.html#example-data", - "title": "Survival Analysis Using SAS", - "section": "Example Data", - "text": "Example Data\nData source: https://stats.idre.ucla.edu/sas/seminars/sas-survival/\nThe data include 500 subjects from the Worcester Heart Attack Study. This study examined several factors, such as age, gender and BMI, that may influence survival time after heart attack. Follow up time for all participants begins at the time of hospital admission after heart attack and ends with death or loss to follow up (censoring). The variables used here are:\n\nlenfol: length of followup, terminated either by death or censoring - time variable\nfstat: loss to followup = 0, death = 1 - censoring variable\nafb: atrial fibrillation, no = 0, 1 = yes - explanatory variable\ngender: males = 0, females = 1 - stratification factor\n\n\nlibname mylib \"..\\data\";\n\ndata dat;\nset mylib.whas500;\nlenfoly = round(lenfol/365.25, 0.01); /* change follow-up days to years for better visualization*/\nrun;" + "text": "The Paired t-test is used when two samples are naturally correlated. In the Paired t-test, the difference of the means between the two samples is compared to a given number that represents the null hypothesis. 
For a Paired t-test, the number of observations in each sample must be equal.\nIn SAS, a Paired t-test is typically performed using PROC TTEST.\n\n\nBy default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following data was used in this example.\n data pressure;\n input SBPbefore SBPafter @@;\n datalines;\n 120 128 124 131 130 131 118 127\n 140 132 128 125 140 141 135 137\n 126 118 130 132 126 129 127 135\n ;\n\n\n\nThe following code was used to test the comparison of two paired samples of Systolic Blood Pressure before and after a procedure.\n proc ttest data=pressure;\n paired SBPbefore*SBPafter;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe SAS paired t-test also supports analysis of lognormal data. Here is the data used for the lognormal analysis.\n\n\n data auc;\n input TestAUC RefAUC @@;\n datalines;\n 103.4 90.11 59.92 77.71 68.17 77.71 94.54 97.51\n 69.48 58.21 72.17 101.3 74.37 79.84 84.44 96.06\n 96.74 89.30 94.26 97.22 48.52 61.62 95.68 85.80\n ;\n\n\n\nFor cases when the data is lognormal, SAS offers the “DIST” option to chose between a normal and lognormal distribution. The procedure also offers the TOST option to specify the equivalence bounds.\n proc ttest data=auc dist=lognormal tost(0.8, 1.25);\n paired TestAUC*RefAUC;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the TTEST procedure offers additional results for geometric mean, coefficient of variation, and TOST equivalence analysis. The output also includes multiple p-values." }, { - "objectID": "SAS/survival.html#the-non-stratified-model", - "href": "SAS/survival.html#the-non-stratified-model", - "title": "Survival Analysis Using SAS", - "section": "The Non-stratified Model", - "text": "The Non-stratified Model\nFirst we try a non-stratified analysis following the mock-up above to describe the association between survival time and afb (atrial fibrillation).\nThe KM estimators and log-rank test are from PROC LIFETEST, and Cox PH model is conducted using PROC PHREG.\n\nKM estimators and log-rank test\n\nproc lifetest data=dat outsurv=_SurvEst timelist= 1 3 5 reduceout stderr; \ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\nThe landmark estimates and quartile estimates for AFB = 0 group are as shown in below:\n\n\n\n\n\n\n\n\n\nThe logrank test result is in below:\n\n\n\n\n\n\n\n\n\n\n\nCox PH model\n\nproc phreg data = dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl;\nrun;\n\nThe hazard ratio and confidence intervals are shown as below:" + "objectID": "SAS/ttest_Paired.html#normal", + "href": "SAS/ttest_Paired.html#normal", + "title": "Paired t-test", + "section": "", + "text": "By default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following data was used in this example.\n data pressure;\n input SBPbefore SBPafter @@;\n datalines;\n 120 128 124 131 130 131 118 127\n 140 132 128 125 140 141 135 137\n 126 118 130 132 126 129 127 135\n ;\n\n\n\nThe following code was used to test the comparison of two paired samples of Systolic Blood Pressure before and after a procedure.\n proc ttest data=pressure;\n paired SBPbefore*SBPafter;\n run;\nOutput:" }, { - "objectID": "SAS/survival.html#the-stratified-model", - "href": "SAS/survival.html#the-stratified-model", - "title": "Survival Analysis Using SAS", - "section": "The Stratified Model", - "text": "The Stratified Model\nIn a stratified model, the Kaplan-Meier estimators remain the same as those in the non-stratified 
model. To implement stratified log-rank tests and Cox proportional hazards models, simply add the STRATA option in both PROC LIFETEST and PROC PHREG.\n\n# KM estimators and log-rank test\nproc lifetest data=dat;\ntime lenfoly*fstat(0);\nstrata gender/group = afb;\nrun;\n\n# Cox PH model\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl;\nstrata gender;\nrun;" + "objectID": "SAS/ttest_Paired.html#lognormal", + "href": "SAS/ttest_Paired.html#lognormal", + "title": "Paired t-test", + "section": "", + "text": "The SAS paired t-test also supports analysis of lognormal data. Here is the data used for the lognormal analysis.\n\n\n data auc;\n input TestAUC RefAUC @@;\n datalines;\n 103.4 90.11 59.92 77.71 68.17 77.71 94.54 97.51\n 69.48 58.21 72.17 101.3 74.37 79.84 84.44 96.06\n 96.74 89.30 94.26 97.22 48.52 61.62 95.68 85.80\n ;\n\n\n\nFor cases when the data is lognormal, SAS offers the “DIST” option to chose between a normal and lognormal distribution. The procedure also offers the TOST option to specify the equivalence bounds.\n proc ttest data=auc dist=lognormal tost(0.8, 1.25);\n paired TestAUC*RefAUC;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the TTEST procedure offers additional results for geometric mean, coefficient of variation, and TOST equivalence analysis. The output also includes multiple p-values." }, { - "objectID": "SAS/manova.html", - "href": "SAS/manova.html", - "title": "Multivariate Analysis of Variance in SAS", + "objectID": "SAS/summary-stats.html", + "href": "SAS/summary-stats.html", + "title": "Deriving Quantiles or Percentiles in SAS", "section": "", - "text": "Example 39.6 Multivariate Analysis of Variance from SAS MANOVA User Guide\nThis example employs multivariate analysis of variance (MANOVA) to measure differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994).\nFor each of 26 samples of pottery, the percentages of oxides of five metals are measured. The following statements create the data set and invoke the GLM procedure to perform a one-way MANOVA. Additionally, it is of interest to know whether the pottery from one site in Wales (Llanederyn) differs from the samples from other sites; a CONTRAST statement is used to test this hypothesis.\n\n #Example code\n title \"Romano-British Pottery\";\n data pottery;\n input Site $12. 
Al Fe Mg Ca Na;\n datalines;\n Llanederyn 14.4 7.00 4.30 0.15 0.51\n Llanederyn 13.8 7.08 3.43 0.12 0.17\n Llanederyn 14.6 7.09 3.88 0.13 0.20\n Llanederyn 11.5 6.37 5.64 0.16 0.14\n Llanederyn 13.8 7.06 5.34 0.20 0.20\n Llanederyn 10.9 6.26 3.47 0.17 0.22\n Llanederyn 10.1 4.26 4.26 0.20 0.18\n Llanederyn 11.6 5.78 5.91 0.18 0.16\n Llanederyn 11.1 5.49 4.52 0.29 0.30\n Llanederyn 13.4 6.92 7.23 0.28 0.20\n Llanederyn 12.4 6.13 5.69 0.22 0.54\n Llanederyn 13.1 6.64 5.51 0.31 0.24\n Llanederyn 12.7 6.69 4.45 0.20 0.22\n Llanederyn 12.5 6.44 3.94 0.22 0.23\n Caldicot 11.8 5.44 3.94 0.30 0.04\n Caldicot 11.6 5.39 3.77 0.29 0.06\n IslandThorns 18.3 1.28 0.67 0.03 0.03\n IslandThorns 15.8 2.39 0.63 0.01 0.04\n IslandThorns 18.0 1.50 0.67 0.01 0.06\n IslandThorns 18.0 1.88 0.68 0.01 0.04\n IslandThorns 20.8 1.51 0.72 0.07 0.10\n AshleyRails 17.7 1.12 0.56 0.06 0.06\n AshleyRails 18.3 1.14 0.67 0.06 0.05\n AshleyRails 16.7 0.92 0.53 0.01 0.05\n AshleyRails 14.8 2.74 0.67 0.03 0.05\n AshleyRails 19.1 1.64 0.60 0.10 0.03\n ;\n run;\n \n proc glm data=pottery;\n class Site;\n model Al Fe Mg Ca Na = Site;\n contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;\n manova h=_all_ / printe printh;\n run;\n\nAfter the summary information (1), PROC GLM produces the univariate analyses for each of the dependent variables (2-6). These analyses show that sites are significantly different for all oxides individually. You can suppress these univariate analyses by specifying the NOUNI option in the MODEL statement.\n1 Summary Information about Groups\n\n\n\n\n\n\n\n\n\n2 Univariate Analysis of Variance for Aluminum Oxide (AI)\n\n\n\n\n\n\n\n\n\n3 Univariate Analysis of Variance for Iron Oxide (Fe)\n\n\n\n\n\n\n\n\n\n4 Univariate Analysis of Variance for Calcium Oxide (Ca)\n\n\n\n\n\n\n\n\n\n5 Univariate Analysis of Variance for Magnesium Oxide (Mg)\n\n\n\n\n\n\n\n\n\n6 Analysis of Variance for Sodium Oxide (Na)\n\n\n\n\n\n\n\n\n\nThe PRINTE option in the MANOVA statement displays the elements of the error matrix (7), also called the Error Sums of Squares and Crossproducts matrix. The diagonal elements of this matrix are the error sums of squares from the corresponding univariate analyses.\nThe PRINTE option also displays the partial correlation matrix (7) associated with the E matrix. In this example, none of the oxides are very strongly correlated; the strongest correlation (r=0.488) is between magnesium oxide and calcium oxide.\n7 Error SSCP Matrix and Partial Correlations\n\n\n\n\n\n\n\n\n\nThe PRINTH option produces the SSCP matrix for the hypotheses being tested (Site and the contrast); (8 and 9). Since the Type III SS are the highest-level SS produced by PROC GLM by default, and since the HTYPE= option is not specified, the SSCP matrix for Site gives the Type III H matrix. The diagonal elements of this matrix are the model sums of squares from the corresponding univariate analyses.\nFour multivariate tests are computed, all based on the characteristic roots and vectors of \\(E^{-1}H\\). These roots and vectors are displayed along with the tests. All four tests can be transformed to variates that have distributions under the null hypothesis. Note that the four tests all give the same results for the contrast, since it has only one degree of freedom. 
In this case, the multivariate analysis matches the univariate results: there is an overall difference between the chemical composition of samples from different sites, and the samples from Llanederyn are different from the average of the other sites.\n8 Hypothesis SSCP Matrix and Multivariate Tests for Overall Site Effect\n\n\n\n\n\n\n\n\n\n9 Hypothesis SSCP Matrix and Multivariate Tests for Differences between Llanederyn and the Other Sites\n\n\n\n\n\n\n\n\n\nReferences\nSAS MANOVA User Guide" + "text": "Percentiles can be calculated in SAS using the UNIVARIATE procedure. The procedure has the option PCTLDEF which allows for five different percentile definitions to be used. The default is PCTLDEF=5, which uses the empirical distribution function to find percentiles.\nThis is how the 25th and 40th percentiles of aval in the dataset adlb could be calculated, using the default option for PCTLDEF.\n\nproc univariate data=adlb;\n var aval;\n output out=stats pctlpts=25 40 pctlpre=p;\nrun;\n\nThe pctlpre=p option tells SAS the prefix to use in the output dataset for the percentile results. In the above example, SAS will create a dataset called stats, containing variables p25 and p40." }, { - "objectID": "SAS/cmh.html", - "href": "SAS/cmh.html", - "title": "CMH Test", + "objectID": "SAS/rounding.html", + "href": "SAS/rounding.html", + "title": "Rounding in SAS", "section": "", - "text": "The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.\n\n\nThe cmh test is calculated in SAS using the PROC FREQ procedure. By default, it outputs the chi square statistic, degrees of freedom and p-value for each of the three alternative hypothesis: general association, row means differ, and nonzero correlation. 
It is up to the statistical analyst or statistician to know which result is appropriate for their analysis.\nWhen the design of the contingency table is 2 x 2 x K (i.e, X == 2 levels, Y == 2 levels, K >= 2 levels), the Mantel-Haenszel Common Odds Ratio (odds ratio estimate, 95% CI, P-value) and the Breslow-Day Test for Homogeneity of the Odds Ratios (chi-square statistic, degrees of freedom, P-value) are also output.\nBelow is the syntax to conduct a CMH analysis in SAS:\n\nProc freq data = filtered_data; \ntables K * X * Y / cmh; \n* the order of K, X, and Y appearing on the line is important!;\nrun;" + "text": "There are two rounding functions in SAS.\nThe round() function in SAS will round to the nearest whole number and ‘away from zero’ or ‘rounding up’ when equidistant meaning that exactly 12.5 rounds to the integer 13.\nThe rounde() function in SAS will round to the nearest whole number and ‘rounding to the even number’ when equidistant, meaning that exactly 12.5 rounds to the integer 12.\nBoth functions allow you to specify the number of decimal places you want to round to.\nFor example (See references for source of the example)\n\n #Example code\n data XXX;\n my_number=2.2; output;\n my_number=3.99; output;\n my_number=1.2345; output;\n my_number=7.876; output;\n my_number=13.8739; output;\n run;\n\n data xxx2;\n set xxx;\n r_1_dec = round(my_number, 0.1);\n r_2_dec = round(my_number, 0.01);\n r_3_dec = round(my_number, 0.001);\n \n re_1_dec = rounde(my_number, 0.1);\n re_2_dec = rounde(my_number, 0.01);\n re_3_dec = rounde(my_number, 0.001);\n run;\n\n\n\n\n\n\n\n\n\n\n\n\n\nmy_number\nr_1_dec\nr_2_de\nr_3_dec\nre_1_dec\nre_2_dec\nre_3_dec\n\n\n\n\n2.2\n2.2\n2.2\n2.2\n2.2\n2.2\n2.2\n\n\n3.99\n4\n3.99\n3.99\n4\n3.99\n3.99\n\n\n1.2345\n1.2\n1.23\n1.235\n1.2\n1.23\n1.234\n\n\n7.876\n7.9\n7.88\n7.876\n7.9\n7.88\n7.876\n\n\n13.8739\n13.9\n13.87\n13.874\n13.9\n13.87\n13.874\n\n\n\nReferences\nHow to Round Numbers in SAS - SAS Example Code" }, { - "objectID": "SAS/cmh.html#cmh-in-sas", - "href": "SAS/cmh.html#cmh-in-sas", - "title": "CMH Test", + "objectID": "SAS/anova.html", + "href": "SAS/anova.html", + "title": "linear-models", "section": "", - "text": "The cmh test is calculated in SAS using the PROC FREQ procedure. By default, it outputs the chi square statistic, degrees of freedom and p-value for each of the three alternative hypothesis: general association, row means differ, and nonzero correlation. It is up to the statistical analyst or statistician to know which result is appropriate for their analysis.\nWhen the design of the contingency table is 2 x 2 x K (i.e, X == 2 levels, Y == 2 levels, K >= 2 levels), the Mantel-Haenszel Common Odds Ratio (odds ratio estimate, 95% CI, P-value) and the Breslow-Day Test for Homogeneity of the Odds Ratios (chi-square statistic, degrees of freedom, P-value) are also output.\nBelow is the syntax to conduct a CMH analysis in SAS:\n\nProc freq data = filtered_data; \ntables K * X * Y / cmh; \n* the order of K, X, and Y appearing on the line is important!;\nrun;" + "text": "Getting Started\nTo demonstrate the various types of sums of squares, we’ll create a data frame called df_disease taken from the SAS documentation.\n\n\nThe Model\nFor this example, we’re testing for a significant difference in stem_length using ANOVA.\n\nproc glm;\n class drug disease;\n model y=drug disease drug*disease;\nrun;\n\n\n\nSums of Squares Tables\nSAS has four types of sums of squares calculations. 
To get these calculations, the sum of squares option needs to be added (/ ss1 ss2 ss3 ss4) to the model statement.\n\nproc glm;\n class drug disease;\n model y=drug disease drug*disease / ss1 ss2 ss3 ss4;\nrun;\n\n\nType I\n\n\n\n\n\n\n\n\n\n\n\nType II\n\n\n\n\n\n\n\n\n\n\n\nType III\n\n\n\n\n\n\n\n\n\n\n\nType IV" }, { - "objectID": "SAS/ttest_2Sample.html", - "href": "SAS/ttest_2Sample.html", - "title": "Independant Two-Sample t-test", + "objectID": "Comp/r-sas_manova.html", + "href": "Comp/r-sas_manova.html", + "title": "Multivariate Analysis of Variance in R vs SAS", "section": "", - "text": "Independent Two-Sample t-test in SAS\nThe null hypothesis of the Independent Samples t-test is, the means for the two populations are equal.\nIn SAS the following code was used to test the mean comparison (mean of Weight Gain) of two independent treatment groups (Treatment and Placebo).\nFor this example, we’re testing the significant difference in mean of Weight gain (WtGain) between treatment and placebo (trt_grp) using PROC TTEST procedure in SAS.\n\n proc ttest data=d1; \n class trt_grp; \n var WtGain; \n run; \n\nOutput:\n Figure 1: Test results for independent t-test using PROC TTEST in SAS\n\n\n\n\n\n\n\n\n\nHere the t-value is –0.70, degrees of freedom is 30 and P value is 0.4912 which is greater than 0.05, so we accept the null hypothesis that there is no evidence of a significant difference between the means of treatment groups. The mean in placebo group is 75.1875 and mean in Treatment group is 83.1250. The mean difference the treatment groups (Treatment-Placebo) is –7.9375 and the 95% CI for the mean difference is [–31.1984, 15.3234]. The 95% confidence interval includes a treatment difference of 0, which supports the conclusion that the data fail to provide any evidence of a difference between the treatment groups.\nNote: Before entering straight into the t-test we need to check whether the assumptions (like the equality of variance, the observations should be independent, observations should be normally distributed) are met or not. If normality is not satisfied, we may consider using a suitable non-parametric test.\n\nNormality: You can check for data to be normally distributed by plotting a histogram of the data by treatment. Alternatively, you can use the Shapiro-Wilk test or the Kolmogorov-Smirnov test. If the test is <0.05 and your sample is quite small then this suggests you should not use the t-test. However, if your sample in each treatment group is large (say >30 in each group), then you do not need to rely so heavily on the assumption that the data have an underlying normal distribution in order to apply the two-sample t-test. This is where plotting the data using histograms can help to support investigation into the normality assumption. We have checked the normality of the observations using the code below. Here for both the treatment groups we have P value greater than 0.05 (Shapiro-Wilk test is used), therefore the normality assumption is there for our data.\n\n\n proc univariate data=d1 normal; \n qqplot WtGain; \n by trt_grp; \n run; \n\nOutput:\n Figure 2: The results of normality test for Treatment group\n\n\n\n\n\n\n\n\n\n Figure 3: The results of normality test for Placebo group\n\n\n\n\n\n\n\n\n\n\nHomogeneity of variance (or Equality of variance): Homogeniety of variance will be tested by default in PROC TTEST itself by Folded F-test. In our case the P values is 0.6981 which is greater than 0.05. So we accept the null hypothesis of F-test, i.e. 
variances are same. Then we will consider the pooled method for t-test. If the F test is statistically significant (p<0.05), then the pooled t-test may give erroneous results. In this instance, if it is believed that the population variances may truly differ, then the Satterthwaite (unequal variances) analysis results should be used. These are provided in the SAS output alongside the Pooled results as default.\n\nOutput:\n Figure 4: Folded F-test result in PROC TTEST" + "text": "MANOVA: Testing for group mean vectors are the same vs at least one is different\nWhen applying the following hypothesis, SAS and R match identically see R and SAS.\n\nH0: Group mean vectors are the same for all groups or they don’t differ significantly.\nH1: At least one of the group mean vectors is different from the rest.\n\nHowever, if interest was in comparing 1 level of a parameter vs the others, this was only achieved using SAS. Contrast statements in SAS were easy to implement as shown here SAS however R did not replicate these results and to date a solution has not been found.\nNOTE: if you feel you can help with the above discrepancy please contribute to the CAMIS repo by following the instructions on the contributions page." }, { - "objectID": "Comp/r-sas_mcnemar.html", - "href": "Comp/r-sas_mcnemar.html", - "title": "R v SAS McNemar’s test", + "objectID": "Comp/r-sas_chi-sq.html", + "href": "Comp/r-sas_chi-sq.html", + "title": "R/SAS Chi-Squared Comparision", "section": "", - "text": "McNemar’s test; R and SAS\nIn R, the mcNemar function from the epibasix package can be used to perform McNemar’s test.\n\nX<-table(colds$age12,colds$age14)\nsummary(mcNemar(X))\n\nThe FREQ procedure can be used in SAS with the AGREE option to run the McNemar test, with OR, and RISKDIFF options stated for production of odds ratios and risk difference. These options were added as epibasix::mcNemar outputs the odds ratio and risk difference with confidence limits as default. In contrast to R, SAS outputs the Kappa coefficients with confident limits as default.\n\nproc freq data=colds;\n tables age12*age14 / agree or riskdiff;\nrun;\n\nWhen calculating the odds ratio and risk difference confidence limits, SAS is not treating the data as matched-pairs. There is advice on the SAS blog and SAS support page to amend this, which requires a lot of additional coding.\nR is using Edward’s continuity correction with no option to remove this. In contrast, there is no option to include Edward’s continuity correction in SAS, but this can be manually coded to agree with R. However, its use is controversial due to being seen as overly conservative.\nR’s use of the continuity correction is consistent with other functions within the epibasix package, which was categorised as ‘High Risk’ by the Risk Assessment Shiny App created by the R Validation Hub. Risk is quantified by the app through a number of metrics relating to maintenance and community usage. It was found that the author is no longer maintaining the package and there was no documentation available for certain methods used. Therefore, the use of the epibasix package is advised against and other packages may be more suitable.\nThe mcnemar.test function in the stats package provides the option to remove continuity corrections which results in a match with SAS. This function does not output any other coefficients for agreement/difference in proportions etc. 
but (if required) these can be achieved within other functions and/or packages.\n\nmcnemar.test(X, correct = FALSE)" + "text": "Chi-Squared test is a hypothesis test for independent contingency tables, dependent on rows and column totals. The test assumes:\n\nobservations are independent of each other\nall values are 1 or more and at least 80% of the cells are greater than 5.\ndata should be categorical\n\nThe Chi-Squared statistic is found by:\n\\[\n\\chi^2=\\frac{\\sum(O-E)^2}{E}\n\\]\nWhere O is the observed and E is the expected.\nFor an r x c table (where r is the number of rows and c the number of columns), the Chi-squared distribution’s degrees of freedom is (r-1)*(c-1). The resultant statistic with correct degrees of freedom follows this distribution when its expected values are aligned with the assumptions of the test, under the null hypothesis. The resultant p value informs the magnitude of disagreement with the null hypothesis and not the magnitude of association\nFor this example we will use data about cough symptoms and history of bronchitis.\n\nbronch <- matrix(c(26, 247, 44, 1002), ncol = 2)\nrow.names(bronch) <- c(\"cough\", \"no cough\")\ncolnames(bronch) <- c(\"bronchitis\", \"no bronchitis\")\n\nTo a chi-squared test in R you will use the following code.\n\nchisq.test(bronch)\n\n\n Pearson's Chi-squared test with Yates' continuity correction\n\ndata: bronch\nX-squared = 11.145, df = 1, p-value = 0.0008424\n\n\nTo run a chi-squared test in SAS you used the following code.\n\nproc freq data=proj1.bronchitis;\ntables Cough*Bronchitis / chisq;\nrun;\n\nThe result in the “Chi-Square” section of the results table in SAS will not match R, in this case it will be 12.1804 with a p-value of 0.0005. This is because by default R does a Yate’s continuity adjustment. To change this set correct to false.\n\nchisq.test(bronch, correct = FALSE)\n\n\n Pearson's Chi-squared test\n\ndata: bronch\nX-squared = 12.18, df = 1, p-value = 0.0004829\n\n\nAlternatively, SAS also produces the correct chi-square value by default. It is the “Continuity Adj. Chi-Square” value in the results table." }, { - "objectID": "Comp/r-sas_mmrm.html", - "href": "Comp/r-sas_mmrm.html", - "title": "R vs SAS MMRM", + "objectID": "Comp/r-sas_linear-regression.html", + "href": "Comp/r-sas_linear-regression.html", + "title": "R vs SAS Linear Regression", "section": "", - "text": "In this vignette we briefly compare the mmrm::mmrm, SAS’s PROC GLIMMIX, nlme::gls, lme4::lmer, and glmmTMB::glmmTMB functions for fitting mixed models for repeated measures (MMRMs). A primary difference in these implementations lies in the covariance structures that are supported “out of the box”. In particular, PROC GLIMMIX and mmrm are the only procedures which provide support for many of the most common MMRM covariance structures. Most covariance structures can be implemented in gls, though users are required to define them manually. lmer and glmmTMB are more limited. We find that mmmrm converges more quickly than other R implementations while also producing estimates that are virtually identical to PROC GLIMMIX’s." + "text": "Summary of R vs SAS Comparison for Linear Regression\nTo date the lm function in R and proc reg in sas have been found to 100% agree.\nSee R and SAS for more detail." 
}, { - "objectID": "Comp/r-sas_mmrm.html#fev-data", - "href": "Comp/r-sas_mmrm.html#fev-data", - "title": "R vs SAS MMRM", - "section": "FEV Data", - "text": "FEV Data\nThe FEV dataset contains measurements of FEV1 (forced expired volume in one second), a measure of how quickly the lungs can be emptied. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD). It is summarized below.\n Stratified by ARMCD\n Overall PBO TRT\n n 800 420 380\n USUBJID (%)\n PT[1-200] 200 105 (52.5) 95 (47.5)\n AVISIT\n VIS1 200 105 95\n VIS2 200 105 95\n VIS3 200 105 95\n VIS4 200 105 95\n RACE (%)\n Asian 280 (35.0) 152 (36.2) 128 (33.7)\n Black or African American 300 (37.5) 184 (43.8) 116 (30.5)\n White 220 (27.5) 84 (20.0) 136 (35.8)\n SEX = Female (%) 424 (53.0) 220 (52.4) 204 (53.7)\n FEV1_BL (mean (SD)) 40.19 (9.12) 40.46 (8.84) 39.90 (9.42)\n FEV1 (mean (SD)) 42.30 (9.32) 40.24 (8.67) 44.45 (9.51)\n WEIGHT (mean (SD)) 0.52 (0.23) 0.52 (0.23) 0.51 (0.23)\n VISITN (mean (SD)) 2.50 (1.12) 2.50 (1.12) 2.50 (1.12)\n VISITN2 (mean (SD)) -0.02 (1.03) 0.01 (1.07) -0.04 (0.98)" + "objectID": "Comp/r-sas_anova.html", + "href": "Comp/r-sas_anova.html", + "title": "R vs SAS Linear Models", + "section": "", + "text": "Matching Contrasts: R and SAS\nIt is recommended to use the emmeans package when attempting to match contrasts between R and SAS. In SAS, all contrasts must be manually defined, whereas in R, we have many ways to use pre-existing contrast definitions. The emmeans package makes simplifies this process, and provides syntax that is similar to the syntax of SAS.\nThis is how we would define a contrast in SAS.\n\n# In SAS\nproc glm data=work.mycsv;\n class drug;\n model post = drug pre / solution;\n estimate 'C vs A' drug -1 1 0;\n estimate 'E vs CA' drug -1 -1 2;\nrun;\n\nAnd this is how we would define the same contrast in R, using the emmeans package.\n\nlm(formula = post ~ pre + drug, data = df_trial) %>% \n emmeans(\"drug\") %>% \n contrast(method = list(\n \"C vs A\" = c(-1, 1, 0),\n \"E vs CA\" = c(-1, -1, 2)\n ))\n\nNote, however, that there are some cases where the scale of the parameter estimates between SAS and R is off, though the test statistics and p-values are identical. In these cases, we can adjust the SAS code to include a divisor. As far as we can tell, this difference only occurs when using the predefined Base R contrast methods like contr.helmert.\n\nproc glm data=work.mycsv;\n class drug;\n model post = drug pre / solution;\n estimate 'C vs A' drug -1 1 0 / divisor = 2;\n estimate 'E vs CA' drug -1 -1 2 / divisor = 6;\nrun;" }, { - "objectID": "Comp/r-sas_mmrm.html#bcva-data", - "href": "Comp/r-sas_mmrm.html#bcva-data", - "title": "R vs SAS MMRM", - "section": "BCVA Data", - "text": "BCVA Data\nThe BCVA dataset contains data from a randomized longitudinal ophthalmology trial evaluating the change in baseline corrected visual acuity (BCVA) over the course of 10 visits. BCVA corresponds to the number of letters read from a visual acuity chart. 
A summary of the data is given below:\n Stratified by ARMCD\n Overall CTL TRT\n n 8605 4123 4482\n USUBJID (%)\n PT[1-1000] 1000 494 (49.4) 506 (50.6)\n AVISIT\n VIS1 983 482 501\n VIS2 980 481 499\n VIS3 960 471 489\n VIS4 946 458 488\n VIS5 925 454 471\n VIS6 868 410 458\n VIS7 816 388 428\n VIS8 791 371 420\n VIS9 719 327 392\n VIS10 617 281 336\n RACE (%)\n Asian 297 (29.7) 151 (30.6) 146 (28.9)\n Black or African American 317 (31.7) 149 (30.1) 168 (33.2)\n White 386 (38.6) 194 (39.3) 192 (37.9)\n BCVA_BL (mean (SD)) 75.12 (9.93) 74.90 (9.76) 75.40 (10.1)\n BCVA_CHG (mean (SD))\n VIS1 5.59 (1.31) 5.32 (1.23) 5.86 (1.33)\n VIS10 9.18 (2.91) 7.49 (2.58) 10.60 (2.36)" + "objectID": "Comp/r-sas_rounding.html", + "href": "Comp/r-sas_rounding.html", + "title": "R v SAS rounding", + "section": "", + "text": "Rounding; R and SAS\nOn comparing the documentation of rounding rules for both languages, it will be noted that the default rounding rule (implemented in the respective language’s round() function) are different. Numerical differences arise in the knife-edge case where the number being rounded is equidistant between the two possible results. The round() function in SAS will round the number ‘away from zero’, meaning that 12.5 rounds to the integer 13. The round() function in Base R will round the number ‘to even’, meaning that 12.5 rounds to the integer 12. SAS does provide the rounde() function which rounds to even and the janitor package in R contains a function round_half_up() that rounds away from zero. In this use case, SAS produces a correct result from its round() function, based on its documentation, as does R. Both are right based on what they say they do, but they produce different results (Rimler, M.S. et al.).\nReferences\nRimler M.S., Rickert J., Jen M-H., Stackhouse M. Understanding differences in statistical methodology implementations across programming languages (2022, Fall). ASA Biopharmaceutical Report Issue 3, Volume 29.  Retrieved from https://higherlogicdownload.s3.amazonaws.com/AMSTAT/fa4dd52c-8429-41d0-abdf-0011047bfa19/UploadedImages/BIOP%20Report/BioPharm_fall2022FINAL.pdf" }, { - "objectID": "Comp/r-sas_mmrm.html#ante-dependence-heterogeneous", - "href": "Comp/r-sas_mmrm.html#ante-dependence-heterogeneous", - "title": "R vs SAS MMRM", - "section": "Ante-dependence (heterogeneous)", - "text": "Ante-dependence (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=ANTE(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + adh(VISITN | USUBJID),\n data = fev_data\n)" + "objectID": "Comp/r-sas_survival.html", + "href": "Comp/r-sas_survival.html", + "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", + "section": "", + "text": "Results from the examples shown for R here and SAS here were compared below.\nComparing the non-stratified model results side-by-side, the CIs for the quartile estimates and landmark estimates are different between R and SAS. HR and CI also have slight differences.\n\n\n\n\n\n\n\n\n\n\n\nThe default methods for handling ties in a Cox regression model are different which can lead to a different result for the Hazard ratio and associated confidence interval.\nR uses “efron” by default. SAS uses “breslow” by default. Both R and SAS are able to change these default options. 
By making the changes to the code below, we can force R to use “breslow” to match SAS, or SAS to use “efron” to match R. When the software use the same methods, then we obtain an identical HR and CI.\n\nR: change method for ties to use “breslow”\n\n\nfit.cox <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"breslow\", data = dat)\n\n\nSAS: change method for ties to use “efron”\n\n\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl ties = efron;\nrun;\n\nIf there are no tied event times, then the methods are equivalent.\nThe Breslow approximation is the easiest to program and hence it historically became the first option coded for almost all software. It then ended up as the default option when other options were added in order to maintain “backwards compatibility”. The Efron option is more accurate if there are a large number of ties, and it was therefore selected as the default option in R. In practice the number of ties is usually small, in which case all the methods are statistically indistinguishable.\nFrom the arguments of coxph in R, there are three possible choices for handling tied event times ‘ties=breslow’, ‘ties=efron’, or ‘ties=’clogit’. This last option is an exact partial likelihood approach. See [here](\ncoxph function - RDocumentation) for more detail.\n\n\n\nThe default methods for calculation of the confidence interval of a KM estimator are different in the two languages (for example, for calculation of the CI associated with the Median Survival estimate, the 25th percentile and the 75th percentile).\nR uses “log” by default, and SAS uses “log-log” by default. As shown below, using ‘conf.type’ option, R can be forced to use the “log-log” method to match SAS. Alternatively, using the ‘conftype=’ option, SAS can be forced to use the “log” method to match R.\n\nR: change to “log-log”\n\n\nfit.km <- survfit(Surv(LENFOLY, FSTAT) ~ AFB, conf.type = \"log-log\", data = dat)\n\n\nSAS: change to “log”\n\n\nproc lifetest data=dat conftype=log;\ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\n“log-log” prevents the problem of having confidence intervals of >1 or <0, which might happen if using “log” transformation. However, both R and SAS will clip the interval at [0, 1] and report a bound >1 as 1 and <0 as 0.\nFrom a reference: The appeal of the log-log interval is clear, but the log-scale interval has the advantage of variance stabilization. As a result, simulation studies have generally found it to have better (closer to nominal) coverage; for this reason, it is the default in the survival package.\nNow if we change the confidence interval type in SAS to “log” and tie handling to “efron”, the results will be identical to the results in R.\n\n\n\n\n\n\n\n\n\nBelow is the side-by-side comparison for stratified analysis with default methods in SAS matched to R’s, the results are also identical." 
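To see the impact of the tie-handling option directly, the two methods can be fitted side by side in R and the hazard ratios compared. This is a minimal sketch, assuming the dat data set and the AFB, LENFOLY and FSTAT variables from the example above are available.

library(survival)
fit_breslow <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"breslow\", data = dat)  # SAS default
fit_efron   <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"efron\", data = dat)    # R default
exp(cbind(HR = coef(fit_breslow), confint(fit_breslow)))   # HR and 95% CI under Breslow
exp(cbind(HR = coef(fit_efron), confint(fit_efron)))       # HR and 95% CI under Efron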
}, { - "objectID": "Comp/r-sas_mmrm.html#ante-dependence-homogeneous", - "href": "Comp/r-sas_mmrm.html#ante-dependence-homogeneous", - "title": "R vs SAS MMRM", - "section": "Ante-dependence (homogeneous)", - "text": "Ante-dependence (homogeneous)\n\nmmrm\nmmrm(\n formula =FEV1 ~ ARMCD * AVISIT + ad(VISITN | USUBJID),\n data = fev_data\n)" + "objectID": "Comp/r-sas_survival.html#reason-1-cox-regression-handling-of-tied-survival-times", + "href": "Comp/r-sas_survival.html#reason-1-cox-regression-handling-of-tied-survival-times", + "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", + "section": "", + "text": "The default methods for handling ties in a Cox regression model are different which can lead to a different result for the Hazard ratio and associated confidence interval.\nR uses “efron” by default. SAS uses “breslow” by default. Both R and SAS are able to change these default options. By making the changes to the code below, we can force R to use “breslow” to match SAS, or SAS to use “efron” to match R. When the software use the same methods, then we obtain an identical HR and CI.\n\nR: change method for ties to use “breslow”\n\n\nfit.cox <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"breslow\", data = dat)\n\n\nSAS: change method for ties to use “efron”\n\n\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl ties = efron;\nrun;\n\nIf there are no tied event times, then the methods are equivalent.\nThe Breslow approximation is the easiest to program and hence it historically became the first option coded for almost all software. It then ended up as the default option when other options were added in order to maintain “backwards compatibility”. The Efron option is more accurate if there are a large number of ties, and it was therefore selected as the default option in R. In practice the number of ties is usually small, in which case all the methods are statistically indistinguishable.\nFrom the arguments of coxph in R, there are three possible choices for handling tied event times ‘ties=breslow’, ‘ties=efron’, or ‘ties=’clogit’. This last option is an exact partial likelihood approach. See [here](\ncoxph function - RDocumentation) for more detail." 
}, { - "objectID": "Comp/r-sas_mmrm.html#auto-regressive-heterogeneous", - "href": "Comp/r-sas_mmrm.html#auto-regressive-heterogeneous", - "title": "R vs SAS MMRM", - "section": "Auto-regressive (heterogeneous)", - "text": "Auto-regressive (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=ARH(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + ar1h(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCAR1(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)" + "objectID": "Comp/r-sas_survival.html#reason-2-kaplan-meier-median-survival-confidence-intervals", + "href": "Comp/r-sas_survival.html#reason-2-kaplan-meier-median-survival-confidence-intervals", + "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", + "section": "", + "text": "The default methods for calculation of the confidence interval of a KM estimator are different in the two languages (for example, for calculation of the CI associated with the Median Survival estimate, the 25th percentile and the 75th percentile).\nR uses “log” by default, and SAS uses “log-log” by default. As shown below, using ‘conf.type’ option, R can be forced to use the “log-log” method to match SAS. Alternatively, using the ‘conftype=’ option, SAS can be forced to use the “log” method to match R.\n\nR: change to “log-log”\n\n\nfit.km <- survfit(Surv(LENFOLY, FSTAT) ~ AFB, conf.type = \"log-log\", data = dat)\n\n\nSAS: change to “log”\n\n\nproc lifetest data=dat conftype=log;\ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\n“log-log” prevents the problem of having confidence intervals of >1 or <0, which might happen if using “log” transformation. However, both R and SAS will clip the interval at [0, 1] and report a bound >1 as 1 and <0 as 0.\nFrom a reference: The appeal of the log-log interval is clear, but the log-scale interval has the advantage of variance stabilization. As a result, simulation studies have generally found it to have better (closer to nominal) coverage; for this reason, it is the default in the survival package.\nNow if we change the confidence interval type in SAS to “log” and tie handling to “efron”, the results will be identical to the results in R.\n\n\n\n\n\n\n\n\n\nBelow is the side-by-side comparison for stratified analysis with default methods in SAS matched to R’s, the results are also identical." 
}, { - "objectID": "Comp/r-sas_mmrm.html#auto-regressive-homogeneous", - "href": "Comp/r-sas_mmrm.html#auto-regressive-homogeneous", - "title": "R vs SAS MMRM", - "section": "Auto-regressive (homogeneous)", - "text": "Auto-regressive (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = ARMCD|AVISIT / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=AR(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + ar1(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCAR1(form = ~AVISIT | USUBJID),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + ar1(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" + "objectID": "Comp/r-sas_survival.html#differences-observed-in-the-km-estimators", + "href": "Comp/r-sas_survival.html#differences-observed-in-the-km-estimators", + "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", + "section": "Differences Observed in the KM Estimators", + "text": "Differences Observed in the KM Estimators\nSuppose we are interested to know the 25%, 50% and 75% quartile estimates, and the day 80, 100, and 120 estimates.\nBelow is the R code:\n\nfit.km <- survfit(Surv(time, status) ~ 1, conf.type = \"log-log\", data = test)\n## quantile estimates\nquantile(fit.km, probs = c(0.25, 0.5, 0.75))\n## landmark estimates at 80, 100, 120-day\nsummary(fit.km, times = c(80, 100, 120), extend = T)\n\nBelow is the SAS code:\n\nproc lifetest data=dat outsurv=_SurvEst timelist= 80 100 120 reduceout stderr; \ntime lenfoly*fstat(0);\nrun;\n\nBelow is the side-by-side comparison:" }, { - "objectID": "Comp/r-sas_mmrm.html#compound-symmetry-heterogeneous", - "href": "Comp/r-sas_mmrm.html#compound-symmetry-heterogeneous", - "title": "R vs SAS MMRM", - "section": "Compound symmetry (heterogeneous)", - "text": "Compound symmetry (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=CSH;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + csh(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCompSymm(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + cs(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" + "objectID": "Comp/r-sas_survival.html#reasons", + "href": "Comp/r-sas_survival.html#reasons", + "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", + "section": "Reasons", + "text": "Reasons\nThe reasons for the differences are because:\nReason 1: Survival estimate does not cross the 50% percentile.\nThe kth quantile for a survival curve S(t) is the location at which a horizontal line at height p= 1-k intersects the plot of S(t) as shown in the KM curve below. 
Since S(t) is a step function, it is possible for the curve to have a horizontal segment at exactly 1-k, in which case the midpoint of the horizontal segment is returned.\nFor example, using the data above, the survival probability is exactly 0.5 at time=87 and remains at 0.5 until the last censored observation at 118.\n\n\n\n\n\n\n\n\n\nWhen using R, the median is the smallest time which survival estimate is <= 0.5 –> (87+118) / 2 = 102.5. However, SAS searches the smallest time which survival estimate is < 0.5, which does not exist in this dataset, so it gives “NE” (Not evaluable).\n\npl <- survminer::ggsurvplot(fit.km, \n conf.int = TRUE,\n ggtheme = theme_light()) \npl$plot + geom_hline(yintercept = 0.5, color = \"black\", linetype = \"solid\") \n\nsummary(fit.km)\n\nReason 2: Last event censored and prior to the required landmark estimate.\nFor the 120-day event-free estimate, SAS considers that 120 days is beyond the maximum observed day in the data (which was a censored event at time =118). Therefore, SAS considers this as Unknown and returns a result of “NE” (Not-evaluable). However, R uses the rate at last observed censored date to estimate the 120-day event free rate. As the event-free estimate at time of the last censored event at 118 was 0.5 (0.184, 0.753), R makes the assumption that this is the best estimate for the event-free rate at Time =120.\nIf we change the last observation in the dataset to be an event (instead of censored), R and SAS will both give 0 for the event-free survival estimate, because it is for sure that all subjects did not survive beyond 120 days.\n\ntest <- tibble(time = c(54, 75, 77, 84, 87, 92, 103, 105, 112, 118), \n status = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1))\n\ntest\n\n# A tibble: 10 × 2\n time status\n <dbl> <dbl>\n 1 54 1\n 2 75 1\n 3 77 1\n 4 84 1\n 5 87 1\n 6 92 0\n 7 103 0\n 8 105 0\n 9 112 0\n10 118 1" }, { - "objectID": "Comp/r-sas_mmrm.html#compound-symmetry-homogeneous", - "href": "Comp/r-sas_mmrm.html#compound-symmetry-homogeneous", - "title": "R vs SAS MMRM", - "section": "Compound symmetry (homogeneous)", - "text": "Compound symmetry (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=CS;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + cs(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCompSymm(form = ~AVISIT | USUBJID),\n na.action = na.omit\n)" + "objectID": "Comp/r-sas_cmh.html", + "href": "Comp/r-sas_cmh.html", + "title": "R vs SAS CMH", + "section": "", + "text": "The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.\n\n\n\n\n\n\n\n\n\n\n\nFor the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:\n\nX = exposure\nY = response\nK = control\n\n\n\n\nThe scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. 
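The midpoint rule described above can be seen directly in R. This is a minimal sketch using the same ten observations (with the final observation censored, as in the original data); survfit() and its quantile() method come from the survival package, and the values in the comments are those quoted above.

library(survival)
test <- data.frame(time = c(54, 75, 77, 84, 87, 92, 103, 105, 112, 118),
                   status = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0))
fit.km <- survfit(Surv(time, status) ~ 1, conf.type = \"log-log\", data = test)
summary(fit.km)                # the estimate drops to exactly 0.5 at time 87 and stays there
quantile(fit.km, probs = 0.5)  # R reports the midpoint (87 + 118) / 2 = 102.5; SAS reports NE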
Each test statistic is evaluated on different degrees of freedom (df):\n\nGeneral association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs\nRow mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs\nNonzero correlation statistic (X and Y both ordinal) results in 1 df" }, { - "objectID": "Comp/r-sas_mmrm.html#spatial-exponential", - "href": "Comp/r-sas_mmrm.html#spatial-exponential", - "title": "R vs SAS MMRM", - "section": "Spatial exponential", - "text": "Spatial exponential\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM / subject=USUBJID type=sp(exp)(visitn) rcorr;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + sp_exp(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corExp(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nglmmTMB\n# NOTE: requires use of coordinates\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + exp(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" + "objectID": "Comp/r-sas_cmh.html#naming-convention", + "href": "Comp/r-sas_cmh.html#naming-convention", + "title": "R vs SAS CMH", + "section": "", + "text": "For the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:\n\nX = exposure\nY = response\nK = control" }, { - "objectID": "Comp/r-sas_mmrm.html#toeplitz-heterogeneous", - "href": "Comp/r-sas_mmrm.html#toeplitz-heterogeneous", - "title": "R vs SAS MMRM", - "section": "Toeplitz (heterogeneous)", - "text": "Toeplitz (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=TOEPH;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + toeph(AVISIT | USUBJID),\n data = fev_data\n)\n\n\n\nglmmTMB\n glmmTMB(\n FEV1 ~ ARMCD * AVISIT + toep(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" + "objectID": "Comp/r-sas_cmh.html#scale", + "href": "Comp/r-sas_cmh.html#scale", + "title": "R vs SAS CMH", + "section": "", + "text": "The scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. 
Each test statistic is evaluated on different degrees of freedom (df):\n\nGeneral association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs\nRow mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs\nNonzero correlation statistic (X and Y both ordinal) results in 1 df" }, { - "objectID": "Comp/r-sas_mmrm.html#toeplitz-homogeneous", - "href": "Comp/r-sas_mmrm.html#toeplitz-homogeneous", - "title": "R vs SAS MMRM", - "section": "Toeplitz (homogeneous)", - "text": "Toeplitz (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=TOEP;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + toep(AVISIT | USUBJID),\n data = fev_data\n)" + "objectID": "Comp/r-sas_cmh.html#data", + "href": "Comp/r-sas_cmh.html#data", + "title": "R vs SAS CMH", + "section": "Data", + "text": "Data\nTo begin investigating the differences in the SAS and R implementations of the CMH test, we decided to use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.\n\n\n# A tibble: 6 × 36\n STUDYID SITEID SITEGR1 USUBJID TRTSDT TRTEDT TRTP TRTPN AGE AGEGR1\n <chr> <chr> <chr> <chr> <date> <date> <chr> <dbl> <dbl> <chr> \n1 CDISCPI… 701 701 01-701… 2014-01-02 2014-07-02 Plac… 0 63 <65 \n2 CDISCPI… 701 701 01-701… 2012-08-05 2012-09-01 Plac… 0 64 <65 \n3 CDISCPI… 701 701 01-701… 2013-07-19 2014-01-14 Xano… 81 71 65-80 \n4 CDISCPI… 701 701 01-701… 2014-03-18 2014-03-31 Xano… 54 74 65-80 \n5 CDISCPI… 701 701 01-701… 2014-07-01 2014-12-30 Xano… 81 77 65-80 \n6 CDISCPI… 701 701 01-701… 2013-02-12 2013-03-09 Plac… 0 85 >80 \n# ℹ 26 more variables: AGEGR1N <dbl>, RACE <chr>, RACEN <dbl>, SEX <chr>,\n# ITTFL <chr>, EFFFL <chr>, COMP24FL <chr>, AVISIT <chr>, AVISITN <dbl>,\n# VISIT <chr>, VISITNUM <dbl>, ADY <dbl>, ADT <date>, PARAMCD <chr>,\n# PARAM <chr>, PARAMN <dbl>, AVAL <dbl>, ANL01FL <chr>, DTYPE <chr>,\n# AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>,\n# AWU <chr>, QSSEQ <dbl>" }, { - "objectID": "Comp/r-sas_mmrm.html#unstructured", - "href": "Comp/r-sas_mmrm.html#unstructured", - "title": "R vs SAS MMRM", - "section": "Unstructured", - "text": "Unstructured\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = ARMCD|AVISIT / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=un;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corSymm(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nlmer\nlmer(\n FEV1 ~ ARMCD * AVISIT + (0 + AVISIT | USUBJID),\n data = fev_data,\n control = lmerControl(check.nobs.vs.nRE = \"ignore\"),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + us(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" + "objectID": "Comp/r-sas_cmh.html#schemes", + "href": "Comp/r-sas_cmh.html#schemes", + "title": "R vs SAS CMH", + "section": "Schemes", + "text": "Schemes\nIn order to follow a systematic approach to testing, and to cover variations in the CMH test, we considered the traditional 2 x 2 x K design as well as scenarios 
where the generalized CMH test is employed (e.g. 5 x 3 x 3).\nWe present 5 archetype test scenarios that illustrate diverging results, possibly related to sparse data and possibly considered edge cases.\n\n\n\n\n\n\n\n\n\n\nNumber\nSchema\nVariables\nRelevant Test\nDescription\n\n\n\n\n1\n2x2x2\nX = TRTP, Y = SEX, K = AGEGR1\nGeneral Association\nTRTP and AGEGR1 were limited to two categories, overall the the groups were rather balanced\n\n\n3\n2x2x3\nX = TRTP, Y = SEX, K = RACE\nGeneral Association\nGives back NaN in R because RACE is very imbalanced\n\n\n6\n2x5x2\nX = TRTP, Y = AVAL, K = SEX\nRow Means\nCompare Row Means results for R and SAS because Y is ordinal\n\n\n9\n3x5x17\nX = TRTP, Y = AVAL, K = SITEID\nRow Means\nSITEID has many strata and provokes sparse groups, AVAL is ordinal, therefore row means statistic applies here, R threw an error\n\n\n10\n5x3x3\nX = AVAL, Y = AGEGR1, K = TRTP\nCorrelation\nX and Y are ordinal variables and therefore the correlation statistics has to be taken here" }, { - "objectID": "Comp/r-sas_mmrm.html#convergence-times", - "href": "Comp/r-sas_mmrm.html#convergence-times", - "title": "R vs SAS MMRM", - "section": "Convergence Times", - "text": "Convergence Times\n\nFEV Data\nThe mmrm, PROC GLIMMIX, gls, lmer, and glmmTMB functions are applied to the FEV dataset 10 times. The convergence times are recorded for each replicate and are reported in the table below.\n\n\n\nComparison of convergence times: milliseconds\n\n\nImplementation\nMedian\nFirst Quartile\nThird Quartile\n\n\n\n\nmmrm\n56.15\n55.76\n56.30\n\n\nPROC GLIMMIX\n100.00\n100.00\n100.00\n\n\nlmer\n247.02\n245.25\n257.46\n\n\ngls\n687.63\n683.50\n692.45\n\n\nglmmTMB\n715.90\n708.70\n721.57\n\n\n\n\n\nIt is clear from these results that mmrm converges significantly faster than other R functions. Though not demonstrated here, this is generally true regardless of the sample size and covariance structure used. mmrm is faster than PROC GLIMMIX.\n\n\nBCVA Data\nThe MMRM implementations are now applied to the BCVA dataset 10 times. The convergence times are presented below.\n\n\n\nComparison of convergence times: seconds\n\n\nImplementation\nMedian\nFirst Quartile\nThird Quartile\n\n\n\n\nmmrm\n3.36\n3.32\n3.46\n\n\nglmmTMB\n18.65\n18.14\n18.87\n\n\nPROC GLIMMIX\n36.25\n36.17\n36.29\n\n\ngls\n164.36\n158.61\n165.93\n\n\nlmer\n165.26\n157.46\n166.42\n\n\n\n\n\nWe again find that mmrm produces the fastest convergence times on average." 
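For reference, the generalized CMH statistics discussed in the next section can be produced in R with CMHtest() from the vcdExtra package, and base R provides mantelhaen.test() for the classic stratified test. This is a minimal sketch; adcibc is an assumed name for the filtered CDISC Pilot data described above, with TRTP, SEX and AGEGR1 taken as in schema 1.

library(vcdExtra)                                    # provides CMHtest()
tab <- xtabs(~ TRTP + SEX + AGEGR1, data = adcibc)   # 2 x 2 x K contingency table
CMHtest(tab)          # correlation, row mean scores and general association statistics
mantelhaen.test(tab)  # base R Mantel-Haenszel test for the 2 x 2 x K case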
+ "objectID": "Comp/r-sas_cmh.html#cmh-statistics", + "href": "Comp/r-sas_cmh.html#cmh-statistics", + "title": "R vs SAS CMH", + "section": "CMH Statistics", + "text": "CMH Statistics\n\n\nLoading required package: vcd\n\n\nLoading required package: grid\n\n\nLoading required package: gnm\n\n\n\nAttaching package: 'vcdExtra'\n\n\nThe following object is masked from 'package:dplyr':\n\n summarise\n\n\nscenarios this is a test\nAs it can be seen, there are two schemata where R does not provide any results:\n\n\n\n\n\n\n\n\n\nTest\nChi-Square\ndf\np-value\n\n\nSAS\nR\nSAS\nR\nSAS\nR\n\n\n\n\n1\nCorrelation\n0.01767040\n0.0008689711\n1\n1\n0.89424870\n0.9764831\n\n\nRow Means\n0.01767040\n2.4820278527\n1\n2\n0.89424870\n0.2890910\n\n\nGeneral Association\n0.01767040\n2.4820278527\n1\n2\n0.89424870\n0.2890910\n\n\n3*\nCorrelation\n0.00278713\nNaN\n1\n1\n0.95789662\nNaN\n\n\nRow Means\n2.38606985\nNaN\n2\n2\n0.30329938\nNaN\n\n\nGeneral Association\n2.38606985\nNaN\n2\n2\n0.30329938\nNaN\n\n\n6\nCorrelation\n1.14720000\n0.1115439738\n1\n1\n0.28410000\n0.7383931\n\n\nRow Means\n1.14720000\n2.6632420358\n1\n2\n0.28410000\n0.2640489\n\n\nGeneral Association\n2.56720000\n6.5238474681\n4\n8\n0.63260000\n0.5887637\n\n\n9†\nCorrelation\n0.08544312\nNA\n1\nNA\n0.77005225\nNA\n\n\nRow Means\n2.47631367\nNA\n2\nNA\n0.28991809\nNA\n\n\nGeneral Association\n7.03387844\nNA\n8\nNA\n0.53298189\nNA\n\n\n10\nCorrelation\n2.73816085\n0.8715295423\n1\n1\n0.09797747\n0.3505322\n\n\nRow Means\n4.40701092\n3.0445270087\n4\n4\n0.35371641\n0.5504018\n\n\nGeneral Association\n5.73053819\n5.7305381934\n8\n8\n0.67738613\n0.6773861\n\n\n\n* Reason for NaN in schema 3: Stratum k = AMERICAN INDIAN OR ALASKA NATIVE can not be compared because there are only values for one treatment and one gender.\n\n\n† Reason for Error 4: For large sparse table (many strata) CMHTest will occasionally throw an error in solve.default(AVA) because of singularity" }, { - "objectID": "Comp/r-sas_mmrm.html#marginal-treatment-effect-estimates-comparison", - "href": "Comp/r-sas_mmrm.html#marginal-treatment-effect-estimates-comparison", - "title": "R vs SAS MMRM", - "section": "Marginal Treatment Effect Estimates Comparison", - "text": "Marginal Treatment Effect Estimates Comparison\nWe next estimate the marginal mean treatment effects for each visit in the FEV and BCVA datasets using the MMRM fitting procedures. All R implementations’ estimates are reported relative to PROC GLIMMIX’s estimates. Convergence status is also reported.\n\nFEV Data\n\n\n\n\n\n\n\n\n\nThe R procedures’ estimates are very similar to those output by PROC GLIMMIX, though mmrm and gls generate the estimates that are closest to those produced when using SAS. All methods converge using their default optimization arguments.\n\n\nBCVA Data\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nmmrm, gls and lmer produce estimates that are virtually identical to PROC GLIMMIX’s, while glmmTMB does not. This is likely explained by glmmTMB’s failure to converge. Note too that lmer fails to converge." + "objectID": "index.html", + "href": "index.html", + "title": "CAMIS - A PHUSE DVOST Working Group", + "section": "", + "text": "Introduction\nSeveral discrepancies have been discovered in statistical analysis results between different programming languages, even in fully qualified statistical computing environments. Subtle differences exist between the fundamental approaches implemented by each language, yielding differences in results which are each correct in their own right. 
The fact that these differences exist causes unease on the behalf of sponsor companies when submitting to a regulatory agency, as it is uncertain if the agency will view these differences as problematic. In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that it “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce the analyst’s confidence in reliability and, by understanding the source of any discrepancies, one can reinstate confidence in reliability.\n\nMotivation\nThe goal of this project is to demystify conflicting results between software and to help ease the transitions to new languages by providing comparison and comprehensive explanations.\n\n\nRepository\nThe repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and description of any discrepancies.\n\n\n\n\n\n\n\n\nMethods\nR\nSAS\nComparison\n\n\n\n\nSummary Statistics\nRounding\nR\nSAS\nR vs SAS\n\n\nSummary statistics\nR\nSAS\nR vs SAS\n\n\nGeneral Linear Models\nOne Sample t-test\nR\nSAS\nR vs SAS\n\n\nPaired t-test\nR\nSAS\nR vs SAS\n\n\nTwo Sample t-test\n\nSAS\n\n\n\nANOVA\nR\nSAS\nR vs SAS\n\n\nANCOVA\nR\n\n\n\n\nMANOVA\nR\nSAS\nR vs SAS\n\n\nLinear Regression\nR\nSAS\nR vs SAS\n\n\nGeneralized Linear Models\nLogistic Regression\nR\n\n\n\n\nPoisson/Negative Binomial Regression\n\n\n\n\n\nCategorical Repeated Measures\n\n\n\n\n\nCategorical Multiple Imputation\n\n\n\n\n\nNon-parametric Analysis\nWilcoxon signed rank\n\n\n\n\n\nMann-Whitney U/Wilcoxon rank sum\nR\n\n\n\n\nKolmogorov-Smirnov test\n\n\n\n\n\nKruskall-Wallis test\nR\nSAS\nR vs SAS\n\n\nFriedman test\n\n\n\n\n\nJonckheere test\n\n\n\n\n\nCategorical Data Analysis\nBinomial test\n\n\n\n\n\nMcNemar's test\nR\nSAS\nR vs SAS\n\n\nChi-Square Association/Fishers exact\nR\n\nR vs SAS\n\n\nCochran Mantel Haenszel\nR\nSAS\nR vs SAS\n\n\nConfidence Intervals for proportions\n\n\n\n\n\nLinear Mixed Models\nMMRM\nR\nSAS\nR vs SAS\n\n\nGeneralized Linear Mixed Models\nMMRM\n\n\n\n\n\nMultiple Imputation - Continuous Data MAR\nMCMC\n\n\n\n\n\nLinear regression\n\n\n\n\n\nPredictive Mean Matching\n\n\n\n\n\nPropensity Scores\n\n\n\n\n\nMultiple Imputation - Continuous Data MNAR\nDelta Adjustment/Tipping Point\n\n\n\n\n\nReference-Based Imputation/Sequential Methods\n\n\n\n\n\nReference-Based Imputation/Joint Modelling\n\n\n\n\n\nCorrelation\nPearson's/ Spearman's/ Kendall's Rank\nR\n\n\n\n\nSurvival Models\nKaplan-Meier Log-rank test and Cox-PH\nR\nSAS\nR vs SAS\n\n\nAccelerated Failure Time\n\n\n\n\n\nNon-proportional hazards methods\n\n\n\n\n\nSample size /Power calculations\nSingle timepoint analysis\n\n\n\n\n\nGroup-sequential designs\n\n\n\n\n\nMultivariate methods\nClustering\n\n\n\n\n\nFactor analysis\n\n\n\n\n\nPCA\n\n\n\n\n\nCanonical correlation\n\n\n\n\n\nPLS\n\n\n\n\n\nOther Methods\nSurvey statistics\n\n\n\n\n\nNearest neighbour\n\n\n\n\n\nCausal inference\n\n\n\n\n\nMachine learning" }, { - "objectID": "Comp/r-sas_mmrm.html#impact-of-missing-data-on-convergence-rates", - "href": "Comp/r-sas_mmrm.html#impact-of-missing-data-on-convergence-rates", - "title": "R vs SAS MMRM", - "section": "Impact of Missing Data on Convergence Rates", - "text": "Impact of Missing Data on Convergence Rates\nThe results of the previous benchmark suggest that the amount of 
patients missing from later time points affect certain implementations’ capacity to converge. We investigate this further by simulating data using a data-generating process similar to that of the BCVA datasets, though with various rates of patient dropout.\nTen datasets of 200 patients are generated each of the following levels of missingness: none, mild, moderate, and high. In all scenarios, observations are missing at random. The number patients observed at each visit is obtained for one replicated dataset at each level of missingness is presented in the table below.\n\n\n\nNumber of patients per visit\n\n\n\nnone\nmild\nmoderate\nhigh\n\n\n\n\nVIS01\n200\n196.7\n197.6\n188.1\n\n\nVIS02\n200\n195.4\n194.4\n182.4\n\n\nVIS03\n200\n195.1\n190.7\n175.2\n\n\nVIS04\n200\n194.1\n188.4\n162.8\n\n\nVIS05\n200\n191.6\n182.5\n142.7\n\n\nVIS06\n200\n188.2\n177.3\n125.4\n\n\nVIS07\n200\n184.6\n168.0\n105.9\n\n\nVIS08\n200\n178.5\n155.4\n82.6\n\n\nVIS09\n200\n175.3\n139.9\n58.1\n\n\nVIS10\n200\n164.1\n124.0\n39.5\n\n\n\n\n\nThe convergence rates of all implementations for stratified by missingness level is presented in the plot below.\n\n\n\n\n\n\n\n\n\nmmrm, gls, and PROC GLIMMIX are resilient to missingness, only exhibiting some convergence problems in the scenarios with the most missingness. These implementations converged in all the other scenarios’ replicates. glmmTMB, on the other hand, has convergence issues in the no-, mild-, and high-missingness datasets, with the worst convergence rate occurring in the datasets with the most dropout. Finally, lmer is unreliable in all scenarios, suggesting that it’s convergence issues stem from something other than the missing observations.\nNote that the default optimization schemes are used for each method; these schemes can be modified to potentially improve convergence rates.\nA more comprehensive simulation study using data-generating processes similar to the one used here is outlined in the simulations/missing-data-benchmarks subdirectory. In addition to assessing the effect of missing data on software convergence rates, we also evaluate these methods’ fit times and empirical bias, variance, 95% coverage rates, type I error rates and type II error rates. mmrm is found to be the most most robust software for fitting MMRMs in scenarios where a large proportion of patients are missing from the last time points. Additionally, mmrm has the fastest average fit times regardless of the amount of missingness. All implementations considered produce similar empirical biases, variances, 95% coverage rates, type I error rates and type II error rates." + "objectID": "Conferences.html", + "href": "Conferences.html", + "title": "Conferences 2024", + "section": "", + "text": "2024 Conference Schedule\nList of seminars and conferences that the CAMIS team will be attending in 2024.\nIf you are a volunteer on the CAMIS project and plan to present at a seminar or conference, please add details of the conference below. 
For help with slides or content go to HERE.\nTo cite the CAMIS project work in online content or presentations please use: “Content reproduced with the permission of PHUSE CAMIS - A DVOST Working Group”.\n\n\n\nConference name\nDate (2024)\nLocation\nName Attending\nDetails\nWebsite\n\n\n\n\nPhuse US Connect\n25-28 Feb\nBethesa, Maryland, USA\nSoma Sekhar Sriadibhatla, Vikash Jain, Brian Varney\nPoster\nConnect\n\n\nPhuse/FDA CSS\n3-5 June\nSilver Spring Maryland, USA\nSoma Sekhar Sriadibhatla, Vikash Jain, Harshal Khanolkar\nWorkshop\nCSS\n\n\nRSS Local Group Seminar\n28 Feb\nSheffield, England\nLyn Taylor\nSeminar\nRSS\n\n\n\n\n\nYearly Conference Planner\nTo help to plan our attendance throughout the year, here is a list of conferences we are looking to send representation to. If you plan to attend one of these conferences and are interested in representing us, then please get in touch.\n\n\n\nConference name\nUsual Abstract Deadline\nUsual Conference Date\nRegion\nLinks\n\n\n\n\nJoint Statistical Meetings (JSM) American Statistical Association (ASA)\n1st February\n1st week of August\nUSA\nJSM-ASA\n\n\nASA Biopharmaceutical Section Regulatory-Industry Statistics workshop\nEnd March\nLast week of September\nUSA\nBIOP\n\n\nPhuse US Connect\nNovember\nLast week of Feb\nUSA\nCDISC\n\n\nPhuse/FDA Computational Science Symposium(CSS)\nDecember\n1st week of June\nUSA\nCSS\n\n\nIASCT (ConSPIC)\nMid March\nEarly May\nIndia\nIASCT\n\n\nSociety of Clinical Trials (SCT)\nJanuary\nMid May\nUSA\nSCT\n\n\nPharmaSUG\nMid Jan\nMid May\nUSA\nPharmaSUG\n\n\nuseR\nEarly March\nEarly July\nEurope/Online\nuseR\n\n\nPSI\nNov-oral, Feb-Poster\nMid June\nEurope\nPSI\n\n\nDIA Global\nEnd Feb-poster\nMid June\nUSA\nDIA-USA\n\n\nDIA Europe\nNov\nMid March\nEurope\nDIA-Europe\n\n\nDIA China\nJan\nMid May\nChina\nDIA-China\n\n\nInternational Society for Clinical Biostatistics (ISCB)\nMid Feb\nMid July\nEurope\nISCB\n\n\nRoyal Statistical Society (RSS)\nEarly April\nEarly September\nEngland\nRSS\n\n\nSouthEast SAS User Group (SESUG)\nEnd Feb\nEnd Sept\nMaryland, USA\nSESUG\n\n\nPHUSE EU Connect\nMid March\nMid Nov\nEurope\nPHUSE EU Connect\n\n\nR/Pharma\nApril\nMid October\nVirtual\nR/pharma\n\n\nPOSIT conf.\nInvite only\nSeptember\nUSA\nPOSIT conf" }, { - "objectID": "Comp/r-sas_kruskalwallis.html", - "href": "Comp/r-sas_kruskalwallis.html", - "title": "Kruskal Wallis R v SAS", + "objectID": "contribution.html", + "href": "contribution.html", + "title": "Contributions", "section": "", - "text": "From the individual R and SAS pages, performing the Kruskal-Wallis test in R using:\n\nkruskal.test(Sepal_Width~Species, data=iris_sub)\n\nand in SAS using:\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;\n\nproduced the same results for the test statistic and asymptotic p-value.\nThere is a difference between languages in that SAS provides the EXACT option to easily output the exact p-value, where R does not seem to have an equivalent. A Monte Carlo permutation test may offer an alternative to the exact test on R. The coin package could help in implementing this." + "text": "Request for Contributions\nAlthough this project does have a core team, the endeavor of tracking all these comparisons will fail without community contributions. We welcome a wide variety of contributions from correcting small typos all the way to full write-ups comparing software (languages) for a method.\nPlease contribute by submitting a pull request and our team will review it. 
If you are adding a page please follow one of our templates:\n\nR template\n\nInstructions for Contributions to the CAMIS repository\nThe following instructions assume you have not done a contribution in the past. If you have contributed in the past skip down to step 5\n\nYou will need to get git, github, and RStudio setup to talk to each other. To do this you will need to have a github account and git installed on your computer. To connect your computer to github, we tend to recommend using a PAT because it is bit easier than SSH. We have a script that can help you set that up, found here. For more information Jenny Bryan has a great bookdown explaining how to get setup, alternatively see the following link for a short guidance.\nNow with RStudio all setup, you will need to “fork” the repository, which basically mean you want to make a copy of this repository that you own, so it will be under your github profile. This will allow you to make changes, without needing direct permission. To do this you will need to go into github, into the CAMIS repo, and click “fork”. This will give you some options of how you want to fork the repo, honestly you can just keep the defaults and then click “Create fork”\n\nOnce you’ve created a copy of this repository, you’ll need to clone it from GitHub to your computer. Click the “code” button to do this. The method you’ll use, either “HTTPS” or “SSH”, depends on how you’ve connected your computer to GitHub. If you’ve set up using a PAT, select the “HTTPS” tab. If you’ve used “SSH”, then choose that tab. Either way, you will need to copy the location in the box.\n\nIn RStudio, you will need to create a new project and select “Version Control” in the project wizard. Then you will select “Git” and finally paste the location copied from github into the URL box. Finally hit “Create Project” and you should be good to go!\n\nGo into RStudio and Create a branch – Give you are working from your own fork, this step it is a bit optional. It is up to you if you want to make a separate branch or not. But, it is generally considered good practice, especially if you are planning on contributing regularly. To do this from RStudio click the branch button (on the git tab top right). Within the box that comes up ensure you are on the “remote=origin” and “Sync branch with remote” is checked. You can name the branch something to do with the amends you intend to make.\nEdit and /or add files within the CAMIS directories. If you are adding SAS guidance store under sas folder, R guidance store under r folder, for “SAS vs R” comparison store under comp. Follow the naming convention of the files already stored in those folders.\nWithin R studio - Commit each change or new file added, and push to the repo from within R studio. Once you have completed the change you want to make, it is time for a pull request. Before we start though, it is good to check that your branch on github contains all the update you have done. If not you may need to push from Rstudio before moving onto the pull request.\n\n\n\nThis is what it will look like if you still need to push\n\n\nPull request in github - Back on your fork in github you will see that your repo is now ahead of the main CAMIS repository. The first thing you want to do is make sure there aren’t any conflict that have arisen with the main repository, so you need to click ‘Sync fork’.\n\nIf that is all good then you can create a pull request by clicking on ‘Contribute’ and then ‘Open pull request’. 
This brings you to a page where you can explain your pull request if you like or you can just confirm you would like to go through with this pull request. The final step is to add a reviewer, please add DrLynTaylor and statasaurus. For more details about making pull requests see create a pull request.\nOnce your change is approved, and merged into the origin, you will be able to see your changes on CAMIS. If you have made a branch in your fork the branch will be deleted and you will need to create a new branch to add further contributions. NOTE: you can make the new branch called the same as the old one if you wish but ensure you select to overwrite the previous one." }, { - "objectID": "Comp/r-sas_kruskalwallis.html#kruskal-wallis-r-and-sas", - "href": "Comp/r-sas_kruskalwallis.html#kruskal-wallis-r-and-sas", - "title": "Kruskal Wallis R v SAS", + "objectID": "Comp/r-sas_ttest_1Sample.html", + "href": "Comp/r-sas_ttest_1Sample.html", + "title": "R vs SAS One Sample T-Test", "section": "", - "text": "From the individual R and SAS pages, performing the Kruskal-Wallis test in R using:\n\nkruskal.test(Sepal_Width~Species, data=iris_sub)\n\nand in SAS using:\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;\n\nproduced the same results for the test statistic and asymptotic p-value.\nThere is a difference between languages in that SAS provides the EXACT option to easily output the exact p-value, where R does not seem to have an equivalent. A Monte Carlo permutation test may offer an alternative to the exact test on R. The coin package could help in implementing this." + "text": "The following table shows the types of One Sample t-test analysis, the capabilities of each language, and whether or not the results from each language match.\n\n\n\nAnalysis\nSupported in R\nSupported in SAS\nResults Match\nNotes\n\n\n\n\nOne sample t-test, normal data\nYes\nYes\nYes\nIn Base R, use mu parameter on t.test() function to set null hypothesis value\n\n\nOne sample t-test, lognormal data\nMaybe\nYes\nNA\nMay be supported by envstats package\n\n\n\n\n\n\n\nHere is a table of comparison values between t.test(), proc_ttest(), and SAS PROC TTEST:\n\n\n\n\n\n\n\n\n\n\n\nStatistic\nt.test()\nproc_ttest()\nPROC TTEST\nMatch\nNotes\n\n\n\n\nDegrees of Freedom\n29\n29\n29\nYes\n\n\n\nt value\n2.364306\n2.364306\n2.364306\nYes\n\n\n\np value\n0.0249741\n0.0249741\n0.0249741\nYes\n\n\n\n\n\n\n\nSince there is currently no known support for lognormal t-test in R, this comparison is not applicable." }, { - "objectID": "Comp/r-sas-summary-stats.html", - "href": "Comp/r-sas-summary-stats.html", - "title": "Deriving Quantiles or Percentiles in R vs SAS", + "objectID": "Comp/r-sas_ttest_1Sample.html#comparison-results", + "href": "Comp/r-sas_ttest_1Sample.html#comparison-results", + "title": "R vs SAS One Sample T-Test", "section": "", - "text": "Data\nThe following data will be used show the differences between the default percentile definitions used by SAS and R:\n\n10, 20, 30, 40, 150, 160, 170, 180, 190, 200\n\n\n\nSAS Code\nAssuming the data above is stored in the variable aval within the dataset adlb, the 25th and 40th percentiles could be calculated using the following code.\n\nproc univariate data=adlb;\n var aval;\n output out=stats pctlpts=25 40 pctlpre=p;\nrun;\n\nThis procedure creates the dataset stats containing the variables p25 and p40.\n\n\n\n\n\n\n\n\n\nThe procedure has the option PCTLDEF which allows for five different percentile definitions to be used. 
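As an optional alternative to the manual fork/branch/pull-request steps above, the usethis package can drive the same workflow from within R. This is a sketch, not part of the documented process, and it assumes git and a GitHub PAT are already configured as described earlier.

# install.packages(\"usethis\")
usethis::create_from_github(\"PSIAIMS/CAMIS\", fork = TRUE)  # fork and clone the repository
usethis::pr_init(\"my-new-page\")                            # create a working branch
# edit and commit files as described above, then
usethis::pr_push()                                          # push and open the pull request in the browser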
The default is PCTLDEF=5.\n\n\nR code\nThe 25th and 40th percentiles of aval can be calculated using the quantile function.\n\nquantile(adlb$aval, probs = c(0.25, 0.4))\n\nThis gives the following output.\n\n\n 25% 40% \n 32.5 106.0 \n\n\nThe function has the argument type which allows for nine different percentile definitions to be used. The default is type = 7.\n\n\nComparison\nThe default percentile definition used by the UNIVARIATE procedure in SAS finds the 25th and 40th percentiles to be 30 and 95. The default definition used by R finds these percentiles to be 32.5 and 106.\nIt is possible to get the quantile function in R to use the same definition as the default used in SAS, by specifying type=2.\n\nquantile(adlb$aval, probs = c(0.25, 0.4), type=2)\n\nThis gives the following output.\n\n\n25% 40% \n 30 95 \n\n\nIt is not possible to get the UNIVARIATE procedure in SAS to use the same definition as the default used in R.\nRick Wicklin provided a blog post showing how SAS has built in support for calculations using 5 of the 9 percentile definitions available in R, and also demonstrated how you can use a SAS/IML function to calculate percentiles using the other 4 definitions.\nMore information about quantile derivation can be found in the SAS blog.\n\n\nKey references:\nCompare the default definitions for sample quantiles in SAS, R, and Python\nSample quantiles: A comparison of 9 definitions\nHyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361-365." + "text": "Here is a table of comparison values between t.test(), proc_ttest(), and SAS PROC TTEST:\n\n\n\n\n\n\n\n\n\n\n\nStatistic\nt.test()\nproc_ttest()\nPROC TTEST\nMatch\nNotes\n\n\n\n\nDegrees of Freedom\n29\n29\n29\nYes\n\n\n\nt value\n2.364306\n2.364306\n2.364306\nYes\n\n\n\np value\n0.0249741\n0.0249741\n0.0249741\nYes\n\n\n\n\n\n\n\nSince there is currently no known support for lognormal t-test in R, this comparison is not applicable." }, { "objectID": "Comp/r-sas_ttest_Paired.html", @@ -476,228 +455,242 @@ "text": "Here is a table of comparison values between t.test(), proc_ttest(), and SAS PROC TTEST:\n\n\n\n\n\n\n\n\n\n\n\nStatistic\nt.test()\nproc_ttest()\nPROC TTEST\nMatch\nNotes\n\n\n\n\nDegrees of Freedom\n11\n11\n11\nYes\n\n\n\nt value\n-1.089648\n-1.089648\n-1.089648\nYes\n\n\n\np value\n0.2992\n0.2992\n0.2992\nYes\n\n\n\n\n\n\n\nSince there is currently no known support for lognormal t-test in R, this comparison is not applicable." 
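A minimal sketch reproducing both percentile conventions side by side on the example data above, using only base R; the values in the comments are the ones quoted in the comparison.

aval <- c(10, 20, 30, 40, 150, 160, 170, 180, 190, 200)
quantile(aval, probs = c(0.25, 0.4))             # type = 7, the R default: 32.5 and 106
quantile(aval, probs = c(0.25, 0.4), type = 2)   # reproduces the SAS default (PCTLDEF=5): 30 and 95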
}, { - "objectID": "Comp/r-sas_ttest_1Sample.html", - "href": "Comp/r-sas_ttest_1Sample.html", - "title": "R vs SAS One Sample T-Test", + "objectID": "Comp/r-sas-summary-stats.html", + "href": "Comp/r-sas-summary-stats.html", + "title": "Deriving Quantiles or Percentiles in R vs SAS", "section": "", - "text": "The following table shows the types of One Sample t-test analysis, the capabilities of each language, and whether or not the results from each language match.\n\n\n\nAnalysis\nSupported in R\nSupported in SAS\nResults Match\nNotes\n\n\n\n\nOne sample t-test, normal data\nYes\nYes\nYes\nIn Base R, use mu parameter on t.test() function to set null hypothesis value\n\n\nOne sample t-test, lognormal data\nMaybe\nYes\nNA\nMay be supported by envstats package\n\n\n\n\n\n\n\nHere is a table of comparison values between t.test(), proc_ttest(), and SAS PROC TTEST:\n\n\n\n\n\n\n\n\n\n\n\nStatistic\nt.test()\nproc_ttest()\nPROC TTEST\nMatch\nNotes\n\n\n\n\nDegrees of Freedom\n29\n29\n29\nYes\n\n\n\nt value\n2.364306\n2.364306\n2.364306\nYes\n\n\n\np value\n0.0249741\n0.0249741\n0.0249741\nYes\n\n\n\n\n\n\n\nSince there is currently no known support for lognormal t-test in R, this comparison is not applicable." + "text": "Data\nThe following data will be used show the differences between the default percentile definitions used by SAS and R:\n\n10, 20, 30, 40, 150, 160, 170, 180, 190, 200\n\n\n\nSAS Code\nAssuming the data above is stored in the variable aval within the dataset adlb, the 25th and 40th percentiles could be calculated using the following code.\n\nproc univariate data=adlb;\n var aval;\n output out=stats pctlpts=25 40 pctlpre=p;\nrun;\n\nThis procedure creates the dataset stats containing the variables p25 and p40.\n\n\n\n\n\n\n\n\n\nThe procedure has the option PCTLDEF which allows for five different percentile definitions to be used. The default is PCTLDEF=5.\n\n\nR code\nThe 25th and 40th percentiles of aval can be calculated using the quantile function.\n\nquantile(adlb$aval, probs = c(0.25, 0.4))\n\nThis gives the following output.\n\n\n 25% 40% \n 32.5 106.0 \n\n\nThe function has the argument type which allows for nine different percentile definitions to be used. The default is type = 7.\n\n\nComparison\nThe default percentile definition used by the UNIVARIATE procedure in SAS finds the 25th and 40th percentiles to be 30 and 95. The default definition used by R finds these percentiles to be 32.5 and 106.\nIt is possible to get the quantile function in R to use the same definition as the default used in SAS, by specifying type=2.\n\nquantile(adlb$aval, probs = c(0.25, 0.4), type=2)\n\nThis gives the following output.\n\n\n25% 40% \n 30 95 \n\n\nIt is not possible to get the UNIVARIATE procedure in SAS to use the same definition as the default used in R.\nRick Wicklin provided a blog post showing how SAS has built in support for calculations using 5 of the 9 percentile definitions available in R, and also demonstrated how you can use a SAS/IML function to calculate percentiles using the other 4 definitions.\nMore information about quantile derivation can be found in the SAS blog.\n\n\nKey references:\nCompare the default definitions for sample quantiles in SAS, R, and Python\nSample quantiles: A comparison of 9 definitions\nHyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361-365." 
}, { - "objectID": "Comp/r-sas_ttest_1Sample.html#comparison-results", - "href": "Comp/r-sas_ttest_1Sample.html#comparison-results", - "title": "R vs SAS One Sample T-Test", + "objectID": "Comp/r-sas_kruskalwallis.html", + "href": "Comp/r-sas_kruskalwallis.html", + "title": "Kruskal Wallis R v SAS", "section": "", - "text": "Here is a table of comparison values between t.test(), proc_ttest(), and SAS PROC TTEST:\n\n\n\n\n\n\n\n\n\n\n\nStatistic\nt.test()\nproc_ttest()\nPROC TTEST\nMatch\nNotes\n\n\n\n\nDegrees of Freedom\n29\n29\n29\nYes\n\n\n\nt value\n2.364306\n2.364306\n2.364306\nYes\n\n\n\np value\n0.0249741\n0.0249741\n0.0249741\nYes\n\n\n\n\n\n\n\nSince there is currently no known support for lognormal t-test in R, this comparison is not applicable." + "text": "From the individual R and SAS pages, performing the Kruskal-Wallis test in R using:\n\nkruskal.test(Sepal_Width~Species, data=iris_sub)\n\nand in SAS using:\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;\n\nproduced the same results for the test statistic and asymptotic p-value.\nThere is a difference between languages in that SAS provides the EXACT option to easily output the exact p-value, where R does not seem to have an equivalent. A Monte Carlo permutation test may offer an alternative to the exact test on R. The coin package could help in implementing this." }, { - "objectID": "contribution.html", - "href": "contribution.html", - "title": "Contributions", + "objectID": "Comp/r-sas_kruskalwallis.html#kruskal-wallis-r-and-sas", + "href": "Comp/r-sas_kruskalwallis.html#kruskal-wallis-r-and-sas", + "title": "Kruskal Wallis R v SAS", "section": "", - "text": "Request for Contributions\nAlthough this project does have a core team, the endeavor of tracking all these comparisons will fail without community contributions. We welcome a wide variety of contributions from correcting small typos all the way to full write-ups comparing software (languages) for a method.\nPlease contribute by submitting a pull request and our team will review it. If you are adding a page please follow one of our templates:\n\nR template\n\nInstructions for Contributions to the CAMIS repository\nThe following instructions assume you have not done a contribution in the past. If you have contributed in the past skip down to step 5\n\nYou will need to get git, github, and RStudio setup to talk to each other. To do this you will need to have a github account and git installed on your computer. To connect your computer to github, we tend to recommend using a PAT because it is bit easier than SSH. We have a script that can help you set that up, found here. For more information Jenny Bryan has a great bookdown explaining how to get setup, alternatively see the following link for a short guidance.\nNow with RStudio all setup, you will need to “fork” the repository, which basically mean you want to make a copy of this repository that you own, so it will be under your github profile. This will allow you to make changes, without needing direct permission. To do this you will need to go into github, into the CAMIS repo, and click “fork”. This will give you some options of how you want to fork the repo, honestly you can just keep the defaults and then click “Create fork”\n\nOnce you’ve created a copy of this repository, you’ll need to clone it from GitHub to your computer. Click the “code” button to do this. The method you’ll use, either “HTTPS” or “SSH”, depends on how you’ve connected your computer to GitHub. 
If you’ve set up using a PAT, select the “HTTPS” tab. If you’ve used “SSH”, then choose that tab. Either way, you will need to copy the location in the box.\n\nIn RStudio, you will need to create a new project and select “Version Control” in the project wizard. Then you will select “Git” and finally paste the location copied from github into the URL box. Finally hit “Create Project” and you should be good to go!\n\nGo into RStudio and Create a branch – Give you are working from your own fork, this step it is a bit optional. It is up to you if you want to make a separate branch or not. But, it is generally considered good practice, especially if you are planning on contributing regularly. To do this from RStudio click the branch button (on the git tab top right). Within the box that comes up ensure you are on the “remote=origin” and “Sync branch with remote” is checked. You can name the branch something to do with the amends you intend to make.\nEdit and /or add files within the CAMIS directories. If you are adding SAS guidance store under sas folder, R guidance store under r folder, for “SAS vs R” comparison store under comp. Follow the naming convention of the files already stored in those folders.\nWithin R studio - Commit each change or new file added, and push to the repo from within R studio. Once you have completed the change you want to make, it is time for a pull request. Before we start though, it is good to check that your branch on github contains all the update you have done. If not you may need to push from Rstudio before moving onto the pull request.\n\n\n\nThis is what it will look like if you still need to push\n\n\nPull request in github - Back on your fork in github you will see that your repo is now ahead of the main CAMIS repository. The first thing you want to do is make sure there aren’t any conflict that have arisen with the main repository, so you need to click ‘Sync fork’.\n\nIf that is all good then you can create a pull request by clicking on ‘Contribute’ and then ‘Open pull request’. This brings you to a page where you can explain your pull request if you like or you can just confirm you would like to go through with this pull request. The final step is to add a reviewer, please add DrLynTaylor and statasaurus. For more details about making pull requests see create a pull request.\nOnce your change is approved, and merged into the origin, you will be able to see your changes on CAMIS. If you have made a branch in your fork the branch will be deleted and you will need to create a new branch to add further contributions. NOTE: you can make the new branch called the same as the old one if you wish but ensure you select to overwrite the previous one." + "text": "From the individual R and SAS pages, performing the Kruskal-Wallis test in R using:\n\nkruskal.test(Sepal_Width~Species, data=iris_sub)\n\nand in SAS using:\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;\n\nproduced the same results for the test statistic and asymptotic p-value.\nThere is a difference between languages in that SAS provides the EXACT option to easily output the exact p-value, where R does not seem to have an equivalent. A Monte Carlo permutation test may offer an alternative to the exact test on R. The coin package could help in implementing this." 
}, { - "objectID": "Conferences.html", - "href": "Conferences.html", - "title": "Conferences 2024", + "objectID": "Comp/r-sas_mmrm.html", + "href": "Comp/r-sas_mmrm.html", + "title": "R vs SAS MMRM", "section": "", - "text": "2024 Conference Schedule\nList of seminars and conferences that the CAMIS team will be attending in 2024.\nIf you are a volunteer on the CAMIS project and plan to present at a seminar or conference, please add details of the conference below. For help with slides or content go to HERE.\nTo cite the CAMIS project work in online content or presentations please use: “Content reproduced with the permission of PHUSE CAMIS - A DVOST Working Group”.\n\n\n\nConference name\nDate (2024)\nLocation\nName Attending\nDetails\nWebsite\n\n\n\n\nPhuse US Connect\n25-28 Feb\nBethesa, Maryland, USA\nSoma Sekhar Sriadibhatla, Vikash Jain, Brian Varney\nPoster\nConnect\n\n\nPhuse/FDA CSS\n3-5 June\nSilver Spring Maryland, USA\nSoma Sekhar Sriadibhatla, Vikash Jain, Harshal Khanolkar\nWorkshop\nCSS\n\n\nRSS Local Group Seminar\n28 Feb\nSheffield, England\nLyn Taylor\nSeminar\nRSS\n\n\n\n\n\nYearly Conference Planner\nTo help to plan our attendance throughout the year, here is a list of conferences we are looking to send representation to. If you plan to attend one of these conferences and are interested in representing us, then please get in touch.\n\n\n\nConference name\nUsual Abstract Deadline\nUsual Conference Date\nRegion\nLinks\n\n\n\n\nJoint Statistical Meetings (JSM) American Statistical Association (ASA)\n1st February\n1st week of August\nUSA\nJSM-ASA\n\n\nASA Biopharmaceutical Section Regulatory-Industry Statistics workshop\nEnd March\nLast week of September\nUSA\nBIOP\n\n\nPhuse US Connect\nNovember\nLast week of Feb\nUSA\nCDISC\n\n\nPhuse/FDA Computational Science Symposium(CSS)\nDecember\n1st week of June\nUSA\nCSS\n\n\nIASCT (ConSPIC)\nMid March\nEarly May\nIndia\nIASCT\n\n\nSociety of Clinical Trials (SCT)\nJanuary\nMid May\nUSA\nSCT\n\n\nPharmaSUG\nMid Jan\nMid May\nUSA\nPharmaSUG\n\n\nuseR\nEarly March\nEarly July\nEurope/Online\nuseR\n\n\nPSI\nNov-oral, Feb-Poster\nMid June\nEurope\nPSI\n\n\nDIA Global\nEnd Feb-poster\nMid June\nUSA\nDIA-USA\n\n\nDIA Europe\nNov\nMid March\nEurope\nDIA-Europe\n\n\nDIA China\nJan\nMid May\nChina\nDIA-China\n\n\nInternational Society for Clinical Biostatistics (ISCB)\nMid Feb\nMid July\nEurope\nISCB\n\n\nRoyal Statistical Society (RSS)\nEarly April\nEarly September\nEngland\nRSS\n\n\nSouthEast SAS User Group (SESUG)\nEnd Feb\nEnd Sept\nMaryland, USA\nSESUG\n\n\nPHUSE EU Connect\nMid March\nMid Nov\nEurope\nPHUSE EU Connect\n\n\nR/Pharma\nApril\nMid October\nVirtual\nR/pharma\n\n\nPOSIT conf.\nInvite only\nSeptember\nUSA\nPOSIT conf" + "text": "In this vignette we briefly compare the mmrm::mmrm, SAS’s PROC GLIMMIX, nlme::gls, lme4::lmer, and glmmTMB::glmmTMB functions for fitting mixed models for repeated measures (MMRMs). A primary difference in these implementations lies in the covariance structures that are supported “out of the box”. In particular, PROC GLIMMIX and mmrm are the only procedures which provide support for many of the most common MMRM covariance structures. Most covariance structures can be implemented in gls, though users are required to define them manually. lmer and glmmTMB are more limited. We find that mmmrm converges more quickly than other R implementations while also producing estimates that are virtually identical to PROC GLIMMIX’s." 
+ }, + { + "objectID": "Comp/r-sas_mmrm.html#fev-data", + "href": "Comp/r-sas_mmrm.html#fev-data", + "title": "R vs SAS MMRM", + "section": "FEV Data", + "text": "FEV Data\nThe FEV dataset contains measurements of FEV1 (forced expired volume in one second), a measure of how quickly the lungs can be emptied. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD). It is summarized below.\n Stratified by ARMCD\n Overall PBO TRT\n n 800 420 380\n USUBJID (%)\n PT[1-200] 200 105 (52.5) 95 (47.5)\n AVISIT\n VIS1 200 105 95\n VIS2 200 105 95\n VIS3 200 105 95\n VIS4 200 105 95\n RACE (%)\n Asian 280 (35.0) 152 (36.2) 128 (33.7)\n Black or African American 300 (37.5) 184 (43.8) 116 (30.5)\n White 220 (27.5) 84 (20.0) 136 (35.8)\n SEX = Female (%) 424 (53.0) 220 (52.4) 204 (53.7)\n FEV1_BL (mean (SD)) 40.19 (9.12) 40.46 (8.84) 39.90 (9.42)\n FEV1 (mean (SD)) 42.30 (9.32) 40.24 (8.67) 44.45 (9.51)\n WEIGHT (mean (SD)) 0.52 (0.23) 0.52 (0.23) 0.51 (0.23)\n VISITN (mean (SD)) 2.50 (1.12) 2.50 (1.12) 2.50 (1.12)\n VISITN2 (mean (SD)) -0.02 (1.03) 0.01 (1.07) -0.04 (0.98)" + }, + { + "objectID": "Comp/r-sas_mmrm.html#bcva-data", + "href": "Comp/r-sas_mmrm.html#bcva-data", + "title": "R vs SAS MMRM", + "section": "BCVA Data", + "text": "BCVA Data\nThe BCVA dataset contains data from a randomized longitudinal ophthalmology trial evaluating the change in baseline corrected visual acuity (BCVA) over the course of 10 visits. BCVA corresponds to the number of letters read from a visual acuity chart. A summary of the data is given below:\n Stratified by ARMCD\n Overall CTL TRT\n n 8605 4123 4482\n USUBJID (%)\n PT[1-1000] 1000 494 (49.4) 506 (50.6)\n AVISIT\n VIS1 983 482 501\n VIS2 980 481 499\n VIS3 960 471 489\n VIS4 946 458 488\n VIS5 925 454 471\n VIS6 868 410 458\n VIS7 816 388 428\n VIS8 791 371 420\n VIS9 719 327 392\n VIS10 617 281 336\n RACE (%)\n Asian 297 (29.7) 151 (30.6) 146 (28.9)\n Black or African American 317 (31.7) 149 (30.1) 168 (33.2)\n White 386 (38.6) 194 (39.3) 192 (37.9)\n BCVA_BL (mean (SD)) 75.12 (9.93) 74.90 (9.76) 75.40 (10.1)\n BCVA_CHG (mean (SD))\n VIS1 5.59 (1.31) 5.32 (1.23) 5.86 (1.33)\n VIS10 9.18 (2.91) 7.49 (2.58) 10.60 (2.36)" }, { - "objectID": "index.html", - "href": "index.html", - "title": "CAMIS - A PHUSE DVOST Working Group", - "section": "", - "text": "Introduction\nSeveral discrepancies have been discovered in statistical analysis results between different programming languages, even in fully qualified statistical computing environments. Subtle differences exist between the fundamental approaches implemented by each language, yielding differences in results which are each correct in their own right. The fact that these differences exist causes unease on the behalf of sponsor companies when submitting to a regulatory agency, as it is uncertain if the agency will view these differences as problematic. 
In its Statistical Software Clarifying Statement, the US Food and Drug Administration (FDA) states that it “FDA does not require use of any specific software for statistical analyses” and that “the computer software used for data management and statistical analysis should be reliable.” Observing differences across languages can reduce the analyst’s confidence in reliability and, by understanding the source of any discrepancies, one can reinstate confidence in reliability.\n\nMotivation\nThe goal of this project is to demystify conflicting results between software and to help ease the transitions to new languages by providing comparison and comprehensive explanations.\n\n\nRepository\nThe repository below provides examples of statistical methodology in different software and languages, along with a comparison of the results obtained and description of any discrepancies.\n\n\n\n\n\n\n\n\nMethods\nR\nSAS\nComparison\n\n\n\n\nSummary Statistics\nRounding\nR\nSAS\nR vs SAS\n\n\nSummary statistics\nR\nSAS\nR vs SAS\n\n\nGeneral Linear Models\nOne Sample t-test\nR\nSAS\nR vs SAS\n\n\nPaired t-test\nR\nSAS\nR vs SAS\n\n\nTwo Sample t-test\n\nSAS\n\n\n\nANOVA\nR\nSAS\nR vs SAS\n\n\nANCOVA\nR\n\n\n\n\nMANOVA\nR\nSAS\nR vs SAS\n\n\nLinear Regression\nR\nSAS\nR vs SAS\n\n\nGeneralized Linear Models\nLogistic Regression\nR\n\n\n\n\nPoisson/Negative Binomial Regression\n\n\n\n\n\nCategorical Repeated Measures\n\n\n\n\n\nCategorical Multiple Imputation\n\n\n\n\n\nNon-parametric Analysis\nWilcoxon signed rank\n\n\n\n\n\nMann-Whitney U/Wilcoxon rank sum\nR\n\n\n\n\nKolmogorov-Smirnov test\n\n\n\n\n\nKruskall-Wallis test\nR\nSAS\nR vs SAS\n\n\nFriedman test\n\n\n\n\n\nJonckheere test\n\n\n\n\n\nCategorical Data Analysis\nBinomial test\n\n\n\n\n\nMcNemar's test\nR\nSAS\nR vs SAS\n\n\nChi-Square Association/Fishers exact\nR\n\nR vs SAS\n\n\nCochran Mantel Haenszel\nR\nSAS\nR vs SAS\n\n\nConfidence Intervals for proportions\n\n\n\n\n\nLinear Mixed Models\nMMRM\nR\nSAS\nR vs SAS\n\n\nGeneralized Linear Mixed Models\nMMRM\n\n\n\n\n\nMultiple Imputation - Continuous Data MAR\nMCMC\n\n\n\n\n\nLinear regression\n\n\n\n\n\nPredictive Mean Matching\n\n\n\n\n\nPropensity Scores\n\n\n\n\n\nMultiple Imputation - Continuous Data MNAR\nDelta Adjustment/Tipping Point\n\n\n\n\n\nReference-Based Imputation/Sequential Methods\n\n\n\n\n\nReference-Based Imputation/Joint Modelling\n\n\n\n\n\nCorrelation\nPearson's/ Spearman's/ Kendall's Rank\nR\n\n\n\n\nSurvival Models\nKaplan-Meier Log-rank test and Cox-PH\nR\nSAS\nR vs SAS\n\n\nAccelerated Failure Time\n\n\n\n\n\nNon-proportional hazards methods\n\n\n\n\n\nSample size /Power calculations\nSingle timepoint analysis\n\n\n\n\n\nGroup-sequential designs\n\n\n\n\n\nMultivariate methods\nClustering\n\n\n\n\n\nFactor analysis\n\n\n\n\n\nPCA\n\n\n\n\n\nCanonical correlation\n\n\n\n\n\nPLS\n\n\n\n\n\nOther Methods\nSurvey statistics\n\n\n\n\n\nNearest neighbour\n\n\n\n\n\nCausal inference\n\n\n\n\n\nMachine learning" + "objectID": "Comp/r-sas_mmrm.html#ante-dependence-heterogeneous", + "href": "Comp/r-sas_mmrm.html#ante-dependence-heterogeneous", + "title": "R vs SAS MMRM", + "section": "Ante-dependence (heterogeneous)", + "text": "Ante-dependence (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=ANTE(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + adh(VISITN | USUBJID),\n data = fev_data\n)" }, { - 
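For orientation, a minimal R sketch of fitting and summarising the model above with the mmrm package (fev_data is the example dataset shipped with mmrm; requesting Satterthwaite degrees of freedom mirrors the ddfm=satterthwaite option in the PROC GLIMMIX calls shown here):

library(mmrm)
fit <- mmrm(
  FEV1 ~ ARMCD * AVISIT + adh(VISITN | USUBJID),  # heterogeneous ante-dependence covariance
  data = fev_data,
  method = "Satterthwaite"
)
summary(fit)  # coefficient table with Satterthwaite-adjusted degrees of freedom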
"objectID": "Comp/r-sas_cmh.html", - "href": "Comp/r-sas_cmh.html", - "title": "R vs SAS CMH", - "section": "", - "text": "The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.\n\n\n\n\n\n\n\n\n\n\n\nFor the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:\n\nX = exposure\nY = response\nK = control\n\n\n\n\nThe scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. Each test statistic is evaluated on different degrees of freedom (df):\n\nGeneral association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs\nRow mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs\nNonzero correlation statistic (X and Y both ordinal) results in 1 df" + "objectID": "Comp/r-sas_mmrm.html#ante-dependence-homogeneous", + "href": "Comp/r-sas_mmrm.html#ante-dependence-homogeneous", + "title": "R vs SAS MMRM", + "section": "Ante-dependence (homogeneous)", + "text": "Ante-dependence (homogeneous)\n\nmmrm\nmmrm(\n formula =FEV1 ~ ARMCD * AVISIT + ad(VISITN | USUBJID),\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_cmh.html#naming-convention", - "href": "Comp/r-sas_cmh.html#naming-convention", - "title": "R vs SAS CMH", - "section": "", - "text": "For the remainder of this document, we adopt the following naming convention when referring to variables of a contingency table:\n\nX = exposure\nY = response\nK = control" + "objectID": "Comp/r-sas_mmrm.html#auto-regressive-heterogeneous", + "href": "Comp/r-sas_mmrm.html#auto-regressive-heterogeneous", + "title": "R vs SAS MMRM", + "section": "Auto-regressive (heterogeneous)", + "text": "Auto-regressive (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=ARH(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + ar1h(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCAR1(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)" }, { - "objectID": "Comp/r-sas_cmh.html#scale", - "href": "Comp/r-sas_cmh.html#scale", - "title": "R vs SAS CMH", - "section": "", - "text": "The scale of the exposure (X) and response (Y) variables dictate which test statistic is computed for the contingency table. 
Each test statistic is evaluated on different degrees of freedom (df):\n\nGeneral association statistic (X and Y both nominal) results in (X-1) * (Y-1) dfs\nRow mean scores statistic (X is nominal and Y is ordinal) results in X-1 dfs\nNonzero correlation statistic (X and Y both ordinal) results in 1 df" + "objectID": "Comp/r-sas_mmrm.html#auto-regressive-homogeneous", + "href": "Comp/r-sas_mmrm.html#auto-regressive-homogeneous", + "title": "R vs SAS MMRM", + "section": "Auto-regressive (homogeneous)", + "text": "Auto-regressive (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = ARMCD|AVISIT / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=AR(1);\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + ar1(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCAR1(form = ~AVISIT | USUBJID),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + ar1(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_cmh.html#data", - "href": "Comp/r-sas_cmh.html#data", - "title": "R vs SAS CMH", - "section": "Data", - "text": "Data\nTo begin investigating the differences in the SAS and R implementations of the CMH test, we decided to use the CDISC Pilot data set, which is publicly available on the PHUSE Test Data Factory repository. We applied very basic filtering conditions upfront (see below) and this data set served as the basis of the examples to follow.\n\n\n# A tibble: 6 × 36\n STUDYID SITEID SITEGR1 USUBJID TRTSDT TRTEDT TRTP TRTPN AGE AGEGR1\n <chr> <chr> <chr> <chr> <date> <date> <chr> <dbl> <dbl> <chr> \n1 CDISCPI… 701 701 01-701… 2014-01-02 2014-07-02 Plac… 0 63 <65 \n2 CDISCPI… 701 701 01-701… 2012-08-05 2012-09-01 Plac… 0 64 <65 \n3 CDISCPI… 701 701 01-701… 2013-07-19 2014-01-14 Xano… 81 71 65-80 \n4 CDISCPI… 701 701 01-701… 2014-03-18 2014-03-31 Xano… 54 74 65-80 \n5 CDISCPI… 701 701 01-701… 2014-07-01 2014-12-30 Xano… 81 77 65-80 \n6 CDISCPI… 701 701 01-701… 2013-02-12 2013-03-09 Plac… 0 85 >80 \n# ℹ 26 more variables: AGEGR1N <dbl>, RACE <chr>, RACEN <dbl>, SEX <chr>,\n# ITTFL <chr>, EFFFL <chr>, COMP24FL <chr>, AVISIT <chr>, AVISITN <dbl>,\n# VISIT <chr>, VISITNUM <dbl>, ADY <dbl>, ADT <date>, PARAMCD <chr>,\n# PARAM <chr>, PARAMN <dbl>, AVAL <dbl>, ANL01FL <chr>, DTYPE <chr>,\n# AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>,\n# AWU <chr>, QSSEQ <dbl>" + "objectID": "Comp/r-sas_mmrm.html#compound-symmetry-heterogeneous", + "href": "Comp/r-sas_mmrm.html#compound-symmetry-heterogeneous", + "title": "R vs SAS MMRM", + "section": "Compound symmetry (heterogeneous)", + "text": "Compound symmetry (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=CSH;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + csh(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCompSymm(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + cs(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_cmh.html#schemes", - "href": "Comp/r-sas_cmh.html#schemes", - "title": "R vs SAS 
CMH", - "section": "Schemes", - "text": "Schemes\nIn order to follow a systematic approach to testing, and to cover variations in the CMH test, we considered the traditional 2 x 2 x K design as well as scenarios where the generalized CMH test is employed (e.g. 5 x 3 x 3).\nWe present 5 archetype test scenarios that illustrate diverging results, possibly related to sparse data and possibly considered edge cases.\n\n\n\n\n\n\n\n\n\n\nNumber\nSchema\nVariables\nRelevant Test\nDescription\n\n\n\n\n1\n2x2x2\nX = TRTP, Y = SEX, K = AGEGR1\nGeneral Association\nTRTP and AGEGR1 were limited to two categories, overall the the groups were rather balanced\n\n\n3\n2x2x3\nX = TRTP, Y = SEX, K = RACE\nGeneral Association\nGives back NaN in R because RACE is very imbalanced\n\n\n6\n2x5x2\nX = TRTP, Y = AVAL, K = SEX\nRow Means\nCompare Row Means results for R and SAS because Y is ordinal\n\n\n9\n3x5x17\nX = TRTP, Y = AVAL, K = SITEID\nRow Means\nSITEID has many strata and provokes sparse groups, AVAL is ordinal, therefore row means statistic applies here, R threw an error\n\n\n10\n5x3x3\nX = AVAL, Y = AGEGR1, K = TRTP\nCorrelation\nX and Y are ordinal variables and therefore the correlation statistics has to be taken here" + "objectID": "Comp/r-sas_mmrm.html#compound-symmetry-homogeneous", + "href": "Comp/r-sas_mmrm.html#compound-symmetry-homogeneous", + "title": "R vs SAS MMRM", + "section": "Compound symmetry (homogeneous)", + "text": "Compound symmetry (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=CS;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + cs(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corCompSymm(form = ~AVISIT | USUBJID),\n na.action = na.omit\n)" }, { - "objectID": "Comp/r-sas_cmh.html#cmh-statistics", - "href": "Comp/r-sas_cmh.html#cmh-statistics", - "title": "R vs SAS CMH", - "section": "CMH Statistics", - "text": "CMH Statistics\n\n\nLoading required package: vcd\n\n\nLoading required package: grid\n\n\nLoading required package: gnm\n\n\n\nAttaching package: 'vcdExtra'\n\n\nThe following object is masked from 'package:dplyr':\n\n summarise\n\n\nscenarios this is a test\nAs it can be seen, there are two schemata where R does not provide any results:\n\n\n\n\n\n\n\n\n\nTest\nChi-Square\ndf\np-value\n\n\nSAS\nR\nSAS\nR\nSAS\nR\n\n\n\n\n1\nCorrelation\n0.01767040\n0.0008689711\n1\n1\n0.89424870\n0.9764831\n\n\nRow Means\n0.01767040\n2.4820278527\n1\n2\n0.89424870\n0.2890910\n\n\nGeneral Association\n0.01767040\n2.4820278527\n1\n2\n0.89424870\n0.2890910\n\n\n3*\nCorrelation\n0.00278713\nNaN\n1\n1\n0.95789662\nNaN\n\n\nRow Means\n2.38606985\nNaN\n2\n2\n0.30329938\nNaN\n\n\nGeneral Association\n2.38606985\nNaN\n2\n2\n0.30329938\nNaN\n\n\n6\nCorrelation\n1.14720000\n0.1115439738\n1\n1\n0.28410000\n0.7383931\n\n\nRow Means\n1.14720000\n2.6632420358\n1\n2\n0.28410000\n0.2640489\n\n\nGeneral Association\n2.56720000\n6.5238474681\n4\n8\n0.63260000\n0.5887637\n\n\n9†\nCorrelation\n0.08544312\nNA\n1\nNA\n0.77005225\nNA\n\n\nRow Means\n2.47631367\nNA\n2\nNA\n0.28991809\nNA\n\n\nGeneral Association\n7.03387844\nNA\n8\nNA\n0.53298189\nNA\n\n\n10\nCorrelation\n2.73816085\n0.8715295423\n1\n1\n0.09797747\n0.3505322\n\n\nRow Means\n4.40701092\n3.0445270087\n4\n4\n0.35371641\n0.5504018\n\n\nGeneral 
Association\n5.73053819\n5.7305381934\n8\n8\n0.67738613\n0.6773861\n\n\n\n* Reason for NaN in schema 3: Stratum k = AMERICAN INDIAN OR ALASKA NATIVE can not be compared because there are only values for one treatment and one gender.\n\n\n† Reason for Error 4: For large sparse table (many strata) CMHTest will occasionally throw an error in solve.default(AVA) because of singularity" + "objectID": "Comp/r-sas_mmrm.html#spatial-exponential", + "href": "Comp/r-sas_mmrm.html#spatial-exponential", + "title": "R vs SAS MMRM", + "section": "Spatial exponential", + "text": "Spatial exponential\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM / subject=USUBJID type=sp(exp)(visitn) rcorr;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + sp_exp(VISITN | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corExp(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nglmmTMB\n# NOTE: requires use of coordinates\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + exp(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_survival.html", - "href": "Comp/r-sas_survival.html", - "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", - "section": "", - "text": "Results from the examples shown for R here and SAS here were compared below.\nComparing the non-stratified model results side-by-side, the CIs for the quartile estimates and landmark estimates are different between R and SAS. HR and CI also have slight differences.\n\n\n\n\n\n\n\n\n\n\n\nThe default methods for handling ties in a Cox regression model are different which can lead to a different result for the Hazard ratio and associated confidence interval.\nR uses “efron” by default. SAS uses “breslow” by default. Both R and SAS are able to change these default options. By making the changes to the code below, we can force R to use “breslow” to match SAS, or SAS to use “efron” to match R. When the software use the same methods, then we obtain an identical HR and CI.\n\nR: change method for ties to use “breslow”\n\n\nfit.cox <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"breslow\", data = dat)\n\n\nSAS: change method for ties to use “efron”\n\n\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl ties = efron;\nrun;\n\nIf there are no tied event times, then the methods are equivalent.\nThe Breslow approximation is the easiest to program and hence it historically became the first option coded for almost all software. It then ended up as the default option when other options were added in order to maintain “backwards compatibility”. The Efron option is more accurate if there are a large number of ties, and it was therefore selected as the default option in R. In practice the number of ties is usually small, in which case all the methods are statistically indistinguishable.\nFrom the arguments of coxph in R, there are three possible choices for handling tied event times ‘ties=breslow’, ‘ties=efron’, or ‘ties=’clogit’. This last option is an exact partial likelihood approach. 
See [here](\ncoxph function - RDocumentation) for more detail.\n\n\n\nThe default methods for calculation of the confidence interval of a KM estimator are different in the two languages (for example, for calculation of the CI associated with the Median Survival estimate, the 25th percentile and the 75th percentile).\nR uses “log” by default, and SAS uses “log-log” by default. As shown below, using ‘conf.type’ option, R can be forced to use the “log-log” method to match SAS. Alternatively, using the ‘conftype=’ option, SAS can be forced to use the “log” method to match R.\n\nR: change to “log-log”\n\n\nfit.km <- survfit(Surv(LENFOLY, FSTAT) ~ AFB, conf.type = \"log-log\", data = dat)\n\n\nSAS: change to “log”\n\n\nproc lifetest data=dat conftype=log;\ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\n“log-log” prevents the problem of having confidence intervals of >1 or <0, which might happen if using “log” transformation. However, both R and SAS will clip the interval at [0, 1] and report a bound >1 as 1 and <0 as 0.\nFrom a reference: The appeal of the log-log interval is clear, but the log-scale interval has the advantage of variance stabilization. As a result, simulation studies have generally found it to have better (closer to nominal) coverage; for this reason, it is the default in the survival package.\nNow if we change the confidence interval type in SAS to “log” and tie handling to “efron”, the results will be identical to the results in R.\n\n\n\n\n\n\n\n\n\nBelow is the side-by-side comparison for stratified analysis with default methods in SAS matched to R’s, the results are also identical." + "objectID": "Comp/r-sas_mmrm.html#toeplitz-heterogeneous", + "href": "Comp/r-sas_mmrm.html#toeplitz-heterogeneous", + "title": "R vs SAS MMRM", + "section": "Toeplitz (heterogeneous)", + "text": "Toeplitz (heterogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=TOEPH;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + toeph(AVISIT | USUBJID),\n data = fev_data\n)\n\n\n\nglmmTMB\n glmmTMB(\n FEV1 ~ ARMCD * AVISIT + toep(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_survival.html#reason-1-cox-regression-handling-of-tied-survival-times", - "href": "Comp/r-sas_survival.html#reason-1-cox-regression-handling-of-tied-survival-times", - "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", - "section": "", - "text": "The default methods for handling ties in a Cox regression model are different which can lead to a different result for the Hazard ratio and associated confidence interval.\nR uses “efron” by default. SAS uses “breslow” by default. Both R and SAS are able to change these default options. By making the changes to the code below, we can force R to use “breslow” to match SAS, or SAS to use “efron” to match R. When the software use the same methods, then we obtain an identical HR and CI.\n\nR: change method for ties to use “breslow”\n\n\nfit.cox <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = \"breslow\", data = dat)\n\n\nSAS: change method for ties to use “efron”\n\n\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl ties = efron;\nrun;\n\nIf there are no tied event times, then the methods are equivalent.\nThe Breslow approximation is the easiest to program and hence it historically became the first option coded for almost all software. 
It then ended up as the default option when other options were added in order to maintain “backwards compatibility”. The Efron option is more accurate if there are a large number of ties, and it was therefore selected as the default option in R. In practice the number of ties is usually small, in which case all the methods are statistically indistinguishable.\nFrom the arguments of coxph in R, there are three possible choices for handling tied event times ‘ties=breslow’, ‘ties=efron’, or ‘ties=’clogit’. This last option is an exact partial likelihood approach. See [here](\ncoxph function - RDocumentation) for more detail." + "objectID": "Comp/r-sas_mmrm.html#toeplitz-homogeneous", + "href": "Comp/r-sas_mmrm.html#toeplitz-homogeneous", + "title": "R vs SAS MMRM", + "section": "Toeplitz (homogeneous)", + "text": "Toeplitz (homogeneous)\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = AVISIT|ARMCD / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=TOEP;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + toep(AVISIT | USUBJID),\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_survival.html#reason-2-kaplan-meier-median-survival-confidence-intervals", - "href": "Comp/r-sas_survival.html#reason-2-kaplan-meier-median-survival-confidence-intervals", - "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", - "section": "", - "text": "The default methods for calculation of the confidence interval of a KM estimator are different in the two languages (for example, for calculation of the CI associated with the Median Survival estimate, the 25th percentile and the 75th percentile).\nR uses “log” by default, and SAS uses “log-log” by default. As shown below, using ‘conf.type’ option, R can be forced to use the “log-log” method to match SAS. Alternatively, using the ‘conftype=’ option, SAS can be forced to use the “log” method to match R.\n\nR: change to “log-log”\n\n\nfit.km <- survfit(Surv(LENFOLY, FSTAT) ~ AFB, conf.type = \"log-log\", data = dat)\n\n\nSAS: change to “log”\n\n\nproc lifetest data=dat conftype=log;\ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\n“log-log” prevents the problem of having confidence intervals of >1 or <0, which might happen if using “log” transformation. However, both R and SAS will clip the interval at [0, 1] and report a bound >1 as 1 and <0 as 0.\nFrom a reference: The appeal of the log-log interval is clear, but the log-scale interval has the advantage of variance stabilization. As a result, simulation studies have generally found it to have better (closer to nominal) coverage; for this reason, it is the default in the survival package.\nNow if we change the confidence interval type in SAS to “log” and tie handling to “efron”, the results will be identical to the results in R.\n\n\n\n\n\n\n\n\n\nBelow is the side-by-side comparison for stratified analysis with default methods in SAS matched to R’s, the results are also identical." 
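As a small synthesis of the two defaults discussed in these survival comparisons, a sketch of the R calls that reproduce the SAS behaviour in one place (survival package, using the same dat, LENFOLY, FSTAT and AFB variables as the examples; illustrative only):

library(survival)
# Kaplan-Meier with log-log confidence intervals (the SAS default)
fit.km <- survfit(Surv(LENFOLY, FSTAT) ~ AFB, conf.type = "log-log", data = dat)
# Cox model with Breslow handling of tied event times (the SAS default)
fit.cox <- coxph(Surv(LENFOLY, FSTAT) ~ AFB, ties = "breslow", data = dat)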
+ "objectID": "Comp/r-sas_mmrm.html#unstructured", + "href": "Comp/r-sas_mmrm.html#unstructured", + "title": "R vs SAS MMRM", + "section": "Unstructured", + "text": "Unstructured\n\nPROC GLIMMIX\nPROC GLIMMIX DATA = fev_data;\nCLASS AVISIT(ref = 'VIS1') ARMCD(ref = 'PBO') USUBJID;\nMODEL FEV1 = ARMCD|AVISIT / ddfm=satterthwaite solution chisq;\nRANDOM AVISIT / subject=USUBJID type=un;\n\n\n\nmmrm\nmmrm(\n formula = FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),\n data = fev_data\n)\n\n\n\ngls\ngls(\n formula = FEV1 ~ ARMCD * AVISIT,\n data = fev_data,\n correlation = corSymm(form = ~AVISIT | USUBJID),\n weights = varIdent(form = ~1|AVISIT),\n na.action = na.omit\n)\n\n\n\nlmer\nlmer(\n FEV1 ~ ARMCD * AVISIT + (0 + AVISIT | USUBJID),\n data = fev_data,\n control = lmerControl(check.nobs.vs.nRE = \"ignore\"),\n na.action = na.omit\n)\n\n\n\nglmmTMB\nglmmTMB(\n FEV1 ~ ARMCD * AVISIT + us(0 + AVISIT | USUBJID),\n dispformula = ~ 0,\n data = fev_data\n)" }, { - "objectID": "Comp/r-sas_survival.html#differences-observed-in-the-km-estimators", - "href": "Comp/r-sas_survival.html#differences-observed-in-the-km-estimators", - "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", - "section": "Differences Observed in the KM Estimators", - "text": "Differences Observed in the KM Estimators\nSuppose we are interested to know the 25%, 50% and 75% quartile estimates, and the day 80, 100, and 120 estimates.\nBelow is the R code:\n\nfit.km <- survfit(Surv(time, status) ~ 1, conf.type = \"log-log\", data = test)\n## quantile estimates\nquantile(fit.km, probs = c(0.25, 0.5, 0.75))\n## landmark estimates at 80, 100, 120-day\nsummary(fit.km, times = c(80, 100, 120), extend = T)\n\nBelow is the SAS code:\n\nproc lifetest data=dat outsurv=_SurvEst timelist= 80 100 120 reduceout stderr; \ntime lenfoly*fstat(0);\nrun;\n\nBelow is the side-by-side comparison:" + "objectID": "Comp/r-sas_mmrm.html#convergence-times", + "href": "Comp/r-sas_mmrm.html#convergence-times", + "title": "R vs SAS MMRM", + "section": "Convergence Times", + "text": "Convergence Times\n\nFEV Data\nThe mmrm, PROC GLIMMIX, gls, lmer, and glmmTMB functions are applied to the FEV dataset 10 times. The convergence times are recorded for each replicate and are reported in the table below.\n\n\n\nComparison of convergence times: milliseconds\n\n\nImplementation\nMedian\nFirst Quartile\nThird Quartile\n\n\n\n\nmmrm\n56.15\n55.76\n56.30\n\n\nPROC GLIMMIX\n100.00\n100.00\n100.00\n\n\nlmer\n247.02\n245.25\n257.46\n\n\ngls\n687.63\n683.50\n692.45\n\n\nglmmTMB\n715.90\n708.70\n721.57\n\n\n\n\n\nIt is clear from these results that mmrm converges significantly faster than other R functions. Though not demonstrated here, this is generally true regardless of the sample size and covariance structure used. mmrm is faster than PROC GLIMMIX.\n\n\nBCVA Data\nThe MMRM implementations are now applied to the BCVA dataset 10 times. The convergence times are presented below.\n\n\n\nComparison of convergence times: seconds\n\n\nImplementation\nMedian\nFirst Quartile\nThird Quartile\n\n\n\n\nmmrm\n3.36\n3.32\n3.46\n\n\nglmmTMB\n18.65\n18.14\n18.87\n\n\nPROC GLIMMIX\n36.25\n36.17\n36.29\n\n\ngls\n164.36\n158.61\n165.93\n\n\nlmer\n165.26\n157.46\n166.42\n\n\n\n\n\nWe again find that mmrm produces the fastest convergence times on average." 
}, { - "objectID": "Comp/r-sas_survival.html#reasons", - "href": "Comp/r-sas_survival.html#reasons", - "title": "R vs SAS - Kaplan Meier and Cox-proportion hazards modelling", - "section": "Reasons", - "text": "Reasons\nThe reasons for the differences are because:\nReason 1: Survival estimate does not cross the 50% percentile.\nThe kth quantile for a survival curve S(t) is the location at which a horizontal line at height p= 1-k intersects the plot of S(t) as shown in the KM curve below. Since S(t) is a step function, it is possible for the curve to have a horizontal segment at exactly 1-k, in which case the midpoint of the horizontal segment is returned.\nFor example, using the data above, the survival probability is exactly 0.5 at time=87 and remains at 0.5 until the last censored observation at 118.\n\n\n\n\n\n\n\n\n\nWhen using R, the median is the smallest time which survival estimate is <= 0.5 –> (87+118) / 2 = 102.5. However, SAS searches the smallest time which survival estimate is < 0.5, which does not exist in this dataset, so it gives “NE” (Not evaluable).\n\npl <- survminer::ggsurvplot(fit.km, \n conf.int = TRUE,\n ggtheme = theme_light()) \npl$plot + geom_hline(yintercept = 0.5, color = \"black\", linetype = \"solid\") \n\nsummary(fit.km)\n\nReason 2: Last event censored and prior to the required landmark estimate.\nFor the 120-day event-free estimate, SAS considers that 120 days is beyond the maximum observed day in the data (which was a censored event at time =118). Therefore, SAS considers this as Unknown and returns a result of “NE” (Not-evaluable). However, R uses the rate at last observed censored date to estimate the 120-day event free rate. As the event-free estimate at time of the last censored event at 118 was 0.5 (0.184, 0.753), R makes the assumption that this is the best estimate for the event-free rate at Time =120.\nIf we change the last observation in the dataset to be an event (instead of censored), R and SAS will both give 0 for the event-free survival estimate, because it is for sure that all subjects did not survive beyond 120 days.\n\ntest <- tibble(time = c(54, 75, 77, 84, 87, 92, 103, 105, 112, 118), \n status = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1))\n\ntest\n\n# A tibble: 10 × 2\n time status\n <dbl> <dbl>\n 1 54 1\n 2 75 1\n 3 77 1\n 4 84 1\n 5 87 1\n 6 92 0\n 7 103 0\n 8 105 0\n 9 112 0\n10 118 1" + "objectID": "Comp/r-sas_mmrm.html#marginal-treatment-effect-estimates-comparison", + "href": "Comp/r-sas_mmrm.html#marginal-treatment-effect-estimates-comparison", + "title": "R vs SAS MMRM", + "section": "Marginal Treatment Effect Estimates Comparison", + "text": "Marginal Treatment Effect Estimates Comparison\nWe next estimate the marginal mean treatment effects for each visit in the FEV and BCVA datasets using the MMRM fitting procedures. All R implementations’ estimates are reported relative to PROC GLIMMIX’s estimates. Convergence status is also reported.\n\nFEV Data\n\n\n\n\n\n\n\n\n\nThe R procedures’ estimates are very similar to those output by PROC GLIMMIX, though mmrm and gls generate the estimates that are closest to those produced when using SAS. All methods converge using their default optimization arguments.\n\n\nBCVA Data\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nmmrm, gls and lmer produce estimates that are virtually identical to PROC GLIMMIX’s, while glmmTMB does not. This is likely explained by glmmTMB’s failure to converge. Note too that lmer fails to converge." 
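A short sketch of how marginal (least-squares) mean treatment effects can be extracted from an mmrm fit in R for this kind of comparison; the emmeans call relies on the emmeans support documented by the mmrm package and is illustrative only:

library(mmrm)
library(emmeans)
fit <- mmrm(FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID), data = fev_data)
emmeans(fit, ~ ARMCD | AVISIT)  # marginal means by treatment arm within each visit, analogous to LSMEANS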
}, { - "objectID": "Comp/r-sas_rounding.html", - "href": "Comp/r-sas_rounding.html", - "title": "R v SAS rounding", - "section": "", - "text": "Rounding; R and SAS\nOn comparing the documentation of rounding rules for both languages, it will be noted that the default rounding rule (implemented in the respective language’s round() function) are different. Numerical differences arise in the knife-edge case where the number being rounded is equidistant between the two possible results. The round() function in SAS will round the number ‘away from zero’, meaning that 12.5 rounds to the integer 13. The round() function in Base R will round the number ‘to even’, meaning that 12.5 rounds to the integer 12. SAS does provide the rounde() function which rounds to even and the janitor package in R contains a function round_half_up() that rounds away from zero. In this use case, SAS produces a correct result from its round() function, based on its documentation, as does R. Both are right based on what they say they do, but they produce different results (Rimler, M.S. et al.).\nReferences\nRimler M.S., Rickert J., Jen M-H., Stackhouse M. Understanding differences in statistical methodology implementations across programming languages (2022, Fall). ASA Biopharmaceutical Report Issue 3, Volume 29.  Retrieved from https://higherlogicdownload.s3.amazonaws.com/AMSTAT/fa4dd52c-8429-41d0-abdf-0011047bfa19/UploadedImages/BIOP%20Report/BioPharm_fall2022FINAL.pdf" + "objectID": "Comp/r-sas_mmrm.html#impact-of-missing-data-on-convergence-rates", + "href": "Comp/r-sas_mmrm.html#impact-of-missing-data-on-convergence-rates", + "title": "R vs SAS MMRM", + "section": "Impact of Missing Data on Convergence Rates", + "text": "Impact of Missing Data on Convergence Rates\nThe results of the previous benchmark suggest that the amount of patients missing from later time points affect certain implementations’ capacity to converge. We investigate this further by simulating data using a data-generating process similar to that of the BCVA datasets, though with various rates of patient dropout.\nTen datasets of 200 patients are generated each of the following levels of missingness: none, mild, moderate, and high. In all scenarios, observations are missing at random. The number patients observed at each visit is obtained for one replicated dataset at each level of missingness is presented in the table below.\n\n\n\nNumber of patients per visit\n\n\n\nnone\nmild\nmoderate\nhigh\n\n\n\n\nVIS01\n200\n196.7\n197.6\n188.1\n\n\nVIS02\n200\n195.4\n194.4\n182.4\n\n\nVIS03\n200\n195.1\n190.7\n175.2\n\n\nVIS04\n200\n194.1\n188.4\n162.8\n\n\nVIS05\n200\n191.6\n182.5\n142.7\n\n\nVIS06\n200\n188.2\n177.3\n125.4\n\n\nVIS07\n200\n184.6\n168.0\n105.9\n\n\nVIS08\n200\n178.5\n155.4\n82.6\n\n\nVIS09\n200\n175.3\n139.9\n58.1\n\n\nVIS10\n200\n164.1\n124.0\n39.5\n\n\n\n\n\nThe convergence rates of all implementations for stratified by missingness level is presented in the plot below.\n\n\n\n\n\n\n\n\n\nmmrm, gls, and PROC GLIMMIX are resilient to missingness, only exhibiting some convergence problems in the scenarios with the most missingness. These implementations converged in all the other scenarios’ replicates. glmmTMB, on the other hand, has convergence issues in the no-, mild-, and high-missingness datasets, with the worst convergence rate occurring in the datasets with the most dropout. 
Finally, lmer is unreliable in all scenarios, suggesting that its convergence issues stem from something other than the missing observations.\nNote that the default optimization schemes are used for each method; these schemes can be modified to potentially improve convergence rates.\nA more comprehensive simulation study using data-generating processes similar to the one used here is outlined in the simulations/missing-data-benchmarks subdirectory. In addition to assessing the effect of missing data on software convergence rates, we also evaluate these methods’ fit times and empirical bias, variance, 95% coverage rates, type I error rates and type II error rates. mmrm is found to be the most robust software for fitting MMRMs in scenarios where a large proportion of patients are missing from the last time points. Additionally, mmrm has the fastest average fit times regardless of the amount of missingness. All implementations considered produce similar empirical biases, variances, 95% coverage rates, type I error rates and type II error rates." }, { "objectID": "Comp/r-sas_mcnemar.html", "href": "Comp/r-sas_mcnemar.html", "title": "R v SAS McNemar’s test", "section": "", "text": "McNemar’s test; R and SAS\nIn R, the mcNemar function from the epibasix package can be used to perform McNemar’s test.\n\nX<-table(colds$age12,colds$age14)\nsummary(mcNemar(X))\n\nThe FREQ procedure can be used in SAS with the AGREE option to run the McNemar test, with the OR and RISKDIFF options stated for production of odds ratios and the risk difference. These options were added as epibasix::mcNemar outputs the odds ratio and risk difference with confidence limits as default. In contrast to R, SAS outputs the Kappa coefficients with confidence limits as default.\n\nproc freq data=colds;\n tables age12*age14 / agree or riskdiff;\nrun;\n\nWhen calculating the odds ratio and risk difference confidence limits, SAS is not treating the data as matched-pairs. 
There is advice on the SAS blog and SAS support page to amend this, which requires a lot of additional coding.\nR is using Edward’s continuity correction with no option to remove this. In contrast, there is no option to include Edward’s continuity correction in SAS, but this can be manually coded to agree with R. However, its use is controversial due to being seen as overly conservative.\nR’s use of the continuity correction is consistent with other functions within the epibasix package, which was categorised as ‘High Risk’ by the Risk Assessment Shiny App created by the R Validation Hub. Risk is quantified by the app through a number of metrics relating to maintenance and community usage. It was found that the author is no longer maintaining the package and there was no documentation available for certain methods used. Therefore, the use of the epibasix package is advised against and other packages may be more suitable.\nThe mcnemar.test function in the stats package provides the option to remove the continuity correction, which results in a match with SAS. This function does not output any other coefficients for agreement/difference in proportions etc. but (if required) these can be achieved within other functions and/or packages.\n\nmcnemar.test(X, correct = FALSE)" }, { "objectID": "SAS/ttest_2Sample.html", "href": "SAS/ttest_2Sample.html", "title": "Independent Two-Sample t-test", "section": "", "text": "Independent Two-Sample t-test in SAS\nThe null hypothesis of the Independent Samples t-test is that the means for the two populations are equal.\nIn SAS the following code was used to test the mean comparison (mean of Weight Gain) of two independent treatment groups (Treatment and Placebo).\nFor this example, we’re testing the significance of the difference in mean Weight gain (WtGain) between treatment and placebo (trt_grp) using the PROC TTEST procedure in SAS.\n\n proc ttest data=d1; \n class trt_grp; \n var WtGain; \n run; \n\nOutput:\n Figure 1: Test results for independent t-test using PROC TTEST in SAS\n\n\n\n\n\n\n\n\n\nHere the t-value is –0.70, the degrees of freedom are 30 and the P value is 0.4912, which is greater than 0.05, so we do not reject the null hypothesis: there is no evidence of a significant difference between the means of the treatment groups. The mean in the placebo group is 75.1875 and the mean in the Treatment group is 83.1250. The mean difference between the treatment groups (Treatment-Placebo) is –7.9375 and the 95% CI for the mean difference is [–31.1984, 15.3234]. The 95% confidence interval includes a treatment difference of 0, which supports the conclusion that the data fail to provide any evidence of a difference between the treatment groups.\nNote: Before entering straight into the t-test we need to check whether the assumptions (equality of variance, independence of observations, and normally distributed observations) are met or not. If normality is not satisfied, we may consider using a suitable non-parametric test.\n\nNormality: You can check for data to be normally distributed by plotting a histogram of the data by treatment. Alternatively, you can use the Shapiro-Wilk test or the Kolmogorov-Smirnov test. 
If the test is <0.05 and your sample is quite small then this suggests you should not use the t-test. However, if your sample in each treatment group is large (say >30 in each group), then you do not need to rely so heavily on the assumption that the data have an underlying normal distribution in order to apply the two-sample t-test. This is where plotting the data using histograms can help to support investigation into the normality assumption. We have checked the normality of the observations using the code below. Here for both the treatment groups we have P value greater than 0.05 (Shapiro-Wilk test is used), therefore the normality assumption is there for our data.\n\n\n proc univariate data=d1 normal; \n qqplot WtGain; \n by trt_grp; \n run; \n\nOutput:\n Figure 2: The results of normality test for Treatment group\n\n\n\n\n\n\n\n\n\n Figure 3: The results of normality test for Placebo group\n\n\n\n\n\n\n\n\n\n\nHomogeneity of variance (or Equality of variance): Homogeniety of variance will be tested by default in PROC TTEST itself by Folded F-test. In our case the P values is 0.6981 which is greater than 0.05. So we accept the null hypothesis of F-test, i.e. variances are same. Then we will consider the pooled method for t-test. If the F test is statistically significant (p<0.05), then the pooled t-test may give erroneous results. In this instance, if it is believed that the population variances may truly differ, then the Satterthwaite (unequal variances) analysis results should be used. These are provided in the SAS output alongside the Pooled results as default.\n\nOutput:\n Figure 4: Folded F-test result in PROC TTEST" }, { - "objectID": "Comp/r-sas_chi-sq.html", - "href": "Comp/r-sas_chi-sq.html", - "title": "R/SAS Chi-Squared Comparision", + "objectID": "SAS/cmh.html", + "href": "SAS/cmh.html", + "title": "CMH Test", "section": "", - "text": "Chi-Squared test is a hypothesis test for independent contingency tables, dependent on rows and column totals. The test assumes:\n\nobservations are independent of each other\nall values are 1 or more and at least 80% of the cells are greater than 5.\ndata should be categorical\n\nThe Chi-Squared statistic is found by:\n\\[\n\\chi^2=\\frac{\\sum(O-E)^2}{E}\n\\]\nWhere O is the observed and E is the expected.\nFor an r x c table (where r is the number of rows and c the number of columns), the Chi-squared distribution’s degrees of freedom is (r-1)*(c-1). The resultant statistic with correct degrees of freedom follows this distribution when its expected values are aligned with the assumptions of the test, under the null hypothesis. The resultant p value informs the magnitude of disagreement with the null hypothesis and not the magnitude of association\nFor this example we will use data about cough symptoms and history of bronchitis.\n\nbronch <- matrix(c(26, 247, 44, 1002), ncol = 2)\nrow.names(bronch) <- c(\"cough\", \"no cough\")\ncolnames(bronch) <- c(\"bronchitis\", \"no bronchitis\")\n\nTo a chi-squared test in R you will use the following code.\n\nchisq.test(bronch)\n\n\n Pearson's Chi-squared test with Yates' continuity correction\n\ndata: bronch\nX-squared = 11.145, df = 1, p-value = 0.0008424\n\n\nTo run a chi-squared test in SAS you used the following code.\n\nproc freq data=proj1.bronchitis;\ntables Cough*Bronchitis / chisq;\nrun;\n\nThe result in the “Chi-Square” section of the results table in SAS will not match R, in this case it will be 12.1804 with a p-value of 0.0005. 
This is because by default R does a Yate’s continuity adjustment. To change this set correct to false.\n\nchisq.test(bronch, correct = FALSE)\n\n\n Pearson's Chi-squared test\n\ndata: bronch\nX-squared = 12.18, df = 1, p-value = 0.0004829\n\n\nAlternatively, SAS also produces the correct chi-square value by default. It is the “Continuity Adj. Chi-Square” value in the results table." + "text": "The CMH procedure tests for conditional independence in partial contingency tables for a 2 x 2 x K design. However, it can be generalized to tables of X x Y x K dimensions.\n\n\nThe cmh test is calculated in SAS using the PROC FREQ procedure. By default, it outputs the chi square statistic, degrees of freedom and p-value for each of the three alternative hypothesis: general association, row means differ, and nonzero correlation. It is up to the statistical analyst or statistician to know which result is appropriate for their analysis.\nWhen the design of the contingency table is 2 x 2 x K (i.e, X == 2 levels, Y == 2 levels, K >= 2 levels), the Mantel-Haenszel Common Odds Ratio (odds ratio estimate, 95% CI, P-value) and the Breslow-Day Test for Homogeneity of the Odds Ratios (chi-square statistic, degrees of freedom, P-value) are also output.\nBelow is the syntax to conduct a CMH analysis in SAS:\n\nProc freq data = filtered_data; \ntables K * X * Y / cmh; \n* the order of K, X, and Y appearing on the line is important!;\nrun;" }, { - "objectID": "Comp/r-sas_manova.html", - "href": "Comp/r-sas_manova.html", - "title": "Multivariate Analysis of Variance in R vs SAS", + "objectID": "SAS/cmh.html#cmh-in-sas", + "href": "SAS/cmh.html#cmh-in-sas", + "title": "CMH Test", "section": "", - "text": "MANOVA: Testing for group mean vectors are the same vs at least one is different\nWhen applying the following hypothesis, SAS and R match identically see R and SAS.\n\nH0: Group mean vectors are the same for all groups or they don’t differ significantly.\nH1: At least one of the group mean vectors is different from the rest.\n\nHowever, if interest was in comparing 1 level of a parameter vs the others, this was only achieved using SAS. Contrast statements in SAS were easy to implement as shown here SAS however R did not replicate these results and to date a solution has not been found.\nNOTE: if you feel you can help with the above discrepancy please contribute to the CAMIS repo by following the instructions on the contributions page." + "text": "The cmh test is calculated in SAS using the PROC FREQ procedure. By default, it outputs the chi square statistic, degrees of freedom and p-value for each of the three alternative hypothesis: general association, row means differ, and nonzero correlation. 
It is up to the statistical analyst or statistician to know which result is appropriate for their analysis.\nWhen the design of the contingency table is 2 x 2 x K (i.e, X == 2 levels, Y == 2 levels, K >= 2 levels), the Mantel-Haenszel Common Odds Ratio (odds ratio estimate, 95% CI, P-value) and the Breslow-Day Test for Homogeneity of the Odds Ratios (chi-square statistic, degrees of freedom, P-value) are also output.\nBelow is the syntax to conduct a CMH analysis in SAS:\n\nProc freq data = filtered_data; \ntables K * X * Y / cmh; \n* the order of K, X, and Y appearing on the line is important!;\nrun;" }, { - "objectID": "SAS/anova.html", - "href": "SAS/anova.html", - "title": "linear-models", + "objectID": "SAS/manova.html", + "href": "SAS/manova.html", + "title": "Multivariate Analysis of Variance in SAS", "section": "", - "text": "Getting Started\nTo demonstrate the various types of sums of squares, we'll create a data frame called df_disease taken from the SAS documentation (reference). For more information/to investigate this data go here\n\n\nThe Model\nFor this example, we’re testing for a significant difference in stem_length using ANOVA.\n\nproc glm;\n class drug disease;\n model y=drug disease drug*disease;\nrun;\n\n\n\nSums of Squares Tables\nSAS has four types of sums of squares calculations. To get these calculations, the sum of squares option needs to be added (/ ss1 ss2 ss3 ss4) to the model statement.\n\nproc glm;\n class drug disease;\n model y=drug disease drug*disease / ss1 ss2 ss3 ss4;\nrun;\n\n\nType I\n\n\n\n\n\n\n\n\n\n\n\nType II\n\n\n\n\n\n\n\n\n\n\n\nType III\n\n\n\n\n\n\n\n\n\n\n\nType IV" + "text": "Example 39.6 Multivariate Analysis of Variance from SAS MANOVA User Guide\nThis example employs multivariate analysis of variance (MANOVA) to measure differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994).\nFor each of 26 samples of pottery, the percentages of oxides of five metals are measured. The following statements create the data set and invoke the GLM procedure to perform a one-way MANOVA. Additionally, it is of interest to know whether the pottery from one site in Wales (Llanederyn) differs from the samples from other sites; a CONTRAST statement is used to test this hypothesis.\n\n #Example code\n title \"Romano-British Pottery\";\n data pottery;\n input Site $12. 
Al Fe Mg Ca Na;\n datalines;\n Llanederyn 14.4 7.00 4.30 0.15 0.51\n Llanederyn 13.8 7.08 3.43 0.12 0.17\n Llanederyn 14.6 7.09 3.88 0.13 0.20\n Llanederyn 11.5 6.37 5.64 0.16 0.14\n Llanederyn 13.8 7.06 5.34 0.20 0.20\n Llanederyn 10.9 6.26 3.47 0.17 0.22\n Llanederyn 10.1 4.26 4.26 0.20 0.18\n Llanederyn 11.6 5.78 5.91 0.18 0.16\n Llanederyn 11.1 5.49 4.52 0.29 0.30\n Llanederyn 13.4 6.92 7.23 0.28 0.20\n Llanederyn 12.4 6.13 5.69 0.22 0.54\n Llanederyn 13.1 6.64 5.51 0.31 0.24\n Llanederyn 12.7 6.69 4.45 0.20 0.22\n Llanederyn 12.5 6.44 3.94 0.22 0.23\n Caldicot 11.8 5.44 3.94 0.30 0.04\n Caldicot 11.6 5.39 3.77 0.29 0.06\n IslandThorns 18.3 1.28 0.67 0.03 0.03\n IslandThorns 15.8 2.39 0.63 0.01 0.04\n IslandThorns 18.0 1.50 0.67 0.01 0.06\n IslandThorns 18.0 1.88 0.68 0.01 0.04\n IslandThorns 20.8 1.51 0.72 0.07 0.10\n AshleyRails 17.7 1.12 0.56 0.06 0.06\n AshleyRails 18.3 1.14 0.67 0.06 0.05\n AshleyRails 16.7 0.92 0.53 0.01 0.05\n AshleyRails 14.8 2.74 0.67 0.03 0.05\n AshleyRails 19.1 1.64 0.60 0.10 0.03\n ;\n run;\n \n proc glm data=pottery;\n class Site;\n model Al Fe Mg Ca Na = Site;\n contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;\n manova h=_all_ / printe printh;\n run;\n\nAfter the summary information (1), PROC GLM produces the univariate analyses for each of the dependent variables (2-6). These analyses show that sites are significantly different for all oxides individually. You can suppress these univariate analyses by specifying the NOUNI option in the MODEL statement.\n1 Summary Information about Groups\n\n\n\n\n\n\n\n\n\n2 Univariate Analysis of Variance for Aluminum Oxide (AI)\n\n\n\n\n\n\n\n\n\n3 Univariate Analysis of Variance for Iron Oxide (Fe)\n\n\n\n\n\n\n\n\n\n4 Univariate Analysis of Variance for Calcium Oxide (Ca)\n\n\n\n\n\n\n\n\n\n5 Univariate Analysis of Variance for Magnesium Oxide (Mg)\n\n\n\n\n\n\n\n\n\n6 Analysis of Variance for Sodium Oxide (Na)\n\n\n\n\n\n\n\n\n\nThe PRINTE option in the MANOVA statement displays the elements of the error matrix (7), also called the Error Sums of Squares and Crossproducts matrix. The diagonal elements of this matrix are the error sums of squares from the corresponding univariate analyses.\nThe PRINTE option also displays the partial correlation matrix (7) associated with the E matrix. In this example, none of the oxides are very strongly correlated; the strongest correlation (r=0.488) is between magnesium oxide and calcium oxide.\n7 Error SSCP Matrix and Partial Correlations\n\n\n\n\n\n\n\n\n\nThe PRINTH option produces the SSCP matrix for the hypotheses being tested (Site and the contrast); (8 and 9). Since the Type III SS are the highest-level SS produced by PROC GLM by default, and since the HTYPE= option is not specified, the SSCP matrix for Site gives the Type III H matrix. The diagonal elements of this matrix are the model sums of squares from the corresponding univariate analyses.\nFour multivariate tests are computed, all based on the characteristic roots and vectors of \\(E^{-1}H\\). These roots and vectors are displayed along with the tests. All four tests can be transformed to variates that have distributions under the null hypothesis. Note that the four tests all give the same results for the contrast, since it has only one degree of freedom. 
In this case, the multivariate analysis matches the univariate results: there is an overall difference between the chemical composition of samples from different sites, and the samples from Llanederyn are different from the average of the other sites.\n8 Hypothesis SSCP Matrix and Multivariate Tests for Overall Site Effect\n\n\n\n\n\n\n\n\n\n9 Hypothesis SSCP Matrix and Multivariate Tests for Differences between Llanederyn and the Other Sites\n\n\n\n\n\n\n\n\n\nReferences\nSAS MANOVA User Guide" }, { - "objectID": "SAS/rounding.html", - "href": "SAS/rounding.html", - "title": "Rounding in SAS", + "objectID": "SAS/survival.html", + "href": "SAS/survival.html", + "title": "Survival Analysis Using SAS", "section": "", - "text": "There are two rounding functions in SAS.\nThe round() function in SAS will round to the nearest whole number and ‘away from zero’ or ‘rounding up’ when equidistant meaning that exactly 12.5 rounds to the integer 13.\nThe rounde() function in SAS will round to the nearest whole number and ‘rounding to the even number’ when equidistant, meaning that exactly 12.5 rounds to the integer 12.\nBoth functions allow you to specify the number of decimal places you want to round to.\nFor example (See references for source of the example)\n\n #Example code\n data XXX;\n my_number=2.2; output;\n my_number=3.99; output;\n my_number=1.2345; output;\n my_number=7.876; output;\n my_number=13.8739; output;\n run;\n\n data xxx2;\n set xxx;\n r_1_dec = round(my_number, 0.1);\n r_2_dec = round(my_number, 0.01);\n r_3_dec = round(my_number, 0.001);\n \n re_1_dec = rounde(my_number, 0.1);\n re_2_dec = rounde(my_number, 0.01);\n re_3_dec = rounde(my_number, 0.001);\n run;\n\n\n\n\n\n\n\n\n\n\n\n\n\nmy_number\nr_1_dec\nr_2_de\nr_3_dec\nre_1_dec\nre_2_dec\nre_3_dec\n\n\n\n\n2.2\n2.2\n2.2\n2.2\n2.2\n2.2\n2.2\n\n\n3.99\n4\n3.99\n3.99\n4\n3.99\n3.99\n\n\n1.2345\n1.2\n1.23\n1.235\n1.2\n1.23\n1.234\n\n\n7.876\n7.9\n7.88\n7.876\n7.9\n7.88\n7.876\n\n\n13.8739\n13.9\n13.87\n13.874\n13.9\n13.87\n13.874\n\n\n\nReferences\nHow to Round Numbers in SAS - SAS Example Code" + "text": "The most commonly used survival analysis methods in clinical trials include:\nAdditionally, other methods for analyzing time-to-event data are available, such as:\nWhile these models may be explored in a separate document, this particular document focuses solely on the three most prevalent methods: KM estimators, log-rank test and Cox PH model." }, { - "objectID": "SAS/summary-stats.html", - "href": "SAS/summary-stats.html", - "title": "Deriving Quantiles or Percentiles in SAS", - "section": "", - "text": "Percentiles can be calculated in SAS using the UNIVARIATE procedure. The procedure has the option PCTLDEF which allows for five different percentile definitions to be used. The default is PCTLDEF=5, which uses the empirical distribution function to find percentiles.\nThis is how the 25th and 40th percentiles of aval in the dataset adlb could be calculated, using the default option for PCTLDEF.\n\nproc univariate data=adlb;\n var aval;\n output out=stats pctlpts=25 40 pctlpre=p;\nrun;\n\nThe pctlpre=p option tells SAS the prefix to use in the output dataset for the percentile results. In the above example, SAS will create a dataset called stats, containing variables p25 and p40." 
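If a different percentile definition is required, the PCTLDEF= option can be set directly on the PROC UNIVARIATE statement. A minimal sketch, still assuming the adlb data set and aval variable above (the choice of PCTLDEF=3 and the added 90th percentile are purely illustrative):

 proc univariate data=adlb pctldef=3;
    var aval;
    output out=stats pctlpts=25 40 90 pctlpre=p;
 run;

This would add a p90 variable alongside p25 and p40 in the stats data set.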
+ "objectID": "SAS/survival.html#example-data", + "href": "SAS/survival.html#example-data", + "title": "Survival Analysis Using SAS", + "section": "Example Data", + "text": "Example Data\nData source: https://stats.idre.ucla.edu/sas/seminars/sas-survival/\nThe data include 500 subjects from the Worcester Heart Attack Study. This study examined several factors, such as age, gender and BMI, that may influence survival time after heart attack. Follow up time for all participants begins at the time of hospital admission after heart attack and ends with death or loss to follow up (censoring). The variables used here are:\n\nlenfol: length of followup, terminated either by death or censoring - time variable\nfstat: loss to followup = 0, death = 1 - censoring variable\nafb: atrial fibrillation, no = 0, 1 = yes - explanatory variable\ngender: males = 0, females = 1 - stratification factor\n\n\nlibname mylib \"..\\data\";\n\ndata dat;\nset mylib.whas500;\nlenfoly = round(lenfol/365.25, 0.01); /* change follow-up days to years for better visualization*/\nrun;" }, { - "objectID": "SAS/ttest_Paired.html", - "href": "SAS/ttest_Paired.html", - "title": "Paired t-test", - "section": "", - "text": "The Paired t-test is used when two samples are naturally correlated. In the Paired t-test, the difference of the means between the two samples is compared to a given number that represents the null hypothesis. For a Paired t-test, the number of observations in each sample must be equal.\nIn SAS, a Paired t-test is typically performed using PROC TTEST.\n\n\nBy default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following data was used in this example.\n data pressure;\n input SBPbefore SBPafter @@;\n datalines;\n 120 128 124 131 130 131 118 127\n 140 132 128 125 140 141 135 137\n 126 118 130 132 126 129 127 135\n ;\n\n\n\nThe following code was used to test the comparison of two paired samples of Systolic Blood Pressure before and after a procedure.\n proc ttest data=pressure;\n paired SBPbefore*SBPafter;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe SAS paired t-test also supports analysis of lognormal data. Here is the data used for the lognormal analysis.\n\n\n data auc;\n input TestAUC RefAUC @@;\n datalines;\n 103.4 90.11 59.92 77.71 68.17 77.71 94.54 97.51\n 69.48 58.21 72.17 101.3 74.37 79.84 84.44 96.06\n 96.74 89.30 94.26 97.22 48.52 61.62 95.68 85.80\n ;\n\n\n\nFor cases when the data is lognormal, SAS offers the “DIST” option to chose between a normal and lognormal distribution. The procedure also offers the TOST option to specify the equivalence bounds.\n proc ttest data=auc dist=lognormal tost(0.8, 1.25);\n paired TestAUC*RefAUC;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the TTEST procedure offers additional results for geometric mean, coefficient of variation, and TOST equivalence analysis. The output also includes multiple p-values." 
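The TOST option is not limited to the lognormal case; equivalence bounds can also be supplied on the original, additive scale under the default normal distribution. A minimal sketch using the blood-pressure data above, where the ±5 mmHg bounds are illustrative assumptions only:

 proc ttest data=pressure tost(-5, 5);
    paired SBPbefore*SBPafter;
 run;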
+ "objectID": "SAS/survival.html#the-non-stratified-model", + "href": "SAS/survival.html#the-non-stratified-model", + "title": "Survival Analysis Using SAS", + "section": "The Non-stratified Model", + "text": "The Non-stratified Model\nFirst we try a non-stratified analysis following the mock-up above to describe the association between survival time and afb (atrial fibrillation).\nThe KM estimators and log-rank test are from PROC LIFETEST, and Cox PH model is conducted using PROC PHREG.\n\nKM estimators and log-rank test\n\nproc lifetest data=dat outsurv=_SurvEst timelist= 1 3 5 reduceout stderr; \ntime lenfoly*fstat(0);\nstrata afb;\nrun;\n\nThe landmark estimates and quartile estimates for AFB = 0 group are as shown in below:\n\n\n\n\n\n\n\n\n\nThe logrank test result is in below:\n\n\n\n\n\n\n\n\n\n\n\nCox PH model\n\nproc phreg data = dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl;\nrun;\n\nThe hazard ratio and confidence intervals are shown as below:" }, { - "objectID": "SAS/ttest_Paired.html#normal", - "href": "SAS/ttest_Paired.html#normal", - "title": "Paired t-test", - "section": "", - "text": "By default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following data was used in this example.\n data pressure;\n input SBPbefore SBPafter @@;\n datalines;\n 120 128 124 131 130 131 118 127\n 140 132 128 125 140 141 135 137\n 126 118 130 132 126 129 127 135\n ;\n\n\n\nThe following code was used to test the comparison of two paired samples of Systolic Blood Pressure before and after a procedure.\n proc ttest data=pressure;\n paired SBPbefore*SBPafter;\n run;\nOutput:" + "objectID": "SAS/survival.html#the-stratified-model", + "href": "SAS/survival.html#the-stratified-model", + "title": "Survival Analysis Using SAS", + "section": "The Stratified Model", + "text": "The Stratified Model\nIn a stratified model, the Kaplan-Meier estimators remain the same as those in the non-stratified model. To implement stratified log-rank tests and Cox proportional hazards models, simply add the STRATA option in both PROC LIFETEST and PROC PHREG.\n\n# KM estimators and log-rank test\nproc lifetest data=dat;\ntime lenfoly*fstat(0);\nstrata gender/group = afb;\nrun;\n\n# Cox PH model\nproc phreg data=dat;\nclass afb;\nmodel lenfol*fstat(0) = afb/rl;\nstrata gender;\nrun;" }, { - "objectID": "SAS/ttest_Paired.html#lognormal", - "href": "SAS/ttest_Paired.html#lognormal", - "title": "Paired t-test", + "objectID": "SAS/mmrm.html", + "href": "SAS/mmrm.html", + "title": "MMRM in SAS", "section": "", - "text": "The SAS paired t-test also supports analysis of lognormal data. Here is the data used for the lognormal analysis.\n\n\n data auc;\n input TestAUC RefAUC @@;\n datalines;\n 103.4 90.11 59.92 77.71 68.17 77.71 94.54 97.51\n 69.48 58.21 72.17 101.3 74.37 79.84 84.44 96.06\n 96.74 89.30 94.26 97.22 48.52 61.62 95.68 85.80\n ;\n\n\n\nFor cases when the data is lognormal, SAS offers the “DIST” option to chose between a normal and lognormal distribution. The procedure also offers the TOST option to specify the equivalence bounds.\n proc ttest data=auc dist=lognormal tost(0.8, 1.25);\n paired TestAUC*RefAUC;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the TTEST procedure offers additional results for geometric mean, coefficient of variation, and TOST equivalence analysis. The output also includes multiple p-values." 
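When the geometric means, CVs and TOST p-values need to be post-processed, the results can be captured as data sets with ODS OUTPUT. A minimal sketch; Statistics and TTests are the usual PROC TTEST ODS table names, but the exact table names and the output data set names used here (pt_stats, pt_tests) are assumptions that should be confirmed with ODS TRACE:

 ods output Statistics=pt_stats TTests=pt_tests;
 proc ttest data=auc dist=lognormal tost(0.8, 1.25);
    paired TestAUC*RefAUC;
 run;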
+ "text": "Mixed Models\n\nFitting the MMRM in SAS\nIn SAS the following code was used (assessments at avisitn=0 should also be removed from the response variable):\n\nproc mixed data=adlbh;\n where base ne . and avisitn not in (., 99);\n class usubjid trtpn(ref=\"0\") avisitn;\n by paramcd param;\n model chg=base trtpn avisitn trtpn*avisitn / solution cl alpha=0.05 ddfm=KR;\n repeated avisitn/subject=usubjid type=&covar;\n lsmeans trtpn * avisitn / diff cl slice=avisitn;\n lsmeans trtpn / diff cl;\nrun;\n\nwhere the macro variable covar could be UN, CS or AR(1). The results were stored in .csv files that were post-processed in R and compared with the results from R." }, { - "objectID": "SAS/linear-regression.html", - "href": "SAS/linear-regression.html", - "title": "Linear Regression", + "objectID": "SAS/mcnemar.html", + "href": "SAS/mcnemar.html", + "title": "McNemar’s test in SAS", "section": "", - "text": "To demonstrate the use of linear regression we examine a dataset that illustrates the relationship between Height and Weight in a group of 237 teen-aged boys and girls. The dataset is available at (../data/htwt.csv) and is imported to sas using proc import procedure.\n\nDescriptive Statistics\nThe first step is to obtain the simple descriptive statistics for the numeric variables of htwt data, and one-way frequencies for categorical variables. This is accomplished by employing proc means and proc freq procedures There are 237 participants who are from 13.9 to 25 years old. It is a cross-sectional study, with each participant having one observation. We can use this data set to examine the relationship of participants’ height to their age and sex.\n\nproc means data=htwt;\nrun;\n\n Descriptive Statistics for HTWT Data Set \n The MEANS Procedure\n\nVariable Label N Mean Std Dev Minimum Maximum\n-----------------------------------------------------------------------------\nAGE AGE 237 16.4430380 1.8425767 13.9000000 25.0000000\nHEIGHT HEIGHT 237 61.3645570 3.9454019 50.5000000 72.0000000\nWEIGHT WEIGHT 237 101.3080169 19.4406980 50.5000000 171.5000000\n----------------------------------------------------------------------------\n\n\nproc freq data=htwt;\ntables sex;\nrun;\n\n Oneway Frequency Tabulation for Sex for HTWT Data Set \n The FREQ Procedure\n\n Cumulative Cumulative\nSEX Frequency Percent Frequency Percent\n-------------------------------------------------------------\nf 111 46.84 111 46.84\nm 126 53.16 237 100.00\n\nIn order to create a regression model to demonstrate the relationship between age and height for females, we first need to create a flag variable identifying females and an interaction variable between age and female gender flag.\n\ndata htwt2;\n set htwt;\n if sex=\"f\" then female=1;\n if sex=\"m\" then female=0; \n \n *model to demonstrate interaction between female gender and age;\n fem_age = female * age; \nrun;\n\n\n\nRegression Analysis\nNext, we fit a regression model, representing the relationships between gender, age, height and the interaction variable created in the datastep above. We again use a where statement to restrict the analysis to those who are less than or equal to 19 years old. We use the clb option to get a 95% confidence interval for each of the parameters in the model. 
The model that we are fitting is height = b0 + b1 x female + b2 x age + b3 x fem_age + e\n\nproc reg data=htwt2;\n where age <=19;\n model height = female age fem_age / clb;\nrun; quit;\n\n Number of Observations Read 219\n Number of Observations Used 219\n\n Analysis of Variance\n\n Sum of Mean\n Source DF Squares Square F Value Pr > F\n Model 3 1432.63813 477.54604 60.93 <.0001\n Error 215 1684.95730 7.83701\n Corrected Total 218 3117.59543\n\n Root MSE 2.79947 R-Square 0.4595\n Dependent Mean 61.00457 Adj R-Sq 0.4520\n Coeff Var 4.58895\n\nWe examine the parameter estimates in the output below.\n\n Parameter Estimates\n Parameter Standard\n Variable DF Estimate Error t Value Pr > |t| 95% Confidence Limits\n Intercept 1 28.88281 2.87343 10.05 <.0001 23.21911 34.54650\n female 1 13.61231 4.01916 3.39 0.0008 5.69031 21.53432\n AGE 1 2.03130 0.17764 11.44 <.0001 1.68117 2.38144\n fem_age 1 -0.92943 0.24782 -3.75 0.0002 -1.41791 -0.44096\n\nFrom the parameter estimates table the coefficients b0,b1,b2,b3 are estimated as b0=28.88 b1=13.61 b2=2.03 b3=-0.92942\nThe resulting regression model for height, age and gender based on the available data is height=28.88281 + 13.61231 x female + 2.03130 x age -0.92943 x fem_age" + "text": "Performing McNemar’s test in SAS\nTo demonstrate McNemar’s test in SAS, data concerning the presence or absence of cold symptoms was used. The symptoms were recorded by the same children at the age of 12 and 14. A total of 2638 participants were involved.\n\nUsing PROC FREQ\nTesting for a significant difference in cold symptoms between ages, using McNemar’s test in SAS, can be performed as below. The AGREE option is stated within the FREQ procedure to produce agreement tests and measures, including McNemar’s test.\n\nproc freq data=colds;\n tables age12*age14 / agree;\nrun;\n\n\n\nResults\n\n\n\n\n\n\n\n\n\nSAS outputs the tabulated data for proportions, the McNemar’s Chi-square statistic, and the Kappa coefficient with 95% confidence limits. There is no continuity correction used and no option to include this." }, { - "objectID": "SAS/ttest_1Sample.html", - "href": "SAS/ttest_1Sample.html", - "title": "One Sample t-test", + "objectID": "SAS/kruskal_wallis.html", + "href": "SAS/kruskal_wallis.html", + "title": "Kruskal Wallis SAS", "section": "", - "text": "In SAS, a one sample t-test is usually performed using PROC TTEST. The one sample t-test compares the mean of the sample to a provided null hypothesis, called “h0”. The h0 value is provided as an option. By default, the h0 value is zero (0). 
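If a directional alternative is of interest, the SIDES= option can be combined with h0=. A minimal sketch, assuming the read data set and score variable defined further below (SIDES=U requests an upper one-sided test; the default is SIDES=2):

 proc ttest data=read h0=30 sides=U;
    var score;
 run;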
Running the procedure produces a set of results that suggest whether or not the null hypothesis should be rejected.\n\n\nThe following data was used in this example.\n data read;\n input score count @@;\n datalines;\n 40 2 47 2 52 2 26 1 19 2\n 25 2 35 4 39 1 26 1 48 1\n 14 2 22 1 42 1 34 2 33 2\n 18 1 15 1 29 1 41 2 44 1\n 51 1 43 1 27 2 46 2 28 1\n 49 1 31 1 28 1 54 1 45 1\n ;\n\n\n\nBy default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following code was used to test the comparison of a reading scores against a baseline hypothesis value of 30:\n proc ttest data=read h0=30;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe SAS one sample t-test also supports lognormal analysis for a one sample t-test.\n\n\nUsing the same data as above, we will set the “DIST” option to “lognormal” to perform this analysis:\n proc ttest data=read h0=30 dist=lognormal;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the one sample TTEST provides results for geometric mean, coefficient of variation, and 95% confidence limits for the coefficient of variation." + "text": "The Kruskal-Wallis test is a non-parametric equivalent to the one-way ANOVA. For this example, the data used is a subset of R’s datasets::iris, testing for difference in sepal width between species of flower. This data was subset in R and input manually to SAS with a data step.\n\ndata iris_sub;\n input Species $ Sepal_Width;\n datalines;\nsetosa 3.4\nsetosa 3.0\nsetosa 3.4\nsetosa 3.2\nsetosa 3.5\nsetosa 3.1\nversicolor 2.7\nversicolor 2.9\nversicolor 2.7\nversicolor 2.6\nversicolor 2.5\nversicolor 2.5\nvirginica 3.0\nvirginica 3.0\nvirginica 3.1\nvirginica 3.8\nvirginica 2.7\nvirginica 3.3\n;\nrun;" }, { - "objectID": "SAS/ttest_1Sample.html#normal", - "href": "SAS/ttest_1Sample.html#normal", - "title": "One Sample t-test", + "objectID": "SAS/kruskal_wallis.html#introduction", + "href": "SAS/kruskal_wallis.html#introduction", + "title": "Kruskal Wallis SAS", "section": "", - "text": "By default, SAS PROC TTEST t-test assumes normality in the data and uses a classic Student’s t-test.\n\n\nThe following code was used to test the comparison of a reading scores against a baseline hypothesis value of 30:\n proc ttest data=read h0=30;\n var score;\n run;\nOutput:" + "text": "The Kruskal-Wallis test is a non-parametric equivalent to the one-way ANOVA. For this example, the data used is a subset of R’s datasets::iris, testing for difference in sepal width between species of flower. 
This data was subset in R and input manually to SAS with a data step.\n\ndata iris_sub;\n input Species $ Sepal_Width;\n datalines;\nsetosa 3.4\nsetosa 3.0\nsetosa 3.4\nsetosa 3.2\nsetosa 3.5\nsetosa 3.1\nversicolor 2.7\nversicolor 2.9\nversicolor 2.7\nversicolor 2.6\nversicolor 2.5\nversicolor 2.5\nvirginica 3.0\nvirginica 3.0\nvirginica 3.1\nvirginica 3.8\nvirginica 2.7\nvirginica 3.3\n;\nrun;" }, { - "objectID": "SAS/ttest_1Sample.html#lognormal", - "href": "SAS/ttest_1Sample.html#lognormal", - "title": "One Sample t-test", - "section": "", - "text": "The SAS one sample t-test also supports lognormal analysis for a one sample t-test.\n\n\nUsing the same data as above, we will set the “DIST” option to “lognormal” to perform this analysis:\n proc ttest data=read h0=30 dist=lognormal;\n var score;\n run;\nOutput:\n\n\n\n\n\n\n\n\n\nAs can be seen in the figure above, the lognormal variation of the one sample TTEST provides results for geometric mean, coefficient of variation, and 95% confidence limits for the coefficient of variation." + "objectID": "SAS/kruskal_wallis.html#implementing-kruskal-wallis-in-sas", + "href": "SAS/kruskal_wallis.html#implementing-kruskal-wallis-in-sas", + "title": "Kruskal Wallis SAS", + "section": "Implementing Kruskal-Wallis in SAS", + "text": "Implementing Kruskal-Wallis in SAS\nThe Kruskal-Wallis test can be implemented in SAS using the NPAR1WAY procedure with WILCOXON option. Below, the test is defined with the indicator variable (Species) by the CLASS statement, and the response variable (Sepal_Width) by the VAR statement. Adding the EXACT statement outputs the exact p-value in addition to the asymptotic result. The null hypothesis is that the samples are from identical populations.\n\nproc npar1way data=iris_sub wilcoxon;\nclass Species;\nvar Sepal_Width;\nexact;\nrun;" }, { - "objectID": "data-info/sas_disease.html", - "href": "data-info/sas_disease.html", - "title": "SAS Disease", - "section": "", - "text": "To demonstrate the various types of sums of squares, we’ll create a data frame called `df_disease` taken from the SAS documentation (__reference__). The summary of the data is shown.\nThe general summary of the data is as follows\n\nsummary(df_disease)\n\n drug disease y \n 1:18 1:24 Min. :-6.00 \n 2:18 2:24 1st Qu.: 9.00 \n 3:18 3:24 Median :21.00 \n 4:18 Mean :18.88 \n 3rd Qu.:28.00 \n Max. 
:44.00 \n NA's :14 \n\nskim(df_disease)\n\n\nData summary\n\n\nName\ndf_disease\n\n\nNumber of rows\n72\n\n\nNumber of columns\n3\n\n\n_______________________\n\n\n\nColumn type frequency:\n\n\n\nfactor\n2\n\n\nnumeric\n1\n\n\n________________________\n\n\n\nGroup variables\nNone\n\n\n\nVariable type: factor\n\n\n\n\n\n\n\n\n\n\n\nskim_variable\nn_missing\ncomplete_rate\nordered\nn_unique\ntop_counts\n\n\n\n\ndrug\n0\n1\nFALSE\n4\n1: 18, 2: 18, 3: 18, 4: 18\n\n\ndisease\n0\n1\nFALSE\n3\n1: 24, 2: 24, 3: 24\n\n\n\nVariable type: numeric\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nskim_variable\nn_missing\ncomplete_rate\nmean\nsd\np0\np25\np50\np75\np100\nhist\n\n\n\n\ny\n14\n0.81\n18.88\n12.8\n-6\n9\n21\n28\n44\n▅▆▆▇▂" + "objectID": "SAS/kruskal_wallis.html#results", + "href": "SAS/kruskal_wallis.html#results", + "title": "Kruskal Wallis SAS", + "section": "Results", + "text": "Results\n\n\n\n\n\n\n\n\n\nAs seen above, SAS outputs a table of Wilcoxon Scores for Sepal_Width by each Species including (per group): the number (N); the sum of scores; the expected sum of scores under the null hypothesis; the standard deviation under the null hypothesis, and the observed mean score. The table also includes a footnote to specify that ties were handled by using the average score.\nA table of the test results gives the Kruskal-Wallis rank sum statistic (10.922), the degrees of freedom (2), and the asymptotic p-value of the test (0.0042), and the exact p-value (0.0008). Therefore, the difference in population medians is statistically significant at the 5% level." }, { "objectID": "templates/R_template.html",