Merge pull request #281 from michaelwalshe/survey-stats-python-comp

Survey Statistics - Example/Comparison (Python)
PSIAIMS · Aug 19, 2024 · 035b87a
2 parents a39ccaa + 0bc4005
commit 035b87a

Showing 11 changed files with 9,134 additions and 30 deletions.
diff --git a/.github/workflows/action.yml b/.github/workflows/action.yml
@@ -19,7 +19,7 @@ jobs:
 
       - uses: actions/setup-python@v5
         with:
-          python-version: '3.9'
+          python-version: '3.12'
           cache: 'pip' # caching pip dependencies
       - run: pip install -r requirements.txt
 

diff --git a/.github/workflows/pull_request_action.yml b/.github/workflows/pull_request_action.yml
@@ -19,7 +19,7 @@ jobs:
 
       - uses: actions/setup-python@v5
         with:
-          python-version: '3.9'
+          python-version: '3.12'
           cache: 'pip' # caching pip dependencies
       - run: pip install -r requirements.txt
 

diff --git a/Comp/r-sas_survey-stats-summary.qmd b/Comp/r-sas-python_survey-stats-summary.qmd
rename from Comp/r-sas_survey-stats-summary.qmd
rename to Comp/r-sas-python_survey-stats-summary.qmd
diff --git a/R/survey-stats-summary.qmd b/R/survey-stats-summary.qmd
@@ -15,6 +15,8 @@ When conducting large-scale trials on samples of the population, it can be neces
 
 All of these designs need to be taken into account when calculating statistics, and when producing models. Only summary statistics are discussed in this document, and variances are calculated using the default Taylor series linearisation methods. For a more detailed introduction to survey statistics in R, see [@Lohr_2022] or [@tlumley_2004].
 
+We will use the [`{survey}`](https://cran.r-project.org/web/packages/survey/index.html) package, which is the standard for survey statistics in R. Note that for those who prefer the tidyverse, the [`{srvyr}`](https://cran.r-project.org/web/packages/srvyr/index.html) package is a wrapper around `{survey}` with `{dplyr}` like syntax.
+
 # Simple Survey Designs
 
 We will use the [API]((https://r-survey.r-forge.r-project.org/survey/html/api.html)) dataset [@API_2000], which contains a number of datasets based on different samples from a dataset of academic performance. Initially we will just cover the methodology with a simple random sample and a finite population correction to demonstrate functionality.
@@ -145,7 +147,6 @@ svyby(~HI_CHOL, ~race, nhanes_design, svymean, na.rm=TRUE, deff=TRUE)
 ```{r}
 #| echo: false
 si <- sessioninfo::session_info("survey", dependencies = FALSE)
-si$external <- structure(list("SAS" = "9.04.01M7P080520"), class = c("external_info", "list"))
 si
 ```
 :::
diff --git a/SAS/survey-stats-summary.qmd b/SAS/survey-stats-summary.qmd
@@ -15,6 +15,8 @@ When conducting large-scale trials on samples of the population, it can be neces
 
 All of these designs need to be taken into account when calculating statistics, and when producing models. Only summary statistics are discussed in this document, and variances are calculated using the default Taylor series linearisation methods. For a more detailed introduction to survey statistics in SAS, see [@Lohr_2022] or [@SAS_2018].
 
+For survey summary statistics in SAS, we can use the `SURVEYMEANS` and `SURVEYFREQ` procedures.
+
 # Simple Survey Designs
 
 We will use the [API]((https://r-survey.r-forge.r-project.org/survey/html/api.html)) dataset [@API_2000], which contains a number of datasets based on different samples from a dataset of academic performance. Initially we will just cover the methodology with a simple random sample and a finite population correction to demonstrate functionality.

diff --git a/data/apisrs.csv b/data/apisrs.csv