diff --git a/_site.yml b/_site.yml
index e0b5776..2c5bfcb 100644
--- a/_site.yml
+++ b/_site.yml
@@ -35,6 +35,8 @@ navbar:
href: notes/09_model-selection_bda3-7.html
- text: "Section 10. Decision analysis"
href: notes/10_decision-analysis_bda3-9.html
+ - text: "Section 11. Normal approximation & Frequency properties"
+ href: notes/11_normal-approx-freq-properties_bda3-04.html
- text: "Exercises"
menu:
- text: "Chapter 1"
diff --git a/docs/about.html b/docs/about.html
index f884857..8ced4e7 100644
--- a/docs/about.html
+++ b/docs/about.html
@@ -2549,6 +2550,12 @@
Sections
(none) |
assignment 9 |
+
+11. Normal approximation & Frequency properties |
+notes |
+(none) |
+(none) |
+
Stan models
diff --git a/docs/notes/11_normal-approx-freq-properties_bda3-04.Rmd b/docs/notes/11_normal-approx-freq-properties_bda3-04.Rmd
new file mode 100644
index 0000000..11ac09a
--- /dev/null
+++ b/docs/notes/11_normal-approx-freq-properties_bda3-04.Rmd
@@ -0,0 +1,92 @@
+---
+title: "11. Normal approximation & Frequency properties"
+date: "2021-11-19"
+output: distill::distill_article
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE, dpi = 300, comment = "#>")
+```
+
+## Resources
+
+- reading:
+ - BDA3 ch 4. *Asymptotics and connections to non-Bayesian approaches*
+ - [reading instructions](../reading-instructions/BDA3_ch04_reading-instructions.pdf)
+- lectures:
+ - [Lecture 11.1. 'Normal approximation (Laplace approximation)'](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=e22fedc7-9fd3-4d1e-8318-ab1000ca45a4)
+ - [Lecture 11.2. 'Large sample theory and counter examples'](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a8e38a95-a944-4f3d-bf95-ab1000dbdf73)
+- [slides](../slides/slides_ch4.pdf)
+
+## Notes
+
+### Reading instructions
+
+- chapter outline:
+ - 4.1 Normal approximation (Laplace’s method)
+ - 4.2 Large-sample theory
+ - 4.3 Counterexamples
+ - 4.4 Frequency evaluation (not part of the course, but interesting)
+ - 4.5 Other statistical methods (not part of the course, but interesting)
+
+### Chapter 4. Asymptotics and connections to non-Bayesian approaches
+
+- *asymptotic theory*: as the sample size increases, the influence of the prior on the posterior decreases
+  - often used as a justification for non-informative priors
+
+#### 4.1 Normal approximations of the posterior distribution
+
+- if the posterior distribution $p(\theta|y)$ is unimodal and roughly symmetric, it is convenient to approximate it with a normal distribution (see the sketch below)
+  - equivalently, the log of the posterior is then approximately a quadratic function of $\theta$
+- "For a finite sample size $n$, the normal approximation is typically more accurate for conditional and marginal distributions of components of $\theta$ than for the full joint distribution." (pg. 85)
+- common to use the normal approximations to quickly debug or sanity-check a model's code
+
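+A minimal R sketch of the idea (my own illustration, not from the course materials): approximate the posterior by a normal centered at the mode, with variance given by the inverse observed information (the curvature of the log posterior at the mode). The binomial-with-uniform-prior example is an assumption chosen because the exact posterior is a known Beta distribution to compare against.
+
+```{r}
+# Laplace approximation for theta | y ~ Beta(y + 1, n - y + 1)
+# (binomial likelihood, uniform prior), done numerically.
+y <- 7
+n <- 10
+
+# Negative log posterior, up to an additive constant.
+neg_log_post <- function(theta) {
+  -(y * log(theta) + (n - y) * log(1 - theta))
+}
+
+# Find the posterior mode and the curvature (observed information) at it.
+fit <- optim(
+  par = 0.5, fn = neg_log_post, method = "L-BFGS-B",
+  lower = 1e-6, upper = 1 - 1e-6, hessian = TRUE
+)
+mode_hat <- fit$par                    # 0.7, the mode of Beta(8, 4)
+sd_hat <- sqrt(1 / fit$hessian[1, 1])  # inverse observed information
+
+# Compare the approximation against the exact Beta posterior.
+theta <- seq(0.01, 0.99, length.out = 99)
+max(abs(dnorm(theta, mode_hat, sd_hat) - dbeta(theta, y + 1, n - y + 1)))
+```
+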
+#### 4.2 Large-sample theory
+
+- *asymptotic normality of the posterior distribution*: with more data from the same underlying process, the posterior distribution of the parameter vector approaches multivariate normality, even if the true distribution of the data is not within the parametric family under consideration (pg. 87)
+  - particularly for independent samples from the data-generating process
+- summary at the limit of large $n$ (see the sketch below):
+  - the posterior mode $\hat\theta$ approaches the true $\theta_0$
+  - the likelihood dominates the prior distribution
+
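+A small simulation sketch (my own, assuming a simple binomial model with true $\theta_0 = 0.3$): as $n$ grows, the exact Beta posterior both concentrates near $\theta_0$ and becomes harder to distinguish from its matching normal, illustrated here by comparing quantiles.
+
+```{r}
+set.seed(1)
+theta_0 <- 0.3
+probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
+
+for (n in c(10, 100, 10000)) {
+  y <- rbinom(1, n, theta_0)
+  a <- y + 1
+  b <- n - y + 1  # posterior is Beta(a, b) under a uniform prior
+  post_mean <- a / (a + b)
+  post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
+  # Gap between exact posterior quantiles and the normal approximation.
+  gap <- max(abs(qbeta(probs, a, b) - qnorm(probs, post_mean, post_sd)))
+  cat("n =", n, " posterior sd =", round(post_sd, 4),
+      " max quantile gap =", signif(gap, 2), "\n")
+}
+```
+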
+#### 4.3 Counterexamples to the theorems
+
+- there are many instances where large amounts of data do not allow for the normal approximation:
+- **underidentified models and nonidentified parameters**
+ - "the model is *underidentified* given data $y$ if the likelihood $p(\theta|y)$ is equal for a range of $\theta$" (pg. 89)
+ - there is no single point $\theta_0$ to which the posterior distribution can converge given infinite data
+ - a parameter can be nonidentified if there is no supply of information about it
+ - results in its posterior being identical to its prior
+- **number of parameters increasing with sample size**
+ - in complicated problems, the number of parameters can scale with the amount of data
+ - e.g. Gaussian processes or hierarchical models
+- **aliasing**
+ - the same likelihood function repeats at a discrete set of points
+ - a special case of underidentified parameters
+  - e.g. a two-component mixture model: swapping the two components' parameters (label switching) leaves the likelihood unchanged
+- **unbounded likelihoods**
+  - if the likelihood is unbounded, there might not be any posterior mode within the parameter space
+  - this invalidates both the consistency results and the normal approximation
+- **improper posterior distributions**
+ - an improper posterior integrates to infinity, not to 1 as is required by the normal approximation theory
+ - an improper posterior can only occur with an improper prior
+- **prior distributions that exclude the point of convergence**
+- **convergence to the edge of parameter space**
+ - if $\theta_0$ is at the edge of the parameter space, the distribution cannot be symmetric
+- **tails of the distribution**
+  - the normal approximation can hold for almost all of the posterior mass yet fail in the tails
+
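+A sketch of the underidentified case referenced above (my own toy example): with $y \sim N(\theta_1 + \theta_2, 1)$, only the sum $\theta_1 + \theta_2$ enters the likelihood, so the likelihood is flat along the ridge $\theta_1 + \theta_2 = \text{const}$ and no amount of data selects a single point on it.
+
+```{r}
+set.seed(1)
+n <- 1000
+y <- rnorm(n, mean = 1.5 + (-0.5), sd = 1)  # true sum is 1.0
+
+log_lik <- function(theta1, theta2) {
+  sum(dnorm(y, mean = theta1 + theta2, sd = 1, log = TRUE))
+}
+
+# Wildly different parameter pairs with the same sum are indistinguishable:
+log_lik(1.5, -0.5)
+log_lik(100, -99)  # identical log likelihood
+```
+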
+### Lecture notes
+
+#### Lecture 11.1. 'Normal approximation (Laplace approximation)'
+
+(no additional notes)
+
+#### Lecture 11.2. 'Large sample theory and counter examples'
+
+- *large sample theory*:
+  - *consistency*: if the true distribution is included in the parametric family, then the posterior converges to a point $\theta_0$ as $n \rightarrow \infty$
+ - "included in the parametric family": $f(y) = p(y|\theta_0)$ for some $\theta_0$
+ - the point does not have uncertainty
+ - same result as MLE
+  - if the true distribution is not included in the parametric family, then there is no true $\theta_0$, so replace it with the $\theta_0$ that minimizes the KL divergence from the true distribution (see the sketch below)
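+
+A sketch of that last point (my own illustration, with an assumed true distribution): if the data are really $t_4$ but the model family is normal, the posterior concentrates at the normal minimizing $\mathrm{KL}(f \,\|\, p(\cdot|\theta))$, which is equivalent to minimizing the cross-entropy $\mathrm{E}_f[-\log p(y|\theta)]$, estimated below by Monte Carlo.
+
+```{r}
+set.seed(1)
+y_f <- rt(2e5, df = 4)  # draws from the true distribution f
+
+# Cross-entropy of a Normal(mu, sigma) model against f; sigma is kept on
+# the log scale so the optimization is unconstrained.
+cross_entropy <- function(par) {
+  -mean(dnorm(y_f, mean = par[1], sd = exp(par[2]), log = TRUE))
+}
+
+fit <- optim(c(0, 0), cross_entropy)
+c(mu = fit$par[1], sigma = exp(fit$par[2]))
+# For the normal family the KL minimizer matches the mean and sd of f:
+# mu ~ 0 and sigma ~ sqrt(4 / (4 - 2)) = sqrt(2) here.
+```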
diff --git a/docs/notes/11_normal-approx-freq-properties_bda3-04.html b/docs/notes/11_normal-approx-freq-properties_bda3-04.html
new file mode 100644
index 0000000..b1766cf
--- /dev/null
+++ b/docs/notes/11_normal-approx-freq-properties_bda3-04.html
@@ -0,0 +1,2625 @@
+[2,625 lines of distill-generated HTML for "11. Normal approximation & Frequency properties"; rendered duplicate of the Rmd above, omitted]
diff --git a/docs/project-guidelines.html b/docs/project-guidelines.html
index 11891ba..5eaac40 100644
--- a/docs/project-guidelines.html
+++ b/docs/project-guidelines.html
@@ -2317,6 +2317,7 @@
${suggestion.title}
Section 8. Model checking & Cross-validation
Section 9. Model comparison and selection
Section 10. Decision analysis
+
Section 11. Normal approximation & Frequency properties
diff --git a/docs/reading-instructions/BDA3_ch04_reading-instructions.pdf b/docs/reading-instructions/BDA3_ch04_reading-instructions.pdf
new file mode 100644
index 0000000..a5ca4d6
Binary files /dev/null and b/docs/reading-instructions/BDA3_ch04_reading-instructions.pdf differ
diff --git a/docs/search.json b/docs/search.json
index 4029a82..f12a985 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -6,7 +6,7 @@
"description": "Some additional details about the website",
"author": [],
"contents": "\n\n\n\n",
- "last_modified": "2021-11-18T07:09:12-05:00"
+ "last_modified": "2021-12-02T06:27:56-05:00"
},
{
"path": "index.html",
@@ -18,14 +18,14 @@
"url": "https://joshuacook.netlify.app"
}
],
- "contents": "\nResources\nCourse website\n2021 Schedule\nGitHub repo (my fork)\nBayesian Data Analysis (3e) (BDA3) (exercise solutions)\nChapter Notes\nVideo lectures or individually lists here\nLecture slides\nHow to study\n\nThe following are recommendations from the course creators on how to take the course.\n\nThe recommended way to go through the material is:\nRead the reading instructions for a chapter in the chapter notes.\nRead the chapter in BDA3 and check that you find the terms listed in the reading instructions.\nWatch the corresponding video lecture to get explanations for most important parts.\nRead corresponding additional information in the chapter notes.\nRun the corresponding demos in R demos or Python demos.\nRead the exercise instructions and make the corresponding assignments. Demo codes in R demos and Python demos have a lot of useful examples for handling data and plotting figures. If you have problems, visit TA sessions or ask in course slack channel.\nIf you want to learn more, make also self study exercises listed below.\nSections\nSection\nNotes\nBook exercises\nAssignments\n1. Course Introduction\nnotes\nexercises\nassignment 1\n2. Basics of Bayesian Inference\nnotes\nexercises\nassignment 2\n3. Multidimensional Posterior\nnotes\nexercises\nassignment 3\n4. Monte Carlo\nnotes\nexercises\nassignment 4\n5. Markov chain Monte Carlo\nnotes\nexercises\nassignment 5\n6. HMC, NUTS, and Stan\nnotes\nexercises\nassignment 6\n7. Hierarchical models and exchangeability\nnotes\nexercises\nassignment 7\n8. Model checking & Cross-validation\nnotes\n(none)\n(none)\n9. Model comparison and selection\nnotes\n(none)\nassignment 8\n10. Decision analysis\nnotes\n(none)\nassignment 9\nStan models\nDrug bioassay model for Assignment 6\n8 school SAT model\nDrownings for Assignment 7\nFactory machine measurements for Assignments 7 & 8:\npooled\nseparate\nhierarchical (also used in Assignment 9)\n\n",
- "last_modified": "2021-11-18T07:09:13-05:00"
+ "contents": "\nResources\nCourse website\n2021 Schedule\nGitHub repo (my fork)\nBayesian Data Analysis (3e) (BDA3) (exercise solutions)\nChapter Notes\nVideo lectures or individually lists here\nLecture slides\nHow to study\n\nThe following are recommendations from the course creators on how to take the course.\n\nThe recommended way to go through the material is:\nRead the reading instructions for a chapter in the chapter notes.\nRead the chapter in BDA3 and check that you find the terms listed in the reading instructions.\nWatch the corresponding video lecture to get explanations for most important parts.\nRead corresponding additional information in the chapter notes.\nRun the corresponding demos in R demos or Python demos.\nRead the exercise instructions and make the corresponding assignments. Demo codes in R demos and Python demos have a lot of useful examples for handling data and plotting figures. If you have problems, visit TA sessions or ask in course slack channel.\nIf you want to learn more, make also self study exercises listed below.\nSections\nSection\nNotes\nBook exercises\nAssignments\n1. Course Introduction\nnotes\nexercises\nassignment 1\n2. Basics of Bayesian Inference\nnotes\nexercises\nassignment 2\n3. Multidimensional Posterior\nnotes\nexercises\nassignment 3\n4. Monte Carlo\nnotes\nexercises\nassignment 4\n5. Markov chain Monte Carlo\nnotes\nexercises\nassignment 5\n6. HMC, NUTS, and Stan\nnotes\nexercises\nassignment 6\n7. Hierarchical models and exchangeability\nnotes\nexercises\nassignment 7\n8. Model checking & Cross-validation\nnotes\n(none)\n(none)\n9. Model comparison and selection\nnotes\n(none)\nassignment 8\n10. Decision analysis\nnotes\n(none)\nassignment 9\n11. Normal approximation & Frequency properties\nnotes\n(none)\n(none)\nStan models\nDrug bioassay model for Assignment 6\n8 school SAT model\nDrownings for Assignment 7\nFactory machine measurements for Assignments 7 & 8:\npooled\nseparate\nhierarchical (also used in Assignment 9)\n\n",
+ "last_modified": "2021-12-02T06:27:58-05:00"
},
{
"path": "project-guidelines.html",
"author": [],
"contents": "\nProject work details\nProject work involves choosing a data set and performing a whole analysis according to all the parts of Bayesian workflow studied along the course.\nThe project work is meant to be done in period II.\nIn the beginning of the period II\nForm a group. We prefer groups of 3, but the project can be done in groups of 1-2.\nSelect a topic. You may ask in the course chat channel #project for opinion whether it’s a good topic and a good dataset. You can change the topic later.\nStart planning.\n\n\nThe main work for the project and the presentation will be done in the second half of the period II after all the workflow parts have been discussed in the course.\nThe online presentations will be made on the evaluation week after period II.\nAll suspected plagiarism will be reported and investigated. See more about the Aalto University Code of Academic Integrity and Handling Violations Thereof.\nProject schedule\nForm a group and pick a topic. Register the group before end of 8th Nov, 2021.\nGroups of 3 can reserve a presentation slot starting TBA.\nGroups of 2 can reserve a presentation slot starting TBA.\nGroups of 1 can reserve a presentation slot starting TBA.\nGroups that register late can reserve a presentation slot starting TBA.\nWork on the project. TA session queue is also for project questions.\nProject report deadline December 6, 2021. Submit in peergrade (separate “class”, the link will be added to MyCourses).\nProject report peer grading December 7-9, 2021 (so that you’ll get feedback for the report before the presentations).\nProject presentations December 13-17, 2021.\nGroups\nProject work is done in groups of 1-3 persons. Preferred group size is 3, because you learn more when you talk about the project with someone else.\nIf you don’t have a group, you can ask other students in the group chat channel #project. Tell what kind of data you are interested in (e.g. medicine, health, biological, engineering, political, business), whether you prefer R or Python, and whether you have already more concrete idea for the topic.\nGroups of 3 students can choose their presentation time slot before 1-2 student groups. 3 person group is expected to do a bit more work than 1-2 person groups.\nYou can do the project alone, but the amount of work is expected to the same for 2 person groups.\nTA sessions\nThe groups will get help for the project work in TA sessions. When there are no weekly assignments, the TA sessions are still organized for helping in the project work.\nEvaluation\nThe project work’s evaluation consists of:\npeer-graded project report (40%) (within peergrade submission 80% and feedack 20%)\npresentation and oral exam graded by the course staff (60%)\nclarity of slides + use of figures\nclarity of oral presentation + flow of the presentation\nall required parts included (not necessarily all in main slides, but it needs to be clear that all required steps were performed)\naccuracy of use of terms (oral exam)\nresponses to questions (oral exam)\n\nProject report\nIn the project report you practice presenting the problem and data analysis results, which means that minimal listing of code and figures is not a good report. There are different levels for how data analysis project could be reported. This report should be more than a summary of results without workflow steps. While describing the steps and decisions made during the workflow, to keep the report readable some of the diagnostic outputs and code can be put in the appendix. 
If you are uncertain you can ask TAs in TA sessions whether you are on a good level of amount of details.\nThe report should include\nIntroduction describing\nthe motivation\nthe problem\nand the main modeling idea.\nShowing some illustrative figure is recommended.\nDescription of the data and the analysis problem. Provide information where the data was obtained, and if it has been previously used in some online case study and how your analysis differs from the existing analyses.\nDescription of at least two models, for example:\nnon hierarchical and hierarchical,\nlinear and non linear,\nvariable selection with many models.\nInformative or weakly informative priors, and justification of their choices.\nStan, rstanarm or brms code.\nHow to the Stan model was run, that is, what options were used. This is also more clear as combination of textual explanation and the actual code line.\nConvergence diagnostics (\\(\\widehat{R}\\), ESS, divergences) and what was done if the convergence was not good with the first try.\nPosterior predictive checks and what was done to improve the model.\nModel comparison (e.g. with LOO-CV).\nPredictive performance assessment if applicable (e.g. classification accuracy) and evaluation of practical usefulness of the accuracy.\nSensitivity analysis with respect to prior choices (i.e. checking whether the result changes a lot if prior is changed)\nDiscussion of issues and potential improvements.\nConclusion what was learned from the data analysis.\nSelf-reflection of what the group learned while making the project.\nProject presentation\nIn addition to the submitted report, each project must be presented by the authoring group, according to the following guidelines:\nThe presentation should be high level but sufficiently detailed information should be readily available to help answering questions from the audience.\nThe duration of the presentation should be 10 minutes (groups of 1-2 students) or 15 minutes (groups of 3 students).\nAt the end of the presentation there will be an extra 5-10 minutes of questions by anyone in the audience or two members of the course staff who are present. The questions from lecturer/TAs can be considered as an oral exam questions, and if answers to these questions reveal weak knowledge of the methods and workflow steps which should be part of the project, that can reduce the grade.\nGrading will be done by the two members of the course staff using standardized grading instructions.\nSpecific recommendations for the presentations include:\nThe first slide should include project’s title and group members’ names.\nThe chosen statistical model(s), including observation model and priors, must be explained and justified,\nMake sure the font is big enough to be easily readable by the audience. This includes figure captions, legends and axis information,\nThe last slide should be a summary or take-home-messages and include contact information or link to a further information. (The grade will be reduced by one if the last slide has only something like “Thank you” or “Questions?”),\nIn general, the best presentations are often given by teams that have frequently attended TA sessions and gotten feedback, so we strongly recommend attending these sessions.\nMore details on the presentation sessions\nIf you don’t have microphone or video camera (e.g. 
in your laptop or mobile phone) then we’ll arrange your presentation on campus in period III.\nIf you reserved a presentation slot but need to cancel, do it asap.\nZoom meeting link for all time slots available in the course chat.\nAs we have many presentation in each slot join the meeting in time. Late arrivals will lower the grade. Very late arrivals will fail the presentation and can present in period III.\nPresenting group needs to have video and audio on.\nIt is easiest if just one from the group shares the slides, but it is expected that all group members present some part of the presentation orally.\nPresentation time is 10 min for 1-2 person groups and 15min for 3 person groups\nTime limit is strict. It’s good idea to practice the talk so that you get the timing right. Staff will announce 2min and 1min left and time ended. Going overtime reduces the grade.\nAfter the presentation there will be 5min for questions, answers, and feedback.\nEach student has to come up with at least one question during the session. Students can ask more questions. Questions by students are posted in chat, and they can be posted already during the presentation.\nStaff Will ask further questions (kind of oral exam)\nGrading of the project presentation takes int account\nclarity of slides + use of figures\nclarity of oral presentation + flow of the presentation\nall required parts included (not necessarily all in main slides, but it needs to be clear that all required steps were performed)\naccuracy of use of terms (oral exam)\nresponses to questions (oral exam)\n\nStudents will also self-evaluate their project. After the presentation each student who just presented sends a private message to one of the staff members with a self evaluation grade from themselves and for each group member (if applicable).\nData sets\nAs some data sets have been overused for these particular goals, note that the following ones are forbidden in this work (more can be added to this list so make sure to check it regularly):\nextremly common data sets like titanic, mtcars, iris\nBaseball batting (used by Bob Carpenter’s StanCon case study).\nData sets used in the course demos\nIt’s best to use a dataset for which there is no ready made analysis in internet, but if you choose a dataset used already in some online case study, provide the link to previous studies and report how your analysis differs from those (for example if someone has made non-Bayesian analysis and you do the full Bayesian analysis).\nDepending on the model and the structure of the data, a good data set would have more than 100 observations but less than 1 million. If you know an interesting big data set, you can use a smaller subset of the data to keep the computation times feasible. It would be good that the data has some structure, so that it is sensible to use multilevel/hierarchical models.\nModel requirements\nEvery parameter needs to have an explicit proper prior. Improper flat priors are not allowed.\nA hierarchical model is a model where the prior of certain parameter contain other parameters that are also estimated in the model. For instance, b ~ normal(mu, sigma), mu ~ normal(0, 1), sigma ~ exponential(1).\nDo not impose hard constrains on a parameter unless they are natural to them. uniform(a, b) should not be used unless the boundaries are really logical boundaries and values beyond the boundaries are completely impossible.\nAt least some models should include covariates. 
Modelling the outcome without predictors is likely too simple for the project.\nbrms can be used, but the Stan code must be included, briefly commented, and all priors need to be checked from the Stan code and adjusted to be weakly informative based on some justified explanation.\nSome examples\nThe following case study examples demonstrate how text, equations, figures, and code, and inference results can be included in one report. These examples don’t necessarily have all the workflow steps required in your report, but different steps are illustrated in different case studies and you can get good ideas for your report just by browsing through them.\nBDA R and Python demos are quite minimal in description of the data and discussion of the results, but show many diagnostics and basic plots.\nSome Stan case studies focus on some specific methods, but there are many case studies that are excellent examples for this course. They don’t include all the steps required in this course, but are good examples of writing. Some of them are longer or use more advanced models than required in this course.\nBayesian workflow for disease transmission modeling in Stan\nModel-based Inference for Causal Effects in Completely Randomized Experments\nTagging Basketball Events with HMM in Stan\nModel building and expansion for golf putting\nA Dyadic Item Response Theory Model\nPredator-Prey Population Dynamics: the Lotka-Volterra model in Stan\nSome StanCon case studies (scroll down) can also provide good project ideas.\n",
- "last_modified": "2021-11-18T07:09:14-05:00"
+ "last_modified": "2021-12-02T06:27:59-05:00"
}
],
"collections": []
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index af36cec..b72a8f3 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -6,10 +6,10 @@
https://jhrcook.github.io/bayesian-data-analysis-course/
- 2021-11-18T07:07:17-05:00
+ 2021-12-02T06:26:56-05:00
https://jhrcook.github.io/bayesian-data-analysis-course/project-guidelines.html
- 2021-11-18T07:09:13-05:00
+ 2021-12-02T06:27:58-05:00
diff --git a/docs/slides/slides_ch4.pdf b/docs/slides/slides_ch4.pdf
new file mode 100644
index 0000000..769a6e6
Binary files /dev/null and b/docs/slides/slides_ch4.pdf differ
diff --git a/index.Rmd b/index.Rmd
index 04b4a86..5c2b0c9 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -15,7 +15,6 @@ site: distill::distill_website
knitr::opts_chunk$set(echo = FALSE, dpi = 300)
```
-
## Resources
- [Course website](https://avehtari.github.io/BDA_course_Aalto/)
@@ -54,6 +53,7 @@ The recommended way to go through the material is:
| **8. Model checking & Cross-validation** | [notes](notes/08_model-checking-and-cv_bda3-6-7.html) | (none) | (none) |
| **9. Model comparison and selection** | [notes](notes/09_model-selection_bda3-7.html) | (none) | [assignment 8](assignments/jhcook-assignment-08.html) |
| **10. Decision analysis** | [notes](notes/10_decision-analysis_bda3-9.html) | (none) | [assignment 9](assignments/jhcook-assignment-09.html) |
+| **11. Normal approximation & Frequency properties** | [notes](notes/11_normal-approx-freq-properties_bda3-04.html) | (none) | (none) |
## Stan models
diff --git a/notes/11_normal-approx-freq-properties_bda3-04.Rmd b/notes/11_normal-approx-freq-properties_bda3-04.Rmd
new file mode 100644
index 0000000..11ac09a
--- /dev/null
+++ b/notes/11_normal-approx-freq-properties_bda3-04.Rmd
@@ -0,0 +1,92 @@
+---
+title: "11. Normal approximation & Frequency properties"
+date: "2021-11-19"
+output: distill::distill_article
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE, dpi = 300, comment = "#>")
+```
+
+## Resources
+
+- reading:
+ - BDA3 ch 4. *Asymptotics and connections to non-Bayesian approaches*
+ - [reading instructions](../reading-instructions/BDA3_ch04_reading-instructions.pdf)
+- lectures:
+ - [Lecture 11.1. 'Normal approximation (Laplace approximation)'](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=e22fedc7-9fd3-4d1e-8318-ab1000ca45a4)
+ - [Lecture 11.2. 'Large sample theory and counter examples'](https://aalto.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=a8e38a95-a944-4f3d-bf95-ab1000dbdf73)
+- [slides](../slides/slides_ch4.pdf)
+
+## Notes
+
+### Reading instructions
+
+- chapter outline:
+ - 4.1 Normal approximation (Laplace’s method)
+ - 4.2 Large-sample theory
+ - 4.3 Counterexamples
+ - 4.4 Frequency evaluation (not part of the course, but interesting)
+ - 4.5 Other statistical methods (not part of the course, but interesting)
+
+### Chapter 4. Asymptotics and connections to non-Bayesian approaches
+
+- *asymptotic theory*: as the sample size increases, the influence of the prior on the posterior decreases
+  - often used as a justification for non-informative priors
+
+#### 4.1 Normal approximations of the posterior distribution
+
+- if the posterior distribution $p(\theta|y)$ is unimodal and roughly symmetric, it is convenient to approximate it with a normal distribution (see the sketch below)
+  - equivalently, the log of the posterior is then approximately a quadratic function of $\theta$
+- "For a finite sample size $n$, the normal approximation is typically more accurate for conditional and marginal distributions of components of $\theta$ than for the full joint distribution." (pg. 85)
+- common to use the normal approximations to quickly debug or sanity-check a model's code
+
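+A minimal R sketch of the idea (my own illustration, not from the course materials): approximate the posterior by a normal centered at the mode, with variance given by the inverse observed information (the curvature of the log posterior at the mode). The binomial-with-uniform-prior example is an assumption chosen because the exact posterior is a known Beta distribution to compare against.
+
+```{r}
+# Laplace approximation for theta | y ~ Beta(y + 1, n - y + 1)
+# (binomial likelihood, uniform prior), done numerically.
+y <- 7
+n <- 10
+
+# Negative log posterior, up to an additive constant.
+neg_log_post <- function(theta) {
+  -(y * log(theta) + (n - y) * log(1 - theta))
+}
+
+# Find the posterior mode and the curvature (observed information) at it.
+fit <- optim(
+  par = 0.5, fn = neg_log_post, method = "L-BFGS-B",
+  lower = 1e-6, upper = 1 - 1e-6, hessian = TRUE
+)
+mode_hat <- fit$par                    # 0.7, the mode of Beta(8, 4)
+sd_hat <- sqrt(1 / fit$hessian[1, 1])  # inverse observed information
+
+# Compare the approximation against the exact Beta posterior.
+theta <- seq(0.01, 0.99, length.out = 99)
+max(abs(dnorm(theta, mode_hat, sd_hat) - dbeta(theta, y + 1, n - y + 1)))
+```
+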
+#### 4.2 Large-sample theory
+
+- *asymptotic normality of the posterior distribution*: with more data from the same underlying process, the posterior distribution of the parameter vector approaches multivariate normality, even if the true distribution of the data is not within the parametric family under consideration (pg. 87)
+  - particularly for independent samples from the data-generating process
+- summary at the limit of large $n$ (see the sketch below):
+  - the posterior mode $\hat\theta$ approaches the true $\theta_0$
+  - the likelihood dominates the prior distribution
+
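+A small simulation sketch (my own, assuming a simple binomial model with true $\theta_0 = 0.3$): as $n$ grows, the exact Beta posterior both concentrates near $\theta_0$ and becomes harder to distinguish from its matching normal, illustrated here by comparing quantiles.
+
+```{r}
+set.seed(1)
+theta_0 <- 0.3
+probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
+
+for (n in c(10, 100, 10000)) {
+  y <- rbinom(1, n, theta_0)
+  a <- y + 1
+  b <- n - y + 1  # posterior is Beta(a, b) under a uniform prior
+  post_mean <- a / (a + b)
+  post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
+  # Gap between exact posterior quantiles and the normal approximation.
+  gap <- max(abs(qbeta(probs, a, b) - qnorm(probs, post_mean, post_sd)))
+  cat("n =", n, " posterior sd =", round(post_sd, 4),
+      " max quantile gap =", signif(gap, 2), "\n")
+}
+```
+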
+#### 4.3 Counterexamples to the theorems
+
+- there are many instances where large amounts of data do not allow for the normal approximation:
+- **underidentified models and nonidentified parameters**
+ - "the model is *underidentified* given data $y$ if the likelihood $p(\theta|y)$ is equal for a range of $\theta$" (pg. 89)
+ - there is no single point $\theta_0$ to which the posterior distribution can converge given infinite data
+ - a parameter can be nonidentified if there is no supply of information about it
+ - results in its posterior being identical to its prior
+- **number of parameters increasing with sample size**
+ - in complicated problems, the number of parameters can scale with the amount of data
+ - e.g. Gaussian processes or hierarchical models
+- **aliasing**
+ - the same likelihood function repeats at a discrete set of points
+ - a special case of underidentified parameters
+  - e.g. a two-component mixture model: swapping the two components' parameters (label switching) leaves the likelihood unchanged
+- **unbounded likelihoods**
+  - if the likelihood is unbounded, there might not be any posterior mode within the parameter space
+  - this invalidates both the consistency results and the normal approximation
+- **improper posterior distributions**
+ - an improper posterior integrates to infinity, not to 1 as is required by the normal approximation theory
+ - an improper posterior can only occur with an improper prior
+- **prior distributions that exclude the point of convergence**
+- **convergence to the edge of parameter space**
+ - if $\theta_0$ is at the edge of the parameter space, the distribution cannot be symmetric
+- **tails of the distribution**
+  - the normal approximation can hold for almost all of the posterior mass yet fail in the tails
+
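+A sketch of the underidentified case referenced above (my own toy example): with $y \sim N(\theta_1 + \theta_2, 1)$, only the sum $\theta_1 + \theta_2$ enters the likelihood, so the likelihood is flat along the ridge $\theta_1 + \theta_2 = \text{const}$ and no amount of data selects a single point on it.
+
+```{r}
+set.seed(1)
+n <- 1000
+y <- rnorm(n, mean = 1.5 + (-0.5), sd = 1)  # true sum is 1.0
+
+log_lik <- function(theta1, theta2) {
+  sum(dnorm(y, mean = theta1 + theta2, sd = 1, log = TRUE))
+}
+
+# Wildly different parameter pairs with the same sum are indistinguishable:
+log_lik(1.5, -0.5)
+log_lik(100, -99)  # identical log likelihood
+```
+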
+### Lecture notes
+
+#### Lecture 11.1. 'Normal approximation (Laplace approximation)'
+
+(no additional notes)
+
+#### Lecture 11.2. 'Large sample theory and counter examples'
+
+- *large sample theory*:
+  - *consistency*: if the true distribution is included in the parametric family, then the posterior converges to a point $\theta_0$ as $n \rightarrow \infty$
+ - "included in the parametric family": $f(y) = p(y|\theta_0)$ for some $\theta_0$
+ - the point does not have uncertainty
+ - same result as MLE
+  - if the true distribution is not included in the parametric family, then there is no true $\theta_0$, so replace it with the $\theta_0$ that minimizes the KL divergence from the true distribution (see the sketch below)
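+
+A sketch of that last point (my own illustration, with an assumed true distribution): if the data are really $t_4$ but the model family is normal, the posterior concentrates at the normal minimizing $\mathrm{KL}(f \,\|\, p(\cdot|\theta))$, which is equivalent to minimizing the cross-entropy $\mathrm{E}_f[-\log p(y|\theta)]$, estimated below by Monte Carlo.
+
+```{r}
+set.seed(1)
+y_f <- rt(2e5, df = 4)  # draws from the true distribution f
+
+# Cross-entropy of a Normal(mu, sigma) model against f; sigma is kept on
+# the log scale so the optimization is unconstrained.
+cross_entropy <- function(par) {
+  -mean(dnorm(y_f, mean = par[1], sd = exp(par[2]), log = TRUE))
+}
+
+fit <- optim(c(0, 0), cross_entropy)
+c(mu = fit$par[1], sigma = exp(fit$par[2]))
+# For the normal family the KL minimizer matches the mean and sd of f:
+# mu ~ 0 and sigma ~ sqrt(4 / (4 - 2)) = sqrt(2) here.
+```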
diff --git a/notes/11_normal-approx-freq-properties_bda3-04.html b/notes/11_normal-approx-freq-properties_bda3-04.html
new file mode 100644
index 0000000..b1766cf
--- /dev/null
+++ b/notes/11_normal-approx-freq-properties_bda3-04.html
@@ -0,0 +1,2625 @@
+[2,625 lines of distill-generated HTML for "11. Normal approximation & Frequency properties"; rendered duplicate of the Rmd above, omitted]
diff --git a/reading-instructions/BDA3_ch04_reading-instructions.pdf b/reading-instructions/BDA3_ch04_reading-instructions.pdf
new file mode 100644
index 0000000..a5ca4d6
Binary files /dev/null and b/reading-instructions/BDA3_ch04_reading-instructions.pdf differ
diff --git a/slides/slides_ch4.pdf b/slides/slides_ch4.pdf
new file mode 100644
index 0000000..769a6e6
Binary files /dev/null and b/slides/slides_ch4.pdf differ