---
title: "Model Convergence With mig_estimate_rc"
author: "Monica Alexander, Jessie Yeung, Tim Riffe"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Achieving Model Convergence With mig_estimate_rc}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE, eval=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  fig.width = 8,
  fig.height = 4,
  fig.align = "center"
)
```


## Model Diagnostics in Bayesian Models

### Overview
The `mig_estimate_rc` function estimates Rogers-Castro parameters in a Bayesian framework using a Markov Chain Monte Carlo (MCMC) algorithm, via the Stan programming language [@carpenter2017stan]. This method of estimation is advantageous for models that are difficult to estimate and highly non-linear (such as the Rogers-Castro model). However for models estimated using MCMC algorithms, it is necessary to check the model diagnostics before interpreting model results. 

The output of `mig_estimate_rc` allows you to easily check two diagnostic values in the `check_converge` object of the output: 

* The potential scale reduction statistic, commonly referred to as the **R-hat** statistic, provides insight into whether the model has converged [@gelman1992inference]. You want the R-hat values to be close to 1, and R-hat values far greater than 1 indicate that convergence has not been achieved. Generally, Gelman et al. recommend ensuring that R-hat values are below 1.1, although there is no universally agreed upon threshold [@gelman2013bayesian].

* The **Effective sample size** provides insight into the autocorrelation among samples in the same chain. The larger the effective sample size the better. If your effective sample size is small, you should consider increasing the number of MCMC samples.

The rest of this document focuses on dealing with convergence issues, which will likely be reflected as large R-hat values. It is not uncommon to find that your model has not converged. This is particularly true if you are fitting a the Poisson model (i.e., providing count data) with a large total population size and/or you are fitting either the 11 or 13 parameter models. Resolving a non-convergent model is not always straight-forward, but we will explain some strategies that can be used with `mig_estimate_rc`. 

## Example of a Non-Convergent Model

Here is an example of a non-convergent model. We will be fitting the full 13-parameter model. In this example, we provide count data meaning that we are fitting the Poisson model.  

```{r, include=FALSE}
set.seed(123)
```

```{r}
library(rcbayes)
library(tibble)
library(ggplot2)

ages <- 0:80
migrants <- c(11804, 10606, 9845, 9244, 8471, 7940, 7348, 6885, 6431,
              6055, 5454, 4997, 4845, 4596, 4397, 4814, 4592, 4646, 5386,
              7180, 11374, 14713, 17195, 18937, 19223, 19091, 18507,
              17615, 16929, 15693, 15246, 14152, 13365, 12340, 11609,
              10278, 9547, 8992, 8438, 7883, 7315, 6909, 6730, 6272,
              5994, 6087, 5896, 5592, 5487, 5237, 6021, 5933, 5577,
              5674, 5503, 4916, 5008, 4822, 4824, 4696, 4086, 4019,
              4139, 4054, 4134, 3625, 3871, 4238, 4306, 4440, 3118,
              2980, 2885, 2845, 2795, 2085, 2076, 2035, 2030, 1986, 2037)
pop <- c(105505, 105505, 105505, 105505, 105505, 106126, 106126, 106126,
         106126, 106126, 100104, 100104, 100104, 100104, 100104, 114880,
         114880, 114880, 114880, 114880, 136845, 136845, 136845, 136845,
         136845, 136582, 136582, 136582, 136582, 136582, 141935, 141935,
         141935, 141935, 141935, 134097, 134097, 134097, 134097, 134097,
         130769, 130769, 130769, 130769, 130769, 133718, 133718, 133718,
         133718, 133718, 154178, 154178, 154178, 154178, 154178, 145386,
         145386, 145386, 145386, 145386, 126270, 126270, 126270, 126270,
         126270, 108314, 108314, 108314, 108314, 108314, 79827, 79827,
         79827, 79827, 79827, 59556, 59556, 59556, 59556, 59556, 59556)
```
```{r, eval=FALSE}
rc_res <- mig_estimate_rc(
  ages, migrants, pop,
  pre_working_age = TRUE,
  working_age = TRUE,
  retirement = TRUE,
  post_retirement = TRUE
)
```

```{r, echo=FALSE}
rc_res = list(check_converge = c(), pars_df = c(), fit_df = c())

rc_res[['check_converge']] <- matrix(
  c(
    0.113605451436614,0.00779593103661137,2.15774607128397,3.70790733560849,
    0.111764533173589,0.00797819680947733,2.18281974401689,3.4397804041693,
    0.200212843072318,0.0476641666967773,2.42785538791501,2.33364604863079,
    0.0860134303532162,0.0029298454713518,2.4448397222948,2.36661380639638,
    0.206149088628833,0.00434306449617798,2.62766374221068,2.08317823108183,
    0.011611407086162,0.00465354274179327,2.09013982530816,4.7075734702239,
    0.0111607261406471,0.00204934887455636,7.11402640579904,1.21806784550916,
    21.1903479818411,0.110825507910137,2.42600816481533,2.40555966266519,
    65.8824541176426,0.527963547098352,2.99425165755755,1.73668210871192,
    0.391006779161944,0.00755861905096238,2.98501838004935,1.75178240898006,
    0.564868092984617,0.211502782189408,2.35090673278088,2.66219716061898,
    0.00291870586352614,0.00744665931477216,2.96818132803141,1.90946024259034,
    0.0140624198563911,0.00535502339245935,2.76119187029469,1.92225295693418),
  ncol=4,
  dimnames = list(c("alpha1[1]", "alpha2[1]", "alpha3[1]", 
                    "a1[1]", "a2[1]", "a3[1]","a4[1]",
                    "mu2[1]", "mu3[1]", 
                    "lambda2[1]", "lambda3[1]","lambda4[1]", "c"),
                  c("mean", "se_mean", "n_eff", "Rhat")),
  byrow = TRUE)

rc_res[['fit_df']] <- tibble(ages = 0:80,
       data = migrants/pop,
       median = c(0.11107413, 0.10207562, 0.09392648, 0.08667344, 0.08025095, 0.07451119, 
                  0.06935217, 0.06473068, 0.06059616, 0.05690437, 0.05363063, 0.05073012, 
                  0.04815361, 0.04582636, 0.04373611, 0.04189403, 0.04047357, 0.04090839, 
                  0.04697327, 0.06189863, 0.08392274, 0.10695466, 0.12542619, 0.13688149, 
                  0.14149304, 0.14075261, 0.13643752, 0.13001158, 0.12257184, 0.11481948, 
                  0.10718039, 0.09990469, 0.09312533, 0.08687808, 0.08115890, 0.07595000, 
                  0.07122528, 0.06694617, 0.06308948, 0.05961437, 0.05648621, 0.05366093, 
                  0.05111955, 0.04883263, 0.04678603, 0.04495830, 0.04332511, 0.04186645, 
                  0.04056364, 0.03940147, 0.03836733, 0.03744961, 0.03664648, 0.03594472, 
                  0.03532742, 0.03476267, 0.03425199, 0.03379800, 0.03340861, 0.03307914, 
                  0.03279761, 0.03256650, 0.03236997, 0.03221412, 0.03225893, 0.03356208, 
                  0.03610573, 0.03875565, 0.03972603, 0.03947603, 0.03873864, 0.03777993, 
                  0.03683512, 0.03600479, 0.03562471, 0.03506771, 0.03456024, 0.03413901, 
                  0.03382805, 0.03361884, 0.03349211),
       lower = c(0.10955889, 0.10097274, 0.09318158, 0.08588609, 0.07923866, 0.07338551, 
                 0.06821759, 0.06368101, 0.05970448, 0.05620466, 0.05309501, 0.05028441, 
                 0.04768464, 0.04534359, 0.04322938, 0.04133397, 0.03981681, 0.04017004, 
                 0.04617863, 0.06108230, 0.08293964, 0.10596553, 0.12447152, 0.13592858, 
                 0.14058096, 0.13989090, 0.13564572, 0.12925827, 0.12181410, 0.11405511, 
                 0.10645688, 0.09922248, 0.09247883, 0.08626839, 0.08058662, 0.07542530, 
                 0.07076293, 0.06652748, 0.06270095, 0.05924734, 0.05613392, 0.05330736, 
                 0.05075355, 0.04842025, 0.04629537, 0.04440086, 0.04273021, 0.04124867, 
                 0.03995819, 0.03883248, 0.03787124, 0.03703954, 0.03633343, 0.03569461, 
                 0.03507823, 0.03450711, 0.03399487, 0.03351492, 0.03305895, 0.03264036, 
                 0.03225994, 0.03191384, 0.03162075, 0.03139179, 0.03142643, 0.03237460, 
                 0.03424834, 0.03437457, 0.03452557, 0.03467760, 0.03484352, 0.03502117, 
                 0.03520709, 0.03535974, 0.03483698, 0.03431782, 0.03391187, 0.03354558, 
                 0.03316710, 0.03275000, 0.03232411),
       upper = c(0.11341196, 0.10324869, 0.09465255, 0.08733431, 0.08086402, 0.07510810, 
                 0.06994563, 0.06533161, 0.06118399, 0.05747201, 0.05414250, 0.05118015,
                 0.04861464, 0.04649246, 0.04464422, 0.04301912, 0.04171672, 0.04194436, 
                 0.04780617, 0.06280798, 0.08490775, 0.10794241, 0.12647352, 0.13794525, 
                 0.14251906, 0.14171389, 0.13726329, 0.13077593, 0.12327584, 0.11550739, 
                 0.10784189, 0.10052317, 0.09368927, 0.08740651, 0.08168648, 0.07651546, 
                 0.07181934, 0.06755270, 0.06365780, 0.06011183, 0.05690655, 0.05400963, 
                 0.05145158, 0.04918898, 0.04716633, 0.04535589, 0.04372723, 0.04227591, 
                 0.04096728, 0.03979244, 0.03873892, 0.03778779, 0.03694441, 0.03620099, 
                 0.03560906, 0.03520374, 0.03488964, 0.03464470, 0.03446929, 0.03434677, 
                 0.03427521, 0.03423904, 0.03424497, 0.03428457, 0.03435298, 0.03469481, 
                 0.03713969, 0.03975826, 0.04080284, 0.04042104, 0.03946999, 0.03845057, 
                 0.03761703, 0.03686507, 0.03617822, 0.03629931, 0.03654233, 0.03679272, 
                 0.03707064, 0.03734264, 0.03761274))
```


The `check_converge` object in the function output will provide model diagnostics. 

```{r}
rc_res[['check_converge']]
```

We see that there are many R-hat values that are very large and thus the model has not converged properly. Even if we were to continue to plot the results, we can see that bounds don't look quite right. 

```{r, fig.width=6}
rc_res[["fit_df"]] %>%
  ggplot(aes(ages, data)) +
  geom_point(aes(color = "data")) +
  geom_line(aes(x = ages, y = median, color = "fit")) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  scale_color_manual(name = "", values = c(data = "red", fit = "black")) +
  ylab("migration rate")
```

We need to fix this non-convergent model before any of the results can be used. 

## Method 1: Adjusting Stan Parameters

One method for improving convergence is to adjust the Stan parameters. Stan may already provide warnings about divergent transitions, low effective sample size and/or maximum treedepth. These errors may suggest specific parameters to change. 

To adjust Stan parameters, `mig_estimate_rc` can accept additional inputs for Stan. For example, to increase the `max_treedepth`, `adapt_delta`, and number of iterations, you can do the following: 

```{r, eval=FALSE}
res <- mig_estimate_rc(ages, migrants, pop,
                       pre_working_age = TRUE,
                       working_age = TRUE,
                       retirement = TRUE,
                       post_retirement = TRUE,
                       #optional inputs into stan
                       iter = 3000,
                       control = list(adapt_delta = 0.95, max_treedepth = 10)
                       )
```

A full list of Stan parameters is available in the [Stan documentation](https://mc-stan.org/rstan/reference/stan.html). 

## Method 2: Setting initial conditions using `init_rc`

One of the additional parameters that can be provided to Stan is `init`, which specifies initial values for the MCMC algorithm. For the Rogers-Castro model, it is possible to improve the likelihood of achieving convergence by strategically setting the initial values based on the data. This technique is formalized in the `init_rc` function, which will output initial values in a list format that can be provided directly to stan. 

```{r, eval=FALSE}
init_vals <- init_rc(ages,
                     migrants,
                     pop,
                     pre_working_age = TRUE,
                     working_age = TRUE,
                     retirement = TRUE,
                     post_retirement = TRUE,
                     nchains = 4)

res <- mig_estimate_rc(ages, migrants, pop,
                       pre_working_age = TRUE,
                       working_age = TRUE,
                       retirement = TRUE,
                       post_retirement = TRUE,
                       #optional inputs into stan
                       init = init_vals
                       )
```

In particular, the `init_rc` function will crudely estimate particular parameters. Here is a simplified explanation of how it determines initial values. 

* The `a` parameters describe the heights of peaks. The function uses the local maximums within specific age ranges. 
* The `mu` parameters describe the age at which peaks occur. The function uses the ages associated with local maximums described above. 
* The `c` parameter describes the overall migration level. The function uses the minimum observed migration rate. 

It is important that the user runs `init_rc` and `mig_estimate_rc` with the same data, included components, and number of chains. Otherwise, `mig_estimate_rc` will not run. 

## Example: Fixing a Non-Convergent Model

Recall the example above of a non-convergent model. We will fix this model by adjusting the Stan parameters and using `init_rc` to set initial values. 

```{r, eval=FALSE}
init_vals <- init_rc(ages,
                     migrants,
                     pop,
                     pre_working_age = TRUE,
                     working_age = TRUE,
                     retirement = TRUE,
                     post_retirement = TRUE,
                     nchains = 4)

rc_res_fixed <- mig_estimate_rc(ages, migrants, pop,
                       pre_working_age = TRUE,
                       working_age = TRUE,
                       retirement = TRUE,
                       post_retirement = TRUE,
                       #optional inputs into stan
                       control = list(adapt_delta = 0.95, max_treedepth = 10),
                       init = init_vals
)
```

```{r, echo=FALSE}
rc_res_fixed = list(check_converge = c(), pars_df = c(), fit_df = c())

rc_res_fixed[['check_converge']] <- matrix(
  c(0.107162666095038,0.000113905396688904,914.0993478393,1.00234765459143,
    0.103416351522329,5.64917230559274e-05,853.927068031998,1.00281875605411,
    0.210669831871199,0.00140706757932151,988.257923785188,1.00150360579644,
    0.0878408694500156,4.26344484838096e-05,1025.19360646823,1.00125328173498,
    0.202957753238002,5.51135303627361e-05,1096.26976516497,1.00141331160793,
    0.016091838651906,6.14366866169559e-05,1022.81686620825,1.00149544086794,
    0.0107940563428325,0.000202777673594207,770.930151592325,1.00187319942316,
    21.0673530496958,0.00160357875882577,914.9772707573,1.00197537852488,
    66.5184716633019,0.0108069832903751,982.132724052394,1.00157929557214,
    0.395991098815759,0.00017884423246015,1158.66065253182,1.00080606722439,
    0.744339527000941,0.00373523428808142,1503.88004401573,1.00163128898661,
    0.0091660363074148,0.000171349903624075,623.526448086348,1.00399958735813,
    0.0120475931079795,0.000236317198704928,716.808119784298,1.0020118971368),
  ncol=4,
  dimnames = list(c("alpha1[1]", "alpha2[1]", "alpha3[1]", 
                    "a1[1]", "a2[1]", "a3[1]","a4[1]",
                    "mu2[1]", "mu3[1]", 
                    "lambda2[1]", "lambda3[1]","lambda4[1]", "c"),
                  c("mean", "se_mean", "n_eff", "Rhat")),
  byrow = TRUE)

rc_res_fixed[['fit_df']] <- tibble(ages = 0:80,
       data = migrants/pop,
       median = c(0.11066577, 0.10183467, 0.09389724, 0.08677859, 0.08039617, 0.07466633,
                  0.06952911, 0.06491991, 0.06079010, 0.05708701, 0.05377364, 0.05079940, 
                  0.04814050, 0.04576222, 0.04363252, 0.04173797, 0.04023607, 0.04059125, 
                  0.04677098, 0.06199728, 0.08416159, 0.10711504, 0.12541505, 0.13670053, 
                  0.14124400, 0.14053808, 0.13631138, 0.12999417, 0.12263667, 0.11494149, 
                  0.10733717, 0.10006698, 0.09326341, 0.08697700, 0.08122151, 0.07598656, 
                  0.07123679, 0.06694529, 0.06307296, 0.05958576, 0.05644594, 0.05362702, 
                  0.05109293, 0.04882184, 0.04678431, 0.04495985, 0.04332688, 0.04186791, 
                  0.04056457, 0.03940128, 0.03836677, 0.03744697, 0.03663026, 0.03590659, 
                  0.03526751, 0.03470475, 0.03421152, 0.03378090, 0.03340640, 0.03308209, 
                  0.03280512, 0.03256866, 0.03237309, 0.03221376, 0.03217486, 0.03312048, 
                  0.03615995, 0.03904294, 0.04014689, 0.03983118, 0.03889165, 0.03782808, 
                  0.03683831, 0.03599620, 0.03530893, 0.03475818, 0.03433355, 0.03401210, 
                  0.03377150, 0.03360034, 0.03348575),
       lower = c(0.10938343, 0.10085839, 0.09315967, 0.08614682, 0.07983181, 0.07413997, 
                 0.06901742, 0.06442625, 0.06031352, 0.05663560, 0.05335241, 0.05039448, 
                 0.04773767, 0.04535350, 0.04321303, 0.04128219, 0.03975127, 0.04009192, 
                 0.04605109, 0.06116725, 0.08333080, 0.10622850, 0.12446871, 0.13582829, 
                 0.14044398, 0.13978288, 0.13557999, 0.12927923, 0.12196578, 0.11434229, 
                 0.10678598, 0.09956663, 0.09280558, 0.08654702, 0.08081728, 0.07559846, 
                 0.07086616, 0.06659222, 0.06273188, 0.05925050, 0.05611922, 0.05331224, 
                 0.05079336, 0.04853664, 0.04651009, 0.04469189, 0.04306340, 0.04161878, 
                 0.04031990, 0.03916481, 0.03813495, 0.03721814, 0.03640352, 0.03567912, 
                 0.03503863, 0.03447041, 0.03397472, 0.03353608, 0.03314746, 0.03280966, 
                 0.03250885, 0.03224883, 0.03202669, 0.03183978, 0.03180072, 0.03236501, 
                 0.03520049, 0.03831441, 0.03940962, 0.03915478, 0.03828029, 0.03723175, 
                 0.03624018, 0.03539759, 0.03471905, 0.03420878, 0.03381553, 0.03348453, 
                 0.03321844, 0.03300026, 0.03281042),
       upper = c(0.11202162, 0.10283789, 0.09466222, 0.08740818, 0.08093686, 0.07517373, 
                 0.07001886, 0.06539917, 0.06126140, 0.05753726, 0.05420355, 0.05121903, 
                 0.04855045, 0.04618349, 0.04406297, 0.04219851, 0.04071295, 0.04111381, 
                 0.04748327, 0.06285125, 0.08503755, 0.10802562, 0.12635079, 0.13758453, 
                 0.14204044, 0.14127723, 0.13701374, 0.13068959, 0.12331207, 0.11556133, 
                 0.10790406, 0.10057925, 0.09373162, 0.08740192, 0.08162284, 0.07636317, 
                 0.07160537, 0.06729989, 0.06341591, 0.05992296, 0.05677216, 0.05393834, 
                 0.05139425, 0.04911275, 0.04706269, 0.04522956, 0.04358717, 0.04211525, 
                 0.04080241, 0.03963269, 0.03859004, 0.03766736, 0.03684888, 0.03612591, 
                 0.03548708, 0.03492513, 0.03443622, 0.03401193, 0.03364986, 0.03333821, 
                 0.03307808, 0.03285901, 0.03268092, 0.03254230, 0.03254832, 0.03405741, 
                 0.03701209, 0.03986310, 0.04093148, 0.04053379, 0.03951791, 0.03839881, 
                 0.03739697, 0.03653334, 0.03584743, 0.03529488, 0.03485950, 0.03452575, 
                 0.03431082, 0.03420497, 0.03418890))
```


```{r}
rc_res_fixed[['check_converge']]
```

All the R-hat values are very close to 1 and effective sample sizes are sufficiently large. 

We can now plot and interpret the model results. 

```{r, fig.width=6}
rc_res_fixed[["fit_df"]] %>%
  ggplot(aes(ages, data)) +
  geom_point(aes(color = "data")) +
  geom_line(aes(x = ages, y = median, color = "fit")) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  scale_color_manual(name = "", values = c(data = "red", fit = "black")) +
  ylab("migration rate")
```

## References