---
title: "Rogers Castro Migration Models with rcbayes"
author: "Monica Alexander, Jessie Yeung, Tim Riffe"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Rogers Castro Migration Models with rcbayes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes:
- \usepackage{amsmath}
- \usepackage{amssymb}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  fig.width = 6,
  fig.height = 4,
  fig.align = "center"
)
```


## Why model migration
Migration happens with multiple transitions over the life course such as entry to education, a new job, or retirement  [@preston2000demography]. These transitions happen more frequently at some ages and come in parallel often with migration. Adult migration usually peaks at young adult ages. Around retirement age, there is a second peak. Due to these regularities, it is possible to model migration by age, which is very important for policymakers and for demographers in estimating population dynamics. Age-specific migration models can help to estimate missing data, smooth noisy data, project trends into the future, and to generalize migration patterns across different populations.

`rcbayes` has the functionality to fit and estimate age-specific migration schedules based on the Rogers-Castro migration model. This vignette briefly introduces the Rogers-Castro model and then gives examples of both calculating age-specific migration curves given a set of parameters, and fitting the Rogers-Castro model given a set of age-specific migration rates.

## The Rogers and Castro model
@rogers1981model developed a mathematical model of migration with up to 13 parameters. Seven of these parameters explain the shape of migration by age, while the rest of parameters represent the intensity of migration. The original formula for the migration rate at age $x$ is:

\begin{equation*}
m(x)= a_1 \exp{[ \alpha_1 x ]} + a_2 \exp{[ -\alpha_2 (x - \mu_2) - \exp{ [ -\lambda_2(x - \mu_2) ]}  ]}+ a_3 \exp{[ - \alpha_3(x-\mu_3) - \exp{[-\lambda_3 (x-\mu_3)]} ]} + a_4\exp{[\lambda_4x ]}+ c
\end{equation*}

The $c$ parameter describes the baseline level of migration. There are four other distinct parts to the equation, which each describe the shape and intensity of migration at different ages:

- pre-working age: $a_1 \exp{[ \alpha_1 x ]}$ (Group 1)
- working age: $a_2 \exp{[ -\alpha_2 (x - \mu_2) - \exp{ [ -\lambda_2(x - \mu_2) ]}  ]}$ (Group 2)
- retirement age: $a_3 \exp{[ - \alpha_3(x-\mu_3) - \exp{[-\lambda_3 (x-\mu_3)]} ]}$ (Group 3)
- post-retirement age: $a_4 \exp{[\lambda_4x ]}$ (Group 4)

For each of the components, the $a_k$ terms describe the heights of the peaks of migration rates. The $\alpha_k$ and $\lambda_k$ parameters describe the shape of each of the components, in terms of the rate of change over age. And $\mu_2$ and $\mu_3$ give the ages at the labour force peak and at the retirement peak, respectively.

The migration model need not have all the 'families' of migration at different age stages. In practice, there are four combinations of families that are the most common [@rogers2010indirect]:

- The 7 parameter model, which has the pre-working and working age components
- The 9 parameter model, which has the pre-working, working and post-retirement age components
- The 11 parameter model, which has the pre-working, working and retirement age components
- The 13 parameter model, which has all components.

The functions in `rcbayes` allow for any combination of the components to be included in the model.

## Examples with `rcbayes`

`rcbayes` includes two migration model-related functions: `mig_calculate_rc`, which returns age-specific migration rates calculated based on an age range and set of parameter inputs, and `mig_estimate_rc`, which estimates parameter values and age-specific migration rates $m(x)$ based on an observed age range and migration rates. This section gives examples of both functions.

### Calculating Rogers-Castro migration schedules

We can calculate the implied age-specific rates from a set of parameter inputs. Parameters are defined the same way as in the equation above, that is, `c` is the overall intensity, the `a`'s are the intensities at each age family, the `alpha`s and `lambda`s are the rate of decrease and increase of the shape at each age family, and the `mu`s are the age of peak migration for working age and retirement.

The following is an example specifying values for each of the 13 possible parameters, with values calculated for each age up to age 100:

```{r}
library(rcbayes)
library(tibble)
library(ggplot2)

pars <- c(a1= 0.09, alpha1= 0.1,
          a2= 0.2, alpha2= 0.1, mu2= 21, lambda2= 0.4,
          a3= 0.02, alpha3= 0.25, mu3= 67, lambda3= 0.6,
          a4= 0.01, lambda4= 0.01,
          c= 0.01)

ages <- 0:100
mx <- mig_calculate_rc(ages = ages, pars = pars)

# plot to see what the schedule looks like
df <- tibble(age = ages, mx = mx)
df %>%
  ggplot(aes(age, mx)) +
  geom_line() +
  ggtitle("Rogers-Castro age-specific migration schedule (13-parameter)")

```

Not all parameters need to be specified. The following shows an example of the 9 parameter specification:

```{r}
pars <- c(a1= 0.09, alpha1= 0.1,
          a2= 0.2, alpha2= 0.05, mu2= 25, lambda2= 0.4,
          c= 0.01)

ages <- 0:100
mx <- mig_calculate_rc(ages = ages, pars = pars)

# plot to see what the schedule looks like
df <- tibble(age = ages, mx = mx)
df %>%
  ggplot(aes(age, mx)) +
  geom_line() +
  ggtitle("Rogers-Castro age-specific migration schedule (9-parameter)")
```

Note, however, that all parameters within a particular component family must be specified. So for example, if one of the working-age family parameters is specified (Group 2), then all must be specified, otherwise an error occurs.


### Estimating migration age schedules using the Rogers-Castro model

#### Overview

The `mig_estimate_rc` function returns estimated Rogers-Castro parameters and $m(x)$ values, based on observed age-specific migration data  and the Rogers-Castro components to be included in the model. The function has the capability of estimating a Rogers-Castro age schedule with any combination of the components `pre_working_age`, `working_age`, `retirement` and `post_retirement`. These are specified as logicals (either `TRUE` or `FALSE`) in the function.

As illustrated above, Rogers-Castro migration age schedules are highly non-linear, as so are not necessarily straight forward to estimate. Previous approaches have used, for example, Excel's Solver function or the `optim` function in `R`.[^1] However, the estimated parameters and schedules are highly sensitive to the initial values chosen for the parameter values, and convergence is difficult to achieve for the 11 and 13 parameter models.

In `rcbayes`, we estimate Rogers-Castro schedules in a Bayesian framework using a Markov Chain Monte Carlo (MCMC) algorithm, via the Stan programming language [@carpenter2017stan]. The use of Bayesian methods allows for priors to be set on parameters, which helps convergence in the estimation process.

[^1]: http://demographicestimation.iussp.org/content/multi-exponential-model-migration-schedule

##### Required Inputs for `mig_estimate_rc`

The following arguments are required for the `mig_estimate_rc` function: 

  - `ages`, `pre_working_age`, `working_age`, `retirement` and `post_retirement`
  - Either
    + `migrants` and `pop` to provide data on the number of age-specific migrants and age-specific population OR
    + `mx` to provide data on age-specific migration rates

That is, users have an option to input their data as counts or as rates. Depending on which one is used, a different model will be run. If the user provides data as counts using `migrants` and `pop`, a Poisson model will be applied. If the user provides data as rates, a Normal model will be applied. 

When running `mig_estimate_rc`, a message will appear informing the user of which model is run. 

##### Optional Inputs for `mig_estimate_rc`

In the case that the Normal model is used (i.e., the user inputs age-specific migration rates using the argument `mx`), the user can use the optional argument `sigma` to input the standard deviation for the normal model. If this optional input is not used, the value of `sigma` is estimated. 

Regardless of which model is used, any additional arguments for `mig_estimate_rc` will be additional inputs to `rstan::stan()`. 

#### Example: Estimating migration rates for population with large retirement peak

In this example, we will fit an 11-parameter model to a set of observed age-specific rates from a population that resembles 1% of the Florida population in the United States. First, we can plot the observed rates to get a sense of what the age schedule looks like

```{r, include=FALSE}
set.seed(123)
```

```{r}
fl_ages <- 0:80
fl_migrants <- c(49, 48, 48, 52, 50, 45, 42, 46, 45, 44, 47, 55, 57, 59, 67, 69, 71, 78, 93, 88, 116,
              106, 102, 104, 102, 123, 112, 102, 112, 105, 100, 83, 81, 77, 78, 77, 66, 64, 65, 64,
              68, 52, 59, 51, 54, 55, 52, 58, 64, 53, 68, 53, 57, 67, 71, 78, 75, 77, 77, 83, 88,
              80, 84, 79, 77, 83, 71, 59, 65, 67, 64, 63, 56, 50, 43, 46, 46, 38, 32, 28, 29)
fl_pop <- c(2028, 2193, 2271, 2370, 2403, 2160, 2109, 2206, 2456, 2334, 2392, 2534, 2542, 2601, 2526,
         2416, 2420, 2344, 2606, 2355, 2867, 2589, 2426, 2390, 2377, 2909, 2753, 2633, 2847, 2819,
         2979, 2608, 2708, 2602, 2745, 2883, 2624, 2607, 2677, 2637, 2964, 2414, 2481, 2464, 2510,
         2695, 2552, 2711, 2794, 2683, 2888, 2439, 2631, 2814, 2854, 2999, 2959, 2852, 2957, 2985,
         2970, 2882, 2839, 2737, 2782, 2799, 2710, 2527, 2512, 2530, 2505, 2521, 2551, 2125, 1838,
         2057, 2037, 1804, 1542, 1470, 1452)

df <- tibble(age = fl_ages, mx = fl_migrants / fl_pop)
df %>%
  ggplot(aes(age, mx)) +
  geom_point() +
  ggtitle("Observed migration rates")
```

Let's fit a Rogers-Castro migration age schedule to these data. Below, we choose to estimate parameters associated with the pre-working age, working and retirement components (but not post retirement). We also provide the data as counts, which means that we are fitting a Poisson model. 


```{r, eval=FALSE}
rc_res <- mig_estimate_rc(
  ages=fl_ages, migrants=fl_migrants, pop=fl_pop,
  pre_working_age = TRUE,
  working_age = TRUE,
  retirement = TRUE,
  post_retirement = FALSE,
  # (optional) arguments for Stan
  chains = 4,
  iter = 2000,
  control = list(adapt_delta = 0.8, max_treedepth = 10)
)
```

The `mig_estimate_rc` function also allows for addition arguments that are related to the Stan model. In the example above, the values listed for `chains`, `iter`, `adapt_delta` and `max_treedepth` are the default values, so need have not been specified. However, depending on the context, it may make sense to increase the value of each of these components to ensure convergence. More details about these arguments can be found in the `R` help files for `rstan::stan`, and also by referring to the [Stan documentation](https://mc-stan.org/users/documentation/).

As mentioned above, this example's data is in the form of count data. In the case that your data is in the form of migration rates, swap out the `migrants` and `pop` arguments for `mx`. 

##### Checking Model Diagnostics
When fitting models in a Bayesian framework using MCMC, as in the case of `mig_estimate_rc`, one cannot simply run the model and use the results without further inspection. It is always necessary to assess the model results to ensure that the model has converged. In Bayesian models, convergence would imply that the model has converged to a particular target distribution, which is a necessary condition before you move on and use the model's results. 

One measure to check for convergence is to look at the potential scale reduction statistic, commonly referred to as the R-hat statistic [@gelman1992inference]. Ideally, you want to see the R-hat values close to 1 as R-hat values far greater than 1 indicate that convergence has not been achieved. Generally, Gelman et al. recommend ensuring that R-hat values are below 1.1, although there is no universally agreed upon threshold [@gelman2013bayesian]. More information about R-hat values is available in the [Stan documentation](https://mc-stan.org/docs/2_26/reference-manual/notation-for-samples-chains-and-draws.html).

In addition to convergence, another difficulty around MCMC algorithms is that the samples may be autocorrelated within a chain. One way to measure this is to look at the effective sample size ($N_{eff}$), which tells us the estimation power of your dependent MCMC samples in terms of hypothetical independent samples. A low effective sample size increases the uncertainty of estimates for posterior means, variances, etc [@geyer2011introduction]. If your effective sample size is small, you should consider increasing the number of MCMC samples. More information about the effective sample size is available in the [Stan documentation](https://mc-stan.org/docs/2_26/reference-manual/effective-sample-size-section.html). 

The `check_converge` object in the function output allows you to check the R-hat values and effective sample size. 

```{r, echo=FALSE}
rc_res = list(check_converge = c(), pars_df = c(), fit_df = c())

rc_res[['check_converge']] <- matrix(
  c(0.870226738894234,0.0106440162045495,2978.76440313297,0.99947658219495,
    0.190530488368833,0.00054434025065072,1786.98524371334,1.000853893877,
    0.205526356090562,0.00149377245438701,1706.25886324742,1.00109410733695,
    0.00472626420715214,6.11293188266945e-05,2180.68698683192,1.00187646979496,
    0.0583736582490384,0.000117918917152783,1821.75108753928,1.00032592498145,
    0.019992049679246,0.00010495874412877,1671.26255704968,1.00289212248805,
    24.9853085965459,0.02063815493269,2198.85598448598,1.00027683051002,
    64.8492122289137,0.0192261420687394,2799.85127600686,1.00071123654953,
    0.140144946218822,0.000320010302131112,1860.81851649046,1.00209202390757,
    0.12811347166421,0.000663115574100773,1639.28943120517,1.00037088664524,
    0.0200589900501721,3.5975555236367e-05,1002.95302272389,1.0015353838189),
  ncol=4,
  dimnames = list(c("alpha1[1]", "alpha2[1]", "alpha3[1]", 
                    "a1[1]", "a2[1]", "a3[1]",
                    "mu2[1]", "mu3[1]", 
                    "lambda2[1]", "lambda3[1]", "c"),
                  c("mean", "se_mean", "n_eff", "Rhat")),
  byrow=TRUE)

rc_res[['pars_df']] <- tibble(variable = c("a1","a2","a3","alpha1","alpha2","alpha3","c","lambda2","lambda3","mu2","mu3"),
       median = c(0.004388728,0.058483556,0.019955846,0.751815037,0.189520468,0.196011541,0.020194227,
                  0.138963085,0.124622576,24.984743365,64.834417626),
       lower = c(4.158378e-04,4.851475e-02, 1.158713e-02, 6.763524e-02, 1.478663e-01, 1.130389e-01,
                 1.718623e-02, 1.157654e-01, 8.611379e-02, 2.308518e+01,6.291426e+01),
       upper = c(0.01106290,  0.06797684,  0.02883773,  2.25796406,  0.23777804,  0.35320899,  0.02185859,
                 0.16919224,  0.19132286, 26.91274982, 66.84958398))

rc_res[['fit_df']] <- tibble(ages = 0:80,
       data = fl_migrants/fl_pop,
       median = c(0.02445917, 0.02212957, 0.02123731, 0.02083668, 0.02063243, 0.02050311, 0.02042182, 
                  0.02038452, 0.02039545, 0.02048247, 0.02069491, 0.02113354, 0.02188968, 0.02309565, 
                  0.02488725, 0.02723605, 0.03001204, 0.03295246, 0.03584502, 0.03844924, 0.04053613, 
                  0.04205060, 0.04286610, 0.04304004, 0.04260980, 0.04172190, 0.04044603, 0.03891856, 
                  0.03727069, 0.03555740, 0.03387381, 0.03229206, 0.03078721, 0.02940696, 0.02814990, 
                  0.02702949, 0.02604895, 0.02518098, 0.02442590, 0.02378689, 0.02322727, 0.02275823, 
                  0.02236749, 0.02203849, 0.02176177, 0.02154706, 0.02139283, 0.02131295, 0.02129494, 
                  0.02137630, 0.02155494, 0.02186323, 0.02233507, 0.02302582, 0.02390995, 0.02484880, 
                  0.02578982, 0.02669928, 0.02746516, 0.02808168, 0.02848546, 0.02864050, 0.02857913, 
                  0.02831552, 0.02791748, 0.02741569, 0.02683604, 0.02623238, 0.02562902, 0.02503111, 
                  0.02447808, 0.02395072, 0.02346067, 0.02302026, 0.02263513, 0.02229433, 0.02200634, 
                  0.02176027, 0.02153727, 0.02135720, 0.02120329),
       lower = c(0.02063855, 0.02002963, 0.01939085, 0.01908285, 0.01891132, 0.01875816, 0.01866277, 
                 0.01863718, 0.01869357, 0.01888990, 0.01927156, 0.01980051, 0.02055228, 0.02144848, 
                 0.02274630, 0.02459381, 0.02706351, 0.03002805, 0.03311143, 0.03588292, 0.03798246, 
                 0.03936151, 0.04008736, 0.04026842, 0.03995621, 0.03921550, 0.03812486, 0.03676263, 
                 0.03525482, 0.03361536, 0.03193636, 0.03032246, 0.02890042, 0.02756915, 0.02644255, 
                 0.02541254, 0.02453467, 0.02377473, 0.02312531, 0.02255027, 0.02203500, 0.02157904, 
                 0.02119271, 0.02084455, 0.02056494, 0.02035391, 0.02020902, 0.02013073, 0.02013357, 
                 0.02015945, 0.02028732, 0.02046103, 0.02075511, 0.02113461, 0.02162741, 0.02227860, 
                 0.02317744, 0.02426672, 0.02524938, 0.02593666, 0.02630505, 0.02648453, 0.02645882, 
                 0.02626886, 0.02594994, 0.02549666, 0.02493988, 0.02434173, 0.02367642, 0.02310194, 
                 0.02260649, 0.02216076, 0.02179043, 0.02142710, 0.02110892, 0.02084890, 0.02060545, 
                 0.02039068, 0.02015061, 0.01993089, 0.01975857),
       upper = c(0.03112025, 0.02513024, 0.02345860, 0.02269542, 0.02235882, 0.02215090, 0.02204610, 
                 0.02199635, 0.02197468, 0.02199267, 0.02209632, 0.02249465, 0.02355002, 0.02522402, 
                 0.02740956, 0.02995684, 0.03269014, 0.03554723, 0.03836364, 0.04098622, 0.04329547, 
                 0.04494095, 0.04580337, 0.04591680, 0.04536656, 0.04421209, 0.04277925, 0.04113415, 
                 0.03937585, 0.03759591, 0.03588578, 0.03424492, 0.03272378, 0.03129877, 0.02997162, 
                 0.02875385, 0.02768118, 0.02671604, 0.02585786, 0.02511999, 0.02447777, 0.02395213, 
                 0.02352872, 0.02319783, 0.02291037, 0.02271109, 0.02258587, 0.02251641, 0.02257225, 
                 0.02280738, 0.02326907, 0.02389398, 0.02456863, 0.02534185, 0.02615691, 0.02704329, 
                 0.02793647, 0.02883292, 0.02971780, 0.03038933, 0.03089967, 0.03112968, 0.03105258, 
                 0.03067914, 0.03009405, 0.02951902, 0.02879010, 0.02812087, 0.02744805, 0.02686322, 
                 0.02629736, 0.02577599, 0.02529982, 0.02485982, 0.02441999, 0.02402877, 0.02364263, 
                 0.02330680, 0.02303527, 0.02278987, 0.02260402),
       diff_sq = c(8.846661e-08, 5.843961e-08, 1.025078e-08, 1.219354e-06, 3.058745e-08, 1.090453e-07, 
                   2.572242e-07, 2.187438e-07, 4.297242e-06, 2.659236e-06, 1.094279e-06, 3.263517e-07, 
                   2.847359e-07, 1.697953e-07, 2.679424e-06, 1.751785e-06, 4.531883e-07, 1.049723e-07, 
                   2.500966e-08, 1.170591e-06, 5.733003e-09, 1.228009e-06, 6.749902e-07, 2.252466e-07, 
                   9.086339e-08, 3.143570e-07, 5.610471e-08, 3.221258e-08, 4.280630e-06, 2.855579e-06, 
                   9.333020e-08, 2.179993e-07, 7.670842e-07, 3.446848e-08, 7.043755e-08, 1.031716e-07, 
                   8.037304e-07, 3.990348e-07, 2.102037e-08, 2.333985e-07, 8.139379e-08, 1.481629e-06, 
                   1.997268e-06, 1.796781e-06, 6.141552e-08, 1.297080e-06, 1.033592e-06, 6.620450e-09, 
                   2.596243e-06, 2.631823e-06, 3.963146e-06, 1.769188e-08, 4.493109e-07, 6.141955e-07, 
                   9.358831e-07, 1.345288e-06, 1.966191e-07, 8.958878e-08, 2.031352e-06, 7.616767e-08, 
                   1.309121e-06, 7.779241e-07, 1.017574e-06, 3.005237e-07, 5.738324e-08, 5.007556e-06, 
                   4.054851e-07, 8.320548e-06, 6.090081e-08, 2.105700e-06, 1.146669e-06, 1.080280e-06, 
                   2.275552e-06, 2.592394e-07, 5.774014e-07, 4.670081e-09, 3.316458e-07, 4.843763e-07, 
                   6.162222e-07, 5.334144e-06, 1.514953e-06))
```


```{r}
rc_res[['check_converge']]
```
In this example, the R-hat values are all close to 1 and effective sample sizes are sufficiently large. We can move on to interpreting the model's results. 

For more details and examples on how to deal with a non-convergent model, please see the `rcbayes` vignette *Achieving Model Convergence With mig_estimate_rc*.

##### Examining Model Results

After ensuring that the model has converged properly, you can interpret the results of the model. There are two objects in the function's output for this purpose. 

The `pars_df` object shows the median estimate and lower and upper bound of a 95% credible interval for the Rogers-Castro parameters. In this example, the working age peak was estimated to be at 24.9 years (95% CI: [23.1, 26.8]).

The `fit_df` object shows the data and estimated median $m(x)$ values at each age $x$, along with the lower and upper bound of the 95% credible interval of the fits, and the squared difference between data and the median estimate.

```{r}
rc_res[['pars_df']]
rc_res[['fit_df']]
```

We can plot the observed data and estimated fit using the `fit_df` object:
```{r}
rc_res[["fit_df"]] %>%
  ggplot(aes(ages, data)) +
  geom_point(aes(color = "data")) +
  geom_line(aes(x = ages, y = median, color = "fit")) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  scale_color_manual(name = "", values = c(data = "red", fit = "black")) +
  ylab("migration rate")

```

##### Comments about warnings

When using `mig_estimate_rc` it is not unusual to see warnings from Stan, particularly when the retirement and post-retirement families are included in the model. These may include warnings about divergent transitions, low effective sample size and maximum treedepth. If you see these warnings, you should take special care in determining whether your model converged properly. 

For more in-depth examples of dealing with warnings and convergence issues, please see the `rcbayes` vignette *Achieving Model Convergence With mig_estimate_rc*. 

## References