Expected value and uncertainty without full Monte Carlo simulations

Vasco Grilo🔸

One can often calculate the expected value and uncertainty of expressions involving distributions without running full Monte Carlo simulations. If nothing else, the following results may be useful for Fermi estimates. In the next sections:

The input distributions $X_{1}$ , $X_{2}$ , ..., and $X_{N}$ are independent, as is often assumed in Monte Carlo simulations.
$E$ and $V$ are the expected value and variance.

Uncertainty of the product between independent lognormal distributions

If $Y = X_{1} X_{2} \dots X_{N}$ , and $r_{i}$ is the ratio between the values of 2 quantiles of $X_{i}$ (e.g. $r_{i}$ = "95th percentile of $X_{i}$ "/"5th percentile of $X_{i}$ "), I think the ratio $R$ between the same 2 quantiles of $Y$ (e.g. $R$ = "95th percentile of $Y$ "/"5th percentile of $Y$ ") is $R = e^{\sqrt{\ln (r_{1})^{2} + \dots + \ln (r_{N})^{2}}}$ . For the particular case where all input distributions have the same uncertainty, $r_{i} = r$ , and therefore $R = r^{N^{0.5}}$ . This illustrates the point that multiplying pessimistic point estimates together, and optimistic point estimates together, overestimates uncertainty:

If the ratio between the 95th and 5th percentile of 3 independent lognormal distributions is $r$ = 100, the naive approach will suggest the product would have an uncertainty (ratio between 95th and 5th percentile) of 100^3 = 10^6.
However, the actual uncertainty of the product will be 100^(3^0.5) = 2.91*10^3, which is only 0.291 % of the above.

The naive approach would only make sense if the input distributions were perfectly (or very highly) correlated.
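The formula follows because the logarithm of a product of independent lognormals is a sum of independent normals, whose variances add, and because $\ln (r_{i})$ is proportional to the standard deviation of $\ln (X_{i})$ . A minimal Python sketch of the calculation above (the function name `product_quantile_ratio` is mine, chosen for illustration):

```python
import math

def product_quantile_ratio(ratios):
    """Ratio R between 2 quantiles of a product of independent lognormals.

    `ratios` holds r_i, the ratio between the same 2 quantiles of each input.
    """
    return math.exp(math.sqrt(sum(math.log(r) ** 2 for r in ratios)))

# 3 inputs, each with a ratio of 100 between the 95th and 5th percentiles.
naive = 100 ** 3
actual = product_quantile_ratio([100, 100, 100])
print(f"naive: {naive:.3g}, actual: {actual:.3g}, share: {actual / naive:.3%}")
```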

Sum of independent distributions

If $Y = X_{1} + X_{2} + \dots + X_{N}$ :

$E (Y) = E (X_{1}) + E (X_{2}) + \dots + E (X_{N}) .$
$V (Y) = V (X_{1}) + V (X_{2}) + \dots + V (X_{N}) .$

Product of independent distributions

If $Y = X_{1} X_{2} \dots X_{N}$ :

$E (Y) = E (X_{1}) E (X_{2}) \dots E (X_{N}) .$
$\begin{aligned} V (Y) &= E ((X_{1} X_{2} \dots X_{N})^{2}) - (E (X_{1} X_{2} \dots X_{N}))^{2} \\ &= E ({X_{1}}^{2}) E ({X_{2}}^{2}) \dots E ({X_{N}}^{2}) - E (X_{1})^{2} E (X_{2})^{2} \dots E (X_{N})^{2} \\ &= (V (X_{1}) + E (X_{1})^{2}) (V (X_{2}) + E (X_{2})^{2}) \dots (V (X_{N}) + E (X_{N})^{2}) - E (X_{1})^{2} E (X_{2})^{2} \dots E (X_{N})^{2} . \end{aligned}$
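As a quick illustration of the product formulas, here is a small Python sketch. The helper `product_mean_var` and the input numbers are made up for the example, and it assumes the inputs are independent.

```python
import math

def product_mean_var(means, variances):
    """Mean and variance of a product of independent variables."""
    mean = math.prod(means)
    # E(X_i^2) = V(X_i) + E(X_i)^2, and second moments of independent variables multiply.
    second_moment = math.prod(v + m ** 2 for m, v in zip(means, variances))
    return mean, second_moment - mean ** 2

# Hypothetical inputs: E(X1) = 2, V(X1) = 1, E(X2) = 3, V(X2) = 4.
prod_mean, prod_var = product_mean_var([2.0, 3.0], [1.0, 4.0])
print(prod_mean, prod_var)
```

Here $E (Y) = 2 \times 3 = 6$ and $V (Y) = (1 + 4)(4 + 9) - 36 = 29$ .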

Weighted sum of independent distributions

If $Y = w_{1} X_{1} + w_{2} X_{2} + \dots + w_{N} X_{N}$ , where $w_{i}$ are constants (which often add up to 1):

$E (Y) = w_{1} E (X_{1}) + w_{2} E (X_{2}) + \dots + w_{N} E (X_{N}) .$
$V (Y) = {w_{1}}^{2} V (X_{1}) + {w_{2}}^{2} V (X_{2}) + \dots + {w_{N}}^{2} V (X_{N}) .$
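The same can be sketched in Python. The helper name and the numbers below are illustrative only, and setting all weights to 1 recovers the plain sum of independent distributions.

```python
def weighted_sum_mean_var(weights, means, variances):
    """Mean and variance of a weighted sum of independent variables."""
    mean = sum(w * m for w, m in zip(weights, means))
    # Constants come out of the variance squared.
    var = sum(w ** 2 * v for w, v in zip(weights, variances))
    return mean, var

# Hypothetical inputs: weights 0.75 and 0.25, means 10 and 20, variances 4 and 16.
ws_mean, ws_var = weighted_sum_mean_var([0.75, 0.25], [10.0, 20.0], [4.0, 16.0])
print(ws_mean, ws_var)
```

Here $E (Y) = 7.5 + 5 = 12.5$ and $V (Y) = 0.5625 \times 4 + 0.0625 \times 16 = 3.25$ .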

Other expressions

If $E (Y)$ and $V (Y)$ can be written in terms of $E (X_{i})$ and $V (X_{i})$ using only the sums, products and weighted sums above, one can calculate them by applying the results of the 3 previous sections. For example, for $Y = 0.75 X_{1} X_{2} + 0.25 X_{3} X_{4}$ :

$E (Y) = 0.75 E (X_{1}) E (X_{2}) + 0.25 E (X_{3}) E (X_{4}) .$
$\begin{aligned} V (Y) ={} & {0.75}^{2} ((V (X_{1}) + E (X_{1})^{2}) (V (X_{2}) + E (X_{2})^{2}) - E (X_{1})^{2} E (X_{2})^{2}) \\ & + {0.25}^{2} ((V (X_{3}) + E (X_{3})^{2}) (V (X_{4}) + E (X_{4})^{2}) - E (X_{3})^{2} E (X_{4})^{2}) . \end{aligned}$
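One way to sanity-check the example above is to compare the analytic expressions with a brute-force simulation. The sketch below assumes, purely for illustration, that the 4 inputs are independent lognormals with made-up parameters. It uses only the Python standard library.

```python
import math
import random
import statistics

def lognormal_mean_var(mu, sigma):
    """Mean and variance of a lognormal with log-space parameters mu and sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    return mean, (math.exp(sigma ** 2) - 1) * mean ** 2

def product_var(m_a, v_a, m_b, v_b):
    """Variance of the product of 2 independent variables."""
    return (v_a + m_a ** 2) * (v_b + m_b ** 2) - m_a ** 2 * m_b ** 2

# Hypothetical (mu, sigma) pairs for X1, X2, X3 and X4.
params = [(0.0, 0.3), (0.5, 0.2), (1.0, 0.4), (-0.5, 0.3)]
(m1, v1), (m2, v2), (m3, v3), (m4, v4) = (lognormal_mean_var(*p) for p in params)

analytic_mean = 0.75 * m1 * m2 + 0.25 * m3 * m4
analytic_var = (0.75 ** 2 * product_var(m1, v1, m2, v2)
                + 0.25 ** 2 * product_var(m3, v3, m4, v4))

# Full Monte Carlo simulation of Y, for comparison only.
rng = random.Random(0)
samples = []
for _ in range(200_000):
    x1, x2, x3, x4 = (rng.lognormvariate(mu, sigma) for mu, sigma in params)
    samples.append(0.75 * x1 * x2 + 0.25 * x3 * x4)

mc_mean = statistics.fmean(samples)
mc_var = statistics.fmean((y - mc_mean) ** 2 for y in samples)
print(f"mean: analytic {analytic_mean:.4f}, simulated {mc_mean:.4f}")
print(f"variance: analytic {analytic_var:.4f}, simulated {mc_var:.4f}")
```

The 2 pairs of numbers should agree up to sampling noise, which shrinks as the number of samples grows.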

Otherwise, it is probably better to run a full Monte Carlo simulation. That being said, one can also combine the results of the 3 previous sections with estimates obtained from Monte Carlo simulations that each involve only a single variable. To do this:

1. Write $E (Y)$ in terms of $E (f_{1} (X_{1}))$ , $E (f_{2} (X_{2}))$ , ..., and $E (f_{N} (X_{N}))$ . For example, for $Y = \frac{1}{X_{1} {X_{2}}^{2} \dots {X_{N}}^{N}}$ :
   $E (Y) = E (\frac{1}{X_{1}}) E (\frac{1}{{X_{2}}^{2}}) \dots E (\frac{1}{{X_{N}}^{N}}) .$
2. Write $V (Y)$ in terms of the above, $V (f_{1} (X_{1}))$ , $V (f_{2} (X_{2}))$ , ..., and $V (f_{N} (X_{N}))$ . For the example above:
   $V (Y) = (V (\frac{1}{X_{1}}) + E (\frac{1}{X_{1}})^{2}) (V (\frac{1}{{X_{2}}^{2}}) + E (\frac{1}{{X_{2}}^{2}})^{2}) \dots (V (\frac{1}{{X_{N}}^{N}}) + E (\frac{1}{{X_{N}}^{N}})^{2}) - E (\frac{1}{X_{1}})^{2} E (\frac{1}{{X_{2}}^{2}})^{2} \dots E (\frac{1}{{X_{N}}^{N}})^{2} .$
3. Generate random samples of $X_{i}$ (e.g. with Guesstimate or Squiggle), and then compute $E (f_{i} (X_{i}))$ and $V (f_{i} (X_{i}))$ . For the example above, $E (\frac{1}{X_{1}})$ , $E (\frac{1}{{X_{2}}^{2}})$ , ..., $E (\frac{1}{{X_{N}}^{N}})$ , $V (\frac{1}{X_{1}})$ , $V (\frac{1}{{X_{2}}^{2}})$ , ..., and $V (\frac{1}{{X_{N}}^{N}})$ .
4. Determine $E (Y)$ and $V (Y)$ using the expressions of steps 1 and 2 with the results obtained in step 3.
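The procedure above can be sketched in Python for $N = 3$ . This is only an illustration under my own assumptions: the inputs are taken to be independent lognormals with made-up parameters, the sampling uses the standard library instead of Guesstimate or Squiggle, and `sample_moments` is a hypothetical helper.

```python
import random
import statistics

def sample_moments(sampler, f, n=100_000):
    """Estimate E(f(X)) and V(f(X)) from n samples of a single variable X."""
    values = [f(sampler()) for _ in range(n)]
    mean = statistics.fmean(values)
    var = statistics.fmean((v - mean) ** 2 for v in values)
    return mean, var

rng = random.Random(0)
# Hypothetical inputs: X_i is lognormal with these (mu, sigma), and f_i(x) = 1 / x**i.
params = [(0.0, 0.3), (0.2, 0.2), (0.1, 0.1)]

# Step 3: one single-variable simulation per input.
moments = []
for i, (mu, sigma) in enumerate(params, start=1):
    sampler = lambda mu=mu, sigma=sigma: rng.lognormvariate(mu, sigma)
    f = lambda x, k=i: 1 / x ** k
    moments.append(sample_moments(sampler, f))

# Step 4: plug the estimates into the expressions of steps 1 and 2.
e_y = 1.0
second_moment = 1.0
for f_mean, f_var in moments:
    e_y *= f_mean
    second_moment *= f_var + f_mean ** 2
v_y = second_moment - e_y ** 2
print(f"E(Y) = {e_y:.4f}, V(Y) = {v_y:.4f}")
```

Each simulation only ever touches one variable, so the samples of different inputs never need to be combined.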

NunoSempere (Jan 5 2024)

In case it's of interest, you can see some similar algebraic manipulations here: https://git.nunosempere.com/personal/squiggle.c/src/branch/master/squiggle_more.c#L165, as well as some explanations of how to get a normal from its 95% confidence interval here: https://git.nunosempere.com/personal/squiggle.c/src/branch/master/squiggle.c#L73.

Vasco Grilo🔸 (Jan 5 2024)

Thanks for sharing, Nuño! Relatedly, I wrote about how to determine distribution parameters from quantiles.

TLDR: Feel free to download or make a copy of this Sheet to calculate the parameters of uniform, normal, loguniform, lognormal, pareto and logistic distributions (including the mean and median), based on the values of 2 quantiles.
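For the lognormal case, the calculation can be sketched in a few lines of Python. This is my own minimal illustration, not a copy of the Sheet: the function name and the example quantiles are made up, and it relies on the logarithm of a lognormal being normal.

```python
import math
from statistics import NormalDist

def lognormal_from_quantiles(p_low, x_low, p_high, x_high):
    """Log-space parameters (mu, sigma) of a lognormal given 2 quantiles.

    x_low is the value at cumulative probability p_low, and likewise for x_high.
    """
    z_low = NormalDist().inv_cdf(p_low)
    z_high = NormalDist().inv_cdf(p_high)
    # ln(X) is normal, so ln(x) = mu + z * sigma at each quantile.
    sigma = (math.log(x_high) - math.log(x_low)) / (z_high - z_low)
    mu = math.log(x_low) - z_low * sigma
    return mu, sigma

# Hypothetical example: 5th percentile of 1 and 95th percentile of 100.
mu, sigma = lognormal_from_quantiles(0.05, 1.0, 0.95, 100.0)
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}, median = {median:.4f}, mean = {mean:.4f}")
```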

Effective Altruism Forum