Correlating Variables in Your Model
To get started, we are going to recap what we did in your first model and build a correlated version of the same model and compare.
Building a Simple Uncorrelated Model
using Distributions, StatsBase, DataFrames, MCHammer
n_trials = 10000
Random.seed!(1)
Revenue = rand(TriangularDist(2500000,4000000,3000000), n_trials)
Expenses = rand(TriangularDist(1400000,3000000,2000000), n_trials)
# The Model
Profit = Revenue - Expenses10000-element Vector{Float64}:
712844.150927736
1.1888840017148217e6
1.1882988676456232e6
999543.5263104322
1.7701450715666742e6
765153.752309157
1.8804101529082211e6
960226.0170223624
805657.2333578216
453283.3296520631
⋮
1.4174186027324959e6
1.158472383905143e6
1.3887126297576292e6
799290.3925696253
1.1640814575272575e6
268649.0616989727
683292.7274327667
1.0124091010737661e6
566960.4683695761#Trial Results : the Profit vector (OUTPUT)
Profit10000-element Vector{Float64}:
712844.150927736
1.1888840017148217e6
1.1882988676456232e6
999543.5263104322
1.7701450715666742e6
765153.752309157
1.8804101529082211e6
960226.0170223624
805657.2333578216
453283.3296520631
⋮
1.4174186027324959e6
1.158472383905143e6
1.3887126297576292e6
799290.3925696253
1.1640814575272575e6
268649.0616989727
683292.7274327667
1.0124091010737661e6
566960.4683695761# Trials or Results Table (OUTPUT)
Trials = DataFrame(Revenue = Revenue, Expenses = Expenses, Profit = Profit)
first(Trials,20)| Row | Revenue | Expenses | Profit |
|---|---|---|---|
| Float64 | Float64 | Float64 | |
| 1 | 2.73457e6 | 2.02173e6 | 7.12844e5 |
| 2 | 3.012e6 | 1.82312e6 | 1.18888e6 |
| 3 | 3.32787e6 | 2.13957e6 | 1.1883e6 |
| 4 | 3.25327e6 | 2.25373e6 | 9.99544e5 |
| 5 | 3.64278e6 | 1.87263e6 | 1.77015e6 |
| 6 | 2.88027e6 | 2.11512e6 | 7.65154e5 |
| 7 | 3.41286e6 | 1.53245e6 | 1.88041e6 |
| 8 | 3.42622e6 | 2.466e6 | 960226.0 |
| 9 | 3.29672e6 | 2.49106e6 | 8.05657e5 |
| 10 | 2.85466e6 | 2.40138e6 | 4.53283e5 |
| 11 | 3.1979e6 | 2.59637e6 | 6.01531e5 |
| 12 | 3.09403e6 | 1.7547e6 | 1.33933e6 |
| 13 | 2.97618e6 | 2.14071e6 | 8.35462e5 |
| 14 | 2.53182e6 | 2.65497e6 | -1.23147e5 |
| 15 | 3.19411e6 | 1.80816e6 | 1.38595e6 |
| 16 | 3.24099e6 | 2.2356e6 | 1.00539e6 |
| 17 | 2.88315e6 | 2.17548e6 | 7.07667e5 |
| 18 | 2.59668e6 | 1.81287e6 | 7.83806e5 |
| 19 | 2.98373e6 | 1.95759e6 | 1.02614e6 |
| 20 | 2.79343e6 | 2.70982e6 | 83605.4 |
# Uncorrelated simulation
cormat(Trials)3×3 Matrix{Float64}:
1.0 0.016887 0.670901
0.016887 1.0 -0.704043
0.670901 -0.704043 1.0Applying correlation to your simulation
Using the corvar() function, we are going to correlate the Revenue and Expenses at -0.8 and generate the results tables for both the correlated and uncorrelated versions.
#Correlate the expenses and the revenue with a coefficient of 0.8
Rev_Exp_Cor = 0.8
cor_matrix = [1 Rev_Exp_Cor; Rev_Exp_Cor 1]
#Validate input correlation. You can also use cormat() to define the correlation
#matrix from historical data.
cor_matrix2×2 Matrix{Float64}:
1.0 0.8
0.8 1.0It is very important to join Trial into an array before applying correlation. Furthermore, this step is necessary in order to produce a sensitivity_chrt()
Input_Table = DataFrame(Revenue=Revenue, Expenses=Expenses)
Correl_Trials = corvar(Input_Table, n_trials, cor_matrix)
DataFrames.rename!(Correl_Trials, [:Revenue, :Expenses])
#Using the correlated inputs to calculate the correlated profit
Correl_Trials.Profit = Correl_Trials.Revenue - Correl_Trials.Expenses10000-element Vector{Float64}:
1.139970992142301e6
1.2684247130040182e6
632965.6741561713
892207.5209226953
826510.0086482479
1.3156590236982156e6
956000.6873403299
1.288104428843657e6
1.248803054942811e6
1.2633881455276287e6
⋮
1.0621471073426818e6
584415.6467456981
672170.724452565
1.1635267342694816e6
891140.2253855588
1.1192550129263354e6
1.0919996135719232e6
924649.3086631398
660878.4442639798#Verify correlation is applied correctly
cormat(Correl_Trials)3×3 Matrix{Float64}:
1.0 0.787252 0.238415
0.787252 1.0 -0.375821
0.238415 -0.375821 1.0Analyze the impact of correlation on your output
Input Correlation between Revenue and Expenses
Input Correlation:
cor(Revenue,Expenses)0.014288469778278998Input Correlation for the Correlated Model:
cor(Correl_Trials.Revenue, Correl_Trials.Expenses)0.7956154732029773You can query with the charting and stats functions using these Model Outputs: Trials, Correl_Trials, Profit, Correl_Trials.Profit
Correlated vs. Uncorrelated results in Julia
Let us compare the percentiles of an uncorrelated model vs. a correlated one.
Probability Analysis
compresults_df = DataFrame(uprofit = Profit, cprofit = Correl_Trials.Profit )
@df stack(compresults_df) density(:value, group=:variable, legend=:topright, title="Density Plot")Using GetCertainty() we can do some simple probability accounting to assess the likelyhood of making 1m or less in profit :
GetCertainty(Profit, 1000000, 0)0.4744GetCertainty(Correl_Trials.Profit, 1000000, 0)0.4248By accounting for the correlation, we can see the probability of achieving our profit objective dropped by about 5%
fractiles() allows you to get the percentiles at various increments to be able to compare results along a continuum.
#Uncorrelated
fractiles(Profit)11×2 Matrix{Any}:
"P0.0" -3.16091e5
"P10.0" 4.44946e5
"P20.0" 6.46387e5
"P30.0" 7.85942e5
"P40.0" 9.11707e5
"P50.0" 1.03182e6
"P60.0" 1.15261e6
"P70.0" 1.27746e6
"P80.0" 1.4255e6
"P90.0" 1.62412e6
"P100.0" 2.41851e6#Correlated
fractiles(Correl_Trials.Profit)11×2 Matrix{Any}:
"P0.0" 2.72519e5
"P10.0" 7.73021e5
"P20.0" 8.66077e5
"P30.0" 9.31235e5
"P40.0" 9.87341e5
"P50.0" 1.03794e6
"P60.0" 1.08772e6
"P70.0" 1.14034e6
"P80.0" 1.20161e6
"P90.0" 1.29062e6
"P100.0" 1.7825e6Comparing the impact of the inputs
Sensitivity of uncorrelated results:
sensitivity_chrt(Trials,3)Sensitivity of correlated results
sensitivity_chrt(Correl_Trials,3)A quick analysis of the results
Accounting for correlation meant a 5% (~42% vs. ~47%) reduction in probability of not making our goals.
The Worse Case goes from -290k to 230k, a 225% difference
The critical driver in both cases is expenses.