Correlating Variables in Your Model
To get started, we are going to recap what we did in your first model and build a correlated version of the same model and compare.
Building a Simple Uncorrelated Model
using Distributions, StatsBase, DataFrames, MCHammer
n_trials = 10000
Random.seed!(1)
Revenue = rand(TriangularDist(2500000,4000000,3000000), n_trials)
Expenses = rand(TriangularDist(1400000,3000000,2000000), n_trials)
# The Model
Profit = Revenue - Expenses
10000-element Vector{Float64}:
712844.150927736
1.1888840017148217e6
1.1882988676456232e6
999543.5263104322
1.7701450715666742e6
765153.752309157
1.8804101529082211e6
960226.0170223624
805657.2333578216
453283.3296520631
⋮
1.4174186027324959e6
1.158472383905143e6
1.3887126297576292e6
799290.3925696253
1.1640814575272575e6
268649.0616989727
683292.7274327667
1.0124091010737661e6
566960.4683695761
#Trial Results : the Profit vector (OUTPUT)
Profit
10000-element Vector{Float64}:
712844.150927736
1.1888840017148217e6
1.1882988676456232e6
999543.5263104322
1.7701450715666742e6
765153.752309157
1.8804101529082211e6
960226.0170223624
805657.2333578216
453283.3296520631
⋮
1.4174186027324959e6
1.158472383905143e6
1.3887126297576292e6
799290.3925696253
1.1640814575272575e6
268649.0616989727
683292.7274327667
1.0124091010737661e6
566960.4683695761
# Trials or Results Table (OUTPUT)
Trials = DataFrame(Revenue = Revenue, Expenses = Expenses, Profit = Profit)
first(Trials,20)
Row | Revenue | Expenses | Profit |
---|---|---|---|
Float64 | Float64 | Float64 | |
1 | 2.73457e6 | 2.02173e6 | 7.12844e5 |
2 | 3.012e6 | 1.82312e6 | 1.18888e6 |
3 | 3.32787e6 | 2.13957e6 | 1.1883e6 |
4 | 3.25327e6 | 2.25373e6 | 9.99544e5 |
5 | 3.64278e6 | 1.87263e6 | 1.77015e6 |
6 | 2.88027e6 | 2.11512e6 | 7.65154e5 |
7 | 3.41286e6 | 1.53245e6 | 1.88041e6 |
8 | 3.42622e6 | 2.466e6 | 960226.0 |
9 | 3.29672e6 | 2.49106e6 | 8.05657e5 |
10 | 2.85466e6 | 2.40138e6 | 4.53283e5 |
11 | 3.1979e6 | 2.59637e6 | 6.01531e5 |
12 | 3.09403e6 | 1.7547e6 | 1.33933e6 |
13 | 2.97618e6 | 2.14071e6 | 8.35462e5 |
14 | 2.53182e6 | 2.65497e6 | -1.23147e5 |
15 | 3.19411e6 | 1.80816e6 | 1.38595e6 |
16 | 3.24099e6 | 2.2356e6 | 1.00539e6 |
17 | 2.88315e6 | 2.17548e6 | 7.07667e5 |
18 | 2.59668e6 | 1.81287e6 | 7.83806e5 |
19 | 2.98373e6 | 1.95759e6 | 1.02614e6 |
20 | 2.79343e6 | 2.70982e6 | 83605.4 |
# Uncorrelated simulation
cormat(Trials)
3×3 Matrix{Float64}:
1.0 0.016887 0.670901
0.016887 1.0 -0.704043
0.670901 -0.704043 1.0
Applying correlation to your simulation
Using the corvar()
function, we are going to correlate the Revenue and Expenses at -0.8 and generate the results tables for both the correlated and uncorrelated versions.
#Correlate the expenses and the revenue with a coefficient of 0.8
Rev_Exp_Cor = 0.8
cor_matrix = [1 Rev_Exp_Cor; Rev_Exp_Cor 1]
#Validate input correlation. You can also use cormat() to define the correlation
#matrix from historical data.
cor_matrix
2×2 Matrix{Float64}:
1.0 0.8
0.8 1.0
It is very important to join Trial into an array before applying correlation. Furthermore, this step is necessary in order to produce a sensitivity_chrt()
Input_Table = DataFrame(Revenue=Revenue, Expenses=Expenses)
Correl_Trials = corvar(Input_Table, n_trials, cor_matrix)
DataFrames.rename!(Correl_Trials, [:Revenue, :Expenses])
#Using the correlated inputs to calculate the correlated profit
Correl_Trials.Profit = Correl_Trials.Revenue - Correl_Trials.Expenses
10000-element Vector{Float64}:
1.139970992142301e6
1.2684247130040182e6
632965.6741561713
892207.5209226953
826510.0086482479
1.3156590236982156e6
956000.6873403299
1.288104428843657e6
1.248803054942811e6
1.2633881455276287e6
⋮
1.0621471073426818e6
584415.6467456981
672170.724452565
1.1635267342694816e6
891140.2253855588
1.1192550129263354e6
1.0919996135719232e6
924649.3086631398
660878.4442639798
#Verify correlation is applied correctly
cormat(Correl_Trials)
3×3 Matrix{Float64}:
1.0 0.787252 0.238415
0.787252 1.0 -0.375821
0.238415 -0.375821 1.0
Analyze the impact of correlation on your output
Input Correlation between Revenue and Expenses
Input Correlation:
cor(Revenue,Expenses)
0.014288469778278998
Input Correlation for the Correlated Model:
cor(Correl_Trials.Revenue, Correl_Trials.Expenses)
0.7956154732029773
You can query with the charting and stats functions using these Model Outputs: Trials
, Correl_Trials
, Profit
, Correl_Trials.Profit
Correlated vs. Uncorrelated results in Julia
Let us compare the percentiles of an uncorrelated model vs. a correlated one.
Probability Analysis
compresults_df = DataFrame(uprofit = Profit, cprofit = Correl_Trials.Profit )
@df stack(compresults_df) density(:value, group=:variable, legend=:topright, title="Density Plot")
Using GetCertainty()
we can do some simple probability accounting to assess the likelyhood of making 1m or less in profit :
GetCertainty(Profit, 1000000, 0)
0.4744
GetCertainty(Correl_Trials.Profit, 1000000, 0)
0.4248
By accounting for the correlation, we can see the probability of achieving our profit objective dropped by about 5%
fractiles()
allows you to get the percentiles at various increments to be able to compare results along a continuum.
#Uncorrelated
fractiles(Profit)
11×2 Matrix{Any}:
"P0.0" -3.16091e5
"P10.0" 4.44946e5
"P20.0" 6.46387e5
"P30.0" 7.85942e5
"P40.0" 9.11707e5
"P50.0" 1.03182e6
"P60.0" 1.15261e6
"P70.0" 1.27746e6
"P80.0" 1.4255e6
"P90.0" 1.62412e6
"P100.0" 2.41851e6
#Correlated
fractiles(Correl_Trials.Profit)
11×2 Matrix{Any}:
"P0.0" 2.72519e5
"P10.0" 7.73021e5
"P20.0" 8.66077e5
"P30.0" 9.31235e5
"P40.0" 9.87341e5
"P50.0" 1.03794e6
"P60.0" 1.08772e6
"P70.0" 1.14034e6
"P80.0" 1.20161e6
"P90.0" 1.29062e6
"P100.0" 1.7825e6
Comparing the impact of the inputs
Sensitivity of uncorrelated results:
sensitivity_chrt(Trials,3)
Sensitivity of correlated results
sensitivity_chrt(Correl_Trials,3)
A quick analysis of the results
Accounting for correlation meant a 5% (~42% vs. ~47%) reduction in probability of not making our goals.
The Worse Case goes from -290k to 230k, a 225% difference
The critical driver in both cases is expenses.