Correlating Variables in Your Model

To get started, we are going to recap what we did in your first model and build a correlated version of the same model and compare.

Building a Simple Uncorrelated Model

using Distributions, StatsBase, DataFrames, MCHammer
n_trials = 10000
Random.seed!(1)
Revenue = rand(TriangularDist(2500000,4000000,3000000), n_trials)
Expenses = rand(TriangularDist(1400000,3000000,2000000), n_trials)

# The Model
Profit = Revenue - Expenses

10000-element Vector{Float64}:
 712844.150927736
      1.1888840017148217e6
      1.1882988676456232e6
 999543.5263104322
      1.7701450715666742e6
 765153.752309157
      1.8804101529082211e6
 960226.0170223624
 805657.2333578216
 453283.3296520631
      ⋮
      1.4174186027324959e6
      1.158472383905143e6
      1.3887126297576292e6
 799290.3925696253
      1.1640814575272575e6
 268649.0616989727
 683292.7274327667
      1.0124091010737661e6
 566960.4683695761

#Trial Results : the Profit vector (OUTPUT)
Profit

10000-element Vector{Float64}:
 712844.150927736
      1.1888840017148217e6
      1.1882988676456232e6
 999543.5263104322
      1.7701450715666742e6
 765153.752309157
      1.8804101529082211e6
 960226.0170223624
 805657.2333578216
 453283.3296520631
      ⋮
      1.4174186027324959e6
      1.158472383905143e6
      1.3887126297576292e6
 799290.3925696253
      1.1640814575272575e6
 268649.0616989727
 683292.7274327667
      1.0124091010737661e6
 566960.4683695761

# Trials or Results Table (OUTPUT)
Trials = DataFrame(Revenue = Revenue, Expenses = Expenses, Profit = Profit)
first(Trials,20)

20×3 DataFrame

Row	Revenue	Expenses	Profit
	Float64	Float64	Float64
1	2.73457e6	2.02173e6	7.12844e5
2	3.012e6	1.82312e6	1.18888e6
3	3.32787e6	2.13957e6	1.1883e6
4	3.25327e6	2.25373e6	9.99544e5
5	3.64278e6	1.87263e6	1.77015e6
6	2.88027e6	2.11512e6	7.65154e5
7	3.41286e6	1.53245e6	1.88041e6
8	3.42622e6	2.466e6	960226.0
9	3.29672e6	2.49106e6	8.05657e5
10	2.85466e6	2.40138e6	4.53283e5
11	3.1979e6	2.59637e6	6.01531e5
12	3.09403e6	1.7547e6	1.33933e6
13	2.97618e6	2.14071e6	8.35462e5
14	2.53182e6	2.65497e6	-1.23147e5
15	3.19411e6	1.80816e6	1.38595e6
16	3.24099e6	2.2356e6	1.00539e6
17	2.88315e6	2.17548e6	7.07667e5
18	2.59668e6	1.81287e6	7.83806e5
19	2.98373e6	1.95759e6	1.02614e6
20	2.79343e6	2.70982e6	83605.4

# Uncorrelated simulation
cormat(Trials)

3×3 Matrix{Float64}:
 1.0        0.016887   0.670901
 0.016887   1.0       -0.704043
 0.670901  -0.704043   1.0

Applying correlation to your simulation

Using the corvar() function, we are going to correlate the Revenue and Expenses at -0.8 and generate the results tables for both the correlated and uncorrelated versions.

#Correlate the expenses and the revenue with a coefficient of 0.8
Rev_Exp_Cor = 0.8
cor_matrix = [1 Rev_Exp_Cor; Rev_Exp_Cor 1]

#Validate input correlation. You can also use cormat() to define the correlation
#matrix from historical data.
cor_matrix

2×2 Matrix{Float64}:
 1.0  0.8
 0.8  1.0

It is very important to join Trial into an array before applying correlation. Furthermore, this step is necessary in order to produce a sensitivity_chrt()

Input_Table = DataFrame(Revenue=Revenue, Expenses=Expenses)
Correl_Trials = corvar(Input_Table, n_trials, cor_matrix)
DataFrames.rename!(Correl_Trials, [:Revenue, :Expenses])

#Using the correlated inputs to calculate the correlated profit
Correl_Trials.Profit = Correl_Trials.Revenue - Correl_Trials.Expenses

10000-element Vector{Float64}:
      1.139970992142301e6
      1.2684247130040182e6
 632965.6741561713
 892207.5209226953
 826510.0086482479
      1.3156590236982156e6
 956000.6873403299
      1.288104428843657e6
      1.248803054942811e6
      1.2633881455276287e6
      ⋮
      1.0621471073426818e6
 584415.6467456981
 672170.724452565
      1.1635267342694816e6
 891140.2253855588
      1.1192550129263354e6
      1.0919996135719232e6
 924649.3086631398
 660878.4442639798

#Verify correlation is applied correctly
cormat(Correl_Trials)

3×3 Matrix{Float64}:
 1.0        0.787252   0.238415
 0.787252   1.0       -0.375821
 0.238415  -0.375821   1.0

Analyze the impact of correlation on your output

Input Correlation between Revenue and Expenses

Input Correlation:

cor(Revenue,Expenses)

0.014288469778278998

Input Correlation for the Correlated Model:

cor(Correl_Trials.Revenue, Correl_Trials.Expenses)

0.7956154732029773

You can query with the charting and stats functions using these Model Outputs: Trials, Correl_Trials, Profit, Correl_Trials.Profit

Correlated vs. Uncorrelated results in Julia

Let us compare the percentiles of an uncorrelated model vs. a correlated one.

Probability Analysis

compresults_df = DataFrame(uprofit = Profit, cprofit = Correl_Trials.Profit )

@df stack(compresults_df) density(:value, group=:variable, legend=:topright, title="Density Plot")

Using GetCertainty() we can do some simple probability accounting to assess the likelyhood of making 1m or less in profit :

GetCertainty(Profit, 1000000, 0)

0.4744

GetCertainty(Correl_Trials.Profit, 1000000, 0)

0.4248

By accounting for the correlation, we can see the probability of achieving our profit objective dropped by about 5%

fractiles() allows you to get the percentiles at various increments to be able to compare results along a continuum.

#Uncorrelated
fractiles(Profit)

11×2 Matrix{Any}:
 "P0.0"    -3.16091e5
 "P10.0"    4.44946e5
 "P20.0"    6.46387e5
 "P30.0"    7.85942e5
 "P40.0"    9.11707e5
 "P50.0"    1.03182e6
 "P60.0"    1.15261e6
 "P70.0"    1.27746e6
 "P80.0"    1.4255e6
 "P90.0"    1.62412e6
 "P100.0"   2.41851e6

#Correlated
fractiles(Correl_Trials.Profit)

11×2 Matrix{Any}:
 "P0.0"    2.72519e5
 "P10.0"   7.73021e5
 "P20.0"   8.66077e5
 "P30.0"   9.31235e5
 "P40.0"   9.87341e5
 "P50.0"   1.03794e6
 "P60.0"   1.08772e6
 "P70.0"   1.14034e6
 "P80.0"   1.20161e6
 "P90.0"   1.29062e6
 "P100.0"  1.7825e6

Comparing the impact of the inputs

Sensitivity of uncorrelated results:

sensitivity_chrt(Trials,3)

Sensitivity of correlated results

sensitivity_chrt(Correl_Trials,3)

A quick analysis of the results

Accounting for correlation meant a 5% (~42% vs. ~47%) reduction in probability of not making our goals.
The Worse Case goes from -290k to 230k, a 225% difference
The critical driver in both cases is expenses.