Correlating Variables in Your Model

To get started, we are going to recap what we did in your first model and build a correlated version of the same model and compare.

Building a Simple Uncorrelated Model

using Distributions, StatsBase, DataFrames, MCHammer
n_trials = 10000
Random.seed!(1)
Revenue = rand(TriangularDist(2500000,4000000,3000000), n_trials)
Expenses = rand(TriangularDist(1400000,3000000,2000000), n_trials)

# The Model
Profit = Revenue - Expenses
10000-element Vector{Float64}:
 712844.150927736
      1.1888840017148217e6
      1.1882988676456232e6
 999543.5263104322
      1.7701450715666742e6
 765153.752309157
      1.8804101529082211e6
 960226.0170223624
 805657.2333578216
 453283.3296520631
      ⋮
      1.4174186027324959e6
      1.158472383905143e6
      1.3887126297576292e6
 799290.3925696253
      1.1640814575272575e6
 268649.0616989727
 683292.7274327667
      1.0124091010737661e6
 566960.4683695761
#Trial Results : the Profit vector (OUTPUT)
Profit
10000-element Vector{Float64}:
 712844.150927736
      1.1888840017148217e6
      1.1882988676456232e6
 999543.5263104322
      1.7701450715666742e6
 765153.752309157
      1.8804101529082211e6
 960226.0170223624
 805657.2333578216
 453283.3296520631
      ⋮
      1.4174186027324959e6
      1.158472383905143e6
      1.3887126297576292e6
 799290.3925696253
      1.1640814575272575e6
 268649.0616989727
 683292.7274327667
      1.0124091010737661e6
 566960.4683695761
# Trials or Results Table (OUTPUT)
Trials = DataFrame(Revenue = Revenue, Expenses = Expenses, Profit = Profit)
first(Trials,20)
20×3 DataFrame
RowRevenueExpensesProfit
Float64Float64Float64
12.73457e62.02173e67.12844e5
23.012e61.82312e61.18888e6
33.32787e62.13957e61.1883e6
43.25327e62.25373e69.99544e5
53.64278e61.87263e61.77015e6
62.88027e62.11512e67.65154e5
73.41286e61.53245e61.88041e6
83.42622e62.466e6960226.0
93.29672e62.49106e68.05657e5
102.85466e62.40138e64.53283e5
113.1979e62.59637e66.01531e5
123.09403e61.7547e61.33933e6
132.97618e62.14071e68.35462e5
142.53182e62.65497e6-1.23147e5
153.19411e61.80816e61.38595e6
163.24099e62.2356e61.00539e6
172.88315e62.17548e67.07667e5
182.59668e61.81287e67.83806e5
192.98373e61.95759e61.02614e6
202.79343e62.70982e683605.4
# Uncorrelated simulation
cormat(Trials)
3×3 Matrix{Float64}:
 1.0        0.016887   0.670901
 0.016887   1.0       -0.704043
 0.670901  -0.704043   1.0

Applying correlation to your simulation

Using the corvar() function, we are going to correlate the Revenue and Expenses at -0.8 and generate the results tables for both the correlated and uncorrelated versions.

#Correlate the expenses and the revenue with a coefficient of 0.8
Rev_Exp_Cor = 0.8
cor_matrix = [1 Rev_Exp_Cor; Rev_Exp_Cor 1]

#Validate input correlation. You can also use cormat() to define the correlation
#matrix from historical data.
cor_matrix
2×2 Matrix{Float64}:
 1.0  0.8
 0.8  1.0

It is very important to join Trial into an array before applying correlation. Furthermore, this step is necessary in order to produce a sensitivity_chrt()

Input_Table = DataFrame(Revenue=Revenue, Expenses=Expenses)
Correl_Trials = corvar(Input_Table, n_trials, cor_matrix)
DataFrames.rename!(Correl_Trials, [:Revenue, :Expenses])

#Using the correlated inputs to calculate the correlated profit
Correl_Trials.Profit = Correl_Trials.Revenue - Correl_Trials.Expenses
10000-element Vector{Float64}:
      1.139970992142301e6
      1.2684247130040182e6
 632965.6741561713
 892207.5209226953
 826510.0086482479
      1.3156590236982156e6
 956000.6873403299
      1.288104428843657e6
      1.248803054942811e6
      1.2633881455276287e6
      ⋮
      1.0621471073426818e6
 584415.6467456981
 672170.724452565
      1.1635267342694816e6
 891140.2253855588
      1.1192550129263354e6
      1.0919996135719232e6
 924649.3086631398
 660878.4442639798
#Verify correlation is applied correctly
cormat(Correl_Trials)
3×3 Matrix{Float64}:
 1.0        0.787252   0.238415
 0.787252   1.0       -0.375821
 0.238415  -0.375821   1.0

Analyze the impact of correlation on your output

Input Correlation between Revenue and Expenses

Input Correlation:

cor(Revenue,Expenses)
0.014288469778278998

Input Correlation for the Correlated Model:

cor(Correl_Trials.Revenue, Correl_Trials.Expenses)
0.7956154732029773

You can query with the charting and stats functions using these Model Outputs: Trials, Correl_Trials, Profit, Correl_Trials.Profit

Correlated vs. Uncorrelated results in Julia

Let us compare the percentiles of an uncorrelated model vs. a correlated one.

Probability Analysis

compresults_df = DataFrame(uprofit = Profit, cprofit = Correl_Trials.Profit )

@df stack(compresults_df) density(:value, group=:variable, legend=:topright, title="Density Plot")

Using GetCertainty() we can do some simple probability accounting to assess the likelyhood of making 1m or less in profit :

GetCertainty(Profit, 1000000, 0)
0.4744
GetCertainty(Correl_Trials.Profit, 1000000, 0)
0.4248

By accounting for the correlation, we can see the probability of achieving our profit objective dropped by about 5%

fractiles() allows you to get the percentiles at various increments to be able to compare results along a continuum.

#Uncorrelated
fractiles(Profit)
11×2 Matrix{Any}:
 "P0.0"    -3.16091e5
 "P10.0"    4.44946e5
 "P20.0"    6.46387e5
 "P30.0"    7.85942e5
 "P40.0"    9.11707e5
 "P50.0"    1.03182e6
 "P60.0"    1.15261e6
 "P70.0"    1.27746e6
 "P80.0"    1.4255e6
 "P90.0"    1.62412e6
 "P100.0"   2.41851e6
#Correlated
fractiles(Correl_Trials.Profit)
11×2 Matrix{Any}:
 "P0.0"    2.72519e5
 "P10.0"   7.73021e5
 "P20.0"   8.66077e5
 "P30.0"   9.31235e5
 "P40.0"   9.87341e5
 "P50.0"   1.03794e6
 "P60.0"   1.08772e6
 "P70.0"   1.14034e6
 "P80.0"   1.20161e6
 "P90.0"   1.29062e6
 "P100.0"  1.7825e6

Comparing the impact of the inputs

Sensitivity of uncorrelated results:

sensitivity_chrt(Trials,3)

Sensitivity of correlated results

sensitivity_chrt(Correl_Trials,3)

A quick analysis of the results

  1. Accounting for correlation meant a 5% (~42% vs. ~47%) reduction in probability of not making our goals.

  2. The Worse Case goes from -290k to 230k, a 225% difference

  3. The critical driver in both cases is expenses.