On “The Logic of Risk” by P. Cirillo

Based on *The Logic of Risk* by Pasquale Cirillo (YouTube lecture series).

check:

  • Conditional Gini (instead of moments for the tail)
  • unevenly spaced time series

Lecture 1

  • Do not separate probabilities from the payoff.
    • Systematic and idiosyncratic risks: risks we can and cannot diversify.
    • Operative point of view: external and internal probabilities.
    • Payoff is always agent dependent (perception based). Risk is therefore viewed as subjective, not objective.
  • Model risk -> when our model is wrong.
  • Extremes do cluster, hence they are not i.i.d., hence be careful in modelling.
    • Studying dependence becomes pivotal, as dependence usually fattens the tail.

Risk Characteristics

  • Depth - knowledge of the phenomenon under scrutiny (under study), plus the quality of the data we have (are they enough?) and the modeling choices.
  • Height - single asset, group of assets, the whole portfolio. Risk aggregation and disaggregation. What we do at the portfolio level differs from the asset level. (Forest vs tree :-) )
  • Length - range of variation of the phenomenon we are considering. Is an outlier an extreme event, or was it an outlier only because of our model?
  • Time - it may also bring ruin.
    • Delay bias, delay paradox (enough information is never enough => decision under uncertainty)
    • Not all risks can be quantified (Knightian, Shackle (unknowledge), ambiguity, …)

Fallacies

  • Size fallacy (one cannot compare the risk of a pandemic with the risk of being eaten by a shark)
    • one cannot compare one-off risks with repetitive ones.
    • some risks are additive, others are multiplicative
  • Ostrich’s Fallacy
  • Absence of proof is not proof of absence
  • Delay fallacy - postponing until enough data is there
  • Turkey’s Fallacy
  • technocratic fallacy - relying too much on math, technicalities and models (your grandmother may also be right)
  • Consensus fallacy and infallibility fallacy.

Lecture 2 - Risk Measure

  • axiomatic approach.
  • identification, assessment (quantify, give tags!!), prioritization (ranking), hedge
  • T (capital t) in fin math means maturity, but here it is the final time considered.
  • check also distortion function, pdfs, risk measures from financial math course.
  • The risk measure is meant to determine the amount of capital or assets to be kept in reserve. The aim of a reserve is to guarantee the presence of sufficient capital that can be used as a partial cover if the risky event manifests itself, generating a loss.
  • !!! Loss is positive, profit is negative in the course! (profit is a negative loss)
  • Value at Risk (a quantile) and Expected Shortfall (a conditional expectation => the expected value of all losses above the VaR).
    • VaR is the loss value for which the probability of observing a larger loss, given the available info, is equal to $1-\alpha$ (where $\alpha$ is the confidence level, e.g. 95%).
    • VaR is not coherent, ES is always coherent
    • A risk measure is said to be coherent if it satisfies:
      • monotonicity, translation (cash) invariance, positive homogeneity, sub-additivity (both measures are sketched in code below)
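
A minimal empirical sketch of both measures (Python; the Student-t sample is illustrative data only, and losses follow the course convention: positive = loss):

```python
import numpy as np

def var_es(losses, alpha=0.95):
    """Empirical VaR and ES at confidence level alpha.
    Losses are positive numbers (profit = negative loss, as in the course)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)         # VaR: the alpha-quantile of losses
    tail = losses[losses > var]              # all losses beyond the VaR
    es = tail.mean() if tail.size else var   # ES: their conditional expectation
    return var, es

# illustrative heavy-ish tailed losses from a Student-t distribution
rng = np.random.default_rng(42)
print(var_es(rng.standard_t(df=4, size=100_000), alpha=0.99))
```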

L3 - EVT

  • use max() for extreme loss and min() for extreme gain… just switch the signs.
  • extremes and rare events are approached differently
    • outlier is different from extreme (e.g. outlier -> measuring human height and finding a 12-meter-tall man)
    • extreme may be context dependent - 2.25 meters can be extreme in the total population but not in the NBA population
  • An extreme is an event that occurs within the range of variation of the phenomenon under scrutiny, but is characterised by unusual magnitude and/or rarity/temporal sparseness.
  • Outlier detection: $[Q_1-k(Q_3-Q_1),\ Q_3+k(Q_3-Q_1)], \quad k>0$ (under thin tails only); chi-square and distance measures; Thompson Tau; etc.
  • Remove outliers, but do not correct for extremes. Either analyze them with the rest of the data, or just focus on them.
  • In EVT the role of the Normal and Stable distributions of classical statistics is played by the GEV and the GPD (both transcribed in code below).
\[GEV(x;\mu,\sigma,\xi) = \begin{cases} \exp\left(-\left(1+\xi\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right) & \quad \text{if } \xi \neq 0\\ \exp\left(-\exp\left(-\frac{x-\mu}{\sigma}\right)\right) & \quad \text{if } \xi = 0 \end{cases}\] \[GPD(x;\mu,\sigma,\xi) = \begin{cases} 1-\left(1+\xi\frac{x-\mu}{\sigma}\right)^{-1/\xi} & \quad \text{if } \xi \neq 0\\ 1-\exp\left(- \frac{x-\mu}{\sigma}\right) & \quad \text{if } \xi = 0 \end{cases}\]
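
As a sanity check, a direct transcription of the two CDFs into Python (a sketch; note that scipy.stats.genextreme implements the same family with the opposite sign convention for the shape):

```python
import numpy as np

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV CDF as above; the support requires 1 + xi*(x - mu)/sigma > 0."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    return np.exp(-np.maximum(1.0 + xi * z, 0.0) ** (-1.0 / xi))

def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GPD CDF as above, defined for x >= mu."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        return 1.0 - np.exp(-z)
    return 1.0 - np.maximum(1.0 + xi * z, 0.0) ** (-1.0 / xi)

print(gev_cdf(2.0, xi=0.25), gpd_cdf(2.0, xi=0.25))   # quick smoke test
```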

Lesson 4:

  • an extreme event can be inferred; we have enough knowledge to model the event.
  • With a black swan we know nothing about the event. Before it manifests we cannot even think of it. (For the turkey, Thanksgiving is a black swan. For the farmer - not necessarily :-) )… a black swan cannot be modeled or forecast
  • Limiting distributions:
    • GEV + block maxima + Fisher-Tippett
      • the limit distribution for the $\color{green}{\text{normalized maxima}}$
    • GPD (generalized Pareto) + peaks over threshold + Pickands-Balkema-de Haan
      • the limit distribution of $\color{red}{\text{scaled excesses over a high threshold}}$
    • block maxima - define time blocks and take the max of each block. But there is no silver bullet for choosing the block size. Once these maxima are normalized, the limiting distribution is GEV
    • POT - peaks over threshold -> define the excesses, i.e. all observations above the threshold.
    • GPD is preferred for fitting purposes!
  • The 3 extreme value classes of distributions (or maximum domains of attraction of the max, MDA): the Fréchet, Weibull and Gumbel distributions, each corresponding to a specific behavior of the maxima:
    • Fréchet $\Phi$: fat tail behavior, risky, and infinite upper bound (e.g.: Pareto, log-gamma, Burr, and all other distributions whose tail decays as a power law, i.e. that are fat tailed).
    • Weibull $\Psi$: more or less risky, but all characterized by a finite upper bound (e.g.: Uniform, Beta - because they have an upper bound)
    • Gumbel $\Lambda$: thin tailed, decaying quite quickly in the tail, not able to generate extremely risky phenomena in most cases (e.g.: Normal, Exponential). The exception is the log-normal distribution, which, depending on the sigma parameter, can be a very wild animal :-) … in data it is occasionally difficult to distinguish between the log-normal and the Pareto.
  • $\alpha$ is the tail index of the distribution, and the tail (shape) parameter $\xi$ is $1/\alpha$. It is the same for GEV and GPD
  • Fisher-Tippett with GEV
    • $\xi = \alpha^{-1} > 0 \leftrightarrow \Phi$
    • $\xi = -\alpha^{-1} < 0 \leftrightarrow \Psi$
    • $\xi = 0 \leftrightarrow \Lambda$
    • Although these distributions are very different, from a mathematical point of view it can be shown that: $X \sim \Phi_{\alpha} \leftrightarrow \log X^{\alpha} \sim \Lambda \leftrightarrow -X^{-1} \sim \Psi_{\alpha}$
    • Maximum domain of attraction
    • TODO: $\bar F(x)$ is the tail of $F$ (i.e. the survival function $\bar F = 1-F$)
  • Pickands-Balkema-de Haan
    • Excess distribution
      • excesses can be modeled with GPD
      • the number of exceedances follows a homogeneous Poisson process
      • the distro of the max of a Poisson number of i.i.d. excesses is GEV
  • Bestiary of Tails
    • Every tail has one specific requirement for the random variable we are considering.
    • A benchmark will be the exponential distribution
      • Many technical reasons, leading to the behavior of the moment generating function.
      • In the case of the GEV (Generalized Extreme Value Distribution) the exponential is the tail representative of the Gumbel class.
      • In the GPD (Generalized Pareto Distribution) the exponential is the case we get for $\xi=0$
    • We consider the survival function of the exponential distribution.
    • Heavy tail (slower than exponential): a tail is heavy if its survival function decays more slowly than the survival function of an exponential. \(\limsup_{x \rightarrow \infty} \frac{\bar{F}(x)}{e^{-\lambda x}} = \limsup_{x \rightarrow \infty} \bar{F}(x)e^{\lambda x} = \infty , \quad \forall \lambda > 0\)
      • With heavy tails the moment generating function, from which moments could otherwise be recovered, is infinite for every $\lambda$ larger than zero, hence useless. So for heavy tails we need other instruments.
      • The most difficult to deal with.
    • Long tail (sooner or later a new record) (a long tail is a heavy tail, but not necessarily the reverse). One more condition, the explosion principle: if a very large event happens, the probability of an even larger event approaches 1. \(\lim_{x \rightarrow \infty} P(X > x+t \mid X>x) = 1, \quad \forall t > 0\)
    • Subexponential tails (one big shot is all it takes) (a subexponential tail is a long tail, but a long tail is not necessarily subexponential). One more principle - the one-shot (or catastrophe) principle, or the winner takes it all.
      • For a collection of random variables define the partial sum $S_n$ and the partial maximum $M_n$. The principle states that the probability that $S_n$ exceeds a particular large value $x$ is approximately the same as the probability that $M_n$ exceeds that value $x$. So if the random variable we are observing is subexponential, at a certain point the behavior of the partial sum will be dominated by one big value, i.e. by the partial maximum $M_n$ \(P(S_n > x) \approx P(M_n>x), \quad \text{as } x \rightarrow \infty\)
      • For example, considering the losses in a portfolio the total loss will be dominated by one big loss
      • Many very useful heuristics and graphic tools… TBD
    • Fat tails (a fat tail is subexponential) - a fat-tailed random variable is a random variable belonging to the maximum domain of attraction of the Fréchet distribution, i.e. a random variable whose survival function is regularly varying and can be expressed as the product of two components: a power law decay and a slowly varying function, which asymptotically becomes negligible and leaves the power law only. \(\bar{F}(x) = P(X \geq x) = x^{-\alpha}L(x)\) Asymptotically scale invariant and the most interesting for modelling extreme and rare events.
    • Pareto (is fat tailed) (in the maximum domain of attraction of the Fréchet)
      • might not have all moments finite
    • Lognormal (is subexponential, but not fat tailed) (in the maximum domain of attraction of the Gumbel, together with the Normal and Exponential distributions, which are light tailed)
      • can generate very strange behaviors
      • can be difficult to distinguish between a Pareto and a Lognormal
      • all moments always exist (i.e. all moments are finite)
    • Useful tools:
      • QQ (Quantile-Quantile) plot, typically comparing the data quantiles (X-axis) with those of an exponential-type distribution such as the Gumbel (Y-axis), to check the hypothesis of exponentiality.
      • Zipf (log-log) plot of the empirical survival function of the data. Useful for fat-tailed distributions; it gives necessary but not sufficient information about tails. \(\bar{F}(x)=\left(\frac{x}{x_0}\right)^{-\alpha}, \quad 0 < x_0 < x\) \(\log \bar{F}(x) = \alpha \log(x_0) - \alpha \log(x)\)
      • where $\alpha \log(x_0)$ is a constant, as $x_0$ is a constant. So the X-axis is the ordered $\log(x)$ and the Y-axis is the log of the survival function, $\log \bar F(x) = \log[1-F(x)]$ (a code sketch of this plot closes the lesson)
      • Rule of thumb for distinguishing between a Power Law and a Lognormal:
        1. Look at the range of the X-axis: if it is small (e.g. between 0.1 and 100 when the max is 10000+), it is most likely a lognormal.
        2. Aggregate the data in two different ways (e.g. order by size and sum the smallest with the largest, the 2nd smallest with the 2nd largest, etc.; or order by size and sum the 1st with the 2nd, the 3rd with the 4th, and so on). The two aggregations can be of different types; a genuine power-law tail survives aggregation, while a lognormal one changes shape.
      • Mean excess function (very useful in EVT): with random variable X and cdf F, it is defined as: \(e(v) = E[ X-v \mid X>v ] = \frac{\int_v^\infty (t-v)\mathrm{d}F(t)}{\int_v^\infty\mathrm{d}F(t)}, \quad 0 < v < x_F\)
      • see van der Wijk’s law
      • Empirical MEF - MEPLOT \(e_n(v)=\frac{\sum^n_{i=1}(X_i-v)\,\mathbb{1}(X_i>v)}{\sum^n_{i=1}\mathbb{1}(X_i>v)}\)
        • look for upward linear trend
        • order by size, compute, remove the first, compute, remove the next, compute…. Generally use at least 10000 items to distinguish between a Zipf (Pareto) and a lognormal: \(e_u^{LogN}(X)=\frac{\sigma^2 u}{\log(u)-\mu}(1+ o(1))\) \(e_u^{Pareto}(X)=\frac{u}{\alpha - 1}, \quad \alpha > 1\)
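
Both diagnostics above are easy to code; a minimal sketch (Python with numpy/matplotlib; the Pareto sample with $\alpha = 2$ is illustrative data only, so the straight Zipf line and the upward-linear MEPLOT are expected by construction):

```python
import numpy as np
import matplotlib.pyplot as plt

def zipf_plot(x, ax):
    """Log-log plot of the empirical survival function: a straight line of
    slope -alpha is necessary (not sufficient) evidence of a fat tail."""
    xs = np.sort(x)
    surv = 1.0 - np.arange(1, len(xs) + 1) / (len(xs) + 1.0)
    ax.loglog(xs, surv, ".", markersize=2)
    ax.set(xlabel="x (log scale)", ylabel="empirical survival (log scale)")

def mean_excess_plot(x, ax):
    """Empirical mean excess e_n(v) at each order statistic used as threshold;
    an upward linear trend suggests a Pareto-type tail (slope 1/(alpha-1))."""
    xs = np.sort(x)
    thresholds = xs[:-1]                          # all but the maximum
    e = [np.mean(xs[xs > v]) - v for v in thresholds]
    ax.plot(thresholds, e, ".", markersize=2)
    ax.set(xlabel="threshold v", ylabel="mean excess e(v)")

# illustrative data: Pareto with alpha = 2 via inverse-transform sampling
rng = np.random.default_rng(1)
data = rng.uniform(size=5_000) ** (-1 / 2.0)
fig, (a1, a2) = plt.subplots(1, 2, figsize=(10, 4))
zipf_plot(data, a1)
mean_excess_plot(data, a2)
plt.show()
```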

Lesson 5:

  • The tail parameter ($\alpha$, or equivalently $\xi$) gives the upper bound on meaningful theoretical moments: if $\alpha = 2.3$ then moments up to the $2^{nd}$ are meaningful; if $\alpha < 1$ even the mean is infinite and useless. Put another way, the moment of order $k$ exists iff $\alpha > k$.
  • Owing to pre-asymptotic properties, a conservative heuristic is to consider i.i.d. data with $\alpha \leq 2.5$ as not forecastable in practice.
  • When the mean is finite but the variance is infinite, it is difficult to build confidence intervals for the mean. When the variance is finite but the 4th moment is infinite, it is difficult to build confidence intervals for the variance.
  • For the Pareto some moments can be infinite; for the lognormal all moments exist.
  • ! Maximum-to-Sum plot (to check which moments exist): built from the partial max and partial sum of the $p$-th powers, where $p$ is the moment (a code sketch follows this list).
    • calculated for 1, 2, 3, …, n observations
    • for an i.i.d. sequence $X_1, X_2,…$ define the quantities: \(S_n(p) = \sum^n_{i=1}|X_i|^p\) \(M_n(p) = \max(|X_1|^p,...,|X_n|^p)\) \(R_n(p) = \frac{M_n(p)}{S_n(p)}, \quad n\geq1, p\gt0\)
    • plot $R_n(p)$ on the Y-axis over $n$ on the X-axis
      • always starts at 1
      • if convergent to 0, the p-th moment is likely to be finite
      • if it does not vanish, the p-th moment is likely to be infinite
      • upward jumps are common for fat tails
  • Concentration profile - uses the Gini index; useful to characterize the distribution.
    • Order, calculate, remove the 1st observation, calculate, …
    • heuristic: the tail starts where the profile becomes horizontal
  • Lorenz Curve + Gini Index \(L(p)=\frac{\int^p_0 Q(t)dt}{E[X]}\) \(G=\frac{\|L(p)-L_{pe}\|}{\|L_{pi}-L_{pe}\|} = \frac{A}{A+B}, \quad 0 \leq G \leq 1\) where $L_{pe}$ is the Lorenz curve under perfect equality and $L_{pi}$ under perfect inequality
  • Truncated Gini Index, connected to VaR and ES
  • Moment Ratio Plots
  • Zenga plot
  • ! GEV and GPD estimation when data comes exactly from GEV or GPD:
    • use MLE (when $\xi \gt -1/2$)
    • Probability-Weighted moments (otherwise).
    • MLE for GEV requires numerical optimization, as it is impossible to find an analytical solution
    • MLE is OK for $\xi \gt -1/2$ (most of the time), but loses consistency for smaller values.
    • Probability-Weighted Moments: good performance for $\xi < 1$, when at least the $1^{st}$ moment exists. Although it lacks a solid theoretical foundation, it is preferred when $\xi \leq -1/2$
  • ! Estimation under the MDA (maximum domain of attraction), when the data is not exactly GEV: concentrate on the tails. Estimation of $\xi$ can be based on the upper order statistics \(X_{k,n} \leq ... \leq X_{1,n} \quad \text{where } X_{k,n} \text{ is the k-th maximum and } X_{1,n} \text{ is the maximum}\)
    • tail condition 1: $k(n) \rightarrow \infty$ use a sufficient number of order statistics
    • tail condition 2: $n/k(n) \rightarrow \infty$ let the tail speak.
  • !!! Hill estimator. Probably the best-known tail estimator in EVT.
    • best used for data in the Fréchet MDA (i.e. $\xi = 1/\alpha > 0$)
    • if $F \in MDA(\Phi_{\alpha}), \quad \alpha \gt 0$, then $\bar F(x) = L(x)x^{-\alpha}$. The Hill estimator is defined as (+ tail conditions): \(\hat \alpha^H = \hat \alpha^H_{k,n} = \frac{1}{\hat \xi^H_{k,n}} = \left( \frac{1}{k} \sum^k_{j=1} \log(X_{j,n}) - \log(X_{k,n}) \right)^{-1}\)
    • order decreasing: start from the largest observation, then the top 2, etc. The optimal $k$ is where the plot stabilizes => the corresponding value is $\alpha$; look down the graph (see pic) for the number of observations used
    • it is actually an MLE
    • look for stability in the plot (a code sketch follows this list)
  • OLS on the Zipf plot (please, NO)
  • other methods..
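
Sketches for the maximum-to-sum plot and the Hill estimator above (Python; the Pareto sample with $\alpha = 1.5$ is illustrative, chosen so the 1st moment is finite and the 2nd is not):

```python
import numpy as np

def max_to_sum(x, p):
    """R_n(p) = M_n(p) / S_n(p) for n = 1..len(x): convergence to 0 suggests
    a finite p-th moment; staying away from 0 suggests an infinite one."""
    xp = np.abs(np.asarray(x, dtype=float)) ** p
    return np.maximum.accumulate(xp) / np.cumsum(xp)

def hill(x, k):
    """Hill estimator of alpha from the k largest order statistics
    (for data in the Frechet MDA, all values > 0)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]   # X_{1,n} >= X_{2,n} >= ...
    return 1.0 / (np.mean(np.log(xs[:k])) - np.log(xs[k - 1]))

# illustrative Pareto data with alpha = 1.5: mean finite, variance infinite
rng = np.random.default_rng(7)
data = rng.uniform(size=50_000) ** (-1 / 1.5)
print(max_to_sum(data, p=1)[-1])    # close to 0: 1st moment finite
print(max_to_sum(data, p=2)[-1])    # stays away from 0: 2nd moment infinite
print([round(hill(data, k), 2) for k in (100, 500, 2000)])  # hovers near 1.5
```

For a Hill plot, compute hill(data, k) over a grid of k and look for the stable region.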

Lesson 6: Time Series

  • in TS modelling, time is usually treated as discrete. (This chapter covers equally spaced time series only.)
  • returns are stationary; prices might not be.
  • weak stationarity: the mean is constant and the autocovariance is immune to time shifts (it depends only on the lag)
  • detrending: either with linear regression or by differencing (to be preferred when the trend is not deterministic); always use both and choose later based on the model results.
  • deseasonalizing - also for business cycles (in R there is the stl() function)
  • auto-covariance: in principle $n^{-1}$ should be $(n-h)^{-1}$, but there are numerical and spectral reasons for using $n^{-1}$ (a code sketch follows this list). \(\hat \gamma(h) = n^{-1} \sum^{n-h}_{t=1} (x_{t+h} - \bar x)(x_t - \bar x)\) \(\hat \gamma(h) = \hat \gamma(-h) \quad h=0,1,...,n-1\)
  • auto-correlation, and the auto-correlation function (ACF), which describes how the autocorrelation varies with the lag $h$.
  • cross-correlation: correlation between two time series
  • partial auto-correlation function (PACF) - auto-correlation with linear dependency removed
  • auto-regressive operator
  • ARMA, ARIMA, MA, …
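
The estimator above in a few lines (Python; statsmodels.tsa.stattools.acf offers the same with more options; the AR(1) series is only illustrative):

```python
import numpy as np

def acf(x, max_lag):
    """Sample ACF with the n^{-1} normalization from the notes: slightly
    biased, but it keeps the autocovariance sequence positive semi-definite
    (the 'numerical and spectral reasons')."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    gamma = np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n
                      for h in range(max_lag + 1)])
    return gamma / gamma[0]              # rho(h) = gamma(h) / gamma(0)

# illustrative AR(1) series: its theoretical ACF decays as phi^h
rng = np.random.default_rng(3)
phi, eps = 0.7, rng.normal(size=2_000)
x = np.zeros_like(eps)
for t in range(1, len(eps)):
    x[t] = phi * x[t - 1] + eps[t]
print(np.round(acf(x, 5), 2))            # roughly 1.00, 0.70, 0.49, 0.34, ...
```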

Lesson 8:

  • Reality is not i.i.d.
  • with time series, the fundamental assumption of Fisher-Tippett that observations are i.i.d. does not hold.
  • D condition (asymptotic independence) - avoids clustering and excessive dependence in the data, so that we can assume extremes qualitatively behave as if they came from an i.i.d. sequence.
  • D' imposes an anti-clustering condition.
  • Similar to block maxima approach, but for non i.i.d.

Lesson 10: The Greeks for Market Risk

  • Delta: measures the sensitivity of the portfolio to small changes in the price of the underlying asset(s), where $P$ is the portfolio and $S$ is the price.
\[\Delta = \frac{\partial P}{\partial S}\]
  • can also be seen as the 1st partial derivative of the value of the portfolio with respect to the price of the underlying asset(s) (partial, because things other than the price also influence the portfolio value)
  • Delta hedging - covering the loss generated by a small variation in the price.
  • To Delta hedge one always needs to buy/sell units of the underlying asset
  • Gamma: measures the sensitivity of the portfolio to large changes in the price of the underlying asset(s), where $P$ is the portfolio and $S$ is the price.
    • can also be seen as the 2nd partial derivative of the value of the portfolio with respect to the price of the underlying asset(s).
    • or the 1st derivative of Delta
\[\Gamma = \frac{\partial^2 P}{\partial S^2}\]
  • Gamma measures the curvature of the function (convex or concave).
  • For linear products (forwards, futures, …) Gamma is always 0, so Delta is enough. As we have linearity, there is no curvature, hence no need for Gamma.
  • see pics for linear example
  • The Delta hedge error is the error made when Gamma is ignored on a nonlinear product
  • To Gamma hedge one cannot buy the underlying asset, because Gamma is related to non-linearity. So to hedge Gamma, one needs something that is non-linearly related to the underlying asset. Typically this is an option on the underlying asset.
  • Gamma hedging (make the Gamma of the portfolio 0), where $w_T$ is the amount of options, $\Gamma_T$ is the Gamma of the option, and $\Gamma$ is the Gamma of the portfolio: \(w_T\Gamma_T+\Gamma=0\) \(w_T=-\Gamma/\Gamma_T\)
  • But the option may have a non-zero Delta too, so the portfolio will no longer be Delta neutral, and one needs to make it Delta neutral again (a numerical sketch closes this lesson).
    • see example pics
  • Vega: sensitivity of the portfolio with respect to volatility (i.e. the partial derivative of the portfolio value with respect to volatility). To hedge it one needs a product that is a function of volatility => options, but a different one from that used for Gamma (a 2nd option), in order to solve a linear system - see pic example + what is on the desk.
    • But again one needs to make the portfolio Delta neutral again (e.g. by buying the underlying asset).
  • Theta: sensitivity of the portfolio to time
  • Rho: sensitivity of the portfolio to interest rates. Most of the time it is not Rho directly that is used, but a function of it called “duration”
    • Duration: exposure of a portfolio to yield curve movements.
    • Let $y$ be a bond yield and $S$ its market price. Duration $D$ is defined as: \(D= -\frac{1}{S}\frac{\Delta S}{\Delta y}\)
    • If we consider small percentage changes: \(D= -\frac{1}{S}\frac{\partial S}{\partial y}\)
    • $\displaystyle \frac{\partial S}{\partial y}$ - this is Rho
    • Can be seen as a weighted average of the times at which one receives money for one's investment (or pays, if issuing a bond).
    • !!! Duration - how long (on average) one has to wait before receiving one's money
  • Convexity - like Gamma, but for interest rates. 2nd derivative
  • Taylor series expansion of the portfolio $P$, as a function of $S, \sigma, t$, gives the standard delta-gamma-vega-theta approximation (with $\delta$ denoting a small change):
\[\delta P \approx \Delta\, \delta S + \frac{1}{2}\Gamma\, (\delta S)^2 + \mathcal{V}\, \delta\sigma + \Theta\, \delta t\]
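
A toy numerical run of the Gamma-then-Delta hedging steps above (Python; all Greek values are made-up illustrative numbers, not from the lectures):

```python
# Illustrative portfolio Greeks (made-up numbers, for the mechanics only)
portfolio_delta, portfolio_gamma = -450.0, -3000.0
option_delta, option_gamma = 0.62, 1.50      # Greeks of the traded option

# Step 1 - Gamma hedge: solve w_T * Gamma_T + Gamma = 0
w_T = -portfolio_gamma / option_gamma
print(f"trade {w_T:+.0f} options")           # +2000: buy 2000 options

# Step 2 - the options spoil Delta neutrality; restore it with the underlying
# (the underlying has Delta 1 and Gamma 0, so the Gamma hedge is untouched)
new_delta = portfolio_delta + w_T * option_delta
print(f"trade {-new_delta:+.0f} units of the underlying")   # -790: sell 790
```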