Notes on Taleb
Others
- “Truth is in the extremes. There is no noise in the fat tails.”
- “Every Ponzi is sold as a non-zero sum store of value.”
Statistical consequences of fat tails - reading club
- Empirical survival function = 1 - CDF
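A minimal sketch of the empirical survival function, assuming "empirical CDF" means the fraction of observations less than or equal to $x$. The function name `empirical_survival` and the toy data are illustrative, not from the reading club.

```python
from bisect import bisect_right

def empirical_survival(sample, x):
    """Fraction of observations strictly greater than x, i.e. 1 - ECDF(x)."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts the observations <= x, which is n * ECDF(x)
    return (n - bisect_right(ordered, x)) / n

# Toy data (hypothetical): one large observation dominates the tail
data = [1.0, 2.0, 2.0, 3.0, 10.0]
for level in (0.0, 2.0, 5.0, 10.0):
    print(level, empirical_survival(data, level))
```

Plotting this function on log-log axes is the usual way to eyeball a tail: a power law shows up as a roughly straight line.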
on chapter 4:
- Tail may start at:
- the transition from the convex to the concave part of the distribution (Jensen inequality)
- between 2 and 3 STD
- Chebyshev’s inequality
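For reference, the standard statement of Chebyshev's inequality (added here, not taken from the session) bounds the mass beyond $k$ standard deviations for any distribution with finite variance:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$

At $k=2$ the bound is 25%, and at $k=3$ it is about 11.1%, which gives a distribution-free sense of how much mass can sit beyond the "2 to 3 STD" region.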
on chapters 6-7:
- (Taleb) Entropy - a problem with continuous vs discrete entropy. Mutual information is made for continuous distributions. $-\log(f(x,y)/[f(x)f(y)])$
- solve it by making a kernel distribution out of a histogram and using it inside the $F(x,y)$ function.
- So instead of using a summation of discrete functions (summing the logs), one needs to smooth the points. So use a smooth kernel. It doesn’t matter how good the kernel is as long as it gives you a smooth function. Then integrate the logs (the $-\log$ part)
- see calculation of mutual information with kernel function and copula entropy.
- For Pareto, as it is an unstable distribution, any $\alpha$ higher than 2 will be 2.
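For reference on the mutual information notes above, the textbook definition for continuous variables is the expectation of the log density ratio under the joint density (written with the sign convention that makes $I(X;Y) \geq 0$):

$$I(X;Y) = \iint f(x,y)\,\log\frac{f(x,y)}{f(x)\,f(y)}\,dx\,dy$$

With kernel estimates $\hat f$ in place of the true densities, a common plug-in approximation (an assumption here, not something stated in the session) replaces the integral with the sample average $\frac{1}{n}\sum_{i=1}^{n}\log\frac{\hat f(x_i,y_i)}{\hat f(x_i)\,\hat f(y_i)}$.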
on chapters 8-9:
- Jensen inequality
- Mean Deviation (average of the sqrt) over Standard Deviation (sqrt of the average)
- $STD \geq MAD$, as sqrt is concave
- KAPPA
- compare tail fatness of different distributions.
- the speed of tracking a sample size
- (Taleb) Prefer MLE (logs) over the Method of Moments (sample average).
- Instead of getting the mean directly, derive the $\alpha$ (thru MLE calibration - Hill estimator), and then get the mean (the Pareto mean).
- $\alpha$ extrapolates over things you have not seen. The mean does not extrapolate.
- Mutual information over correlation
- Kolmogorov information
- (me) + see wikipage: variation of information or shared information distance - Unlike mutual information, it obeys the triangle inequality (i.e. it is a true metric)
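The "derive $\alpha$ by MLE, then get the mean" step above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated from a pure Pareto with scale 1, the choice of `k` (number of tail observations) is arbitrary, and the names `hill_alpha` and `pareto_mean` are made up for this example.

```python
import math
import random

def hill_alpha(sample, k):
    """Hill estimate of the tail exponent from the k largest observations."""
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value acts as the tail cutoff
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    return k / log_excess

def pareto_mean(alpha, x_min):
    """Mean of a Pareto distribution with scale x_min; infinite for alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_min / (alpha - 1)

random.seed(0)
true_alpha = 1.5  # assumed tail exponent for the simulation
sample = [random.paretovariate(true_alpha) for _ in range(50_000)]  # scale x_min = 1

alpha_hat = hill_alpha(sample, k=5_000)
plug_in_mean = pareto_mean(alpha_hat, x_min=1.0)  # mean derived from alpha
sample_mean = sum(sample) / len(sample)           # direct method-of-moments mean

print("alpha_hat:", alpha_hat)
print("plug-in mean:", plug_in_mean, "sample mean:", sample_mean)
```

The true mean in this setup is $\alpha/(\alpha-1) = 3$. Rerunning with different seeds shows how differently the direct sample mean and the plug-in mean behave from sample to sample.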
on chapter ???
- If the variance exists, but the kurtosis is infinite, one cannot find the variance.
- $\alpha$ is not a hard threshold.
- stable distributions can be combined with linear combination
- mean for Pareto -> use the same logic to find it as for Gini!?
- power law extrapolates => one can take the mean
on chapters 24-25:
- !!! Mistaking tails for volatility. Fat tails are not the same as high variance. High variance reduces kurtosis.
- variance: “average squared deviation”
- kurtosis: average of “squared deviation squared” / square of “average squared deviation”
- 4th moment is like non-adjusted kurtosis, can be seen as variance of the variance.
- kurtosis could be seen as a Jensen inequality ratio. Kurtosis deviates from 1 of the 3 inequalities.
- …extract the 2nd moment in order to achieve neutrality to it. Such a neutrality requires some kind of “short volatility” in the body because higher kurtosis means lower action in the center of the distribution.
- recall dynamic hedging book: p202 - moments explained, p264 4th moment trade example
- greeks $\leftrightarrow$ payoff function $\leftrightarrow$ polynomials $\leftrightarrow$ moments
- see excel file
- skewness:
- 3rd moment is like the variance of the 1st moment, how the 1st moment is distributed.
- betting on 3rd moment is betting on the symmetry. 5th moment is the symmetry of the 3rd moment.
- another way to view it is as a correlation between 1st and 2nd moment
- increased volatility when the 1st moment has increased => positive skewness
- increased volatility when the market goes down => negative skewness
- the numeraire is not necessarily something that is risk free, but something that maps to your consumption (basket of your needs).
- If there is inflation, but I only buy wine and cheese, and these are now cheaper => no inflation for me.
- Taleb’s numeraire: a synthetic combination of a little bit of gold, dollars and euros
- max entropy to evaluate correlation risk measure ??
- maximum likelihood is close to max entropy, but they are conceptually different.
- In finance we use a Gaussian because it is the maximum entropy distribution under the constraints of $\mu$ and $\sigma$. If you set the 1st and 2nd moments and maximise the entropy (maximum ignorance, disorder, uncertainty), then you end up with a Gaussian. But what if you maximise the entropy under tail risk constraints (i.e. no tail risk): what is the distribution then?
- maximise the entropy under no constraints => Cauchy
- and if there is a log constraint => Student T (power law). So you have a power law without the tail, because you have a hard hedge on the tail (not distribution hedge). And this is the idea of max entropy under tail risk constraint.
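The verbal definitions of variance and kurtosis above can be written out directly. This is a small sketch with made-up samples; `central_moment` and `kurtosis` are illustrative names, and the kurtosis here is the non-excess version (a Gaussian would give 3).

```python
def central_moment(sample, order):
    """Average of the deviations from the mean raised to the given power."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** order for x in sample) / len(sample)

def kurtosis(sample):
    """Average of "squared deviation squared" over square of "average squared deviation"."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2

# Hypothetical samples:
two_point = [-1.0, 1.0]                          # all action at +/- 1 SD, no tail
quiet_with_jumps = [-3.0] + [0.0] * 8 + [3.0]    # quiet center, rare large moves
scaled = [10.0 * x for x in quiet_with_jumps]    # same shape, 100x the variance

print(kurtosis(two_point))
print(kurtosis(quiet_with_jumps))
print(central_moment(scaled, 2) / central_moment(quiet_with_jumps, 2), kurtosis(scaled))
```

Because $x^2$ is convex, Jensen gives $E[(d^2)^2] \geq (E[d^2])^2$, so this ratio is never below 1. The scaled sample shows that multiplying the variance by 100 leaves the kurtosis unchanged: fatness of tails is about where the deviations are concentrated, not about their overall size.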
Statistical consequences of fat tails - book
Chapter …
- The natural boundary between Mediocristan and Extremistan occurs at the subexponential class.
2
- Shadow moments (a.k.a. plug-in estimation): instead of direct measurement, use MLE parameters, say the tail exponent, and then derive the shadow moments. (ch 13, 15)
- The tail exponent $\alpha$ captures, by extrapolation, the low-probability deviation not seen in the data, but that plays a disproportionately large share in determining the mean. This generalized approach to estimators is also applied to Gini and other inequality estimators. So we can produce more reliable (or at least less unreliable) estimators for, say, a function of the tail exponent in some situations. But, of course, not all.
- Unusable:
- Metrics such as standard deviation and variance are not useable. They fail out of sample, even when they exist, even when all moments exist.
- Beta, Sharpe Ratio and other common hackneyed financial metrics are uninformative.
- The method of moments (MoM) fails to work. Higher moments are uninformative or do not exist.
- In the absence of reliable information, Bayesian methods can be of little help. The key is that one needs a reliable prior, something not readily observable
- Invisibility:
- We do not observe probability distributions, just realizations.
- A probability distribution cannot tell you if the realization belongs to it.
- You need a meta-probability distribution to discuss tail events (i.e., the conditional probability for the variable to belong to a certain distribution vs. others).
- Multiplicative risk scale example: At the beginning of the COVID-19 pandemic, many epidemiologists innocent of probability compared the risk of death from it to that of drowning in a swimming pool. For a single individual, this might have been true (although COVID-19 turned out rapidly to be the main source of fatality in many parts, and later even caused 80% of the fatalities in New York City). But conditional on having 1000 deaths, the odds of the cause being drowning in swimming pools are slim.
This is because your neighbor having COVID increases the chances that you get it, whereas your neighbor drowning in her or his swimming pool does not increase your probability of drowning (if anything, like plane crashes, it decreases other people’s chance of drowning).
This aggregation problem is discussed in more technical terms with ellipticality, see Section 6.8 – joint distributions are no longer elliptical, causing the sum to be fat-tailed even when individual variables are thin-tailed.
Technical incerto
1. Real world statistical consequences of fat tails
- Fat-tails do not deliver a lot of deviations from the mean, but big deviations once in a while.
- fat tails mean higher peaks for the distribution: the fatter the tail, the more time the market spends between the mean and one SD
- => less variations that are not tail events
- => more quiet time, not less
- One needs meta-probability distributions to discuss tail events.
- If we know nothing about the 4th moment (Kurtosis is a function of it), we know nothing about the stability of the 2nd moment.
- figure out the distribution and then calculate the mean (or other moments) from it.
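A standard result makes the link between the 4th moment and the stability of the 2nd explicit (added for reference, assuming i.i.d. observations with finite fourth central moment $\mu_4$; $s^2$ is the unbiased sample variance from $n$ observations):

$$\mathrm{Var}(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right)$$

The sampling error of the variance is driven by $\mu_4$, so when the fourth moment is huge or infinite the sample variance does not settle down, however large $n$ is.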
2. Convexity risk and fragility
- $E_n(X_t) \geq E_\tau(X_i)$ Ensemble probabilities and Time probabilities.
- Law of Large Numbers (LLN) obeys arithmetic average for Ensemble probabilities
- obeys geometric average for time probabilities => follow different LLN
- What matters for the Climate is not the average, but the distribution of the extreme events.
- $\mathrm{PDF}(x) \neq \mathrm{PDF}(f(x))$
- Do not mistake the probabilities of exposure to a variable $X$ for the variable itself.
- It is easier to modify the exposure to get tractable properties than to try understanding $x$.
- Confusion between truth space and consequence space.
- Test example: change airport traffic +/- 10% and check travelling time.
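The ensemble vs time distinction noted at the start of this section can be illustrated with a hypothetical repeated gamble: each period wealth is multiplied by 1.5 or 0.6 with equal probability. The numbers and function names are assumptions chosen for the example, not from the book.

```python
import math

def ensemble_growth(factors, probs):
    """Arithmetic (ensemble) expectation of the one-period growth factor."""
    return sum(p * g for g, p in zip(factors, probs))

def time_growth(factors, probs):
    """Geometric (time-average) growth factor per period along a single path."""
    return math.exp(sum(p * math.log(g) for g, p in zip(factors, probs)))

# Hypothetical gamble: +50% or -40% with equal probability
factors = [1.5, 0.6]
probs = [0.5, 0.5]

ensemble = ensemble_growth(factors, probs)
time_avg = time_growth(factors, probs)
print("ensemble average growth:", ensemble)
print("time average growth:", time_avg)
```

The average over many parallel players grows (factor above 1), while a single player repeating the bet through time shrinks (factor below 1). The two averages obey different laws of large numbers, which is the point of the inequality above.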
notes
- payoff function IS NOT exposure
- measures of VEGA with L1 norm
- !!! fragility def : (1)sensitivity of a given (2)risk measure to (3)an error in the (4)estimation of the (5)deviation parameter of a (6)distribution. Because the (2)risk measure involves parts of (6) the distribution that are away from the portion used for (4)estimation.
- anti-fragility (sensitive to volatility)
- similar to VEGA of an option or non-linear payoff
- not exactly the opposite of fragile. Implies VEGA above a K threshold in the positive tail of the distribution (right one), and an absence of fragility on the left tail.
- Utility function is not only convex or concave, but both (as in Prospect Theory)
- Jensen inequality expects monotonic transformations, but not both at the same time.
- Being long on the option side of the tail eliminates the need to try to figure out what we do not know.
- If you are short a call struck at $K$ and long another call at $K+y$ (a call spread) => you are “short volatility”, but you are not exposed to infinite variance.
- A portfolio constructed of infinite-variance securities does not have to have infinite variance.
- example for convexity and Jensen inequality
$x \in \{1, 3, 5\}$ | $f[E(x)]$ | relation | $E[f(x)]$ | type |
---|---|---|---|---|
$f(x)=x$ | $(1+3+5)/3 = 3$ | $=$ | $(1+3+5)/3 = 3$ | linear |
$f(x)=x^2$ | $(9/3)^2 = 9$ | $<$ | $(1^2+3^2+5^2)/3 = 35/3 \approx 11.67$ | convex |
$f(x)=\sqrt x$ | $\sqrt{9/3} \approx 1.73$ | $>$ | $(\sqrt1+\sqrt3+\sqrt5)/3 \approx 1.66$ | concave |
- $E[x] = \displaystyle\sum_{i=1}^\infty x_i p_i$ // discrete
- $E[x] = \displaystyle\int_{-\infty}^\infty x f(x) \, dx$ // continuous
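The Jensen comparison can be checked numerically. A minimal sketch, assuming the three values 1, 3, 5 each carry probability 1/3 (so the expectation is the plain average); `jensen_pair` is an illustrative helper name.

```python
import math

values = [1.0, 3.0, 5.0]  # equally weighted outcomes (assumption)

def expectation(xs):
    """Discrete expectation with equal probabilities."""
    return sum(xs) / len(xs)

def jensen_pair(f, xs):
    """Return (f(E[x]), E[f(x)]) so the two sides of Jensen can be compared."""
    return f(expectation(xs)), expectation([f(x) for x in xs])

linear = jensen_pair(lambda x: x, values)
convex = jensen_pair(lambda x: x ** 2, values)
concave = jensen_pair(math.sqrt, values)

for name, (f_of_mean, mean_of_f) in [("linear", linear), ("convex", convex), ("concave", concave)]:
    print(name, f_of_mean, mean_of_f)
```

For the linear function the two sides coincide, for the convex one $f[E(x)] < E[f(x)]$, and for the concave one the inequality flips.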
here and there
- I show elsewhere that if you do not know what a typical event is, fractal power laws are the most effective way to discuss the extremes mathematically.
- It does not mean that the real world generator is a power law - it means you do not understand the structure of the event
- it simplifies the math: only one parameter, alpha, which increases or decreases the role of rare events in total probabilities (instead of doing simulations)
- Parametrizing a power law lends itself to monstrous estimation error (the horrible inverse problem), as small changes in the alpha lead to monstrously large effects in the tails.
- alpha is not observable
- 4th quadrant meaningless measurements: SD, Linear Regression, ANOVA, least squares, Markowitz portfolio optimization (relies on estimates of covariance)
- !!! convexity (long-gamma property) is easier to attain than knowledge
- reduce the cost per attempt, compensate by multiplying the number of trials, allocate 1/N of potential investment, make N as large as possible
- => minimize P of missing, rather than maximize profits on a win.
- stay flexible with frequent ins/outs; be very short term in order to capture the long term. 5 sequential one-year options are better than a single 5-year option.
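The sensitivity of tail probabilities to alpha, noted above, is easy to see with a pure Pareto survival function $P(X > x) = (x/x_{min})^{-\alpha}$. A small sketch; the threshold of 1000 and the alpha grid are arbitrary choices for illustration.

```python
def pareto_survival(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto distribution with tail exponent alpha and scale x_min."""
    if x <= x_min:
        return 1.0
    return (x / x_min) ** (-alpha)

threshold = 1000.0  # hypothetical "extreme event" level, in units of x_min
for alpha in (2.0, 1.75, 1.5, 1.25):
    print(alpha, pareto_survival(threshold, alpha))

# Ratio of tail probabilities when alpha moves from 2.0 to 1.5
ratio = pareto_survival(threshold, 1.5) / pareto_survival(threshold, 2.0)
print("ratio:", ratio)
```

Moving alpha from 2 to 1.5 multiplies the probability of exceeding the threshold by $\sqrt{1000} \approx 31.6$, so a modest estimation error in alpha turns into an order-of-magnitude error in the tail.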
A conversation between Nassim Nicholas Taleb and Stephen Wolfram at the Wolfram Summer School 2021
- 2nd level of stochasticity is variance of the variance.
- Instead of taking an option on the average variance, I take the average option across the two-state variance. It is out-of-the-money. $F[E[\sigma]] \ \text{vs} \ E[F[\sigma]]$, where $F$ is the option function.
- 25 degrees, 25 degrees (Celsius) or 0 and 50 degrees.
- In finance I see the world as arbitrage. In economics I see the world as law of one price. I traded by looking at similar things with different valuations => replicate an item into another (ME - if it walks like a duck it is a duck) + dynamic replication // an option into stock, into another option. Comparison is done by comparing the distributions.
- When you are under fat tails, these do not work dynamically.
- I know of no model other than discounted cash flow to price anything.
- on numeraire: I take a basket of goods and services, constantly revised, and see what numeraire will make this basket non-volatile.
- I use gold. So on days when gold is down I return the merchandise in silver ????
- When you do evidence-based science you are making claims about the aggregate, not about individuals. And a lot of people don’t get that claims about the aggregate hold for the aggregate, not for individuals.
- The p-value is a random variable.
- Gini is Bad (one issue is that it is static).
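The $F[E[\sigma]]$ vs $E[F[\sigma]]$ comparison from the conversation can be sketched with a concrete option function. The Black-Scholes call formula with zero rates is my choice for $F$ here, not something specified in the talk, and the spot, strike and two volatility states are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def call_price(spot, strike, vol, maturity=1.0):
    """Black-Scholes call price with zero rates and dividends (an assumption)."""
    d1 = (math.log(spot / strike) + 0.5 * vol ** 2 * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

spot, strike = 100.0, 130.0   # out-of-the-money call (hypothetical numbers)
vols = [0.10, 0.30]           # two equally likely volatility states
mean_vol = sum(vols) / len(vols)

price_at_mean_vol = call_price(spot, strike, mean_vol)                      # F[E[sigma]]
mean_of_prices = sum(call_price(spot, strike, v) for v in vols) / len(vols)  # E[F[sigma]]

print("F[E[sigma]]:", price_at_mean_vol)
print("E[F[sigma]]:", mean_of_prices)
```

For this out-of-the-money strike the price is convex in volatility over the range used, so averaging the option across the two states gives more than the option on the average state. It is the same effect as 0 and 50 degrees versus 25 and 25.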