KrigHedge: Gaussian Process Surrogates for Delta Hedging

Mar 2, 2021

Authors: Mike Ludkovski (UC Santa Barbara, USA) & Yuri Saporito (FGV Rio de Janeiro, Brazil)


Hedging is about learning the sensitivities of a contingent claim to evolving market factors. For example, Delta hedging manages risk by controlling the sensitivity of the derivative's value to the underlying spot price; Theta hedging does the same for the passage of time, and so on. Successful hedging strategies therefore depend on accurately learning such sensitivities. Unfortunately, the corresponding Greeks are rarely available analytically.

We investigate a novel tie-in between machine learning and hedging. The idea is to develop a non-parametric method that does not require working with any particular stochastic model class — all that is needed is the data source (or a black-box simulator) generating approximate option prices. The training dataset is used to fit a data-driven input-output mapping and evaluate the respective price sensitivity. Specifically, we propose to use Gaussian process (GP) surrogates to capture the functional relationship between derivative contract price and relevant model parameters, and then to analytically differentiate the fitted functional approximator to extract the desired Greek.

To assess the quality of our approximation, we examine the P&L of the computed self-financing Delta hedging strategy on out-of-sample scenarios. Assuming perfect approximation, the resulting hedging error should be zero modulo the error due to time-discretization (i.e. our hedges are performed at discrete time increments rather than continuously). As one benchmark, we compare to the so-called implied Delta, namely the Black-Scholes Delta computed at the current implied volatility. The implied Delta does not take into account the skew of the implied volatility and hence is inferior to our data-driven approach.
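To make this benchmark concrete, below is a minimal sketch of how such a discrete-time hedging P&L can be computed, assuming zero interest rates for simplicity; `delta_fn` is our illustrative stand-in for any Delta estimator (GP-based, implied, or exact) and is not notation from the preprint.

```python
def hedging_pnl(times, S, payoff, delta_fn, P0):
    """Terminal P&L of a self-financing Delta hedge rebalanced at discrete times.

    times:    rebalancing times t_0 < ... < t_N, with t_N the option maturity
    S:        spot prices along one out-of-sample scenario (same length as times)
    payoff:   terminal payoff, e.g. lambda s: max(s - K, 0.0) for a Call
    delta_fn: map (t, S) -> estimated Delta, e.g. a GP-based Greek
    P0:       initial option price, which seeds the hedging portfolio
    """
    shares = delta_fn(times[0], S[0])
    cash = P0 - shares * S[0]                  # sell the option, buy Delta shares
    for k in range(1, len(times) - 1):
        new_shares = delta_fn(times[k], S[k])
        cash -= (new_shares - shares) * S[k]   # self-financing rebalancing
        shares = new_shares
    # liquidate the stock position and settle the option at maturity
    return cash + shares * S[-1] - payoff(S[-1])
```

With a perfect Delta, this error tends to zero as the rebalancing frequency grows; the residual spread across out-of-sample scenarios is exactly the time-discretization effect described above.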

Our specific implementation brings several advantages over competing methods. First, GPs can handle both interpolation and smoothing tasks, i.e. one may treat training inputs as exact or noisy, allowing their application in the following contexts:

  1. speeding up Greek computations when a few exact data samples are available (model calibration);
  2. utilizing approximate Monte-Carlo-based samples that yield noisy estimates of option prices;
  3. fitting to real-life data.

Second, GPs are well-suited to arbitrary training sets and so naturally accommodate historical data that is highly non-uniform in the price dimension (namely, based on a historical trajectory of the underlying). Third, GPs offer uncertainty quantification: rather than providing only a best estimate of the desired Greek, they supply a state-dependent confidence interval around that estimate. This interval is crucial for hedging purposes, since it indicates how strictly one ought to match the target Greek. Fourth, GPs interact well with dynamic training, i.e. settings where the training sets change over time.

Our approach is to first build a statistical model for the price P(t,S) of an option, and then compute (via calculus) derivatives of the above approximation. Thus, we utilize “curve-fitting” and our training dataset is solely price-based, facilitating straightforward use by the quant. The method is based on the remarkable property that the gradient of a GP is given by formally differentiating the linear algebra equations defining the predictive mean. In the preprint, we provide the respective formulas for differentiating commonly used kernels, such as squared-exponential (Gaussian), Matern-5/2 and Matern-3/2.
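As a minimal sketch of this property for the squared-exponential kernel (function names and hyperparameters below are illustrative, not the preprint's notation): the predictive mean is m(x) = k(x, X) (K + noise I)^{-1} y, so its gradient only requires differentiating the kernel vector k(x, X).

```python
import numpy as np

def rbf_kernel(A, B, ell):
    """Squared-exponential kernel with per-dimension lengthscales ell."""
    D = (A[:, None, :] - B[None, :, :]) / ell    # pairwise scaled differences
    return np.exp(-0.5 * np.sum(D**2, axis=-1))

def fit_gp(X, y, ell, noise):
    """Precompute alpha = (K + noise * I)^{-1} y, the weights of the predictive mean."""
    K = rbf_kernel(X, X, ell) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def gp_mean_and_grad(x, X, alpha, ell):
    """Predictive mean m(x) = k(x, X) @ alpha and its gradient in the input x.

    For the squared-exponential kernel, d k(x, x_i)/dx = -(x - x_i)/ell^2 * k(x, x_i),
    so the gradient of the fitted price surface is available in closed form.
    """
    kx = rbf_kernel(x[None, :], X, ell)[0]          # kernel vector, shape (n,)
    dk = -(x[None, :] - X) / ell**2 * kx[:, None]   # its Jacobian, shape (n, d)
    return kx @ alpha, dk.T @ alpha                 # price estimate and its gradient
```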

Fig 1: Learning the Delta of a Call option

Once the (t,S) -> P(t,S) surface is learned, we obtain all the respective sensitivities immediately by computing the appropriate gradients. So for example, the Delta is the gradient with respect to “S” and Theta is the negative of the gradient with respect to “t” — both obtained from the same surrogate, hence at no extra cost. Moreover, the probabilistic structure of the GPs ensures a consistent probabilistic representation of model uncertainty jointly across all the above.
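Continuing the illustrative sketch above, with inputs ordered as (t, S) and a hypothetical training set `X_train`, `y_train` of input locations and observed prices, both Greeks fall out of a single gradient call:

```python
ell = np.array([0.5, 10.0])    # illustrative lengthscales for the (t, S) dimensions
alpha = fit_gp(X_train, y_train, ell, noise=1e-6)

price, grad = gp_mean_and_grad(np.array([0.25, 100.0]), X_train, alpha, ell)
delta = grad[1]     # sensitivity to S
theta = -grad[0]    # negative sensitivity to t, per the convention above
```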

Fig 2: Theta of a Call option (GP-based estimate in orange, 95% posterior credible band in dashed orange, ground truth in dashed black). Training set of 400 inputs from a Latin Hypercube Sampling scheme.

In our analysis we benchmark the results using both a very simple Black-Scholes model, where all Greeks are known exactly and hence explicit ground truth is available, and a local volatility setting, where option prices are only accessible via time-intensive Monte Carlo or PDE solvers. Local volatility models assume that the volatility sigma(t,S) is a nonlinear function of time and the current (local) asset price. Such models can be perfectly calibrated to observed (arbitrage-free) option prices. The volatility sigma(t,S) that calibrates the model to market data is called the local volatility function.
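For reference, this calibration rests on Dupire's formula; in the simplest case of zero rates and dividends, the local volatility is recovered from the call price surface C(T, K), indexed by maturity T and strike K, as

```latex
\sigma_{\mathrm{loc}}^2(T, K) = \frac{\partial C / \partial T}{\tfrac{1}{2} K^2 \, \partial^2 C / \partial K^2}.
```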

We then investigate the role and relative importance of various surrogate ingredients, such as the kernel family, the shape of the experimental design, and the training data size, and we propose several modifications that target the financial application. Moreover, we assess the performance of our Greek approximators both from the statistical perspective and from the trader’s perspective, in terms of the resulting hedging error.

Among our main findings:

Fig 3: Estimated Gamma for a Black-Scholes Call across 3 different GP kernel families. Ground truth is the dashed black line. Space-filling experimental design with noisy observations.
  1. The GP kernel family matters. This is not very surprising, since the kernel drives the differentiability of the fitted response surface and hence dramatically affects the GP-based sensitivities. For example, the Matern-3/2 kernel is only once-differentiable and as a result is unable to estimate the Gamma. We also observe that smoother kernels, like the squared-exponential, are over-confident and yield posterior credible bands that are too narrow. Overall, we recommend Matern-5/2 (its analytic first derivative is sketched after the figures below).
  2. Training set geometry matters. We document that training the GP over paths of the underlying prices (mimicking real-life observed data) is significantly worse than training over a gridded or space-filling experimental design. Consequently, applying machine learning tools to model empirical option sensitivities is much more challenging than employing them to speed up model-based computations.
  3. The role of observation noise (GPs handle both interpolation and regression) is complex. We observe that Greeks are better estimated with many distinct observations (even if noisy) than with a few high-precision data points. This is a useful insight for Monte-Carlo-driven hedging frameworks; see Fig 4.
  4. We assess the performance of our GP surrogates directly in terms of the Delta hedging P&L — which is the ultimate financial metric to assess approximation quality.
Fig 4: Root integrated mean squared error of Delta approximation as a function of experimental design size (x-axis) and number of Monte Carlo simulations for price evaluation (“InnerSims” color)
Results for a local volatility example from the full preprint.
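To illustrate finding 1, here is the Matern-5/2 kernel and its analytic first derivative in one dimension (standard formulas, continuing with numpy as np; `ell` and `sig2` denote the usual lengthscale and variance and are our naming, not the preprint's). A Matern-5/2 GP is twice mean-square differentiable, so the Gamma of the fitted surface remains well-defined, unlike under Matern-3/2.

```python
def matern52(x, xp, ell, sig2=1.0):
    """Matern-5/2 kernel in 1D: k = sig2 * (1 + u + u^2/3) * exp(-u), u = sqrt(5)|x - x'|/ell."""
    u = np.sqrt(5.0) * np.abs(x - xp) / ell
    return sig2 * (1.0 + u + u**2 / 3.0) * np.exp(-u)

def matern52_dx(x, xp, ell, sig2=1.0):
    """Analytic d k(x, x')/dx, the building block for GP-based Deltas under this kernel."""
    u = np.sqrt(5.0) * np.abs(x - xp) / ell
    return -sig2 * (5.0 / (3.0 * ell**2)) * (x - xp) * (1.0 + u) * np.exp(-u)
```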

The full analysis is available in the preprint on arXiv.
