Bayesian eikonal tomography using Gaussian processes

Eikonal tomography has become a popular methodology for deriving phase velocity maps from surface wave phase delay measurements. Its high efficiency makes it popular for handling datasets from large-N arrays, particularly in the ambient-noise tomography setting. However, the results of eikonal tomography are crucially dependent on the way in which phase delay measurements are interpolated, a point which has not been thoroughly investigated. In this work, I provide a rigorous formulation for eikonal tomography using Gaussian processes (GPs) to interpolate phase delay measurements, including uncertainties. GPs allow the posterior phase delay gradient to be analytically derived. From the phase delay gradient, an excellent approximate solution for phase velocities can be obtained using the saddlepoint method. The result is a fully Bayesian posterior for phase velocities of surface waves, incorporating the nonlinear wavefront bending inherent in eikonal tomography, with no sampling required. The results of this analysis imply that the uncertainties reported for eikonal tomography are often underestimated.
Non-technical summary
Eikonal tomography is an imaging method that uses slight variations between seismic waves trapped at the surface of the Earth to infer information about the properties beneath the surface. To perform the best possible eikonal tomography, we need to be able to predict in between measurements of these variations at different seismic recording stations as best we can. Furthermore, end-users of seismic tomography require information about the uncertainty of the images. In this paper, I perform this prediction using Gaussian processes (GPs), a method with particularly nice mathematical properties. The GP prediction results in robust uncertainty measurements for our imaging problem without many of the computational difficulties associated with other uncertainty quantification methods.

Introduction
Surface wave tomography is a cornerstone imaging technique for the investigation of the crust and upper mantle. However, due to the significant non-planarity of scattered surface waves, interpretation of surface wave data is not straightforward (e.g., Wielandt). Despite this issue, the increasing proliferation of dense seismic arrays, combined with the advent of ambient-noise correlation methods, has motivated intense study into surface wave tomographic techniques.
To ameliorate the great cost of nonlinear ray tracing for large inverse problems, a large part of this study has focused on methods that derive surface wave properties from only local information contained in the wavefield. Beginning with a wavefield perturbation approach (e.g., Friederich et al.; Friederich and Wielandt; Pollitz), theoretical efforts in local surface wave inversion have since concentrated on direct measurement of wavefield derivatives (e.g., Lin et al.; Lin and Ritzwoller; de Ridder and Biondi; de Ridder and Maddison). Likely owing to its simplicity, the most popular extant method is eikonal tomography (Lin et al.), which relies on the determination of the wavefield phase gradient across an entire local or regional array. For a single surface wave mode propagating with phase velocity C_p, frequency ω, phase delay τ and amplitude A, the Helmholtz equation implies that (Tromp and Dahlen)

1/C_p(x)^2 = |∇τ(x)|^2 − ∇^2 A(x) / (A(x) ω^2).    (1)

Simplifying this relationship under the assumption that the frequency of the wave is large compared to perturbations in the wave amplitude gives us the eikonal equation:

1/C_p(x)^2 ≈ |∇τ(x)|^2.    (2)

Eikonal tomography uses Equation 2 to directly infer local phase velocity from local phase gradient. A distinction compared to local gradiometry is that calculation of the phase gradient is performed simultaneously for all desired locations by fitting a delay curve across an array, rather than by local analysis of sub-arrays (e.g., Langston). The assumption that the wavefront is smooth relative to frequency is strong, but the difficulty associated with measuring wavefront curvature accurately has ensured that eikonal tomography remains a central technique in array analysis. Application of eikonal tomography in practice has typically resulted in images comparable to other tomographic methods and Helmholtz tomography (which uses Equation 1 directly), especially when results are averaged azimuthally (Bodin and Maupin; Lin et al.; Lehujeur and Chevrot).
In this work, I employ Gaussian process theory (Rasmussen and Williams) to derive semi-analytic closed-form approximations for the posterior distribution of eikonal-equation-based phase velocity measurements using the saddlepoint method (Butler). In this case, semi-analytic means that the posterior approximations have a single parameter that must be solved for using constrained minimization techniques; no Monte Carlo methods need be used. As a result, the approximate posterior can be calculated very quickly. As an intermediate result, I derive fully analytic posteriors for the gradient of phase delay. The delay gradient posteriors can be sampled using standard multivariate normal random number generators, which provides an efficient way to compute arbitrary statistics of the GP posterior when the semi-analytic approximations are difficult to obtain.

Eikonal tomography from derivatives of Gaussian processes
The least well-defined problem in eikonal tomography is how to go from point measurements of phase delay to the phase delay gradient map (Lin et al.). It is in this process that the practitioner has the greatest control over the resulting phase velocity map: intuitively, we can immediately see that over-smoothing the map will result in a measurement of C_p that is too large, while maps that are too rough will result in a C_p that is too small. Past studies have typically employed splines (either in tension (e.g., Lin et al.; Lin and Ritzwoller) or smoothing splines (Chevrot and Lehujeur)) to perform prediction. The spline framework is a robust general interpolation or smoothing method; however, in its basic formulation it gives a single maximum-likelihood estimate of the prediction, with no associated uncertainty information.
This study aims to place the problem of estimating an optimal phase gradient map on a robust Bayesian footing, where all assumptions are explicit, adjustable, and optimizable in the face of the data. Here, the problem of predicting phase delay measurements is posed as a Gaussian process (GP) regression; we will see that this framework meets the desiderata for estimating phase gradients. GPs are a particular framework for defining distributions over function spaces (Rasmussen and Williams). GPs have the property that any finite collection of points sampled from them will have a multivariate Gaussian distribution. A GP is defined by a mean function f(x) and covariance function k(x, x′), which generate the mean and covariance matrix of a finite collection of points drawn from the GP. In the context of regression, this leads to a powerful result: if we assume a GP prior for an unknown function, and we then observe data with a Gaussian likelihood, the posterior distribution for the unknown function will also be a GP. Thus, GPs fully generalize finite linear regression and Gaussian inverse problems to the function space setting (Valentine and Sambridge). As differentiation is a linear operation, derivatives of GPs are again also GPs. We will use these properties to derive closed-form posterior distributions for the derivatives of observed data under a GP prior. While the motivating example is eikonal tomography, these techniques are applicable to regression problems generally. Derivatives of GPs have long been used in the dynamical control community (e.g., Solak et al.; Rasmussen). Closer in spirit to seismology, GP derivatives have also been applied to the identification of geodetic transients (Hines and Hetland).
The presentation described here is generalized from McHutchon.
In general, bold font refers to one-dimensional collections of data and capitals to matrices. Bold capitals are therefore collections of n data in d coordinates and will have dimensions n × d. Coordinates (i.e., x) may be vector quantities but will not be bold. To begin, assume that there are measurements (X, y) of the observed phase delay y at points X. Assume that the data y are noisy; for the purposes of exposition this noise is taken to be independent and identically distributed Gaussian noise η with distribution N(0, σ^2), but arbitrary multivariate Gaussian noise distributions are also easily handled by GP theory.
This implies that there is an unknown true phase delay field τ(x) with

y = τ(X) + η.    (3)

The objective of eikonal tomography is to know the field τ(x) so that we can differentiate it and obtain C_p. I assume that

τ(x) = f(x) + τ_0(x),    (4)

where f is a zero-mean GP and τ_0(x) is a reference phase delay field, for example for a laterally homogeneous medium.
The prior on f is

f(x) ∼ GP(0, k(x, x′)),    (5)

where k(x, x′) is the assumed covariance function. For the examples in this work, I will use a squared-exponential kernel with independent length scales in each dimension for the covariance function:

k(x, x′) = ρ^2 exp( −Σ_{i=1}^{d} (x_i − x′_i)^2 / (2 l_i^2) ).    (6)

This covariance function promotes very smooth fields (it is infinitely differentiable), and provides a degree of flexibility due to the independent length scales. I also assume that τ_0(x) = s_0 |x| for a fixed reference slowness s_0. Let K_XX∗ be the matrix of evaluating k with rows given by X and columns by X∗. The fundamental idea of GP regression is that, given this problem setup, the observed data y and the predicted data τ(X∗) have the joint multivariate Gaussian distribution

[y; τ(X∗)] ∼ N( [τ_0(X); τ_0(X∗)], [K_XX + σ^2 I, K_XX∗; K_X∗X, K_X∗X∗] ).    (7)

By conditioning τ(X∗) on the observed data y we have (Rasmussen and Williams)

τ(X∗) | y ∼ N( τ_0(X∗) + K_X∗X (K_XX + σ^2 I)^{−1} (y − τ_0(X)),  K_X∗X∗ − K_X∗X (K_XX + σ^2 I)^{−1} K_XX∗ ).    (8)

Note that data error models with Gaussian covariance just require replacing σ^2 I with C_D. Figure 1 shows an example application of GP regression for obtaining τ(x)|y, with comparison to the approach based on regression using splines (e.g., Lin et al.; Lin and Ritzwoller); in this case, smoothing splines are used (e.g., Chevrot and Lehujeur).
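Before turning to the example, the conditioning step above can be sketched in a few lines. This is a minimal numpy illustration, not the notebook implementation used in the paper; the hyperparameter values, function names, and the toy reference field are assumptions for demonstration only.

```python
import numpy as np

def sq_exp_kernel(X1, X2, rho=1.0, ls=(1.0, 1.0)):
    # Squared-exponential kernel with an independent length scale per dimension.
    d = (X1[:, None, :] - X2[None, :, :]) / np.asarray(ls)
    return rho**2 * np.exp(-0.5 * np.sum(d**2, axis=-1))

def gp_posterior(X, y, Xs, tau0, sigma=0.1, **kern):
    # Condition the prior tau = f + tau0 on noisy delays y observed at X,
    # returning the posterior mean and covariance of tau at test points Xs.
    Ktt = sq_exp_kernel(X, X, **kern) + sigma**2 * np.eye(len(X))
    Kst = sq_exp_kernel(Xs, X, **kern)
    Kss = sq_exp_kernel(Xs, Xs, **kern)
    alpha = np.linalg.solve(Ktt, y - tau0(X))
    mean = tau0(Xs) + Kst @ alpha
    cov = Kss - Kst @ np.linalg.solve(Ktt, Kst.T)
    return mean, cov

# Toy reference delay tau0(x) = s0 * |x| for a laterally homogeneous medium.
s0 = 0.5
tau0 = lambda Z: s0 * np.linalg.norm(Z, axis=-1)
```

Far from the data the posterior reverts to the reference mean and prior variance; near the data the variance collapses, which is exactly the edge-of-array behaviour visible in Figures 1 and 2.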
This example emulates a typical local surface wave application, using 100 data points uniformly distributed within the inversion region with 0.2 s added Gaussian noise. The synthetic phase delay field is strongly perturbed away from the reference model to highlight differences between the two methods. The GP mean and standard deviation are given analytically, and show substantial differences with the smoothing spline fit; here, the spline smoothing parameter is automatically set by the FITPACK routine (Dierckx). In comparison to the GP, the spline performs less well, especially in areas of data gaps.

Figure 2 Cross-sections through the GP reconstruction, showing the true phase delay (black), GP mean (orange) and standard deviation (grey). The GP reconstruction is overlaid with the noisy observed delay values. The GP posterior closely follows the true phase delay curve, with substantially higher uncertainty near the edges of the domain, even before extrapolation.

I can now calculate expectation values for the derivatives; note that from now on I implicitly condition on y but will not write it out for ease of notation, unless it seems particularly germane to do so. Since differentiation is a linear
operation, and linear operations acting on normal distributions result in normal distributions, the components of ∇τ must also be normally distributed, and are completely specified by their mean and covariance. The collection of means for component i is immediately given by recognizing that, as the expectation operator is also linear, it commutes with the derivative operator:

E[∂τ/∂x_i (X∗)] = ∂τ_0/∂x_i (X∗) + (∂K_X∗X/∂x_i) (K_XX + σ^2 I)^{−1} (y − τ_0(X)).    (9)

Note that the mean values of the derivatives are calculated independently for each dimension; however, as we will see, they do have covariance between output points and between dimensions. For the covariance, consider n × n blocks of the covariance matrix of size nd × nd, where d is the dimension and n is the number of output points. Note that I choose to order the hierarchy of the covariance matrix first by derivative coordinate, and second by data point index, as it makes the notation more convenient. As the covariance is bilinear,

cov(∂τ/∂x_i (X∗), ∂τ/∂x′_j (x′)) = ∂^2 K_X∗x′ / ∂x_i ∂x′_j − (∂K_X∗X/∂x_i) (K_XX + σ^2 I)^{−1} (∂K_Xx′/∂x′_j),    (10)

where I introduce the dummy variable x′ to represent the second argument in the covariance (x′ = X∗, but we want to formally differentiate with respect to the second slot only when using x′). So that I can compress the notation somewhat, let us define K̃_XX = K_XX + σ^2 I and Δy = y − τ_0(X). For the 2-D case investigated here (noting that higher dimensions immediately generalize), the conditional posterior is a multivariate Gaussian with mean given by Equation 9 and covariance given by Equation 10, which is an exact distribution for the derivatives evaluated at X∗. Samples from this posterior for the squared slowness can then be compared against the predictions from the smoothing spline; the GP posterior is in this case more accurate than the spline result, and also delivers uncertainty information.
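The key computational ingredient above is that the posterior mean of the derivative uses the derivative of the kernel in place of the kernel itself. A minimal 1-D numpy sketch (zero reference field, illustrative hyperparameters and data, all assumed for demonstration) makes this concrete and can be checked against a finite difference of the posterior mean:

```python
import numpy as np

def k(x1, x2, rho=1.0, l=1.0):
    # 1-D squared-exponential kernel, evaluated pairwise.
    return rho**2 * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / l**2)

def dk_dx1(x1, x2, rho=1.0, l=1.0):
    # Analytic derivative of the kernel with respect to its first argument.
    return -(x1[:, None] - x2[None, :]) / l**2 * k(x1, x2, rho, l)

# Noisy observations of a smooth 1-D "delay" curve (zero reference field).
X = np.linspace(0.0, 2 * np.pi, 25)
y = np.sin(X)
sigma = 1e-3
alpha = np.linalg.solve(k(X, X) + sigma**2 * np.eye(len(X)), y)

xs = np.array([1.0, 2.0, 3.0])
grad_mean = dk_dx1(xs, X) @ alpha   # posterior mean of d(tau)/dx at xs
```

Because the posterior mean is a fixed linear combination of kernel functions, differentiating it analytically and differentiating it numerically must agree to finite-difference accuracy, which is a useful implementation check.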
Unfortunately, it turns out that this is as far as it is possible to go with exact distributions, as the velocity is a nonlinear function of the gradients in eikonal tomography. Thankfully, however, there is well-developed theory for approximating quadratic forms of normal random variables, and as 1/C_p^2 = ||∇τ||^2 is a quadratic form of a normal random variable, it may be possible to obtain a good approximation to the velocity. Before deriving one, however, there are two important issues to investigate: setting hyperparameters, and closed forms for the expectation value of velocity.

Finding good values for GP hyperparameters
The hyperparameters of the GP may be optimized by maximizing the log marginal likelihood of the observations, where the marginalization is performed over the unknown function values τ(X) (Rasmussen and Williams). This gives the type-II maximum likelihood estimate: the hyperparameters have a point estimate, whereas the function values have a full posterior distribution given that point estimate. The log marginal likelihood for GP regression is given by

log p(y | X, θ) = −(1/2) Δy^T K̃_XX(θ)^{−1} Δy − (1/2) log |K̃_XX(θ)| − (n/2) log 2π,    (11)

where the covariance matrix K̃_XX(θ) is treated as a function of the hyperparameters θ, and n is the number of data.
Intuitively, the log marginal likelihood parsimoniously balances data misfit (the first term) with the level of uncertainty (the second term). For a 2-D squared-exponential kernel with independent length scales, independent Gaussian data noise, and a laterally homogeneous medium as a reference model, the hyperparameters are θ = (ρ, l_1, l_2, σ, s_0).
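The marginal likelihood above is cheap to evaluate via a Cholesky factorization. The sketch below is an illustrative numpy implementation of the five-hyperparameter case (function name and test data are assumptions, not the paper's code); the Cholesky identities used are Δy^T K^{−1} Δy = ||L^{−1}Δy||^2 and log|K| = 2 Σ log L_ii.

```python
import numpy as np

def log_marginal_likelihood(theta, X, y):
    # theta = (rho, l1, l2, sigma, s0) as in the text.
    rho, l1, l2, sigma, s0 = theta
    d = (X[:, None, :] - X[None, :, :]) / np.array([l1, l2])
    K = rho**2 * np.exp(-0.5 * np.sum(d**2, axis=-1)) + sigma**2 * np.eye(len(X))
    dy = y - s0 * np.linalg.norm(X, axis=-1)   # residual against tau0 = s0 |x|
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L, dy)
    # -1/2 dy' K^-1 dy - 1/2 log|K| - n/2 log(2 pi)
    return -0.5 * a @ a - np.sum(np.log(np.diag(L))) - 0.5 * len(X) * np.log(2 * np.pi)
```

In practice this objective would be handed to a gradient-free or autodiff optimizer over θ; the point of the Cholesky form is numerical stability for ill-conditioned K.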

A special exact case for eikonal tomography: The expectation value of squared slowness given normally distributed derivatives
Consider without loss of generality a 2-D case. The squared slowness is given by

S^2 = ||∇τ||^2 = τ_x1^2 + τ_x2^2.    (12)

Assume the phase gradient is given by a multivariate Gaussian random variable ∇τ ∼ N(μ, Σ) that describes the joint distribution of the two derivatives τ_x1, τ_x2, and let S^2 be the random variable describing the distribution of slowness squared. This is, for example, the distribution that arises for the derivatives at a single point conditioned on observations under GP regression as described above. As the slowness squared is a scalar, I can take the trace to proceed as follows, following Kendrick:

E[S^2] = E[∇τ^T ∇τ] = E[tr(∇τ ∇τ^T)] = tr(Σ + μμ^T) = tr(Σ) + μ^T μ.    (13)

It is instructive to note that the expectation value of squared slowness is strictly greater than the sum-of-squares of the mean derivatives, so that velocities are "biased" lower after accounting for errors. Note that this is true for any calculation that assumes the derivatives have a Gaussian distribution, not just the Gaussian process framework analysed here.

Figure 4 Comparison of the true squared slowness against results calculated using a squared-exponential Gaussian process with tuned hyperparameters. The GP mean and standard deviation are calculated by drawing 100,000 predicted travel time gradients. The spline squared slowness has been calculated using 5th order centred finite differences. The GP result has a mean closer to the truth, and additionally adds uncertainty information, when compared to the smoothing spline. The colouring of the difference plots is arranged according to the usual seismic convention of blue being fast and red being slow; in this case blue means that the predicted slowness is smaller compared to the truth and vice versa; note that this induces a colour flip compared to Figure 1.
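The identity E[S^2] = tr(Σ) + μ^T μ, and the resulting downward bias in velocity, can be verified numerically in a few lines. The mean and covariance values below are illustrative assumptions, not values from the paper's inversion:

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([0.3, -0.2])                       # posterior mean of (tau_x1, tau_x2)
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])   # posterior covariance (assumed)

# Exact expectation of the squared slowness: E[S^2] = tr(Sigma) + mu' mu
exact = np.trace(Sigma) + mu @ mu

# Monte Carlo check by sampling gradients and squaring them
g = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.sum(g**2, axis=1))
```

The tr(Σ) term is the extra contribution from gradient uncertainty: any nonzero posterior covariance inflates the expected squared slowness above ||μ||^2, which is exactly the bias discussed in the text.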

Approximation of the posterior using the saddlepoint method
The analytic results obtained for the derivative ∇τ have already given us a great deal. Any expectation value that depends on these derivatives (in particular, moments of the phase velocity) can be calculated using the Monte Carlo method, i.e., by drawing many random samples of ∇τ and then calculating the desired statistics on this random sample. Because it is possible to draw directly from the posterior of ∇τ given the closed-form mean and covariance above, every sample can be used and is independent (unlike in Markov-chain Monte Carlo). As such, these expectation values will usually converge quickly. However, there are cases where it is still useful to have approximations of the posterior that can be even more quickly calculated; for instance, if the eikonal tomography derived phase velocities are being used in a joint inverse problem, or if accurate statistics for extreme values need to be calculated. A frequently used simple approximation would be to use Laplace's method directly on the posterior distribution for ||∇τ||^2 or C_p. The approximate posterior under this technique is the best-fitting Gaussian distribution. However, looking at Figure 5, it is clear that neither distribution is close to Gaussian, and may not in fact have a clear mode to fit.
Instead of approximating the posterior directly, I use the saddlepoint approximation. The saddlepoint approximation for the distribution of random variables was originally proposed by Daniels, with Butler giving a thorough account of the basic method. Very roughly, the idea is to examine the cumulant generating function (CGF)

K(s) = log ∫_X e^{sx} f(x) dx,    (14)

where f(x) is the probability distribution of X and X is its domain of support. The existence of the CGF requires that there is some interval a < 0 < b such that the above integral converges. Applying Laplace's approximation to this integral and rearranging terms,

f̂(x) = (2π K″(ŝ))^{−1/2} exp(K(ŝ) − ŝx),    (15)

where ŝ is the solution of K′(s) = x. ŝ is a saddlepoint of the integrand in Equation 14, hence the name "saddlepoint approximation". If the application requires it, f̂(x) then has to be normalized to integrate to unity so that it is a true probability distribution, giving us

f̄(x) = f̂(x) / ∫_X f̂(u) du.    (16)

If the application only requires the PDF up to proportionality (as is often the case), then the above normalization is not required, and the saddlepoint approximation requires no integration whatsoever. Butler shows that this optimization problem is well posed and gives a unique real solution for ŝ, if s is constrained to be inside the interval containing 0 for which K(s) converges. Serendipitously, this low-order method often provides extremely good approximations to the PDF, as the CGF K contains the full information about the distribution of X. For sums of independent random variables (such as ||∇τ||^2), it is almost always easier to construct the CGF K analytically rather than the PDF f, as

K_{X+Y}(s) = K_X(s) + K_Y(s),   whereas   f_{X+Y} = f_X ∗ f_Y,    (17)

where X and Y are independent random variables and ∗ is the convolution operator. Therefore, when using the saddlepoint approximation to obtain the PDF, multiple potentially slowly converging convolution integrals are converted into a simple root-finding problem with a unique solution. Let us now apply this concept to deriving the PDFs of ||∇τ||^2 and C_p from our closed-form posteriors for phase delay
derivatives ∇τ. To do this, my goal is to write the distribution of ||∇τ||^2 in a form for which I can determine the CGF K_{||∇τ||^2}, and then use the saddlepoint approximation to obtain the posterior PDF f̂_{||∇τ||^2}, from which I can also obtain the posterior PDF f̂_{C_p} using a change-of-variables formula.
For simplicity, I approximate the posterior for a single point x∗ given data (X, y). I have shown that ∇τ(x∗)|y ∼ N(μ, Σ) for a d-dimensional mean vector μ and a d × d covariance matrix Σ. Therefore,

∇τ(x∗) = μ + Q Λ^{1/2} h,    (18)

where QΛQ^T = Σ is an eigenvalue decomposition of Σ and h is a d-dimensional standard normal variable h ∼ N(0, I). Q contains the normalized eigenvectors as its columns and Λ is a diagonal matrix of corresponding eigenvalues. Assuming that the phase delay measurements are taken in different locations, all of the terms in Λ are positive, as Σ is then a non-degenerate covariance matrix and hence positive definite. I can then write

||∇τ(x∗)||^2 = ||Q^T μ + Λ^{1/2} h||^2 = Σ_{i=1}^{d} λ_i (h_i + μ̃_i)^2,    (19)

where μ̃ = Λ^{−1/2} Q^T μ. The eigenvalues collected in Λ are labelled λ_i, with corresponding components of μ̃ labelled μ̃_i. The quadratic form in Equation 19 can be written as a sum over non-central chi-squared distributions (Imhof; Butler and Paolella). The degrees of freedom of each non-central chi-squared correspond to the multiplicity of the eigenvalues of Σ, which will for our purposes always be distinct, giving

||∇τ(x∗)||^2 = Σ_{i=1}^{d} λ_i χ²_1(μ̃_i^2).    (20)

Because of the summation property of the CGF, the CGF of ||∇τ(x∗)||^2 is then (Butler and Paolella)

K(s) = Σ_{i=1}^{d} [ λ_i μ̃_i^2 s / (1 − 2λ_i s) − (1/2) log(1 − 2λ_i s) ],    (21)

and the derivatives are given by

K′(s) = Σ_{i=1}^{d} [ λ_i μ̃_i^2 / (1 − 2λ_i s)^2 + λ_i / (1 − 2λ_i s) ],
K″(s) = Σ_{i=1}^{d} [ 4λ_i^2 μ̃_i^2 / (1 − 2λ_i s)^3 + 2λ_i^2 / (1 − 2λ_i s)^2 ].    (22)

The domain of convergence in which the root of K′(s) = x is sought is the largest open interval containing zero for which K(s) converges, namely s < 1/(2λ_max), where λ_max is the largest eigenvalue of Σ.
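The CGF and the root-find for ŝ can be sketched compactly; the code below is an illustrative implementation (function names are my own, and the bracket endpoints are pragmatic assumptions), not the paper's notebook. A useful property for testing is that for a single central chi-squared term the saddlepoint density is proportional to the exact density, since the saddlepoint approximation is exact up to normalization for the gamma family.

```python
import numpy as np
from scipy.optimize import brentq

def cgf(s, lam, mu_t):
    # CGF of sum_i lam_i * chi2_1(mu_t_i^2) and its first two derivatives.
    r = 1.0 - 2.0 * lam * s
    K  = np.sum(lam * mu_t**2 * s / r - 0.5 * np.log(r))
    K1 = np.sum(lam * mu_t**2 / r**2 + lam / r)
    K2 = np.sum(4 * lam**2 * mu_t**2 / r**3 + 2 * lam**2 / r**2)
    return K, K1, K2

def saddlepoint_pdf(x, lam, mu_t):
    # Unnormalized saddlepoint density at x > 0; the saddlepoint s_hat
    # solves K'(s) = x on the interval (-inf, 1/(2 lam_max)).
    upper = 1.0 / (2.0 * np.max(lam)) - 1e-9
    s_hat = brentq(lambda s: cgf(s, lam, mu_t)[1] - x, -1e4, upper)
    K, _, K2 = cgf(s_hat, lam, mu_t)
    return np.exp(K - s_hat * x) / np.sqrt(2.0 * np.pi * K2)
```

Because K′ is strictly increasing on the convergence interval, the bracketed root-find always succeeds for x > 0, which is what makes this a cheap and robust replacement for sampling.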
Applying the saddlepoint approximation given the above K gives us the saddlepoint distribution f̂_{||∇τ(x∗)||^2}(x) for the squared slowness, which can be normalized to give f̄_{||∇τ(x∗)||^2}(x). The transformation between squared slowness and phase velocity is given by g(x) = 1/√x, which is a monotone decreasing function. The appropriate Jacobian transformation rule to obtain the approximate PDF of phase velocity is then (Kadane)

f̂_{C_p}(c) = (2/c^3) f̂_{||∇τ(x∗)||^2}(1/c^2).    (23)

The approximate distributions f̂_{||∇τ(x∗)||^2}(x) and f̂_{C_p}(c) are plotted against a histogram of 100,000 draws of the squared slowness and phase velocity using the analytic derivatives in Figure 5, showing that the saddlepoint approximations are a close fit. Higher-order saddlepoint approximation terms and approximations for the cumulative distribution function (CDF) are collected in Butler.
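The Jacobian rule can be sanity-checked with any toy density for the squared slowness. Below, a unit-rate exponential stands in for the (normalized) squared-slowness density; this choice is purely an assumption for illustration, chosen so the integrals have closed forms:

```python
import numpy as np
from scipy.integrate import quad

# Toy squared-slowness density (illustrative stand-in for f for ||grad tau||^2).
f_s2 = lambda x: np.exp(-x)

# Jacobian rule for C_p = 1/sqrt(S^2): f_Cp(c) = (2 / c^3) * f_S2(1 / c^2)
f_cp = lambda c: 2.0 / c**3 * f_s2(1.0 / c**2)

# The transformed density still integrates to one...
total, _ = quad(f_cp, 0.0, np.inf)

# ...and matches Monte Carlo draws pushed through the same transformation.
rng = np.random.default_rng(1)
c = 1.0 / np.sqrt(rng.exponential(size=200_000))
p_mc = np.mean(c <= 1.0)
p_pdf, _ = quad(f_cp, 0.0, 1.0)
```

The 1/c^3 factor is what stretches probability mass into the upper velocity tail, consistent with the heavy-tailed velocity posteriors seen in Figure 5.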
The saddlepoint method can be further applied to the joint distribution function of two points to derive the approximate spatial covariance (Al-Naffouri et al.). Because the underlying posterior distribution for the derivatives is given by a GP, the covariance completely describes the spatial behaviour of the velocity distribution, and so the ability to calculate the distribution for any two arbitrary points is sufficient to fully characterize the posterior. However, the resulting root-finding problem will be two-dimensional rather than one-dimensional and is substantially more complicated than the forms derived here, so it is left for future work.

Implications for sample statistics
Most eikonal tomography applications report per-station, per-frequency error statistics by computing the standard error in the mean phase velocity over multiple sources. Studies typically appeal to the central limit theorem to justify the use of the sample standard error formula and sample mean for quantifying the data distribution. The reported standard errors are then used to weight data in further inversions; a typical use case is to perform 1-D Bayesian inversion beneath each station using the mean values and the reported error. Previous methods do not optimally smooth the phase delay regression that underlies eikonal tomography, potentially producing biased results, and do not produce uncertainty estimates for each source. However, uncertainties reported in studies using these methods are often extremely low, amounting to a few percent of the estimated phase velocity.
In our GP framework, Monte Carlo sampling can be used to directly estimate the distribution of sample statistics such as the mean over multiple sources. As motivation, observe that both the empirical distribution for phase velocity and its saddlepoint approximation are heavy tailed in Figure 5. This is a point relatively close to the edge of the array, which can result in a distribution that is far from Gaussian. Taking this point, I draw 4^n samples of velocity for n = 0…6, calculate the sample mean and median, and then repeat 100,000 times to find the distribution of the sample statistics.
Figure 6 shows the results. The sample mean converges only slowly to a normal distribution, and is still broad even for the largest sample size. In comparison, the sample median is well behaved and converges quickly as the sample size increases. For both sample statistics, the distribution for small numbers of samples is unsurprisingly quite similar to the underlying velocity distribution, and is consequently heavy tailed. This should be taken into consideration for applications such as fitting azimuthal anisotropy profiles to eikonal tomography results, where many azimuth bins near the edges of arrays will often have few contributing sources.
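The mean-versus-median behaviour is easy to reproduce with any heavy-tailed stand-in for a near-edge velocity posterior. The distribution below (an exponential squared slowness with a small floor, pushed through C = 1/√S²) is an assumption chosen only to mimic the fat right tail; it is not the posterior from the paper's example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavy-tailed stand-in for a near-edge phase-velocity posterior:
# C = 1/sqrt(S2), with S2 exponential plus a small floor so C stays bounded.
s2 = rng.exponential(size=(20_000, 16)) + 1e-3
draws = 1.0 / np.sqrt(s2)            # 20,000 replications of 16 "sources" each

means = draws.mean(axis=1)           # sampling distribution of the mean
medians = np.median(draws, axis=1)   # sampling distribution of the median
```

Even with 16 sources per bin, the spread of the sample mean stays much larger than that of the sample median, because occasional extreme draws from the tail dominate the mean but not the median.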

Future work
In this study, I present the simplest possible implementation of a GP framework for eikonal tomography with analytic derivatives of phase delay. The flexibility of GP modelling offers several opportunities for future improvements that should result in more robust inversions. The first of these is that multi-frequency eikonal inversion is naturally handled by GP modelling by assuming a space-frequency covariance function. The simplest model would use a separable function k((x, f), (x′, f′)) = k_x(x, x′) k_f(f, f′). A smooth frequency covariance k_f(f, f′) would reduce the impact of missing data in particular frequency bins, which can be an issue due to spectral holes in surface wave trains.
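The separable construction above can be sketched directly; the factors and length scales below are illustrative assumptions. Validity as a covariance follows from the Schur product theorem: the elementwise product of two positive semi-definite Gram matrices is again positive semi-definite.

```python
import numpy as np

def k_space(X1, X2, l=1.0):
    # Spatial squared-exponential factor.
    d2 = np.sum((X1[:, None, :] - X2[None, :, :])**2, axis=-1)
    return np.exp(-0.5 * d2 / l**2)

def k_freq(f1, f2, lf=0.1):
    # Smooth frequency factor; couples measurements in nearby frequency bins.
    return np.exp(-0.5 * (f1[:, None] - f2[None, :])**2 / lf**2)

def k_sep(X1, f1, X2, f2):
    # Separable space-frequency covariance k((x,f),(x',f')) = k_x * k_f,
    # for joint (location, frequency) data points.
    return k_space(X1, X2) * k_freq(f1, f2)
```

With this kernel, a delay measurement missing from one frequency bin borrows strength from neighbouring bins at the same location, which is the behaviour motivating the space-frequency extension.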
Secondly, the squared-exponential kernel used in this study could be further improved to better represent the behaviour of true seismic wavefields; for instance, the problem could be recast in radial coordinates with a radial-azimuthal kernel as studied in Padonou and Roustant. Due to the natural cylindrical symmetry of wave propagation, this may allow us to reduce the uncertainty in the eikonal tomography results. In particular, this kernel choice would be appropriate in use cases such as ambient-noise tomography where the seismic source is inside the array, resulting in highly non-planar wavefronts.
A third option would be to use the GP framework for smoothing the underlying full wavefield records before processing them for phase delay measurements or for other gradient-based techniques such as wavefield gradiometry (e.g., Langston; de Ridder and Biondi; de Ridder and Maddison) or full Helmholtz tomography (Lin and Ritzwoller). These applications would potentially require extending the GP derivative theory to higher order, but again noting that derivatives are linear, the resulting distributions for higher-order spatial terms will also be GPs. The GP framework is especially well suited to the inclusion of strain measurements in joint wavefield reconstruction (e.g., Muir and Zhan), as the appropriate covariance kernels can be calculated using the derivative results derived above, an enticing prospect considering the proliferation of distributed acoustic sensing (DAS) strain sensors (Zhan). GP-based techniques have also been used in geodesy to investigate transient strain rates (e.g., Hines and Hetland), and the saddlepoint approximation techniques investigated here could offer a way to more accurate quantification of strain invariants arising from geodetic analysis.
Finally, as the number of phase delay measurements increases across stations and frequency bins, the size of the data covariance matrix K grows. For n measurements, the cost of inverting this matrix scales like O(n^3), so very large collections of measurements pose a challenge for GP-based inversion. Due to the popularity of GPs in machine learning research, there is a wide range of sparse GP approximations that produce almost identical results and still result in analytic derivatives once the sparsity structure is determined (e.g., Titsias; Lindgren et al.; Wilson and Nickisch). Employing these methods would allow efficient upscaling of the methodology presented here to multi-frequency inversion of USArray-scale datasets.

Conclusions
This study derives an analytic posterior distribution for phase delay derivatives, and then derives approximate posteriors for phase velocity using the saddlepoint approximation applied to the eikonal equation. The result is a fully Bayesian eikonal tomography that requires no MCMC sampling to characterize the posterior. As such, computations are easily implemented and highly efficient. Using the GP framework as a basis, I investigated two important effects that impact the interpretation of eikonal tomography results, namely the effect of the inclusion of data uncertainty on the expectation value of velocity and the behaviour of sample statistics, both of which suggest that the uncertainty in eikonal tomography results is greater than previously assessed. The GP framework presents a fully interpretable way forward to improve eikonal tomography in the future, with many opportunities for future work due to the flexible and robust nature of GP modelling.

Data and code availability
I have included the Pluto notebook used to generate the results in the submission. This notebook will be uploaded to Zenodo after acceptance so that the assigned DOI corresponds to the final version used for the publication.
Figure 1 compares the GP reconstruction with the true values of the phase delay map. The GP mean closely fits the true values, although the level of uncertainty becomes quite substantial near the edges of the domain.

Figure 1
Figure 1 Comparison of the GP posterior (showing mean and point-wise standard deviation) of the phase delay with a smoothing-spline based solution for an example phase delay data set with 100 randomly distributed points and 0.2 s Gaussian noise. There are notable differences in the estimated phase delay, especially where there are gaps in the data coverage. The colouring of the difference plots is arranged according to the usual seismic convention of blue being fast and red being slow; in this case, blue means that the predicted arrival is fast compared to the truth and vice versa.
Figure 3 shows the mean and covariance structure for the derivatives at two test points calculated using the above theory, compared to the true derivative of the phase delay, and finite-difference estimates computed using random draws of the GP estimate of the phase delay (i.e., Monte Carlo finite-difference derivatives). Both the analytic and Monte Carlo results closely agree with each other and with the true values for the derivatives. In Figure 4, I use the multivariate normal posterior for the derivatives to generate samples of the posterior for the squared slowness.

Figure 3
Figure 3 Corner plot showing the covariance of derivatives at two test points, and their individual histograms. The test points are τ_1 at (0.1875, 0.3875) and τ_2 at (5.5, 3.5). Black crosses and lines show the true value of the derivatives. Orange lines show the analytical GP-based solutions derived in this paper, with ellipses drawn at the 95% credible level and crosses showing the mean. Grey circles and histograms show finite-difference (FD) based derivatives using Monte Carlo samples of the GP posterior for phase delay, and red crosses and ellipses show the mean and estimated covariance at 95% confidence from the FD draws.

Figure 5
Figure 5 Comparison of the empirical CDF and PDF (grey) for the squared slowness and phase velocity for the point at (0.1875, 0.3875) with the saddlepoint (SP) approximation (orange). For the PDF, the true value is also shown in black and the median, 25th and 75th percentiles of the empirical PDF are shown in purple. The empirical distributions are truncated between 0.01 and 10 for plotting purposes.

Figure 6
Figure 6 Comparison of the distribution of sample means and sample medians for the phase velocity at (0.1875, 0.3875). The mean or median is calculated by drawing 4^n samples for n = 0…6. This process is repeated 100,000 times to obtain the distributions of sample means and medians. The sample mean converges to a normal distribution slowly.