Seismic Architecture of the Lithosphere-Asthenosphere System in the Western United States from a Joint Inversion of Body-and Surface-wave Observations: Distribution of Partial Melt in the Upper Mantle

Quantitative evaluation of the physical state of the upper mantle, including mapping temperature variations and the possible distribution of partial melt, requires accurately characterizing absolute seismic velocities near seismic discontinuities. We present a joint inversion for absolute but discontinuous models of shear-wave velocity (Vs) using 4 types of data: Rayleigh wave phase velocities, P-to-s receiver functions, S-to-p receiver functions, and Pn velocities. Application to the western United States clarifies where upper mantle discontinuities are lithosphere-asthenosphere boundaries (LAB) or mid-lithospheric discontinuities (MLD). Values of Vs below 4 km/s are observed below the LAB over much of the Basin and Range and below the edges of the Colorado Plateau; the current generation of experimentally based models for shear-wave velocity in the mantle cannot explain such low Vs without invoking the presence of melt. Large gradients of Vs below the LAB also require a gradient in melt-fraction. Nearly all volcanism of Pleistocene or younger age oc-curred where we infer the presence of melt below the LAB. Only the ultrapotassic Leucite Hills in the Wyoming Craton lie above an MLD. Here, the seismic constraints allow for the melting of phlogopite below the MLD.


Introduction
The state of Earth's asthenosphere exerts a fundamental control on the tectonic and magmatic evolution of the crust and lithosphere. The asthenosphere is a rheologically weak layer beneath the lithospheric plates, with ambient temperatures near or above the solidus for silicate melting in a peridotite mantle. The low viscosities facilitate a wide range of advection processes that deliver heat and stress to the overriding plate, and the production, accumulation, and subsequent removal of partial melt drives volcanic and plutonic processes at plate-boundary and intraplate settings. In detail, the rheology of the asthenosphere likely depends strongly on the presence and distribution of melt, which is inferred to weaken mantle rocks at both geological and seismic time scales as it accumulates on intersti-tial grain boundaries (e.g. Hammond and Humphreys, 2000;Takei, 2002;Holtzman, 2016;Chantel et al., 2016;Takei and Holtzman, 2009). However, due to tradeoffs and uncertainty between the effects of melt, temperature, volatile content, and grain size on the seismic and other geophysical properties of the mantle, detailed quantification of the distribution of partial melt in Earth's mantle remains elusive.
Over the past decade, significant progress has been made in estimating the state of the asthenosphere beneath the diverse tectonic physiography of the western United States (Fig 1). This progress has been enabled by the deployment of EarthScope's USArray, which blanketed the continental US with seismic observations of sufficient density to resolve crustal and upper-mantle structure on length scales as small as 100 km, comparable to length scales of major tectonic features and boundaries, including mountain belts and vol-canic fields. This allows for accurate quantification of seismic characteristics at depths that can be directly compared to surface observations derived from geology and geochemistry (e.g. Plank and Forsyth, 2016;Porter and Reid, 2021). In particular, two imaging approaches have emerged that provide distinct but complementary constraints on crustal and upper-most mantle structure. Array-based surface-wave phase velocities provide excellent constraints on three-dimensional variations in absolute velocities in the upper mantle (e.g. Lin and Ritzwoller, 2011;Jin and Gaherty, 2015;Ekström, 2017), key for quantifying melt in the asthenosphere and its impact on overlying lithospheric structure. However, surface waves lack the ability to constrain abrupt velocity changes laterally or with depth, and surfacewave images contain strong trade-offs between reducing the model misfit and geologically reasonable but ad hoc constraints such as model smoothness and model length. Common-conversion-point (CCP) images of Sto-p converted phases (receiver functions) provide critical data on abrupt changes in velocity with depth (e.g. Kawakatsu et al., 2009;Rychert et al., 2007;Levander and Miller, 2012;Lekić and Fischer, 2014;Hansen et al., 2015;Liu and Shearer, 2021), including quantifying the change in physical characteristics across major boundaries within the lithosphere-asthenosphere system in two dimensions. These observations lack sensitivity to absolute velocity, however, making it difficult to quantitatively interpret them in the context of temperature, melt content, or other state variables. For example, Sto-p images of the upper mantle often produce sharp negative velocity gradients (NVGs) within the upper mantle (a negative gradient is defined as a decrease in seismic velocity with increasing depth). NVGs are often interpreted as the lithosphere-asthenosphere boundary (LAB) (e.g. Kawakatsu et al., 2009;Rychert et al., 2005Rychert et al., , 2007Kumar et al., 2012;Levander and Miller, 2012;Lekić and Fischer, 2014), but in some cases NVGs clearly fall within the lithosphere and are interpreted as a midlithospheric discontinuity (MLD) (e.g. Abt et al., 2010;Ford et al., 2010Ford et al., , 2016Fischer et al., 2010) of widely debated origin (Hansen et al., 2015;Selway et al., 2015;Saha et al., 2021;Karato et al., 2015;Helffrich et al., 2011). Distinguishing between these interpretations requires additional information to constrain temperature, such as absolute velocities.
The joint inversion of surface waves and receiver functions merges the best attributes of each technique: constraints on absolute velocities from surface waves with rapid transitions in velocity with depth resolved by receiver functions. Thus, much more confident interpretations of the resulting structures are possible: accurate absolute velocities both above and below an NVG enable a more explicit interpretation than is possible from each observation independently. Joint inversions of surface wave and receiver function data are now quite common. Primarily, these efforts consist of joint inversion of P-to-s converted wave data to better constrain crustal thickness (e.g. Chai et al., 2015;Delph et al., 2015;Shen and Ritzwoller, 2016;Delph et al., 2018). More recently, inversions incorporating S-to-p conversions have improved quanti-tative velocity estimates across upper mantle discontinuities such as the LAB (e.g. Bodin et al., 2016;Eilon et al., 2018). These localized inversions model the full receiver function at individual stations, and a benefit of these inversions is their lack of imposed constraints; however, this can lead to complex velocity models that vary considerably between stations and can be difficult to explain geologically.
In this paper we present an alternative joint inversion of surface wave and receiver function data that takes advantage of our geological intuition. We think of the upper 400 km of the earth as a layered structure, with a crust overlying a strong high-velocity lithosphere, which in turn overlies a lower-velocity asthenosphere. Previous studies of receiver functions provide spatially coherent sets of data that define the layering, specifically the depth to (or more accurately, the travel time to) and magnitude of abrupt velocity changes, including the Moho and (in many regions) the lithosphereasthenosphere boundary. Surface-wave dispersion constrains the absolute shear velocities within this layered framework. The resulting 3-D layered velocity model provides new constraints on the absolute velocity at the top of the asthenosphere, enabling unique quantitative estimates of partial melting in the upper mantle.

Tectonic Background
To first order, the continental United States can be divided into a tectonically stable (cratonic) eastern half, and a western half characterized by active and/or recent tectonic deformation. The crust and upper mantle in the active western US has long been observed to be seismically distinct from the stable east, with lower seismic velocity and high seismic attenuation in the upper mantle suggesting higher temperatures and the presence of partial melting (e.g. Grand and Helmberger, 1984;Humphreys and Dueker, 1994;Pakiser, 1963;Solomon, 1972), which also correlate with higher elevations and heat flow relative to the eastern continent. The western half can be further subdivided into provinces that feature distinct magmatic and tectonic activity. USArray and similar regional broadband deployments enable a detailed characterization of the subsurface on small regional scales. Fig 1 highlights the major provinces and geologic features that we focus on here.
The eastern edge of our study region captures the western portion of stable North America (SNA), which primarily consists of Archean and Proterozoic basement overlain by Phanerozoic sedimentation (Whitmeyer and Karlstrom, 2007). Upper-mantle seismic wavespeeds in the area are high (e.g. Schmandt and Humphreys, 2010;Shen and Ritzwoller, 2016;Porter et al., 2016), and NVGs are usually interpreted as an MLD (e.g. Hopper and Fischer, 2018). Abutting the stable platform to the west are high-standing mountain ranges and moderately deformed plateaus that were uplifted during the widespread Laramide orogeny from the late Mesozoic to the early Cenozoic, including the modern Rocky Mountains, the Archean-cored Wyoming province, and the Proterozoic-cored Colorado Plateau (CP). Subsequent to Laramide uplift, the Wyoming Cra-ton returned to relative quiescence (Humphreys et al., 2015), and the subsurface is characterized by moderately thick, high-velocity lithosphere (Shen and Ritzwoller, 2016;Porter et al., 2019;Xie et al., 2018). In contrast, from the mid-Cenozoic onwards, volcanism and modest extension have encroached from the Basin and Range towards the center of the CP (Roy et al., 2009;Crow et al., 2011), creating a plateau "transition zone" along the western and southern borders with the Basin and Range that is characterized in the subsurface as highly thinned lithosphere underlain by anomalous hot asthenosphere (Schmandt and Humphreys, 2010;Levander et al., 2011;Shen and Ritzwoller, 2016;Porter et al., 2019;Golos and Fischer, 2022). These features are absent from the eastern side of the CP and the southern Rocky Mountains. Localized volcanic centers in the region can be highly voluminous (e.g. Marysvale volcanic center), and persist to recent times.
Further west and south lies the modern Basin and Range province (BR), interpreted to be a former highstanding orogenic plateau that underwent significant, wide-spread extensional collapse during the middle-tolate Cenozoic. Prior to extension, the region experienced a sweep of volcanic activity that is expressed primarily as widely distributed ignimbrite-producing calderas (Best et al., 2016). Today, the region is characterized by anomalous thin crust (e.g. Gilbert, 2012) and lithosphere (e.g. Lekić and Fischer, 2014;Hansen et al., 2015;Hopper and Fischer, 2018;Kumar et al., 2012;Levander and Miller, 2012) underlain by hot asthenosphere (Humphreys and Dueker, 1994;Plank and Forsyth, 2016;Porter and Reid, 2021). Volcanism in the region is highly distributed throughout the province, and persists to recent times. North of the BR, the Snake River Plain (SRP) stretches from the Yellowstone Hotspot to the High Lava Plains of central Oregon, and is characterized by voluminous surface volcanism that initiated at approximately 15 Ma and continues to the present. Seismic characterization of the subsurface suggests that the entire SRP is underlain by hot asthenosphere (e.g. Schmandt and Humphreys, 2010;Shen and Ritzwoller, 2016;Porter and Reid, 2021).
We limit this presentation to the region shown in Fig 1, which captures a rich diversity of tectonic environments while also avoiding subducting slabs and other plate-boundary complexity to the west and north (for example, see Schmandt and Humphreys, 2011) that may not be well described by the three-layer parameterization that we describe below.

Datasets
We construct profiles of seismic velocity from depths of 0 to 400 km by combining four published datasets with complementary sensitivity to structure. Each dataset is derived from seismic data recorded by the EarthScope USArray, including the Transportable Array (nominal background station spacing of 70 km) plus more densely spaced Flex Arrays and other regional data sets. In each case described below, we refer the reader to the relevant citations for the specific data utilized and methodological details.  son (1946), with modifications described in the text. Red circles approximately demarcate select volcanic fields that are discussed in the text. White labels are names used for features in the main text.

Surface-wave phase velocities
We use the phase velocities of Rayleigh waves in three non-overlapping period bands from three studies. From 8 to 15 s, we use phase velocities from Ekström (2017). These phase velocities were estimated from ambient seismic noise using Aki's formula (Ekström et al., 2009;Ekström, 2014Ekström, , 2017. From 20 to 100 s, we use the phase velocities of Jin and Gaherty (2015) derived from the cross-correlation of Rayleigh waves from teleseismic events, with Helmholtz tomography applied for correcting focusing effects (Lin and Ritzwoller, 2011). From 20 to 40 s, these data agree well with the ambient-noise results of Ekström (2017). We extend our phase velocity dataset over 120-180 s with the results of Babikoff and Dalton (2019), who used the cross-correlation methodology of Jin and Gaherty (2015). Maps of phase velocity at periods of 10, 60, and 120 s across our study area are shown in Fig 2, with periods chosen to show one map from each of our three sources. Uncertainties vary by period and are estimated in the referenced studies, varying from 0.025 to 0.097 km/s at 10 and 180 s, respectively.

P-to-s conversions from the Moho
Conversions of teleseismic P-to-s phases provide constraints on both the depth to and the contrast in seismic velocity across the Moho. While most P-to-s studies of the crust focus on constraining intracrustal properties, including Moho depth and Vp/Vs ratio (e.g., Gilbert, 2012), the study of Shen and Ritzwoller (2016) constrain both depth and contrast across the Moho by fitting the Figure 2 Phase velocities of Rayleigh waves at 10, 60, and 120 s in panels A, B, and C, respectively.
waveforms of the P-to-s receiver functions as part of a joint inversion along with the phase velocity, group velocity, and ellipticity of Rayleigh waves. Two constraints are extracted from the model of Shen and Ritzwoller (2016): first, a time to the Moho by calculating the travel time of a vertically propagating S wave from the surface to the Moho through the model, accounting for variations in the Vp/Vs ratios in the crust from ; and second, the contrast in velocity across the Moho from the difference in velocity directly above and below the discontinuity. Uncertainties for both quantities are directly calculated from errors given on Vs at each depth and are divided by a factor of 4 to convert from a standard deviation to a standard error (see section 4.2 of Shen and Ritzwoller (2016) for a discussion). We then apply a Gaussian filter with a width of 0.25°to both datasets to approximate the smoothness of the 20-32 s phase velocities, and the resulting datasets are shown in Fig 3a,b.

S-to-p conversions from an NVG
Travel times to and the velocity contrast (including uncertainties) across the NVG are provided by Hopper and Fischer (2018). A spatially varying Vp/Vs ratio in the crust from  is used to convert from the observed S minus P times to S times for a vertically propagating wave, to match the type of constraint on the Moho described above. Converted-phase amplitudes are converted to a change in velocity over a specified width of the NVG (Supplementary section S1, and see Hopper and Fischer (2018) for details). The widths of the NVG are not directly observed in the original S-to-p receiver functions, but are imprecisely inferred from waveform modeling. We explore modifications to the width during the subsequent inversions and so consider the widths a different type of constraint than other data. Finally, we apply a Gaussian filter with a half-width of 0.5°to approximate the smoothness of the 50-100 s phase velocities to both datasets. The magnitude of the velocity contrast across the NVG ranges from 4-15% across the study area. Filtered travel times and velocity contrasts are shown in Fig 3c,

Pn velocities
The final dataset we use is the velocity of Pn phases taken from Buehler and Shearer (2017). Pn travels along the underside of the Moho, and we use the observed Pn velocity to derive a direct constraint on shear velocity just below the Moho. This requires an assumed Vp/Vs ratio for the shallow mantle, as well as an adjustment to account for anisotropic structure, as Pn phases are primarily sensitive to the P-wave velocity in the horizontal plane, V ph, while Rayleigh waves and phase conversions are primarily sensitive to the S-wave velocity in the vertical plane, V sv. We assume a mean V p/V s ratio of 1.76 and correct for radial anisotropy assuming a (V sh/V sv) 2 of 1.04 (Clouzet et al., 2018) with the scaling relationships of Montagner and Anderson (1989). The estimated shear velocities for the upper-most mantle (immediately beneath the Moho) are shown in Fig 4. We assign a large uncertainty of 0.1 km/s to this constraint, which results in a weaker constraint on our final models than the other three datasets. This uncertainty is based on observed variations of sub-Moho Vp/Vs in the uppermost mantle for portions of the western United States (Buehler and Shearer, 2014), as well as the significant uncertainty in radial anisotropy at the relatively short (tectonic) scales represented here.

Inversion approach
Our philosophy in this inversion is to capitalize on the geological intuition that, to first order, the shallow velocity structure of the Earth can be described by three layers coinciding with the crust, lithospheric mantle, and asthenospheric mantle (with ambiguity in the terminology in the case of an MLD). Receiver function studies constrain the boundaries between these layers (Ps and Sp for the Moho and NVG, respectively) and phase velocities of surface waves provide constraints on the absolute velocities within the layers. Using a linearized least-squares approach, we invert these data for a set of one-dimensional shear velocity models at each point within a geographic grid with 0.25°spacing in both latitude and longitude. Within each layer, the shear velocity is constrained to behave smoothly. The thickness of the Moho and NVG are assumed a priori; the Moho jump is assumed to occur over 1 km, while the breadth of the NVG is taken from Hopper and Fischer (2018) and ranges from 10-50 km ( Figure S1). Alternative choices for the width of the NVG are discussed below. By com-

Figure 3
Constraints from converted phases. Panels on the top row are data describing the Moho and on the bottom row are data describing the NVG. The left-hand column shows contrasts in velocity with increasing depth in percentage relative to the shallower layer, and the right-hand column shows the travel time expressed as the travel time of a vertically propagating S wave from the mid-point of the discontinuity to the surface. The contrast in Vs across the NVG (panel C) depends on the breadth of the NVG (see Supplementary Section S1) bining these one-dimensional profiles, we construct a three-dimensional, layered shear-velocity model for the region.

Model Parameterization
We define the model to be solved for ( Fig 5) as where s is a vector of vertically polarized shear wave velocities (V sv) defined at fixed depths, and t is a vector of the thicknesses of layers above discontinuities in the model, in this application corresponding to the crust and the mantle layer above the NVG (Fig 5). These abrupt boundaries are not explicit discontinuities in velocity (the Moho has a width of 1 km, and the NVG has variable width), but to simplify the terminology we call them "discontinuities" in the following discussion. The model is constructed such that the top and bottom of each discontinuity corresponds explicitly to an element of s. In the layers above and below the discontinuities, an integer number of elements in s is chosen so that the spacing between elements is greater than 6 km or so that there are at least 5 elements. The number of elements in s in a given layer may update when the elements of t change. Linear gradients in V sv are assumed between each point in s. The shear-velocity models presented here utilize 67-72 parameters at each location: the two values of t and 65-70 values of s as a function of depth z. We initialize the inversion using a starting model constructed with velocities above a depth 150 km taken from Shen and Ritzwoller (2016), and velocities from 150-410 km depth taken from PREM (Dziewonski and Anderson, 1981). In all cases investigated, both with synthetic and real data, the final model was found to be essentially independent of the starting model.
The Gauss-Newton method is used to find a model of this form that adequately fits each dataset under regularization. Partial derivatives between the observations and model parameters are found with MINEOS and directly from the geometry of the model (boxed equations in Fig 5, for the surface and body wave observations, respectively. MINEOS is executed on a constant grid (purple in Fig 5), and the inverse model must be related to this intermediate structure. Details are given in the Appendix.

Uncertainties on parameters from the recovery of synthetic models
We assess the resolving power of the data and inversion by attempting to recover known velocity models. We first invert two velocity profiles to evaluate the relative importance of the different observations used in this study for accurately characterizing key components of the lithosphere-asthenosphere system. For these two tests, noise is not added to the synthetic data, and the thickness of the Moho and NVG are 1 and 10 km, respectively. The first model (black line in Fig 6a) features a nearly linear gradient above the Moho, a moderate negative gradient with some curvature below the Moho, and a large NVG. The second model (black line in Fig 6b) features stronger curvature in the crust with a steep slope above the Moho, a steep negative slope below the Moho, and a lower minimum Vs below the NVG. When these two models are inverted with only the surface wave and P-to-s constraints (yellow models in Fig 6), the crust is reasonably well reconstructed, but the layered structure in the mantle is not accurately recovered. At the depth where the minimum Vs is reached in the input models, Vs is overestimated by 0.2 and 0.4 km/s, and the steepness of the gradient in Vs is underestimated both above and below the NVG. Adding constraints from S-to-p converted phases leads to an excellent recovery of the first model at all depths, and the inclusions of the head-wave velocities does not noticeably affect the outcome. For the second model, however, the head waves are necessary to properly estimate the gradient below the Moho, which leads to an improvement in recovery both above and below the NVG. Crustal structure -and not mantle structure -appears to be the primary control on whether the slope below the Moho can be recovered without the head waves (Supplementary Section S2). We conclude that all four datasets are necessary to accurately describe the upper mantle, with the head-waves supplying the least information.
To further evaluate the modeling approach and to quantify uncertainties for key model characteristics, we generate 500 random velocity models (Supplementary Section S3), add noise to synthetic data predicted for each model (see Section 2 for the uncertainties on each dataset), invert, and compare the resulting model with the input model. We seek to quantify the recovery of several key parameters of the layered models: the depths to a Moho and an NVG; the shear-velocity contrast across the Moho and NVG; the shear-wave velocity immediately above and below the Moho, and immediately above and below the NVG; and the slope of the shear-wave velocity within 10-km above the Moho, between the Moho and the NVG, and within 50-km below the NVG. We attempted to recover the second derivative of shear velocity within the layers but conclude that the data lacks a strong intrinsic constraint on the curvature of the velocities in any of the three layers (Fig 6). Table 1 quantifies our ability to accurately recover these key parameters, in the form of the standard deviation of the difference between the input and recovered parameters in this test. Since we have not utilized data with direct constraints on shallow crustal structure, we do not interpret values in the upper half of the crust.  Table 1 Errors on specific features, based on the inversions of many synthetic models

Figure 6
Results of inversions of synthetic datasets. In both panels, black lines are the models used to generate the synthetic dataset, and models in color are inversions of the datasets described in the legend. Ps is P-to-s conversions from the Moho, SW is surface wave phase velocities, Sp is S-to-p conversions from the NVG, and HW is the velocity at the top of the mantle constrained by head waves.

Preferred inversion of the data
We invert the suite of observations from Section 2 for 3D models of shear velocity over the study region by ap-plying the 1D parameterization in Section 3 on a 0.5 by 0.5 degrees spatial grid. The resulting models satisfy the discrete observations within estimated uncertainty (Fig 7). Misfits of the model predictions to each dataset expressed as the mean squared error, χ 2 , are very low, exceeding the nominal target value of one (where one means that on average the data is fit to the error) only for the phase velocities at 25 s and the velocity contrast from Ps conversions. These higher misfits likely indicate tension between the two observations as to the contrast at the Moho, but are acceptable. Given this uncertainty, we demonstrate the acceptability of our Moho structure by comparing predicted receiver functions between our model and the model of Shen and Ritzwoller (2016) in Supplementary Section S5. Fig 8 displays the depths of the two discontinuities across the region. The depth to the Moho varies from over 50 km at locations within stable North America to less than 35 km in much of the BR province. The Moho is shallower in the southern than northern BR, and the Colorado Plateau typically features transitional values between 35 and 50 km, with thinner crust beneath the transition zone along its southern and western margin. Overall, the Moho depths and their variation are generally consistent with previous studies from the region, with RMS difference of 2.3 km compared to  and 3.8 km relative to Gilbert (2012). Both are comparable to our uncertainty in Moho depth (2.5 km), with the greater difference compared to Gilbert (2012) likely caused by Gilbert (2012) migrating Ps conversions to depth with a fixed model, while both our study and  are joint inversions of data from receiver functions and surface wave phase velocities.
Depths to the NVG are typically greater beneath SNA (average of 90 km) than in the BR (average of 75 km), but the depths also feature shorter-wavelength variations that are likely associated with smaller-scale tectonic processes. Within the Basin and Range, the depths to the NVG are highly variable, especially in the north, with a swath of shallower depths in the south (Liu and Shearer, 2021). Relatively shallow depths extend from the BR through the Rio Grande Rift and into the southern Rocky Mountains. Somewhat greater NVG depths characterize the Colorado Plateau, Wyoming craton, and northern Rockies, but again with significant shortwavelength variations. Beneath the CP, a local maximum in the depth of the NVG occurs beneath the center of the plateau embedded among shallower NVGs to the west, south and east. Beneath the transition zone of the Colorado Plateau, depths are more similar to those beneath the BR than the center of the Plateau.
Fig 9 displays the regional variations in shear velocities and associated vertical velocity gradients directly above and below these layer boundaries. Lower crustal gradients are averaged over 10 km above the top of the Moho, and gradients below the Moho are averaged from the base of the Moho and the top of the NVG. Fig 10 shows cross-sections through our study area with shear-wave velocities specified every 1 km in depth. Velocities within the lower crust (Fig 9a) show a similar long-wavelength pattern to that seen in the depths to the Moho, but with more pronounced short-wavelength variations. Lower-crustal velocities are highest in SNA in the east and are lowest across a broad swath of the Basin and Range province. Velocities are 0.2-0.4 km/s faster in the northern-most BR than to the south, but this division occurs at approximately 39°N (Fig 9a) and so is not coincident with the decrease in crustal thickness that occurs 36°N (Fig 8a). The slowest lower-crust velocities in the BR surround the western, southern, and eastern rim of the Colorado Plateau, with a contrast in velocity from the BR to the interior plateau ranging from 0.3 to 0.5 km/s with only minor variations in velocity within the plateau itself. Low-velocity anomalies in the lower crust are typically associated with weak gradients in shear-wave velocity above the Moho, for example along the southern and western edges of the Colorado Plateau and at an anomaly beneath the San Juan Mountains at 38°N/108°W (Hansen et al., 2013). The gradient in the lower crust anticorrelates with the velocity in the lower crust over much of the BR and CP. However, the correlation is imperfect as maxima in the gradient correspond with intermediate shear velocities in the northern BR and the high crustal velocities in SNA are typically associated with intermediate or weak gradients.
The velocities between the Moho and the NVG (Fig 9b,e) are less variable than the other two layers, and less obviously correlated with surface tectonics. Shear velocities are high beneath the Great Plains and Wyoming craton, intermediate beneath the CP and much of the BR, and low only in localized anomalies such as beneath Yellowstone and the western transition zone of the CP. The vertical velocity gradient in the mantle lithosphere is generally positive over much of the region, with negative gradients localized to the Snake River Plain, the Marysvale volcanic fields (as also seen in the profiles in Figs 10,11), and the Rio Grande Rift. Local maxima in negative velocity gradients below the Moho in the same three locations were reported by both Shen and Ritzwoller (2016) (their Fig. 17) and Buehler and Shearer (2017) (their Fig. 7c,d). The negative gradients extend over a more extensive region in Shen and Ritzwoller (2016) than in this study or in Buehler and Shearer (2017), and we do not reproduce the local maximum in positive gradients in the BR in Buehler and Shearer (2017). The gradient in this depth range is the most poorly constrained feature of the model space (Table 1). The strongly negative gradients correspond to regions with slow surface-wave velocities (Fig 2b) and often with moderately large Moho contrasts. In a few locations, this results in a sub-Moho velocity gradient of similar magnitude to the imposed NVG associated with the Sp contrast. Forward modeling of Sp receiver functions confirms that these high-gradient models do satisfy Sp travel times within uncertainty, despite contradicting the intuition that the NVG should have the highest gradient in Vs with depth (Supplementary Section S6). Buehler and Shearer (2014) also directly observed Sn across a portion of our study region, and to first order our sub-Moho Vs variations are in agreement, noting the low Vs below the Moho beneath a wide swath of the Western Colorado Plateau in particular.
Velocities below the NVG exhibit pronounced patterns at both short and long wavelengths and excellent correlation with the tectonic provinces observed on the surface (Figs 9c,11). Velocities are high (Vs > 4.4 km/s) beneath most of SNA and the Wyoming Craton, with relatively low velocity anomalies of approximately 4.3

Figure 7
Fit to the surface (A) and body (B) wave datasets expressed as the mean squared error, χ 2 , for our preferred dataset and inverse approach. The datasets are described in Section 2. km/s only occurring beneath the Black Hills and from 35°N to 39°N along 104°W. Velocities are lower beneath the interior of the Colorado Plateau but are never < 4.2 km/s (Fig 10b,c,Fig 11b). Velocities beneath the transition zone of the Colorado Plateau are typically <4 km/s, and such low velocities span the entire BR province. Remarkably low Vs <3.9 km/s is observed in patches along the eastern, southern, and western rim of Colorado Plateau (Fig 9c,Fig 10b,c), extending from the transition zone into the plateau interior. This encroachment of BR-like structure inboard of the surface expression of the CP rim is observed at similar depths in a variety of geophysical imaging studies (e.g. Porter et al., 2019;Schmandt and Humphreys, 2010;Shen and Ritzwoller, 2016;Wannamaker et al., 2008;van Wijk et al., 2010;Xie et al., 2018), and distinguishes the boundary of the CP in the mantle from that in the crust, where the CP/BR transition correlates more closely with the surface expression of the plateau rim. Similarly, very slow Vs anomalies occur beneath the Snake River Plain but are not strongly associated with the modern Yellowstone hotspot (Fig 10a).
The spatial variations in shear velocity below the NVGs agree well with previous surface-wave tomography models of western North America, except that the absolute velocities just below the NVG are typically much lower due to the explicit inclusion of constraints from Sp conversions. At the depth of the base of the NVG in our model, the mean difference between our results for Vs and the Vs reported by Shen and Ritzwoller (2016), Porter et al. (2016), and Xie et al. (2018) are 0.17, 0.24, and 0.2 km/s, respectively, with peak differences of 0.45, 0.5, and 0.45 km/s. That the differences are a large fraction of the total range in Vs emphasizes the importance of the Sp constraint.
The vertical shear-velocity gradient within 50 km below the NVG (Fig 9f) has a tectonic affinity that is similar to the absolute velocities just below the NVG (Fig 9c). The correlation coefficient between these model characteristics is high (0.89), and nowhere in the study area do these two quantities deviate from this correlation outside of twice the standard error. This behavior differs from the crust and the shallow mantle layer, where correlations between absolute velocity and the gradient  are not always strong. The strong correlation between sub-NVG shear-wave velocity and the associated gradient could hypothetically be an artifact of our inversion procedure -the model is damped to the Vs in PREM at 400 km depth (4.75 km/s), and so overly damping the second derivative could force a correlation between absolute velocity and the average gradient. However, the set of randomized synthetic models discussed in Section 3.5 have no correlation between sub-NVG velocity and associated vertical gradient, and the inversion of the synthetic datasets produced models with a negligible correlation coefficient (0.05) between these model characteristics (see Supplementary Section S3). We conclude that the strong correlation in the inversion of the real dataset between velocity and the gradient of velocity is robust.

Impact of Modeling Choices
The inclusion of an NVG that explains Sp conversions is the primary difference between our study and previous shear-velocity models of the upper mantle in the western US (e.g. Shen and Ritzwoller, 2016;Porter et al., 2016;Xie et al., 2018). The inclusion of the NVG lowers Vs in the mantle just below the discontinuity, compared to models that vary smoothly with depth (Fig 6). We quantify this effect by performing the inversion without an NVG and associated Sp data in the modeling and find the difference between this new model and the preferred inversion at the depth of the base of the NVG (Fig 12a). Omitting the Sp constraints results in significantly higher velocities compared to the preferred model at all locations (Fig 12a), with the largest differ-ences (up to 0.4 km/s) falling within the BR. The mean effect of the Sp constraint is 0.16 km/s, which is nearly identical to the mean difference between our preferred model and Shen and Ritzwoller (2016). The spatial variation in these differences correlates strongly with the magnitude of the Sp-derived velocity contrast (Fig 3c), demonstrating the strong impact of these observations on the model. However, the difference is less well correlated with the modeled velocity beneath the NVG (Fig 9c), suggesting that the surface-wave phase velocities (Fig 2b) also play a significant role in constraining the minimum velocities reached beneath the NVG.
The incorporation of head-wave velocities (Fig 4) represents a second difference compared to prior models, and we test the impact of this choice by comparing the preferred inversion to one omitting these observations (Fig 12b). The use of the head-wave constraint systematically increases velocities just below the Moho over a wide swath of the study region. As suggested by Fig 6b, Pn constraints are accommodated by producing models with negative vertical gradients in the mantle lithosphere; the average velocity across this upperlithosphere layer is primarily controlled by the surfacewave data and remains largely unchanged between the preferred model and the model lacking Pn constraints. The difference between the models is not strongly correlated with the Pn constraints (Fig 4), and is largest where the crust is thick below the Rockies and over much of the Colorado Plateau. The effect is more muted over much of the Basin and Range and SNA. These differences likely reflect the correction that the head-wave constraint makes to the synthetic test in Fig 6b (see Supplemental Section S2 for further discussion). Finally, we make an important choice in the construction of the preferred inversion by assuming spatially variable widths of the NVG that are constrained by modeling Sp waveforms in Hopper and Fischer (2018). The widths of the discontinuities are only loosely constrained, ranging from 10 to 50 km ( Supplementary Figure S1), and the implied velocity contrasts depend on the width, becoming larger as the width of the boundary increases (Rychert et al., 2007). We test the effect of this choice by inverting for an alternative set of models that utilize NVG widths that are half of the value of the widths estimated by Hopper and Fischer (2018). The widths are bounded at a minimum of 10 km. The difference at the base of the NVG between a model using the half widths and our preferred inversion are shown in Fig 12c. The primary effect is to increase velocities by up to 0.3 km/s in several localized areas, with marginal difference in many locations. On a regional scale, the effect is greatest in the central Basin and Range and to the south-west of the rim of the Colorado Plateau, where the velocities in the preferred model are system-atically slower by 0.1-0.2 km/s compared to a model with a sharper NVG. Some of the most pronounced anomalies, such as beneath the Snake River Plain and the Marysvale volcanic fields, are unaffected by the change in the width, and may be driven more strongly by the constraints from surface waves than from receiver functions.

Discussion
The relationship between the NVG and the lithosphereasthenosphere system is not always straightforward. A common inference is that the NVG is the lithosphereasthenosphere boundary. Under this interpretation, the high velocity layer on the shallow side of the NVG is the lithosphere, which is cooler and possibly compositionally distinct from the underlying asthenosphere; in contrast to the lithosphere, the asthenosphere is hotter and may have additional reduction in velocities due to hydration or the presence of melt (e.g.  . This interpretive framework fails to explain NVGs in locations where the extension of high velocities to sufficiently great depths is inconsistent with warm asthenosphere below the discontinuity. When the NVG is thus within the lithosphere, the term "Mid-lithospheric Discontinuity" is commonly used and the cause must be different Ford et al., 2010). A change in the hydration of the upper mantle offers a universal mechanism for both LABs and MLDs (Olugboji et al., 2013;Karato et al., 2015), but competing possibilities include the metasomatism of the lithosphere (Hansen et al., 2015;Selway et al., 2015;Saha et al., 2021) or anisotropy (Wirth and Long, 2014;Ford et al., 2016). In some cases, the NVG may even lie within the convecting asthenosphere (Byrnes et al., 2015). A key difficulty when interpreting the NVG is that only the depth and contrast in velocity are typically known. In many places including in the Western United States, precisely where discontinuities transition from an LAB to an MLD is uncertain Lekić and Fischer, 2014;Hansen et al., 2015). The absolute velocity models presented in this study reduce the ambiguity in the interpretation of the NVG.
We use our preferred model for the region to evaluate the physical state of the lithosphere-asthenosphere system across the western US. The refined constraints on absolute shear velocity and associated gradients above and below the NVG are compared to those predicted for experimentally based solid-state models of an olivinedominated upper mantle. We find that the lithosphereasthenosphere system falls into one of three states: (1) regions where velocities below the NVG are too low to be explained by plausible solid-state models, requiring the presence of partial melt in the asthenosphere; (2) regions where melt is not required in the asthenosphere, but associated temperature estimates suggest that the NVG represents a lithosphere-asthenosphere boundary; and (3) regions where temperature estimates below the NVG imply that the NVG is within the thermal boundary layer, and thus an MLD.
To make predictions for the shear-wave velocity in a melt-free upper mantle, we use models based on two experimental deformation studies, as implemented in the Very Broadband Rheology (VBR) Calculator (Havlin et al., 2021). The first study, Jackson and Faul (2010), hereafter JF10, measured the shear modulus and dissipation in fine-grained, nominally melt-free olivine samples and provided a model for the velocity and attenuation of a shear-wave at seismic frequencies that depends on the temperature and grain-size of the upper mantle. The second study, Yamauchi and Takei (2016), hereafter YT16, proposed a model for the velocity and attenuation of shear-waves in the upper mantle that additionally depends on the melting temperature of the upper mantle. The measurements were made on an organic material that scales to upper mantle conditions when experimental frequencies are normalized by the Maxwell frequency (McCarthy et al., 2011). A "pre-melting" reduction in viscosity occurred in their experiments that causes YT16 to predict lower shear-wave velocities than JF10 at the same temperatures and grain-sizes where the temperature is near the solidus. We assume the as-thenosphere is at the solidus when using YT16, which will be the case if the asthenosphere in the Western United States features typical concentrations of either water or CO2 (Yamauchi and Takei, 2020). Havlin et al. (2021) provide a detailed comparison of JF10 and YT16 and their implementation in the VBR -in the terminology of the VBR, JF10 is eburgers_psp with the bg_peak fit, and YT16 is xfit_premelt.

Distribution of Partial Melt in the Asthenosphere
We evaluate whether the shear-wave velocities above and below the NVG are consistent with a melt-free or melt-bearing upper mantle. The presence of melt below an NVG can often explain contrasts in velocity too great to be explained by other means. Our results provide two pieces of information typically not available for testing this hypothesis: the absolute value of the shear-wave velocities at the NVG and the gradient in shear-wave velocity below the discontinuity.
To use the VBR to test the hypothesis that the mantle is melt-free, we first calculate shear-wave velocities for a range of potential temperatures and grain-sizes with both JF10 and YT16. Bayes's theorem is used to infer the probability that the observations can be explained by the predictions (see Havlin et al., 2021, for details), both of which are for a sub-solidus mantle. The a priori distribution of potential temperatures is Gaussian with a mean and standard deviation of 1400 and 75°C. These values encompass the range of potential temperatures inferred for the western United States at several sites of volcanism in previous studies within two standard deviations (i.e. Plank and Forsyth, 2016;Porter and Reid, 2021), neglecting higher temperatures that are possible at the Yellowstone hotspot. An adiabatic effect of 0.5°C /km converts from potential temperature to temperature. The prior distribution for grain-sizes is log-normal with a mean of 5 mm and a (unitless) standard deviation of 0.75. This is chosen to encompass plausible estimates of grain-sizes in the asthenosphere (Ave Lallemant et al., 1980;Karato and Wu, 1993;Behn et al., 2009), with a grain-size of 1 mm occurring at the approximate 95% lower bound of the prior, and a grain size of 1 cm occurring at the 95% upper bound. The calculations utilize a period of 100 s (appropriate for the asthenosphere), and an uncertainty of 0.08 km/s on Vs below the NVG (Table 1). We present the calculations without a correction for radial anisotropy even though our model only constrains V sv because current evidence suggests that (V sh/V sv) 2 is small. A (V sh/V sv) 2 of 1.04 (Clouzet et al., 2018) would correct a V sv of 4.0 km/s to 4.02 km/s, which is within the observational uncertainty.
The hypothesis that the upper mantle can be explained by JF10 and YT16 is rejected at 95% confidence across much of the study area (Fig 13a). JF10 and YT16 can both explain the observations without invoking the presence of melt down to shear-wave velocities of approximately 4.0 km/s, with slight deviations due to variations in the depth of the NVG (Fig 8b). The two models do not, in general, predict precisely the same Vs under the same conditions and the close agreement of the 95% limit for the two models occurs because they reach similar minimum Vs values at high-temperatures and small grain-sizes. Velocities below nearly the entire Basin and Range province, the Rio Grande Rift, and the CP transition zone cannot be explained by either model, and therefore likely require retained asthenospheric melt. The center and northern portions of the Colorado Plateau, the bulk of the Rocky Mountains, and SNA to the east all feature Vs consistent with a meltfree upper mantle. The melt-free hypothesis is rejected with greater confidence for more pronounced low velocities anomalies, with probabilities becoming as low of 10 -5 and 10 -3 for JF10 and YT16, respectively. Nearly all volcanism of age Pleistocene or younger in the NAV-DAT database (Glazner, 2004;Walker et al., 2004) lies where the melt-free hypothesis has been rejected (green circles in Fig 13a). The Leucite Hills (LH) in Wyoming is the only volcanic field to clearly lie outside of the confidence interval for both JF10 and YT16; the Raton-Clayton volcanic field (RCV) near 37 o N, 104 o W coincides with a slight divergence of the two models and lies near but outside of the confidence interval for YT16 and partly outside for JF10.
Within the region where a solid-state asthenosphere can be confidently rejected, we can utilize the shear velocity estimates to hypothesize variations in retained melt fraction. To do so, we find the difference in shearwave velocity between the observations (Fig 9c) and the 95% confidence interval for YT16, and use the model of Hammond and Humphreys (2000) (1% melt = 8% Vs reduction) to convert residual velocities to a melt fraction. We find that melt fractions below 1% across the entire study area are sufficient to explain the shear-wave velocities below the NVG (Fig 13b). Such melt fractions are in accord with the amount of melt that can plausibly be retained in the upper mantle without being rapidly extracted (Faul, 1997(Faul, , 2001. In detail, the effect of melt fraction on shear-wave velocity is uncertain (e.g. Holtzman, 2016;Chantel et al., 2016), due primarily to a strong dependence of velocity on the poorly known aspect ratio of melt inclusions. At higher aspect ratios than assumed in Hammond and Humphreys (2000) (e.g. Garapić et al., 2013), smaller melt fractions can explain our observations (Takei, 2002). The relative distribution of retained melt is robust if the geometry of the melt inclusions are constant across the study area, although variations in the inclusion aspect ratio are possible (Holtzman and Kendall, 2010).
The estimates of shear-velocity gradient below the NVG (Fig 13c) provide an additional test on the necessity of the presence of retained melt in the asthenosphere, and a possible constraint on melt distribution. We consider two hypotheses for the large positive slopes in Vs below the NVG: an increase in the grain-size of the upper mantle with increasing depth (Faul and Jackson, 2005), or a decrease in the melt fraction with increasing depth. For the former, we generated a suite of velocity profiles for increasing grain-sizes using both JF10 and YT16. Assuming a nominal asthenosphere temperature of 1400°C, we search over gradients in grain sizes of 0 to 333 mm/km (Faul and Jackson, 2005) and a mean grain size within the gradient zones of 1 mm to 1 cm. Models with grain sizes that go below 1 mm are not considered. Grain-size increases fail to produce the range of slopes in Vs observed in our models, with both JF10 and YT16 spanning only one-fourth to onehalf of the range of slopes observed in the study area (polygons in Fig 13c; see Supplemental Section S7 for the individual calculations). Note that these polygons do not account for variations in temperature, and so the range of predicted V s is more limited than found in the Bayesian test in Fig 13a. In contrast, assuming a reference model with a velocity of 4.25 km/s and a gradient of Figure 13 Caption on next page. 14 SEISMICA | volume 2.2 | 2023 Figure 13 Tests of the hypothesis that the upper mantle is melt-free. A) Shear-wave velocity below the NVG is contoured and identical to Fig 9c, 95%  2.2 (km/s)/km x 10 -3 (Gaherty et al., 1996;Stixrude and Lithgow-Bertelloni, 2005;Tan and Helmberger, 2007), including melt fractions just below the NVG from 0 to 1.5% that linearly taper to 0% over 50 km depth can explain the full range of Vs and the vertical gradient in Vs to within error (Fig 13c). This distribution is qualitatively consistent with melt production in the 120-150 km depth range in a hydrated (Katz et al., 2003) and/or carbonated  asthenosphere, accompanied by an upward migration and systematic accumulation of melt between the initiation depth and the base of the thermally controlled lithosphere. The latter is consistent with the accumulation depth of mafic melts from the region (Plank and Forsyth, 2016;Porter and Reid, 2021). In detail, the intrinsic sensitivity of the surface-wave constraint limits our ability to precisely define the depth extent of the melt-bearing zone (Supplemental Section S8).
The inferred distribution of partial melt is broadly in agreement with previous estimates of melt distribution in the region. Porter and Reid (2021) combine a smooth seismic-derived thermal model for the North America upper mantle with an assumed set of peridotite solidi to map out regions of likely partial melting in the asthenosphere. They find peaks in likely melting along the southwest and northwest margins of the Colorado Plateau transition zone and beneath the Snake River Plain that closely correspond to peaks in melt content shown here (Fig 13b). Our melt distribution is spatially more extensive, wrapping around the Colorado Plateau with significant melting beneath the northern Rio Grande Rift and southern Rockies; this difference most likely reflects the lower velocities that can be achieved in our discontinuous model compared to smooth surface-wave models. Debayle et al. (2020) combine shear-velocity and attenuation models with experimental constraints (YT16) to estimate melt content on a global scale. While they cannot resolve the regional variations evaluated here, they infer asthenosphere melt contents beneath the western US very similar to those found here (up to 0.7% over the entire region), as high as any other region in their model.
The melt distribution (Fig 13b) is not highly correlated with lithospheric thickness variations (Fig 8b); in particular, the shallowest depths to the NVG do not generally correlate with peaks melt content that might suggest the ponding of melt at the base of the lithosphere, as likely occurs in oceanic environments (Mehouachi and Singh, 2018;Sparks and Parmentier, 1991). Instead, melt is concentrated either along strong gradients in lithospheric thickness (e.g the CP transition zone), or in the broader Snake River Plain region. This suggests that thermal variations in the asthenosphere associated with small-scale and/or edge-driven convection (Schmandt and Humphreys, 2010;van Wijk et al., 2010;Ballmer et al., 2015) control melt accumulation, rather than topography on the base of the lithosphere (Golos and Fischer, 2022).
Our quantification of melt distribution omits the possibility that hydration (or other volatile-induced weakening) provides a plausible interpretation of shear velocities too low to be explained by solid-state mechanisms (e.g. Karato and Jung, 1998;Karato et al., 2015;Olugboji et al., 2013;Ma et al., 2020). Hydration is typically invoked to explain modest reductions in shear velocities, with minimum velocities in the range of 4.0-4.2 km/s, often in conjunction with additional constraints such as boundary sharpness (e.g. Gaherty et al., 1996;Mark et al., 2021) or shear attenuation (Ma et al., 2020). Our interpreted melt distribution displays shear velocities <4.0 km/s, which almost certainly requires a contribution of melt, and the hydration hypothesis does not provide an explanation for the large slopes in Vs below the NVG. Hydration or other volatiles may be important in explaining the NVG at more moderate asthenosphere velocities.

Interpreting the NVG -LAB, MLD, or something else?
The dominant mechanism controlling the state of the lithosphere-asthenosphere system (including the distribution of melt) is temperature. While the temperature associated with the LAB is depth dependent and not uniquely defined, most studies place the base of the lithospheric thermal boundary layer in the range of 1350-1450 o C (Priestley and McKenzie, 2006;Fishwick, 2010;Priestley and McKenzie, 2013;Porter and Reid, 2021), with higher temperatures clearly corresponding to convecting asthenosphere (e.g. Sarafian et al., 2017). We utilize the VBR to estimate temperature both above and below the NVG (Fig 14), with a goal of evaluating where the discontinuity represents the LAB and where it more like represents an MLD. In both cases, two estimates are made by fixing the grain size to 1 and 5 mm, and searching for the best-fitting temperature returned by JF10 with the VBR (Havlin et al., 2021). When estimating temperature below the NVG, we mask regions where we inferred retained melt in the previous sec-

Figure 14
Estimate of the temperature in the mantle at fixed grain sizes. Estimates are for below and above the NVG in the left-and right-hand columns, respectively, and at grain sizes of 1 mm and 5 mm in the top and bottom rows, respectively. The dashed-line in the right-hand column approximately marks the boundary between the LAB and MLD. The Leucite Hills (LH) and Raton-Clayon volcanic (RCV) fields are shown in purple and yellow in all panels.
tion (Fig 14a,c); inferred temperatures in these regions (>1500°C) are well over expected solidus temperatures (e.g. Sarafian et al., 2017), and masking them allows for a clearer evaluation of likely temperatures where melt is not required. The results show that the uncertainty in grain size introduces approximately ±50°C of uncertainty into the estimates, with higher temperatures inferred at larger grain sizes. The two volcanic fields above regions where melting in the asthenosphere was not inferred (the Leucite Hills and Raton-Clayton) are individually marked. The "LAB" interpretation likely applies across more of the study area than where we inferred melt in the previous section. The boundary between the LAB and MLD is marked in 14a,c approximately follows the 1300°C contour but should be interpreted as a semi-quantitative estimate. Below the Colorado Plateau, sub-NVG temperatures are within the range for asthenosphere, and the estimated temperature exceeds the volatile-free and water-bearing solidus (1490 and 1447°C, respectively, at a depth of 95 km) at both grain sizes tested (Hirschmann, 2000;Katz et al., 2003). High temperatures extend beneath much of the Rocky Mountains and across the borders of the Wyoming Craton and SNA, with a relatively broad region of higher temperatures near the RCV. The mantle below the Black Hills is likely an LAB regardless of the grain size. The Bayesian test in Section 5.1 does not exclude the possibility that there is melt beneath the NVG in these high-temperature regions. However, even if melt is not present, the disconti-nuity can be plausibly interpreted as an LAB in these regions by inferring temperatures typical of the asthenosphere below an inferred thermal boundary layer.
In the regions where we inferred melt below the NVG (masked in Fig 14a,c), the temperatures above the discontinuity (Fig 14b,d) are typically sub-adiabatic (that is, below a 1350°C adiabat). This conforms well to the hypothesis of a lithosphere-asthenosphere boundary, in that the NVG can be ascribed to the base of a thermal boundary layer with melt in the deeper asthenosphere. Thus, we confidently identify most regions that are masked in Fig 14a,c as LABs. In detail, a few locations within this zone (Snake River Plain, Marysvale volcanic field, Rio Grande Rift) have inferred temperatures above the NVG that are higher than expected for the lithosphere. This is similar to other temperature estimates for the region (Porter et al., 2019;Porter and Reid, 2021), and suggests that the distinction between lithosphere and asthenosphere in these regions is arbitrary. We speculate that the NVG in these locations could reflect an increase in the mobility of basaltic melt as depth decreases (Sakamaki et al., 2013), so that melt ponds within the asthenosphere at the depth where the NVG is observed. Such regions must have thinner lithosphere than marked by the NVG as we consider such high-temperature regions to be asthenosphere by definition.
In SNA on the eastern edge of our study area, our estimates of temperature below the NVG are clearly lower than a plausible potential temperature for the convecting asthenosphere by up to hundreds of degrees in some locations. Low temperatures extend beneath the Wyoming Craton as far south as 41°N and includes the region of the Leucite Hills (Fig 14a,c). Broadly, regions with shear-wave velocities exceeding 4.4 km/s below the discontinuity can be confidently ascribed to an MLD, with confidence increasing with increasing velocity. Whenever this condition is met, temperatures above and below the NVG are estimated to be below 1000°C (Fig 14b,d), with much of the great plains region characterized by temperatures at the MLD that are typical of cratons (<800°C). These temperature estimates provide important constraints on the plausible mechanisms producing the MLD, including crystallized metasomatic products (Hansen et al., 2015;Selway et al., 2015;Saha et al., 2021), and/or changes in intracrystalline deformation processes (Karato et al., 2015). We explore the implications of these constraints for the Lucite Hills region in the next section.

Relationship between NVGs and recent intraplate volcanism
Nearly all intraplate volcanism of Pleistocene or younger age occurred within the region where we inferred melt must be present beneath the NVG. Broadly, the volcanism in the western United States is sourced by asthenospheric melts at ambient to elevated temperatures, with compositions ranging from primitive to evolved (Fig 15a,b). Compositions are from NAVDAT (Glazner, 2004;Walker et al., 2004) with data from Mirnejad and Bell (2006) for the Leucite Hills included.
Relating the petrology of each of these eruptions to the upper mantle structure inferred here is beyond the scope of this study, but to first order an LAB at ∼70 km depth above a melt-bearing asthenosphere is consistent with petrologically inferred depths of magma generation (Golos and Fischer, 2022;Plank and Forsyth, 2016;Porter and Reid, 2021). Of the two volcanic fields that fall outside of the region where the Bayesian test in Section 5.1 required the presence of melt, the RCV is characterized by temperatures below the NVG (~1450 o C) that are near a peridotite solidus and exhibits compositions (yellow dots in Fig 15a,b) that fall along the bimodal trend for the rest of the volcanism (black dots in Fig 15a,b). Thus, some unique mechanism for explaining volcanism at the RCV is not required.
The Leucite Hills, in contrast, are both seismically and petrologically unique. First, the LH lie above an MLD, with temperature conditions well below the peridotite solidus. Second, looking at the LH petrologically, samples from the LH do not fall along the bimodal trend observed in the western United States because of a strong enrichment in potassium at a given SiO 2 (Fig 15a), and low Na 2 O/K 2 O and Al 2 O 3 /TiO 2 ratios (Fig 15b). Both of these observations appear consistent with metasomatism of the low-temperature lithosphere. Ultrapotassic compositions (Foley et al., 1987) are often explained either by the melting of recycled oceanic crust and possible reaction with surrounding peridotite (Dasgupta et al., 2007;Mallik and Dasgupta, 2013), or metasomatized veins within the lithosphere (Foley, 1992;Pilet, 2015). We do not consider the melting of recycled oceanic crust because the temperatures beneath the LH (Fig 14) are too low and recycled oceanic crust cannot explain the unique trace element profile in the LH (Fig 15b) (see Pilet, 2015, for a discussion). As discussed in the previous section, MLDs in general can also be explained by metamosomatic compositions in the lower lithosphere (e.g. Selway et al., 2015). How the lithosphere becomes metasomatized is not perfectly understood and beyond the scope of this study.
Metasomatic enrichment of the lithosphere both lowers the solidus and seismic velocity of the mantle, and so provides an explanation for the unique volcanism at the LH and the presence of an MLD. Both seismic and petrologic studies have suggested amphibole (Pilet, 2015;Pilet et al., 2008;Saha et al., 2021;Selway et al., 2015) and phlogopite (Hansen et al., 2015) as the active metasomatic phase that explains MLDs. Several lines of evidence support phlogopite for the location in question. The depth of the MLD beneath the Wyoming Craton (∼85 km) is very close to the maximum depth of stability for amphibole (Frost, 2006;Hansen et al., 2015) and the temperature below the boundary ( 1250 o C) is below the amphibole solidus (Pilet et al., 2008) and is thus unlikely to produce the LH melts. In contrast, phlogopite stability extends below the observed MLD (Frost, 2006), and the solidus at this depth (<1175°C; Thibault et al., 1992) implies that melt can be produced at the seismically inferred temperature. The composition of the magmas erupted at the LH also do not overlap with the experimentally measured composition of amphibole melts (Pilet et al., 2008) and are better fit by the composition produced by the melting of phlogopite (Thibault et al., 1992) in both major (Fig 15a) and trace element spaces (Fig 15a,b). The velocity contrast across the MLD in this region may be caused directly by the low velocity of phlogopite (2.47 km/s, which is for a temperature and pressure of 1175°C and 3 GPa; Hacker and Abers, 2004), but other factors such as a melt phase could contribute as well.

Conclusions
The inclusion of an NVG into the parameterization of seismic velocity profiles allows for the construction of models for shear-wave velocity across the Western United States that can simultaneously explain observations of Rayleigh wave phase velocities from short to long periods, P-to-s conversions from the Moho, S-top conversions from an NVG in the upper mantle, and Pn velocities. The resulting models allow for several advances in our understanding in the physical state of the upper mantle in this region: 1) The shear-wave velocity below the NVG is too low to be explained by the current generation of experimentally based predictions for shear-wave velocity in the upper mantle without invoking the presence of partial melt.
2) The shear-wave velocity below the NVG is strongly correlated with the slope of the velocity profile. As above, the large slopes cannot be explained without invoking the presence of melt in the upper mantle. Linearly tapering melt fractions from a maximum below the NVG to zero percent at 50 km deeper depth can explain both observations.
3) At nearly all locations where we infer the presence of melt in the upper mantle, the NVG can be interpreted as an LAB due to sufficiently high velocities above the discontinuity, and asthenospheric velocities below the discontinuity. 4) Beneath the Wyoming Craton and much of stable North America, Vs and associated temperature estimates below the NVG are too high to represent the asthenosphere and an MLD is inferred instead.
5) The presence of phlogopite in the upper mantle beneath the Leucite Hills can explain the presence of an MLD.
The inversion algorithm presented here provides a flexible and efficient platform for jointly inverting discontinuity constraints from scattered-wave imaging with velocity constraints from surface-wave phase velocities, at a variety of spatial and depth scales. The structures are best resolved when the discontinuity constraints include both depth and velocity-contrast information, and we encourage scattered-wave imaging analyses to document not only timing information but amplitude as well.

Appendix -Joint Inversion methodology 8.1 Inverse approach
We define the model to be solved for as where s is a vector of vertically polarized shear wave velocities (Vsv) defined at fixed depths, and t is a vector of depths to abrupt boundaries within the model, in this application corresponding to the tops of the Moho and the NVG. To solve for a model of this parameterization, we follow the framework of Russell et al. (2019) and Menke (2012), iterating over a linearized least-squares inversion to minimize the misfit between our predicted and observed values, δo, by making changes to the model parameters, p. Given a matrix of the partial derivatives of our observed values with respect to our model parameters, G, we have the following equation in matrix form: (3) G(p − p o ) = δo which can be rearranged to (4) Gp = δo + Gp o As we are now multiplying G by the model, p, rather than the model perturbation, we add linear constraint equations that are applied directly to the model. Following Menke (2012), where W e is a diagonal matrix of the uncertainties in the observations, i.e. 1 σ 2 , and W d is a diagonal matrix with the damping parameters for the constraint equations. The dampening constraints minimize the second derivative of the model, expressed by the matrix H, within each geologically defined layer (i.e. within the crust, above the NVG, and below the NVG). Smoothing constraints are not applied across the boundary layers so that the dampening is considered separately for each geological layer. The weight given to dampening parameters is placed along the diagonal of the matrix W d , with a value of 1, 2, and 4 used for the three layers, respectively, for all inversions of both real and synthetic data shown in this study. Once F is known, the Gauss-Newton least squares solution is and all that remains to be defined is the forward problem that predicts observations for a given model along with their partial derivatives.

The forward problem
We calculate phase velocities and associated partialderivative kernels using the spherical-earth normalmode solver MINEOS (Masters et al., 2011). We construct an input model for MINEOS by linearly interpolating velocities at depth (radius) intervals of 2 km between each node defined in s from the surface to 410 km depth. Below 410 km, we extend the model to the center of the Earth with PREM. The MINEOS model is parameterized to allow for radial anisotropy, incorporating independently defined values for P and S velocities in the vertical and horizontal directions, an anisotropic shape factor η, density, and shear and bulk attenuation. Because our Rayleigh-wave and Ps and Sp datasets have little sensitivity to the horizontal velocities, only the vertically polarized S-wave velocity (V sv) is independently varied in the inversion. We constrain V sh = V sv, V ph = V pv, and η is set to 1. Pwave velocities are scaled to the S-wave velocities using a V p/V s ratio of 1.76; the Vp/Vs ratios from  were already accounted for when calculating travel times to the Moho (Section 3.2), and tests including these values in the forward problem instead did not change the results. Phase velocities are corrected for physical dispersion based on a PREM Q model with a reference frequency of 35 mHz (Kanamori and Anderson, 1977;Dziewonski and Anderson, 1981). Forward calculations of receiver function travel times, the contrasts in V sv, and head wave velocities are direct given a velocity model. The velocity contrast is defined as the percentage change in shear velocity across the boundary layer relative to the velocity in the upper layer.

Dependence of phase velocities on the model
The sensitivity kernels for phase velocity at each period with respect to elastic parameters (P and S velocities) as a function of depth are straightforward to calculate using a normal mode formalism. Here we provide the mapping between mode-based partial derivative kernels for a smooth, finely sampled model space, and partial derivatives for our parameterization of relatively coarsely sampled values of velocity separated by abrupt discontinuities in velocity of a finite thickness ([s, t]). These partials are connected by the chain rule, where c is the phase velocity, p is an element of the parameterized inversion model [s, t], and vsv is an element of vertically polarized shear velocity in the MI-NEOS model. The first term on the right side of Equation 9, ∂c ∂vsv , is thus the existing partial with respect to the finely sampled MINEOS model, and the left side is the kernel that we seek. The final term, ∂vsv ∂p , describes the perturbation to a MINEOS model parameter given a change in the inversion model, and we analytically define these here. We use the V sv structure to demonstrate the relationship between dvsv and dp, but similar relationships can be expressed for other scaled and/or free parameters (e.g. V pv, V sh) utilized in the inversion. We first describe the dependence of elements in vsv for the velocities in s before giving the dependence for the thicknesses, t. The process is described graphically Fig 5. The depth vector, z, that gives the depth for each element in s is coarser and does not necessarily intersect with the regularly spaced MINEOS depth vector, d, and so velocities must be linearly interpolated between elements of s. Any change to any value in s at depth z i , s i ≡ s(z i ), will have non-zero impacts on vsv only where z i−1 < d < z i+1 , with two analytical forms for locations above and below the depth of the perturbation.
For any vsv points between z i−1 and z i , called vsv a in Fig 5 under "Dependence on s", Similarly for any vsv points between z i and z i+1 , called vsv b in Fig 5 under "Dependence on s", ∂ vsv(d) The remainder of the inversion model, p, are parameters controlling the depth to the top of each discontinuity. Because we define the coarse model s(z) to have a node corresponding to the top of each discontinuity, changes in the depth of a discontinuity directly correspond to changes in the thickness of the layer immediately above the discontinuity, which we define as the parameter t k , where k corresponds to the discontinuity in question. If z i is the depth to the top of the k th boundary layer, t k = z i − z i−1 . The width of the boundary layers, w, are fixed for any given inversion, but it is convenient to track these widths as w k = z i+1 − z i . The change in t k is balanced by a change in thickness of equal magnitude but opposite sign in the layer below the base of the discontinuity, i.e. z i+2 − z i+1 . As such, any change to any value in t, t k , will thus have non-zero impacts on vsv and z only where z i−1 < d < z i+2 . Using the above definitions for t k and w k , we define three expressions for the sensitivity of vsv to t k above, within, and below the discontinuity, respectively.
For any vsv points between z i−1 and z i (i.e. above the discontinuity), called vsv a in Fig 5 under "Dependence on t", For any vsv points within the discontinuity between z i and z i+1 , called vsv b in Fig 5 under "Dependence on t", For vsv points below the discontinuity between z i+1 and z i+2 , called vsv c in 5 under "Dependence on t",

Dependence of receiver function observations on the model
We have two kinds of observations from receiver functions: velocity contrasts across the boundary layers and travel times to the boundary layers. In the following discussion, we notate the k th boundary layer as extending from z i to z i+1 in depth, with velocity s i at the top and s i+1 at the base. Velocity contrast for the k th boundary layer, dV k , is only a function of the velocity above and below the boundary layer Travel time is a function of all s and t defined at z ≤ z i+1 , assuming that the converted wave energy originates on average in the center of the boundary layer.
For any s points shallower than z i , For others that will affect the calculated travel time, For any t shallower than z i , where s h−1 is the velocity at the top of the layer and s h is the velocity at the base of the layer of thickness t j , SEISMICA | volume 2.2 | 2023

Dependence of head wave observations on the model
The velocity of the Pn phase constraints the model below the Moho. The partial derivative of the predicted head wave velocity, HW v , with the shear velocity below the moho, vsv M , is given by