Homogenizing instrumental earthquake catalogs – a case study around the Dead Sea Transform Fault Zone

The creation of a homogenized earthquake catalog is a fundamental step in seismic hazard analysis. The homogenization procedure, however, is complex and requires a good understanding of the heterogeneities among the available bulletins. Common events within the bulletins have to be identified and assigned the most suitable origin time and location solution, while all the events have to be harmonized into a single magnitude scale. This process entails several decision variables that are usually defined using qualitative measures or expert opinion, without a clear exploration of the associated uncertainties. To address this issue, we present an automated and data-driven workflow that defines spatio-temporal margins within which duplicate events fall and converts the various reported magnitudes into a common scale. Special attention has been paid to the fitted functional form and the validity range of the derived magnitude conversion relations. The proposed methodology has been successfully applied to a wide region around the Dead Sea Transform Fault Zone (27N-36N, 31E-39E), with input data from various sources such as the International Seismological Centre and the Geophysical Institute of Israel. The produced public catalog contains more than 5500 events, between 1900 and 2017, with moment magnitude Mw above 3. The MATLAB/Python scripts used in this study are also available.


Introduction
An earthquake catalog is a parametric list of events, with each entry providing an earthquake's epicenter, origin time, and magnitude, and sometimes additional data such as depth, associated uncertainties, and focal mechanism information (Woessner et al., 2010). In an instrumental catalog these properties have been computed by analyzing seismic recordings, either analog or digital. In many cases they form the principal datasets from which seismologists interpret the earthquake process and build statistical forecasting models (e.g. Sesetyan et al., 2018). Earthquake catalogs that span many decades are usually inherently heterogeneous. From the early days of (pre-)instrumental seismology, at the beginning of the twentieth century, seismological networks have undergone many changes that are reflected in the databases in use today. These changes can be gradual, such as improvements in location and magnitude estimation over time, as networks gradually increase in size and advances in instrumentation enhance the signal-to-noise ratio in the seismological record. They can also be rapid, such as a systematic change in operating, recording and processing procedures (Husen and Hardebeck, 2010).
Even without network changes, however, discrepancies should be expected due to the sensitivity of the models used to derive parametric information from seismic records. Such procedures employ (i) signal processing techniques, (ii) phase picking algorithms, (iii) subsurface velocity models, (iv) calibration of the instruments, and (v) calibration of the attenuation model used to reconcile observations at different distances (Gomberg et al., 1990; Douglas, 1967). Steps (i)-(iv) affect both origin time and location (epicenter and hypocenter) (Waldhauser and Ellsworth, 2000; Kagan, 2003), while (v) is often coupled to the earthquake's magnitude, presenting a formidable inversion problem (Tormann et al., 2010). As a result, refinements over time, either in the models or in the technology used, can lead to significant differences in the output. Large discrepancies are also observed because the models and techniques used in steps (i)-(v) are network-specific and often not standardized (Bormann and Saul, 2008).
Heterogeneity among earthquake catalogs leads to significant data contamination and misinterpretation of results in a number of analyses, such as seismicity rate evaluation and hazard assessment (Musson, 2012). These problems are more evident when the analysis needs to include data from periods with durations on the order of a century and from regions where seismic network coverage is sparse. Our case study region, the Dead Sea Transform Fault Zone (DSTFZ), lacked local seismic networks for a long time (until 1983), although it comprises one of the most rapidly deforming non-subduction regions worldwide (Garfunkel et al., 1981). This is why the scope of the present study is to present a framework for merging and homogenizing multiple instrumental earthquake catalogs. We developed automated data-driven methods to minimize the need for expert opinion, which is inherently subjective. Specifically, using as sole input the available parametric catalogs (§3), the procedure can generate models to convert the various reported magnitudes into a common scale (§4) and to define the spatio-temporal margins within which duplicate events fall (§5). The application of these models leads to a unified instrumental earthquake catalog containing only unique events with standardized parametric information (§6). Similar efforts, with variations in the methodology, have been made in the past for Italy (Rovida et al., 2020), Lebanon (Brax et al., 2019), Ecuador (Beauval et al., 2013), the Middle East (Zare et al., 2014), South Asia (Nath et al., 2016), Europe (Gruenthal and Wahlstroem, 2012) and for global large-magnitude events (Storchak et al., 2015), to name a few. Compared to such past efforts, some of the key improvements presented here relate to data-driven workflows that group similar magnitude types to address data scarcity, define saturation levels for various functional forms, and select time-dependent spatio-temporal windows for the removal of duplicated entries
by utilizing metadata of the International Seismological Centre (ISC). Our investigated area (27N-36N and 31E-39E) is meant to match the boundaries of the latest regional historical catalog (Figure 1, Grigoratos et al., 2020). The two catalogs combined can serve as valuable input to probabilistic seismic hazard assessment (PSHA) studies in the region. The latter can be paired with existing exposure (e.g. Grigoratos et al., 2018) and vulnerability models (e.g. Grigoratos et al., 2016; Rodriquez et al., 2018; Cerchiello et al., 2018; Meo et al., 2018) that are available for some parts of the DSTFZ.

Seismotectonic setting
The DSTFZ is the main expression of the relative movement of the European, Arabian and African plates. It consists of a sequence of left-lateral transform faults (Figure 1) connecting the spreading oceanic ridge of the Red Sea in the south with the compressional deformation zones of the Arabia-Eurasia collision zone in the north (Garfunkel et al., 1981). Although the large majority of earthquakes occur at a depth range between 10 and 20 km, the total seismogenic thickness beneath the midpoint of the DSTFZ is about 28 km (Aldersons and Ben-Avraham, 2014). Global Positioning System (GPS) measurements indicate significant crustal motion, with slip rates of about 4-5 mm/yr for the whole DSTFZ, perhaps somewhat larger in the northern parts and smaller further south (Marco and Klinger, 2014; Ambraseys, 2009).
In recent times, the DSTFZ has been apparently quiescent; excluding the large earthquake of November 22, 1995 (Ms 7.1) in the Gulf of Aqaba, only one mainshock of Ms 6.0 or larger has occurred during the past century, on July 11, 1927 (Ambraseys, 2001). The frequency of large earthquakes in the last 2000-3000 years, however, is quite different (Grigoratos et al., 2020), with the majority of historical earthquakes rupturing fault segments above the 31st parallel north (Figure 1).

Figure 1
Historical earthquakes with Mw ≥ 5 between 31 BC and 1900, inside our investigated zone (Grigoratos et al., 2020). The black lines indicate main faults along the DSTFZ.

Input datasets
Our goal was to present a unified catalog containing unique events with Mw ≥ 3. To arrive at that point after magnitude homogenization, we initially used a scale-independent cut-off magnitude of 2. The catalogs and bulletins we used as input sources are described below and are summarized in Table 1.
The International Seismological Centre (ISC) was established in 1964 as the successor to the International Seismological Summary. It collects and standardizes raw and parametric seismic data from about 130 networks worldwide (ISC, 2023). With the exception of Africa, Central Asia, and SE Asia, the reporting instrumental agencies in most parts of the world are contributing members of ISC (Willemann and Storchak, 2001). ISC re-analyzes all events above magnitude 3.5 and often assigns new epicenters (Bondar and Storchak, 2011) and/or magnitudes (Di Giacomo and Storchak, 2015), using all the available raw data (Storchak et al., 2017). The ISC Bulletin was a fundamental source of data for this study.
The ISC-GEM catalog (Storchak et al., 2015) is a major step forward compared to previously available sources of information. Version 4 of the catalog includes around 27,000 global earthquake epicenters and hypocenters between 1900 and 2013, recomputed using the original arrival time data and the same technique and velocity model throughout. Where possible, earthquake magnitudes are expressed using the Mw scale based on seismic moment; proxy Mw values are estimated for the other cases based on newly developed empirical relationships with Ms and mb (Di Giacomo et al., 2015). Uncertainties around the parametric information are estimated using uniform techniques (Storchak et al., 2015). The cut-off magnitude thresholds (Ms) for Version 4 were 7.5 after 1900, 6.25 after 1918, and 5.5 after 1920 (newer versions have lower thresholds).
The European-Mediterranean Seismological Centre (EMSC) has collected real-time parametric data (source parameters and phase pickings) since 1998, from about 70 seismological networks in more than 50 Euro-Mediterranean countries (Godey et al., 2009). We herein refer to this bulletin as CSEM, following the naming scheme of the ISC Bulletin (http://www.isc.ac.uk/iscbulletin/agencies/). The online bulletin provides events only after mid-2004. Epicentral relocations are performed for all events (Godey et al., 2006). When amplitude/period information is available (provided by 50% of the contributing networks), original body wave and local magnitudes are computed by EMSC (Godey et al., 2013); otherwise, the reported magnitude values are taken from the contributing networks. We should note that the exact source of the magnitude estimates is not cited in the online bulletin. Since 2006, EMSC has been integrated into the ISC Bulletin.
The Geophysical Institute of Israel (GII, formerly IPRG) processes the seismic data collected by the Israel Seismic Network (operating since 1983), which contains about 20 stations in and around Israel. The geographic region covered by this bulletin lies within the boundaries 27N-36N and 32E-38E (Feldman). We used GII's relocations as the preferred parametric information for the events that were reported in the study of Wetzler and Kurzon (2016). We were unable to find original catalogs from other regional networks.
Within the framework of the "Earthquake Model of the Middle East Region" (EMME) project (Danciu et al., 2017; Sesetyan et al., 2018), Zare et al. (2014) published a catalog up to 2006 for the Middle East, compiling parametric information provided by past studies and bulletins (mainly ISC). It provides origin time, epicentral coordinates and magnitudes, homogenized in Mw. For several events, the source of the original magnitude estimate remains unclear, due to limitations in the original national data.
The European-Mediterranean earthquake catalogue (EMEC; Gruenthal and Wahlstroem, 2012) is an extension of the CEntral, Northern and northwestern European earthquake Catalogue (CENEC) (Grünthal et al., 2009), covering Europe and the Mediterranean Sea until 2006. Like EMME, it compiled parametric information provided by past studies and bulletins. For the eastern Mediterranean and the Levant area, the vast majority of the events originate from Papaioannou (2001) and Abdallah et al. (2004).
The Incorporated Research Institutions for Seismology (IRIS) has an online bulletin that collects (without further review) parametric data from the ISC Bulletin and the National Earthquake Information Center (NEIC) (Trabant et al., 2012). The IRIS Earthquake Browser (IEB) is the main expression of this bulletin, but it provides neither the source of each entry nor the scale of each magnitude value. Therefore, we did not make use of IEB. Instead, we used another IRIS tool called SeismiQuery, which does not present such drawbacks. Unfortunately, SeismiQuery was shut down in January 2017.
In principle, using the ISC Bulletin as the only source of instrumental seismicity should be sufficient, since the other available sources are either incorporated in, or based on, ISC. However, for reasons that might have to do with the reviewing procedure, some events published in the bulletin of a local agency are not reported by the ISC Bulletin, even though the local agency in question is a contributing member. The opposite scenario is also possible, i.e. the ISC Bulletin lists an event solution citing a local contributing agency that does not report the same event in its own bulletin. The same is true for other international bulletins such as EMSC or IRIS: even though they share most of their contributing agencies with the ISC Bulletin, in some cases their reported events do not match. As a result, for specific cases including our case study region, one has to consider several international bulletins and local agencies, even if they are closely related to the ISC Bulletin. Furthermore, regional networks with localized velocity models can provide better estimates for the epicenter and depth of an event.
The ISC Bulletin and GII are the only sources still reporting more than one magnitude solution per event, thus enabling correlations between different magnitude scales and, in the case of ISC, between agencies. The ISC Bulletin is also the only source providing numerically quantified uncertainty on the time, magnitude and location solutions.

Magnitude scales and their limitations
Numerous different magnitude scales have been proposed through time (Kanamori, 1983; Lay and Wallace, 1995), each based on a different analyzed property of the recorded earthquake signal and with a different applicability. Defining the most suitable magnitude scale for all purposes is generally not possible, as it depends on the practical needs, and it may vary considerably between different regions and seismic networks. Often a single network reports multiple magnitude scales for different event sizes and occasionally for the same event; the latter case enables empirical correlations between scales to be established. Although the different magnitude scales were defined so that they would behave overall similarly within certain magnitude ranges (Gutenberg and Richter, 1956), there can still be considerable variation between estimates for a single event. This heterogeneity may produce artifacts in the magnitude-frequency statistics of the unified catalog (Tormann et al., 2010).
The most commonly used class of magnitude scales, following Richter's original formulation for the local magnitude scale (Richter, 1935), is based on the logarithm of the amplitude of the recorded seismic waves (Deichmann, 2006). The local magnitude (ML) is arbitrarily defined based on the maximum observed amplitude on a Wood-Anderson seismometer, with a period of 0.8 s, recorded at 100 km from the earthquake. In practice, however, the recording distance is never exactly 100 km, and region-dependent corrections must be made to account for amplitude changes with distance due to anelastic attenuation and geometrical spreading. Station corrections are also needed, to account for site effects. Further corrections must be made for recordings from instruments other than the standard Wood-Anderson, which is practically not used anymore. Because of these constraints, ML is most suitable for earthquakes at moderate distances (Luckett et al., 2018), with magnitudes between 3 and 5 (Hanks and Boore, 1984).
Other scales are based on the logarithm of the amplitude of a particular phase. The most common are the body wave magnitude, mB (Gutenberg, 1945a), based on body waves with periods of 1-10 s, and the surface wave magnitude, Ms (Gutenberg, 1945b), based on 20 s surface waves (Gutenberg and Richter, 1956). These magnitude scales are used mostly for teleseismic (global) earthquakes. Ms strictly measured around 20 s is not appropriate for magnitudes greater than 7 or 8 (Di Giacomo et al., 2015; Bormann et al., 2009), because the amplitude of 20 s period waves does not increase as the rupture length increases beyond 60 km (Kanamori, 1978). However, broadband Ms suffers less from saturation and deviates from Mw only for tsunami earthquakes (Kanamori, 1972) and ~M9 earthquakes (Di Giacomo and Storchak, 2022). The mB magnitude was progressively replaced by many observatories of the "western world" with the 1 s period body wave magnitude mb (Bormann and Saul, 2008). Although mb worked well for the purpose of monitoring nuclear tests, its qualities as far as moderate-to-large earthquakes are concerned were inferior to those of the original mB (Storchak et al., 2015). The short-period mb usually presents extensive scatter (Di Giacomo et al., 2015) and performs best for distant small-to-moderate earthquakes, with magnitudes between 4 and 6 (Gasperini et al., 2013).
One magnitude scale that is independent of amplitude is the coda duration magnitude, which is based solely on the duration of the seismic signal. Common notations found in the literature include Md, MD and MC. The coda duration magnitude is intended for locally recorded events, where the various reflected and refracted phases are not well separated and instead form a prolonged coda following the initial phase arrivals. The amplitude of the coda diminishes as the reflected and refracted phases attenuate; the larger the initial waves, the longer the duration of the observable coda. It is thus sensitive to the signal-to-noise ratio (Del Pezzo et al., 2003). Although this magnitude scale requires no amplitude calibration, it does require empirical calibration of event durations, as well as corrections for distance and event depth. Most agencies have calibrated these parameters so that their product matches the observed values of ML (Eaton, 1992). As a result, Md and ML values are often heavily correlated. One potential artifact is that coda duration magnitudes may be biased towards larger magnitudes during aftershock sequences or other times of intense seismicity, as additional earthquakes may occur within the coda of the first event and lengthen it.
The moment magnitude scale, Mw (Hanks and Kanamori, 1979), is based on the logarithm of the seismic moment (M0), which can be directly derived by fitting a double-couple moment tensor solution to the recorded earthquake waveforms (Dziewonski et al., 1981), rather than from just the single amplitude of a particular phase at a particular frequency. Alternatively, for well-recorded earthquakes, the moment can be estimated from a finite source model of the earthquake. Because of that, Mw lacks saturation effects, which makes it the most suitable scale for many practical applications. The moment magnitude is the standard practice when it comes to seismic hazard assessment studies, i.e. both the activity rates and the ground motion models (Danciu et al., 2016) should be defined in terms of Mw. Unfortunately, Mw has been routinely calculated for large earthquakes worldwide only since the beginning of the Global Centroid Moment Tensor (GCMT) catalog in 1976 (Dziewonski et al., 1981). Therefore, all smaller or older earthquakes have to be converted to Mw empirically, based on conversion relations from other magnitude scales. The magnitude scale that correlates best with Mw within the crucial magnitude range for seismic hazard assessment, i.e. 4 < M < 7, is Ms (Kanamori, 1983). That said, solutions in Ms are extremely rare in our study region (available for only 1% of the events). The most popular scales are ML (74%) and Md (42%), which are very sensitive to agency-specific calibrations, and to a lesser extent mb (8%).

Regression methods and magnitude uncertainty
To derive appropriate relations between two magnitude types, say GII Md and GCMT Mw, one should first identify which events are available in both types, plot the reported values (in pairs) and derive a best-fit curve using regression analysis. Once defined, this conversion relation can be used to transform any other Md estimate of GII to a proxy Mw, assumed equivalent to the standardized estimates of GCMT, in this case.
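As a minimal sketch of this pairing step (the event identifiers and magnitude values below are hypothetical, and a shared event identifier across sources is assumed), the common events of two bulletins could be collected as follows:

```python
import numpy as np

def pair_magnitudes(cat_x, cat_y):
    """Collect (x, y) magnitude pairs for events reported in both sources.

    cat_x, cat_y: dicts mapping a shared event identifier to a magnitude,
    e.g. GII Md and GCMT Mw respectively.
    """
    common = sorted(set(cat_x) & set(cat_y))
    x = np.array([cat_x[k] for k in common])
    y = np.array([cat_y[k] for k in common])
    return x, y

# Hypothetical example: three events are common to both sources.
gii_md = {"ev1": 4.2, "ev2": 5.0, "ev3": 3.8, "ev4": 4.6}
gcmt_mw = {"ev1": 4.4, "ev2": 5.1, "ev3": 4.0, "ev5": 6.2}
x, y = pair_magnitudes(gii_md, gcmt_mw)
```

The resulting arrays are what the regression methods of the next paragraphs operate on.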
We should note that in the literature, the terms "magnitude scale" and "magnitude type" are often used interchangeably.We, however, herein define the latter term as the property of the magnitude that describes both its scale and the agency that originally computed it.
In the past, the fitting of the magnitude pairs was usually carried out using standard least-squares regression (SR), often without explicit note (e.g. Papazachos et al., 1997; Scordilis, 2006; Yadav et al., 2009). In this approach, the vertical offsets to the best-fit curve are minimized, with the independent variable (in our example GII Md) assumed error-free. The latter does not hold for magnitude estimates, which are prone to both random and systematic errors, limiting the applicability of SR for magnitude conversions (Stromeyer et al., 2004; Gasperini et al., 2015). According to Castellaro et al. (2006), the application of SR may induce a bias of around 30% when later deriving the b-value of the Gutenberg-Richter distribution (Gutenberg and Richter, 1956). Hence, they note that observed variations in b-values may be, to some extent, an artifact of improper catalog data processing (Musson, 2012; Shelly et al., 2021).
Orthogonal regression (OR) has been proposed as a more appropriate technique to deal with least-squares problems in which the dependent (y) and independent (x) variables are both considered to have some finite error. In its general form, it minimizes the weighted orthogonal distance to the best-fit curve (Madansky, 1959; Pujol, 2016). The weighting scheme is dictated by the error variance ratio (η) between the two variables. For η equal to unity, Castellaro and Bormann (2007) introduced the term Particular Orthogonal least-squares Regression (POR), which we adopted. POR minimizes the offsets perpendicular to the best-fit curve, eliminating any weighting scheme (η = 1). Thus, the resulting relations, contrary to the ones based on SR, can be inverted. As an alternative to OR, Stromeyer et al. (2004) and Krystek and Anton (2008) proposed the chi-square maximum likelihood regression (CSQ) and weighted total least squares (WLS), respectively. Lolli and Gasperini (2012) showed that, under the common assumption that only the variance ratio η is known or assumed, all three methods are substantially equivalent.
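For the linear case, POR with η = 1 admits a closed-form solution, equivalent to Deming regression with unit error-variance ratio. The sketch below is our own illustration of that estimator, not the authors' published code:

```python
import numpy as np

def por_linear(x, y):
    """Fit y = b*x + a by minimizing perpendicular offsets (POR, eta = 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    # Closed-form Deming slope for a unit error-variance ratio.
    b = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = y.mean() - b * x.mean()
    return b, a

# On collinear data the line is recovered exactly, and the relation is
# invertible: swapping x and y yields the reciprocal slope.
x = np.array([3.0, 4.0, 5.0, 6.0])
b, a = por_linear(x, 0.9 * x + 0.5)
```

The invertibility noted in the text follows directly from the symmetric treatment of the two variables.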
We should note that the error in Mw is generally smaller than the error in the older scales (Gasperini et al., 2015), meaning that, when converting any scale to Mw, η is smaller than 1. This implies the following:
• setting η equal to 1 is certainly an approximation, but probably not a rough one, since Di Giacomo et al. (2015) did not observe improvements when they tried weighting schemes and quantile regression;
• OR is in most cases more suitable than SR for conversion to Mw, since Castellaro and Bormann (2007) demonstrated that SR provides better estimates than OR only when η^0.5 > 1.8.
The effect that the regression algorithm can have on the fitted curve is illustrated in Figure S1 of the supplementary material. Overall, in recent years, POR has been the de facto regression method used in magnitude conversion studies globally (e.g. Di Giacomo et al., 2015; Weatherill et al., 2016; Bormann et al., 2009; Nath et al., 2016; Shahvar et al., 2013), and that is also another reason why we adopted it.
Finally, the accuracy of older magnitude measurements tends to be lower; given that larger errors lead to a positive shift in the a-value that is proportional to the square of magnitude errors (Tinti and Mulargia, 1985), the seismic activity for older time intervals may spuriously appear to exceed more recent activity by a significant margin.This effect may be, at least partly, responsible for the often-claimed discrepancy between earthquake rates in recent and old catalogs.

Combining magnitude types to address data scarcity
Direct moment magnitude estimates are available for a limited number of events, resulting in a shortage of data when deriving some conversion equations, especially at smaller magnitudes. To address this issue, seismologists often perform regressions conditional only on the magnitude scale, ignoring the sensitivity to the reporting agency. However, the calibration needed for most magnitude scales depends on the instrument, soil profile and processing techniques. Hence, the solution might vary between agencies, even if the scale is common (e.g. Figure 5d). Regional scale is also an important factor, since regressions using local and global data can lead to different fits (Figure S2).
In order to achieve a balanced trade-off, we developed the following procedure. For each magnitude type (agency-scale) that has fewer than 200 of its data points available in Mw, or a magnitude range that spans less than 3 magnitude units, we check how it correlates with any other magnitude type that shares the same magnitude scale. If the common events are more than 20 and the root-mean-square orthogonal error (RMSOE) with respect to the one-to-one diagonal (green dotted line in Figure 4a) is smaller than 0.25, then the two magnitude types are grouped together, i.e. their values are considered equivalent when performing POR (Figure 2). We termed this metric RMSOE y=x. We chose RMSOE y=x < 0.25 as the threshold because:
• it is smaller than the average RMSOE of the later derived relations (Table 2), meaning that the decision to group does not deteriorate the data scatter;
• it is equal to the largest RMSOE y=x among the most common sources of original direct Mw values (Figure S3).
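Since the orthogonal distance from a point (xi, yi) to the one-to-one diagonal is |yi − xi|/√2, the grouping check described above can be sketched as follows (thresholds as stated in the text; the magnitude data are synthetic):

```python
import numpy as np

def rmsoe_diagonal(x, y):
    """RMSOE of (x, y) magnitude pairs with respect to the y = x line."""
    d = (np.asarray(y, float) - np.asarray(x, float)) / np.sqrt(2.0)
    return float(np.sqrt(np.mean(d ** 2)))

def can_group(x, y, min_common=20, max_rmsoe=0.25):
    """Group two magnitude types if they share enough common events and
    scatter tightly about the one-to-one diagonal."""
    return len(x) > min_common and rmsoe_diagonal(x, y) < max_rmsoe

# Synthetic check: 30 common events offset by a constant 0.1 units.
x = np.linspace(3.0, 6.0, 30)
y = x + 0.1
```

A constant offset of 0.1 units yields an RMSOE y=x of about 0.07, well within the grouping threshold, while an offset of 0.5 would not be grouped.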
For the purposes of this procedure, two magnitude scales were considered as potentially equivalent when their first two letters were identical (e.g. Ms, Ms7). Exceptions to that rule were the scales ML, Md, MD, and MC, which were also considered grouping candidates, since the formulation of coda duration magnitudes is usually calibrated using local magnitude values (Eaton, 1992). Solutions of unknown magnitude scale (M) were checked against all scales.
Given the overall scarcity of original Mw values, we had to merge different sources, following common practice in magnitude-homogenization efforts. Therefore, all available moment magnitude solutions were assumed equivalent to the "true" Mw, with the exception of MED-RCMT (MedNet Regional Centroid Moment Tensor; Pondrelli et al., 2011), which showed 50% more scatter than any other source, with a clear tendency to overestimate magnitudes below 5 (Figure S3a). The other sources of original Mw estimates were consistently trending close to the diagonal when plotted against each other (Figure S3b-f).

Indicator for goodness of fit
Traditionally, seismologists employ either expert opinion or the coefficient of determination (R^2) of the dependent variable to quantify the goodness of fit or to rank the available conversion equations. The closer R^2 is to 1, the better the fit. This approach is valid, however, only when the regression is standard least-squares (SR). For orthogonal distance regression, R^2 is not strictly applicable, because the independent variable is not error-free (Gasperini et al., 2015). Since we are using OR, we had to find an alternative data-driven indicator for the goodness of fit. It should perform consistently well for all sample sizes, functional forms, magnitude scales and magnitude ranges. Following Bormann et al. (2007), we selected the root of the mean squared orthogonal errors (RMSOE). The smaller the RMSOE, the better. Given the wide range of magnitude pairs and the varying functional forms (§4.2.4) considered, we corrected for sample size and complexity of the functional form:

RMSOE_adj = RMSOE · sqrt[(n − 1)/(n − p − 1)]    (Equation 1)

where n is the sample size and p the number of free parameters. The applied correction is not novel, since it is identical to the one commonly used for R^2.
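Under our reading of this correction, which mirrors the one applied to the adjusted R^2 (an assumption on our part, since the corrected formula is Equation 1 of the text), the adjusted RMSOE could be computed as:

```python
import numpy as np

def rmsoe_adj(orth_errors, p):
    """Adjusted RMSOE: the mean squared orthogonal error is inflated by
    (n - 1) / (n - p - 1), mirroring the adjusted R^2 correction.

    orth_errors: perpendicular offsets of the data points to the fitted curve.
    p: number of free parameters of the functional form.
    """
    e = np.asarray(orth_errors, float)
    n = e.size
    return float(np.sqrt(np.mean(e ** 2) * (n - 1) / (n - p - 1)))

# With more free parameters, the same raw scatter is penalized more.
errors = np.full(10, 0.2)
```

This lets a two-parameter linear fit compete fairly with the three-parameter forms of the next section.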

Functional forms
The simplest, and thus most frequently used, functional form to fit and apply is the linear one (e.g. Papazachos et al., 1997). However, older magnitude scales saturate at larger magnitudes due to their limited frequency bandwidth. They also often underestimate magnitudes below about 3, due to a disproportionate amount of high-frequency attenuation along the path (Hanks and Boore, 1984; Deichmann, 2006). The functional form of the fitted curve should be able to capture both tendencies. To that end, seismologists have employed bilinear models (e.g. Scordilis, 2006), quadratic polynomials (e.g. Grünthal et al., 2009, their eq. 3), exponential models (e.g. Di Giacomo et al., 2015) or even more complex forms (e.g. Grünthal et al., 2009, their eq. 6). Adding free parameters (ci) increases both the adaptability of the functional form and the number of data points needed to constrain the fit. We experimented with all the above formulations, plus cubic polynomials and power-law models, and concluded that the most likely candidates for our dataset were:
• the two-parameter linear model, y = c1*x + c2;
• the three-parameter exponential model, y = exp(c3 + c4*x) + c5;
• the three-parameter power-law model, y = c6*x^c7 + c8.
A bilinear model was not selected, because it introduces a discontinuity point in the relations, where the uncertainty is hard to map (Di Giacomo et al., 2015). The standard linear model cannot capture saturation or inverse-saturation effects, and thus it usually performs well only when the magnitude range of the data points is between 4 and 6. The other two models are more flexible, since they are able to capture both linear and non-linear trends. They can present, however, unreasonably curved shapes when extrapolated outside the magnitude range used for their calibration. That is why we imposed c3 > -6, c6 > 0 and c7 < 3. For similar reasons, we did not allow c1 to be smaller than 0.5 or larger than 1.8. Furthermore, if the maximum magnitude of the independent variable was smaller than 5, we imposed the linear fit as the preferred one. All three functional forms were fitted to each magnitude type. The one leading to the smallest RMSOE_adj (Equation 1), given the aforementioned constraints on the free parameters, was the preferred one (Figure 2).
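The three candidate forms and the parameter constraints above can be sketched as a selection routine; the fitted parameter values below are illustrative placeholders, not results from the paper:

```python
import numpy as np

# Candidate functional forms (c holds the free parameters).
def f_linear(x, c):
    return c[0] * x + c[1]                 # y = c1*x + c2

def f_exp(x, c):
    return np.exp(c[0] + c[1] * x) + c[2]  # y = exp(c3 + c4*x) + c5

def f_pow(x, c):
    return c[0] * x ** c[1] + c[2]         # y = c6*x^c7 + c8

def admissible(form, c):
    """Parameter constraints stated in the text."""
    if form == "linear":
        return 0.5 <= c[0] <= 1.8
    if form == "exp":
        return c[0] > -6
    if form == "pow":
        return c[0] > 0 and c[1] < 3
    return False

def select_form(candidates):
    """Among (form, params, rmsoe_adj) candidates, keep the admissible
    fit with the smallest adjusted RMSOE."""
    ok = [c for c in candidates if admissible(c[0], c[1])]
    return min(ok, key=lambda c: c[2]) if ok else None

# Illustrative candidates: the exponential fit violates c3 > -6 and is
# excluded, so the power law wins on adjusted RMSOE.
fits = [("linear", (0.95, 0.30), 0.21),
        ("exp", (-7.0, 1.00, 0.00), 0.18),
        ("pow", (1.10, 0.90, 0.20), 0.19)]
best = select_form(fits)
```

In practice each form would first be fitted with a constrained orthogonal-distance routine; the selection step itself only compares the resulting adjusted RMSOE values.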

Magnitude range and saturation effects
Due to issues already discussed in section 4.1, we had to discard magnitude values outside the frequency range of the older magnitude scales, i.e. Ms < 3, Ms > 8, and mb, ML or Md > 6.0, before performing POR (Figure 2). No considerations were made regarding mB, since our input sources lack estimates for this scale. We also discarded as outliers data points where the difference between the two variables was larger than 1.5 (Figure 5d).
ML and Md calculations can either overestimate (Luckett et al., 2018) or underestimate (Deichmann, 2006) the true size of events with Mw smaller than about 3, depending on the agency-specific calibration. To avoid having to visually examine each case, we relied again on RMSOE_adj (Equation 1). If an abrupt non-linearity is present at the low end of the magnitude range, the goodness of fit deteriorates (increased RMSOE_adj), since the selected functional forms are not capable of capturing a double asymptotic trend. This would indicate that the magnitude type does not perform consistently at low magnitudes, since it is calibrated for a limited frequency range. To check for this, we performed each regression twice (Figure 2): once applying no lower-bound cut-off (Figure S4a), and once discarding all data points whose independent variable (x axis) was smaller than a threshold value Mt (Figure S4b). The value of Mt was chosen based on the frequency sampling behind each magnitude scale and was 4 for Ms and 3 for mb, ML or Md. Whichever case led to the smallest RMSOE_adj was preferred for the conversion of events of magnitude larger than Mt.
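The two-pass check can be sketched as follows; a plain least-squares line stands in for the constrained POR fit of the text, and the kinked synthetic data are our own:

```python
import numpy as np

def perp_rms(x, y, b, a):
    """RMS of perpendicular offsets to the line y = b*x + a."""
    return float(np.sqrt(np.mean(((y - b * x - a) / np.hypot(1.0, b)) ** 2)))

def fit_with_low_end_check(x, y, m_t):
    """Fit once on all pairs and once discarding x < m_t; keep the fit
    with the smaller scatter for events above m_t."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    results = []
    for keep in (np.ones(x.size, bool), x >= m_t):
        b, a = np.polyfit(x[keep], y[keep], 1)
        results.append(((b, a), perp_rms(x[keep], y[keep], b, a)))
    return min(results, key=lambda r: r[1])[0]

# Synthetic data with an abrupt non-linearity below magnitude 3: the
# trimmed pass recovers the clean y = x trend of the larger events.
x = np.arange(2.0, 6.01, 0.25)
y = np.where(x >= 3.0, x, x - 0.5)
b, a = fit_with_low_end_check(x, y, m_t=3.0)
```

When the low end is well behaved, the untrimmed pass wins instead and the relation keeps its wider validity range.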
Finally, the validity range of the derived conversion equations is defined by the 1st and 99th percentiles of the independent variable, after the application of all the above filters (Table 2). A schematic summary of the fitting procedure described in sections 4.2.1-4.2.5 is shown in Figure 2.
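The validity range itself is a one-liner; a minimal sketch:

```python
import numpy as np

def validity_range(x):
    """Validity range of a conversion relation: the 1st and 99th
    percentiles of the independent variable after all filters."""
    lo, hi = np.percentile(np.asarray(x, dtype=float), [1, 99])
    return float(lo), float(hi)
```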

Application of conversion relations
The application procedure for the conversion relations is illustrated in Figure 3. If at least one available magnitude type is Mw, we select this value as the homogenized magnitude. If multiple Mw are available for one event, we prioritize in descending order: CSEM, GII/IPRG, Cyprus Geological Survey (NIC), Harvard University, US Geological Survey (NEIC) and GCMT (Figure S3). This hierarchy can be modified by the user on a case-by-case basis. Alternatively, the Mw-reporting agencies can be ranked based on increasing RMSOE_y=x against the rest.
If an event is not available in Mw, we select the magnitude type available for this event whose conversion relation has the lowest RMSOE_adj (Equation 1), respecting possible validity range constraints. If no relation is applicable, we repeat this step overlooking the validity range, and the relation with the lowest RMSOE_adj is extrapolated to match the size of the event. If the latter is below the validity range, we ignore any potential abrupt nonlinearity at low magnitudes and apply the corresponding conversion relation derived without any lower-bound cut-off. If none of the magnitude solutions has a conversion relation, then the median of the available magnitudes (regardless of scale) is used as proxy Mw. For the homogenized events, we report the total uncertainty around the Mw estimate as σ = √(σ_y² + σ_meas²), where σ_y is the root of the mean squared (vertical) errors of the conversion relation (with Mw plotted on the y axis) and σ_meas is the measurement uncertainty accompanying the original magnitude scale. The latter was provided only in the ISC data. Since POR minimizes the perpendicular offsets and not the vertical ones, it leads to larger σ_y compared to a SR with Mw as the dependent variable.
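The decision tree of Figure 3 and the uncertainty combination can be sketched as follows. The data structures (`solutions`, `relations`) are hypothetical, and the sketch simplifies one branch: out-of-range events are simply extrapolated here, whereas the text distinguishes the low-magnitude case.

```python
import math

def total_sigma(sigma_y, sigma_meas):
    """Total uncertainty around the proxy Mw: conversion and
    measurement uncertainties combined in quadrature."""
    return math.sqrt(sigma_y ** 2 + sigma_meas ** 2)

def proxy_mw(solutions, relations):
    """Simplified decision tree. `solutions` maps scale name ->
    magnitude; `relations` maps scale name -> (convert_fn, rmsoe_adj,
    sigma_y, (m_lo, m_hi))."""
    if 'Mw' in solutions:
        return solutions['Mw'], 0.0
    usable = [s for s in solutions if s in relations]
    in_range = [s for s in usable
                if relations[s][3][0] <= solutions[s] <= relations[s][3][1]]
    pool = in_range or usable          # fall back to extrapolation
    if not pool:                        # no relation at all: use median
        vals = sorted(solutions.values())
        return vals[len(vals) // 2], float('nan')
    best = min(pool, key=lambda s: relations[s][1])
    fn, _, sigma_y, _ = relations[best]
    return fn(solutions[best]), sigma_y
```

With σ_y = 0.3 and σ_meas = 0.2, `total_sigma` gives ≈ 0.36, matching the overall uncertainty quoted in the conclusions.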

Combining magnitude types of compatible magnitude scales
The grouping procedure worked well, leading to magnitude types of the same agency and of similar scale, e.g. JSO ML and JSO MLv, being identified as equivalent.
The acronyms used for each agency follow ISC's naming scheme (http://www.isc.ac.uk/iscbulletin/agencies/). Overall, ML and Md estimates coming from the same agency were frequently grouped, since their calibration is interconnected (see §4.1). The procedure increased the sample size of the fitted data points and expanded the validity range of the derived equations (e.g. Figure 5d; S5a), without affecting their scatter. With the grouping procedure enabled, the average number of events behind each conversion increased by 40%, while the mean RMSOE_adj among all the derived conversion relations remained unchanged. RMSOE_y=x performed well as an unsupervised indicator, even in challenging situations. In Figure 4a, the mb estimates of ISC and NEIC are close to the diagonal with reasonable scatter, resulting in RMSOE_y=x < 0.25. On the other hand, in Figure 4b, RMSOE_y=x is significantly larger than the threshold we set: even though IPRG ML and JSO ML are centered around the diagonal, they present extensive scatter, with differences of up to 2 magnitude units. In both cases, blindly fitting a single-parameter linear curve (solid black line in Figure 4) would have misled us into concluding that the fit was equally good.
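The unsupervised indicator can be sketched directly: RMSOE_y=x is the RMS of the perpendicular distances of the paired estimates to the diagonal, with the 0.25 threshold taken from the text (function names are illustrative).

```python
import numpy as np

def rmsoe_diagonal(x, y):
    """RMS orthogonal distance of paired magnitude estimates to the
    y = x line (RMSOE_y=x)."""
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    d /= np.sqrt(2.0)                   # perpendicular distance to y = x
    return float(np.sqrt(np.mean(d ** 2)))

def compatible(x, y, threshold=0.25):
    # group two magnitude types only if their pairs hug the diagonal
    return rmsoe_diagonal(x, y) < threshold
```

Note that a pair of scales centered on the diagonal but with large scatter (as in Figure 4b) fails this test, which is exactly what a slope-only linear fit cannot detect.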

Regression trends
The frequency-band limitations of each magnitude scale (§4.1) were verified during our analysis. Many of the derived conversion relations display saturation effects above Mw 5 (e.g. Figure 5d; S5cd) and below Mw 3 (e.g. Figure S5b). Had we not tested for the lower-bound cut-off at M 3, most of the fitted functional forms would have been nonlinear (Figure 5ac; S4). For large sample sizes, our two nonlinear fits present similar shapes that differ considerably from the linear curve when extrapolated outside the fitted magnitude range (Figure 5cd). We did not come across magnitude types that underestimate larger magnitudes while overestimating smaller ones. Datasets containing such a trend could benefit from third-degree polynomial fitting.
The conversion relations that were most frequently used when homogenizing the original magnitude estimates in the various catalogs (Figure 5; S5) present reasonable scatter around the diagonal, except for JSO ML, which does not correlate well with Mw (Figure S5d). Even though the ISC Bulletin includes dozens of magnitude types, only 10 were needed to homogenize almost every event (Table 2).

Comparison with other already homogenized catalogs
Regarding the proxy Mw values already present in our input catalogs, ISC-GEM used OR (Di Giacomo et al., 2015) and EMME probably used SR (since they report R² values). Within our case study region, the vast majority of EMEC's events originate from Papaioannou (2001) and Abdallah et al. (2004). Papaioannou (2001) reports moment magnitudes, which are converted mostly using the equations of Papazachos et al. (1997), who used SR. As a result, Papaioannou (2001)'s proxy Mw values do not agree well with the corresponding Mw values from moment tensor solutions (Gruenthal and Wahlstroem, 2012). Abdallah et al. (2004) report ML, for which CENEC derived a conversion equation to Mw, employing CSQ. With all of that in mind, the significant scatter in the proxy Mw values developed in this study, when compared mainly to EMME's and EMEC's (Figure 6), should be expected.

Merging multiple catalogs
The first step towards building a unified catalog is to identify which seismic events are included in more than one catalog, i.e. duplicates. Usually, each input catalog has its own scheme for assigning a unique identifier (ID) to each event, thus making the identification of common events non-trivial and the use of a "duplicate finding" algorithm a necessity. This generally takes the form of a window-searching algorithm by which multiple representations of the same event are identified through their proximity in space, time, and, occasionally, magnitude. The configuration of these windows (margins) and the quality of the information provided by the catalog will greatly influence the possibility of misassigning duplicates (Weatherill et al., 2016). One of the most common pitfalls is inaccuracy in the earthquake's origin time. For example, EMEC (Gruenthal and Wahlstroem, 2012) systematically provides origin times down to minutes (and not seconds). The electronic versions of the catalogs fill in empty entries of missing time information (e.g. seconds) with zeros. Finally, a bulletin might report local time instead of Coordinated Universal Time (UTC) or use a non-standard geographical coordinate system. To ensure best practice, the compiler should adopt margins that are larger than the uncertainty of the methods and models used by the agencies (Schweitzer, 2006). That said, an algorithm with overestimated margins might mistakenly flag clustered events (fore/aftershocks) as duplicates. To address this trade-off, researchers often define margins that are bulletin-specific or variable with time (e.g. Wang et al., 2009). For the analog era, when the data were sparser, wider time/space windows are likely needed. Some studies even resort to manual inspection for the few larger-magnitude events that have a greater impact on the hazard estimates (e.g. Beauval et al., 2013).
Most modern unified catalogs either do not clearly state their criteria for identifying the duplicate events or choose arbitrary values based more on expert opinion than on data-driven analysis. The chosen margins for instrumental catalogs vary significantly, on the order of 10-120 seconds and of 30-100 km (e.g. Wang et al., 2009; Faeh et al., 2011; Beauval et al., 2013; Poggi et al., 2017), while the magnitude is rarely used as a deciding factor.
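A window-searching duplicate finder of the kind described above can be sketched in a few lines. The event layout and the equirectangular distance approximation are illustrative choices; the default margins are the post-1964 values derived later in the paper (10 s, 85 km).

```python
import math

def find_duplicates(cat_a, cat_b, dt_s=10.0, dr_km=85.0, dm=None):
    """Flag pairs of events (dicts with keys 't' in epoch seconds,
    'lat', 'lon', 'mag') whose solutions fall within the time/space
    (and, optionally, magnitude) margins."""
    def dist_km(a, b):
        # equirectangular approximation; adequate at margin scales
        dlat = a['lat'] - b['lat']
        mean_lat = math.radians(0.5 * (a['lat'] + b['lat']))
        dlon = (a['lon'] - b['lon']) * math.cos(mean_lat)
        return 111.2 * math.hypot(dlat, dlon)

    pairs = []
    for i, a in enumerate(cat_a):
        for j, b in enumerate(cat_b):
            if (abs(a['t'] - b['t']) <= dt_s and dist_km(a, b) <= dr_km
                    and (dm is None or abs(a['mag'] - b['mag']) <= dm)):
                pairs.append((i, j))
    return pairs
```

Passing `dm` enables the magnitude criterion that, as noted above, most compilations omit.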

Relevance of the magnitude scale when merging catalogs
The order in which the compiler will do the merging and the magnitude homogenization is not always fixed.
If the compiler chooses to discard the magnitude as a criterion for the identification of duplicates and has great confidence in the selected margins for origin time and location, then one could first merge all catalogs into one, storing all magnitude solutions for each unique event, and then perform the magnitude homogenization. This way, the compiler ends up with more magnitude solutions per event, hence more potential magnitude pairs from which to derive the conversion relations. We, however, preferred to do the opposite: first homogenize all sources in Mw (using the magnitude solutions of ISC and GII), and then merge all datasets, utilizing the event's size to constrain the duplicate-finding algorithm. One reason is that, when relying on earthquake solutions alone (no station data), it is often impossible to distinguish a fore/aftershock from a duplicate, even if the margins are ideally selected. Our main concern was not to contaminate the derivation of the magnitude conversion relations with any artifacts that the merging process might introduce. For example, events of the same earthquake sequence can be misidentified as duplicates, mixing up incompatible solutions as magnitude pairs for the regression.

Methodology
We aimed at deriving both margins (i.e. origin time and location difference) from the trends we observe in the actual data used in this study, minimizing the need for expert opinion. To do that, multiple solutions for the same event are needed in order to calculate the discrepancy in the data. For each event, the ISC Bulletin reports the available parametric solutions from all contributing agencies, as well as a "prime" solution, which according to ISC describes the reviewed event best. This rich database enables the statistical analysis of the discrepancy in the solutions, either between agency pairs or in comparison to ISC's proposed "prime" solution. We did the latter, assuming that ISC's origin solutions were the most accurate, since they are derived using the richest available dataset. Although this is a valid assumption in terms of epicenter and origin time, local velocity models from regional networks often lead to more accurate depth estimates. We did not, however, use the depth as a criterion for duplicate finding.
For our case study region (and for most of the world), ISC's IASPEI Seismic Format (Storchak, 2006) files are sufficient to deduce data-driven margins for all of the input catalogs and bulletins. GII and EMSC are contributing agencies to ISC, while ISC-GEM, IRIS, EMME and EMEC are compiled mainly from ISC data in the first place. Hence, using only the ISC Bulletin, we were able to derive margins that are suitable for all our input catalogs.
The performed statistics do not have to be complex. Once the distribution of differences is established, the compiler can pre-define a percentile-based value to be used as the margin for duplicate finding. If the differences in time, space, and (optionally) magnitude solutions between two events fall within these data-driven margins, then a duplicate is flagged. We chose the 95th percentile as the threshold for the definition of the margins in the duplicate-finding algorithm. The chosen percentile was large enough to ensure that the margins cover the modeling uncertainty in the computed solutions, yet small enough to discard unreasonably large outliers that are often observed, possibly due to logging errors or miscalculations.
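As the text notes, the statistics are simple; a minimal sketch (function name illustrative):

```python
import numpy as np

def data_driven_margins(dt_abs, dr_abs, pct=95):
    """Duplicate-finding margins as the `pct`-th percentile of the
    absolute differences (time in s, distance in km) between agency
    solutions and ISC's prime solution."""
    return (float(np.percentile(dt_abs, pct)),
            float(np.percentile(dr_abs, pct)))
```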
When two events in different catalogs are identified as duplicates, one should decide which parametric information describes the event best. A hierarchy must be defined in advance to dictate this process. Regarding the origin time and location, this hierarchy is usually pre-defined by the compiler based on the following considerations:
• ISC-GEM has re-assigned origin times, epicenters and hypocenters to moderate-to-large magnitude events after 1900, following clearly documented, up-to-date methods (Di Giacomo et al., 2015). Hence, ISC-GEM was considered the most reliable source of parametric information.
• ISC collects and reviews the most comprehensive dataset of both raw and parametric seismic data for each event. As a result, the solutions they recommend or originally compute are highly credible. EMSC does a similar job, having, however, fewer contributing agencies.
• Local agencies close to the epicenter are more likely to have a detailed velocity model for the region and higher network density, when compared to global agencies.
• The level of accuracy in date and time is also an important factor. For example, EMEC provides origin times down to minutes (and not seconds), while GII reports seconds with consistency only after 1983.
• Compilations providing unclear documentation regarding their merging procedure and original sources should not be favored.
• The regression methods used in the magnitude homogenization process (§4.2.1) should be taken into account when ranking magnitude estimates.
• Bulletins reporting more than one magnitude solution provide greater flexibility in selecting the most suitable conversion equation (§4.2.6).
Our preferred hierarchy regarding the proxy Mw was: ISC-GEM; ISC; IRIS; GII; EMSC; EMEC; EMME (§4.3.3), prioritizing our conversion relations over EMME's and EMEC's, and acknowledging the flexibility that the multiple magnitude solutions in the ISC Bulletin provide. Regarding the time and location solutions, our preferred hierarchy (Figure S3) was: ISC-GEM; ISC; EMSC; GII; IRIS; EMME; EMEC, prioritizing the reviewed relocations of ISC, GII and EMSC. The merging process is sequential, i.e. we always look for duplicates between two catalogs only. The margins are tested against the difference between the catalog being merged and the preferred solution among all the catalogs that have already been unified.
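The hierarchy-based selection for a duplicate group reduces to a ranked lookup; a minimal sketch using the two orderings stated above:

```python
# Preferred hierarchies from the text (origin/location and proxy Mw).
ORIGIN_HIERARCHY = ['ISC-GEM', 'ISC', 'EMSC', 'GII', 'IRIS', 'EMME', 'EMEC']
MW_HIERARCHY = ['ISC-GEM', 'ISC', 'IRIS', 'GII', 'EMSC', 'EMEC', 'EMME']

def preferred_solution(solutions, hierarchy=ORIGIN_HIERARCHY):
    """Return the highest-ranked available solution for a duplicate
    group; `solutions` maps source name -> parametric solution
    (hypothetical structure)."""
    for src in hierarchy:
        if src in solutions:
            return src, solutions[src]
    raise KeyError('no ranked source available')
```

In a sequential merge, this function would be called once per duplicate group each time a new catalog is folded into the unified one.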

Variability in origin time and location solutions within ISC
The vast majority of absolute differences in origin time among ISC's contributing agencies with respect to ISC's prime solution are less than 10 s, with 9.5 s being the 95th percentile (Table S1). That said, outliers of more than two minutes also exist, perhaps due to a zero value being added when a measurement of seconds is not available. As far as location is concerned, the vast majority of the differences are less than 100 km, with 85 km being the 95th percentile (Table S1). Observed differences on the order of 500-1000 km were attributed to input errors (typos). Figure 7 demonstrates that the median annual differences in origin time and location decrease with time, probably due to the increasing number of stations and improved velocity models (Bondar et al., 2015). That said, after 1985, namely when most nearby national networks were set up, the differences appear to be relatively stable with time.
Figure 8 shows that the differences in origin time and location increase with magnitude, with a 2-fold increase between magnitudes below 3 and magnitudes above 4. Moderate-to-large events are usually also covered by more distant networks with large-scale velocity models and looser azimuthal coverage, which could explain these observations. One would also expect decreasing uncertainty in the location with decreasing rupture size (Kagan, 2003).
Figure 9 shows that the deviation from ISC's prime solutions is largely agency-dependent. Interestingly, nearby national agencies (e.g. JSO) do not perform consistently better than teleseismic ones (e.g. NEIC), as one would expect (Bondar et al., 2004). Nevertheless, when looking at the origin time differences of the most important agencies for the case study area, the 90th percentile is usually less than 5 s and always less than 10 s, in agreement with previous studies (e.g. Bondar et al., 2004, 2015). On the other hand, the variability in the epicentral location is higher than expected (Bondar et al., 2015), with the 90th percentile rarely being below 30 km.

Parametric windows for duplicate identification
We utilized the absolute differences among agencies with respect to ISC's prime solution as a data-driven proxy for the definition of parametric windows during the duplicate identification. We defined the margins for location and origin time as the 95th percentile of the aforementioned deviation. We used the values of the second and third columns of Table S1 as margins for the instrumental events after 1964, i.e. 10 s and 85 km (Table 3). Special attention was paid to specific limitations in the input catalogs. In particular, EMEC does not report seconds for any event, while GII reports seconds only after 1983. Therefore, the time margin should be adjusted to 2 minutes in these cases and these catalogs should be the last ones merged. For the events before 1964 there are not enough data for any sort of statistical analysis (Figure 7), and thus expert opinion cannot be avoided. We used 30 minutes and 100 km as margins for that time period (Poggi et al., 2017). Additional levels of complexity could be added. One could make the margins magnitude-dependent, to capture the trends shown in Figure 8, or use tighter post-1964 and post-1983 margins when merging the ISC Bulletin with CSEM and their product with GII, respectively (Table S1).
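The era- and catalog-dependent margin scheme above can be summarized in one function; the function name is illustrative and the numbers are those stated in the text and Table 3.

```python
def margins_for(year, catalog=None):
    """Return (time margin in seconds, distance margin in km):
    data-driven values after 1964, expert-opinion values before, and
    a 2-minute time margin for catalogs lacking seconds (EMEC always,
    GII before 1983)."""
    if year < 1964:
        return 30 * 60.0, 100.0         # pre-instrumental-era margins
    dt_s = 10.0                          # 95th percentile, Table S1
    if catalog == 'EMEC' or (catalog == 'GII' and year < 1983):
        dt_s = 120.0                     # origin times given to the minute
    return dt_s, 85.0                    # 95th percentile, Table S1
```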

Catalog overview
The unified catalog contains 25000 time-stamped hypocenters between 1900 and 2017, of which more than 5500 events have moment magnitude Mw larger than 3. It is available in electronic format online (Grigoratos et al., 2023). We also report the preferred original sources for the origin and magnitude solutions, the measurement uncertainty behind the original magnitude estimate, and the total magnitude uncertainty after conversion. Moment tensor solutions were available for only a quarter of the events, while the rest had magnitude solutions that needed conversion to Mw. The average conversion uncertainty (σ_y) was 0.3 and none of the events required extrapolation of the derived conversion relations. For only 3% of the events, none of the magnitude solutions could be associated with a conversion relation, and thus the median of their original magnitudes is reported.
Most of the events in our catalog are post-1983 (Figure 10), coinciding with the development of the first local networks along the DSTFZ by Israel and Jordan. The foundation of ISC in 1964 had already expanded the magnitude range of the cataloguing below magnitude 5.5, while also improving the reliability of the location solutions. That said, about 7% of the events in the unified catalog were not reported by the ISC Bulletin (Table S2).
One third of the events in our catalog, including the 1995 Mw 7.2 rupture, are found in the Gulf of Aqaba (Figure 11), in contrast to the low seismic activity reported there in the previous millennium (Figure 1) (Grigoratos et al., 2020). On the other hand, the many large historical earthquakes in the segments north of the Dead Sea lake have not been followed by similar levels of activity in recent decades. Finally, the cluster of post-1985 seismicity east of the DSTFZ, in south Jordan (latitude ~30; Figure 11), is most likely related to potash mines (Rodgers et al., 2003). Groundwater extraction is also linked to a few clusters in and around Lake Kinneret (Sea of Galilee; Shalev et al., 2023). Although we cannot exclude other instances of anthropogenic seismicity, we are not aware of other established cases.
We should specify that the catalog has not been declustered and the spatio-temporal variation of the magnitude of completeness of the catalog has not been assessed.The target magnitude of M w 3 was chosen simply because it was viewed as a low enough value to aid future derivations of the regional b-values.

Conclusions
The creation of a homogenized earthquake catalog is an error-prone procedure that requires a good understanding of the heterogeneities among the available bulletins. Common events within the bulletins have to be identified and assigned the most suitable origin time and location solution, while all the events have to be harmonized into a single magnitude scale. This process requires several decision variables that are usually defined using qualitative measures or expert opinion, without a clear exploration of the associated uncertainties. To address this issue in a more quantitative way, we developed a framework, which can utilize multiple databases, such as the ISC Bulletin, to explore the relations between earthquake solutions from different seismic networks and agencies, in order to produce a unified parametric earthquake catalog. The proposed data-driven approach defines spatio-temporal margins within which duplicate events fall and converts the various reported magnitudes into a common scale. Given the density and geographical coverage of ISC's database, we believe that the proposed methodology can be applied to a number of regions worldwide.
To that end, the MATLAB and Python scripts used in this workflow have been made publicly available. We applied them to the Dead Sea Transform Fault Zone and derived a list of more than 5500 instrumental events with Mw larger than 3. One third of the events in our catalog, including the 1995 Mw 7.2 rupture, are found in the Gulf of Aqaba, in contrast to the low seismic activity reported there in the previous millennium (Grigoratos et al., 2020). On the other hand, the many large historical earthquakes in the segments north of the Dead Sea lake have not been followed by similar levels of activity in the last century.
As far as the magnitude homogenization is concerned, the frequency dependence of the older magnitude scales was evident during our analysis. Most of the derived conversion relations underestimate events with Mw either below 3 or above 6. We introduced the root of the mean squared orthogonal errors (RMSOE), corrected for sample size and number of free parameters, as a metric to determine the magnitude range of the regression, the fitted functional form, and which of the available magnitude solutions is to be converted to Mw. In cases where the data points for the regression were limited, we further employed RMSOE to determine whether magnitude estimates of similar scale could be grouped together without alteration of their underlying correlation with Mw. With the exception of JSO ML, all key magnitude types in the catalogs correlated reasonably well with Mw, given their frequency-band limitations. The average conversion uncertainty (σ_y) in our unified catalog was 0.3 magnitude units, which, assuming a (mostly underreported) measurement uncertainty of 0.2, results in an overall uncertainty of about 0.36 behind each proxy Mw.
Having homogenized the magnitude of the events, we then used a window-searching algorithm to identify multiple representations of the same event in the various catalogs, based on their proximity in space and time. We defined these two margins by analyzing the discrepancies in the solutions provided by ISC's contributing members. The differences in origin time and location are agency-dependent, and generally decrease with time and increase with magnitude. In the last 50 years, the solutions rarely deviate by more than 10 s or 85 km from ISC's reviewed solution.

Figure 2
Figure 2 Flowchart of the fitting procedure used in the derivation of the conversion relations.

Figure 3
Figure 3 Flowchart describing the application of the derived conversion relations. M_range is the validity range of the conversion relation and M_i is the magnitude value of the event in the reported magnitude scale.

Figure 4
Figure 4 Correlating magnitude estimates from different agencies.

Figure 5
Figure 5 Derived conversion relations for four of the most common magnitude types. The functional form with the lowest RMSOE_adj is plotted as a solid curve.

Figure 6
Figure 6 Comparison of proxy Mw values computed by this study with those derived by EMEC (a) and EMME (b).

Figure 7
Figure 7 Temporal trend of the absolute differences among ISC's contributing agencies with respect to ISC's prime solution in terms of (a) origin time and (b) location. Red lines indicate the 5th to 95th percentile range; the black line indicates the median (50th percentile).

Figure 8
Figure 8 Trend of absolute differences among ISC's contributing agencies with respect to ISC's prime solution with magnitude: (a) origin time and (b) location. Red lines indicate the 5th to 95th percentile range; the black line indicates the median (50th percentile).

Figure 9
Figure 9 Agency-specific statistics on deviation from ISC's prime solution: (a) origin time; (b) epicenter.

Figure 10
Figure 10 Density of earthquake magnitudes with time for the homogenized catalog.

Figure 11
Figure 11 Homogenized catalog of earthquakes with Mw ≥ 3 between 1900 and 2017. The black lines indicate main faults along the DSTFZ (Grigoratos et al., 2020).

Table 1
Sources of parametric earthquake data (within our spatial boundaries).

Table 2
Derived conversion relations for the ten most common magnitude types.

Table 3
Margins used in the duplicate finding algorithm.