Geocoding Applications for Social Science to Improve Earthquake Early Warning



Introduction
Geocoding is the process of taking a street address, intersection, census tract, ZIP code, or some other type of location information and determining its geographic coordinates (latitude and longitude). Geocoding was pioneered in the late 1960s for use in the census, and New Haven, Connecticut was the first city in the world with a geocodable street network database (e.g., Smith and White, 1971). Over the decades, scholars have used geocoding techniques to examine the effect that physical proximity had on the careers of women writers in Victorian-era London, England (Bourrier et al., 2021), to help with the recovery of New Orleans after Hurricane Katrina in 2005 (Gardere et al., 2020), and for disease surveillance in public health (Lin, 2022; Shaheen et al., 2021; …). In addition, geolocation through location-based services (LBS) has become very popular in mobile phone applications (or apps) since the early 2000s (e.g., Huang, 2022). LBS have greatly increased our ability to understand individual and community demographics and the ways in which people travel and move about the world. There are many humanitarian applications of LBS, such as tracking the location of individuals with dementia (Abbas and Michael, 2022), helping students avoid transmission 'hotspots' on campus during the COVID-19 pandemic (Elalami et al., 2022), and alerting people to impending weather, flooding, and other natural hazards (e.g., Bopp and Douvinet, 2020).
Geolocation also extends to earthquake early warning (EEW) alerting capabilities. The U.S. Geological Survey (USGS) operates and maintains the ShakeAlert® EEW system (e.g., Given et al., 2018), which is now operational in California, Oregon, and Washington. The idea for an EEW system in the United States has been around since the 1989 M6.9 Loma Prieta, California earthquake (for a timeline, see McBride et al., 2022b) and gained traction in 2006 (U.S. Geological Survey, 2019). Although early warning systems are human-centered (e.g., Kelman and Glantz, 2014; Sumy et al., 2021), the ideas and conceptualization around EEW came largely from seismologists and the physical science community. In the United States, this changed almost a decade later, in 2015, with the formation of the Joint Committee for Communication, Education, Outreach, and Technical Engagement (JC-CEO&TE; de Groot et al., 2022).
EEW alerts are issued according to magnitude and intensity thresholds. Geolocation is vital in determining seismic intensity, or the severity of earthquake shaking, because intensity varies by location on relatively small spatial scales (on the order of tens to hundreds of meters) owing to local site conditions, as characterized through microzonation (e.g., Kumar Shukla, 2022; Rastogi et al., 2023; Pilz et al., 2015). Seismic intensity affects what an individual feels during earthquake shaking, whether they receive an EEW alert (or not), and whether they take a protective action (or not). The magnitude and intensity thresholds for EEW vary from country to country; for example, the West Coast of the United States (California, Oregon, and Washington) receives alerts at lower intensities (Bostrom et al., 2022; U.S. Geological Survey, 2021) than Japan (Nakayachi et al., 2019) and New Zealand (Becker et al., 2020).
As EEW expands worldwide, the earthquake science community is collecting a wealth of social science data about who received an EEW alert, what people experienced during earthquake shaking (seismic intensity), and how people behaved, including whether they took a protective action such as 'Drop, Cover, and Hold On' (e.g., McBride et al., 2022b). In this study, I demonstrate the range and utility of geocoding social science data to inform and improve EEW. I first discuss the methodology for geocoding with the Google Maps Geocoding Application Programming Interface (API), chosen here because the service is openly accessible and free to use at modest volumes. I then demonstrate the application of geocoding to two case studies using: 1) survey data collected in Oakland, California, United States (McBride et al., 2023), and 2) video data from the 2018 M7.1 Anchorage, Alaska, United States earthquake (McBride et al., 2022b). I conducted the geocoding in both case studies described here. I then consider limitations and ethical considerations around the methods used, including how to protect the privacy of these data. Finally, I discuss applications of geocoding and other location-based techniques to evaluate the distribution and effectiveness of alerts, which will inform future improvements to earthquake early warning (and earthquake science broadly) worldwide.

Methods: Google Maps Geocoding Application Programming Interface (API)
The Google Maps Geocoding Application Programming Interface (API) is freely and openly available, does not require proprietary software that may be cost prohibitive, is accessible over the Internet, and can be called from several client languages, such as JavaScript and Python, with little setup time. A call to the Google Maps Geocoding API does require an API key, and a small fee may apply depending on how frequently the Geocoding API is used.
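As an illustration of how little setup a call requires, the minimal Python sketch below builds a request URL using only the standard library. It assumes Google's documented JSON endpoint (`https://maps.googleapis.com/maps/api/geocode/json`) with `address` and `key` query parameters; the function names and the `YOUR_API_KEY` placeholder are hypothetical, and the network call is not exercised here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# JSON output endpoint of the Google Maps Geocoding API
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Return a Geocoding API request URL for a free-text address."""
    # urlencode percent-escapes characters such as spaces and '#'
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def fetch_geocode(address: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply (needs network access and a valid key)."""
    with urlopen(build_geocode_url(address, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_geocode("1200 New York Avenue NW Suite 400 Washington DC 20005-3929", "YOUR_API_KEY")` would return the JSON response as a Python dictionary. Saving each response locally avoids re-querying (and possibly paying again for) the same address.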
Geocoding works best when starting with an accurate street address (e.g., Yang et al., 2004; Kilic and Gülgen, 2020), either provided directly by the individual or obtained by identifying a landmark with a known street address. From a street address, I use the Google Maps Geocoding API to obtain geographic coordinates (latitude and longitude) for mapping purposes. The output from the Google Maps Geocoding API is in JavaScript Object Notation (JSON) format, which is easily read by a wide range of programming languages.
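Once the JSON has been decoded into a Python dictionary, the few fields used in this study can be extracted as in the sketch below. The field names (`status`, `results`, `formatted_address`, `geometry`, `location`, `location_type`, `viewport`) follow Google's documented response format; keeping only the first result is a simplifying assumption, and `parse_geocode_response` is a hypothetical helper.

```python
from typing import Optional


def parse_geocode_response(payload: dict) -> Optional[dict]:
    """Extract the fields used in this study from a decoded Geocoding API response.

    Returns None when the API reports no usable match.
    """
    # "OK" means at least one result; other statuses (e.g. "ZERO_RESULTS") mean no match
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    best = payload["results"][0]  # assumption: take the first (top-ranked) result
    geometry = best["geometry"]
    return {
        "formatted_address": best["formatted_address"],
        "lat": geometry["location"]["lat"],
        "lng": geometry["location"]["lng"],
        "location_type": geometry["location_type"],
        "viewport": geometry["viewport"],
    }
```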
As an example, I examine the street address for the headquarters of the EarthScope Consortium: '1200 New York Avenue NW Suite 400 Washington DC 20005-3929' (Figure 1). The address for the EarthScope Consortium headquarters has nine 'address components': the suite number or subpremise (400), the street number (1200), the route or street (New York Avenue Northwest), the neighborhood (Northwest Washington), the locality (Washington), the administrative area (District of Columbia), the country (United States), the postal code (20005), and the postal code suffix (3929). Each component has a 'long name' with all parts spelled out and a 'short name' with abbreviations; for example, 'US' (short name) for 'United States' (long name). The human-readable address is provided in the 'formatted address' output (1200 New York Ave NW #400). I conduct a quality check on the data by comparing the 'address components' (input) with the 'formatted address' (output).
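Part of this quality check could be automated. The sketch below assumes the documented `address_components` structure (a list of entries with `long_name`, `short_name`, and `types`), indexes the components by type, and flags results whose postal code disagrees with the ZIP code a respondent supplied. The helper names are hypothetical, and checking the ZIP code specifically is an illustrative choice (it was the most frequently incorrect field in case study 1).

```python
def components_by_type(result: dict, name: str = "long_name") -> dict:
    """Map each address-component type (e.g. 'postal_code') to its long or short name."""
    out = {}
    for comp in result.get("address_components", []):
        for comp_type in comp.get("types", []):
            # keep the first component seen for each type ('political' repeats, for example)
            out.setdefault(comp_type, comp[name])
    return out


def zip_matches(input_zip: str, result: dict) -> bool:
    """Quality check: does the geocoded postal code agree with the ZIP code supplied?"""
    geocoded = components_by_type(result, "short_name").get("postal_code")
    # compare the five-digit ZIP only; the ZIP+4 suffix is a separate component
    return geocoded == input_zip.strip()[:5]
```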
The geographic coordinates (latitude and longitude) are provided in the 'geometry' section of the JSON output (Figure 1). The 'location' is the geocoded location of the EarthScope Consortium headquarters, with latitude ('lat') and longitude ('lng') coordinates. The Google Maps Geocoding API uses four 'location types': rooftop, range interpolated, geometric center, and approximate. The 'rooftop' output is the most precise, while the 'approximate' output is the least precise. I discuss these outputs in the context of the case studies in the following sections.
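Because the four location types form an ordered scale, they can be encoded as ranks and used to screen or sort results. In this sketch the upper-case strings are the values Google documents for `geometry.location_type`; the helper names and the `location_type` record field are assumptions for illustration.

```python
# Location types returned by the Geocoding API, ordered from most to least precise
LOCATION_TYPES = ("ROOFTOP", "RANGE_INTERPOLATED", "GEOMETRIC_CENTER", "APPROXIMATE")
PRECISION_RANK = {name: rank for rank, name in enumerate(LOCATION_TYPES)}


def at_least_as_precise(location_type: str, coarsest_allowed: str) -> bool:
    """True if `location_type` is as precise as, or more precise than, `coarsest_allowed`."""
    return PRECISION_RANK[location_type] <= PRECISION_RANK[coarsest_allowed]


def sort_by_precision(records: list) -> list:
    """Order geocoded records from most to least precise location type (ties keep input order)."""
    return sorted(records, key=lambda r: PRECISION_RANK[r["location_type"]])
```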
Lastly, the 'viewport' output provides a measure of uncertainty on the location. I compute the Euclidean distance between the 'location' output and each of the 'northeast' and 'southwest' viewport corners, and take the mean of these two distances as a level of uncertainty on the geolocation. In this example, the distances between the 'location' and the 'northeast' and 'southwest' viewport corners are 1.6 m and 1.8 m, respectively, with a mean of 1.7 m (about 5.6 feet). These viewport bounds yield a location uncertainty that is smaller than the footprint of the building itself; thus, they should be taken as a proxy for uncertainty rather than a robust location uncertainty. Additional context on location uncertainty is provided in the case study sections below. For more thorough and complete information on the Google Maps Geocoding API, the reader is referred to the Developers page (https://developers.google.com/maps/documentation/geocoding).
Figure 1 The Google Maps Geocoding API JSON output for the EarthScope Consortium headquarters in Washington DC. The nine 'address components' (input) are individually labeled. The 'formatted address' (output) provides a quality control check on the input parameters. The geocoded output in the 'geometry' section contains the geographic coordinates (latitude and longitude), the location type, and the viewport bounds, which together provide a proxy for geographic uncertainty.
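The viewport-based uncertainty proxy (the mean distance from the geocoded point to the northeast and southwest viewport corners) can be sketched as follows. The text specifies a Euclidean distance but not how degrees are converted to meters; this sketch assumes a local equirectangular (flat-Earth) projection with a mean Earth radius of 6,371 km, which is adequate over the short distances involved. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def local_distance_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Euclidean distance in meters on a local equirectangular projection."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lng2 - lng1) * math.cos(mean_lat) * EARTH_RADIUS_M  # east-west
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M  # north-south
    return math.hypot(dx, dy)


def viewport_uncertainty_m(geometry: dict) -> float:
    """Mean distance from the geocoded 'location' to the NE and SW viewport corners."""
    loc = geometry["location"]
    ne = geometry["viewport"]["northeast"]
    sw = geometry["viewport"]["southwest"]
    d_ne = local_distance_m(loc["lat"], loc["lng"], ne["lat"], ne["lng"])
    d_sw = local_distance_m(loc["lat"], loc["lng"], sw["lat"], sw["lng"])
    return (d_ne + d_sw) / 2.0
```

Passing the `geometry` dictionary of a response to `viewport_uncertainty_m` returns the proxy in meters; as noted above, it is a rough indicator rather than a true positional error.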

Case Study 1: Survey Data for Geolocation
Most online survey providers have built-in tools to obtain a respondent's internet protocol (IP) address without any input from the respondent (e.g., Sumy et al., 2020). However, geocoding IP addresses may be unreliable and yield inaccurate geographic locations (e.g., Poese et al., 2011; Callejo et al., 2022), with potential uncertainties on the order of kilometers (e.g., Ma et al., 2023). Because of these potentially large location uncertainties, survey designers can instead ask directly about an individual's location, with approval by an Institutional Review Board or other research ethics committee (e.g., Grady, 2015). The respondent can then 'opt in' to providing details about their location at a level of specificity they feel comfortable with, whether a postal address, a landmark, or some other geographic identifier. For earthquake early warning, people receive alerts within a certain spatial area based on earthquake magnitude and intensity thresholds; this spatial area is known as an alerting geofence. Extending the work of McBride et al. (2023), I use survey data to examine data latencies at the ten locations inside the alerting geofence with the most survey responses, to determine: 1) who received an alert and with what data latencies; and 2) who did not receive an alert (but should have) or received an alert only after a very long data latency (>120 s). Location questions were included in the survey of McBride et al. (2023) to gather broad information about location in a way that respected the survey respondent's privacy (see Acknowledgements for ethical approval information). However, upon examination, I found that these questions produced widely different information, ranging from a postal address (precise information that can be easily geocoded), to a landmark or building that required some initial identification and preprocessing, to a ZIP code (broad information that was difficult to narrow down and was therefore often discarded; Figure 2).
The test of the ShakeAlert system in Oakland, California provides an excellent example of the types of location responses received by survey. The USGS coordinated with the California Governor's Office of Emergency Services (CalOES), the Federal Communications Commission (FCC), and local emergency management partners to conduct a test of the ShakeAlert system in downtown Oakland, California on 27 March 2019 at 11 AM local time. Because the test took place on a weekday a year before the COVID-19 pandemic, many respondents were at their offices and workplaces in public or commercially zoned locations. This alleviates privacy concerns about revealing too much information about an individual's personal or residential property in this study.
The Oakland, California test covered a spatial area of 2.24 km² in downtown Oakland centered around Broadway. The survey gathered a total of 1,013 responses in an area with 40,000 people, reaching 2.5% of the population within the alerting geofence (McBride et al., 2023). Initial data cleaning to remove inaccurate results left 828 responses to analyze. Here I discuss the manual inspection of the raw survey data to find the best postal addresses (and most easily geocoded location information) from a variety of different responses, starting with the landmark information (Figure 2), a practice not de[…]; even landmarks such as 'Starbucks' were accurately identified. The most commonly incorrect part of the postal address for the Oakland test was the ZIP code, which may reflect respondents confusing the ZIP codes of their home and office addresses. This is also recognized as a common error within the U.S. Geological Survey's 'Did You Feel It?' (DYFI?) community intensity survey (Wald et al., 2011).
Once I had street addresses, either from the survey respondents themselves or by using Google Maps (maps.google.com) to translate a landmark or intersection to a street address, I obtained a geographic location through the Google Maps Geocoding API. The location type output of the Google Maps Geocoding API (Figure 1), the description of each location type, and the survey data from the Oakland test of the ShakeAlert EEW system that most likely resulted in that location type are documented in Table 1. The four location types, in order from most precise to least precise, are rooftop, range interpolated, geometric center, and approximate (Figure 2 and Table 1). The approximate location type stems from ZIP code information only and has uncertainty on the order of kilometers. The median uncertainty for the approximate locations was 1.5 km, which I adopt as the maximum uncertainty threshold for the other location types; the other location types have median uncertainties of about 200 m or less. I reiterate that the viewport information (Figure 1) is a proxy for uncertainty and does not reflect the actual uncertainty in these locations. I geocoded a total of 823 survey responses (Table 1). Of these, 64 locations (8% of the total) resulted in the 'approximate' location type, and two others exceeded the 1.5 km uncertainty threshold; all 66 were discarded.
Figure 3 […] The number of survey responses by location: alerts received (white), alerts that arrived after more than 120 s (red), alerts with inexact timing (grey), and alerts not received (black). The Alameda County Administration Building (ACAB) received the most alerts and is located on the perimeter of the alerting geofence.
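The screening rule described above (discard 'approximate' results, then discard any other result whose uncertainty exceeds the median uncertainty of the approximate results) can be written as a short function. Each record is assumed to carry a `location_type` and an `uncertainty_m` value (for example, the viewport-based proxy); these field names and the function name are hypothetical.

```python
from statistics import median


def filter_geocodes(records: list):
    """Drop APPROXIMATE results and any other result above the approximate-type median.

    Returns (kept_records, threshold_m).
    """
    approx = [r["uncertainty_m"] for r in records if r["location_type"] == "APPROXIMATE"]
    # with no approximate results there is no threshold to apply
    threshold = median(approx) if approx else float("inf")
    kept = [
        r for r in records
        if r["location_type"] != "APPROXIMATE" and r["uncertainty_m"] <= threshold
    ]
    return kept, threshold
```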
Here, I combined the remaining 757 geocoded locations with the data latency information collected via surveys during the Oakland test of the ShakeAlert EEW system to examine alert receipt and median alert latency at the ten locations with the most survey responses (Figure 3), which extends the work of McBride et al. (2023). The map (Figure 3a) shows the alerting polygon and the distribution of locations, which are primarily government offices or other large office buildings. The largest number of survey responses at any one location (location #1: ACAB; Figure 3a) was 50, counting responses regardless of whether an alert was received (Figure 3b). In the context of the research questions identified above, I find that at the ten locations with the most survey responses, 1) the median alert latency ranged from 6 to 19 s (Figure 3a), and 2) alerts were largely received, yet some alerts took a very long time (>120 s) or were not received at all (Figure 3b).
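Aggregating responses by geocoded location and ranking locations by response count, as in Figure 3, can be sketched with the standard library. The record fields (`location`, `received`, `latency_s`) and the function name are assumptions. In this sketch the median latency includes every received alert with a known latency, including those over 120 s; the convention behind Figure 3 may differ.

```python
from collections import defaultdict
from statistics import median


def top_locations(responses: list, n: int = 10) -> list:
    """Summarize alert receipt and latency for the n locations with the most responses."""
    groups = defaultdict(list)
    for r in responses:
        groups[r["location"]].append(r)
    summary = []
    for loc, rs in groups.items():
        latencies = [r["latency_s"] for r in rs if r["received"] and r["latency_s"] is not None]
        summary.append({
            "location": loc,
            "n_responses": len(rs),  # counted regardless of alert receipt
            "n_received": sum(1 for r in rs if r["received"]),
            "n_over_120s": sum(1 for x in latencies if x > 120),
            "median_latency_s": median(latencies) if latencies else None,
        })
    # most responses first; sorted() is stable, so ties keep first-seen order
    summary.sort(key=lambda s: s["n_responses"], reverse=True)
    return summary[:n]
```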

Case Study 2: Video Data for Geolocation
Now I consider a second case study that demonstrates the utility of geolocation, using video data. While the previous method with survey data was forward-looking (i.e., the researcher can ask the survey respondent about their location), this method with video data is backward-looking, or forensic (i.e., it relies on metadata provided via social media or on landmarks identified within the video itself, without contacting the individual). Video data from household surveillance cameras are increasingly used to check in on children and pets (e.g., Ur et al., 2014; Bernd et al., 2022), provide insurance claim information (e.g., Wong et al., 2009; Ahmad et al., 2019), and protect against theft (e.g., Pandya et al., 2018). As a society, we are also increasingly surveilled in public: while waiting at a stoplight, by state and local departments of transportation (e.g., Zhang et al., 2022); in grocery stores, to better understand retail behavior and prevent theft (e.g., Alikhani and Renzetti, 2022); and even in school classrooms, for safety-related and distance learning purposes (e.g., Johnson et al., 2018; King and Bracy, 2019; Fisher et al., 2020).
The increasing ubiquity of smartphone cameras, combined with social media platforms (e.g., YouTube, TikTok, Facebook, and Twitter), provides public spaces for content related to earthquake experiences (e.g., Earle et al., 2010; Crooks et al., 2013; Stefanidis et al., 2013). After a potentially damaging earthquake, the Earthquake Engineering Research Institute (EERI) deploys a Virtual Earthquake Reconnaissance Team (VERT, 2023, EERI Learning) to collect online videos and imagery (McBride et al., 2022a). These ephemeral data must be identified and downloaded within a short time span. For instance, the social media platforms WhatsApp and Instagram allow users to post 'stories' that are available for only 24 hours after the original post. In addition, videos may be deleted or removed from a site because of their sensitive content (e.g., the terrifying nature of earthquake shaking, building collapse). Traditional news media sources may also help, as they typically piece together several videos with location information for 'B-roll' footage that can be individually examined for protective action behavior.
Video data must be collected quickly and efficiently through a variety of approaches. Teams already in place and ready to deploy virtually, such as EERI VERT, gather video information over a span of one week or more after the event (e.g., McBride et al., 2022b). Keywords and hashtags help to identify earthquake footage, and dates and location information help to rule out unrelated videos (e.g., Crooks et al., 2013; McBride et al., 2022a). At times, a video is tagged by the original poster (OP), a reposter, or a news reporter as coming from the event or from a specific location near the event. Social media comments on the post or newsreel help to determine whether this geographic information is correct. Often people will comment asking for location information, and if the OP responds, the reply may provide a landmark or other identifying information from which a geolocation can be determined. At other times, location information can be gleaned by examining the footage frame-by-frame for identifying features. Information is more easily obtained from videos recorded at a public location, such as a restaurant, public park, school, or workplace. These landmarks are translated into a street address for use in the Google Maps Geocoding API (Figure 2) and often result in a 'rooftop' location type (Table 1).
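Keyword and date screening of candidate posts can be sketched as below. This is a hypothetical illustration: each post is assumed to be a dictionary with `text` (title, caption, or hashtags) and a `posted` timestamp gathered by whatever collection method is used; no particular social media API is implied, and the seven-day default simply mirrors the one-week collection span mentioned above.

```python
from datetime import datetime, timedelta


def candidate_videos(posts: list, keywords: list, event_time: datetime,
                     window_days: int = 7) -> list:
    """Keep posts that mention any keyword or hashtag and were posted within the window.

    `event_time` and each post's `posted` must both be naive or both timezone-aware.
    """
    kws = [k.lower() for k in keywords]
    end = event_time + timedelta(days=window_days)
    out = []
    for p in posts:
        text = p["text"].lower()  # case-insensitive match; '#earthquake' matches 'earthquake'
        if any(k in text for k in kws) and event_time <= p["posted"] <= end:
            out.append(p)
    return out
```

Posts that pass this screen still need the manual checks described above (comments, landmarks, frame-by-frame inspection) before they can be geolocated.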
However, unlike with the survey data, I cannot directly ask for location-related information. People may also post videos from their personal (home) address or another private location. In this instance, if someone comments asking for more location information on social media, the OP typically provides a nearby landmark, neighborhood, and/or intersection, which yields inexact location information. These data often result in the 'geometric center' or 'range interpolated' output from the Google Maps Geocoding API (Figure 2). In addition, the OP may post the video to social media under their own name. Depending on the location of the natural hazard event, the area that the hazard impacted, and the uniqueness of a person's name, a street address can be determined through online, open-access resources such as White Pages (www.whitepages.com) or Voter Records (www.voterrecords.com). Common names in the United States, such as Smith or Johnson, are more difficult to resolve. We may also obtain an inaccurate result if a person moved or switched jobs but did not update their social media accounts or other open-access directories. Privacy considerations are paramount, and we are unable to geocode videos with insufficient metadata or without other open-access information. Additional information about privacy concerns is discussed in the Limitations and Considerations section.
Figure 4 […] The family highlighted in this video gave a news conference about their experience using their names, which allowed for geolocation. Personal information on both Twitter posts is redacted here due to privacy concerns.
The 30 November 2018 M7.1 Anchorage, Alaska, United States earthquake (U.S. Geological Survey, 2023) provides an example of how we can use video data to obtain location information.
The 2018 Anchorage earthquake was a deep event (46.7 km, or 29 mi, deep), and people experienced a maximum Modified Mercalli Intensity (MMI; Stover and Coffman, 1993) of VIII (severe shaking and moderate to heavy damage). According to the USGS Prompt Assessment of Global Earthquakes for Response (PAGER), the estimated economic losses were significant, requiring a regional or national response (U.S. Geological Survey, 2023). Fortunately, owing to the depth of the earthquake and lessons learned from the 1964 M9.2 Alaska earthquake, there were no fatalities directly related to earthquake shaking (Alaska Earthquake Center, 2018).
As an example, a YouTube video collected by the Anchorage (Alaska) School District shows a classroom of high school students taking the recommended protective action in the United States ('Drop, Cover, and Hold On') within three seconds (Anchorage School District YouTube Channel, 2018).This video demonstrates the importance of earthquake drills, as the students did not hesitate to take the recommended protective measures (Adams et al., 2022).In Figure 4, we provide snapshots of two videos posted to Twitter from personal locations that demonstrate individuals fleeing their homes during earthquake shaking.The earthquake occurred in late November with snow on the ground, thus people who chose to flee risked exposure to the elements (Figure 4).
McBride et al. (2022a) found a total of 124 videos for the Anchorage earthquake from social media (Twitter and YouTube) and news media sources.Videos from the news media typically included multiple video segments, which brought the total up to 145 videos.Geolocating videos also helps to compare the video data gathered at a particular location and remove any duplicates.I geolocated a total of 80 videos (55%) using the procedures outlined above (Figure 5).The output from the Google Maps Geocoding API had a 'rooftop' location type for all but three of the locations, with a median uncertainty of 80 m.
For the 2018 Anchorage earthquake, I determine the level of shaking (seismic intensity) that people experienced from the USGS ShakeMap (U.S. Geological Survey, 2021). The ShakeMap reports the MMI along with other ground-motion parameters, such as peak ground acceleration (PGA), peak ground velocity (PGV), and spectral accelerations at certain periods (Wald et al., 2006; Worden et al., 2010). The nominal grid spacing for the 2018 Anchorage earthquake is on the order of 0.0167° in both latitude and longitude (about 1.85 km in the north-south direction). For each geolocated video, I identify the closest ShakeMap grid node by calculating the Euclidean distance between the video location and the grid nodes, and take the MMI at that node as the intensity that people felt during this earthquake. The median distance between the geolocated videos and the closest MMI grid node is about 688 m.
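The nearest-node lookup can be sketched as a brute-force search, which is fast enough for about 80 videos against a regional grid (a k-d tree would scale better). The sketch assumes the ShakeMap grid has already been parsed into `(lat, lng, mmi)` tuples (parsing is not shown) and converts degree differences to meters with a local equirectangular approximation; `nearest_mmi` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def nearest_mmi(lat: float, lng: float, grid):
    """Return (mmi, distance_m) for the grid node closest to (lat, lng), or None if empty.

    `grid` is an iterable of (lat, lng, mmi) tuples, e.g. parsed from a ShakeMap grid product.
    """
    best = None
    for node_lat, node_lng, mmi in grid:
        # local equirectangular approximation of the distance in meters
        mean_lat = math.radians((lat + node_lat) / 2.0)
        dx = math.radians(node_lng - lng) * math.cos(mean_lat) * EARTH_RADIUS_M
        dy = math.radians(node_lat - lat) * EARTH_RADIUS_M
        dist = math.hypot(dx, dy)
        if best is None or dist < best[1]:
            best = (mmi, dist)
    return best
```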
I find that the people who uploaded videos experienced MMI 4.7 to 7.6, with a median of MMI 7.1 (Figure 5). The minimum, MMI 4.7, came from a video uploaded to YouTube from Seward, Alaska, 140 km from the earthquake epicenter; even with this video removed, the median remains MMI 7.1. Most of the videos are clustered within Anchorage and neighboring areas such as Eagle River and Wasilla (Figure 5). Additional videos were collected from rural areas of Alaska that experienced light to moderate shaking. For this case study, I use the ShakeMap to determine the seismic intensity at the geocoded video locations (color-coded circles in Figure 5). However, the videos may also provide information about what people experienced during the event that could help determine the seismic intensity, especially in areas where instrument coverage is sparse and/or people are unaware of the DYFI? survey.

Discussion: Applications to Earthquake Early Warning
There are several benefits of determining the geographic location of social science data through geospatial analyses such as geocoding. First, geocoding helps to consolidate both the survey and video datasets. For instance, to better understand EEW alert latencies, as in case study 1, I aggregate survey responses that originate from the same location, which can be done once the geocoding is completed. With the videos, I can sort them by geolocation and compare them to verify their authenticity and make sure that I do not double count. This is particularly helpful for large datasets and for instances where the news media use different cuts of a video or splice the video to save time. Sorting by location helps an analyst look through the videos more carefully and identify duplicates that may not have been caught in the initial processing.
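One simple way to surface possible duplicates is to flag every pair of videos geolocated within a small distance of each other for manual review. The 50 m default below is an arbitrary illustrative threshold, not a value from this study, and the `id`, `lat`, and `lng` fields are assumed. The pairwise comparison grows quadratically with the number of videos but is trivial for a dataset of 145 videos.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def candidate_duplicates(videos: list, max_dist_m: float = 50.0) -> list:
    """Return (id, id) pairs of videos geolocated within `max_dist_m` of each other."""
    pairs = []
    for a, b in combinations(videos, 2):
        # local equirectangular approximation of the separation in meters
        mean_lat = math.radians((a["lat"] + b["lat"]) / 2.0)
        dx = math.radians(b["lng"] - a["lng"]) * math.cos(mean_lat) * EARTH_RADIUS_M
        dy = math.radians(b["lat"] - a["lat"]) * EARTH_RADIUS_M
        if math.hypot(dx, dy) <= max_dist_m:
            pairs.append((a["id"], b["id"]))
    return pairs
```

Flagged pairs are only candidates: two different people can film at the same location, so each pair still needs to be compared by eye.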
Second, geocoding social science data allows researchers to determine whether earthquake early warning alerts are reaching areas within the alerting geofence. The data latencies in alert receipt, and whether an alert was received at all, can then be examined by location. In EEW, the seismic intensity threshold at which people want to be alerted varies from country to country (e.g., Nakayachi et al., 2019; Becker et al., 2020; Bostrom et al., 2022). Further, if an alert is deemed appropriate for a given spatial area, alerts need to stay within that area and not 'leak' outside of the alerting zone. Otherwise, alerts reaching areas that feel no shaking or only light shaking could give rise to the 'cry wolf' effect (e.g., LeClerc and Joslyn, 2015).
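Checking whether a geocoded response falls inside the alerting geofence is a point-in-polygon problem. The sketch below uses the standard ray-casting test and treats latitude and longitude as planar coordinates, an approximation that is reasonable for a geofence a few kilometers across (such as the 2.24 km² Oakland test area) but not for polygons spanning the antimeridian or a pole. The record fields (`lat`, `lng`, `received`) and function names are assumptions.

```python
def inside_geofence(lat: float, lng: float, polygon: list) -> bool:
    """Ray-casting point-in-polygon test; `polygon` is a list of (lat, lng) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # only edges that straddle the point's latitude can be crossed by the ray
        if (lat1 > lat) != (lat2 > lat):
            # longitude at which this edge crosses the point's latitude
            lng_cross = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < lng_cross:  # crossing lies east of the point
                inside = not inside
    return inside


def alerts_outside(responses: list, polygon: list) -> int:
    """Count respondents who reported an alert but geocode outside the geofence."""
    return sum(
        1 for r in responses
        if r["received"] and not inside_geofence(r["lat"], r["lng"], polygon)
    )
```

A non-zero count from `alerts_outside` would point to possible alert 'leakage' beyond the intended area, subject to the location uncertainties discussed earlier.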
Through geocoding techniques, McBride et al. (2023) find that the alerts mostly stay within the geofence during a test of the ShakeAlert system in Oakland, California. However, geocoding also demonstrates that data latencies within the alerting geofence are on the order of 10 s and that even individuals at the same location within the alerting geofence might not all receive an alert, as shown by the new analysis of the top ten locations in Figure 3. These findings raise concerns over long latencies in alert delivery and over variation in alert receipt between cell phones (e.g., due to cell phone carriers, wireless data transmission networks, and cell phone types). In a technical test of the system, McBride et al. (2023) found no apparent technological privilege associated with different cell phone types; further examination outside of the lab, under real-world conditions, is still needed.
Third, geocoding social science data allows for an understanding of what people experienced during an earthquake. Geocoding allows us to associate a particular location with its seismic intensity, as demonstrated through the video reconnaissance footage. An understanding of seismic intensity, which is location dependent, provides information about an individual's choice of protective action (if any). Surveys collected in Japan and New Zealand show that people tend to use the time afforded by earthquake early warning to mentally prepare themselves for shaking rather than take a protective action (Nakayachi et al., 2019; Becker et al., 2020). Survey respondents reported that they chose mental preparation over a physical protective action because they still expected low shaking intensities that would not warrant protection, even when they received an alert. Mental preparation was also observed in video footage collected after an alert was sent during the 2021 M6.2 Petrolia, California, United States earthquake (Baldwin, 2022). Geolocation allows researchers to place video footage in the context of seismic intensity, giving a better understanding of what people experienced during an earthquake and whether this affects their choice of protective action.
Conversely, the geolocation of social science data may provide additional information about what happened during an earthquake (e.g., books falling off shelves, light fixtures shaking), which aids in determining seismic intensity and whether alerts were received by those who should have received them. Instrumental intensity is derived from seismometer recordings, and seismometers around the world can readily detect moderately sized earthquakes (M5+) at large epicentral distances (e.g., Ekström et al., 2012). However, in remote areas, in areas without dense seismometer coverage, and/or for small earthquakes (M<3), collecting instrumentally recorded information may be challenging. The geolocation of surveys has already proven useful for seismologists to better understand seismic intensity through the USGS 'Did You Feel It?' survey (Wald et al., 2011; Quitoriano and Wald, 2020; Goltz et al., 2022), where location, and now even EEW alerting information, can be asked directly. A potential next step for 'Did You Feel It?' would be to allow users to upload videos that could corroborate survey responses, for example by objectively showing how long shaking lasted instead of relying on survey responses alone.
The videos also capture the duration of earthquake shaking and what people experienced during an earthquake, which may affect how a person or group responds to an earthquake, to early warnings, and in the aftermath of an event (e.g., Jon et al., 2016; Vinnell et al., 2022). Conversely, these social science data may also be relevant to physical science in helping to constrain a duration magnitude (e.g., Lee et al., 1972; Eaton, 1992; Hirshorn et al., 1987), with the caveat that one would have to correct for 'building response' (instead of instrument response), which is affected by the amplitude, duration, and frequency of earthquake shaking. This correction may prove too difficult for magnitude estimation in practice, as each building would have its own response to correct for, yet these videos may be able to help in regions where seismic networks are sparse and more data are needed. Both magnitude and intensity are required parameters in estimating earthquake alerting accuracy and calibrating alerting thresholds.
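For readers unfamiliar with duration magnitude, one common functional form is shown below, where τ is the duration of the recorded signal (in seconds) and Δ is the epicentral distance (in kilometers). The coefficients a, b, and c are calibrated regionally, and the specific values and any additional terms differ between the cited studies; the form is given only to show where a video-derived shaking duration would enter, after any correction for building response.

```latex
M_D = a + b \log_{10}\tau + c\,\Delta
```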
In addition, earthquake early warning is simply one mechanism to help individuals and communities prepare for earthquakes, know what protective actions to take during an earthquake, and know how to respond in the aftermath of an event. These survey and video data could also help structural and civil engineers, emergency responders, and even insurance companies accurately account for damage that occurred during earthquakes (e.g., Coburn and Spence, 2002). VERTs collect videos to understand human behavior during earthquakes and to assess the level of damage within a particular region for structural health monitoring purposes (e.g., McBride et al., 2022a). The combination of video reconnaissance with geolocation can also help emergency responders by showing where damaged areas are after an earthquake and therefore where emergency services are needed most (e.g., […] et al., 2012; Li et al., 2022). Videos could also provide insurance companies with more objective information about the damage sustained during an earthquake, to better document how the earthquake affected a particular building and/or to adjust insurance rates. Broadly, geocoding can help clarify the relationship between seismic intensity and earthquake damage, which can be used to calibrate risk-informed earthquake early warning alerting thresholds.

Limitations and Considerations
The user response information obtained via surveys or videos may be biased. Surveys can be biased because only those willing to complete the survey respond (known as self-selection, or a convenience sample), so they often do not include a representative sample of a particular population (e.g., Sackett, 1979; Salkind, 2010; Sumy et al., 2020; McBride et al., 2023; Goltz et al., 2020). For earthquakes or other potentially traumatic experiences, survey information may also be biased by respondents' own perceptions; for example, people often report that earthquake shaking lasted longer than it actually did, or give otherwise exaggerated reports (e.g., Fraser et al., 2016; Bossu et al., 2017).
While videos may offer more objective information, they can be cut, cropped, filtered, or otherwise altered in ways that could also introduce bias. Also, people who stop to record videos, unlike those captured passively by security cameras operating in the background, have likely altered their behavior in some way, such that they are not taking an appropriate protective action (e.g., Martin-Jones, 2022). These considerations likely bias the data.
In addition, tradeoffs exist between the information that survey respondents provide and the geolocation uncertainty. For example, although landmarks are straightforward to geocode from survey responses, they may be overemphasized because a landmark is easily identifiable and may garner a disproportionate number of mentions in the survey responses. In the Oakland, California case study, this is unlikely to be a concern because, owing to self-selection, the surveys went primarily to local and state government offices, which themselves serve as the 'landmarks' (Figure 3). While a landmark can be easily geocoded, it remains uncertain whether a person was actually at that location. The bias towards landmark information may need to be considered in future applications of the geocoding methodology.
The collection of survey and video data around a potentially traumatic earthquake experience and narrative must be handled with care, for both the human subject and the researcher. For instance, the USGS DYFI? survey is subject to the Privacy Act of 1974 and the Paperwork Reduction Act of 1995. Location information can only be requested through generic questions (e.g., ZIP code, landmark, partial address), and a street address cannot be asked for directly or specifically requested (Goltz et al., 2020). This limits our ability to geocode all addresses and adds to the uncertainty in our location information. Ethical considerations around privacy limit the ability to reach out to a survey respondent, even if their contact information is provided, and care must be taken to keep their responses confidential and anonymous when working collaboratively, owing to cybersecurity concerns (e.g., Natural Hazards Center, 2021). For the survey collected in Oakland, California, the geolocations were mostly commercially zoned and public places, since the test of the ShakeAlert system took place in downtown Oakland on a weekday before COVID-19, which alleviated a privacy concern. I note that human subjects research approval is required only for the surveys and not when we download publicly available video information.
In addition, it is important not to contact directly the individuals who responded to a survey or uploaded a video, to protect their privacy, as further inquiry may cause emotional upset or harm. In turn, researchers' interactions with the video data must be limited because viewing someone's experience during a natural hazard event can also be traumatic (e.g., Kiyimba and O'Reilly, 2016). Secondary trauma, which occurs when one individual sees or listens to the traumatic experience of another person, can also take an emotional toll on the researcher. Reducing or limiting the amount of daily interaction with the video data and/or turning the sound off can lessen the impact of secondary trauma on the researcher (McBride et al., 2022a).
For geocoding purposes, accurate data entry can significantly improve the ability to geolocate the data (e.g., Yang et al., 2004; Kilic and Gülgen, 2020). As researchers, we need to consider how accurate and precise the geocoded result needs to be (e.g., Roongpiboonsopit and Karimi, 2010), which will vary with the research questions and context. For understanding EEW alert receipt and seismic intensity, I would want the most accurate and precise information possible, with an uncertainty on the order of meters. Other studies have found that street addresses with the 'rooftop' location type output from the Google Maps Geocoding API typically produce a geolocation within the footprint of the building, perform better within the United States than internationally, and give the best results among web-based solutions, with errors on the order of tens of meters (e.g., Chow et al., 2016; Kilic and Gülgen, 2020).
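To make the 'rooftop' criterion concrete, the following Python sketch picks the most precise result from a Geocoding API response that has already been retrieved as JSON text. It relies on the documented response fields `status`, `results`, `geometry.location`, and `geometry.location_type` (whose values are `ROOFTOP`, `RANGE_INTERPOLATED`, `GEOMETRIC_CENTER`, and `APPROXIMATE`). The HTTP request, API-key handling, the choice to accept only `ROOFTOP` results by default, and the function name `best_geocode` are assumptions for illustration.

```python
import json

# Geocoding API geometry.location_type values, ranked from most to least precise.
PRECISION_RANK = {
    "ROOFTOP": 0,
    "RANGE_INTERPOLATED": 1,
    "GEOMETRIC_CENTER": 2,
    "APPROXIMATE": 3,
}


def best_geocode(response_text, accepted=("ROOFTOP",)):
    """Return (lat, lng, location_type) for the most precise accepted result, or None.

    response_text is the JSON body of a Geocoding API response fetched elsewhere.
    The default acceptance policy (ROOFTOP only) is a hypothetical choice.
    """
    data = json.loads(response_text)
    if data.get("status") != "OK":
        return None  # e.g. "ZERO_RESULTS": nothing to geolocate
    candidates = []
    for result in data.get("results", []):
        geometry = result.get("geometry", {})
        loc_type = geometry.get("location_type")
        if loc_type in accepted and "location" in geometry:
            loc = geometry["location"]
            candidates.append((loc["lat"], loc["lng"], loc_type))
    if not candidates:
        return None
    # Prefer the most precise location type; ties keep the API's own ordering.
    return min(candidates, key=lambda c: PRECISION_RANK.get(c[2], len(PRECISION_RANK)))
```

Widening `accepted` trades precision for coverage, which mirrors the tradeoff between exact addresses and coarser answers such as ZIP codes or landmarks.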
Additional sources of uncertainty include the freely available online personal records used. These records (such as voting records) can be out of date, and in this study there is very little control over, or understanding of, the uncertainty of this information. For instance, someone could have moved locally or have a relatively common last name, which makes it difficult to determine whether the information is correct. In particularly transient areas or for socially vulnerable individuals, online records such as the White Pages may be incorrect or out of date (e.g., Dempsey, 2022). As an example, I looked up my own name in online records and found that my listed address is incorrect. The limitations and considerations around the data and the geocoding methodology restrict our ability to extend this work to a plethora of physical science applications, yet these limitations may be overcome in the future.

Conclusions and Future Directions
Here I demonstrate the usefulness of geocoding social science data to improve the ShakeAlert earthquake early warning system in the United States. The novelty here is not in the geocoding method itself, but rather in its application to survey and video data to better understand how EEW functions and to inform potential improvements. Geocoding social science data allows researchers to: 1) determine whether earthquake early warning alerts stay within the alerting geofence, so as not to cause undue panic or stress to those who may experience only light shaking; 2) determine when an alert is received at a particular location and whether there is a range of data latencies at that location that suggests improvements to the system, as demonstrated in the case study for Oakland, California; and 3) correlate the survey or video location with seismic intensity to corroborate what a person experienced during an earthquake and thereby more accurately calibrate earthquake early warning alerting thresholds, as demonstrated with the 2018 M7.1 Anchorage, Alaska earthquake. The approaches described here are very labor intensive, requiring a team of researchers to manually collect and analyze data, which can take months or more. A future direction includes incorporating machine learning and artificial intelligence techniques to simplify the data gathering, geolocation analysis, and understanding of human behavior (e.g., Chachra et al., 2022; Ofli et al., 2022).
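Items 1) and 2) above can be illustrated with a short Python sketch. It assumes the geofence is available as a list of (latitude, longitude) vertices and that alert latencies in seconds have already been grouped by geocoded location; the function names `point_in_polygon` and `summarize_location` are hypothetical, and this is not the code used in the Oakland case study. The ray-casting test treats coordinates as planar, which is a reasonable approximation for a city-scale polygon, and the 120 s cutoff mirrors the one described for Figure 3.

```python
from statistics import median


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside polygon, a list of (lat, lon) vertices.

    Coordinates are treated as planar; points exactly on an edge may fall either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross an eastward ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the point's latitude (y1 != y2 here).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def summarize_location(latencies_s, cutoff_s=120.0):
    """Median alert latency in seconds, ignoring latencies above cutoff_s; None if none remain."""
    kept = [t for t in latencies_s if t <= cutoff_s]
    return median(kept) if kept else None
```

A location whose respondents received alerts but for which `point_in_polygon` returns `False` would be flagged as an alert outside the geofence. For production work, a tested geometry library such as Shapely (`Polygon.contains`, with coordinates in longitude, latitude order) handles boundary cases more robustly.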
In addition, geolocation has underexplored and underutilized applications in seismology for earthquakes that occur in relatively remote and rural areas, in structural and civil engineering for structural health monitoring, and in emergency response and management to direct resources to the areas that need them most, to name a few (e.g., Kankanamge et al., 2019). The geocoding of online and thumbnail questionnaires, such as DYFI? (Wald et al., 2011; Quitoriano and Wald, 2020), the European-Mediterranean Seismological Center's LastQuake app (Bossu et al., 2015, 2018), and the University of California-Berkeley's MyShake app (Chachra et al., 2022; Kong et al., 2023), contributes to situational awareness in emergency response after an earthquake. Thus, the future of geocoding for the benefit of EEW lies in calibrating these felt reports against who received an alert (or not), to determine appropriate EEW intensity thresholds for a particular area and how people responded during the event (e.g., Goltz et al., 2022), and to adjust the thresholds if necessary.
Additionally, cell phone applications and their location-based services improve situational awareness and emergency response efforts. However, we need to look beyond those who use EEW apps to those who do not (e.g., Bopp and Douvinet, 2022). Through geocoding, we may identify potentially vulnerable sociodemographic groups and think carefully about how best to reach them through alerting strategies. Targeted public education and outreach campaigns around earthquake early warning for these communities, potentially through drills in formal education environments (Adams et al., 2022) or at museums and other free-choice learning environments (Sumy et al., 2022b), may provide a solution. As earthquake early warning expands worldwide (Allen and Stogaitis, 2022; McBride et al., 2022a), a focus on communities who may lack the socioeconomic means or technological privilege to use apps or receive alerts (e.g., due to poor coverage of wireless communication networks), who face language barriers that prevent them from understanding alert messages, and/or who have other access and functional needs will help drive education and outreach around earthquakes and early warning in a way that can increase societies' resilience and disaster preparedness (e.g., Sumy et al., 2022a).

Figure 2
Figure 2 Flowchart showing the survey or video data inputs, the types of answers produced based on the information provided, the precision of these types of information, and the location type output from the Google Maps Geocoding Application Programming Interface (API).
McBride et al. (2023) conducted two tests of the ShakeAlert system in coordination with the Federal Emergency Management Agency's (FEMA) Integrated Public Alert & Warning System (IPAWS) Wireless Emergency Alerts (WEA), in Oakland and San Diego County, California, respectively, before the system went live for public alerting in California in October 2019. The two survey questions asked about location were: 1) What was your physical location [during the test]? You can choose to report your Zone Improvement Plan (ZIP) code, physical address, or suburb; and 2) If you do not know your exact location, can you provide the closest identifiable landmark? These two questions allowed McBride et al. (

Figure 3
Figure 3 (a) Map of the alerting geofence (red polygon) and the ten locations with the most survey responses. The symbols are color-coded by their median data latency (i.e., when the alert arrived at a particular location) and sized by the number of survey responses that reported receiving an alert. Alerts that arrived >120 s at a location are excluded from the median calculation. The numbers at each location refer to the landmarks identified on the x-axis of Figure 3b. Locations outside the geofence that received alerts are denoted with a black circle. (b) The number of survey responses by location, divided into received alerts (white), alerts that arrived >120 s (red), inexact timing of alerts (grey), and alerts not received (black). The Alameda County Administration Building (ACAB) received the most alerts and is located on the perimeter of the alerting geofence.

Figure 4
Figure 4 Videos posted to Twitter from home security cameras. (a) Two adults flee their home during earthquake shaking. The hashtag #earthquake, the information in the post about which earthquake occurred, and the timing of the posting all help to determine that this was the 30 November 2018 Anchorage, Alaska, United States earthquake. This information was posted to a personal account, which helped with geolocation. (b) The Daily Mail US obtained video footage from an adult who evacuated a house with a child. Note that the adult is barefoot and lightly clothed outside in the snow, where exposure to the weather presents a concern. The family shown in this video held a news conference about their experience using their names, which allowed for geolocation. Personal information in both Twitter posts is redacted here due to privacy concerns.

Figure 5
Figure 5 Intensities from the 2018 Anchorage, Alaska, United States earthquake.The locations of the video footage (circles) in the Anchorage, Eagle River, and Wasilla areas are color-coded by Modified Mercalli Intensity (MMI).The instrumental (grey triangles) and 'Did You Feel It?' (grey squares) intensity information are shown in the background to provide context as to how this video information may help.MMI 6, 6.5, and 7 contours are shown and labeled.

Table 1
Google Maps Geocoding API output for the Oakland test of the ShakeAlert system