Towards Linking Mobility, Environment, and Behavior: A Scalable Framework Using Smartphone Location Data
About the event
Ph.D. Preliminary Examination for Olufunso Oje
Advisor: Assefaw Gebremedhin
Abstract
Smartphone-derived location data has opened new opportunities for understanding human mobility, environmental exposure, and behavior at a level of granularity never before seen. Among such data sources, Google Location History (GLH) provides one of the most temporally rich and spatially detailed records of individual mobility, passively collected from consenting users over months to years. While the promise of GLH is great, realizing its full potential in research requires scalable methods for extraction, cleaning, storage, enrichment, and analysis. More importantly, these methods must address data privacy issues, mismatches in spatiotemporal resolution between data sources, and the lack of frameworks for integrating environmental and behavioral context. This proposal presents a comprehensive system that addresses these challenges and lays the groundwork for a scalable and extensible geospatial data analysis framework built around GLH data.
The first component of this work focuses on building a data pipeline for processing the raw and semantically annotated GLH data. The dataset contains over 280 million location points collected from 372 participants, with records for some individuals spanning over a decade. The data comprises three elements: raw location traces, activity segments (e.g., walking, driving), and place visits (e.g., store, restaurant). We developed a pipeline to extract these records from JSON exports. The process involves normalizing the formats, resolving inconsistencies, and assigning a unique identifier to each participant. The cleaned data is stored in a partitioned PostgreSQL database.
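The extraction and normalization step can be illustrated with a minimal sketch. It is not the pipeline described above. It assumes a Takeout-style export with a top-level "locations" list whose records carry "latitudeE7" and "longitudeE7" plus either a "timestampMs" or a "timestamp" field; these field names, the helper functions, and the "P001" identifier are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def parse_timestamp(rec):
    """Return a UTC datetime from either assumed timestamp field."""
    if "timestampMs" in rec:  # epoch milliseconds, stored as a string
        return datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc)
    # ISO 8601 string; swap a trailing "Z" so older Python versions can parse it
    return datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))


def normalize_records(export, participant_id):
    """Flatten one parsed export into uniform rows tagged with a participant ID."""
    rows = []
    for rec in export.get("locations", []):
        # Skip records without coordinates instead of guessing values
        if "latitudeE7" not in rec or "longitudeE7" not in rec:
            continue
        rows.append({
            "participant_id": participant_id,
            "lat": rec["latitudeE7"] / 1e7,   # E7 integers to decimal degrees
            "lon": rec["longitudeE7"] / 1e7,
            "ts": parse_timestamp(rec),
        })
    return rows


def load_export(path, participant_id):
    """Read a JSON export from disk and normalize it."""
    with open(path, encoding="utf-8") as f:
        return normalize_records(json.load(f), participant_id)


# Hypothetical sample covering both timestamp variants and one incomplete record
sample = {"locations": [
    {"latitudeE7": 467300000, "longitudeE7": -1171800000, "timestampMs": "1609459200000"},
    {"latitudeE7": 467310000, "longitudeE7": -1171790000, "timestamp": "2021-01-01T00:05:00Z"},
    {"timestamp": "2021-01-01T00:10:00Z"},
]}
rows = normalize_records(sample, "P001")
```

Rows in this uniform shape could then be bulk-loaded into a table partitioned by participant or by time range; the abstract does not specify the partitioning key.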
Attaching meaningful context to the location data, particularly environmental factors like greenspace (NDVI), air pollution (NO₂, PM₂.₅), and proximity to parks or food outlets, requires a method that resolves the mismatch between the fine-grained GLH data and the coarser or variable resolutions of environmental datasets. To this end, we created a new algorithm called Hierarchical Grid Partitioning (HierGP), which builds a tree-based system of spatial grid cells with multiple levels of resolution. HierGP allows for the efficient binning of location points to grid cells that align with the spatial resolution of environmental data. It combines spatial partitioning with temporal binning to produce what this framework calls Spatiotemporal Partitions (STPs), reusable units that capture location and time at a resolution compatible with environmental datasets. This approach allows environmental metrics to be extracted once per STP rather than by querying each location point individually, and it supports privacy preservation by ensuring that environmental data is attached at the STP level, not the raw point level. The HierGP algorithm has been published (Oje et al. 2024) and is the core linkage system for the remaining analyses.
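The idea of multi-resolution cells combined with time bins can be sketched as follows. This is not the published HierGP algorithm: it is a simple quadtree-style grid in which each level halves the cell size, and the base cell size, the level numbering, the hourly bin, and all function names are assumptions chosen for illustration.

```python
import math
from datetime import datetime, timezone

BASE_DEG = 1.0  # assumed cell size at level 0, in degrees


def cell(lat, lon, level):
    """Return (level, row, col) of the grid cell containing a point."""
    size = BASE_DEG / (2 ** level)  # each level halves the cell size
    row = math.floor((lat + 90.0) / size)
    col = math.floor((lon + 180.0) / size)
    return (level, row, col)


def parent(c):
    """Return the coarser cell that contains cell c."""
    level, row, col = c
    if level == 0:
        raise ValueError("level-0 cells have no parent")
    return (level - 1, row // 2, col // 2)


def stp_key(lat, lon, ts, level, bin_hours=1):
    """Combine a grid cell with a time bin into one partition key."""
    epoch_hours = int(ts.timestamp() // 3600)
    return cell(lat, lon, level) + (epoch_hours // bin_hours,)


# Two nearby points in the same hour share a key; a later point does not
t1 = datetime(2021, 1, 1, 10, 5, tzinfo=timezone.utc)
t2 = datetime(2021, 1, 1, 10, 40, tzinfo=timezone.utc)
t3 = datetime(2021, 1, 1, 11, 10, tzinfo=timezone.utc)
k1 = stp_key(46.7301, -117.1801, t1, level=8)
k2 = stp_key(46.7302, -117.1802, t2, level=8)
k3 = stp_key(46.7302, -117.1802, t3, level=8)
```

Because many points collapse to one key, an environmental value only needs to be looked up once per distinct key, and the level can be chosen to match the resolution of each environmental dataset.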
We extended this work with a visit detection methodology for identifying people's interactions with food outlets. Traditional methods rely primarily on GIS-based buffers around residential addresses, which can misrepresent real behavior. Using the GLH data, we developed a method to identify actual food outlet visits by individuals based on distance to the outlet, duration of stay, revisit intervals, and other temporal characteristics. This methodology has also been published (Oje et al. 2025).
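A minimal sketch of the distance and duration criteria follows. It is not the published method: it covers only two of the listed criteria, and the 50 m radius, the 5-minute minimum stay, the function names, and the sample coordinates are illustrative assumptions rather than values from the study.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_visits(points, outlet, radius_m=50.0, min_stay=timedelta(minutes=5)):
    """Return (start, end) pairs for stays near an outlet.

    points: time-ordered (timestamp, lat, lon) tuples.
    A visit is a consecutive run of points within radius_m of the outlet
    whose first-to-last span is at least min_stay.
    """
    visits = []
    start = last = None
    for ts, lat, lon in points:
        if haversine_m(lat, lon, outlet[0], outlet[1]) <= radius_m:
            if start is None:
                start = ts  # a new run begins
            last = ts
        elif start is not None:
            if last - start >= min_stay:
                visits.append((start, last))
            start = last = None  # the run ended
    if start is not None and last - start >= min_stay:
        visits.append((start, last))  # run still open at the end of the trace
    return visits


# Hypothetical trace: a 7-minute stay, then a single-point pass-by
t0 = datetime(2021, 1, 1, 12, 0)
outlet = (46.7300, -117.1800)
trace = [
    (t0, 46.7400, -117.1800),                          # far away
    (t0 + timedelta(minutes=1), 46.73001, -117.1800),  # near
    (t0 + timedelta(minutes=4), 46.73002, -117.1800),  # near
    (t0 + timedelta(minutes=8), 46.73001, -117.1800),  # near
    (t0 + timedelta(minutes=10), 46.7400, -117.1800),  # far away
    (t0 + timedelta(minutes=12), 46.73001, -117.1800), # near, single point
    (t0 + timedelta(minutes=13), 46.7400, -117.1800),  # far away
]
visits = detect_visits(trace, outlet)
```

Here only the first run qualifies as a visit; the single-point pass-by is dropped because its duration is zero. Revisit intervals and other temporal characteristics would need additional logic on top of this.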
We also generated personalized environmental reports for each participant, showing their exposure to green space, air pollution, park visitation, and outdoor walking time. These reports use easy-to-understand graphics to show how each participant's metrics compare with the average for their cohort, along with a snapshot of their environmental and behavioral trends over time. A sample report is included in the appendix.
In addition to the methodological developments above, the enriched dataset has been used in related downstream studies. One published study investigated the relationship between walking activity and obesity. The analysis showed that individuals with higher GLH-measured walking frequency had significantly lower odds of obesity. An ongoing study examines how greenspace exposure is associated with mental health during the COVID-19 pandemic. The study tracks changes in mobility, outdoor activity, and exposure to green environments across different pandemic phases to uncover links between behavioral changes and symptoms of anxiety, depression, or stress.
A framework paper is being drafted to formally present the entire pipeline, from raw data ingestion to environmental enrichment, behavioral modeling, and application. In contrast to previous publications on specific components (e.g., grid partitioning or food outlet detection), this manuscript will present the system as a whole, enabling reproducibility and reuse in other mobility-environment research settings.
This proposal seeks to advance geospatial health research by creating a robust and scalable framework for working with passively collected location data. The work ranges from methodological innovation to applied analytics and tool development. Integrating behavioral, environmental, and spatial data enables a more fine-grained analysis of the interrelations between place, behavior, and health.