Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists

Publication type

Article

Course or field of knowledge

Urban planning

Venue

Urban Science

Authorship type

Individual

Author name

Avipsa Roy et al.

Language

English

Geographic scope

Specific foreign country

Country

United States

Year of publication

2019

Keyword 1

Big data

Keyword 2

Crowdsourcing

Keyword 3

Georeferencing

Keyword 4

Infrastructure

Keyword 5

Mapping

Keyword 6

Strava Metro

Description

Traditional methods of counting bicyclists are resource-intensive and generate data with
sparse spatial and temporal detail. Previous research suggests big data from crowdsourced fitness
apps offer a new source of bicycling data with high spatial and temporal resolution. However,
crowdsourced bicycling data are biased as they oversample recreational riders. Our goals are to
quantify geographical variables that can help correct bias in crowdsourced data, and to
develop a generalized method to correct bias in big crowdsourced data on bicycle ridership in different
settings in order to generate maps for cities representative of all bicyclists at a street-level spatial
resolution. We used street-level ridership data for 2016 from a crowdsourced fitness app (Strava),
geographical covariate data, and official counts from 44 locations across Maricopa County, Arizona,
USA (training data); and 60 locations from the city of Tempe, within Maricopa (test data). First,
we quantified the relationship between Strava and official ridership data volumes. Second, we used a
multi-step approach with variable selection using LASSO followed by Poisson regression to integrate
geographical covariates, Strava, and training data to correct bias. Finally, we predicted bias-corrected
average annual daily bicyclist counts for Tempe and evaluated the model’s accuracy using the test
data. We found a correlation between the annual ridership data from Strava and official counts (R² =
0.76) in Maricopa County for 2016. The significant variables for correcting bias were the proportion
of white population, median household income, traffic speed, distance to residential areas, and
distance to green spaces. The model could correct bias in crowdsourced data from Strava in Tempe
with 86% of road segments being predicted within a margin of ±100 average annual bicyclists. Our
results indicate that it is possible to map ridership for cities at the street level by correcting bias in
crowdsourced bicycle ridership data, with access to adequate data from official count programs and
geographical covariates at a comparable spatial and temporal resolution.
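
The two-step approach described above (LASSO variable selection, then Poisson regression, then evaluation against held-out counts) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names (`lasso_select`, `fit_poisson`, `predict_poisson`, `share_within_margin`) are hypothetical, and applying LASSO to log-transformed counts with a fixed penalty of 0.1 is an assumption made only for this example. The sample sizes of 44 and 60 simply mirror the training and test locations mentioned in the abstract.

```python
import numpy as np


def lasso_select(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO on standardized X; returns indices with nonzero weight."""
    n, p = X.shape
    yc = y - y.mean()  # centering the response stands in for an intercept
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            partial = yc - X @ w + X[:, j] * w[j]  # residual ignoring covariate j
            rho = X[:, j] @ partial / n
            z = X[:, j] @ X[:, j] / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z  # soft threshold
    return [j for j in range(p) if w[j] != 0.0]


def fit_poisson(X, y, n_iter=50):
    """Poisson regression with a log link, fitted by Newton's method."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(A @ beta)
        hessian = A.T @ (A * mu[:, None])
        beta = beta + np.linalg.solve(hessian, A.T @ (y - mu))
    return beta


def predict_poisson(beta, X):
    """Expected count for each row of X under the fitted model."""
    return np.exp(beta[0] + X @ beta[1:])


def share_within_margin(observed, predicted, margin):
    """Fraction of locations whose prediction lies within +/- margin of the observed count."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted) <= margin))


# Synthetic stand-in data (assumption): 6 covariates, of which only columns 0 and 2
# actually drive the counts. Column 0 could play the role of the crowdsourced count.
rng = np.random.default_rng(0)
n_train, n_test, p = 44, 60, 6
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))


def true_rate(X):
    return np.exp(4.0 + 0.6 * X[:, 0] - 0.5 * X[:, 2])


y_train = rng.poisson(true_rate(X_train))
y_test = rng.poisson(true_rate(X_test))

# Standardize with training statistics only, so the test set stays unseen.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std

selected = lasso_select(Z_train, np.log1p(y_train), alpha=0.1)  # step 1: selection
beta = fit_poisson(Z_train[:, selected], y_train)               # step 2: count model
predicted = predict_poisson(beta, Z_test[:, selected])          # step 3: prediction
share = share_within_margin(y_test, predicted, margin=100)      # step 4: evaluation
print("selected covariates:", selected)
print("share within +/-100:", round(share, 2))
```

In practice the LASSO penalty would normally be chosen by cross-validation rather than fixed, and the margin would be set relative to the scale of the official counts.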
