7–11 Jul 2025
Yildiz Technical University, Istanbul

Assessing Walkability with Multimodal Large Language Model: Opportunities and Challenges

Not scheduled
20m
Yildiz Technical University, Istanbul

Oral Track 11 | EMERGING TECHNOLOGIES

Speaker

Dr Donghyun Kim (UNIST)

Description

Recent advances in Street View Imagery (SVI) and computer vision technologies have significantly improved the ability to capture urban features and overcome the limitations of traditional field-based audits. However, previous computer vision methods have primarily focused on mapping spatial distributions such as visual complexity, street enclosure, and greenery, often relying heavily on datasets such as Place Pulse for subjective analysis (Zhang et al., 2018; Ma et al., 2021). Despite their potential, these methods face high barriers to entry due to complex model training and significant computational resource requirements (Ito et al., 2024). The emergence of multimodal large language models (MLLMs), such as GPT-4, offers a promising alternative by allowing intuitive operation via textual prompts, thereby lowering the barriers to using SVI data for urban studies. However, the applicability of MLLMs to assess core urban planning concepts such as walkability remains largely unexplored.
This study investigates the potential of MLLMs to assess walkability by performing automated pairwise comparisons of 200 Street View images representing local streets (width ≤12m) in Seoul, Korea, selected based on land use and building characteristics. Based on Alfonzo's Hierarchy of Walking Needs, the MLLM assessed safety, comfort, enjoyment, and overall walkability, key factors driving pedestrian demand, while providing detailed justification for its choices (Alfonzo, 2005). To assess the reliability of the MLLM, six iterative evaluation sets were conducted to verify that its judgements were consistent. Natural language processing techniques were used to analyze the justifications, providing insight into how different streetscape elements were prioritized in the evaluations.
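The pairwise protocol described above yields, for each compared pair of images, a judged winner; these outcomes must then be aggregated into a per-image score. A minimal sketch of one such aggregation (simple win rate), with hypothetical image IDs and judgements:

```python
from collections import defaultdict

def win_rate_scores(comparisons):
    """Aggregate pairwise judgements into a per-image score.

    `comparisons` is a list of (image_a, image_b, winner) tuples,
    where `winner` is the image judged more walkable of the pair.
    Returns each image's share of the comparisons it appeared in and won.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for a, b, winner in comparisons:
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {img: wins[img] / appearances[img] for img in appearances}

# Hypothetical judgements over three street-view images
judgements = [
    ("svi_001", "svi_002", "svi_001"),
    ("svi_001", "svi_003", "svi_001"),
    ("svi_002", "svi_003", "svi_003"),
]
print(win_rate_scores(judgements))
# svi_001 wins both of its pairs, svi_003 one of two, svi_002 none
```

Richer aggregation models (e.g. Bradley-Terry, as used in Place Pulse-style studies) fit a latent strength per image from the same pairwise data; the win-rate version above is only the simplest illustration.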
To test validity, the same rating sets were presented to 30 urban planning experts, and their responses were compared to those of the MLLM. This approach allows for an in-depth discussion of the MLLM's potential to provide scalable and cost-effective walkability assessments, while also highlighting its limitations, such as difficulty capturing nuanced cultural and contextual factors. By bridging cutting-edge AI technologies with urban planning, this study contributes to the advancement of automated tools for assessing walkability and supporting equitable, sustainable urban design.
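Agreement between the MLLM's ordering of the streets and the experts' ordering can be quantified with a rank correlation such as Kendall's tau. A minimal pure-Python sketch, with all scores hypothetical (the ties-ignored variant, for brevity):

```python
def kendall_tau(scores_a, scores_b):
    """Kendall rank correlation between two score dicts over the same items.

    Counts concordant vs. discordant item pairs: +1 means identical
    ordering, -1 means fully reversed. Tied pairs are skipped here.
    """
    items = sorted(scores_a)
    concordant = discordant = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            x, y = items[i], items[j]
            diff_a = scores_a[x] - scores_a[y]
            diff_b = scores_b[x] - scores_b[y]
            if diff_a * diff_b > 0:
                concordant += 1       # both rankings order the pair the same way
            elif diff_a * diff_b < 0:
                discordant += 1       # the rankings disagree on this pair
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0

# Hypothetical scores: MLLM win rates vs. averaged expert ratings
mllm = {"svi_001": 0.9, "svi_002": 0.4, "svi_003": 0.6}
experts = {"svi_001": 4.5, "svi_002": 2.1, "svi_003": 3.8}
print(kendall_tau(mllm, experts))  # 1.0 here: both rank 001 > 003 > 002
```

In practice `scipy.stats.kendalltau` handles ties and significance testing; the hand-rolled version above only shows what the statistic counts.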

References

Alfonzo, M.A., 2005. To Walk or Not to Walk? The Hierarchy of Walking Needs. Environment and Behavior, 37(6), pp.808–836.
Ito, K., Kang, Y., Zhang, Y., Zhang, F., & Biljecki, F., 2024. Understanding urban perception with visual data: A systematic review. Cities, 152, p.105169.
Ma, X., Ma, C., Wu, C., Xi, Y., Yang, R., Peng, N., Zhang, C., & Ren, F., 2021. Measuring human perceptions of streetscapes to better inform urban renewal: A perspective of scene semantic parsing. Cities, 110, p.103086.
Zhang, F., Zhou, B., Liu, L., Liu, Y., Fung, H.H., Lin, H., & Ratti, C., 2018. Measuring human perceptions of a large-scale urban region using machine learning. Landscape and Urban Planning, 180, pp.148–160.

Keywords Multimodal large language model; Walkability; Streetscape; Street-view imagery
Best Congress Paper Award No

Primary authors

Dr Donghyun Kim (UNIST)
Prof. Gihyoug Cho (UNIST)

Co-author

Ms Minji Ryu (UNIST)
