Multi-source data fusion for modelling urban heat mitigation across six European cities
Gerardo Ezequiel Martin Carreno
MSc Urban Spatial Science · UCL
Six cities · Five data sources · 118 features · Open data
61,000
Deaths from heat across Europe.
One summer. 2022.
Among the deadliest natural disasters in modern European history.
Streets designed to maximise winter sunlight had become heat traps.
Ballester et al. (2023) Nature Medicine
Studies look at buildings, trees, or streets separately. Heat comes from how they interact.
The same tree cools 8–12°C in Berlin but only 0–4°C in Athens. Same strategy, different results.
Planners need the where, the why, and the how. Most models only give the first.
RQ1: What drives urban heat across contexts?
RQ2: Where should cities intervene to protect the most vulnerable?
Oceanic (Cfb): Amsterdam, Paris
Mediterranean (Csa): Athens, Barcelona
Transitional: Berlin Cfb/Dfb · Madrid Csa/BSk
40,344 grid cells at 30m resolution · ~6 km² per city
118 features from 5 open-source tools. Zero proprietary data. Any European city can replicate this tomorrow.
Land Surface Temperature
NDVI Vegetation Index
Where vegetation disappears, temperatures spike. Land surface temperature, expressed as each cell’s anomaly from its city mean, is the dependent variable our model learns to predict.
Source: Urbanity global dataset
Our six cities: network topology
Amsterdam’s regular canal grid channels cooling winds. Athens’s organic fabric traps them. Network topology determines whether park cooling reaches surrounding streets.
Source: 10M street-level images
Our six cities: coverage points
Satellites see canopy from above. Street imagery sees shade from below. Divergence between the two validates the multi-source approach.
Canyon geometry drives nighttime heat retention. Paris’s uniform Haussmann fabric and Athens’s irregular growth produce fundamentally different thermal dynamics.
VoxCity 3D model
Voxelised urban form
XGBoost with SHAP explainability: the model shows its work at every stage. The planner keeps the final call.
40,344 cells · 30m resolution · 6 cities · Target: UHI anomaly (each cell’s temperature minus its city mean). Raw LST is not the target: it inflates R² to 0.97 by capturing climate, not urban form.
Why XGBoost? Heat responds non-linearly to urban form. Trees cool effectively until water stress kicks in. Sealed surfaces trap heat, but the tipping point differs by climate. We need a model that captures these thresholds and shows its reasoning.
Global R² 0.695 → Blended 0.841. Climate-zone specialisation cuts unexplained variance by 48% (1 − R² falls from 0.305 to 0.159).
Hot areas sit next to hot areas. Without careful testing, the model could just memorise where things are instead of learning why they’re hot. Five checks prevent that.
Spatial cross-validation
The model never sees its test area’s neighbours during training. 5 folds, 600m spatial blocks, stratified by heat intensity and density.
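The blocking idea can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes cell coordinates are already projected in metres, deals blocks to folds round-robin, and leaves out the stratification by heat intensity and density. The function names `block_id` and `spatial_folds` are hypothetical.

```python
from collections import defaultdict

BLOCK_SIZE_M = 600   # block edge length stated in the text
N_FOLDS = 5          # fold count stated in the text

def block_id(x_m, y_m, size=BLOCK_SIZE_M):
    """Map projected coordinates (metres) to the block that contains them."""
    return (int(x_m // size), int(y_m // size))

def spatial_folds(coords, n_folds=N_FOLDS):
    """Return one fold index per cell; all cells in a block share a fold.

    Assumption: blocks are dealt to folds round-robin in sorted order.
    The stratification described in the text is not reproduced here.
    """
    blocks = sorted({block_id(x, y) for x, y in coords})
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y)] for x, y in coords]

# Hypothetical 30 m cell centres covering a 3 km x 3 km square.
coords = [(x + 15, y + 15) for x in range(0, 3000, 30) for y in range(0, 3000, 30)]
folds = spatial_folds(coords)

# Verify that no block is split between folds.
folds_per_block = defaultdict(set)
for (x, y), f in zip(coords, folds):
    folds_per_block[block_id(x, y)].add(f)
assert all(len(s) == 1 for s in folds_per_block.values())
```

Holding out whole blocks keeps most of a test cell's immediate neighbours out of training. Cells on a block edge still touch the adjacent block, so a buffer between blocks would be a stricter variant.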
City-demeaned targets
Raw temperature R² = 0.97, but that mostly captures climate. Subtracting each city’s mean isolates what urban form does. R² = 0.84.
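A small sketch of the demeaning step, using made-up temperatures for two cities. The function name `uhi_anomaly` and the sample values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def uhi_anomaly(cities, lst):
    """Subtract each city's mean LST from the LST of its cells."""
    by_city = defaultdict(list)
    for city, t in zip(cities, lst):
        by_city[city].append(t)
    city_mean = {c: mean(v) for c, v in by_city.items()}
    return [t - city_mean[c] for c, t in zip(cities, lst)]

# Hypothetical values: Athens is hotter overall, but the within-city
# spread is identical in both cities.
cities = ["Athens", "Athens", "Athens", "Amsterdam", "Amsterdam", "Amsterdam"]
lst = [40.0, 42.0, 44.0, 28.0, 30.0, 32.0]

anomaly = uhi_anomaly(cities, lst)
print(anomaly)  # [-2.0, 0.0, 2.0, -2.0, 0.0, 2.0]
```

After demeaning, the 12°C gap between the two cities disappears from the target, so a model can no longer score well just by telling the cities apart.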
Scale sensitivity (MAUP)
Tested at 30m, 60m, 90m. At 60m the model loses 17% of explained variance; that scale falls between radiative and advective physics. 30m is the right scale.
Residual autocorrelation
Moran’s I confirms significant spatial clustering in predictions (I = 0.66–0.92, p<0.001), validating why spatial blocking was essential.
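For readers unfamiliar with the statistic, here is a minimal sketch of global Moran's I on a regular grid. It assumes rook (edge-sharing) neighbours with binary weights. The weighting scheme used in the project is not stated, and no significance test is included.

```python
def morans_i(grid):
    """Global Moran's I for a 2D grid, rook neighbours, binary weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = 0.0
    w_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Cross-product of deviations for each neighbour pair.
                    num += (grid[r][c] - m) * (grid[rr][cc] - m)
                    w_sum += 1
    return (n / w_sum) * num / denom

# Hot cells grouped on the left: positive autocorrelation.
clustered = [[1, 1, 0, 0] for _ in range(4)]
# Alternating hot and cool cells: negative autocorrelation.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(morans_i(clustered) > 0)      # True
print(morans_i(checkerboard) < 0)   # True
```

Values near +1 indicate that similar values sit next to each other, which is the situation the spatial blocking is designed to guard against.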
Stability & tuning
5 random seeds (standard deviation of R² = 0.064). 200 Bayesian optimisation trials selected 820 trees, max depth 6, learning rate 0.03. Features: 165 base → 189 with spatial lags → 118 after the correlation filter (pairs with |r| > 0.92 pruned).
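The correlation filter can be sketched as a greedy pass over the features. Which member of a correlated pair gets dropped is not stated in the source, so this sketch assumes "keep the first one seen". The feature names and values are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def correlation_filter(features, threshold=0.92):
    """Keep a feature only if |r| <= threshold against every kept feature."""
    kept = []
    for name, col in features.items():
        if all(abs(pearson(col, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical columns: the spatial lag nearly duplicates its base feature.
features = {
    "impervious_frac": [0.9, 0.7, 0.5, 0.3, 0.1],
    "impervious_lag": [0.88, 0.72, 0.49, 0.31, 0.12],
    "building_height": [12.0, 30.0, 8.0, 21.0, 15.0],
}

kept = correlation_filter(features)
print(kept)  # ['impervious_frac', 'building_height']
```

Because the result depends on feature order, a real pipeline would need a documented rule for which feature of each pair survives.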
The model learns morphology, not spatial patterns.
| City | R² | RMSE |
|---|---|---|
| Paris | 0.932 | 0.49°C |
| Barcelona | 0.926 | 0.85°C |
| Amsterdam | 0.880 | 0.44°C |
| Athens * | 0.730 | 0.63°C |
| Berlin | 0.727 | 0.66°C |
| Madrid | 0.718 | 1.53°C |
The model works, but accuracy isn’t the finding. What the model reveals about what drives heat is.
* Athens R² drops with blending (−2.8%): its topographic basin overrides climate-zone correction. That’s not a failure; it’s a finding. Geography matters.
De-sealing matters 3× more than tree planting.
Trees provide shade, biodiversity, air quality. But for temperature reduction, the surface is the stronger lever.
Geographic contingency isn’t noise.
It’s the signal.
The regression tells us how hot each cell is. But planners need to know which areas are dangerously hot. That’s a classification problem: we turn continuous temperature into a binary risk flag.
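A minimal sketch of turning the continuous anomaly into a binary flag. The hotspot threshold is not given in the source, so the cut-off here (the hottest fraction of cells) and the function name `hotspot_flags` are assumptions for illustration only.

```python
def hotspot_flags(anomalies, top_fraction=0.10):
    """Flag the hottest `top_fraction` of cells as 1 and the rest as 0.

    Ties at the cut-off are all flagged, so slightly more cells than
    requested can be marked.
    """
    n_hot = max(1, round(len(anomalies) * top_fraction))
    cutoff = sorted(anomalies, reverse=True)[n_hot - 1]
    return [1 if a >= cutoff else 0 for a in anomalies]

# Hypothetical UHI anomalies in °C for ten cells.
anomalies = [-1.2, 0.3, 2.8, 0.9, -0.4, 1.7, 3.1, -2.0, 0.1, 0.6]

flags = hotspot_flags(anomalies, top_fraction=0.2)
print(flags)  # [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
```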
When classifying hotspots, child density becomes the top-ranked feature. Hotspot probability is roughly 2× higher in vulnerable areas.
Priority: Heat risk (40%) + Vulnerability (35%) + Cooling potential (25%)
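The weighting above amounts to a simple weighted sum. The sketch below assumes each component has already been scaled to the range 0 to 1, which the source does not specify. The cell values are invented.

```python
# Weights as stated in the text.
WEIGHTS = {"heat_risk": 0.40, "vulnerability": 0.35, "cooling_potential": 0.25}

def priority_score(cell):
    """Weighted sum of the three components (each assumed to lie in [0, 1])."""
    return sum(WEIGHTS[k] * cell[k] for k in WEIGHTS)

# Hypothetical cells.
cells = {
    "A": {"heat_risk": 0.9, "vulnerability": 0.2, "cooling_potential": 0.5},
    "B": {"heat_risk": 0.6, "vulnerability": 0.9, "cooling_potential": 0.4},
    "C": {"heat_risk": 0.3, "vulnerability": 0.3, "cooling_potential": 0.9},
}

ranked = sorted(cells, key=lambda c: priority_score(cells[c]), reverse=True)
print(ranked)  # ['B', 'A', 'C']
```

Cell B outranks the hotter cell A because its vulnerability is much higher, which is the behaviour the 35% vulnerability weight is meant to produce.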
Process per scenario
Modify features in priority zones (top 10%)
Re-predict with blend model
Compute cooling Δ vs. baseline
Rank by cooling and cost-effectiveness
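The four steps can be sketched end to end with a stand-in model. Everything specific here is an assumption: `toy_predict` replaces the trained blend model with arbitrary coefficients, the feature names are invented, and the relative costs are placeholders. The outputs illustrate the mechanics only and are not results.

```python
def toy_predict(cell):
    """Stand-in for the trained blend model; coefficients are arbitrary."""
    return 3.0 * cell["impervious"] - 2.0 * cell["vegetation"]

def run_scenario(cells, priority, changes, predict=toy_predict, top_fraction=0.10):
    """Apply `changes` to the top-priority cells and return mean cooling (°C).

    `changes` maps a feature name to a multiplicative factor, for example
    {"impervious": 0.5} halves sealed cover. A negative result means cooler.
    """
    n_top = max(1, round(len(cells) * top_fraction))
    order = sorted(range(len(cells)), key=lambda i: priority[i], reverse=True)
    top = set(order[:n_top])
    deltas = []
    for i, cell in enumerate(cells):
        if i not in top:
            continue
        modified = dict(cell)                      # step 1: modify features
        for feat, factor in changes.items():
            modified[feat] = min(1.0, cell[feat] * factor)
        deltas.append(predict(modified) - predict(cell))  # steps 2 and 3
    return sum(deltas) / len(deltas)

# Hypothetical inputs: 20 identical cells, priority increasing with index.
cells = [{"impervious": 0.8, "vegetation": 0.1} for _ in range(20)]
priority = list(range(20))

# Each scenario: (feature multipliers, hypothetical relative cost).
scenarios = {
    "depave_50": ({"impervious": 0.5}, 2.0),
    "green_50": ({"vegetation": 1.5}, 1.0),
}

results = {}
for name, (changes, cost) in scenarios.items():
    cooling = run_scenario(cells, priority, changes)
    results[name] = {"cooling": cooling, "cooling_per_cost": cooling / cost}

# Step 4: rank, most cooling (most negative) first.
by_cooling = sorted(results, key=lambda s: results[s]["cooling"])
by_cost_effectiveness = sorted(results, key=lambda s: results[s]["cooling_per_cost"])
```

Changing one feature while holding the others fixed is what makes the correlated-feature caveat in the limitations matter: in a real city, depaving and greening rarely move independently.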
⚠ Limitations
Correlation ≠ causation: predictions are hypotheses for field validation
Feature correlation: impervious & vegetation (r≈−0.7) partially double-count cooling
All 5 objectives converge → 50% de-sealing
We know where hotspots are and what drives heat. Now the prescription question: how much change is enough? We tested four levers in a full grid search: depaving (30–50%) · vegetation (+10–50%) · tree canopy (+15–50%) · albedo (+0–20%) = 648 combinations, each bootstrapped 500 times.
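The bootstrapping step can be illustrated with a percentile interval on the mean cooling of one scenario. This is a sketch under assumptions: the per-cell cooling values are simulated, the percentile method is one of several possible choices, and the individual grid levels behind the 648 combinations are not reproduced because the source does not list them.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Simulated per-cell cooling (°C) for a single hypothetical scenario.
sim = random.Random(1)
deltas = [sim.gauss(-1.2, 0.3) for _ in range(200)]

lo, hi = bootstrap_ci(deltas)
print(f"mean cooling {mean(deltas):.2f} °C, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Repeating this for every combination gives each strategy an uncertainty range, so two strategies whose intervals overlap heavily should not be treated as meaningfully different.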
All optimal strategies converge on 50% de-sealing. Below this threshold, evaporative cooling pathways become viable — a threshold effect.
| Strategy | Depaving | Vegetation | Trees | Albedo | Cooling |
|---|---|---|---|---|---|
| Maximum | 50% | +50% | +50% | +20% | −1.27°C |
| Cost-effective | 50% | +10% | +20% | 0% | −1.20°C |
95% of maximum cooling with substantially fewer resources.
Same anchor everywhere: de-seal first. Supporting levers change by climate and morphology.
e.g. permeable paving in basin-floor neighbourhoods like Omonia
e.g. opening sealed Eixample courtyards, Superblocks-style depaving
e.g. OASIS schoolyard depaving, extending Seine corridor cooling
e.g. Schwammstadt sponge-city approach in polycentric cores
e.g. cool surfaces + permeable paving (canals already cap water cooling)
e.g. cool roofs + drought-tolerant planting (avoid water-intensive greening)
Predicted cooling from cost-effective strategy
The cost-effective strategy still includes +20% tree canopy. This isn't "stop planting trees". It's "invest in the surface beneath, not just the canopy above."
⚠ Green gentrification risk
Greening raises property values and can displace the people it aims to protect. Documented in Barcelona Superblocks and NYC’s High Line.
Our vulnerability weighting (35%) targets current vulnerable populations, but the model can’t prevent market dynamics. That requires policy.
The model says where. Communities decide how.
1. Cooling propagates 300 m under wind. Coordinated action across blocks, not plots.
2. Permeability drives 3× more cooling than vegetation. De-sealing deserves equal investment.
3. Same tree cools 8–12°C in Berlin but 0–4°C in Athens. Geography is the signal.
But: GEE features hold the top 8 SHAP positions — satellite data alone carries most of the signal. Any city with free Landsat access can run this pipeline tomorrow.
Waiting for perfect evidence while heatwaves kill thousands is itself a policy choice.
Every city can start shaping cooler cities today.
Open data. Open tools. Open method.
61,000 deaths demand tools that work now, with data cities already have.
Anguelovski, I. et al. (2019) Green gentrification. Landscape Urban Plan.
Ballester, J. et al. (2023) Heat-related mortality in Europe, summer 2022. Nature Medicine, 29(7), 1857–1866.
Ballester, J. et al. (2025) Heat-related mortality in Europe, summer 2024. Nature Medicine.
Camps-Valls, G. et al. (2025) AI for modeling and understanding extreme weather and climate events. Nature Communications, 16, 1919.
Chen, T. & Guestrin, C. (2016) XGBoost. Proc. ACM SIGKDD, 785–794.
Cheval, S. et al. (2024) Systematic review of UHI. Climate Risk Mgmt., 44.
Eyni, A. et al. (2025) Distributional outcomes of UHI reduction pathways. Scientific Reports, 15, 93896.
Fujiwara, K. et al. (2026) VoxCity: 3D city model generation. Comput. Environ. Urban Syst.
Giannopoulou, K. et al. (2011) Athens urban heat island. Climatic Change, 104(3).
Gorelick, N. et al. (2017) Google Earth Engine. Remote Sens. Environ., 202, 18–27.
Gunawardena, K. et al. (2017) Evaporative cooling in urban areas. Sci. Total Environ., 590, 758–775.
Hoffman, J. S. et al. (2020) Historical housing policies and intra-urban heat. Climate, 8(1), 12.
Hou, Y. et al. (2024) GlobalStreetscapes. ISPRS J. Photogramm.
Kovats, R.S. & Hajat, S. (2008) Heat stress and public health. Annu. Rev. Public Health, 29.
Lundberg, S. & Lee, S. (2017) SHAP values. NeurIPS, 30.
Milojevic-Dupont, N. et al. (2023) EUBUCCO. Scientific Data, 10, 146.
Oke, T.R. et al. (2017) Urban Climates. Cambridge University Press.
Santamouris, M. (2020) Heat vulnerability. Energy and Buildings.
Schwaab, J. et al. (2021) Urban trees reducing LST in European cities. Nature Communications, 12, 6763.
Seneviratne, S.I. et al. (2021) Extreme events in a changing climate. IPCC AR6 WG1, Ch. 11.
Stewart, I.D. & Oke, T.R. (2012) Local Climate Zones. BAMS, 93(12), 1879–1900.
Yap, W. et al. (2023) Urbanity: automated modelling and analysis of multidimensional networks. npj Urban Sustainability, 3, 45.
Open data · Open tools · Open method
Gerardo Ezequiel Martin Carreno · UCL CASA · The Bartlett