Geospatial Route Clustering
Goal: decode the GPS traces of all activities, display them on an interactive map, and identify training areas through geographic clustering (DBSCAN).
- Load polylines from MySQL
- Decode Google Encoded Polyline → GPS coordinates
- Map of all routes by sport
- Density heatmap
- DBSCAN clustering of training areas
- Annual progression
In [1]:
import sys
sys.path.append('..')
import pandas as pd
from src.data.geo_loader import load_activities_with_polylines
from src.features.geo_features import decode_all, cluster_by_start, cluster_stats
from src.viz.map_charts import map_routes_by_sport, map_heatmap, map_clusters, map_annual_progression
1. Loading activities with polylines
In [2]:
USER_ID = 1
df_raw = load_activities_with_polylines(user_id=USER_ID)
print(f"Activities with polyline: {len(df_raw)}")
print()
# Per sport: activity count, mean distance (km), and first-last year covered.
print(df_raw.groupby('type').agg(
    n=('id', 'count'),
    dist_moy_km=('distance_km', 'mean'),
    annees=('year', lambda x: f"{x.min()}-{x.max()}")
).round(1).to_string())
Activities with polyline: 2290

                    n  dist_moy_km     annees
type
Canoeing            2          2.5  2017-2022
Crossfit            6          0.8  2016-2016
Hike               91          5.6  2016-2025
Kayaking            6          4.9  2022-2024
Ride              710         25.5  2014-2026
Run              1255          5.2  2014-2026
StandUpPaddling     1          2.0  2022-2022
Swim               88          1.0  2016-2026
Walk               34          4.0  2021-2024
Workout            97          0.5  2016-2026
2. Decoding the polylines
In [3]:
df = decode_all(df_raw, sample_n=25)
print(f"Decoded activities: {len(df)}")
total_points = sum(len(c) for c in df['coords'])
sampled_points = sum(len(c) for c in df['coords_sampled'])
print(f"Total GPS points: {total_points:,}")
print(f"Sampled GPS points: {sampled_points:,}")
print()
print("Example route (first 5 points):")
print(df.iloc[0]['coords'][:5])
Decoded activities: 2257
Total GPS points: 670,675
Sampled GPS points: 53,314

Example route (first 5 points):
[(47.23788, 5.13043), (47.23804, 5.13034), (47.23873, 5.13029), (47.23917, 5.13022), (47.23953, 5.13014)]
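The decoding step can be illustrated with a minimal sketch of the Google Encoded Polyline algorithm. This is not the project's `decode_all`: the names `decode_polyline` and `subsample` are hypothetical, and reading `sample_n` as "keep at most n evenly spaced points per route" is an assumption suggested by the point counts above, not something the notebook states.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def decode_polyline(encoded: str, precision: int = 5) -> List[Coord]:
    """Decode a Google Encoded Polyline string into (lat, lon) tuples."""
    coords: List[Coord] = []
    index = 0
    lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            result = 0
            shift = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if chunk < 0x20:  # continuation bit not set: last chunk
                    break
            # Undo the zigzag encoding of the signed delta.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def subsample(coords: List[Coord], n: int) -> List[Coord]:
    """Hypothetical helper: keep at most n roughly evenly spaced points."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(coords) <= n:
        return list(coords)
    if n == 1:
        return [coords[0]]
    step = (len(coords) - 1) / (n - 1)
    return [coords[round(i * step)] for i in range(n)]
```

Each coordinate is stored as a delta from the previous point, which is why routes with many closely spaced GPS fixes compress well.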
3. Map of routes by sport
In [4]:
m_sports = map_routes_by_sport(df, max_routes=800)
m_sports.save('../data/map_routes_by_sport.html')
print("Map saved: data/map_routes_by_sport.html")
m_sports
Map saved: data/map_routes_by_sport.html
Out[4]:
[Interactive map of routes by sport; not rendered in this export]
4. GPS density heatmap
In [5]:
m_heat = map_heatmap(df)
m_heat.save('../data/map_heatmap.html')
print("Map saved: data/map_heatmap.html")
m_heat
Map saved: data/map_heatmap.html
Out[5]:
[Interactive density heatmap; not rendered in this export]
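5. DBSCAN clustering of training areas
DBSCAN groups activities whose centroids (the geographic midpoint of each route) lie within eps_km of one another. A cluster forms only where at least min_samples activities are that close together, and activities that fit no cluster are labelled -1 (noise). The helper used here is named cluster_by_start, so the reference point may in fact be each route's start rather than its centroid.
One cluster = one recurring geographic training area.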
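To make the roles of eps_km and min_samples concrete, here is a minimal brute-force sketch of DBSCAN over great-circle (haversine) distances. It is an illustration only, not the implementation behind `cluster_by_start`; the names `haversine_km` and `dbscan` are hypothetical, and counting a point as its own neighbour follows scikit-learn's convention, which is assumed rather than confirmed for this project.

```python
import math
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def dbscan(points: Sequence[Tuple[float, float]],
           eps_km: float, min_samples: int) -> List[int]:
    """Return one label per point; -1 marks noise. O(n^2), for illustration."""
    n = len(points)
    # Neighbour lists include the point itself.
    neighbors = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbors[j])
    return [label if label is not None else -1 for label in labels]
```

For thousands of activities a library implementation is the practical choice. scikit-learn's `DBSCAN` accepts `metric='haversine'`, in which case coordinates must be given in radians and `eps` as an angle (eps_km divided by the Earth's radius in km).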
In [6]:
df_clustered = cluster_by_start(df, eps_km=3.0, min_samples=5)
stats = cluster_stats(df_clustered)
# Label -1 marks noise; labels >= 0 are cluster ids.
n_clustered = (df_clustered['cluster'] >= 0).sum()
n_noise = (df_clustered['cluster'] == -1).sum()
print(f"Clusters found: {df_clustered['cluster'].max() + 1}")
print(f"Clustered activities: {n_clustered} / {len(df_clustered)}")
print(f"Isolated activities (noise): {n_noise}")
print()
print("Stats per cluster:")
print(stats[['n_activities', 'sports', 'avg_distance_km', 'total_distance_km', 'date_first', 'date_last']].to_string())
Clusters found: 28
Clustered activities: 2101 / 2257
Isolated activities (noise): 156

Stats per cluster:
         n_activities  sports                                                     avg_distance_km  total_distance_km  date_first           date_last
cluster
1                 901  Canoeing, Crossfit, Hike, Ride, Run, Swim, Walk, Workout              7.29            6566.80  2015-01-04 02:25:41  2024-07-06 06:06:47
17                481  Canoeing, Hike, Ride, Run, Walk, Workout                              9.72            4676.09  2019-05-18 09:33:46  2024-08-29 08:37:31
18                303  Hike, Ride, Run, Swim, Walk, Workout                                 11.73            3553.97  2020-11-29 04:36:54  2026-06-12 06:06:45
7                  79  Hike, Kayaking, Ride, Run, Walk, Workout                              4.86             383.90  2016-08-12 07:42:59  2025-08-16 08:55:37
19                 75  Kayaking, Ride, Run, StandUpPaddling, Swim, Walk, Workout            29.39            2204.20  2021-05-25 01:37:01  2024-07-18 07:11:50
6                  42  Hike, Ride, Run, Swim, Walk                                           6.69             280.82  2016-07-25 06:04:56  2026-05-08 09:48:26
0                  29  Hike, Ride, Run, Walk                                                 7.89             228.70  2014-12-26 03:41:00  2025-12-24 09:08:30
14                 20  Ride, Run, Swim, Workout                                              5.89             117.82  2017-09-10 10:03:20  2023-09-10 05:23:54
20                 20  Run                                                                   7.36             147.28  2022-08-06 06:01:29  2026-05-24 10:28:07
10                 15  Ride, Run, Swim, Workout                                              5.48              82.15  2017-07-01 01:21:27  2022-05-28 11:22:05
25                 15  Ride, Run, Swim, Workout                                             16.12             241.84  2024-09-03 06:45:28  2026-06-07 11:06:09
16                 14  Ride, Run, Swim, Workout                                              4.34              60.80  2018-07-07 01:56:06  2021-07-03 04:01:24
11                 13  Hike, Ride, Run, Swim, Workout                                        4.18              54.30  2017-07-09 09:16:37  2023-06-25 05:20:18
3                   9  Ride, Run, Swim, Workout                                              4.58              41.25  2016-02-07 02:02:46  2018-05-12 11:41:35
21                  9  Ride                                                                 49.94             449.43  2023-05-13 05:07:37  2024-07-19 05:03:11
27                  8  Run, Swim, Workout                                                    5.84              46.74  2025-05-30 07:20:50  2026-05-15 11:31:40
8                   8  Ride, Run, Swim, Workout                                              3.87              30.93  2016-11-27 02:56:12  2022-05-01 02:20:51
2                   7  Run, Swim, Workout                                                    1.35               9.42  2015-11-15 11:29:58  2017-06-03 04:39:39
4                   6  Ride, Run, Workout                                                    7.51              45.05  2016-03-13 09:57:03  2017-03-19 03:42:14
24                  6  Run, Swim, Workout                                                   11.71              70.25  2024-06-29 07:23:15  2025-06-22 07:40:07
23                  6  Hike, Ride, Run, Swim, Workout                                        4.80              28.79  2023-08-18 03:04:47  2023-08-19 11:38:48
5                   5  Hike, Run                                                             2.89              14.45  2016-07-13 03:50:47  2021-12-24 09:03:08
9                   5  Ride, Run, Swim, Workout                                              5.26              26.30  2017-05-13 02:44:11  2017-05-13 03:41:27
15                  5  Ride, Run, Workout                                                    2.71              13.53  2018-03-11 01:00:55  2018-03-11 03:09:14
13                  5  Ride, Run, Swim, Workout                                              5.03              25.15  2017-09-03 02:53:19  2017-09-03 04:01:46
12                  5  Swim                                                                  0.57               2.84  2017-07-29 04:47:24  2022-08-03 07:13:46
22                  5  Hike, Ride, Walk                                                     14.47              72.36  2023-06-26 11:02:35  2024-04-07 01:28:17
26                  5  Ride, Run, Swim, Workout                                              5.80              28.99  2025-05-25 01:03:14  2025-05-25 12:58:00
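The per-cluster table can be read as a group-by over the cluster label. The sketch below shows one plausible way to produce the count, sport list, and distance columns with the standard library only. `summarize_clusters` and its record layout are hypothetical and are not the project's `cluster_stats`; the date columns are left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str, float]  # (cluster label, sport, distance in km)


def summarize_clusters(records: Iterable[Record]) -> Dict[int, dict]:
    """Aggregate activities per cluster, largest cluster first; noise is skipped."""
    groups: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for label, sport, distance_km in records:
        if label >= 0:  # label -1 is DBSCAN noise
            groups[label].append((sport, distance_km))

    summary = {}
    for label, items in groups.items():
        distances = [d for _, d in items]
        summary[label] = {
            "n_activities": len(items),
            "sports": ", ".join(sorted({s for s, _ in items})),
            "avg_distance_km": round(sum(distances) / len(distances), 2),
            "total_distance_km": round(sum(distances), 2),
        }
    # Order clusters by activity count, descending, as in the table above.
    return dict(sorted(summary.items(),
                       key=lambda kv: kv[1]["n_activities"], reverse=True))
```

Sorting by activity count puts the dominant training areas first, which matches how the table is ordered (cluster 1 with 901 activities leads).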
6. Cluster map
In [7]:
m_clusters = map_clusters(df_clustered, stats)
m_clusters.save('../data/map_clusters.html')
print("Map saved: data/map_clusters.html")
m_clusters
Map saved: data/map_clusters.html
Out[7]:
[Interactive cluster map; not rendered in this export]