Geospatial Route Clustering

Goal: decode the GPS traces of all activities, display them on an interactive map, and identify training zones through geographic clustering (DBSCAN).

  1. Loading polylines from MySQL
  2. Decoding Google Encoded Polyline → GPS coordinates
  3. Map of all routes by sport
  4. Density heatmap
  5. DBSCAN clustering of training zones
  6. Cluster map
  7. Annual progression
  8. Territory exploration analysis
In [1]:
import sys
sys.path.append('..')
import pandas as pd

from src.data.geo_loader import load_activities_with_polylines
from src.features.geo_features import decode_all, cluster_by_start, cluster_stats
from src.viz.map_charts import map_routes_by_sport, map_heatmap, map_clusters, map_annual_progression

1. Loading activities with polylines

In [2]:
USER_ID = 1

df_raw = load_activities_with_polylines(user_id=USER_ID)
print(f"Activities with a polyline: {len(df_raw)}")
print()
print(df_raw.groupby('type').agg(
    n=('id', 'count'),
    dist_moy_km=('distance_km', 'mean'),
    annees=('year', lambda x: f"{x.min()}-{x.max()}")
).round(1).to_string())
Activities with a polyline: 2290

                    n  dist_moy_km     annees
type                                         
Canoeing            2          2.5  2017-2022
Crossfit            6          0.8  2016-2016
Hike               91          5.6  2016-2025
Kayaking            6          4.9  2022-2024
Ride              710         25.5  2014-2026
Run              1255          5.2  2014-2026
StandUpPaddling     1          2.0  2022-2022
Swim               88          1.0  2016-2026
Walk               34          4.0  2021-2024
Workout            97          0.5  2016-2026

2. Decoding the polylines

In [3]:
df = decode_all(df_raw, sample_n=25)
print(f"Decoded activities: {len(df)}")

total_points = sum(len(c) for c in df['coords'])
sampled_points = sum(len(c) for c in df['coords_sampled'])
print(f"Total GPS points:   {total_points:,}")
print(f"Sampled GPS points: {sampled_points:,}")
print()
print("Example route (first 5 points):")
print(df.iloc[0]['coords'][:5])
Decoded activities: 2257
Total GPS points:   670,675
Sampled GPS points: 53,314

Example route (first 5 points):
[(47.23788, 5.13043), (47.23804, 5.13034), (47.23873, 5.13029), (47.23917, 5.13022), (47.23953, 5.13014)]
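
The project's `decode_all` helper is not shown in this notebook. As a minimal sketch of what the decoding step involves, the function below implements the Google Encoded Polyline format in pure Python. The names `decode_polyline` and `subsample` are illustrative assumptions, not the project's API; `subsample` assumes that `sample_n` caps each route at a fixed number of evenly spaced points, which is consistent with the point counts above but not confirmed by the source.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google Encoded Polyline string into (lat, lng) tuples."""
    factor = 10 ** precision
    coords = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk
                    break
            # The lowest bit holds the sign; the rest is the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


def subsample(coords, n):
    """Keep at most n evenly spaced points (n >= 2), including both ends."""
    if len(coords) <= n:
        return list(coords)
    step = (len(coords) - 1) / (n - 1)
    return [coords[round(i * step)] for i in range(n)]


# Reference string from Google's format documentation.
route = decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
print(route)
```

Each value is stored as a delta from the previous point, which is why the decoder keeps running `lat` and `lng` totals rather than reading absolute coordinates.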

3. Map of routes by sport

In [4]:
m_sports = map_routes_by_sport(df, max_routes=800)
m_sports.save('../data/map_routes_by_sport.html')
print("Map saved: data/map_routes_by_sport.html")
m_sports
Map saved: data/map_routes_by_sport.html
Out[4]:
[Interactive map (routes by sport) not rendered in this export]

4. GPS density heatmap

In [5]:
m_heat = map_heatmap(df)
m_heat.save('../data/map_heatmap.html')
print("Map saved: data/map_heatmap.html")
m_heat
Map saved: data/map_heatmap.html
Out[5]:
[Interactive map (density heatmap) not rendered in this export]

5. DBSCAN clustering of training zones

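DBSCAN groups activities whose centroids (the geographic midpoint of the route) lie within eps_km of one another, provided each dense group contains at least min_samples activities. Each cluster corresponds to a recurring geographic training zone; activities that belong to no dense group receive the label -1 (noise).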
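
The implementation of `cluster_by_start` is not shown here, and its name suggests start points whereas the text describes centroids. The sketch below follows the text: it computes one centroid per route and runs a small, didactic DBSCAN with a haversine distance in kilometres. The function names (`haversine_km`, `centroid`, `dbscan`) and the sample coordinates are illustrative assumptions, not the project's code.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def centroid(coords):
    """Mean latitude and longitude of a route (fine for routes of a few km)."""
    lats, lngs = zip(*coords)
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


def dbscan(points, eps_km, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # O(n^2) neighbour lists; every point counts as its own neighbour.
    neighbors = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbors[j])
    return labels


# Three made-up centroids close together and one far away.
points = [(47.2379, 5.1304), (47.2400, 5.1350), (47.2450, 5.1250),
          (48.8566, 2.3522)]
labels = dbscan(points, eps_km=3.0, min_samples=3)
print(labels)
```

For a dataset of this size, scikit-learn's `DBSCAN` with `metric="haversine"` is the more practical choice; it expects coordinates in radians and `eps` expressed in radians as well (for example `eps_km / 6371`).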

In [6]:
df_clustered = cluster_by_start(df, eps_km=3.0, min_samples=5)

stats = cluster_stats(df_clustered)
# Label -1 marks noise; labels >= 0 identify clusters.
is_clustered = df_clustered['cluster'] >= 0
n_clusters = df_clustered.loc[is_clustered, 'cluster'].nunique()
n_clustered = is_clustered.sum()
n_noise = (df_clustered['cluster'] == -1).sum()

print(f"Clusters found: {n_clusters}")
print(f"Clustered activities: {n_clustered} / {len(df_clustered)}")
print(f"Isolated activities (noise): {n_noise}")
print()
print("Stats per cluster:")
print(stats[['n_activities', 'sports', 'avg_distance_km', 'total_distance_km', 'date_first', 'date_last']].to_string())
Clusters found: 28
Clustered activities: 2101 / 2257
Isolated activities (noise): 156

Stats per cluster:
         n_activities                                                     sports  avg_distance_km  total_distance_km          date_first           date_last
cluster                                                                                                                                                     
1                 901   Canoeing, Crossfit, Hike, Ride, Run, Swim, Walk, Workout             7.29            6566.80 2015-01-04 02:25:41 2024-07-06 06:06:47
17                481                   Canoeing, Hike, Ride, Run, Walk, Workout             9.72            4676.09 2019-05-18 09:33:46 2024-08-29 08:37:31
18                303                       Hike, Ride, Run, Swim, Walk, Workout            11.73            3553.97 2020-11-29 04:36:54 2026-06-12 06:06:45
7                  79                   Hike, Kayaking, Ride, Run, Walk, Workout             4.86             383.90 2016-08-12 07:42:59 2025-08-16 08:55:37
19                 75  Kayaking, Ride, Run, StandUpPaddling, Swim, Walk, Workout            29.39            2204.20 2021-05-25 01:37:01 2024-07-18 07:11:50
6                  42                                Hike, Ride, Run, Swim, Walk             6.69             280.82 2016-07-25 06:04:56 2026-05-08 09:48:26
0                  29                                      Hike, Ride, Run, Walk             7.89             228.70 2014-12-26 03:41:00 2025-12-24 09:08:30
14                 20                                   Ride, Run, Swim, Workout             5.89             117.82 2017-09-10 10:03:20 2023-09-10 05:23:54
20                 20                                                        Run             7.36             147.28 2022-08-06 06:01:29 2026-05-24 10:28:07
10                 15                                   Ride, Run, Swim, Workout             5.48              82.15 2017-07-01 01:21:27 2022-05-28 11:22:05
25                 15                                   Ride, Run, Swim, Workout            16.12             241.84 2024-09-03 06:45:28 2026-06-07 11:06:09
16                 14                                   Ride, Run, Swim, Workout             4.34              60.80 2018-07-07 01:56:06 2021-07-03 04:01:24
11                 13                             Hike, Ride, Run, Swim, Workout             4.18              54.30 2017-07-09 09:16:37 2023-06-25 05:20:18
3                   9                                   Ride, Run, Swim, Workout             4.58              41.25 2016-02-07 02:02:46 2018-05-12 11:41:35
21                  9                                                       Ride            49.94             449.43 2023-05-13 05:07:37 2024-07-19 05:03:11
27                  8                                         Run, Swim, Workout             5.84              46.74 2025-05-30 07:20:50 2026-05-15 11:31:40
8                   8                                   Ride, Run, Swim, Workout             3.87              30.93 2016-11-27 02:56:12 2022-05-01 02:20:51
2                   7                                         Run, Swim, Workout             1.35               9.42 2015-11-15 11:29:58 2017-06-03 04:39:39
4                   6                                         Ride, Run, Workout             7.51              45.05 2016-03-13 09:57:03 2017-03-19 03:42:14
24                  6                                         Run, Swim, Workout            11.71              70.25 2024-06-29 07:23:15 2025-06-22 07:40:07
23                  6                             Hike, Ride, Run, Swim, Workout             4.80              28.79 2023-08-18 03:04:47 2023-08-19 11:38:48
5                   5                                                  Hike, Run             2.89              14.45 2016-07-13 03:50:47 2021-12-24 09:03:08
9                   5                                   Ride, Run, Swim, Workout             5.26              26.30 2017-05-13 02:44:11 2017-05-13 03:41:27
15                  5                                         Ride, Run, Workout             2.71              13.53 2018-03-11 01:00:55 2018-03-11 03:09:14
13                  5                                   Ride, Run, Swim, Workout             5.03              25.15 2017-09-03 02:53:19 2017-09-03 04:01:46
12                  5                                                       Swim             0.57               2.84 2017-07-29 04:47:24 2022-08-03 07:13:46
22                  5                                           Hike, Ride, Walk            14.47              72.36 2023-06-26 11:02:35 2024-04-07 01:28:17
26                  5                                   Ride, Run, Swim, Workout             5.80              28.99 2025-05-25 01:03:14 2025-05-25 12:58:00

6. Cluster map

In [7]:
m_clusters = map_clusters(df_clustered, stats)
m_clusters.save('../data/map_clusters.html')
print("Map saved: data/map_clusters.html")
m_clusters
Map saved: data/map_clusters.html
Out[7]:
[Interactive map (clusters) not rendered in this export]

7. Annual progression: one layer per year

In [8]:
# Convert NumPy integers to plain ints so the list prints cleanly.
years = sorted(int(y) for y in df['year'].unique())
print(f"Available years: {years}")

m_annual = map_annual_progression(df, years)
m_annual.save('../data/map_annual.html')
print("Map saved: data/map_annual.html")
m_annual
Available years: [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026]
Map saved: data/map_annual.html
Out[8]:
[Interactive map (annual progression) not rendered in this export]

8. Analysis: territory exploration

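How much territory does each training zone cover? For each cluster, we estimate the area of the bounding box that encloses all of its GPS points. This is an upper bound: a single long ride can stretch the box well beyond the area actually visited.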

In [9]:
import numpy as np

print("Approximate area covered per cluster (bounding box):")
for c_id, stat in stats.iterrows():
    subset = df_clustered[df_clustered['cluster'] == c_id]
    all_lats = [p[0] for row in subset['coords'] for p in row]
    all_lngs = [p[1] for row in subset['coords'] for p in row]
    lat_span = max(all_lats) - min(all_lats)
    lng_span = max(all_lngs) - min(all_lngs)
    # 1 deg lat ~ 111 km, 1 deg lng ~ 111 * cos(lat) km
    avg_lat = np.mean(all_lats)
    area_km2 = lat_span * 111 * lng_span * 111 * np.cos(np.radians(avg_lat))
    print(f"  Cluster {c_id:2d} ({stat['n_activities']:3d} activities, {stat['sports']}): {area_km2:.1f} km²")
Approximate area covered per cluster (bounding box):
  Cluster  1 (901 activities, Canoeing, Crossfit, Hike, Ride, Run, Swim, Walk, Workout): 1560.9 km²
  Cluster 17 (481 activities, Canoeing, Hike, Ride, Run, Walk, Workout): 878.9 km²
  Cluster 18 (303 activities, Hike, Ride, Run, Swim, Walk, Workout): 1353.2 km²
  Cluster  7 ( 79 activities, Hike, Kayaking, Ride, Run, Walk, Workout): 77.3 km²
  Cluster 19 ( 75 activities, Kayaking, Ride, Run, StandUpPaddling, Swim, Walk, Workout): 704.7 km²
  Cluster  6 ( 42 activities, Hike, Ride, Run, Swim, Walk): 66.8 km²
  Cluster  0 ( 29 activities, Hike, Ride, Run, Walk): 21.9 km²
  Cluster 14 ( 20 activities, Ride, Run, Swim, Workout): 33.7 km²
  Cluster 20 ( 20 activities, Run): 67.1 km²
  Cluster 10 ( 15 activities, Ride, Run, Swim, Workout): 42.1 km²
  Cluster 25 ( 15 activities, Ride, Run, Swim, Workout): 340.7 km²
  Cluster 16 ( 14 activities, Ride, Run, Swim, Workout): 32.4 km²
  Cluster 11 ( 13 activities, Hike, Ride, Run, Swim, Workout): 16.4 km²
  Cluster  3 (  9 activities, Ride, Run, Swim, Workout): 1.4 km²
  Cluster 21 (  9 activities, Ride): 672.2 km²
  Cluster 27 (  8 activities, Run, Swim, Workout): 7.6 km²
  Cluster  8 (  8 activities, Ride, Run, Swim, Workout): 6.8 km²
  Cluster  2 (  7 activities, Run, Swim, Workout): 1.5 km²
  Cluster  4 (  6 activities, Ride, Run, Workout): 2.6 km²
  Cluster 24 (  6 activities, Run, Swim, Workout): 13.5 km²
  Cluster 23 (  6 activities, Hike, Ride, Run, Swim, Workout): 5.9 km²
  Cluster  5 (  5 activities, Hike, Run): 2.8 km²
  Cluster  9 (  5 activities, Ride, Run, Swim, Workout): 1.3 km²
  Cluster 15 (  5 activities, Ride, Run, Workout): 1.4 km²
  Cluster 13 (  5 activities, Ride, Run, Swim, Workout): 14.5 km²
  Cluster 12 (  5 activities, Swim): 0.1 km²
  Cluster 22 (  5 activities, Hike, Ride, Walk): 5.6 km²
  Cluster 26 (  5 activities, Ride, Run, Swim, Workout): 8.0 km²
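
To make the approximation easier to check in isolation, the same bounding-box formula can be wrapped in a small function and tried on a box of known size. This is a sketch using only the standard library; `bbox_area_km2` and the sample corner coordinates are illustrative assumptions, not part of the project.

```python
from math import cos, radians

KM_PER_DEG_LAT = 111.0  # same rounded constant as in the cell above


def bbox_area_km2(points):
    """Approximate area of the lat/lng bounding box of (lat, lng) points."""
    lats = [p[0] for p in points]
    lngs = [p[1] for p in points]
    mean_lat = sum(lats) / len(lats)
    height_km = (max(lats) - min(lats)) * KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    width_km = (max(lngs) - min(lngs)) * KM_PER_DEG_LAT * cos(radians(mean_lat))
    return height_km * width_km


# Two opposite corners of a 0.1 x 0.1 degree box at about 47 degrees north.
corners = [(47.0, 5.0), (47.1, 5.1)]
area = bbox_area_km2(corners)
print(f"{area:.1f} km²")
```

At this latitude the box is roughly 11.1 km tall and about 7.6 km wide, so the result lands in the low 80s of km². The flat-earth approximation is reasonable for boxes of a few tens of kilometres but degrades for clusters that span a large latitude range.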