Performance Predictor: CTL / ATL / TSB

Goal: model training load (Fitness-Fatigue) and predict race times.

  • CTL (Chronic Training Load) = fitness (42-day time constant)
  • ATL (Acute Training Load) = fatigue (7-day time constant)
  • TSB (Training Stress Balance) = fitness − fatigue = form on the day
  • Pace (min/km) = prediction target, normalized across distances and sports

Data source: Laravel MySQL database (Strava + Withings + Nolio)

In [1]:
import sys
sys.path.append('..')

import pandas as pd
import plotly.io as pio
pio.renderers.default = 'notebook'

from src.data.loader import load_activities, load_competitions, load_weight_measurements
from src.features.training_load import compute_tss, build_daily_load, compute_ctl_atl_tsb, build_race_features
from src.models.performance_predictor import train, evaluate_loo, add_pace, FEATURE_COLS, OPTIONAL_FEATURES, TARGET
from src.viz.charts import chart_fitness_fatigue, chart_feature_importance, chart_predicted_vs_actual

1. Loading the data

Set USER_ID to the ID of your user in the database.

In [2]:
USER_ID = 1

activities = load_activities(user_id=USER_ID)
competitions = load_competitions(user_id=USER_ID)
weights = load_weight_measurements(user_id=USER_ID)

print(f"Activities   : {len(activities)} sessions")
print(f"Competitions : {len(competitions)} races")
print(f"Weigh-ins    : {len(weights)} measurements")
activities.head(3)
Activities   : 2969 sessions
Competitions : 21 races
Weigh-ins    : 618 measurements
Out[2]:
     id  user_id  type                       name     start_date_local  distance  moving_time  elapsed_time  total_elevation_gain  average_speed  max_speed  average_heartrate  max_heartrate  average_cadence  average_watts  max_watts  suffer_score  distance_km  moving_time_h
0  2057        1  Ride  2014/12/26_13,69_1:13'48"  2014-12-26 03:41:00   13653.1       4414.0        4414.0                  13.0           3.09        7.0              151.3          193.0              NaN           56.0        NaN          63.0      13.6531       1.226111
1  2059        1   Run      2014/12/26_1,19_7'14"  2014-12-26 10:17:05    1228.4        395.0         395.0                   NaN           3.11        8.0              193.2          207.0              NaN            NaN        NaN          31.0       1.2284       0.109722
2  2058        1   Run     2014/12/26_0,28_1'32"8  2014-12-26 10:28:15     276.4         93.0          93.0                   NaN           2.97        3.5              173.1          195.0              NaN            NaN        NaN           3.0       0.2764       0.025833

2. Computing TSS per activity

In [3]:
activities = compute_tss(activities)

print("TSS by sport type:")
# 'count' only counts non-null values, so with_tss is the number of activities that received a TSS.
tss_coverage = activities.groupby('type').agg(
    total=('id', 'count'),
    with_tss=('tss', 'count'),
    avg_tss=('tss', 'mean')
).round(1)
print(tss_coverage)
TSS by sport type:
                 total  with_tss  avg_tss
type                                     
Canoeing             2         1     63.8
Crossfit             9         0      NaN
Hike                91        29     72.5
Kayaking             6         6     68.9
Ride               715       711     26.8
Run               1283       834     72.9
StandUpPaddling      1         1     40.0
Swim               375       224     65.0
VirtualRide        292       292     31.8
Walk                34        34     40.8
WeightTraining      34        33     25.4
Workout            127        58      6.7
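
The source of `compute_tss` is not shown in this notebook, and coverage differs by sport (for example, 834 of 1283 runs have a TSS). For orientation only, here is a minimal sketch of the widely used power-based definition, TSS = hours × IF² × 100 with IF = NP / FTP. The names `power_tss`, `normalized_power_w`, and `ftp_w` are hypothetical, and this is not necessarily what `compute_tss` does; it may well rely on heart rate or another proxy.

```python
def power_tss(duration_s: float, normalized_power_w: float, ftp_w: float) -> float:
    """Power-based Training Stress Score (illustrative; not the notebook's compute_tss)."""
    if duration_s < 0 or ftp_w <= 0:
        raise ValueError("duration must be >= 0 and FTP must be > 0")
    intensity_factor = normalized_power_w / ftp_w  # IF = NP / FTP
    hours = duration_s / 3600.0
    return hours * intensity_factor ** 2 * 100.0


# One hour ridden exactly at threshold power scores 100 by construction.
print(power_tss(3600, 250, 250))
# Thirty minutes at 80% of threshold.
print(round(power_tss(1800, 200, 250), 1))
```

By this definition the score grows linearly with duration and quadratically with intensity, which is why short, easy sessions contribute little to the daily load.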

3. CTL / ATL / TSB

In [4]:
daily = build_daily_load(activities)
daily = compute_ctl_atl_tsb(daily)

print(f"Period  : {daily.index.min().date()} -> {daily.index.max().date()}")
print(f"CTL max : {daily['ctl'].max():.1f}  |  ATL max : {daily['atl'].max():.1f}")
print(f"TSB min : {daily['tsb'].min():.1f}  |  TSB max : {daily['tsb'].max():.1f}")
daily.tail()
Period  : 2014-12-26 -> 2026-06-21
CTL max : 99.9  |  ATL max : 143.8
TSB min : -53.3  |  TSB max : 32.9
Out[4]:
                  tss  activity_count        ctl        atl        tsb
date
2026-06-17  58.525634               1  77.677528  80.738581  -3.061054
2026-06-18  92.516268               2  78.026658  82.306452  -4.279794
2026-06-19  33.212000               1  76.972245  75.770895   1.201350
2026-06-20  57.431266               1  76.512479  73.329485   3.182993
2026-06-21  64.781501               1  76.236468  72.191560   4.044909
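
The source of `compute_ctl_atl_tsb` is not shown here. The rows above are consistent with a standard exponentially weighted recurrence, load_t = load_(t-1) + α × (TSS_t − load_(t-1)) with α = 1 − exp(−1/τ), using τ = 42 days for CTL and τ = 7 days for ATL, and with TSB taken as same-day CTL − ATL. The sketch below is an assumption reconstructed from those printed values, not the project's code; `ewma_step` and `fitness_fatigue` are hypothetical names.

```python
import math

CTL_TAU_DAYS = 42  # fitness time constant (assumed)
ATL_TAU_DAYS = 7   # fatigue time constant (assumed)


def ewma_step(previous: float, tss: float, tau_days: float) -> float:
    """Advance an exponentially weighted load by one day."""
    alpha = 1.0 - math.exp(-1.0 / tau_days)
    return previous + alpha * (tss - previous)


def fitness_fatigue(daily_tss, ctl0=0.0, atl0=0.0):
    """Return one (ctl, atl, tsb) tuple per day of TSS."""
    ctl, atl = ctl0, atl0
    rows = []
    for tss in daily_tss:
        ctl = ewma_step(ctl, tss, CTL_TAU_DAYS)
        atl = ewma_step(atl, tss, ATL_TAU_DAYS)
        rows.append((ctl, atl, ctl - atl))
    return rows


# Seed with the 2026-06-17 row, then replay the next four days of TSS.
rows = fitness_fatigue(
    [92.516268, 33.212000, 57.431266, 64.781501],
    ctl0=77.677528,
    atl0=80.738581,
)
for ctl, atl, tsb in rows:
    print(f"{ctl:.2f}  {atl:.2f}  {tsb:.2f}")
```

Replaying the last four days this way lands within rounding of the printed 2026-06-21 row (CTL ≈ 76.24, ATL ≈ 72.19, TSB ≈ 4.04), which supports the assumed time constants without proving how the library computes them.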

4. Fitness-Fatigue visualization

In [5]:
fig = chart_fitness_fatigue(daily, competitions)
fig.show()

5. Race features (CTL/ATL/TSB on race day for each race)

In [6]:
race_features = build_race_features(
    competitions=competitions,
    daily_load=daily,
    weight_df=weights if not weights.empty else None,
)

race_features[['event_name', 'competition_date', 'sport', 'race_distance_km',
               'achieved_time_h', 'ctl', 'atl', 'tsb', 'tss_sum_8w']].round(2)
Out[6]:
    event_name                         competition_date      sport  race_distance_km  achieved_time_h    ctl     atl     tsb  tss_sum_8w
 0  Foulées du Lac Kir 2018                  2018-04-14        run               5.0             0.37  14.49   26.43  -11.94      889.52
 1  UT3C 2019                                2019-05-18        run              10.0             0.88   9.73    2.65    7.08      731.16
 2  Semi de Paris 2022                       2022-03-06        run              21.1             2.03  64.94   66.25   -1.31     3971.53
 3  Semi de Troyes 2022                      2022-05-15        run              21.1             2.01  64.48   76.13  -11.65     3486.05
 4  Semi de Reims                            2022-10-15        run              21.1             2.11  48.19   30.55   17.64     2889.39
 5  Semi de Paris 2023                       2023-03-05        run              21.1             2.22  52.11   72.28  -20.18     2650.83
 6  Marathon de Paris 2023                   2023-04-02        run              42.2             5.27  47.91   76.14  -28.23     2255.18
 7  Ironman 70.3 Vichy 2023                  2023-08-19  triathlon             113.0             7.44  57.28   98.57  -41.29     3120.40
 8  Semi de Paris 2024                       2024-03-03        run              21.1             2.14  63.64   74.26  -10.62     3519.25
 9  RAP 300k 2024                            2024-04-26       bike             300.0            17.45  56.22   72.73  -16.51     2714.24
10  Ironman 70.3 Sables d'Olonne 2024        2024-06-30  triathlon             113.0             6.48  55.60   92.33  -36.73     2730.09
11  Foulées du Petit Bleu 2024               2024-09-08        run               5.0             0.48  31.37   43.63  -12.26     1247.97
12  Semi Tournefeuille 2024                  2024-10-13        run              21.1             2.43  43.33   70.66  -27.33     2251.52
13  Toulouse Run Expérience 2024             2024-11-10        run              42.2             5.57  43.35   80.41  -37.06     2075.13
14  Half Frenchman 2025                      2025-05-30  triathlon             113.0             6.57  96.48  110.93  -14.45     5274.09
15  Ironman Sables d'Olonne 2025             2025-06-22  triathlon             226.3            14.04  97.20  143.81  -46.60     4904.91
16  Foulées du Petit Bleu 2025               2025-09-14        run              10.0             1.05  61.25   56.53    4.72     3009.11
17  Semi Tournefeuille 2025                  2025-10-12        run              21.1             2.45  57.23   60.22   -2.99     2776.76
18  Toulouse Run Expérience 2025             2025-11-02        run              42.2             5.79  52.64   82.93  -30.29     2219.20
19  TCS London Marathon 2026                 2026-04-26        run              42.2             5.01  83.81  114.99  -31.17     4544.00
20  Half Frenchman 2026                      2026-05-15  triathlon             113.0             6.38  78.13   96.38  -18.25     4334.06
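
Besides CTL, ATL, and TSB on race day, the feature table carries rolling totals such as `tss_sum_8w`. How `build_race_features` defines its windows is not shown, so the sketch below rests on an assumption: it sums TSS over the N weeks strictly before the race date and counts the days with a non-zero TSS. `window_features` is a hypothetical helper, and the TSS values are toy data.

```python
from datetime import date, timedelta


def window_features(daily_tss: dict, race_day: date, weeks: int) -> dict:
    """Sum TSS and count training days in the `weeks` weeks before race_day.

    Assumption: the window is [race_day - weeks, race_day), so race day is excluded.
    """
    start = race_day - timedelta(weeks=weeks)
    in_window = [tss for day, tss in daily_tss.items() if start <= day < race_day]
    return {
        f"tss_sum_{weeks}w": sum(in_window),
        f"tss_days_{weeks}w": sum(1 for tss in in_window if tss > 0),
    }


# Toy data: 50 TSS every other day during the 12 weeks before the race.
race_day = date(2024, 3, 3)
daily_tss = {
    race_day - timedelta(days=offset): (50.0 if offset % 2 == 0 else 0.0)
    for offset in range(1, 85)
}
print(window_features(daily_tss, race_day, weeks=4))
```

Excluding race day keeps the race effort itself out of the features used to predict it; whether the real pipeline makes the same choice would need checking in `src/features/training_load`.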

6. View by sport

In [7]:
rf_pace = add_pace(race_features.dropna(subset=['race_distance_km']))

print("Races by sport (with a known distance):")
summary = rf_pace.groupby('sport').agg(
    n_courses=('achieved_time_h', 'count'),
    pace_moyen_min_km=('pace_min_per_km', 'mean'),
    ctl_moyen=('ctl', 'mean'),
    tsb_moyen=('tsb', 'mean'),
).round(2)
print(summary)
Races by sport (with a known distance):
           n_courses  pace_moyen_min_km  ctl_moyen  tsb_moyen
sport                                                        
bike               1               3.49      56.22     -16.51
run               15               6.42      49.23     -13.04
triathlon          5               3.60      76.94     -31.47
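
The pace values above are consistent with pace = achieved_time_h × 60 / race_distance_km; for instance, the single bike race (17.45 h over 300 km) works out to 3.49 min/km. A minimal sketch of that conversion and its inverse follows. The helper names are hypothetical, and the real `add_pace` is not shown.

```python
def pace_min_per_km(time_h: float, distance_km: float) -> float:
    """Convert a finish time in hours to an average pace in min/km."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return time_h * 60.0 / distance_km


def time_h_from_pace(pace: float, distance_km: float) -> float:
    """Inverse conversion: pace in min/km back to a finish time in hours."""
    return pace * distance_km / 60.0


bike_pace = pace_min_per_km(17.45, 300.0)
print(round(bike_pace, 2))
print(round(time_h_from_pace(bike_pace, 300.0), 2))
```

Working in pace rather than raw time puts a 5 km run and a 300 km ride on a comparable scale, although paces still differ widely between sports, as the per-sport means show.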

7. Training the model (target: pace in min/km)

In [8]:
try:
    model, importance = train(race_features)
    print("Model trained on pace (min/km).")
    print(importance.to_string(index=False))
except ValueError as e:
    print(f"[!] {e}")
Model trained on pace (min/km).
     feature  importance
   weight_kg    0.284545
   fat_ratio    0.162420
         tsb    0.136727
         atl    0.099575
 tss_days_4w    0.078722
tss_days_12w    0.075373
         ctl    0.064612
 tss_sum_12w    0.040487
  tss_sum_8w    0.031167
  tss_sum_4w    0.022150
 tss_days_8w    0.004221

8. Leave-One-Out evaluation

In [9]:
metrics = evaluate_loo(race_features)
n = metrics['n_samples']
print(f"LOO cross-validation ({n} races with distance)")
if metrics['mae_min_per_km']:
    print(f"  MAE pace : {metrics['mae_min_per_km']} min/km")
    print(f"  MAE time : +/- {metrics['mae_min']:.0f} min on average")
else:
    print("  Not enough data for a robust evaluation.")

print("")
print("By sport:")
for sport in race_features['sport'].dropna().unique():
    m = evaluate_loo(race_features, sport=sport)
    if m['mae_min_per_km']:
        print(f"  {sport:12s}: +/- {m['mae_min']:.0f} min  ({m['n_samples']} races)")
    else:
        print(f"  {sport:12s}: too little data ({m['n_samples']} races)")
LOO cross-validation (12 races with distance)
  MAE pace : 1.798 min/km
  MAE time : +/- 193 min on average

By sport:
  run         : +/- 16 min  (10 races)
  triathlon   : too little data (4 races)
  bike        : too little data (1 races)
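
How `evaluate_loo` is implemented is not shown. As a rough illustration of leave-one-out evaluation on pace, the sketch below holds out each race in turn, predicts its pace from the remaining races, and reports the error both in min/km and in minutes. It swaps in a trivial mean-pace predictor for the notebook's model, and `loo_mean_pace` is a hypothetical name. Because the time error is the pace error multiplied by the distance, long events weigh heavily on a pooled time MAE, which may explain part of the gap between the pooled figure and the run-only figure above.

```python
def loo_mean_pace(races):
    """Leave-one-out MAE for a mean-pace baseline (stand-in for the real model).

    races: list of (distance_km, time_h) tuples.
    Returns (mae_pace_min_per_km, mae_time_min).
    """
    if len(races) < 2:
        raise ValueError("leave-one-out needs at least two races")
    paces = [time_h * 60.0 / dist for dist, time_h in races]
    pace_errors, time_errors = [], []
    for i, (dist, _) in enumerate(races):
        # Fit on every race except race i: here, just the mean of the other paces.
        others = paces[:i] + paces[i + 1:]
        predicted = sum(others) / len(others)
        err = abs(predicted - paces[i])
        pace_errors.append(err)
        time_errors.append(err * dist)  # min/km * km = minutes
    n = len(races)
    return sum(pace_errors) / n, sum(time_errors) / n


# Three half-marathons from the race table above: (race_distance_km, achieved_time_h).
mae_pace, mae_time = loo_mean_pace([(21.1, 2.03), (21.1, 2.01), (21.1, 2.11)])
print(f"MAE pace: {mae_pace:.3f} min/km, MAE time: {mae_time:.1f} min")
```

With so few races per sport, each held-out prediction rests on a handful of training points, so the resulting MAE is best read as a rough indication rather than a stable estimate.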

9. Feature importance

In [10]:
try:
    fig = chart_feature_importance(importance)
    fig.show()
except NameError:
    print("Model not trained, skipping.")

10. Predicted vs. actual time

In [11]:
try:
    feature_cols = [c for c in FEATURE_COLS + OPTIONAL_FEATURES if c in race_features.columns]
    valid = add_pace(race_features.dropna(subset=['race_distance_km']))
    valid = valid.dropna(subset=feature_cols + [TARGET])

    # Predict pace (min/km), then convert back to a finish time in hours.
    pred_pace = model.predict(valid[feature_cols].values)
    pred_time_h = (pred_pace * valid['race_distance_km']) / 60

    fig = chart_predicted_vs_actual(
        actual=valid['achieved_time_h'],
        predicted=pd.Series(pred_time_h, index=valid.index),
        labels=valid['event_name'] + ' (' + valid['sport'] + ')',
    )
    fig.show()
except Exception:
    # Show the full traceback without aborting the rest of the notebook.
    import traceback
    traceback.print_exc()