P020 — Cross-Stage Diffusion Audit D0→D1 - Teleforge

Client notebook for the nostos.benchmarks.pipeline_audit framework.

Question: why are g–g diagrams 1000× more diffuse on raw commercial hardware (Teltonika FMC880) than on a smartphone or BeagleBone whose firmware pre-compensates gravity? Where in the pipeline does the gap open up?

Empirical companion to INPI eSoleau filing DSO2026011691 (P-PIPELINE, 31 March 2026). See P020.md for the full paper.

from pathlib import Path
import sys

# The notebook sits five levels below the deeptech root, where nostos is checked out
NB_DIR = Path.cwd()
NOSTOS_ROOT = NB_DIR.parent.parent.parent.parent.parent / 'nostos'
sys.path.insert(0, str(NOSTOS_ROOT / 'src'))  # make the local nostos package importable

import telemachus as tele
from nostos.benchmarks import load_clermont, load_greensboro
from nostos.benchmarks.pipeline_audit import (
    run_pipeline_audit, format_audit_table, DEFAULT_STAGES,
)
from IPython.display import Markdown, display
print(f'telemachus {tele.__version__} | Framework OK')

telemachus 0.8.0 | Framework OK

1. Datasets: enriched version (14 April)

AEGIS is loaded in full mode: 33 trips plus gyroscope and OBD (speed from PID 0x0D, a ground truth independent of GPS).
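For reference, OBD-II mode 01 PID 0x0D (SAE J1979) answers with a single data byte A giving vehicle speed in km/h (0 to 255). A minimal sketch of decoding such a response into m/s, assuming the usual space-separated hex form `41 0D A`; how the AEGIS export actually produced `speed_obd_mps` is not documented in this notebook, and `decode_pid_0d` is a hypothetical helper.

```python
def decode_pid_0d(response: str) -> float:
    """Decode an OBD-II mode 01 PID 0x0D response (e.g. '41 0D 3C') into m/s.

    Hypothetical helper: per SAE J1979, data byte A is the vehicle speed in km/h.
    """
    parts = [int(tok, 16) for tok in response.split()]
    # Positive response header is 0x41 (mode 0x01 + 0x40), followed by the PID 0x0D
    if len(parts) < 3 or parts[0] != 0x41 or parts[1] != 0x0D:
        raise ValueError(f"not a PID 0x0D response: {response!r}")
    speed_kmh = parts[2]      # byte A, 0..255 km/h, 1 km/h resolution
    return speed_kmh / 3.6    # km/h -> m/s


print(round(decode_pid_0d("41 0D 3C"), 2))  # 0x3C = 60 km/h → 16.67
```

The 1 km/h resolution (about 0.28 m/s) is fine against the 3 m/s moving threshold used below.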

UAH and PVS are planned but not available locally (see scripts/download_public_datasets.sh).

# AEGIS via telemachus (open, Zenodo DOI 10.5281/zenodo.19609044)
DEEPTECH = NB_DIR.parent.parent.parent.parent.parent  # deeptech root
TELE_DATA = DEEPTECH / "data" / "telemachus"
df_aegis = tele.read(str(TELE_DATA / "aegis-telemachus.parquet"))
# Keep only rows with a full accelerometer triplet and a GPS fix
df_aegis = df_aegis.dropna(subset=["ax_mps2", "ay_mps2", "az_mps2", "lat", "lon"]).reset_index(drop=True)
# When OBD speed exists (PID 0x0D, ground truth) it replaces speed_mps entirely:
# gaps are forward-filled, leading gaps set to 0 (no GPS fallback), values clipped to [0, 50] m/s
if "speed_obd_mps" in df_aegis.columns:
    df_aegis["speed_mps"] = df_aegis["speed_obd_mps"].ffill().fillna(0).clip(0, 50)

datasets = {
    'AEGIS_full': df_aegis,
    'Clermont': load_clermont(),
    'Greensboro': load_greensboro(post_daxos_only=True),
}
for name, df in datasets.items():
    moving_pct = (df['speed_mps'] > 3).mean() * 100
    print(f'  {name:12s} : {len(df):>9,d} samples, moving {moving_pct:.1f}%')

  AEGIS_full   : 1,063,350 samples, moving 82.5%
  Clermont     :    10,884 samples, moving 54.8%
  Greensboro   :     3,176 samples, moving 89.4%
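The loading cell names GPS haversine speed as the alternative to OBD, but fills OBD gaps with forward-fill and zeros. A minimal sketch of a true fallback, assuming a time column in seconds called `timestamp_s` (a hypothetical name; the telemachus schema may differ): compute point-to-point haversine speed from `lat`/`lon`, then keep OBD speed where present and the GPS estimate elsewhere via pandas `Series.combine_first`.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_speed_mps(df: pd.DataFrame, t_col: str = "timestamp_s") -> pd.Series:
    """Speed between consecutive GPS fixes, from lat/lon in degrees and time in seconds."""
    if len(df) == 0:
        return pd.Series(dtype=float, index=df.index)
    lat = np.radians(df["lat"].to_numpy(dtype=float))
    lon = np.radians(df["lon"].to_numpy(dtype=float))
    a = (np.sin(np.diff(lat) / 2) ** 2
         + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(np.diff(lon) / 2) ** 2)
    dist_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    dt_s = np.diff(df[t_col].to_numpy(dtype=float))
    with np.errstate(divide="ignore", invalid="ignore"):
        speed = np.where(dt_s > 0, dist_m / dt_s, np.nan)  # NaN on repeated/inverted times
    # The first fix has no predecessor, so its speed is NaN
    return pd.Series(np.concatenate([[np.nan], speed]), index=df.index)


def fused_speed_mps(df: pd.DataFrame, t_col: str = "timestamp_s") -> pd.Series:
    """OBD speed where present, GPS haversine speed elsewhere, clipped to [0, 50] m/s."""
    gps = haversine_speed_mps(df, t_col)
    return df["speed_obd_mps"].combine_first(gps).clip(0, 50)


demo = pd.DataFrame({
    "timestamp_s": [0.0, 1.0, 2.0],
    "lat": [45.0, 45.0, 45.0],
    "lon": [3.0, 3.0001, 3.0002],
    "speed_obd_mps": [np.nan, 8.0, np.nan],
})
print(fused_speed_mps(demo))
```

Swapping this in would change the `moving` percentages above, since OBD gaps would no longer count as stationary.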

2. Cross-stage audit

Successive stages: D0 → D1.a (IMU Calibrator, Rodrigues) → D1.b (GPS Cleaner) → D1.c (SQS Scorer). Diffusion metrics are computed at each stage.
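Stage D1.a is named after Rodrigues' rotation formula. A minimal sketch of what such a calibration commonly does, assuming (this notebook does not confirm it) that it averages the accelerometer over stationary samples and rotates the sensor frame so that this gravity estimate lands on +z. With unit vectors a (measured) and b (target), v = a × b and c = a·b, the rotation is R = I + [v]× + [v]×² / (1 + c). This is a generic textbook alignment, not the nostos implementation; `rotation_from_to` and `align_to_gravity` are hypothetical names.

```python
import numpy as np


def rotation_from_to(a, b) -> np.ndarray:
    """Rotation matrix R with R @ a_hat = b_hat (Rodrigues' formula)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)          # rotation axis (unnormalised), |v| = sin(theta)
    c = float(np.dot(a, b))     # cos(theta)
    if np.isclose(c, 1.0):
        return np.eye(3)        # already aligned
    if np.isclose(c, -1.0):
        # 180-degree turn about any axis orthogonal to a: R = 2 u u^T - I
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])  # skew-symmetric cross-product matrix [v]x
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def align_to_gravity(acc: np.ndarray, stationary: np.ndarray) -> np.ndarray:
    """Rotate (N, 3) accelerometer samples so the mean stationary vector points along +z."""
    g_est = acc[stationary].mean(axis=0)
    R = rotation_from_to(g_est, [0.0, 0.0, 1.0])
    return acc @ R.T            # applies R to every row


# Device mounted with a 45-degree tilt around x: gravity split between y and z
g = 9.81
acc_demo = np.tile([0.0, g * np.sin(np.pi / 4), g * np.cos(np.pi / 4)], (5, 1))
print(align_to_gravity(acc_demo, np.ones(5, dtype=bool)))
```

A rotation only fixes orientation; it leaves |g| unchanged, so a scale bias would still show up in the `g_norm_mean_mps2` metric.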

df_audit = run_pipeline_audit(datasets)
df_audit[['dataset', 'stage', 'n_samples', 'n_moving', 'hull_area_g2', 'std_long_g', 'std_lat_g', 'g_norm_mean_mps2']]

Burst sampling: 475 frames @ 24 Hz, effective 3 Hz (gap 124531 ms)

Burst sampling: 60 frames @ 1 Hz, effective 1 Hz (gap 22690 ms)

validate_d0: timestamps non monotones: 11 inversions

Burst sampling: 6 frames @ 0 Hz, effective 0 Hz (gap 26000 ms)
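The log lines above come from the D0 checks. A minimal sketch of how comparable diagnostics can be derived from a timestamp column, assuming timestamps in milliseconds and reading "nominal" rate as the median rate inside bursts and "effective" rate as samples over total duration. This is our interpretation of the messages, not the nostos code; `timing_diagnostics` and its `burst_gap_ms` threshold are hypothetical.

```python
import numpy as np


def timing_diagnostics(t_ms, burst_gap_ms: float = 1000.0) -> dict:
    """Count timestamp inversions and compare in-burst vs overall sampling rate."""
    t = np.asarray(t_ms, dtype=float)
    dt = np.diff(t)
    n_inversions = int((dt < 0).sum())              # steps that go back in time
    in_burst = dt[(dt > 0) & (dt < burst_gap_ms)]   # spacings inside a burst
    nominal_hz = 1000.0 / np.median(in_burst) if in_burst.size else float("nan")
    duration_s = (t.max() - t.min()) / 1000.0
    effective_hz = (t.size - 1) / duration_s if duration_s > 0 else float("nan")
    return {
        "n_inversions": n_inversions,
        "nominal_hz": nominal_hz,
        "effective_hz": effective_hz,
        "max_gap_ms": float(dt.max()) if dt.size else 0.0,
    }


# Two 10-sample bursts at 10 Hz separated by a 5 s gap
t_demo = list(range(0, 1000, 100)) + list(range(5900, 6900, 100))
print(timing_diagnostics(t_demo))
```

Under this reading, a large nominal/effective ratio means the g–g cloud is built from short high-rate bursts with long blind intervals between them.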

3. Pivot tables by metric

display(Markdown('### Hull area (g²): area of the *g–g* cloud'))
display(Markdown(format_audit_table(df_audit, 'hull_area_g2')))
display(Markdown('### Std longitudinal (g)'))
display(Markdown(format_audit_table(df_audit, 'std_long_g')))
display(Markdown('### |g| mean (m/s²): IMU scale bias'))
display(Markdown(format_audit_table(df_audit, 'g_norm_mean_mps2')))
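The metrics pivoted above (plus `std_lat_g`) can be reproduced on any frame with the columns used in this notebook. A minimal sketch, assuming `ax_mps2` is longitudinal and `ay_mps2` lateral, that the cloud metrics use moving samples only (speed > 3 m/s, the threshold from section 1) while |g| is averaged over all samples, and that hull area is the 2-D convex hull area in g². The exact filtering inside `run_pipeline_audit` may differ. The hull uses Andrew's monotone chain plus the shoelace formula to stay NumPy-only.

```python
import numpy as np
import pandas as pd

G0 = 9.80665  # standard gravity, m/s^2


def convex_hull_area(pts: np.ndarray) -> float:
    """Area of the 2-D convex hull: Andrew's monotone chain, then the shoelace formula."""
    P = sorted(set(map(tuple, np.asarray(pts, dtype=float))))
    if len(P) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in P:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(P):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = np.array(lower[:-1] + upper[:-1])
    x, y = hull[:, 0], hull[:, 1]
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


def diffusion_metrics(df: pd.DataFrame, moving_thr_mps: float = 3.0) -> dict:
    """Hull area and spread of the g–g cloud on moving samples, plus mean |a| on all samples."""
    moving = df[df["speed_mps"] > moving_thr_mps]
    long_g = moving["ax_mps2"].to_numpy(dtype=float) / G0  # assumed longitudinal axis
    lat_g = moving["ay_mps2"].to_numpy(dtype=float) / G0   # assumed lateral axis
    acc = df[["ax_mps2", "ay_mps2", "az_mps2"]].to_numpy(dtype=float)
    return {
        "n_samples": len(df),
        "n_moving": len(moving),
        "hull_area_g2": convex_hull_area(np.column_stack([long_g, lat_g])),
        "std_long_g": float(np.std(long_g, ddof=1)),
        "std_lat_g": float(np.std(lat_g, ddof=1)),
        "g_norm_mean_mps2": float(np.linalg.norm(acc, axis=1).mean()),
    }


demo = pd.DataFrame({
    "ax_mps2": np.array([0.5, 0.5, -0.5, -0.5, 0.0, 5.0]) * G0,
    "ay_mps2": np.array([0.5, -0.5, 0.5, -0.5, 0.0, 0.0]) * G0,
    "az_mps2": np.full(6, G0),
    "speed_mps": [10.0, 10.0, 10.0, 10.0, 10.0, 0.0],  # last row is stationary
})
print(diffusion_metrics(demo))
```

Because the hull is driven by extreme points, a few spikes can inflate `hull_area_g2` by orders of magnitude while the standard deviations barely move; comparing both metrics helps tell outliers from a genuinely wider cloud.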

4. g–g heatmap mosaic (3 datasets × 4 stages)

from gg_diagram import gg_heatmap  # local plotting helper returning a plotly figure
from PIL import Image, ImageDraw, ImageFont
from IPython.display import Image as IPImage

# Render one g–g heatmap per (dataset, stage); stages are applied cumulatively
tmp_dir = NB_DIR / 'tmp_panels'
tmp_dir.mkdir(exist_ok=True)
panel_paths = {}
for name, df in datasets.items():
    df_cur = df
    for stage_name, stage_func in DEFAULT_STAGES:
        df_cur = stage_func(df_cur)
        fig = gg_heatmap(df_cur, ax_col='ax_mps2', ay_col='ay_mps2',
                         speed_col='speed_mps', hz=10.0)
        pth = tmp_dir / f'{name}_{stage_name}.png'
        fig.write_image(str(pth), scale=1)  # plotly static export (needs kaleido)
        panel_paths[(name, stage_name)] = pth

# Assemble a grid: one column per stage, one row per dataset, plus header/label margins
stage_names = [s[0] for s in DEFAULT_STAGES]
dataset_names = list(datasets.keys())
with Image.open(panel_paths[(dataset_names[0], stage_names[0])]) as sample:
    cw, ch = sample.size  # all panels are assumed to share this size
HEADER, LABEL_W = 60, 140
W = LABEL_W + cw * len(stage_names)
H = HEADER + ch * len(dataset_names)
mosaic = Image.new('RGB', (W, H), 'white')
draw = ImageDraw.Draw(mosaic)
try:
    f = ImageFont.truetype('/System/Library/Fonts/Helvetica.ttc', 16)  # macOS font
except OSError:
    f = ImageFont.load_default()
for j, s in enumerate(stage_names):
    draw.text((LABEL_W + j * cw + cw // 2 - 40, 20), s, fill='black', font=f)
for i, d in enumerate(dataset_names):
    draw.text((20, HEADER + i * ch + ch // 2 - 8), d, fill='black', font=f)
    for j, s in enumerate(stage_names):
        with Image.open(panel_paths[(d, s)]) as img:  # close each panel file after pasting
            mosaic.paste(img, (LABEL_W + j * cw, HEADER + i * ch))
fig_path = NB_DIR.parent / 'figures' / 'diffusion_audit_mosaic.png'
fig_path.parent.mkdir(parents=True, exist_ok=True)
mosaic.save(str(fig_path))
print(f'saved: {fig_path.relative_to(NB_DIR.parent)}')
IPImage(filename=str(fig_path))

Burst sampling: 475 frames @ 24 Hz, effective 3 Hz (gap 124531 ms)

Burst sampling: 60 frames @ 1 Hz, effective 1 Hz (gap 22690 ms)

validate_d0: timestamps non monotones: 11 inversions

Burst sampling: 6 frames @ 0 Hz, effective 0 Hz (gap 26000 ms)

saved: figures/diffusion_audit_mosaic.png
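`gg_heatmap` comes from a local `gg_diagram` module whose internals are not shown here. A minimal NumPy-only sketch of the underlying idea, assuming each panel is a 2-D occupancy histogram of lateral vs longitudinal acceleration in g over moving samples; the ±1 g range, 40 bins and 3 m/s threshold are illustrative choices, and the `hz` argument of the real function is not modelled. `gg_histogram` is a hypothetical name.

```python
import numpy as np

G0 = 9.80665  # standard gravity, m/s^2


def gg_histogram(ax_mps2, ay_mps2, speed_mps, moving_thr=3.0, lim_g=1.0, bins=40):
    """2-D histogram of (lateral, longitudinal) acceleration in g for moving samples.

    Returns (counts, lat_edges, long_edges); counts[i, j] counts samples whose
    lateral g falls in bin i and longitudinal g in bin j. Samples outside
    [-lim_g, lim_g] on either axis are dropped.
    """
    ax = np.asarray(ax_mps2, dtype=float)
    ay = np.asarray(ay_mps2, dtype=float)
    moving = np.asarray(speed_mps, dtype=float) > moving_thr
    edges = np.linspace(-lim_g, lim_g, bins + 1)
    counts, lat_edges, long_edges = np.histogram2d(
        ay[moving] / G0, ax[moving] / G0, bins=[edges, edges]
    )
    return counts, lat_edges, long_edges


rng = np.random.default_rng(0)
ax_demo = rng.normal(0.0, 0.1 * G0, 1000)
ay_demo = rng.normal(0.0, 0.1 * G0, 1000)
speed_demo = np.r_[np.zeros(200), np.full(800, 10.0)]  # first 200 samples stationary
counts, lat_edges, long_edges = gg_histogram(ax_demo, ay_demo, speed_demo)
print(counts.shape, counts.sum())
```

The `counts` array could then be drawn with `plotly.graph_objects.Heatmap`; fixing the same edges for every panel is what makes the mosaic comparable across datasets and stages.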

5. Persistence

out_csv = NB_DIR.parent / 'experiments' / 'diffusion_audit_results.csv'
out_csv.parent.mkdir(parents=True, exist_ok=True)  # create experiments/ if missing
df_audit.to_csv(out_csv, index=False)
print(f'saved: {out_csv.relative_to(NB_DIR.parent)}')

saved: experiments/diffusion_audit_results.csv
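The opening question is a ratio between devices. A minimal sketch of turning the saved table into per-stage hull-area ratios against a reference dataset, using only the `dataset`, `stage` and `hull_area_g2` columns from section 2 and assuming one row per (dataset, stage). Which dataset serves as reference is the reader's choice, and the toy frame below holds made-up values that only illustrate the shape of the result; they are not audit results. `hull_ratio_table` is a hypothetical helper.

```python
import pandas as pd


def hull_ratio_table(df_audit: pd.DataFrame, reference: str) -> pd.DataFrame:
    """Pivot hull area to (stage x dataset) and divide every column by the reference dataset."""
    pivot = df_audit.pivot(index="stage", columns="dataset", values="hull_area_g2")
    return pivot.div(pivot[reference], axis=0)


# Toy frame with made-up values, only to show the shape of the output
toy = pd.DataFrame({
    "dataset": ["A", "B", "A", "B"],
    "stage": ["s0", "s0", "s1", "s1"],
    "hull_area_g2": [2.0, 0.002, 1.0, 0.01],
})
ratios = hull_ratio_table(toy, reference="B")
print(ratios)

# On the real output of section 5 (reference dataset chosen by the reader):
# hull_ratio_table(pd.read_csv(out_csv), reference="Clermont")
```

Reading the ratio column stage by stage shows where in the pipeline the gap between devices grows or shrinks.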