Early programming ✓
Alpha — now
Alpha
Fix & expand
Current phase
Labeling tool — S/E single-pass
Keys: S = start, E = end, W = save. A/D = ±5 frames, B/F = ±50 frames. Bold overlay.
LR mirror new database
Negate X, swap left/right joint pairs. ~20 labeled videos → ×2 dataset.
Merge FERN v1 clips
ffmpeg concat of the 5 s clips + auto-generated label JSON, labels taken from each clip's gesture folder.
Idle class
Record stand-still videos. Add as class 0. Retrain.
Window offset fix
Shift the window ~15 frames earlier so it catches the gesture onset, not the midpoint.
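Sketch of the shift, assuming a window is a (start, end) frame pair counted from frame 0. `shift_window` is an illustrative name, not the project's actual function. The 15-frame default comes from the note above.

```python
def shift_window(start, end, offset=15):
    """Move a (start, end) frame window `offset` frames earlier.

    The shift is clamped so the window never starts before frame 0,
    and the window length is preserved.
    """
    shift = min(offset, start)
    return start - shift, end - shift
```

Example: `shift_window(100, 160)` gives `(85, 145)`. A window starting at frame 10 can only move 10 frames, so `shift_window(10, 70)` gives `(0, 60)`.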
Subject-independent eval
Leave-one-subject-out split. True generalisation accuracy.
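Sketch of the split, assuming every sample carries a subject ID. `leave_one_subject_out` and the ID strings are illustrative, not taken from the codebase.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Training set: every sample that does not belong to the held-out subject.
        train_idx = sorted(
            i for subject, idxs in by_subject.items() if subject != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx
```

Mirrored or augmented copies of a video should keep the original's subject ID. Otherwise a held-out subject's data leaks into the training fold.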
CUDA utilisation probe
Check DataLoader workers, batch size, mixed precision.
Beta
Multi-camera + streaming
After Alpha milestones done
45° camera — Option B
Feature-level fusion: concat front+angle → 60-feature input. Fixes heel_tap.
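Sketch of the concat, assuming each view yields one 30-value feature vector per frame (inferred from the 60-feature total) and the two streams are already frame-aligned. `fuse_features` is an illustrative name.

```python
def fuse_features(front, angle):
    """Concatenate per-frame feature vectors from the front and 45-degree views."""
    if len(front) != len(angle):
        raise ValueError("both views must have the same number of frames")
    # Front-view features first, then angle-view features, for every frame.
    return [list(f) + list(a) for f, a in zip(front, angle)]
```

The feature order (front first, angle second) is an arbitrary choice here. It only has to match between training and inference.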
3D reconstruction — Option C
Stereo triangulation. Paper ablation row. Checkerboard calibration session.
Frame sync buffer
Hold frames until both cameras deliver same timestamp.
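Sketch of the buffer, assuming two cameras named "front" and "angle" and timestamps that compare exactly equal (e.g. already quantised to a shared frame index). Class name, camera names and `max_pending` are illustrative. Real capture timestamps may need a tolerance instead of exact equality.

```python
class FrameSyncBuffer:
    """Hold frames per camera and release a pair once both share a timestamp."""

    CAMERAS = ("front", "angle")

    def __init__(self, max_pending=30):
        self.pending = {name: {} for name in self.CAMERAS}
        self.max_pending = max_pending

    def push(self, camera, timestamp, frame):
        """Add one frame. Return (timestamp, front, angle) on a match, else None."""
        if camera not in self.pending:
            raise ValueError(f"unknown camera: {camera}")
        other = self.CAMERAS[1] if camera == self.CAMERAS[0] else self.CAMERAS[0]

        if timestamp in self.pending[other]:
            pair = {camera: frame, other: self.pending[other].pop(timestamp)}
            return timestamp, pair["front"], pair["angle"]

        # No partner yet: hold the frame, dropping the oldest one if the buffer is full.
        own = self.pending[camera]
        own[timestamp] = frame
        if len(own) > self.max_pending:
            own.pop(min(own))
        return None
```

The cap on pending frames keeps memory bounded if one camera stalls or drops frames.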
DroidGrid integration
Phone → MJPEG socket → MediaPipe pipeline. Overlay streamed back.
Confidence smoothing
Temporal N-frame majority vote. Stops label flicker in live inference.
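Sketch of the vote, assuming the model emits one label per frame. `MajorityVoteSmoother`, the window size and the tie-break rule (most recent label wins) are illustrative choices, not the project's settings.

```python
from collections import Counter, deque


class MajorityVoteSmoother:
    """Return the most common label over the last `window` predictions."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, label):
        self.history.append(label)
        counts = Counter(self.history)
        top = max(counts.values())
        # On a tie, prefer the most recently seen label among the tied ones.
        for recent in reversed(self.history):
            if counts[recent] == top:
                return recent
```

With `window=3`, the raw sequence idle, idle, heel_tap, idle, heel_tap, heel_tap comes out as idle, idle, idle, idle, heel_tap, heel_tap. The single-frame blip is suppressed, at the cost of about one frame of extra latency when the label really changes.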
Gold
Paper-ready
Freeze → write
Augmentation expansion
Mirror + rotation ±5° + brightness. Applied last, after all real data collected.
Ablation study
Single-cam vs Option B vs Option C. Three rows in results table.
Dataset freeze
No new data after this point. All eval numbers reproducible.
Paper sections
Methodology, results, related work, conclusion. Architecture diagrams.
Deployment package
Self-contained ZIP. install.ps1. Tested on 3 machines including CPU-only.
Release
Publish & open
~18 weeks from now
Paper submitted
IEEE Sensors Journal (primary) or MDPI Sensors. ISMAR as backup.
GitHub open-source
Full source + weights + README + demo video.
Dataset public
Skeleton CSVs released. Raw videos optional (consent required).
Post-release
Future paths
Choose one or more
Industrial deployment
Factory floor ergonomics. Edge device (Jetson). ISO compliance.
VR/AR locomotion
Foot gestures as controller input. ISMAR / IEEE VR target.
Physiotherapy
Rehab exercise counting and form correction. Clinical partnership.
Unsupervised segmentation
Auto-segment unlabeled videos. PhD-level research direction.
Multi-person
Track multiple subjects simultaneously. Crowd analysis use case.
Database expansion — Alpha priority
Step 1 — LR mirror (run now)
Reads skeleton CSVs + label JSONs from new DB (~20 vids)
Negates X, swaps BlazePose left/right joint pairs
Writes *_mirror.csv + *_mirror.json
Result: ~20 → ~40 labeled videos (same subjects), zero re-recording
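Sketch of the per-frame mirror, assuming each frame is a list of 33 (x, y, z) landmarks in BlazePose index order, with X centred on zero (e.g. hip-centred) so that negation is a true mirror. The CSV layout is not specified here, so file I/O is left out. `mirror_frame` is an illustrative name.

```python
# BlazePose (MediaPipe Pose) left/right landmark index pairs:
# eyes (1-3 vs 4-6), ears, mouth corners, then shoulders through foot index.
SWAP_PAIRS = [(1, 4), (2, 5), (3, 6), (7, 8), (9, 10)] + [
    (i, i + 1) for i in range(11, 32, 2)
]


def mirror_frame(landmarks):
    """Return a left-right mirrored copy of one frame of 33 (x, y, z) landmarks."""
    if len(landmarks) != 33:
        raise ValueError("expected 33 BlazePose landmarks")
    mirrored = [(-x, y, z) for x, y, z in landmarks]
    for left, right in SWAP_PAIRS:
        mirrored[left], mirrored[right] = mirrored[right], mirrored[left]
    return mirrored
```

If X is stored as normalised image coordinates in [0, 1], the mirror would be 1 - x rather than -x. Any side-specific gesture label would also need swapping in the mirrored JSON.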
Step 2 — v1 merge (run after)
Reads v1 clips from v1_data/<gesture>/ folders
Groups by (subject, angle) from filename pattern
ffmpeg concat with 5-frame gap between clips
Auto-generates label JSON with accurate frame offsets
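Sketch of the offset arithmetic for the merged label JSON, assuming the frame count of each clip is already known (e.g. probed from the file) and clips are listed in concat order. The `gesture`/`start`/`end` keys are a hypothetical schema, and `gesture_b` is a placeholder name.

```python
def build_labels(clips, gap=5):
    """Compute label entries for clips concatenated with `gap` blank frames between them.

    clips: list of (gesture, n_frames) tuples in concat order.
    Returns dicts with inclusive start/end frame indices in the merged video.
    """
    labels = []
    cursor = 0
    for gesture, n_frames in clips:
        labels.append({"gesture": gesture, "start": cursor, "end": cursor + n_frames - 1})
        cursor += n_frames + gap  # skip the clip and the gap that follows it
    return labels
```

Example: `build_labels([("heel_tap", 150), ("gesture_b", 150)])` places the first clip at frames 0 to 149 and the second at 155 to 304. Offsets are only accurate if the frame counts come from the actual files rather than from duration × nominal fps.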