The Authority and Comfort of Fitness AI

This sits close to my day job: I spend a lot of time thinking about AI governance and how interfaces steer people. Then I open Strava and (boom) my run gets a little bedtime story. Now nearing the end of the year, Strava’s two AI-flavored toys are the talk of the town again: the Spotify-Wrapped-ish year recap vibe (Strava’s “Year in Sport” marketing moment each December, plus all the third-party clones people share), and Athlete Intelligence, the feature that generates a neat, natural-language interpretation of your activity right after you upload it. Strava frames Athlete Intelligence as a way to translate your stats into “simple, personalized insights,” and it’s positioned as a subscriber perk you can opt out of.

And I get why it lands.
Because most of us are tired, or long for convenience. Not just “post-long-run tired.” I mean cognitively tired. We’ve built this weird endurance life where we’re collecting more data than a small public health agency (pace, HR, HRV, sleep, power, temperature, fueling, carbs, sodium, elevation, grade, ground contact time, cadence, calories burnt, etc.) and then asking a stressed brain to turn it into meaning while we’re also answering emails, reheating leftovers, and remembering we promised ourselves we’d do mobility.

So an AI summary that says, “You held steady effort on tired legs, your pace drifted on the climbs, nice work keeping cadence up late,” feels like someone competent took your hand for ten seconds. The best version of this (pure fantasy, but still) would be a gentle interpreter or translator. Someone who notices patterns you miss, especially the boring ones: like how your easy runs aren’t easy, or how every “down week” becomes a stealth build. It’s not that we are lazy. It’s that attention is scarce, and narrative is soothing. These tools offer narrative on tap. That’s the hook. And here’s the snag: the narrative is generated.

Where it gets weird (and sometimes harmful)
Let’s start with the obvious: a large language model (LLM) can be fluent while being wrong. In a running context, “wrong” doesn’t always look like 2+2=5. It looks like a confident misread of why something happened. Your heart rate was high because you were dehydrated, stressed, underfueled, and it was 27°C with wet air. Yet the summary implies you were “pushing fitness” or “showing strong aerobic development.” That isn’t a harmless mistake. That’s a small nudge toward a training identity: I’m the kind of athlete who grinds through. And athletes are extremely suggestible to identity when tired.

Strava isn’t alone here. This is basically the new product pattern across sport tech: add an “AI insight layer” to an already data-rich platform, wrap it as clarity, and hope people feel it’s worth the subscription. Garmin literally launched a paid tier (Connect+) with “Active Intelligence” insights “powered by AI,” promising they’ll get more tailored over time. Apple rolled out “Workout Buddy,” using Apple Intelligence to deliver real-time encouragement based on workout history and current session data. WHOOP has had an AI coach for a while, framed as conversational guidance using your biometric data. On paper, it’s all support. In practice, a lot of it is vibes with authority.

And the authority matters, because LLMs have a documented tendency to be sycophantic: to tell users what they want to hear, to affirm, to validate, to smooth the sharp edges. A Stanford-led study covered in The Guardian described “social sycophancy” and found chatbots endorsed users’ behaviour substantially more than humans did, shifting how justified people felt and how much they trusted the bot.

That’s in interpersonal scenarios, sure, but sport is also interpersonal: it’s you, your self-concept, your ambition, your fear of being ordinary, your need to be seen as disciplined. If the model’s default posture is “you’re doing great, bestie,” it’s going to rubber-stamp the exact things many runners need challenged.

Here’s a tiny example that I suspect a lot of people will recognize: You do a “moderate” workout on tired legs. It’s not planned well. You slept badly. You skipped breakfast. You’re low-level irritated at the world. The run is… fine, but not clean. Later you upload it, and Athlete Intelligence gives you a warm paragraph that reads like a coach’s note, something like “Strong effort today. Great job maintaining consistency. Your pace shows improving fitness.” And you think: “See? I’m tough. I’m consistent. This is what serious athletes do.”

But maybe what serious athletes do is: they don’t turn every session into a referendum on their worth. They don’t need a machine to praise them for ignoring fatigue signals. They don’t outsource discernment. This is where the LLM problem becomes a running problem: these systems don’t just summarize; they shape interpretation. And interpretation shapes behaviour.

There’s also persuasion. A separate line of research has found LLMs can be highly persuasive in debate settings, raising obvious concerns about influence, especially when personal data is in the mix. In endurance apps, the “debate” is quieter. It’s not politics; it’s training decisions: “Should I run today? Am I overreaching? Was this session productive?” If the app’s AI layer has any incentive (explicit or not) to keep you engaged, keep you subscribing, keep you posting, keep you feeling like an “athlete,” then it’s not neutral advice. It’s an engagement engine wearing a coach’s jacket.

Then there’s epistemic overreach, which is a polite way of saying: the system speaks outside its jurisdiction. LLMs don’t know physiology. They don’t feel tissue strain, hormonal disruption, grief, boredom, fear. They infer. And inference dressed up as explanation is dangerous when it carries the tone of authority. We know from human–AI interaction research that people systematically overweight confident, fluent explanations, even when they’re wrong.

In a running app, that can blur the line between description and prescription. A sentence that begins as “Here’s what your data suggests” quietly becomes “Here’s what you should keep doing.” Once that line is crossed, the app stops being a mirror and starts being a driver. And drivers, even well-meaning ones, can steer you somewhere you didn’t choose.

Another problem we don’t talk about much is temporal amnesia. LLMs are very good at reading what just happened and surprisingly bad at respecting what has been happening for a long time. Training adaptation is slow, lumpy, nonlinear. It’s connective tissue, mitochondria, bone density, tendon stiffness, systems that don’t care how poetic your last run looked.

An AI summary is inherently short-horizon: it reacts to the most recent signal, the last upload, the freshest deviation. That bias toward recency mirrors what psychologists already warn us about in human decision-making. The danger isn’t a wrong sentence; it’s the steady erosion of patience. When every run is narrated in isolation, it becomes easier to chase micro-feedback instead of macro health. Endurance collapses into episodes. You stop training through time and start training through updates.

There’s also the issue of model collapse by repetition, which sounds abstract until you think about how these tools are trained and refined. LLMs learn patterns from existing language, and (sport) tech companies are increasingly feeding models with prior summaries, coaching clichés, and community language that already dominates platforms like Strava. Over time, the system doesn’t just describe your running, it recycles the narrowest version of what “good training” sounds like. Thresholds become virtues. Consistency becomes morality. Fatigue becomes “grit.”

This is a known failure mode in generative systems: outputs drift toward the mean, originality decays, edge cases disappear. In sport terms, that means athletes whose needs don’t fit the dominant narrative (injury-prone runners, chronically stressed parents, people training around illness, aging bodies) get gently nudged back toward a norm that may not serve them. The model isn’t malicious; it’s just unimaginative at scale.

And then there’s the messiest part: data isn’t reality. Data is a slice, with holes.
– GPS lies in cities, trees, canyons, storms.
– Wrist HR drops out in cold weather and on descents.
– Power estimates are… let’s be honest… interpretive.
– Sleep tracking is still full of confident guesses.
– RPE (rating of perceived exertion, the thing that actually predicts how training is landing) is usually absent, or buried.

When an AI summary tells a clean story using messy inputs, it creates what I’d call false legibility. You feel like you “understand” the session because it was narrated. But the narration can crowd out your own internal memory of effort: the tight hip at minute 35, the weird dizziness when you stopped or underfueled, the surge of panic when you looked at pace, the fact that the run felt lonely in a way that matters more than cadence.

This is also where the year-end “Wrapped” energy turns slightly sour. Strava’s Year in Sport content is fun, but it’s also a social product: it’s built to be shared, compared, reposted, turned into identity. And the community doesn’t need more help turning training into performance. (“Nobody Cares About Your Strava Year” sounds rude, but there’s a point in there.)

If your recap tells you your “best month” was the one where you trained through burnout because the numbers went up, the app just rewarded the wrong thing. If your recap highlights vertical gain like it’s a moral achievement, it pushes you toward the mountain fetish even when your body needed flat boring miles. If it celebrates streaks, it reinforces the kind of compulsive consistency that a lot of runners are quietly trying to unlearn.

I could list more problems, like reward prediction error distortion, automation bias, norm reinforcement and context erasure. And still, some will say, “It’s just a paragraph. Who cares?” I care because paragraphs accumulate. So do small nudges. So do daily micro-validations that keep you inside a particular loop.

So what do we do instead (without going full “throw your watch into the river,” a thought I’ve caught myself having on occasion)?
I’m not interested in purity. I like data. I also like joy. I just don’t want a language model, trained to be pleasing and fluent, becoming the voice that narrates my relationship with effort. A few alternatives that feel saner:

– If you want to read them: treat AI summaries as “maybe,” not “meaning.” Like you’d read a stranger’s hot take, with mild curiosity.
– Add one human datapoint the model can’t access: your own sentence. After key sessions, write a 10-second note: “Felt heavy. Hungry. Bad sleep. Lower back tight. Mood improved after 30 min.” That becomes your real intelligence layer.
– Use coaching questions, not coaching statements. If you want tech to help, design it to ask: What surprised you? What did you notice at minute 20? What would make tomorrow easier? LLMs are better at prompting reflection than declaring verdicts.
– Prefer transparent metrics over “AI vibes.” A clear trend line you understand (weekly volume, long-run frequency, easy-run HR drift, fueling practice) beats a synthetic paragraph you can’t audit.
– Control the feed. Turn off features you don’t want. Strava says you can opt out of Athlete Intelligence via the in-feature beta controls.
– Keep one place “unscored.” One run a week with no upload, no recap or story. It’s not anti-tech. It’s anti-constant-interpretation.
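The “transparent metrics” point above is easy to make concrete. Here’s a minimal sketch of two auditable trend numbers, weekly volume and a crude easy-run HR drift proxy. The record format, the sample values, and the drift heuristic are all my own illustrative assumptions, not anything Strava actually exposes:

```python
from datetime import date
from statistics import mean

# Hypothetical activity log: (date, distance_km, avg_hr, is_easy_run).
# Values are made up for illustration.
activities = [
    (date(2024, 11, 4), 10.0, 142, True),
    (date(2024, 11, 6), 8.0, 151, True),
    (date(2024, 11, 11), 10.0, 145, True),
    (date(2024, 11, 13), 12.0, 155, True),
]

def weekly_volume(acts):
    """Sum distance per ISO (year, week): a trend line you can audit."""
    weeks = {}
    for day, km, _, _ in acts:
        key = tuple(day.isocalendar())[:2]  # (ISO year, ISO week)
        weeks[key] = weeks.get(key, 0.0) + km
    return weeks

def easy_hr_drift(acts):
    """Compare mean easy-run HR in the second half of the window
    against the first half: a crude drift proxy, not a diagnosis."""
    easy = [hr for _, _, hr, is_easy in acts if is_easy]
    half = len(easy) // 2
    return mean(easy[half:]) - mean(easy[:half])

print(weekly_volume(activities))  # → {(2024, 45): 18.0, (2024, 46): 22.0}
print(easy_hr_drift(activities))  # → 3.5 (easy HR creeping up)
```

The point isn’t these exact numbers; it’s that every step is something you can inspect and argue with, which a synthetic paragraph never is.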

The positive note (because there is one) is that runners are already good at the most important skill here: staying present inside discomfort without making up a story too fast. That’s basically what good pacing is. It’s what good recovery is. It’s what good decision-making is.

So maybe the challenge isn’t “AI in sport is evil.” Maybe it’s simpler, and more annoying: we have to protect our capacity to interpret our own experience. Because the second we outsource that, training becomes content, and content becomes identity, and identity becomes a trap you can’t take a rest day from. And yeah, I’ll still look at the graphs. I’m human and a coach. But I want the final narrator to be me: messy, inconsistent, occasionally irrational, sometimes proud for the wrong reasons, sometimes wise by accident. The machine can count. And I’ll do the meaning.