Methodology
How we test wearables and nutrition apps
Retail hardware, paid app tiers, no review units. Here's how we arrive at the numbers on the compatibility matrix and the scores in the ranking reviews.
Hardware
All reviews and matrix entries are verified on retail hardware that we own. No loaners, no review units, no comped devices. Current test inventory:
- Apple Watch Series 10 (46mm, cellular) — watchOS 11.3
- Apple Watch Series 9 (45mm) — watchOS 11.3
- Garmin Epix Pro Gen 2 51mm — firmware 19.21
- Garmin Forerunner 965 — firmware 20.16
- Oura Ring Gen 4 — companion app 8.4
- Whoop 4.0 — app 4.11
- Samsung Galaxy Watch 7 44mm — One UI Watch 6 on Wear OS 5
- Fitbit Charge 6 — firmware 1.203.16
- Apple Vision Pro — visionOS 2.4
Companion phones: iPhone 16 Pro (iOS 18.4) and Pixel 9 (Android 15).
Apps
We test the current App Store / Play Store / Galaxy Store / visionOS builds at the time of publication. We subscribe to paid tiers (MyFitnessPal Premium, Cronometer Gold, etc.) where the paid tier changes integration behaviour, and we note that in the review.
Compatibility matrix
For each cell of the matrix:
- We set up the integration fresh — uninstall the nutrition app, re-install, sign in, enable all relevant HealthKit / Health Connect / vendor-API permissions.
- We verify that activity data flows wearable → app within 24 hours.
- We verify the reverse (calorie target → wearable's companion app) where the app claims two-way sync.
- We log the path (native, framework-mediated, vendor API) and the direction of the working flow.
- We record a "last verified" date and re-test every 2-3 months, or whenever a major OS or vendor release lands.
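The record we keep per cell can be sketched as a small data structure. This is a minimal illustration, not the site's actual schema: the class name, field names, and the 90-day re-test window are assumptions standing in for "every 2-3 months".

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class MatrixCell:
    """One compatibility-matrix cell; fields are illustrative."""
    wearable: str
    app: str
    path: str           # "native", "framework-mediated", or "vendor API"
    direction: str      # "one-way" or "two-way"
    last_verified: date

    def retest_due(self, today: date) -> bool:
        # "Every 2-3 months" approximated as a 90-day window here.
        return today - self.last_verified > timedelta(days=90)

cell = MatrixCell("Garmin Forerunner 965", "MyFitnessPal",
                  "vendor API", "one-way", date(2025, 1, 10))
print(cell.retest_due(date(2025, 5, 1)))  # True: 111 days since last check
```

A structure like this also makes the "last verified" date on the public matrix trivial to render, since it is stored per cell rather than per review.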
Review scores
Scores on ranking pages (Apple Watch, Garmin, Vision Pro) are subjective and weighted:
- Accuracy and trust in the underlying data — 30%
- Speed to log a meal — 25%
- Platform-specific integration quality (complications, on-watch UX) — 20%
- Sync reliability across the relevant frameworks — 15%
- UI polish and battery overhead — 10%
We don't claim these scores are objectively reproducible. We claim that a week with each app on each device, run in rotation, produces a defensible ordering. The weighting favours accuracy because accuracy is the whole point of tracking.
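As a worked example of the weighting, here is a sketch of how the five criteria combine into one score. The weights come from the list above; the scoring function and the 0-10 sub-scores are illustrative, not our actual tooling.

```python
# Weights from the methodology above; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.30,
    "logging_speed": 0.25,
    "platform_integration": 0.20,
    "sync_reliability": 0.15,
    "polish_battery": 0.10,
}

def weighted_score(subscores: dict[str, float]) -> float:
    # Every criterion must be scored, or the weighting is meaningless.
    assert set(subscores) == set(WEIGHTS), "missing or extra criteria"
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 2)

# Made-up sub-scores (0-10 scale) for one hypothetical app:
example = {
    "accuracy": 8.0,
    "logging_speed": 6.0,
    "platform_integration": 7.0,
    "sync_reliability": 9.0,
    "polish_battery": 7.0,
}
print(weighted_score(example))  # 7.35
```

Note how the accuracy weight dominates: a one-point accuracy swing moves the final score twice as far as a one-point swing in sync reliability, which is the intended bias.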
What we don't do
- We don't run controlled calorimetry studies. We're not a lab. For the "wearables overestimate" piece we cite published research.
- We don't reverse-engineer apps. Our integration calls are through public APIs and user-facing settings.
- We don't benchmark vendor accuracy claims ourselves — we cite them, note the source, and link to the vendor's methodology page when available.
Corrections
We correct factual errors inline with a dated editorial note. If we've changed a score materially, we'll note it in the article's "What changed" section. Corrections: editors@wearablesnutrition.com.