The Code Review System

Just like human-generated code, AI-written code needs review. We don’t blindly accept output. The challenge is making review systematic rather than ad hoc: catching issues through structure, not luck.

We built a review-runner: an automated pipeline of 68 review prompts organised into 12 phases. Each prompt examines a specific slice of the codebase and generates a report with prioritised issues. A batch runner script executes them in sequence, with a two-pass approach: analyse first, then fix.
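To make the sequencing concrete, here is a minimal sketch of what the runner’s outer loop could look like. The directory layout, names, and the run_task placeholder are our illustration, not the actual script.

    #!/usr/bin/env bash
    # Sketch of the batch runner's outer loop (illustrative names only).
    set -euo pipefail

    PROMPTS_DIR="prompts"   # hypothetical layout: prompts/<phase>/<task>.md

    run_task() {
      # Two passes per task: analyse, then fix (filled in under "How It Works").
      echo "would run: $1"
    }

    # Phases run in directory order; prompts run in sequence within each phase.
    for phase in "$PROMPTS_DIR"/*/; do
      for prompt in "$phase"*.md; do
        run_task "$prompt"
      done
    done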

What the Phases Cover

There are three groups:

  • Correctness comes first

    Dead code (nine prompts scanning every directory for unused functions, stale TODOs, rogue print() statements). Data model integrity (schema, versioning, relationship definitions). Security (input validation, predicate safety, logging privacy, secrets management, privacy manifest). And the largest by some distance: concurrency, which gets twelve prompts to itself because concurrency bugs are subtle, hard to reproduce, and tend to surface in production rather than in testing.

  • Then quality

    Performance reviews across startup, data queries, reports, and every view layer. View audits against SwiftUI best practices and our design system. Design token consistency across platforms. The workout UI gets its own phase because those views are the most performance-sensitive in the app.

  • Then readiness

    Accessibility reviews across every screen group: Dynamic Type, contrast, VoiceOver labels, focus management. This phase is the one we’re most glad we built. And App Store readiness: entitlements, build settings, metadata, production configuration.

How It Works

The review-runner is a bash script that orchestrates everything. Each task runs in two passes. The first pass is analysis: Claude Code reads the relevant files and writes a report listing the issues found, prioritised as critical, high, medium, or low. The second pass is the fix: a fresh Claude Code session reads that report, applies the fixes, then runs the build and tests to verify nothing broke.
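Filling in the run_task placeholder from the sketch above: pass one asks a non-interactive Claude Code session to write the report, and pass two hands that report to a fresh session. The exact claude invocations, report paths, and the build step (a placeholder scheme name here) are stand-ins; the real script’s commands may differ.

    run_task() {
      local prompt_file="$1"
      local report="reports/$(basename "$prompt_file" .md).report.md"
      mkdir -p reports

      # Pass 1 (analysis): read the relevant files, write a prioritised report.
      claude -p "$(cat "$prompt_file")" > "$report"

      # Pass 2 (fix): a fresh session applies the fixes from that report.
      claude -p "Read $report and apply the fixes it lists, critical first."

      # Verify nothing broke (placeholder scheme name, not the real project's).
      xcodebuild test -scheme FoxFit
    }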

Tasks track their status (done, failed, running). We can resume from any point, re-run individual tasks, or filter by phase. A single git commit happens at the end when everything passes.
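Folding that bookkeeping into the outer loop, one hedged way to do it: a tab-separated status file and a PHASE environment variable for filtering, both our invention rather than the script’s actual mechanism.

    STATUS_FILE="review-status.tsv"   # hypothetical format: <task><TAB><status>

    mark()    { printf '%s\t%s\n' "$1" "$2" >> "$STATUS_FILE"; }
    is_done() { grep -qxF "$(printf '%s\tdone' "$1")" "$STATUS_FILE" 2>/dev/null; }

    # Resume support: skip tasks already marked done; filter by phase if set.
    for prompt in "$PROMPTS_DIR"/${PHASE:-*}/*.md; do
      is_done "$prompt" && continue
      mark "$prompt" running
      if run_task "$prompt"; then mark "$prompt" done; else mark "$prompt" failed; fi
    done

    # One commit at the end, once everything has passed.
    git add -A && git commit -m "Apply review-runner fixes"

A styling-only run like the one described below would then be something like PHASE=design-tokens ./review-runner.sh, with the phase name again illustrative.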

Why Not Just One Big Review?

Focused prompts produce better results. A prompt specifically about concurrency in HealthKitManager catches things a general “review the code” prompt misses. The AI goes deeper when the scope is narrower.
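For a flavour of what “focused” means in practice, here is a hypothetical prompt in that spirit; the path and wording are ours, not taken from the actual library.

    # Hypothetical focused prompt; our wording, not the real library's.
    cat > prompts/concurrency/healthkit-manager.md <<'EOF'
    Review HealthKitManager for concurrency issues only. Check:
    - actor isolation and @MainActor annotations on UI-facing code paths
    - completion handlers that may fire on background queues
    - shared mutable state reachable from more than one task
    Report each issue with file, line, severity (critical, high, medium, low),
    and a suggested fix. Ignore style, naming, and anything non-concurrency.
    EOF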

It also makes review manageable. We can run the design token checks independently when we’ve only changed styling. We don’t have to review the entire codebase for a UI tweak.

The system evolves too: when we find a new category of bug, we add a prompt to catch it next time.