The first platform where developers test their vibe coding skills.

Solve real coding challenges on your own machine with your own AI agent: Claude Code, Cursor, or Codex. Kodwai scores how well you direct the agent, not what you memorized, and ranks you on a public leaderboard.

Start a challenge→

fully free · bring your own agent · claude code, cursor or codex

// live demo

browse · solve with your agent · submit · score

//01 the premise

LeetCode doesn't prove much anymore.

The work changed. You point an agent at a real problem, catch it when it is confidently wrong, and check what it actually shipped. That judgment is the skill, and it is the thing kodwai scores.

// three reasons the old test fails

01 the format is stale
Built for a different era
Whiteboard puzzles and LeetCode grinds were designed for engineers working alone with nothing but an editor. Point an agent at one and it clears the puzzle in seconds. You learn nothing about the engineer.
02 green is not the same as good
Passing tests proves little
A single careless prompt can make the suite go green and still show no judgment at all. No verification, no decomposition, no recovery when the agent goes confidently wrong. The checkmark hides everything that matters.
03 the real work is unmeasured
Nothing measures how you work
You spend your day directing an agent: writing the spec, catching hallucinations, checking what actually shipped. That is the skill that decides who is good now, and until kodwai, nothing put a number on it.

//02 how it works

From pick to scored, in five steps.

No sandbox, nothing to install that fights you. You work on your own machine with your own agent, and Kodwai scores the whole session.

01
Pick a challenge
Browse real, ticket-sized problems across the categories you ship in. Filter by difficulty and pick one that looks like the work you actually do.
02
Run the CLI
Start it from your terminal and choose your agent. We download PROBLEM.md, starter files and tests, init a git repo, and start the timer.
$ npx @kodwai/cli challenge <slug>
Claude Code · Cursor · Codex
03
Solve on your machine
Work the problem with your own agent in your own editor. No sandbox to fight, no artificial constraints, just how you really build.
04
Submit
One command packages your code, git history, test runs, agent transcript, and the time you took, then ships it for scoring.
$ npx @kodwai/cli submit
05
Get your score
Direction, Outcome, and Lift land with per-signal evidence, so you can see why each axis scored the way it did. Then you are on the leaderboard.

//step 02 · run the cli

One command pulls the problem, sets your agent, inits git, and starts the clock. Your agent opens right where you left off. Then you build it your way.

and you are scored: direction 48 · outcome 33 · lift 13

//03 the aha

See exactly how you vibe code.

Same challenge, two developers. A careless one-shot prompt can pass the tests. It still scores low, because passing tests is not skill. kodwai reads the whole session, so the score rewards how you drive.

Welcome to Claude Code (v2.1.34)

cwd: ~/kodwai/rate-limiter · model: claude-opus-4-7

session · one-shot prompt

Careless one-shot

algorithm: rate limiter · hard · 45 min

build a sliding-window rate limiter, make the tests pass

Done. Added RateLimiter with a deque per key.

12 passed

ship it

no verification · no edge probing · 1 turn

58 / 100

low score · tests green, judgment absent

Direction

21 / 50

Outcome

33 / 35

Lift

4 / 15

Tests are green, but no steering, no verification, no recovery. Direction collapses the total.

Welcome to Claude Code (v2.1.34)

cwd: ~/kodwai/rate-limiter · model: claude-opus-4-7

session · driven session

Engineer who drives

algorithm: rate limiter · hard · 45 min

spec first: per-key window, monotonic clock, no memory leak

Plan: window store, eviction, concurrency guard.

9 passed, 3 failing

the burst test races. add a per-key lock, prove it.

pytest -k concurrency → 3 passed

verified · race fixed · 4 commits

92 / 100

high score · steered, checked, hardened

Direction

47 / 50

Outcome

33 / 35

Lift

12 / 15

Tests green and the agent was steered, verified, and hardened. Direction carries the score.

kodwai reads the whole session: the prompts, the recovery, the test runs, the commits. The score is dominated by Direction, the part a one-shot cannot fake.
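The driven session above asks for a per-key window, a monotonic clock, and no memory leak. As a rough illustration of what such a spec could lead to, here is a minimal sketch. It is not the challenge's reference solution: the class name `SlidingWindowLimiter`, its parameters, and the `sweep` method are all hypothetical, and it uses one global lock for brevity where the session asks for a per-key lock.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock              # monotonic, so wall-clock jumps cannot reopen a window
        self._hits = {}                 # key -> deque of accepted timestamps, oldest first
        self._lock = threading.Lock()   # single global lock (simplification, not per key)

    def _prune(self, hits, now):
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()

    def allow(self, key):
        with self._lock:
            now = self.clock()
            hits = self._hits.setdefault(key, deque())
            self._prune(hits, now)
            if len(hits) >= self.limit:
                return False
            hits.append(now)
            return True

    def sweep(self):
        """Forget keys with no live timestamps so idle keys do not accumulate."""
        with self._lock:
            now = self.clock()
            for key in list(self._hits):
                self._prune(self._hits[key], now)
                if not self._hits[key]:
                    del self._hits[key]
            return len(self._hits)


# Usage with an injected fake clock, so the behavior is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window=10, clock=lambda: fake_now[0])
results = [limiter.allow("alice") for _ in range(3)]   # third call exceeds the limit
fake_now[0] = 10.0
after_window = limiter.allow("alice")                  # the earlier hits have expired
```

Injecting the clock is a design choice that makes the window testable without sleeping, which is the kind of verification the driven session pushes for.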

//04 the challenges

Problems worth shipping.

15 live challenges across 10 categories and three difficulties. Each one is scoped like a real ticket, not a riddle. Pick the track that looks like the work you actually do.

easy · backend

Bookshelf REST API

Junior Backend Engineer interview. Build a small REST API from scratch with CRUD, filters, validation, persist...

~60 min

hard · backend

Multi-Currency Wallet Ledger with Idempotent Transfers

Senior Backend Engineer interview. Build the core double-entry ledger for a multi-currency wallet: atomic tran...

~120 min

hard · backend

Process / Task Orchestrator-Lite

Platform / Infra interview. Build a task orchestrator that runs a DAG with dependency ordering, a global concu...

~120 min

//05 the score

What the score actually measures.

A one-shot “solve this” prompt clears the tests, so passing tests is not enough. The score is dominated by how you direct the agent, the part a careless prompt cannot fake.

91 / 100

sample run · rate limiter

direction · 45 / 50
outcome · 34 / 35
lift · 12 / 15

Direction · 50 pts

how you steer, verify, and decompose

Intent fidelity · 96
Verification rigor · 92
Spec precision · 89
Decomposition · 86
Recovery · 84
Engagement · 90

Outcome · 35 pts

what actually shipped, and whether it holds

Tests passed · 100
Code quality · 93
Complexity · 88

Lift · 15 pts

the edges a one-shot prompt misses

Edge-case coverage · 82

evidence · one signal

Verification rigor

axis · direction · +6 pts

you → transcript · turn 14

“before we move on, write a test that fires 1k concurrent requests and assert no tokens leak past the window”

Why it scored. You forced the agent to prove the concurrency claim instead of trusting it. Cited from turn 14, 41s before the first commit.

scored 0 to 100 · direction 50 · outcome 35 · lift 15 · every signal cites its evidence
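The axis caps above (Direction 50, Outcome 35, Lift 15) add up to 100, and the sample numbers on this page are consistent with the total being a plain sum of the three axes. The sketch below works under that assumption only. The names `AXIS_CAPS` and `total_score` are hypothetical, and how each axis is derived from its individual signals is not specified here.

```python
AXIS_CAPS = {"direction": 50, "outcome": 35, "lift": 15}  # caps as stated on this page


def total_score(direction, outcome, lift):
    """Sum the three axis scores, assuming the total is a plain sum (an assumption)."""
    scores = {"direction": direction, "outcome": outcome, "lift": lift}
    for axis, value in scores.items():
        if not 0 <= value <= AXIS_CAPS[axis]:
            raise ValueError(f"{axis} must be between 0 and {AXIS_CAPS[axis]}, got {value}")
    return sum(scores.values())


careless = total_score(direction=21, outcome=33, lift=4)   # the one-shot session
driven = total_score(direction=47, outcome=33, lift=12)    # the driven session
gap = driven - careless
```

Under this assumption the two sessions end 34 points apart with identical Outcome scores, and 26 of those points come from Direction alone.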

//06 climb

Rank, earn, and prove it.

Every scored run moves you up the global leaderboard and builds a public profile you can send to anyone.

global leaderboard · difficulty-weighted

01 · Jamie Brooks · @jamie · CLAUDE CODE · 96 / 100
02 · Sarah Chen · @schen · CLAUDE CODE · 94 / 100
03 · Kenji Tanaka · @ktanaka · CURSOR · 93 / 100
04 · Alex Mendez · @amendez · CODEX · 91 / 100
05 · Priya Rao · @priyar · CLAUDE CODE · 90 / 100

your spot: unranked · run one to claim it

public profile

Jamie Brooks · @jamie · claude-code · RANK 1

96 · Direction 48 / 50 · Outcome 34 / 35 · Lift 14 / 15

Badges that stack up.

shareable to x & linkedin

First Blood

Five Down

Ten Strong

Quarter Century

On Fire

Week Warrior

Monthly Machine

Top 10%

Speed Demon

Perfectionist

Polyglot

Claude Master

Cursor Pro

Early Adopter

Milestones, streaks, skill and agent badges land automatically as you submit. Your profile at kodwai.com/developers/you shows your score, your rank, the badges you have earned, and the agents you drive. Built to send to anyone, including a hiring manager instead of a take-home.

//07 the proof

Numbers we are happy to stand behind.

No vanity metrics. Just what the platform is, what it costs you, and how honestly it measures the way you actually work.

the category · 1st

platform built to score how you drive AI agents, not what you memorized.

the setup · 100%

local. Your machine, your agent, your editor. No sandbox to fight, no fake constraints.

the price · $0

to play. Fully free, bring your own agent, keep your machine.

the score · 3 axes

Direction, Outcome, and Lift, every signal citing its own evidence.

measured, not marketed · every signal cites its own evidence

//09 questions

Frequently asked questions.

Everything worth knowing before your first run. Still curious? The answer is one message away.

What is vibe coding, and how do you score it? // scoring

Vibe coding is building real software by directing an AI agent instead of typing every line yourself. Kodwai scores the session across three axes: Direction (how you steer, verify, and decompose), Outcome (what actually shipped and whether it passes), and Lift (the edge cases a one-shot prompt misses). Every signal cites its own evidence from your transcript, commits, and test runs.

Which agents and languages are supported? // agents · langs

Bring your own agent. Claude Code and Cursor are first-class, and anything you run in your terminal works, including Codex CLI, Aider, Cline, and more. Challenges span every mainstream category and most mainstream languages, since you solve on your machine with your own setup.

Do I solve challenges locally or in a sandbox? // local

Locally, always. The CLI downloads the problem, starter files, and tests, inits a git repo, and starts the timer. You work in your own editor with your own agent. There is no browser sandbox to fight and no artificial constraints.

Is it really free? // pricing

Yes. Solving challenges, your score, your profile, and the leaderboard are free for developers. The hiring track is the paid product, for teams running interviews.

How can a score be fair if a one-shot prompt passes the tests? // fairness

Passing tests is necessary but not sufficient. The score is dominated by Direction, the part a careless prompt cannot fake. A solution that clears tests with no steering, no verification, and no decomposition scores poorly on the axis that matters most.

What does the public profile show? // profile

Your score, your rank, the badges you have earned, and the agents you drive, at kodwai.com/developers/you. It is built to send to anyone, including a hiring manager instead of a take-home.

Question that is not here? hakan@kodwai.com

//10 · begin

Stop grinding puzzles.
Prove how you build.

Fully free. Your own agent, your own machine, your own editor. You pick your path on the way in.

Start a challenge→
