We lived through the 3 AM pages, the cascading failures, and the post-mortems where everyone asked "why didn't anyone know how to handle this?" So we built the training tool we wish we had.
Built by RoboticForce, Inc.
YouBrokeProd was born from a simple frustration: incident response training was terrible. Pages of runbooks nobody read. Tabletop exercises that felt nothing like the real thing. On-call engineers going into their first real production incident with zero hands-on experience.
The best way to get good at incident response is to handle incidents. But that means your production users pay the price while you learn. That's backwards.
We built realistic incident simulations based on real post-mortems from across the industry - the actual failure modes that take down real systems. Not toy examples. Not contrived puzzles. The exact categories of problems that will wake you up at 3 AM.
Now you can build the muscle memory, learn the debugging patterns, and develop the calm confidence that separates good on-call engineers from great ones - without any users getting hurt.
YouBrokeProd is built by RoboticForce, Inc. - a real company, not a side project. We build developer tools focused on making engineering teams more effective.
Our team has been on the other side of production incidents at companies of all sizes. We know the difference between engineers who freeze when the pager goes off and engineers who calmly work the problem. That difference comes down to practice.
Our users include SREs, DevOps engineers, platform teams, and technical founders who want to understand what happens when their infrastructure breaks. If you're responsible for keeping something running in production, this is for you.
Our mission: make every engineer and technical founder confident handling production incidents, so the next real one is just another problem to solve, not a panic spiral.
Pick an incident type and difficulty. You get realistic symptoms, logs, metrics, and access to the same debugging tools you use in real life.
Use the terminal to run commands, check logs, and analyze metrics. Race against the clock to find the root cause before the situation gets worse.
Apply the correct fix, earn points and reputation, then read the post-mortem. Every scenario has a detailed explanation of the real incident it was based on.
Join engineers and founders who are building real incident response skills through realistic practice.