A great engineering technical screen is 45–60 minutes long, uses a real-engineering problem (not a memorized algorithm), allows the candidate's normal tools (including AI assistants if your team uses them on the job), is graded against a written rubric with at least four dimensions, and is calibrated against a pass rate of 35–50%. If your screen doesn't hit those marks, you're either filtering out strong candidates, advancing weak ones, or both.
Most companies have a technical screen that nobody designed. It was written by an engineer in 2019, copy-pasted to a Google Doc, used 400 times, and never re-evaluated. Twice a year, someone on the hiring team notices it's producing weird signal — great candidates failing, mediocre candidates passing — and someone else says "we should redesign it" and nobody does. Meanwhile, the screen is the highest-leverage stage in your hiring pipeline: every candidate who passes it costs your team 6–10 interviewer hours downstream, and every false negative is a senior engineer who joined a competitor instead.
This is the practical guide we wish every hiring manager and TA leader had. It's the result of cross-referencing how technical screens are run at companies with strong engineering cultures — including AI labs, infrastructure scale-ups, and developer-tools companies — and the patterns that consistently produce better hires across roles and seniority levels. If you're rebuilding a screen this quarter, or auditing one that has drifted, start here.
The two costs you're balancing
Every technical screen sits on a trade-off. On one side: false positives — candidates who pass the screen and burn 6–10 hours of interviewer time at the onsite without a real chance of an offer. On the other side: false negatives — candidates who would have been great hires but failed the screen because of an arbitrary trigger (memorized algorithm not seen, anxiety on a specific format, language they didn't know well).
Cheap screens optimize for one of these costs and ignore the other. A 30-minute LeetCode-style filter minimizes false positives but creates massive false negatives. A friendly conversational screen minimizes false negatives but creates massive false positives. The right answer is a screen that's deliberately tuned to a target pass rate — usually 35–50% — and is graded against a rubric strict enough to discriminate but generous enough to surface non-obvious strong signals.
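The false-positive side of this trade-off is easy to put numbers on. The snippet below is a rough back-of-the-envelope sketch, not a model of any real pipeline: `onsite_hours` is a hypothetical helper, and the 100 screened candidates and 8 hours per onsite are illustrative assumptions (8 sits inside the 6–10 hour range above).

```python
def onsite_hours(candidates_screened: int, pass_rate: float, hours_per_onsite: float) -> float:
    """Interviewer hours spent downstream on candidates who pass the screen."""
    return candidates_screened * pass_rate * hours_per_onsite


# Hypothetical pipeline: 100 screened candidates, 8 interviewer hours per onsite.
for rate in (0.35, 0.50, 0.80):
    hours = onsite_hours(100, rate, 8)
    print(f"{rate:.0%} pass rate: about {hours:.0f} onsite interviewer hours")
```

The false-negative side has no equally simple formula, which is part of why it tends to get ignored when a screen is tuned by feel.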
The five things to actually measure
The biggest design error in technical screens is measuring the wrong things or only one of the right things. Below are the five dimensions a strong screen tests. A 60-minute exercise can hit 3–4 of these. Anything testing only one (typically "do they know the algorithm") is broken by design.
- Problem decomposition. Can the candidate take a vague problem and break it into solvable pieces? Do they ask clarifying questions before coding, or do they jump straight to a solution? This is the signal that correlates most strongly with on-the-job performance, and also the easiest one for candidates to game: interview-prep books tell candidates to "ask clarifying questions," which produces performative questions that don't actually clarify anything.
- Code fluency in their preferred language. Can they write a non-trivial function without struggling against the syntax? You're not testing whether they know every standard-library method — you're testing whether they can think in code without the tool getting in the way. Always let candidates use the language they're most comfortable in.
- Reasoning under pressure. When their first approach doesn't work, can they recover? Do they get stuck and stay stuck, or do they back up, restate the problem, and try something else? This is the failure-mode test, and it's where strong vs mediocre candidates separate most sharply.
- Communication while building. Can they think out loud in a way you can follow? Can they explain a tradeoff in plain language? Can they take feedback from the interviewer mid-solution without defensiveness? This is the team-fit signal nobody calls a team-fit signal.
- Taste. When the candidate writes the second version of their solution, what do they choose to clean up? Naming, factoring, edge cases, performance? Their taste is what they bring to every code review and every design doc for the next decade. It's not the easiest thing to measure in 60 minutes, but the cleanup pass — "now if you had another half hour, what would you improve" — surfaces it cleanly.
A well-designed screen tests at least three of these five. A great screen tests four. The fifth one (taste) is the most underused; adding a "what would you do with another 30 minutes" prompt at the end of every screen is one of the highest-ROI changes most companies can make this quarter.
What a strong screen looks like — structure and time budget
Here's the canonical structure. It works for backend, full-stack, and infrastructure roles with minor variations. Frontend roles need a slightly different shape (heavier on UI reasoning, lighter on data-structure work). ML/AI engineer roles need their own variant covered separately.
60-minute screen, time budget
| Time | What happens |
|---|---|
| 0–5 min | Intros, team context, framing the exercise. The candidate should know exactly what's being tested and what the time budget is. |
| 5–45 min | Core exercise. Real-engineering problem with a 5-line obvious solution and a 30-line good solution. Candidate thinks out loud while building. |
| 45–55 min | Extension or refinement. "What if the data was 1000x bigger?" or "How would you test this?" or "What would you improve with another 30 minutes?" |
| 55–60 min | Candidate questions. Stop on time. Running over signals disrespect for the candidate's calendar and biases against strong candidates who book back-to-back interviews. |
Good questions vs bad questions
The single biggest lever in technical screen quality is the question. A great question makes a strong rubric easy to apply; a bad question makes every screen feel like a coin flip. Here's how to tell the difference.
| | Good questions | Bad questions |
|---|---|---|
| Source | Pulled from a real problem your team faced (sanitized). | Pulled from a LeetCode problem set. |
| Solution space | Multiple reasonable approaches; tradeoffs to discuss. | Single correct approach; you either know it or don't. |
| Ambiguity | Some — candidate has to ask clarifying questions to scope. | None — problem statement reads like a spec. |
| Difficulty curve | Trivial first version, harder optimizations and extensions. | One hard step; you get it or you fail. |
| Language friction | Solvable in any reasonable language. | Optimized for a specific language or library. |
| Calibration | 35–50% pass rate across at least 30 calibration screens. | Never calibrated; pass rate unknown. |
| Length | Fits in 30–40 minutes for a strong candidate. | Can't be finished in the time slot, ever. |
| Discriminates | Separates strong from average within the same level. | Most candidates either ace it or zero it. |
The single most useful question type for screens in 2026 is the "build a small system in 45 minutes" exercise: parse a structured input, transform it, output a structured result. Easy first version is a function. Harder version handles edge cases, errors, and structure. Even harder version refactors for testability or performance. Every dimension you want to test — decomposition, code fluency, reasoning, communication, taste — surfaces naturally in this format.
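To make the difficulty curve concrete, here is one possible shape for such an exercise. Everything in it is a hypothetical illustration rather than a recommended question: the log-line format (`DATE LEVEL MESSAGE`) and both function names are assumptions made for this sketch.

```python
from collections import Counter


def count_by_level(lines: list[str]) -> dict[str, int]:
    """Easy first version: assumes every line is 'DATE LEVEL MESSAGE'.

    Raises ValueError on any line with fewer than three fields.
    """
    counts: Counter = Counter()
    for line in lines:
        _date, level, _message = line.split(" ", 2)
        counts[level] += 1
    return dict(counts)


def count_by_level_strict(lines: list[str]) -> tuple[dict[str, int], list[int]]:
    """Harder version: normalizes level case and reports malformed lines.

    Returns the counts plus the 1-based line numbers that could not be parsed.
    """
    counts: Counter = Counter()
    malformed: list[int] = []
    for number, line in enumerate(lines, start=1):
        parts = line.strip().split(" ", 2)
        if len(parts) < 3:
            malformed.append(number)
            continue
        counts[parts[1].upper()] += 1
    return dict(counts), malformed
```

The gap between the two versions is where the signal lives: whether the candidate asks what a malformed line should do, and whether they choose to skip, report, or fail on it.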
The rubric: four dimensions, three levels each
A screen without a written rubric is a vibes test. Vibes tests favor candidates who look like the interviewer, pattern-match to the team, and present well in 60 minutes, not the ones who'll do the best work. Below is the template most teams converge on after a few quarters of iterating. Customize the dimensions to your role but keep the structure.
Technical Screen Rubric Template
| Dimension | Score 2 | Score 1 |
|---|---|---|
| Problem decomposition | Asked one or two reasonable clarifications; jumped into code without an explicit plan but adjusted as edge cases emerged. | Coded immediately; missed major scope; didn't ask anything until stuck. |
| Code fluency | Working code but messy structure, awkward naming, or one obvious bug. Functional but would need iteration. | Did not produce working code in the time budget, or code had multiple correctness issues they didn't notice. |
| Reasoning under pressure | Got stuck once; recovered with a small nudge from the interviewer; communicated what was confusing. | Got stuck and stayed stuck; did not communicate confusion or shift approach. |
| Communication | Thought out loud some of the time; responded to feedback but slightly defensively; explanations sometimes muddled. | Coded silently; did not respond well to mid-solution feedback; explanations were either too brief or evasive. |
Two notes on using this. First: the interviewer should fill the rubric out within 30 minutes of the screen ending, not "later that day." Memory of specific moments fades fast; the rubric should capture what actually happened, not a generalized impression. Second: the rubric is the artifact you bring to the debrief, not your gut feeling. If your gut and the rubric disagree, the rubric is usually right — gut feelings encode bias the rubric doesn't.
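One way to keep the rubric from collapsing back into a gut call is to record it as structured data. The sketch below is a minimal illustration under stated assumptions: the dimension keys mirror the four rubric dimensions, each is scored 1–3, and the default cutoff of 9 out of 12 is a placeholder that a team would tune through calibration, not a recommended bar.

```python
DIMENSIONS = ("decomposition", "code_fluency", "reasoning", "communication")


def score_screen(scores: dict[str, int], cutoff: int = 9) -> tuple[int, bool]:
    """Validate a filled-in rubric and return (total, passed).

    `cutoff` is a hypothetical default; set it from your own calibration data.
    """
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    for name in DIMENSIONS:
        if scores[name] not in (1, 2, 3):
            raise ValueError(f"{name} must be scored 1, 2, or 3")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, total >= cutoff
```

Rejecting a rubric with an unscored dimension is deliberate: it forces the interviewer to commit to a score on every dimension rather than only the ones they remember clearly.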
The AI-era updates almost nobody has made yet
In 2026, the typical engineering job involves Claude Code, Cursor, Copilot, or some other AI assistant for 30–70% of code-writing time. The typical technical screen still bans AI tools and tests pure unaided coding. This mismatch is the single biggest gap in technical hiring right now — companies are filtering for a skill that doesn't reflect the job.
The honest move is to redesign the screen for an AI-assisted world. Two changes do most of the work.
1. Decide your AI policy explicitly, then publish it.
Either you allow AI tools or you ban them — but make the choice explicit and communicate it to the candidate before the interview. Both choices are defensible. Banning AI works if you specifically want to test pure problem-decomposition fluency, which still matters for staff-level architecture roles. Allowing AI works if you want to test how the candidate uses the tool: how they prompt it, evaluate the output, debug what it produces, and own the result. The undecided middle — where the candidate doesn't know if AI is OK and the interviewer hasn't decided either — is the worst possible state.
2. If you allow AI, change what you're grading.
You're no longer testing "can they write this code from scratch." You're testing "can they direct the tool effectively." Strong AI-assisted candidates show these signals:
- They read the AI output critically before pasting it. They catch hallucinations and ask the AI to fix them with specific direction.
- They prompt iteratively — not "build me a CSV parser" but "build me a CSV parser that handles RFC 4180 escaping, with a clear error type for malformed rows."
- They take ownership of the final code. They can explain every line. They wouldn't ship code they don't understand.
- They use the AI to extend their capability, not to replace it. The hardest part — the system design or the architecture call — is still theirs.
A candidate who pastes a Claude completion verbatim without understanding it is showing you the failure mode of the AI-assisted era. A candidate who steers Cursor through a 60-minute exercise, catches a subtle bug it introduced, and produces clean shippable code is showing you the version of the engineer your team will hire most of going forward.
Three failure modes worth naming
The "we'll know it when we see it" screen
No rubric, no calibration, vibes only. Common at small startups and at large companies whose hiring bar has drifted slowly over a decade. The fix is straightforward but unpopular: write the rubric, calibrate against your last 30 screens (look at hired engineers' actual performance vs their screen score), and tune the cutoffs. Most teams discover that 20–30% of their recent hires would have failed the rubric they're about to write; this is uncomfortable but it's the data telling you the truth about your bar.
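Tuning the cutoff can be as simple as checking which thresholds land in the target pass-rate band. The following is a rough sketch: it assumes rubric totals on a 4–12 scale (four dimensions scored 1–3), and the sample totals are made-up numbers, shortened to ten screens for readability rather than the 30 suggested above.

```python
def pass_rate(totals: list[int], cutoff: int) -> float:
    """Fraction of past screens that would pass at this cutoff."""
    return sum(total >= cutoff for total in totals) / len(totals)


def cutoffs_in_band(totals: list[int], low: float = 0.35, high: float = 0.50) -> dict[int, float]:
    """Map each cutoff whose pass rate falls inside [low, high] to that rate."""
    rates = {cutoff: pass_rate(totals, cutoff) for cutoff in range(4, 13)}
    return {cutoff: rate for cutoff, rate in rates.items() if low <= rate <= high}


# Hypothetical rubric totals from ten past screens.
past_totals = [5, 6, 7, 7, 8, 8, 9, 9, 10, 11]
print(cutoffs_in_band(past_totals))
```

If no cutoff lands in the band, that usually points at the question rather than the threshold: the exercise is too easy or too hard for the candidate pool.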
The "interview-prep grinder" screen
Common at companies that use stock LeetCode-medium questions. The screen rewards candidates who've memorized 400 puzzles and penalizes candidates who haven't, regardless of underlying engineering ability. You can spot the failure pattern in your data: a year after hire, your top performers don't correlate with screen scores, and your weakest performers were screen rock stars. The fix is to switch to real-engineering questions sourced from your actual problems — expect a quarter of recalibration but you'll end with a screen that produces hires whose on-the-job performance matches their interview signal.
The "make the candidate dance" screen
Long, hostile, designed to find what the candidate doesn't know. Common at companies whose technical interviewers were burned once and now over-correct in the other direction. This screen has a hidden cost most TA teams don't measure: it kills offer acceptance. Top candidates report the experience to their network, and your funnel shrinks at the source. The fix is to treat the screen as a two-way evaluation — the candidate should leave the call thinking your team is sharp, fair, and worth joining, regardless of the outcome.
Operationalizing the screen across the team
Designing a great screen is the easy half. Getting consistent execution across a team of interviewers is the part that takes real work. Three things every well-run hiring team puts in place:
- Calibration sessions every quarter. Pick three recent screens. Have the original interviewer present them to the rest of the team. Score independently, then discuss the deltas. The discussions surface where interviewers' rubrics drift — usually around "communication" and "taste" — and re-anchor the bar.
- Shadowing for new interviewers. Every new interviewer shadows three screens, then reverse-shadows three more (they run the screen, the senior interviewer observes). No one runs solo screens before completing this rotation. It's annoying scheduling-wise; the alternative is a year of inconsistent signal.
- Quarterly screen audit. Look at the screen-pass rate, the onsite-pass rate of screen-passers, and the first-year performance of screen-passers who joined. The relationships between these three numbers tell you whether your screen is doing its job. If screen-pass and onsite-pass are loosely correlated but neither correlates with first-year performance, your whole funnel has a problem upstream of the screen design.
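The three audit numbers can be computed from a simple export of candidate records. This is a sketch under assumptions: the `Candidate` fields are hypothetical names, first-year performance is assumed to be available as a numeric rating, and Pearson correlation is one reasonable choice of measure rather than the only one. With the handful of hires a single quarter produces, treat the correlation as a prompt for discussion, not a statistic.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Candidate:
    screen_score: int
    passed_screen: bool
    passed_onsite: Optional[bool] = None       # None: never reached the onsite
    first_year_rating: Optional[float] = None  # None: not hired, or no rating yet


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; needs at least two points and non-zero variance in both."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def audit(candidates: list[Candidate]) -> dict[str, float]:
    """Screen-pass rate, onsite-pass rate of screen-passers, score vs first-year rating."""
    passers = [c for c in candidates if c.passed_screen]
    onsite = [c for c in passers if c.passed_onsite is not None]
    hires = [c for c in passers if c.first_year_rating is not None]
    return {
        "screen_pass_rate": len(passers) / len(candidates),
        "onsite_pass_rate": sum(c.passed_onsite for c in onsite) / len(onsite),
        "score_vs_first_year": pearson(
            [c.screen_score for c in hires],
            [c.first_year_rating for c in hires],
        ),
    }
```

Note that the correlation only covers people who were hired, so it says nothing about the candidates the screen rejected; false negatives stay invisible in this audit.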
The bottom line
A technical screen that produces 35–50% pass rates, uses a real-engineering question, runs in 45–60 minutes, allows the tools your team uses on the job, and is graded against a four-dimension rubric is dramatically better than the screen most companies have today. The redesign is straightforward; the calibration is harder; the cultural buy-in to actually use the rubric is the hardest part. But the ROI is genuine: every false negative you avoid is a senior engineer your team gains, and every false positive you eliminate is 6–10 hours of interviewer time your team gets back.
If you're auditing your screen this quarter and want to see what good looks like from the engineer side — the kind of culture, signal, and pace strong candidates expect — our guide to engineering candidate experience and our research on what engineers read first are companion pieces. Or post your roles on our culture-matched board to reach the engineers whose values map to how your team actually works.