About Mercor

Mercor's mission is to organize human intelligence to power the AI economy. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network trains frontier AI models in the same way teachers teach students: by sharing knowledge, experience, and context that can't be captured in code alone. Today, more than 30,000 experts in our network collectively earn over $3 million a day.

Mercor is creating a new category of work where expertise powers AI advancement. Achieving this requires an ambitious, fast-paced and deeply committed team. You’ll work alongside researchers, operators, and AI companies at the forefront of shaping the systems that are redefining society. Mercor is a profitable Series C company valued at $10 billion. We work in-person five days a week in our San Francisco, NYC, or London offices.

About the Role

Frontier AI companies are increasingly bottlenecked on expert judgment — capturing it reliably, validating it at scale, and turning it into durable model behavior. This role sits at the center of that problem.

You'll build the ML systems that power Mercor's Frontier Data Products: the infrastructure that scores, validates, and improves complex work products where correctness is rarely binary and labels are often noisy, delayed, or disputed. A single job can stay live for days, interleaving model inference, automated checks, expert review, disagreement resolution, and feedback loops. Your work determines how models reason over ambiguous inputs, when they should defer to humans, how quality is measured, and how feedback compounds into better systems over time.

This is applied ML product engineering under real production constraints — incomplete ground truth, shifting requirements, latency and cost tradeoffs, and workflows where a silent model failure corrupts the final output. It is not an offline benchmarks role.

What You'll Do

• Build ML systems that score, validate, and improve complex work products where correctness is nuanced and labels are imperfect.

• Design evaluation frameworks for ambiguous tasks where ground truth is partial, delayed, or disputed.

• Build feedback loops that turn review, disagreement, correction, and adjudication into measurable model and system improvements.

• Own production ML behavior end-to-end: precision/recall tradeoffs, regression detection, drift, latency, cost, and explainability.

• Improve model quality using the right tool for the job — prompting, fine-tuning, retrieval, active learning, heuristics, and error analysis.

• Partner with backend engineers to integrate inference into durable, long-running workflows without sacrificing debuggability or human oversight.

What Makes This Role Different

• The architecture is not set — early engineers will define how quality is measured, how models and humans interact, where automation is trusted, and how the system compounds over time.

• The feedback loop is short: shipping a model behavior change directly and visibly affects what customers receive.

• You're working on a strategically central product area at Mercor at a moment when frontier AI companies have no good solution to the problem you're solving.

Day-to-Day

• Moving fast on a young, high-ownership codebase where your decisions have long-term architectural weight.

• Operating across models, data, backend systems, and product surfaces — context switching is the default, not the exception.

• Debugging production ML failures in live, long-running workflows where silent errors matter.

• Working closely with backend engineers on a stack of Python, Temporal, Postgres, AWS, and LiteLLM.

• Balancing automation confidence with human review — knowing when to defer is as important as knowing when to ship.

What We're Looking For

• Track record of shipping ML systems that improved a real product, workflow, or business metric.

• Strong instincts for model quality, evaluation design, error analysis, and production failure modes.

• Comfort operating in ambiguous problem spaces where labels are imperfect and correctness evolves.

• Sound judgment about when to reach for prompting, fine-tuning, heuristics, retrieval, human review, or a simpler product constraint.

• Solid engineering fundamentals across the full ML stack — not just modeling.

• Familiarity with LLM applications, model-assisted workflows, evaluation frameworks, or human-in-the-loop ML is a strong plus.

You're likely someone who:

• Defaults to simple, inspectable ML systems that improve quickly and fail in understandable ways — not the most impressive architecture.

• Gets uncomfortable when a model ships without a clear evaluation story.

• Can hold ambiguity without paralysis and make reasonable bets with incomplete information.

• Cares about the real-world output of the system, not just the benchmark.

Benefits

• Bi-annual performance bonus structure.

• Generous equity grant vesting over 4 years.

• Up to $15K relocation bonus.

• $10K housing bonus (if you live within 0.5 miles of our office).

• $1.5K monthly stipend for meals.

• Free Equinox membership.

• $200 monthly laundry reimbursement.

• $200 monthly personal wellness reimbursement.

• Health, dental, and vision insurance.

Machine Learning Engineer, Frontier Data Products
