Machine Learning Engineer, Frontier Data Products

Mercor · San Francisco · Full-Time · Engineering · Posted 3w+ ago

What it’s like to work at Mercor

AI Hiring Platform · San Francisco

Employee Rating: 3.9
Work-Life Balance: 3.9
Open Roles: 55
Culture values: ship-fast, eng-driven, flat, equity

What employees love

  • Massive impact at an AI company growing at breakneck speed
  • Top-tier compensation with strong equity at $10B valuation

What could be better

  • Intense 9/9/6 work schedule — long hours are the norm
  • Work-life balance is a real sacrifice at this stage

About Mercor

Mercor's mission is to organize human intelligence to power the AI economy. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network trains frontier AI models in the same way teachers teach students: by sharing knowledge, experience, and context that can't be captured in code alone. Today, more than 30,000 experts in our network collectively earn over $3 million a day.

Mercor is creating a new category of work where expertise powers AI advancement. Achieving this requires an ambitious, fast-paced and deeply committed team. You’ll work alongside researchers, operators, and AI companies at the forefront of shaping the systems that are redefining society. Mercor is a profitable Series C company valued at $10 billion. We work in-person five days a week in our San Francisco, NYC, or London offices.

About the Role

Frontier AI companies are increasingly bottlenecked on expert judgment — capturing it reliably, validating it at scale, and turning it into durable model behavior. This role sits at the center of that problem.

You'll build the ML systems that power Mercor's Frontier Data Products: the infrastructure that scores, validates, and improves complex work products where correctness is rarely binary and labels are often noisy, delayed, or disputed. A single job can stay live for days, interleaving model inference, automated checks, expert review, disagreement resolution, and feedback loops. Your work determines how models reason over ambiguous inputs, when they should defer to humans, how quality is measured, and how feedback compounds into better systems over time.

This is applied ML product engineering under real production constraints — incomplete ground truth, shifting requirements, latency and cost tradeoffs, and workflows where a silent model failure corrupts the final output. It is not an offline benchmarks role.
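
To make the "when they should defer to humans" decision concrete, here is a minimal Python sketch of confidence-based routing. It is illustrative only: the `Scored` type, the `route` function, and the threshold values are hypothetical and are not taken from Mercor's systems.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    item_id: str
    confidence: float  # hypothetical: model's estimated probability the work product is acceptable


def route(item: Scored, accept_at: float = 0.9, reject_at: float = 0.1) -> str:
    """Return 'auto_accept', 'auto_reject', or 'human_review' (assumed thresholds)."""
    # Out-of-range or NaN scores go to a human, so a broken model does not fail silently.
    if not 0.0 <= item.confidence <= 1.0:
        return "human_review"
    if item.confidence >= accept_at:
        return "auto_accept"
    if item.confidence <= reject_at:
        return "auto_reject"
    # Everything in the uncertain middle band is deferred to expert review.
    return "human_review"
```

Widening the band between `reject_at` and `accept_at` sends more items to experts, which trades cost and latency for fewer silent errors.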

What You'll Do

• Build ML systems that score, validate, and improve complex work products where correctness is nuanced and labels are imperfect.

• Design evaluation frameworks for ambiguous tasks where ground truth is partial, delayed, or disputed.

• Build feedback loops that turn review, disagreement, correction, and adjudication into measurable model and system improvements.

• Own production ML behavior end-to-end: precision/recall tradeoffs, regression detection, drift, latency, cost, and explainability.

• Improve model quality using the right tool for the job — prompting, fine-tuning, retrieval, active learning, heuristics, and error analysis.

• Partner with backend engineers to integrate inference into durable, long-running workflows without sacrificing debuggability or human oversight.
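
As a small illustration of the precision/recall tradeoff mentioned in the list above, the following Python sketch computes both metrics at a chosen score threshold. The function name, the toy data, and the convention of returning 1.0 when a denominator is zero are assumptions made for this example, not a description of Mercor's evaluation code.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when items scoring >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Assumed convention: an empty denominator yields 1.0 rather than an error.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Toy data: lowering the threshold raises recall here.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [True, False, True, False]
strict = precision_recall(scores, labels, 0.5)
loose = precision_recall(scores, labels, 0.3)
```

With noisy or disputed labels, numbers like these are estimates of quality rather than ground truth, which is why the posting pairs them with regression and drift detection.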

What Makes This Role Different

• The architecture is not set — early engineers will define how quality is measured, how models and humans interact, where automation is trusted, and how the system compounds over time.

• The feedback loop is short: shipping a model behavior change directly and visibly affects what customers receive.

• You're working on a strategically central product area at Mercor at a moment when frontier AI companies have no good solution to the problem you're solving.

Day-to-Day

• Moving fast on a young, high-ownership codebase where your decisions have long-term architectural weight.

• Operating across models, data, backend systems, and product surfaces — context switching is the default, not the exception.

• Debugging production ML failures in live, long-running workflows where silent errors matter.

• Working closely with backend engineers on a stack of Python, Temporal, Postgres, AWS, and LiteLLM.

• Balancing automation confidence with human review — knowing when to defer is as important as knowing when to ship.

What We're Looking For

• Track record of shipping ML systems that improved a real product, workflow, or business metric.

• Strong instincts for model quality, evaluation design, error analysis, and production failure modes.

• Comfort operating in ambiguous problem spaces where labels are imperfect and correctness evolves.

• Sound judgment about when to reach for prompting, fine-tuning, heuristics, retrieval, human review, or a simpler product constraint.

• Solid engineering fundamentals across the full ML stack — not just modeling.

• Familiarity with LLM applications, model-assisted workflows, evaluation frameworks, or human-in-the-loop ML is a strong plus.

You're likely someone who:

• Defaults to simple, inspectable ML systems that improve quickly and fail in understandable ways — not the most impressive architecture.

• Gets uncomfortable when a model ships without a clear evaluation story.

• Can hold ambiguity without paralysis and make reasonable bets with incomplete information.

• Cares about the real-world output of the system, not just the benchmark.

Benefits

• Bi-annual performance bonus structure.

• Generous equity grant vested over 4 years.

• Up to $15K relocation bonus.

• $10K housing bonus (if you live within 0.5 miles of our office).

• $1.5K monthly stipend for meals.

• Free Equinox membership.

• $200 monthly laundry reimbursement.

• $200 monthly personal wellness reimbursement.

• Health, dental, and vision insurance.

Frequently Asked Questions

What is the work-life balance like at Mercor?
Mercor has a work-life balance score of 3.9/5 based on employee reviews. This is about average for the AI/tech industry.
What is Mercor’s culture like?
Mercor is characterized by these culture values: ship-fast, eng-driven, flat, equity. Based on employee reviews, the company has an overall rating of 3.9/5. Employees cite massive impact at an AI company growing at breakneck speed.
How many open roles does Mercor have?
Mercor currently has 55 open roles across departments including engineering, product, sales, and more. Roles are refreshed daily from their careers page.
Is this role remote-friendly?
This role is located in San Francisco. Check the job description above for specific location and remote work details.