Posted March 14, 2026
About Patronus AI
Patronus AI is a frontier lab developing simulation research and infrastructure to accelerate progress toward human-aligned AGI. We are on a mission to simulate all of the world’s intelligence.
We are the team behind some of the earliest and most influential research in AI evaluation, including FinanceBench, Lynx, SimpleSafetyTests, CopyrightCatcher, Humanity’s Last Exam, and more. We are former AI researchers and engineers from companies such as Meta AI, Amazon AGI, and Google. Our customers include foundation model labs and Fortune 500 enterprises like Adobe. We are backed by top-tier investors including Lightspeed Venture Partners, Notable Capital, Stanford University, Noam Brown, Gokul Rajaram, and more.
Responsibilities
We’re looking for strong engineers: builders who can design and implement complex RL environments end to end. You’ll work at the intersection of research, systems engineering, and product, collaborating with infrastructure and research teams to create environments that test and steer model behavior at scale.
You’ll be responsible for turning conceptual specifications into working environments with verifiable reward structures, automated verifiers, task generation pipelines, and reproducible simulations. These environments span from API- and web-based tasks to multi-agent simulations, structured reasoning challenges, and knowledge-work environments used by customers and researchers alike.
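The “verifiable reward structures” and “automated verifiers” described above can be illustrated with a minimal Gym-style environment sketch. All names here (`ArithmeticEnv`, `reset`, `step`, `_verify`) are illustrative assumptions for a toy task, not Patronus AI’s actual API:

```python
import random
from dataclasses import dataclass, field

@dataclass
class ArithmeticEnv:
    """Toy RL environment: the agent answers a generated arithmetic task.

    Illustrative sketch only -- real environments would wrap API calls,
    browser sessions, or multi-agent simulations.
    """
    seed: int = 0
    _rng: random.Random = field(init=False)
    _answer: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def reset(self) -> str:
        """Generate a new task and return its prompt (the observation)."""
        a, b = self._rng.randint(1, 99), self._rng.randint(1, 99)
        self._answer = a + b
        return f"What is {a} + {b}?"

    def step(self, action: str) -> tuple[float, bool]:
        """Score the agent's answer with an automated verifier.

        Returns (reward, done). The reward is verifiable: it depends only
        on the task's ground truth, so any trajectory can be re-scored
        offline without re-running the agent.
        """
        reward = 1.0 if self._verify(action) else 0.0
        return reward, True

    def _verify(self, action: str) -> bool:
        try:
            return int(action.strip()) == self._answer
        except ValueError:
            return False
```

Seeding the generator makes rollouts reproducible across distributed workers, which is one way “reward reproducibility” stays tractable at scale.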
In This Role, You Will
- Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments
- Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity
- Develop verifiers and reward models to automatically score trajectories and evaluate model reasoning
- Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry
- Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments
- Partner with research and customer teams to translate open-ended specifications into verifiable, testable systems
- Optimize environment performance, logging, and reward reproducibility across distributed setups
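One way to read “controlled complexity and stochasticity” from the list above is a seeded task generator that exposes difficulty as an explicit parameter. This is a hypothetical sketch (the function name and task schema are assumptions), where each task carries its own ground truth so a verifier can re-score any trajectory later:

```python
import random

def generate_tasks(n: int, difficulty: int, seed: int = 0) -> list[dict]:
    """Generate n tasks whose complexity scales with `difficulty`.

    Seeding on (seed, difficulty) makes the pipeline reproducible:
    the same arguments always yield the same task set.
    """
    rng = random.Random(f"{seed}-{difficulty}")
    tasks = []
    for i in range(n):
        # Difficulty controls how many operands the expression has.
        operands = [rng.randint(1, 9) for _ in range(difficulty + 1)]
        prompt = " + ".join(map(str, operands))
        tasks.append({"id": i,
                      "prompt": f"Compute {prompt}",
                      "answer": sum(operands)})
    return tasks
```

Storing the answer alongside the prompt is the design choice that makes the reward verifiable: scoring never depends on regenerating the task.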
Qualifications
“The number one qualification to succeed in this machine learning course is gumption” - John Lafferty, CS Professor at Yale
Above all, we look for an eagerness to learn, passion for research, creativity in problem solving, and a proactive mindset. You are a great fit if you have a background in the following:
- 3+ years of experience in software engineering, simulation systems, or ML infrastructure
- Strong command of Python and systems-level programming
- Deep understanding of ML concepts; RL concepts (reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops) are a plus
- Experience designing scalable data pipelines, browser or API simulations (e.g., Playwright, Selenium), or distributed compute frameworks
- Familiarity with instrumentation, metrics, and data pipelines for evaluation
- Proven ability to translate research or product goals into robust, maintainable systems
- Curiosity and conviction around building environments that steer AGI
Benefits
- Competitive salary and equity packages
- Health, dental, and vision insurance plans
- 401(k) plan + matching
- In-office private chef
- Sponsored personal tax accounting
- Whoop band, Oura ring, Function Health
- Monthly meal stipend
- Monthly health and wellness stipend
- Equinox membership
- Fun global offsites!
Patronus AI is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.