Job Description
Overview
We are hiring experienced machine learning engineers and researchers to serve as human baseliners for evaluations of open-ended machine learning research tasks. These evaluations measure how well AI agents perform on realistic AI R&D problems. To interpret agent performance, we need strong human reference points: skilled practitioners attempting the same tasks under the same time and compute constraints. As a baseliner, you will complete self-contained ML research tasks in a sandboxed environment, working independently with your preferred tools and workflow. Your results will serve as the benchmark against which frontier-model agents are evaluated.
What You’ll Do
- Attempt open-ended machine learning research tasks under a fixed time and compute budget (work trial)
- Work independently in a sandboxed Linux environment with internet access
- Use your preferred tooling, including IDEs and AI coding assistants such ...