Job Description
**Role Number:** 200649482-3337
**Summary**
AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function—it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.
What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers—bringing together ML research, statistical rigor, and production engineering.
We are looking for a Research Scientist who treats evaluation methodology itself as a first-class research problem—someone with deep technical fluency in preference learning, reward modeling, or calibration theory, and the drive to advance the field while solving real problems at s...
**Summary**
AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function—it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.
What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers—bringing together ML research, statistical rigor, and production engineering.
We are looking for a Research Scientist who treats evaluation methodology itself as a first-class research problem—someone with deep technical fluency in preference learning, reward modeling, or calibration theory, and the drive to advance the field while solving real problems at s...