🎯 Full-Time Opportunity: This is a permanent, full-time position with a competitive package and real career growth potential.
Job Description
Responsibilities:
Design, implement, and evolve internal platform capabilities that make AI Efficiency services easier to build, ship, observe, secure, and operate
Build and maintain self-service workflows, reusable platform abstractions, and golden paths that improve developer productivity while preserving reliability, security, and governance
Improve platform reliability through better monitoring, alerting, observability, deployment safety, release practices, and incident readiness
Define and operationalize service health indicators, SLIs, SLOs, and related reliability metrics that help teams make informed tradeoffs between reliability, velocity, and cost
Build automation that reduces operational toil and improves mean time to detect, respond, and recover from incidents
Partner with engineers throughout the software development lifecycle to embed operability, production readiness, and maintainabil...