Full-Time Opportunity: This is a permanent, full-time position with a competitive package and real career growth potential.
Job Description
Your Role
Incident & Problem Management
- Lead incident response end to end, covering triage, communication, and resolution in real time.
- Act as Incident Commander for high-impact events across a global environment.
- Track and improve metrics like MTTD, MTTM, and MTTR.
- Champion blameless Post-Incident Reviews (PIRs) and translate learnings into long-term system and process improvements.
Service Operations & Reliability
- Oversee daily service health, capacity, and reliability across all supported environments.
- Ensure compliance with operational KPIs through proactive planning and improvement.
- Balance demand vs. capacity and manage shift coverage to prevent burnout.
- Partner with engineering teams to maintain runbooks, knowledge bases, and escalation paths.
- Drive automation and workflow optimization to reduce manual overhead.
- Use data insights to guide decisio...