Job Description
Role Description
We are seeking a highly skilled Site Reliability Engineer (SRE) to manage and optimize production applications and ensure their reliability.
Responsibilities
- Reliability & Performance: Implement best practices to ensure high availability, scalability, and performance of production applications.
- Monitoring & Incident Response: Set up monitoring, troubleshoot issues, and lead incident resolution.
- Capacity Planning & Optimization: Analyze resource usage and optimize infrastructure costs and performance.
- Operations Automation: Automate day-to-day runbooks, operational workflows, and integrations between operational tools such as PagerDuty and ServiceNow.
Qualifications
- Tools and Tech: Dynatrace, Elasticsearch, ELK Stack, PagerDuty, ServiceNow, LogicMonitor, PowerShell, shell scripting, Python.
- Experience Areas: Monitoring and observabi...