Job Description
Responsibilities:
Production Support & Incident Management:
*Serve as the primary escalation point for critical production incidents affecting virtualization, Windows/Linux operating systems, storage infrastructure, and enterprise networking
*Perform rapid root cause analysis across infrastructure layers to identify and isolate issues
*Coordinate incident response and engage specialized teams (network, security, compute, application) based on technical assessment
*Monitor infrastructure health using tools such as SolarWinds, LiveNX, and Nagios, and proactively identify potential issues
*Maintain incident documentation and contribute to post-incident reviews
*Participate in a 24/7 on-call rotation for production support coverage
Technical Troubleshooting & Problem Resolution:
*Troubleshoot complex issues spanning operating systems, storage arrays, backup solutions, and cloud platforms
*Diagnose and resolve performance issues related to compu...