Job Description
Position Overview
As an SRE Engineer for the TELUS Health Data Office, you will be the operational backbone of our AI Automation team. You will support high-stakes GenAI solutions built by our AI/ML Developers, ensuring they transition from prototype to mission-critical production services. Your focus is LLMOps: managing the specialized stack required for Large Language Models, from vector search performance to prompt observability and cost governance.
Key Responsibilities
- Reliability for GenAI: Maintain the uptime and performance of AI applications, with a focus on API latency for LLM inference and Vector DB query performance.
- MLOps & CI/CD: Automate the deployment of AI microservices (Python/Node.js) and their infrastructure using Terraform or similar IaC tools, ensuring rapid but safe delivery cycles.
- Scaling & Optimization: Optimize cloud resource consumption (GCP/Azure) for high-compute AI workloads to ba...