Job Description
The role is for an experienced HPC (High-Performance Computing) Architect or Engineer to take the lead in designing, deploying, enhancing, and managing our HPC infrastructure. This infrastructure includes GPU clusters (B200/H200/A40), a liquid-cooled CPU cluster, and a cloud-based HPC system, with future expansions planned to support cutting‑edge NUS research.
Duties and Responsibilities
- Lead the administration and operation of our HPC infrastructure (on‑premises and/or cloud), including hardware, software, and networking components.
- Lead the development of HPC infrastructure together with the lead architect and the management team.
- Carry out project management and vendor management for strategic HPC infrastructure projects.
- Develop, implement, and document standard operating procedures and best practices for HPC operations, including system monitoring, performance tuning, and security.
- Ensure th...