Job Description
Description
Annapurna Labs is an integral part of AWS and develops hardware and software components that are critical building blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience.
The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium. Trainium is one of Amazon's highest priority initiatives, powering the frontier AI models being trained today.
Collectives are the critical operations that scale AI compute across the data center. You'll work in depth to optimize compute for the specific topologies used to train modern LLMs. Working closely with the hardware team, you'll push for maximum performance using C/C++, interfacing with DMA and firmware and investigating detailed topologies.
You'll analyze current collective algorithms using publicly accessible tools like Neuron Explorer and optimize these to fully utilize compute and bus bandwi...
Annapurna Labs is an integral part of AWS and develops hardware and software components that are critical building blocks for EC2 infrastructure. We specialize in designing software, systems and chips that optimize the AWS customer experience.
The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium. Trainium is one of Amazon's highest priority initiatives, powering the frontier AI models being trained today.
Collectives are the critical operations that scale AI compute across the data center. You'll work in depth to optimize compute for the specific topologies used to train modern LLMs. Working closely with the hardware team, you'll push for maximum performance using C/C++, interfacing with DMA and firmware and investigating detailed topologies.
You'll analyze current collective algorithms using publicly accessible tools like Neuron Explorer and optimize these to fully utilize compute and bus bandwi...