Full-Time Position

Software Development Manager, AWS Neuron SDK - Distributed Training

Amazon
Location: Cupertino, CA, United States
Posted: June 29, 2026
Type: Full-Time


Job Description

AWS Neuron is the software stack for Annapurna's Inferentia and Trainium machine learning accelerators, which are hosted in AWS EC2 Trn1/Trn2 and Inf1 servers.

As the Principal Engineer for the Neuron Distributed Training team, you will work hands-on with a strong team of engineers to design and optimize ML workloads on Neuron devices. In particular, you will focus on building a coherent solution across the stack that increases training resiliency for UltraClusters with thousands of nodes. You will scale and optimize the application stack for LLMs that use multimodal input and output generation, such as text, vision, video, and audio. You will own the full development life cycle of Distributed Training support for multimodal transformer models such as MM-Llama3.2, DiT/Pixart, and CLIP. You will develop scalability features and performance optimizations in the Neuron ML Framework components to enable them...