💼 Full-Time Position

Senior Consultant Specialist (Model Hosting/Inference Optimization)

Company: HSBC Global Services Limited
Location: Guangzhou, Guangdong, China
Posted: June 19, 2026
Type: Full-Time

Full-Time Opportunity: This is a permanent, full-time position with a competitive package and real career growth potential.

Job Description

Some careers have more impact than others.

If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.

We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.

Business: CTO

Location: Guangzhou

Job ID: 48324

Principal responsibilities

  • Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware. 
  • Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching). 
  • Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware. 
  • Own inference health and performance monitoring: lat...