Job Description
We’re looking for strong Java engineers who’ve owned production systems and want to focus on reliability, scalability, and resilience.
This is for you if you’ve:
- Built and operated large-scale Java/JVM services
- Carried on-call and handled real production incidents
- Debugged JVM, GC, latency, and concurrency issues under pressure
- Implemented resilience patterns (circuit breakers, timeouts, graceful degradation)
What you’ll do:
- Own availability, latency, and reliability of critical services
- Improve reliability through code-level changes, not just infrastructure work
- Define SLIs/SLOs, lead incident reviews, reduce toil
- Partner with product teams to design for failure
Reliability mindset (what differentiates this role):
- Experience implementing or driving:
  - Circuit breakers, bulkheads, rate limiting...