[Remote] Senior Software Engineer – AI Reliability
Note: This is a remote role open to candidates in the USA. A reputed company is seeking a Senior Software Engineer who is passionate about solving reliability, scalability, and performance challenges in AI systems. The role focuses on building and operating production systems that keep AI running reliably at scale, working closely with engineers and researchers to improve distributed services and platform infrastructure.
Responsibilities
- Owning the reliability and operational health of production AI systems
- Improving performance and scalability across distributed services
- Troubleshooting production issues across the application, database and infrastructure layers
- Building monitoring, alerting and observability capabilities
- Partnering with engineering and research teams to productionise AI systems
- Driving engineering best practices around testing, deployment and incident response
- Contributing to architectural decisions that improve long-term scalability
Skills
- 7+ years of software engineering experience
- Strong Python skills
- Experience with Java or Kotlin
- Proven experience building and operating distributed systems in production
- Strong Kubernetes experience
- Deep understanding of system performance, scalability and reliability
- Experience with relational databases and performance optimisation
- Strong troubleshooting and incident response capabilities
- Experience with monitoring, logging, metrics and tracing
Benefits
- Work on cutting-edge AI technology solving real-world cybersecurity challenges
- High ownership and technical autonomy
- Engineering problems at scale
- Remote-first culture
- Competitive compensation and benefits
- Opportunity to influence the reliability and future direction of a rapidly growing AI platform
Company Overview