
Senior Site Reliability Engineer

Work from home, full-time role (hiring)
This description is a summary of our understanding of the job description.

Role Description

This role involves joining an Identity Security Cloud software development team as a Senior Site Reliability Engineer (SRE). You will work closely with software engineers, infrastructure platform services, engineering managers, and other stakeholders to ensure the reliability, scalability, and performance of the team's services.

  • Work with development and service owners to solve performance issues and ensure system scalability.
  • Design, develop, and implement solutions to improve reliability, availability, performance, and scalability of systems.
  • Develop alerts and dashboards in collaboration with technical leaders and infrastructure platform services.
  • Own and improve key operational metrics (SLIs, SLOs, Error Budgets, monitoring and alerting).
  • Drive continuous improvement through post-incident reviews and blameless postmortems of non-functional issues.
  • Develop and maintain comprehensive monitoring and alerting to proactively identify and resolve issues.
  • Create and maintain dashboards, reviewing them regularly to identify and close coverage gaps.
  • Collaborate with technical leads, DevOps/SRE, and infra teams for capacity planning.
  • Identify and address production performance bottlenecks through profiling, tuning, and optimization.
  • Automate repetitive tasks and processes to improve efficiency.
  • Work closely with Software, Performance, and Test Engineers to influence system design and architecture.
  • Review and contribute to documentation for systems, processes, runbooks, and procedures.
  • Participate in a 24/7 on-call rotation to gain subject matter expertise.
  • Lead incident postmortem efforts, ensuring timely compilation of reports.
  • Utilize excellent diagnostic and problem-solving skills to analyze complex systems and data.

Qualifications

  • Bachelor’s degree in computer science, a related field, or equivalent practical experience.
  • 5+ years of proven SRE experience.
  • Strong understanding of SRE principles and practices.
  • Experience with cloud platforms (AWS, GCP, or Azure).
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Honeycomb, OpenSearch).
  • Coding experience beyond simple scripts in a programming language such as Go, Java, or Python.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Understanding of network protocols and security best practices.
  • Familiarity with DevOps culture and practices, and experience with CI/CD toolchains (e.g., Jenkins, ArgoCD, Spacelift).
  • Experience with Incident Response tools and processes (PagerDuty).
  • Experience with Infrastructure as Code (Terraform, Helm).
  • Strong problem-solving and troubleshooting skills.
  • Excellent communication and collaboration skills.
  • Ability to work independently and as part of a team.

Preferred Qualifications

  • Technology experience: Kafka, relational databases, performance tuning (JVM, Go).
  • Experience with Grafana k6 for continuous performance testing.

Onboarding Timeline

  • In the first 30 days you will:
    • Meet the team and understand its mission and vision.
    • Gain clarity on various roles and expectations.
    • Complete development environment setup.
    • Read guides and documentation, and complete mandatory training.
    • Learn company processes and benefits.
  • By 6 months you should:
    • Understand team goals and OKRs for the quarter and beyond.
    • Complete initial analysis and implementation of SRE team assignments.
    • Be comfortable with tools, systems, and processes used on a day-to-day basis.
    • Complete project work, both supervised and unsupervised.