Kubernetes Engineer
About the position
Join us and make YOUR mark on the World! Lawrence Livermore National Laboratory (LLNL) has turned bold ideas into world-changing impact, advancing science and technology to strengthen U.S. security and promote global stability. Our mission spans four critical national security areas (nuclear deterrence, threat preparedness, energy security, and multi-domain defense), empowering teams to take on the toughest challenges of today and tomorrow. With a culture built on innovation and operational excellence, LLNL is a place where your expertise can make a real impact.
Do you have a strong passion for containers and Cloud Native technologies? Do you understand the value containers and Kubernetes bring to managing application deployments? As a Kubernetes Engineer, you will be responsible for one of our Computing Platform’s most critical components, our Kubernetes cluster. You will help streamline its deployment, maintenance, and monitoring, and expand its capabilities to meet LLNL’s intensive scientific workloads. If you are ready to drive meaningful change and make a real impact from the ground up, this opportunity is for you.
We have an opening for a Kubernetes Engineer to support a 24x7 large-scale bare-metal Kubernetes environment. As a member of the team, you’ll bridge the gap between hardware and high-performance software, ensuring our scientific teams have the reliable, secure platforms needed to build world-class products. This is a high-impact mission where your expertise in on-prem automation and cluster orchestration directly accelerates our product innovation.
This position is in the Global Security Computing Applications Division (GS-CAD) of the Computing Directorate, matrixed to the Global Security Directorate.
This position requires part-time on-site presence due to the nature of the work. This position will be filled at either level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
Responsibilities
- Contribute to the design and deployment of large-scale bare-metal clusters, integrating control planes with VAST Storage arrays to deliver high-performance persistent storage.
- Contribute to the implementation of advanced cluster networking to ensure seamless, low-latency communication across multi-rack and multi-site topologies.
- Participate in building and maintaining automated, self-healing workflows using CI/CD pipelines to manage cluster lifecycles, ensuring zero-touch deployments and consistent platform health.
- Support rigorous SLIs/SLOs by engineering robust observability stacks (Prometheus, Grafana) and enforcing airtight security through RBAC, OIDC, and network isolation.
- Partner with internal business units on moderately complex workloads, while elevating the technical bar for the team through design reviews and mentorship.
- Collaborate with team members on integrating AI agents to assist with troubleshooting and automating cluster management operations.
- Perform other duties as assigned.
- Collaborate with researchers on developing repeatable software stacks used for research and automate deployment and configuration across multiple environments.
- Partner with security teams to incorporate security checks and audits for Kubernetes deployments into our CI/CD system.
- Extend Kubernetes to help simplify usage, operations, and user applications.
Requirements
- Ability to secure and maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
- Bachelor’s degree in computer science, software engineering, or a related technical discipline, or an equivalent combination of education and relevant experience.
- Comprehensive knowledge of and experience distributing standard configurations for clusters leveraging configuration management tools such as Ansible or Puppet.
- Broad experience performing Kubernetes administration in a moderately complex to complex environment, including tasks such as installation, networking, security, troubleshooting, and monitoring.
- Broad experience with software development or system administration using scripting languages (e.g., Python, Bash, Perl, Ruby, Groovy).
- Strong understanding of core Kubernetes concepts such as: Pods, Deployments, Services and/or PVCs.
- Proficient interpersonal skills necessary to interact with all levels of personnel and ability to work independently, under limited direction, in a multi-disciplinary team environment.
- Proficient verbal and written communication skills necessary to effectively collaborate in a team environment and present and explain technical information.
- Ability to set priorities and independently resolve moderately complex problems in a fast-paced environment.
- Advanced knowledge of and experience with Kubernetes internals, such as networking, kubelet function/responsibility, etcd and/or control plane architecture.