Site Reliability Engineer

Overview

Hybrid

Up to $140,000

Full Time

Able to Provide Sponsorship

Skills

Amazon Web Services

Apache Velocity

AppDynamics

CHAOS

Cloud computing

Computer science

Continuous delivery

DevOps

Docker

Hosting

Java

Jenkins

Data

Kubernetes

Operational excellence

Python

Testing

Ruby

Scalability

Scripting

Splunk

Root cause analysis

FMEA

High availability

Database

Job Details

Job Title: Site Reliability Engineer (SRE)
Location: Mountain View, CA (Hybrid)
Job Type: Full-time

Job Description:

As a Site Reliability Engineer (SRE), you will design, implement, and maintain complex data systems that support millions of customers. You will apply cloud-native principles and best practices to ensure high availability, security, performance, and scalability of database systems. This is a hands-on role that involves working with cutting-edge technologies and maintaining critical infrastructure.

Key Responsibilities:

Design, build, and maintain CI/CD pipelines in Jenkins.
Deploy services in Kubernetes clusters using Helm, Kustomize, and similar tools.
Implement infrastructure changes in AWS with a deep understanding of AWS services.
Participate in on-call duties for pre-production and production systems that support millions of users.
Write and review RCA (Root Cause Analysis) documentation to prevent the recurrence of incidents and share learnings.
Contribute to system upgrades, deployment automation, monitoring enhancements, and production changes.
Create operational playbooks, write how-to articles, and gain domain knowledge to drive team improvements.
Participate in FMEA (Failure Mode and Effects Analysis) testing, chaos testing, and security remediation efforts.
Share best practices for operational excellence and cost optimization.
Automate processes to reduce manual effort and increase efficiency.
Continuously look for opportunities to increase developer velocity and productivity.

Qualifications:

Bachelor's or master's degree in Computer Science or a related technical field, or equivalent experience.
4+ years of hands-on experience with development and operations in AWS environments.
Expertise in performance monitoring, troubleshooting, and tuning.
Experience with AWS services and cloud hosting.
Proficiency in DevOps automation using scripting languages.
Experience with programming languages such as Java, Python, or Ruby.
Knowledge of Docker, Kubernetes, and ArgoCD.
Experience with monitoring and observability tools such as Splunk, Wavefront, AppDynamics, and Prometheus, as well as tracing.

About The Wolf Works
