Site Reliability Engineer

Overview

Hybrid

$120,000 - $130,000

Full Time

Able to Provide Sponsorship

Skills

Documentation

Computer science

Continuous delivery

Continuous integration

DevOps

Amazon Web Services

Apache Velocity

AppDynamics

Cloud computing

Database

Docker

CHAOS

FMEA

High availability

Hosting

Jenkins

Kubernetes

Operational excellence

Java

Root cause analysis

Performance monitoring

Productivity

Programming languages

Python

Ruby

Scalability

Scripting

Splunk

Testing

Optimization

Job Details

Site Reliability Engineer (SRE)
Location: Mountain View, CA (Hybrid)
Job Type: Full-time

Job Description:

As a Site Reliability Engineer (SRE), you will design, implement, and maintain complex data systems that support millions of customers. You will apply Cloud Native principles and best practices to ensure high availability, security, performance, and scalability of database systems. This is a hands-on role that involves working with cutting-edge technologies and maintaining critical infrastructure.

Key Responsibilities:

Design, build, and maintain CI/CD pipelines in Jenkins.
Deploy services in Kubernetes clusters using Helm, Kustomize, and similar tools.
Implement infrastructure changes in AWS, drawing on a deep understanding of AWS services.
Participate in on-call rotations for pre-production and production systems that serve millions of users.
Write and review root cause analysis (RCA) documents to prevent incidents from recurring and to share lessons learned.
Contribute to system upgrades, deployment automation, monitoring enhancements, and production changes.
Create operational playbooks, write how-to articles, and gain domain knowledge to drive team improvements.
Participate in FMEA (Failure Mode and Effects Analysis) testing, chaos testing, and security remediation efforts.
Share best practices for operational excellence and cost optimization.
Automate processes to reduce manual efforts and increase efficiency.
Continuously look for opportunities to increase developer velocity and productivity.

Qualifications:

Bachelor's or master's degree in Computer Science or a related technical field, or equivalent experience.
4+ years of hands-on experience with development and operations in AWS environments.
Expertise in performance monitoring, troubleshooting, and tuning.
Experience with AWS services and cloud hosting.
Proficiency in DevOps automation using scripting languages.
Experience with programming languages such as Java, Python, or Ruby.
Knowledge of Docker, Kubernetes, and ArgoCD.
Experience with monitoring and observability tools such as Splunk, Wavefront, AppDynamics, and Prometheus, as well as distributed tracing.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.

About The Wolf Works