SRE LEAD

  • Minnetonka, MN
  • Posted 2 days ago | Updated 2 days ago

Overview

Hybrid
Depends on Experience
Contract - W2
Contract - Independent

Skills

Change Management
Cloud Computing
Collaboration
Documentation
Dynatrace
Communication
Amazon Web Services
Application Development
Artificial Intelligence
Failover
Capacity Management
Continuous Delivery
Continuous Integration
Database
Docker
GitHub
Grafana
Incident Management
Java
Management
Microservices
Microsoft Azure
Reliability Engineering
Software Development
Software Performance Management
MongoDB
Orchestration
Supply Chain Management
Performance Tuning
ROOT
React.js
ServiceNow
Splunk
Terraform
UI
User Experience
WAR

Job Details

Hello,

Role: Lead SRE Engineer

Hybrid - Minnetonka, MN

Experience: 10+ years

Any Visa type is okay

Responsibilities:

  • System Reliability and Performance: Lead and drive end-to-end (supply chain) reliability, availability, and performance of applications in Digital Experience.
  • Monitoring and Alerting: Design, implement, and maintain robust monitoring and alerting systems to proactively identify and resolve issues.
  • Infra Capacity Planning: Drive capacity planning, ensuring that systems can handle current and future workloads.
  • Incident Response: Lead and guide org-level application teams in incident response efforts, ensuring quick and effective resolution of issues.
  • Performance Tuning: Drive and implement best practices and controls to identify bottlenecks and support performance tuning before production rollout.
  • Post-Incident Reviews: Drive and support post-incident (P1/P2) reviews to identify root causes and prevent future incidents.
  • Security: Lead application teams in adopting industry-standard best practices for managing security certificates, secrets, and non-user IDs to avoid issues and outages.
  • Change Management: Implement robust change management processes to ensure that changes to the system are deployed safely and reliably.
  • Peak Season Readiness: Support Digital teams in preparing for peak season, ensuring overall E2E system resiliency and redundancy to handle expected peak usage volumes.
  • War Room Playbooks: Support teams in preparing playbooks for war room scenarios.
  • Auto Failover & Auto Scaling: Lead and support application teams in adopting effective auto failover and auto scaling strategies to maintain overall system resiliency.
  • Collaboration with Engineers: Work with application development teams to understand their needs, identify potential reliability issues, and improve the software development lifecycle.
  • Cloud: Define and develop the cloud strategy for the enterprise, focusing on AWS and aligned with IT requirements.

Requirements:

  • Solid expertise in driving SRE principles and at least 5 years of experience leading SRE engineers.
  • Experience in supporting Digital web products.
  • Strong experience in Java, Web, Microservices, Spring Boot, React, UI/UX, Security, Stability, and Production Operations.
  • Monitoring and observability (APM) tools such as Dynatrace Cloud, Splunk, Elastic APM, Interlink, and Grafana.
  • Hands-on experience with cloud platforms (AWS, Azure) and their services.
  • Experience with containers (Docker) and container orchestration (Kubernetes).
  • AI tools such as GitHub Copilot and Chat Playground.
  • Incident management tools such as ServiceNow.
  • Understanding of CI/CD and experience using GitHub Actions.
  • Strong communication and collaboration skills to work effectively with cross-functional teams.
  • Databases: MongoDB and MySQL.
  • Proficiency in automation technologies and tools like Terraform.
  • Good documentation skills.

Outcomes:

  • Increased system reliability and 99.999% availability.
  • End-to-end (supply chain) resiliency and redundancy.
  • Improved customer satisfaction.
  • Faster incident detection, engagement, and restoration.
  • Advanced automated controls to reduce human touchpoints in monitoring workflows.
  • Use of AI to drive improvements.

About Fynbosys Inc