Hello All,
Greetings from Rootshell Inc.
Rootshell Enterprise Technologies Inc. is a recognized provider of professional IT consulting services in the US. We are actively seeking a Site Reliability Engineer with RCA experience for one of our clients. Please share your resume along with your current location and full contact information.
Job Title: Site Reliability Engineer with RCA Experience
Location: Boston, MA (Onsite)
Job Description:
Responsibilities:
You will be part of the SRE team, which focuses on Root Cause Analysis (RCA) of critical production outages to improve resiliency. You are responsible for analyzing various sources of critical incident information and translating them into an actionable RCA investigation plan, then leading teams of Subject Matter Experts to find the actual cause.
Host RCA calls as chair and drive the RCA process to conclusion.
Lead problem tickets and improvements to major software components, systems, and features to improve the availability, scalability, latency, and efficiency of the client's system.
Engage in and improve the service lifecycle, from inception and design through deployment, operation, and refinement, based on lessons learned through deep dives.
Troubleshoot VMware, Kubernetes, custom software, hardware, and infrastructure performance incidents hands-on.
Be a trusted technical advisor who leads complex root cause analysis investigations from start to finish, through to implementation of the improvements.
Demonstrate sound knowledge of gathering logs and facilitating root cause analysis with cross-functional teams.
Assist internal teams with corrective actions and improvement tickets, and influence them to meet completion goals.
Occasional out-of-hours work, including weekends, may be required depending on criticality and workload.
Qualifications:
Bachelor's degree in software engineering, information systems, computer science, or a related field.
12+ years of experience working with ITSM tools such as Jira or equivalent.
8+ years of infrastructure engineering experience, with a record of hands-on troubleshooting of large-scale solutions, on-prem distributed systems, and custom-developed software applications.
8+ years of experience operating production systems, including troubleshooting, testing, and automation.
5+ years of experience leading technical Root Cause Analysis (a software and/or industrial focus is a plus).
Soft skills:
Ability to prioritize parallel RCA investigations and tasks, influencing cross-functional teams to complete actions on time and to a demanding quality standard.
Experience with executive incident communication and RCA report writing, with strong written communication skills for nontechnical audiences.
Ability to apply a broad technical background to projects through strong problem-solving and effective collaboration with other technical teams.
Ability to read and understand GitLab technical documentation efficiently.
Advanced experience with tools such as Prometheus, Grafana, LogicMonitor, Elastic, and VMware, and with the command line (Kubernetes or Linux).
Power BI is a plus.
Thanks & Regards
Naveen
Rootshell Enterprise Technologies Inc.