Overview
On Site
USD 130,000.00 - 160,000.00 per year
Full Time
Skills
Wholesale
Bloomberg
Retail
Scalability
Disaster Recovery
Reliability Engineering
Decision-making
Software Design
Waterfall
Performance Monitoring
Reporting
Prototyping
Performance Improvement
High Availability
Quality Assurance
Management
IaaS
Client/server
Java
J2EE
Java Servlets
JSP
React.js
SQL
Database
Redis
MongoDB
Cloud Computing
Microsoft Azure
Splunk
Software Performance Management
Dynatrace
Kubernetes
Docker
Microservices
Web Browsers
Performance Testing
Debugging
Conflict Resolution
Problem Solving
Communication
Microsoft
Microsoft Outlook
Microsoft Excel
Microsoft PowerPoint
Privacy
Pharmacy
Health Care
Insurance
Life Insurance
Recruiting
Authorization
Employment Authorization
Job Details
Costco IT is responsible for the technical future of Costco Wholesale, the third-largest retailer in the world, with wholesale operations in fourteen countries. Despite our size and explosive international expansion, we continue to provide a family-oriented, employee-centric atmosphere in which our employees thrive and succeed.
This is an environment unlike anything in the high-tech world and the secret of Costco's success is its culture. The value Costco puts on its employees is well documented in articles from a variety of publishers including Bloomberg and Forbes. Our employees and our members come FIRST. Costco is well known for its generosity and community service and has won many awards for its philanthropy. The company joins with its employees to take an active role in volunteering by sponsoring many opportunities to help others.
Come join the Costco Travel IT family. Costco IT is a dynamic, fast-paced environment, working through exciting transformation efforts. We are building the next-generation retail environment, where you will be surrounded by dedicated and highly professional employees.
The Site Reliability Engineer (SRE) will be responsible for maintaining and improving the availability, performance, scalability, and maintainability of applications at Costco Travel. They will translate Costco's goals and strategies for system availability, performance, and capacity into designs and plans for technical solutions. The SRE will also work with other Costco teams to identify upcoming events that could affect demand on system performance and prepare mitigation plans. They will work with teams and System Architects to implement, maintain, and validate disaster recovery plans and other solutions to avoid or mitigate service interruptions. The SRE will monitor incidents, along with their resolution and support, to identify trends and concerns. The SRE will create and disseminate system reliability reports to Costco management in support of planning and decision-making.
If you want to be a part of one of the BEST companies to work for worldwide, simply apply and let your career be reimagined.
ROLE
Applies their deep understanding of software design to detect and diagnose issues before they cause outages or performance degradation. Anticipates potential performance issues based on design decisions, code patterns, and expected system load.
Tracks system performance and capacity, and uses experience to create effective strategies for maintaining and improving system performance and availability.
Uses APM/monitoring tools such as Dynatrace/Splunk and browser tools to perform request purepath/waterfall analysis to identify bottlenecks and suggest improvements. Sets up comprehensive performance monitoring systems to track key metrics, identify trends, and proactively address performance regressions.
Identifies deficiencies within a product/application's code base and identifies opportunities to improve overall code quality. Experience tuning and maintaining the performance of systems is desirable.
Drives engineering best practices to deliver higher-quality and scalable solutions.
Identifies breaking points and works with Infrastructure and Feature teams to ensure stability and proactively scale.
Identifies, designs, develops, and deploys tools and processes to monitor, maintain, and report site performance and availability.
Prototypes and demonstrates mechanisms for performance improvement, high availability, and system scaling.
Works closely with Infrastructure teams, Architects, Dev/QA, and Engineers to design, implement, manage, and secure scalable and reliable cloud infrastructure environments.
Enhances the confidence and safety of deploying changes across the applications in Costco Travel.
Enables teams to better understand and prepare for sudden spikes in traffic and other load scenarios, both at the application level and system level.
Develops new approaches to confidently assess whether a change causes negative impacts. Improves the efficacy of existing validation approaches.
Increases the depth and breadth of our performance testing tooling.
REQUIRED
7 years' demonstrated experience as a Site Reliability Engineer.
7 years' experience with design and development of client/server and/or web-based applications.
5 years' experience with Core Java and Java EE technologies (Servlets and JSP).
5 years' experience working as a React.js developer.
5 years' experience working with relational and NoSQL databases (Redis, MongoDB, and Graph).
5 years' experience working with cloud providers like Azure.
5 years of hands-on experience with APM tools such as Splunk APM and Dynatrace.
Experience with container technology, including Kubernetes (Tanzu, AKS, GKE) and Docker.
Strong understanding of CS fundamentals, distributed architectures, and microservice patterns.
Experience with browser-based debugging and performance testing software.
Experience with distributed systems including how to debug them.
Strong troubleshooting and problem-solving skills; strong interpersonal, verbal, and written communication skills; strong relationship builder in cross-functional teams.
RECOMMENDED
Curious and enjoys working on ambiguous problems where the solutions are not yet well-defined.
Enjoys collaborating with multiple teams and uses their communication skills to influence product direction.
Proficient in Microsoft Workspace applications, including Microsoft Teams, Outlook, Word, Excel, PowerPoint, etc.
Successful internal candidates will have spent one year or more on their current team.
Required Documents
Cover Letter
Resume
California applicants, please click here to review the Costco Applicant Privacy Notice.
Pay Ranges:
Level 3 - $130,000 - $160,000
Senior - $150,000 - $190,000, Bonus and Restricted Stock Unit (RSU) eligible
We offer a comprehensive package of benefits to eligible employees, including paid time off, health benefits (medical, dental, vision, hearing aid, pharmacy, behavioral health, and employee assistance), a health care reimbursement account, a dependent care assistance plan, short-term and long-term disability insurance, AD&D insurance, life insurance, a 401(k), and a stock purchase plan.
Costco is committed to a diverse and inclusive workplace. Costco is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or any other legally protected status. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to
If hired, you will be required to provide proof of authorization to work in the United States. Applicants and employees for this position will not be sponsored for work authorization, including, but not limited to, H-1B visas.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.