As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems.
Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation.
You’ll join a team of curious problem solvers with diverse perspectives who think big and take risks. In this environment, you’ll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow.
As an SRE, you’ll be focused on running better production applications and systems.
Develop, test, and debug automated tasks (Apps, Systems, Infrastructure)
Troubleshoot priority incidents and facilitate blameless post-mortems
Work with development teams throughout the software life cycle to ensure sustainable software releases
Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
Build and drive adoption for greater self-healing and resiliency patterns
Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands
Participate in 24x7 support coverage as needed
Bachelor’s degree or equivalent experience in a software engineering discipline
Mastery of at least two programming languages (e.g. Python, Java, Go) with respect to designing, coding, testing, and software delivery
Adept at developing automated tools, systems, and services in multiple technology domains
Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute, and storage systems)
Proficiency in making service-level changes to a system and troubleshooting its components