Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. We base our decisions on instrumentation and continuous observability, as well as on predictions and capacity planning.
About this roll* (Responsibilities)
- Provide technical leadership in incident resolution and production triage to maintain the world-class reliability and uptime of our platform (30%)
  - Leverage a strong understanding of cloud architecture
  - Apply knowledge of Java and the JVM (Java Virtual Machine) to triage and understand issues within services
- Build and own a world-class observability technology stack that allows rapid detection of issues in our system and enables root cause analysis (20%)
  - Provide scalable metrics and dashboarding solutions for R&D
  - Provide distributed tracing capabilities to visualize and track issues across our complex system
  - Provide log aggregation and insights for R&D using best-in-class technology
  - Provide a global view of the true customer experience through Real-User Monitoring (RUM) and external cloud-based solutions
- Build a platform that enables service resilience testing and chaos engineering to validate that Toast’s architecture is resilient to failure. Build and own a performance testing framework and environment that lets our R&D teams understand the constraints of their services and track performance degradations over time (20%)
- Meet our uptime targets and strive to improve the way we measure the reliability of the system (10%)
- Mentor and coach peers & leaders in R&D on reliability concerns (20%)
Do you have the right ingredients*? (Requirements)
- Bachelor's degree in computer science, engineering, or related field
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture, and the JVM
- Experience with tools such as APM solutions, Terraform, Ansible, GitHub, Jenkins, and Docker
- Software development experience, ideally in Go, Python, or Java
- Extensive and broad industry experience, including at least 5 years in SRE and/or DevOps roles
Our Spread of Total Rewards
- Uncapped PTO
- Sabbatical opportunity after five years
- Professional Development Reimbursement Program
- Commitment to Employee Wellness through resources such as a quarterly Wellness Stipend
- Various peer and company recognition programs
- Mental Health Benefits
*Bread puns encouraged but not required
We are Toasters
Diversity, Equity, and Inclusion is Baked into our Recipe for Success.
At Toast, our employees are our secret ingredient. When they are empowered to succeed, Toast succeeds.
The restaurant industry is one of the most diverse industries. We embrace and are excited by this diversity, believing that only through authenticity, inclusivity, high standards of respect and trust, and leading with humility will we be able to achieve our goals.
Baking inclusive principles into our company and diversity into our design provides equitable opportunities for all and enhances our ability to be first in class in all aspects of our industry.
Bready* to make a change? Apply today!
Toast is committed to creating an accessible and inclusive hiring process. As part of this commitment, we strive to provide reasonable accommodations for persons with disabilities to enable them to access the hiring process. If you need an accommodation to access the job application or interview process, please contact [email protected].