Senior Site Reliability Engineer

Job Description / Skills Required

Requisition ID: R14870
You became an engineer because you believed in technology’s ability to make a difference in the world. So why would you spend your days building things that don’t matter? At Groupon, we spend our days developing tools, platforms and experiences that help small businesses thrive in their local communities. We may look like an ordinary ecommerce app, but under the surface we’re using cutting-edge technology to build products that regularly have a positive impact on the lives of millions of our customers and merchants across the globe.

Of course, local merchants aren’t the only ones who will benefit from your work: you will too. We are looking for great engineers who are excited to help us build and maintain Groupon’s multiple platforms. We use great technologies and best practices in a high-volume, highly virtualized Linux/KVM/Docker/NGINX environment.

Our team serves 62 million customers and merchants, integrating over 600 geo-distributed services/platforms that power our North America, LATAM, EMEA, and APAC eCommerce websites and their supporting business/marketing services.

We’re currently on the lookout for a Senior Site Reliability Engineer, a pivotal technical leadership role that integrates Site Reliability Engineering and Solutions Architecture with ITIL-based change and incident management. In this role you’ll wear multiple hats as an Incident Commander, change manager, and senior technical resource responsible for preventing, identifying, triaging, documenting, investigating, mitigating, and recovering from site- or service-impacting incidents across Groupon’s 600+ globally dispersed services.

You'll be responsible for assessing, approving and scheduling risky changes, load testing, and maintenance windows; for coordinating and driving Incident Reviews and best practices; and for overseeing Problem Management (Service Actions, Top Ops issues). You will also be responsible for global site availability and reliability, and for identifying and resolving all site- or business-impacting events worldwide.

We’re looking for an individual who enjoys solving complex problems, who can act independently, and who can stay calm and focused in high-stress situations while driving paths to mitigation and restoration with SMEs and teams across the entire global Groupon development and operations environments.

Responsibilities Include:

Prioritize the focus of the Global Systems Engineering Center staff and SRE resources for both routine and significant site events, planned maintenance windows, or risky changes
Review, approve and schedule all risky changes and maintenance window activities
Take ownership of all site- or service-impacting events until they are mitigated/recovered, or handed off, including all documentation and action items
During a crisis or service-impacting event, lead the effort with SOC, SRE and OPS/Development SMEs to triage, investigate, mitigate, and recover
Manage real-time communications during service outages with both technical and non-technical audiences
Follow up with service owners on Incident Review action items, change approvals, and general requests for assistance
Help develop policies and procedures that improve overall production stability and evangelize Best Practices to the rest of the company
Drive and/or participate in daily Site Status meetings and Incident Reviews to prevent incidents and improve overall product quality and stability
Foster relationships with development teams and technology leaders across the company

The ideal candidate:

5+ years of experience in a similar position (desktop support / DevOps), ideally within a large production eCommerce environment
Expert knowledge of high-volume, highly virtualized Linux/KVM/Docker/NGINX environments
A strong and persuasive personality, a broad scope of knowledge with expertise in two or more major knowledge areas (networking, storage, kernels, database/object-store, caching/proxying, SOA API, message buses)
Prior experience in project management and ticketing work

What you get in return

Experience working for one of the fastest growing ecommerce companies in the world with unparalleled career opportunities
Comprehensive induction to the business at our Sydney HQ and ongoing training
Work in our awesome break-out area, plus free Groupon credits and discounts on deals, birthday leave, weekly Friday night drinks and awards, charity and community initiatives, a social club and much, much more!

Interested? We would love to hear from you.

Groupon provides a global marketplace where people can buy just about anything, anywhere, anytime. We’re enabling real-time commerce across an expanding range of categories including local businesses, travel destinations, consumer products, and live or lively events. At the same time, we are providing advertising options and tools that merchants can use to grow and manage their businesses. Culturally, we believe that great people make great companies and that starting with the customer and working backward moves us forward. Community matters to us on an internal, local and global scale—it’s fundamental to our company’s growth and to the well-being of the world at large. We also value self-awareness, candor, lunch and WiFi. If we match with you, please apply to join us.