Lead, Site Reliability Engineer


Job Description / Skills Required

At Aera Technology, we are helping the largest enterprises in the world transform how they make and execute decisions with Decision Intelligence. Aera understands how your business works, makes real-time recommendations, predicts outcomes, and takes action autonomously. Our platform delivers the business agility required to respond to today’s ever-changing environment.
The SRE team supports the development, enhancement, and maintenance of the cloud infrastructure that our applications and services run on. Aera's SRE group manages the architecture and engineering of all environments from production and acceptance to sandbox and sales. The team develops infrastructure as code, monitoring solutions for the health, performance, and reliability of the Aera stack, and in general, “keeps the lights on” by providing tier III support for our 24/7 Platform Operations team. The SRE team is also on the front line of adopting and developing state-of-the-art infrastructure to continuously evolve the platform. 
In this role, you will draw on your expertise in Kubernetes and cloud-native infrastructure to work closely with our product engineering teams from the early stages of design through deployment, and to identify and resolve infrastructure-related production issues. You will work with our security teams to develop solutions that protect Aera intellectual property and customer data, and you will serve as both a trusted escalation point and a mentor for other team members.
We are interested in considering every qualified candidate eligible to work in the United States. However, at present, we are not able to sponsor visas.


    • Designing and developing the systems that run and monitor Aera's production infrastructure, including acting as a primary engineering contributor to our transformation into a GitOps-driven, Kubernetes-based platform
    • Exploring, evaluating, and integrating the latest best-of-breed tooling and components used for modern Kubernetes deployment and management
    • Triaging and troubleshooting complex production issues to ensure reliability and performance
    • Identifying and automating manual processes
    • Continuously evolving our monitoring tools and platform
    • Promoting and applying best practices for building scalable and reliable services across engineering
    • Developing and maintaining technical documentation, runbooks, and procedures
    • Tier III support for a 24×7 online environment as part of an on-call rotation providing response to production incidents and participating in root cause analysis and problem management

About You

    • Bachelor's degree or higher in Computer Science or a related field is desired but not required
    • 7+ years of SRE/DevOps/infrastructure experience
    • 7+ years of experience deploying, operating and/or debugging server software on Linux at scale
    • 2+ years of hands-on experience deploying, configuring, and troubleshooting Kubernetes for production workloads
    • Hands-on experience using Crossplane, Kustomize, and/or Helm for Kubernetes deployment and management, and/or vCluster for deploying virtual clusters, is highly desirable
    • Experience automating and running large-scale production Java services in AWS, Azure, or other cloud providers
    • Advanced knowledge of configuration management and orchestration tools (Ansible, Terraform), and of automating and streamlining tasks in an SRE/Operations engineering context using scripting languages such as Python, Go, Ruby, etc.
    • Experience with the use, maintenance, and configuration of monitoring, metrics, and logging infrastructure (ELK, Prometheus/Grafana, Nagios, etc.)
    • Comfortable working with modern databases and big data platforms (SQL, etc.); MySQL automation is a big plus
At Aera, we're on a mission to solve the biggest, most intractable challenges in the world of enterprise software. We envision the rise of the Self-Driving Enterprise: a more autonomously functioning business with a central operating system that connects and orchestrates business operations. Our Cognitive Operating System is increasingly used by the world's largest companies to fundamentally transform their organizations and how work is done.
If you share our passion for building the next generation of enterprise software, and deploying it for the most sophisticated customers in the world, you’ve met your match. Headquartered in Mountain View, California, we're growing fast, with teams in Mountain View and San Francisco (California), Bucharest and Cluj-Napoca (Romania), Paris (France), Munich (Germany), London (UK), Pune and Bangalore (India), Sydney (Australia) and Singapore. So join us, and let’s build the future of work together!