Senior Site Reliability Engineer

Trust and Security
Boston, Massachusetts
Full-Time

Who we are

Stavvy is transforming how business is conducted remotely by making complex legal and financial transactions easier, safer, and more accessible to all. Whether we are working to enable title companies to facilitate remote closings in a safe way, better connecting lenders with the businesses they use during the home buying process, or building the next set of tools for the platform, Stavviators (our employees) are disruptors at heart. Our team is constantly iterating, solving problems, and working together to simplify life's defining moments. If you want to help power the paperless revolution, join us at Stavvy!

Who you are

You are a curious and skilled Site Reliability Engineer with a proven track record of managing and scaling a live production environment. You have solid hands-on experience with centralized logging or Elasticsearch, standing up new services, and diagnosing server and application issues with monitoring tools such as Prometheus or Grafana. You’re fluent in scripting with Python and Bash and have strong knowledge of cloud providers and Infrastructure as Code (AWS and Terraform). You also have DevSecOps experience, such as CI/CD pipelines with CircleCI, React, Cypress, Docker, ECS, Kubernetes, or HashiCorp Vault. You’re collaborative and team-oriented with at least 4 years of demonstrated experience as a Site Reliability Engineer. We’d be super impressed if you also have a background in Security or Corporate IT!

What the role is

As a Senior Site Reliability Engineer, you will play an integral part in helping Stavvy achieve operational excellence for engineering. You’ll be responsible for building out the infrastructure required to support a Public API and accompanying developer documentation for our existing products, as well as a framework for deploying and rolling back services. You’ll work closely with the Product team, backend engineering, and tech leadership to increase observability of all production errors and establish a stable deployment and rollback/recovery process. You’ll have the unique opportunity to make things scalable and do it right early on.

In this role you will:

  • Own the build out of infrastructure required to support a Public API
  • Help build a self-service platform for developers to trigger their own deployments and respond to failures
  • Improve monitoring and alerting to detect outages and problems
  • Increase visibility into our system with observability tools
  • Participate in design reviews and take ownership of projects
  • Triage issues on deployment and post-release
  • Expand CI/CD pipelines to facilitate the development of more automated testing
  • Identify and develop areas for automated remediation of system and security events and vulnerabilities

Our office is currently located in Newton, MA, with an anticipated move to Boston this coming November. We currently have limited in-office capacity due to the Covid-19 pandemic. We are accepting candidates who live within, or are looking to move within, commuting distance of Boston, MA. This role will start remote, but you will be expected to work in the office once the new location opens, to foster collaboration with team members and fellow Stavviators.

What We Offer:
