A successful healthcare technology company that is a leader in cancer research is looking for an SRE to join their team and help accomplish their mission of improving lives by learning from the experience of every cancer patient.
What You'll Do
In this role, you'll work with the TechOps organization to accelerate their mission to improve cancer care and learn from patient experiences by ensuring that their technical infrastructure and staff maintain the highest levels of reliability, performance, and agility. You'll provide best-practice guidance on reliability and scalability to their engineering teams. As a member of their SRE team, you'll play a key role in scaling their technology platforms and empowering their development teams to use those platforms with minimal friction. You'll also:
- Design and build infrastructure and systems that provide high levels of scalability, reliability, and performance, while balancing security, maintainability, and operational excellence
- Interface across teams to codify and reliably test infrastructure changes using their software development lifecycle
- Partner with product and application teams to provide guidance and best practices around the scalability, reliability, and performance of their production systems, infrastructure, and software
- Actively participate in code and configuration reviews
- Craft solid, clearly explained designs, playbooks, and documentation for use by teammates and the larger engineering organization
- Improve operational efficiency through automation and deployment or development of new tools
- Be proactive in performance & availability monitoring; provide remediations for systemic issues
- Ingest requirements, scope work, produce estimates and help define deliverables with project timelines
- Actively participate in on-call duties
- Work as a team on escalations, resolving critical issues that impact their high-SLA production systems
Who You Are
You're a Site Reliability Engineer with 4+ years of experience working in a DevOps or software engineering role. You're excited by the prospect of rolling up your sleeves to tackle meaningful problems every day. You're a kind, passionate, and collaborative problem-solver who seeks and gives candid feedback and values the chance to make an important impact.
- You have experience writing simple, readable, useful code, especially for operational tooling
- You have experience with cloud environments such as AWS, Azure, or GCP
- You have experience working with a production environment with high uptime requirements and measurable SLAs
- You are familiar with container technologies such as Docker, Kubernetes or Mesos
- You are proficient with configuration management, orchestration, and infrastructure-as-code tools such as Ansible and Terraform
- You have demonstrated the ability to deliver high-quality, on-time solutions that are reliable, scalable, and maintainable
- You have strong communication skills and the ability to work effectively across multiple business and engineering teams
- You prefer working in a dynamic environment and are comfortable challenging the status quo
- You have the ability to adjust quickly to changing priorities and make quick decisions with limited information
- You believe that a team working well together is truly smarter than the single smartest person on that team