Site Reliability Engineer

Withings Boston, MA 02111

Posted 4 weeks ago

Company Overview

Withings revolutionized connected health by launching the world's first Wi-Fi scale in 2009. Our award-winning ecosystem includes beautifully designed, easy-to-use connected devices for monitoring blood pressure, weight, activity, sleep, temperature, and more.

Our devices are now used in diabetes prevention and weight-loss programs, remote patient monitoring, and university-led clinical studies. They are key enabling technologies that support our partners' strategies, providing the accuracy, reliability, and portability those programs need to succeed. Join us in our mission of preventive health!

Job Summary

We are seeking a skilled and experienced candidate to join our Platform Operations team as a Site Reliability Engineer (SRE). The Platform Operations team is responsible for ensuring that our platform is fast and stable for the millions of active devices it serves around the globe, while remaining agile and scalable enough to meet future demand. We accomplish this by adhering to the principles of observability and automation, and by choosing the right tool for each problem.

To optimize performance and efficiency, we use a hybrid bare-metal and cloud infrastructure, controlling as much of the stack as we reasonably can. We frequently adapt our platform and database architecture to support and enable our growth.

Day-to-day responsibilities and duties may include:

  • Supporting the availability and speed of our production applications
  • Resolving alerts and reducing manual tasks through increased automation
  • Managing databases (debugging, upgrades)
  • Improving continuous integration pipelines
  • Troubleshooting web services and improving their performance
  • Additional operational responsibilities

Requirements

  • Servers: Ubuntu (KVM, LXC, and physical hosts)
  • Cloud: AWS, GCP and OVH
  • Databases: Cassandra/ScyllaDB, PostgreSQL, MySQL, Riak, Redis Cluster
  • Configuration Management: Ansible, Terraform
  • Languages: Python, PHP and Bash
  • Bachelor's degree or higher in Computer Science (or equivalent experience)
  • Must have a valid passport and be able to travel internationally up to 10% of the time

Leading candidates will understand and adhere to the principles of site reliability engineering (shared ownership, work reduction through automation, operations through software) and be ready to enthusiastically meet the challenges of supporting high-performance, high-availability applications in a 24/7, real-time, heavy-traffic environment. If that sounds like the challenge you are looking for, please get in touch!

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Free Food & Snacks
  • Wellness Reimbursement
  • Healthcare & Dependent Care FSA
  • Commuter FSA
  • Bike-to-work benefit