Site Reliability Engineer

Syrinx, Boston, MA 02108

Posted Yesterday

12-month contract-to-hire
Our Opportunity:
Site Reliability Engineers are a cross between systems and software engineers, and are responsible for all operational aspects of our client's e-commerce platform. The team designs, builds, monitors, and maintains the infrastructure behind our internet-facing and internal services. We're looking for engineers who want to develop infrastructure software, maintain it, and scale the client's technology stack.
Ideal candidates can discuss complex technical concepts with a diverse audience across all areas of the organization. They remain calm under pressure and bring structure to high-pressure, fast-paced tasks and projects.
What you'll do:
  • Focus on service stability and reliability by working with application owners to set SLOs, error budgets, and backup and disaster recovery (DR) strategies
  • Define application monitoring and alerting strategy
  • Perform capacity planning and production readiness assessment
  • Embed with product teams during the design and requirements phase of new product development through to initial production launch
  • Identify requirements for other operational teams (release engineering, automation, etc.) during the application development phase
  • Be a technology and DevOps evangelist for the rest of the company
  • Participate in the on-call rotation for level 3 support escalations
What you'll need:
  • At least 5 years of experience working in an SRE role or similar.
  • Hands-on experience with orchestration and system configuration tools such as Ansible, Puppet, Chef, Terraform, etc.
  • Expertise in building and maintaining highly available applications, including redundancy, failover, scalability, monitoring, and performance.
  • Strong experience with virtualization, monitoring, and automation.
  • Software development experience (both scripting and programming languages).
  • Experience working with the open source community (troubleshooting, patch submission, etc.).
  • At least 5 years of demonstrated Linux system administration experience.
  • Experience with CI tools such as Bamboo, Jenkins, Hudson.
  • Ability to organize, troubleshoot, and continuously learn.
  • Previous experience working within compliance controls such as SOX, PCI, etc.
  • This position requires travel.