Site Reliability Engineer

Kensho Cambridge, MA 02138

Posted 3 weeks ago

At Kensho, we hire talented people and give them the autonomy and support needed to build amazing technology and products. To do this, we look for people who insist on a bias towards action to minimize unhelpful hierarchy and process. We collaborate using our teammates' diverse perspectives to solve hard problems. Our communication with one another is open, honest and efficient. We dedicate time and resources to explore new ideas and, as a result, we produce technology that is scalable, robust, and useful.

As a Site Reliability Engineer (SRE) at Kensho, you are a thoughtful, collaborative, and dynamic technologist who loves building the infrastructure that helps others do their jobs more effectively and efficiently. In this role, you will ensure that Kensho's services, both internally critical and external-facing, deliver the reliability and uptime that users expect. You will work closely with our team of Infrastructure and Application engineers to develop scalable solutions.

What You'll Do

  • Run and stabilize Kensho's production services that support critical financial applications and backend processes.

  • Monitor, maintain and help scale services that are integrated into S&P's platform.

  • Manage end-to-end availability and performance of critical services and build automation to prevent problem recurrence. Add, tune and maintain alert configurations and documentation as needed.

  • Design and build advanced automated operational and deployment frameworks alongside tooling and infrastructure to help engineering teams measure and increase their velocity.

  • Cultivate full-team participation in building high-quality, thoughtful software.

  • Participate in on-call / production support rotation as required.

What We Look For

  • 3+ years of experience

  • Experience engineering and supporting production services in a modern, containerized cloud environment

  • Expertise in scalable testing, automation, continuous integration frameworks and best practices

  • Desire to build a strong, operationally minded engineering culture

  • Practical understanding of algorithms, data structures, and design patterns

  • Advocate for best practices in documentation, communication, testing and configuration

  • Thoughtful and collaborative code reviewer and teammate

How To Really Get Our Attention

  • Major technical contributor at a top 10 software company

  • Your open source projects show innovation and initiative

  • Experience with supporting a high throughput production platform

  • Research, publications, and patents

Technologies We Like

  • Kubernetes, HAProxy, Jenkins, Git, Docker

  • Prometheus, AlertManager, Kibana, Grafana

  • Elasticsearch, Postgres, Kafka

Benefits & Perks

  • Medical, Dental, and Vision insurance - 100% company paid premiums

  • Unlimited Paid Time Off

  • 18 weeks of 100% paid Parental Leave (paternity and maternity)

  • 401(k) plan with 6% employer matching

  • Generous company matching on donations to non-profit charities

  • Up to $20,000 tuition assistance

  • Plentiful snacks, drinks, and regularly catered lunches

  • Dog-friendly office (CAM office)

  • In-office gyms and showers (CAM, DC) or Equinox membership (LA, NYC)

  • Stipend towards commuter or gym reimbursement

  • Bike sharing program memberships

  • Compassion leave and elder care leave

  • Mentoring and additional learning opportunities

  • Opportunity to expand professional network and participate in conferences and events

About Kensho

Kensho uses machine learning, artificial intelligence, natural language processing and data visualization techniques to solve some of the hardest analytical problems and create breakthrough financial intelligence solutions for our parent company, S&P Global.

Kensho was founded in 2013 by Harvard & MIT alums and was acquired by S&P Global in 2018 for $550M, the largest FinTech AI acquisition ever. Kensho continues to operate as a startup in order to maintain our distinct, independent brand and to promote our breakthrough, innovative culture. Our team of Kenshins enjoys a dynamic and collaborative work environment that runs autonomously from S&P, while leveraging the unparalleled breadth and depth of data and resources available as part of S&P Global.

As Kenshins, we pride ourselves on maintaining an innovative culture that depends on diversity and inclusion. We are an equal opportunity employer that welcomes future Kenshins with all experiences and perspectives.

Kensho is headquartered in Cambridge, MA, with offices in New York City, Washington, D.C. and Los Angeles.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.

Similar Jobs

Site Reliability Engineer

Withings Cambridge, MA

Posted 3 months ago

Company Overview

Withings revolutionized connected health by launching the world's first Wi-Fi scale in 2009. Since then, we’ve become known for innovative devices which pair timeless design and advanced sensing capabilities. Our award-winning ecosystem includes the world’s first activity tracking analog wristwatch, an advanced sleep-tracking mat, and medically accurate devices for precise and effortless blood pressure and body temperature monitoring. Our mission is to bring the power of health and activity data into your everyday life, so you can stick around longer for your loved ones.

Job Summary

We are seeking a well qualified, highly motivated candidate to join our DevOps team as Site Reliability Engineer (SRE). The DevOps team is responsible for ensuring that our platform is fast and stable for the millions of active devices it serves around the globe, while remaining agile and scalable in order to meet future demand. We accomplish this through adherence to principles of observability, automation, and choosing the right tool to tackle each problem.

To optimize performance and efficiency, we use a hybrid baremetal+cloud infrastructure, controlling as much of the stack as we reasonably can. We adapt our platform and database architecture very frequently to support and enable our growth.

Day-to-day, responsibilities and duties may include:

  • Supporting the availability and speed of our production applications

  • Solving alerts and decreasing manual tasks by increasing automation

  • Database management (debugging, upgrades)

  • Improvement of continuous integration pipelines

  • Web-services troubleshooting and performance improvement

  • Additional operational responsibilities

Requirements

  • Servers: Ubuntu (KVM, LXC and physical host)

  • Cloud: AWS, GCP and OVH

  • Databases: Cassandra/ScyllaDB, PostgreSQL, MySQL, Riak, Redis Cluster

  • Configuration Management: Ansible, Terraform

  • Languages: Python, PHP and Bash

  • Bachelor’s Degree or higher in Computer Science (or equivalent experience)

  • Must have a valid passport and be able to travel internationally up to 10% of the time

Leading candidates will understand and adhere to the principles of site reliability engineering (shared ownership, work reduction through automation, operations through software) and be ready to enthusiastically meet the challenges of supporting high performance, high availability applications in a 24/7 real-time, heavy traffic environment. If that sounds like you, please get in touch!

Benefits

  • Health Care Plan (Medical, Dental & Vision)

  • Retirement Plan (401k)

  • Life Insurance (Basic, Voluntary & AD&D)

  • Paid Time Off (Vacation, Sick & Public Holidays)

  • Family Leave (Maternity, Paternity)

  • Short Term & Long Term Disability

  • Training & Development

  • Free Food & Snacks

  • Wellness Reimbursement

  • Healthcare & Dependent Care FSA

  • Commuter FSA

  • Bike-to-work benefit

Site Reliability Engineer