Senior Site Reliability Engineer

Datto Norwalk, CA 90650

Posted 2 months ago

As the world's leading provider of cloud-based software and technology solutions delivered by managed service providers (MSPs), Datto believes there is no limit to what small and medium businesses can achieve with the right technology. Datto offers Unified Continuity, Networking, and Business Management solutions and has created a one-of-a-kind ecosystem of MSP partners. These partners provide Datto solutions to over one million businesses across the globe. Since its founding in 2007, Datto has continued to win awards each year for its rapid growth, product excellence, superior technical support, and for fostering an outstanding workplace. With headquarters in Norwalk, Connecticut, Datto has global offices in the United Kingdom, Netherlands, Denmark, Germany, Canada, Australia, China, and Singapore. Learn more at datto.com.

We're looking for a motivated, self-starting Sr. Site Reliability Engineer to help pioneer this role at Datto. The Sr. Site Reliability Engineer joins our Core Products Team, which maintains and develops new features for all of Datto's backup appliances (~75K devices and growing quickly). You will report to the Sr. Director of Software Engineering.

The backup device is a physical or virtual appliance that takes block-level backups of Windows, Mac, and Linux machines, turns them into raw disk images, and stores them on a local ZFS-based disk array. In the case of a disaster, our customers restore these backups (disk images) instantly as KVM-based virtual machines, iSCSI targets, Samba shares, and many other formats. We also offer a virtual VMware/Hyper-V-based appliance and integrate with those hypervisors.

We write code in modern Symfony-based PHP (with some Python and C++ sprinkled in), and we rely heavily on our Ubuntu-based Linux stack. We do amazing and exciting things every day, such as detecting when a VM has booted successfully, injecting drivers into the Windows registry before boot, and generating vmdk files on the fly. On top of that, we work with many low-level technologies, such as hypervisors and the ZFS filesystem. This is not your average PHP webdev gig!

Does This Describe You:

You're a technical expert!

A Look Inside the Job:

  • Collaborate with Product and Software Development teams to determine the Core products' reliability strategy, including Service Level Objectives (SLOs) and Service Level Indicators (SLIs)

  • Guide product reliability improvement through monitoring, alerting, and application of software development best practices

  • Collect SLI metrics and establish monitoring based on SLO thresholds and other product requirements

  • Establish and configure monitoring of transaction volume, traffic, performance, and error rates, including alert thresholds, capacity planning, and performance impact analysis

  • Participate in SRE software engineering, writing code to automate processes and steadily reduce human intervention in operational tasks

  • Troubleshoot complex issues quickly and effectively

  • Develop a balanced on-call program with appropriate staffing

  • Communicate with Users, Support, and Development teams in the event of an incident

  • Diagnose and develop root cause solutions for failures and performance issues in our production environment

About You:

  • Bachelor's degree in Computer Science or equivalent experience

  • Strong root cause analysis and troubleshooting competency

  • Experience working with automation and data-driven analysis

  • Experience with OOP languages such as Java, PHP, C#, or C++

  • Solid understanding of Object-Oriented Programming fundamentals

Bonus Points:

  • Experience with distributed systems, hypervisors or file systems

Benefits:

  • At Datto, we believe our employees are our greatest asset and offer all full-time employees a wide-ranging benefits package, including:

Summary of benefits not showing up? View a summary here: Datto Benefits

By submitting an application, you acknowledge we will process your data in order to consider you for the position you apply for and for other open positions within our company for which you may be suited. We collect and store your data in accordance with our Recruiting Privacy Practices.

Datto is an equal opportunity employer.

