Site Reliability Engineer, SEA

Mixpanel, Seattle, WA 98113

Posted 2 weeks ago

About Mixpanel:

Mixpanel is helping the world learn from its data by translating user behavior into actionable knowledge. As the leading user analytics platform, Mixpanel tracks user interactions and surfaces insights that enable businesses to make smarter decisions, break down knowledge silos, and drive data-informed innovation. Mixpanel is headquartered in San Francisco, with offices in New York, Seattle, Salt Lake City, London, and Singapore serving its 26,000+ customers, including 30% of the Fortune 100.

Site Reliability Engineering at Mixpanel is a team of hybrid software/systems engineers targeting large-scale advances in operational efficiency and maturity, reliability, automation, and a deeper understanding of the dependencies inherent in our infrastructure. With numerous large clusters in multiple regions around the world and 50 billion data points flowing in every month, SRE's goal is to enable continuous growth while reducing operational overhead and increasing performance, all while maintaining our high levels of reliability and data resilience.

About the role:

If you're the kind of person who thrives on writing automation, debugging errant services, digging deep into the Linux kernel, and tuning performance end to end, all while serving hundreds of thousands of events every second of every day, we'd love to get to know you.

  • Use your experience as a coder to build tools, services, and automation.

  • Take charge of performance and reliability measurement, and drive improvements based on data.

  • Run critical operational portions of our ingestion, analytics, storage, and serving infrastructure.

  • Use your knowledge of open source tools and available services to maintain the highest levels of uptime across our many services, and build the pieces that don't yet exist.

  • Own configuration management, monitoring, alerting, data resilience, and critical measurement infrastructure.

  • Manage our globally distributed fleet of clusters.

We're looking for someone with:

  • Solid experience with Linux (we run Ubuntu).

  • Solid coding experience and a passion for solving hard problems through automation and resilient software.

  • A hunger for solving not just the symptoms, but the root cause of any issues that arise.

  • At least 2 years of industry experience running a multi-node environment (required), though your attitude and proven abilities matter more.

Culture Values:

  • Be Open: When knowledge becomes open, we can come together as a team to collaborate around a shared purpose.

  • Customer Focus: Our customers' success is our success.

  • Lead Change: Everyone at Mixpanel has the capacity to make an impact on the business.

  • Results Oriented: Driving results in a measurable way ensures we stay focused on the highest impact initiatives.

  • One Team: We can't win without each other.

Why choose Mixpanel?

A clear market leader in the product analytics space, Mixpanel has raised $77M from world-renowned VC firms like Andreessen Horowitz, Sequoia, and YC, and our revenue has grown significantly since then. Our ambitious, collaborative team makes this growth possible by finding creative solutions to new challenges in scaling, reliability, design, and customer service.

Mixpanel is an equal opportunity employer supporting workforce diversity. We actively encourage women, people with disabilities, veterans, underrepresented minorities, and LGBTQ+ people to apply. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity or expression, sexual orientation, age, marital status, veteran status, or disability status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

