Senior Site Reliability Engineer

iStreamPlanet Atlanta, GA 30301

Posted 2 months ago

While we have offices in Seattle & Las Vegas, we are seeking the best talent to join our team and will consider REMOTE/TELEWORK options for candidates located anywhere in the US.


The Principal Site Reliability Engineer is responsible for leading cross-team engineering discussions to achieve scalable, measurable, fault-tolerant, and cost-effective cloud services. Connects Product and Operations teams with Engineering teams to identify and exceed business-critical service KPIs. A core team member of one or more projects where work is based on a wide range of complex problems and deliverables. Responsible for developing and planning own or project team's activities and establishing service level agreements across iStreamPlanet services teams. Independently determines and develops approach to solutions and timelines. A technical leader accountable for a cross-team operational excellence program, end-to-end data awareness across solutions, an active post-mortem culture with cross-team learning, and continuous improvement in iStreamPlanet's service reliability.

Responsibilities:


  • Partner with peer engineering teams throughout the full software development lifecycle.

  • Design, analyze, and troubleshoot fault-tolerant, distributed systems. Provide fellow engineering teams with systems design and scalability expertise.

  • Continually improve customer outcomes through quantitative service monitoring, alarming, and direct code improvements to our services.

  • Contribute expertise in Unix/Linux operating systems internals, administration, and TCP/IP networking.

  • Promote a cross-team culture of operational excellence and customer obsession within iStreamPlanet.

  • Provide technical oversight and mentoring to other team members.

  • Establish and implement cross-team disaster recovery contingencies, validated at least quarterly with external and internal stakeholders.

  • Establish and implement cross-team service availability key performance indicators in partnership with Product Management.

  • Establish and implement cross-team service error budgets, consisting of Service Level Indicators, Service Level Objectives, and Service Level Agreements in partnership with Engineering and Product Management leadership.

  • Partner with Engineering Directors and Product Management to develop quarterly service level agreements for iStreamPlanet's engineering teams.

  • Establish and implement a monthly lecture series for iStreamPlanet team members that targets distributed computing fundamentals, service analytics, and fault-tolerant design patterns.

Qualifications:


  • Bachelor's degree in Computer Science or equivalent experience.

  • 10+ years of commercial software development experience in distributed cloud services.

  • Strong understanding of one or more industry-standard languages (e.g. Go/C/C++/C#/Java/Swift/Python).

  • Experience working with Open Source solutions.

  • Strong experience with algorithms, data structures, complexity analysis, and software design.

  • Strong design skills, including design patterns and common software frameworks.

  • Passion for DevOps, continuous improvement, and nurturing a sustainable post-mortem culture, with a commitment to driving down live-site overhead through a systematic problem-solving mindset, strong collaboration skills, and an eagerness to take ownership.

  • Ability to obsess over customer needs and demonstrate customer empathy.

  • Proven ability to work and solve problems both independently and collaboratively, to organize workload and priorities, and to demonstrate high-quality execution, technical innovation and adoption, and initiative.

Nice to Have:

  • Master's degree in Computer Science or equivalent experience.

  • Experience with audio/video solutions using DirectX, DirectShow, Media Foundation, or similar media pipelines.

Benefits:


  • Flexible work hours and work from home options

  • Accessible and transparent leadership team

  • Paid time off every year to volunteer and generous parental leave

  • Medical, dental, vision benefits, 401(k) plan with a company match

  • Part of the WarnerMedia family of powerhouse brands


iStreamPlanet is one of the largest streaming platforms in the world for broadcasters, delivering thousands of events a year such as March Madness, the World Cup, and even the Olympics. You probably have not heard of us, but we power some of the most well-known media brands in the world. Our mission is to provide a one-stop platform for the media industry as it moves from traditional broadcast to all-online streaming over the next decade. We are backed by giants such as WarnerMedia (owners of Warner Brothers, HBO, Turner, etc.) and are at the heart of changing how you get your sports and entertainment in the future.

iStreamPlanet Co., LLC is an equal employment opportunity employer. iStreamPlanet does not discriminate against any applicant or employee based on race, color, religion, national origin, gender, age, sexual orientation, gender identity or expression, marital status, mental or physical disability, genetic information, or any other basis protected by applicable law.


Similar Jobs

Senior Site Reliability Engineer

NCR Corporation

Posted 2 weeks ago

About NCR

NCR Corporation (NYSE: NCR) is a leading software- and services-led enterprise provider in the financial, retail, hospitality, telecom and technology industries. NCR is headquartered in Atlanta, Ga., with 34,000 employees and does business in 180 countries. NCR is a trademark of NCR Corporation in the United States and other countries.

As a Site Reliability Engineer you own and are responsible for designing, building, improving and maintaining the pipeline that will involve our development, test, staging and production environments. You will play a key role in executing on our continuous integration & continuous delivery ecosystem vision including automated deployment and tear-down, automated testing, source control integration and lab environment management. Experience with Docker, Kubernetes, GCP and AWS is highly recommended.

  • Collaborate with developers to design, implement, evolve and support applications in our secure and highly-available, multi-tenant platform.

  • We take pride in the continuous delivery of high-quality, scalable and maintainable applications and infrastructure.

  • Bridge and own the union between development, quality, security and operations.

  • Own systems administration and the pipeline from software development to production.

  • Be passionate about the security, quality and uptime of the systems that power our platform's infrastructure, as well as all aspects of configuration management and automation, from the code repos through deployment to production uptime.

  • DevOps blueprint/cookbook design, creation and execution.

  • Responsible for maintaining and patching servers supporting SaaS products. This includes Windows Servers and Linux Servers running in in-house datacenters and/or using cloud PaaS providers (AWS, Azure).

  • Responsible for performing application upgrades for multiple NCR SaaS products.

  • Help develop standards, procedures and guides for managing servers and for secure, high-availability applications running in PaaS environments.

  • Develop and monitor dashboards to detect problems related to application, infrastructure and potential security incidents on a daily basis.

  • Participate in disaster recovery planning and execution.

  • Participate in audits of working procedures and make changes to meet statutory regulations.

  • Work with geographically distributed software engineering teams to support the applications.

Qualification and Expectations:

  • 5+ years experience deploying and supporting high traffic, scalable web applications/services

  • 5+ years experience administering Linux, including shell-scripting & command-line tools

  • 3+ years experience with cloud virtualization and PaaS

  • 2+ years experience with AWS

  • 1+ years experience with Docker, Kubernetes and OpenShift

  • Experience with GCP

  • Experience architecting, reviewing, and supporting complex, highly-available infrastructure

  • Experience with orchestration, automation, and configuration management tools like Terraform, Ansible, Puppet, Chef, Spinnaker, or related technology

  • Excellent analysis, debugging, root-cause identification, and troubleshooting skills

  • Experience managing application servers (Java, Python, nginx, apache, Redis, RabbitMQ, etc.)

  • Experience with administering relational databases in a production environment (esp. PostgreSQL)

  • Experience with server containerization (esp. Docker)

  • Skill, experience and interest in applying software development practices to systems administration

  • Experience in using secured remote access tools (e.g. RDP, SSH/PuTTY) in managing Windows or Linux servers remotely

  • Expertise in deploying application servers such as Spring Boot, JBoss, or IIS in multi-tier architecture

  • In-depth understanding of TCP/IP LAN/WAN networking technologies and troubleshooting techniques

  • A background in automating the management of a data center environment

  • Experience with Java application troubleshooting tools (AppDynamics, Dynatrace, etc.)

  • Experience with IT frameworks (COBIT, ITIL, PCI Data Center Requirements, etc.)

  • Experience with hardware or software based firewalls, load balancers and proxy servers

  • Experience with intrusion detection systems and network and server security hardening

  • Excellent organizational skills

  • Customer-service oriented with good written/oral communication skills

  • Ability to work with minimal supervision, making decisions based upon priorities, schedules and an understanding of business initiatives

  • Critical attention to detail, thoroughness and documentation

  • Provide rotational on-call support

EEO Statement

Integrated into our shared values is NCR's commitment to diversity. NCR is committed to being a globally inclusive company where all people are treated fairly, recognized for their individuality, promoted based on performance and encouraged to strive to reach their full potential. We believe in understanding and respecting differences among all people. NCR does not discriminate in employment based on sex, age, race, color, creed, religion, national origin, disability, sexual orientation, veteran status, military service, genetic information, or any other characteristic or conduct protected by law. Every individual at NCR has an ongoing responsibility to respect and support a globally diverse environment.

Statement to Third Party Agencies

To ALL recruitment agencies: NCR only accepts resumes from agencies on the NCR preferred supplier list. Please do not forward resumes to our applicant tracking system, NCR employees, or any NCR facility. NCR is not responsible for any fees or charges associated with unsolicited resumes.

NCR Corporation Atlanta GA

Senior Site Reliability Engineer