System Reliability Engineer - Compute Platform

Bloomberg New York, NY 10007

Posted 1 week ago

A Service Reliability Engineer (SRE) at Bloomberg is a hybrid systems and software engineer who is trusted to improve the stability and availability of the production environment through automation. They are responsible for monitoring, provisioning / configuration / orchestration, capacity management, deployment and rollback, incident management, and SDLC practices.

The Compute Platform team is responsible for providing the bare metal infrastructure on which all of Bloomberg's applications and services reside. Our team is trusted to engineer a hardware platform that maximizes server performance on a standardized hardware configuration. We are also entrusted to architect the platform for tomorrow by partnering with industry-leading vendors and thoroughly evaluating leading hardware for inclusion in Bloomberg's compute infrastructure. As a Compute Platform SRE you will solve challenging technology problems by building architecturally sound, high-quality platforms that enable Bloomberg to exceed critical business objectives.

What's in it for you?

You'll work with modern open-source tooling while maintaining mission-critical systems hosting a wide array of applications. We'll depend on you to advise on design, architecture, and scaling of Compute Platform Specifications for a wide array of internal customers and infrastructure platforms. In addition, you'll play a critical role in improving the stability of existing hardware platforms to ensure quality, stability, and scalability of Bloomberg's applications and services.

You'll Need to Have

  • Demonstrated experience programming and testing in Python, Ruby, Go, or C/C++

  • Experience working in a 24/7 production engineering organization

  • Ability to listen, communicate, evaluate, problem solve, multi-task, and prioritize in a high-pressure, mission-critical, and rewarding team environment

We'd Love to See

  • Deep expertise troubleshooting complex distributed systems

  • Experience with creating and improving documented procedures and/or playbooks

  • Working knowledge of Chef, Puppet, Ansible, or Salt

  • Familiarity with open source configuration, orchestration, and CI/CD tools

  • Deep understanding of TCP/IP and Unix networking

  • Knowledge of Linux or Windows internals

If this sounds like something you would be passionate about, apply! We'll get in touch with you to let you know what the next steps are.

Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Similar Jobs

System Reliability Engineer Vault


Posted 1 week ago

The Bloomberg Vault Cloud team, whose platform processes over 300 million messages daily and archives 90+ billion objects, is looking for a Site Reliability Engineer. You will be working to define and improve our entire compute and web infrastructure. As we are beginning to rearchitect our platform, you will have the opportunity to make a grassroots impact. We need your help ensuring our systems are reliable, which includes scaling with the ever-increasing flow of enterprise data.

What we are working on:

  • Building a robust monitoring platform as we migrate from a legacy big data and web application platform to one built on top of Bloomberg managed cloud services (Kafka, Spark, Zookeeper, etc., all "as a service")

  • Supporting key Vault end-user applications with extensive end-to-end monitoring services that provide metrics against our Service Level Objectives

  • An overhaul of the greater Vault department to an SRE mindset, with a focus on customer-centric metrics while reducing KTLO and tech debt

We'll trust you to:

  • Work with the development teams to highlight recurring issues; you will ensure these are addressed across all application teams in a consistent way

  • Use your excellent SDLC skills to identify and optimize development and engineering practices throughout the organization

  • Automate away manual processes

  • Help us establish Service Level Objectives and Service Level Indicators that we can use to measure our quality as an organization, and contribute to engineering projects aimed at ensuring we meet those standards

You'll need to have:

  • Experience developing full-time and comfort with multiple languages

  • Experience with automation/configuration management systems like Chef, Puppet, or Ansible

  • Experience with monitoring and logging analysis metrics tools

  • Confidence working with Linux

  • Excellent communication skills and the ability to effectively collaborate with developers

We'd love to see:

  • Prior SRE experience or excitement about the field

  • Experience with team transformation to focus on system reliability, with a focus on applying software engineering principles to systems management

  • Experience operating and deploying Continuous Integration and Continuous Deployment (CI/CD) environments

  • Knowledge of Cloud Native Applications and Infrastructure, including Docker and Kubernetes

Bloomberg is an equal opportunities employer and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Bloomberg New York, NY
