Sorry, this job is no longer accepting applications. See below for more jobs that match what you’re looking for!

Site Reliability Engineer

Expired Job

IBM Corporation Cambridge, MA 02138

Posted 2 months ago

Job Description:

IBM maintains the largest corporate network in the world, and it is critical to the success of our business. The CIO Network Engineering organization is looking for highly skilled and motivated individuals to further improve our network resiliency and quality, diversify our technology stack, and embrace next-generation networking. These changes will enable faster, smarter decision-making and deliver speed to value. The network engineering tools and automation squads focus on driving automation, analytics, and tooling into the greater organization to give the operations and deployment teams visibility and insight into the global network.

We're looking for a Site Reliability Engineer in Cambridge, Massachusetts. Site Reliability Engineers take a different approach when it comes to hosting and managing infrastructure, services, and applications. We approach this as a software problem instead of as an operations problem. Applying software engineering practices to hosting and managing enables our services to better adapt to all types of changes and failure scenarios. To help us accomplish this, we use the latest techniques, practices, and technology from the industry while hosting large-scale, mission-critical infrastructure. Some of the techniques, practices, and technology we use are: Docker, IaaS, PaaS, ChatOps, Continuous Delivery, Continuous Deployment, DevOps, and Immutable Infrastructure. We strive to keep learning and improving, and we work to share the knowledge we have learned throughout IBM.

Our project is a highly visible connectivity layer between IBM and the Cloud, used by many groups and service providers in IBM to connect their internal and cloud-based assets and users. We are not just building a highly automated build process with regression testing, monitoring, and deployment promotion, but also a service that customers can use within their own DevOps pipeline to deploy their cloud solutions.

This is an important technical role that will require participation in an evolving culture, designed to deliver software solutions from different teams into a continually available environment. Ultimately, your work will decide if code drops are ready to deploy to production and help ensure that, if a deployment fails, it "fails small and recovers quickly".

Job Duties:

  • Design and implement automated solutions for rolling out our team's applications from test to stage to production

  • Extend the DevOps tools we have with custom-written models to adapt them to specific team needs

  • Drive requirements for our team's code base to make continuous deployment easier, and help implement those requirements

Must have the ability to work in the US without current/future need for IBM sponsorship

A day in the life at IBM

  • Throughout the day, you will collaborate with your teammates and interact with our product owners all while being based out of our Cambridge, MA office.

  • Participate in and/or lead our lunch-and-learn sessions.

  • Take a break and have fun by playing collaborative video games with other IBMers.

  • Take advantage of our exercise room, which includes cardio and weights.

  • Work in an open environment where creativity is welcome and encouraged.

  • Stay current with emerging trends in areas related to DevOps and Cloud.

  • Seek recognition by attaining such awards as "Extremely Smart Person" and "Meritorious Coding before Caffeine."

Required Education


Employment Type


Preferred Education

Bachelor's Degree


Similar Jobs

Systems Java Developer / Site Reliability Engineer

Boston Human Capital Partners, Inc.

Posted Today

Onshape is a well-funded mid-size startup developing a completely new Computer Aided Design (CAD) platform, delivered globally as a SaaS, for professional mechanical designers and engineers. Think SolidWorks meets GitHub meets Google Docs. Our office is at One Alewife Center, Cambridge and a very short walk from the Alewife “T” Station.

We have a fantastic team here of ~100 people. You will be working on a one-of-a-kind 3D collaborative CAD application with cutting edge cloud, web, and mobile technologies. Join an exciting and growing startup with responsibilities for building, operating and scaling our global CAD service. Use your Java development and Linux systems skills to design and implement new functionality, ensure reliable performance and maintain security for the next generation of 3D designed products.

Responsibilities:

  • Successful candidate will be a self-motivated software development professional, comfortable working in a complex code base on a fast-moving team.

  • Work as part of the technical operations team responsible for deploying, operating and scaling our global SaaS CAD product.

  • Design, implement, test, and deliver maintainable, performant Java code for the Onshape CAD product itself and internal developer/operations tools.

  • Work closely with team members to review each other’s designs and implementations.

  • Work as part of an agile engineering organization that uses continuous integration and agile methods to deploy new production releases every 3 weeks.

  • Has the ability to find the balance between perfection and getting the job done.

  • Actively seeks out problems to be solved and is willing to look for new ways of solving old or hard problems.

Preferred Skills and Experience:

  • 5+ years experience in building large-scale, distributed systems in Java.

  • Experience with deployment and troubleshooting on Linux platforms (Ubuntu specifically).

  • Attention to detail with an eye for efficiency, scalability and maintainability.

  • Substantial experience with Java 8+ and frameworks like Spring / Guice.

  • Strong foundation in computer science, with strong competencies in data structures, algorithms and distributed computing.

Optional Skills and Experience:

  • Experience with technologies like MongoDB, RabbitMQ, Elasticsearch, ZooKeeper, AWS (EC2, S3, VPC, IAM, CloudFormation).

  • Experience with security technologies like SAML, Google SSO and Spring Security.

  • Understanding of SDLC, security vulnerability identification / remediation and attack surfaces.

  • Experience with configuration management tools like Puppet and Ansible.

  • Experience with monitoring and alerting platforms.

  • Experience with metrics collection, analysis and performance tuning.

Location:

  • Cambridge, MA (Alewife)
