
Hadoop Site Reliability Engineering

Expired Job

Apple Inc. Austin, TX 78719

Posted 5 months ago

Job Summary

Imagine what you could do here. At Apple, new ideas have a way of becoming phenomenal products, services, and customer experiences very quickly.

Every single day, people do amazing things at Apple. Do you want to impact billions of users by developing an extraordinary product with a prime focus on accuracy, understandability, and performance? Bring passion and dedication to your job and there's no telling what you could accomplish. Come help us build the next-generation cloud platform to support internet services across Apple.

Our platform server engineering team develops and deploys software which forms the foundation for some of our most exciting services, including iCloud, Maps, iTunes, and more. Our software ensures that Apple's services are reliable, scalable, fast, and secure. We support both open source and homegrown technologies to provide internal Apple developers with the best possible platform.

In this role you will have the rare opportunity to own and deliver some of the world's largest-scale cloud services. At Apple, we manage services that support millions of users. This brings an entirely new dimension to the word "Scale". If you have dreamt about controlling, managing, and scaling thousands of servers, then read on!

Key Qualifications

- Experience supporting hosted services in a high-volume, customer-facing environment.
- DevOps skills in Java, Python, Ruby, UNIX shells, C, or C++.
- Experience with Hadoop, Cloudera, Pig, Splunk, or other large data frameworks.
- Experience with Puppet, Chef, or Ansible for configuration management.
- Experience with relational databases such as SQL, DB2, or HBase.
- Background building distributed, server-based infrastructure supporting a high volume of transactions in a high-demand environment.
- Superb communication skills and the ability to work closely with operations and development teams.
- Demonstrated ability to work on small, focused teams to complete critical achievements with tight deadlines.
- You aim to take initiative and own issues.


Working in Apple's Hadoop SRE team is the definition of variety. The team provides infrastructure and support for large Internet-facing applications by building and maintaining Hadoop clusters and the application stack.

On any given day, you may find one member of the team fault-finding low-level driver issues, while another is analyzing a highly distributed workflow. As a team member, you will be empowered to work across many organizational units to improve and expand our current deployment. This may include:
- Design and implement new software to streamline system automation.
- Plan for future requirements or new projects.
- Triage production issues with other operational teams when they occur.
- Conduct ongoing maintenance across our large-scale deployments around the world.
- Participate in new builds which back new releases.


BS or MS in CS or equivalent


Similar Jobs

Intern Product Development VMC On AWS Site Reliability Engineering (Austin TX)

VMware, Inc.

Posted 7 days ago

Business Summary:
VMware is a global leader in cloud infrastructure and business mobility. VMware accelerates customers' digital transformation journey by enabling enterprises to master a software-defined approach to business and IT. With VMware solutions, organizations are creating exceptional experiences by mobilizing everything, responding faster to opportunities with modern data and apps hosted across hybrid clouds, and safeguarding customer trust with a defense-in-depth approach to cybersecurity. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Do you dare to do the stuff you've always dreamed about? Dare to explore at

Job Role:
Ensure that VMware Cloud on AWS operates at high reliability and performance at scale with minimum human touch for our customers. The VMC on AWS Site Reliability Engineering team is looking for quality software developers with a diverse set of experiences and skill sets to build and run the exciting new VMware Cloud on AWS services. As a SDDCaaS SRE developer you will provide service insight, response, and service management to maintain high service reliability with low touch through extensible services/platforms, standardized processes, data insights, and product input.

Intern Responsibilities:
Responsibilities for Service Health, Service Management, Orchestration, and Remediation & Troubleshooting: As an intern, you'll join our diverse team and be responsible for the VMC on AWS service and all aspects of it in production, including the user experience. This includes designing and developing solutions to improve service monitoring, availability, performance, and security. You will build services that enrich monitoring and automation through data analytics and applied tooling (ML, Clustering, anomaly detection, AI, etc.).

Through Service Response (Incident management, problem management, and participation on the globally staffed Service Watch) you will use metrics and health systems to ensure performance, scalability, and reliability. You will ensure proper metrics are implemented to measure service health and drive error budgeting. Through partnerships you foster with the development teams you will support new features, services, releases, and become an authority in our services. You will focus on building solutions to better operate our services at scale: auto remediation, reducing manual intervention during production incidents, service metrics, monitoring, process automation, data integrity, and service turn-up/turn-down.

Requirements:
* Experience engineering, operating, troubleshooting, administrating and scaling online services
* Proficient in at least one of the following languages: Java, Python, Go, or Ruby
* At least one of the following specialties: storage, networking, systems or nonlinear distributed systems
* Strong communication and interpersonal skills
* Excellent troubleshooting, critical thinking, and data analysis skills
* Systematic problem solving approach, coupled with a strong sense of ownership and drive
* Able to balance multiple tasks and projects effectively and quickly adapt to new variables
* Professional and open-minded attitude
* Be part of a 7x24 service watch rotation, using a follow-the-sun model

Minimum Qualifications:
* Pursuing a BS or MS in Computer Science or related technical field
* Experience in DevOps, Operations Engineering, or SRE (development for large online services) in projects or hands-on work experience
* Experience building and operating highly available and scalable infrastructure solutions in projects or hands-on work experience
* Experience in one or more programming languages like: Java, Python, Go
* A tenacious ability to diagnose and fix performance and reliability problems

Preferred qualifications:
* Experience with container orchestrators (Kubernetes, Docker Native Orchestration, Mesos, Docker Swarm)
* Experience with NoSQL technologies (e.g. Cassandra, MongoDB, Redis, etc.) and/or search-based datastores and libraries (Lucene, Solr, etc.)
* Experience with configuration management tools such as Puppet or Chef
* Experience in VMware products, specifically Cloud related solutions such as: vSphere, vCenter, ESXi, vSAN or contending cloud solutions and products
* Experience with one or more of the following: Data Engineering, Machine Learning, Clustering, Anomaly detection, A/B testing
* Operational experience with networking (WAN or LAN) and an understanding of network theory
* Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols
* You are able to document and version APIs

We're looking for an intern in the following location for this opportunity: Austin, TX
