Senior Database Reliability Engineer (DBRE) (Remote)

FireEye, Boston, MA 02298

Posted 1 week ago

FireEye is the leader in intelligence-led security-as-a-service. Working as a seamless, scalable extension of customer security operations, FireEye offers a single platform that blends innovative security technologies, nation-state grade threat intelligence, and world-renowned Mandiant consulting. With this approach, FireEye eliminates the complexity and burden of cyber security for organizations struggling to prepare for, prevent, and respond to cyber attacks. FireEye has over 7,000 customers across 67 countries, including more than 45 percent of the Forbes Global 2000.

FireEye is seeking a Database Reliability Engineer (DBRE) to help manage, operate and scale FireEye's Data Platform. Reporting directly to the Data Engineering Leader, the DBRE will be responsible for keeping the data layer systems that support user-facing services running smoothly 24/7/365.

DBREs are a blend of database engineers, administration gearheads and software crafters who apply best-practice engineering principles, operational discipline and mature automation, specializing in databases (PostgreSQL and Cassandra in particular). In that capacity, DBREs are peers to SREs and bring data layer expertise to the SRE, Infrastructure and Engineering teams.

The Cloud Data Engineering team's responsibilities include:

  • Provide Databases as a Service to Product Engineering teams

      o RDBMS, Cassandra, Elasticsearch, Kafka

  • Provide guidance and best practices on how to design DB Schemas for Cloud Scale

  • Maintain and support the Data Science Infrastructure

  • Populate the Data Science datalake from various product data sources

Responsibilities

  • Work on the data layer reliability and performance for FireEye's cloud ecosystem

  • Work on observability of relevant database metrics and make sure we meet our SLOs

  • Work with peer SREs to migrate and to roll out changes to our production environment

  • Mitigate data layer-related production incidents and properly document them

  • Support and debug database production issues across services and levels of the stack.

  • On-call support on rotation with the team.

  • Document every action so your learnings turn into repeatable actions and then into automation.

  • Provide data layer expertise to engineering teams

  • Work on automation of database infrastructure and help engineering succeed by providing self-service tools.

  • Make monitoring and alerting fire on symptoms and not on outages.

  • Work closely with other functional groups to define priorities, direction and timelines

  • Collaborate closely with Product Engineers and Product & Program Management teams in an agile engineering environment

Qualifications

  • You have excellent communication and interpersonal skills and above all, you are a team player!

  • Able to code in a modern high-level programming language (Python, Ruby, Groovy, etc.)

  • 10+ years of experience in managing datastores in a Cloud scale environment

  • Deep domain knowledge in at least one of the data stores listed above

  • Strong listening, communication, and organizational skills

  • Experience working in a distributed remote environment

  • Strong passion to understand, learn, and evaluate new technologies

All your information will be kept confidential according to EEO guidelines.



