Data Engineer II, ML Ops

Generate Biomedicines, Somerville, MA 02143

Posted 3 weeks ago

About Generate:Biomedicines

Generate:Biomedicines is a new kind of therapeutics company - existing at the intersection of machine learning, biological engineering, and medicine - pioneering Generative Biology to create breakthrough medicines where novel therapeutics are computationally generated, instead of being discovered. The Company has built a machine learning-powered biomedicines platform with the potential to generate new drugs across a wide range of biologic modalities. This platform represents a potentially fundamental shift in what is possible in the field of biotherapeutic development.

We pursue this audacious vision because we believe in the unique and revolutionary power of generative biology to radically transform the lives of billions, with an outsized opportunity for patients in need. We are seeking collaborative, relentless problem solvers who share our passion for impact to join us!

Generate:Biomedicines was founded in 2018 by Flagship Pioneering and has received nearly $700 million in funding, providing the resources to rapidly scale the organization. The Company has offices in Somerville and Andover, Massachusetts with over 300 employees.

The Role:

We are seeking a creative and motivated ML (Machine Learning) Ops Data Engineer to help us build a cutting-edge data platform that will empower Generate's machine learning research. As an integral member of the ML Ops group, you will play a key role in data warehousing, ETL, and optimizing data usage during model training across a diverse array of biological datasets. The successful candidate will collaborate closely with ML Scientists, Computational Biologists, and Informatics/IT Engineers to develop scalable data systems that rapidly advance our scientific programs. This role is based in our Somerville, MA office with flexibility for hybrid work.

Here's how you will contribute:

  • Assist with the design, implementation and maintenance of performant, scalable ETL pipelines

  • Expand and refine Generate's data warehousing capabilities

  • Engage with multidisciplinary research teams to develop and optimize data models tailored to accommodate diverse biological datasets

  • Support the management and improvement of the cloud infrastructure backing our data platform

  • Develop and integrate APIs to streamline data flow and support the automation of machine learning pipelines and data management tasks

  • Champion data engineering best practices, contributing to the development of, and adherence to, standards that enhance data quality, system reliability and workflow efficiency

The Ideal Candidate will have:

  • 3+ years of experience working in a data engineer role

  • Bachelor's or Master's degree in computer science or a similar field

  • Extensive experience with major Cloud Service Providers (CSPs) such as AWS, Azure, and Google Cloud Platform (GCP), with a strong understanding of cloud-based solutions and infrastructure services

  • Advanced knowledge and understanding of data warehouse technology such as Redshift or BigQuery

  • Demonstrated ability in constructing large-scale ETL pipelines using popular frameworks such as Apache Airflow or Prefect

  • Proficiency in Python and strong object-oriented design skills coupled with a solid understanding of data structures and algorithms

  • Experience designing, deploying and managing database systems

  • Strong understanding of machine learning fundamentals

  • A strong interest in leveraging data engineering skills to unlock insights from complex biological datasets

  • Exceptional communication skills, with the ability to articulate complex data concepts in a way that is accessible and compelling to both technical and non-technical stakeholders

#LI-HM1

Generate:Biomedicines is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.

COVID Safety:

Generate:Biomedicines enforces a mandatory vaccination policy for COVID-19. All employees must be fully vaccinated and have received a booster. The purpose of this policy is to safeguard the health of our employees, their families, and the community at large from infectious disease that may be reduced by vaccinations. The Company will make exceptions to this policy if required by applicable law and will consider requests for an exemption from this policy due to a medical reason, or because of a sincerely held religious belief, or any other exemptions that may be recognized by applicable law.

Recruitment & Staffing Agencies: Generate:Biomedicines does not accept unsolicited resumes from any source other than candidates. The submission of unsolicited resumes by recruitment or staffing agencies to Generate:Biomedicines or its employees is strictly prohibited unless contacted directly by the Company's internal Talent Acquisition team. Any resume submitted by an agency in the absence of a signed agreement will automatically become the property of Generate:Biomedicines and the Company will not owe any referral or other fees with respect thereto.

