Our mission is to make biology easier to engineer, and to put data at the center of that work through the partnership of Ginkgo Bioworks and Joyn Bio. Ginkgo is constructing, editing, and redesigning the living world to address the globe's growing challenges in health, energy, food, materials, and more. Our bioengineers use an in-house automated foundry to design and build new organisms; today, our foundry is developing over 40 different organisms to make products across multiple industries. Joyn Bio is designing microbes for sustainable agriculture: probiotics for plants. We're building the codebase, compiler, debugger, and data analysis tools for life. We're looking for an experienced Senior Software Data Engineer who is interested in architecting the software platform for analytics and machine learning that will ultimately help define how our bioengineering is performed at scale.
Ginkgo's programming languages of choice are Python, SQL, and DNA, but you must be someone who loves writing elegant code in any language. Most importantly, you should be passionate about making biology the next engineering discipline. The 20th century was all about bits and the awesome technology of computers. The 21st century is all about atoms and the awesome technology of biology, and Ginkgo is at the forefront of this revolution. As an experienced data pipeline builder and data wrangler who enjoys building data systems from the ground up, you're excited by the prospect of optimizing (or even redesigning) Ginkgo's data architecture to support our next generation of products and data initiatives. You'll be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. You'll also support our software developers, database architects, data analysts, and data scientists on data initiatives, and ensure that optimal data delivery architecture is consistent across ongoing projects.
You will work in close collaboration with the data science team at Joyn Bio to address the data needs of shared projects.

Responsibilities
Create and maintain optimal data pipeline architecture
Identify, design, and implement internal process improvements, including automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies
Use appropriate tools to analyze the data pipeline and provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
Work with stakeholders, including the Executive, Product, Data Science, Design, and Computational Biology teams, to assist with data-related technical issues and support their data infrastructure needs
Keep our data secure
Desired Experience and Capabilities
Master's degree in Computer Science, Statistics, Informatics, Information Systems or related quantitative field
At least five years of data engineering experience
Advanced knowledge of database design best practices, as well as experience working with relational databases, data warehouses, and big data platforms
Proven capability of building and optimizing 'big data' data pipelines, architectures, and data sets
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
Strong analytical skills in relation to working with unstructured datasets
Experience building processes that support data transformation, data structures, metadata, dependency, and workload management
Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores
Strong project management and organizational skills
High level of comfort with supporting the data needs of multiple teams, systems, and products
Strong level of motivation and self-direction
Desired Software Tools/Expertise
Big data tools: Hadoop, Hive, Spark, Kafka, etc.
Relational SQL databases, including Redshift
Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
AWS cloud services: EC2, EMR, RDS, Redshift
Object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
Linux (working knowledge)
To learn more about Ginkgo, check out some recent press:
Ginkgo Bioworks Is Turning Human Cells into On-Demand Factories (WIRED)
Can This Company Convince You to Love GMOs? (The Atlantic)
Hundreds Of Millions Of Dollars Pour Into Hacking Microbes (Forbes)
This food tech startup just raised $90 million to make it easier to invent new plant-based meats (Fast Company)
We also feel it's important to point out the obvious here: there's a serious lack of diversity in our industry, and it needs to change. Our goal is to help drive that change. Ginkgo is deeply committed to diversity, equality, and inclusion in all of its practices, especially when it comes to growing our team. We hope to continue to build a company whose culture promotes inclusion and embraces how rewarding it is to work with engineers from all walks of life. Making biology easier to engineer is a tough nut to crack, and we can't afford to leave any talent untapped.