As a not-for-profit organization, Partners HealthCare is committed to supporting patient care, research, teaching, and service to the community by leading innovation across our system. Founded by Brigham and Women's Hospital and Massachusetts General Hospital, Partners HealthCare supports a complete continuum of care including community and specialty hospitals, a managed care organization, a physician network, community health centers, home care and other health-related entities. Several of our hospitals are teaching affiliates of Harvard Medical School, and our system is a national leader in biomedical research.
We're focused on a people-first culture for our system's patients and our professional family. That's why we provide our employees with more ways to achieve their potential. Partners HealthCare is committed to aligning our employees' personal aspirations with projects that match their capabilities and creating a culture that empowers our managers to become trusted mentors. We support each member of our team to own their personal development, and we recognize success at every step.
Our employees use the Partners HealthCare values to govern decisions, actions and behaviors. These values guide how we get our work done: Patients, Affordability, Accountability & Service Commitment, Decisiveness, Innovation & Thoughtful Risk; and how we treat each other: Diversity & Inclusion, Integrity & Respect, Learning, Continuous Improvement & Personal Growth, Teamwork & Collaboration.
Principal Duties and Responsibilities:
Design, create, build, integrate, maintain and optimize multiple ETL data pipelines.
Aggregate and transform raw data coming from a variety of data sources to fulfill the functional & non-functional requirements (e.g., Microsoft SQL, Apache Hive, Apache HBase, Enterprise Data Warehouse, bedside monitors (HL7), EEG recordings (waveforms), web services, and others).
Design, create, optimize and maintain conceptual/physical data models, data catalogues and data architecture diagrams.
Actively participate in all facets of data lake development: business analysis, requirements gathering, functional and technical specification, infrastructure definition, data architecture design, development, implementation, testing, deployment, and support of new applications.
Create and maintain related documentation on Confluence including data models, dataflow diagrams, integration schemas, interoperability relationships, etc.
Use the Partners HealthCare values to govern decisions, actions and behaviors.
Perform other duties as assigned or required by the situation and circumstances.
Qualifications:
Bachelor's Degree in Computer Science, or other technical degree. Master's degree strongly preferred.
5 years of in-depth experience with Python and at least one additional programming language (R, MATLAB, C++, Java, Scala, Spark).
5 years of experience designing, coding, testing and debugging multiple ETL integration interfaces of varying size and complexity.
3 years of experience with schema design, data architecting and dimensional data modeling (star schema).
3 years of experience writing complex SQL statements to extract data from data lakes.
Hands-on experience of Hadoop-based technologies for distributed near-real-time processing is highly desired.
Previous experience in the healthcare industry is highly desired.
Proficiency in Python, shell scripting (Bash) and SQL is required.
Experience working with large volumes of structured, semi-structured & unstructured data.
Hands-on experience with Big Data frameworks/Hadoop-based technologies (Spark, Kafka, Hive, HBase, Sqoop, Ranger, HDFS) is required.
In-depth experience designing, developing, and implementing pipelines to perform various ETL, cleaning, integration and scrubbing tasks.
Hands-on experience creating and maintaining multidimensional data models, conceptual/physical data models, data catalogues and data architecture diagrams.
Strong understanding of Object-oriented Programming (OOP), DevOps principles, design patterns, CI/CD (GitLab CI, Jenkins), code version control (GitLab, BitBucket), and industry software development best practices.
Exposure to the entire software development life cycle, supporting planned releases to different environments, such as QA, staging and production environments.
Experience working in an Agile/Scrum environment and familiarity with Jira and Confluence tools.
Strong verbal and written communication skills, with the ability to write clear technical documentation.
Hands-on experience with near-real-time data processing streaming (Kafka, Storm, Apache NiFi) is a plus.
Experience with healthcare interoperability standards including HL7 messaging, DICOM or FHIR is a plus.
Experience with Azure, GCP, AWS or other cloud providers is a plus.
Partners HealthCare System