Data Engineer - Personalization

Spotify, Somerville, MA 02143

Posted 7 months ago

The Personalization team makes deciding what to play next on Spotify easier and more enjoyable for every listener. We seek to understand the world of music and podcasts better than anyone else so that we can make great recommendations to every individual person and keep the world listening.

Every day, hundreds of millions of people all over the world use the products we build, which include destinations like "Home" and "Search" as well as original playlists such as "Discover Weekly" and "Daily Mix." We're a team of technologists, product insight experts, designers, and product managers in Boston, New York, Stockholm, and London.

We are looking for data engineers who will build data-driven solutions to deliver music and digital media experiences to our 100 million active users by analyzing our on-platform usage data, understanding our data from an off-platform perspective, and improving the accuracy and precision of our data. Above all, your work will impact the way the world experiences music.

What you'll do

  • Continuously design, develop, and test data-driven solutions

  • Work with state-of-the-art data processing frameworks and technologies

  • Help drive the company-wide advancement of Spotify's data infrastructure, tooling and processes

  • Improve data quality through testing, tooling and continuously evaluating performance

  • Collaborate with software engineers, ML experts and others

  • Work in cross-functional agile teams to regularly experiment, iterate, and deliver on new product objectives with an end-to-end responsibility for your squad's mission

  • Work from our office in Boston, Massachusetts

Who you are

  • You have a proven record of personally taking large data projects from ideation to implementation

  • You have experience architecting and operating large data pipelines

  • You know how to work with high-volume, heterogeneous data using distributed systems such as Hadoop

  • You are an expert in one or more higher-level JVM-based data processing frameworks such as Crunch, Scalding, Storm, Spark, or Dataflow (not just Pig/Hive/BigQuery/other SQL-like abstractions)

  • You are knowledgeable about data modeling, data access, and data storage techniques

  • You care about agile software processes, data-driven development, reliability, and responsible experimentation

  • You are passionate about creating clean code and have a strong foundation in coding and building data pipelines

  • Preferably, you have worked on open-source or other data-related projects

We are proud to foster a workplace free from discrimination. We strongly believe that diversity of experience, perspectives, and background will lead to a better environment for our employees and a better product for our users and our creators. This is something we value deeply and we encourage everyone to come be a part of changing the way the world listens to music.


Similar Jobs

Principal Data Engineer

Partners Healthcare System

Posted 6 days ago

As a not-for-profit organization, Partners HealthCare is committed to supporting patient care, research, teaching, and service to the community by leading innovation across our system. Founded by Brigham and Women's Hospital and Massachusetts General Hospital, Partners HealthCare supports a complete continuum of care, including community and specialty hospitals, a managed care organization, a physician network, community health centers, home care, and other health-related entities. Several of our hospitals are teaching affiliates of Harvard Medical School, and our system is a national leader in biomedical research.

We're focused on a people-first culture for our system's patients and our professional family. That's why we provide our employees with more ways to achieve their potential. Partners HealthCare is committed to aligning our employees' personal aspirations with projects that match their capabilities and to creating a culture that empowers our managers to become trusted mentors. We support each member of our team to own their personal development, and we recognize success at every step. Our employees use the Partners HealthCare values to govern decisions, actions, and behaviors. These values guide how we get our work done (Patients; Affordability; Accountability & Service Commitment; Decisiveness; Innovation & Thoughtful Risk) and how we treat each other (Diversity & Inclusion; Integrity & Respect; Learning, Continuous Improvement & Personal Growth; Teamwork & Collaboration).

Overview

  • We are looking for a self-motivated Principal Data Engineer to join our data engineering team.

  • Design, develop, construct, test, and maintain architectures such as data lakes and large-scale data processing systems

  • Select tools and perform proof-of-concept (POC) analysis for the big data ecosystem

  • Gather and process raw data at scale that meets functional and non-functional business requirements (including writing scripts, REST API calls, SQL queries, etc.)

  • Develop data set processes for data modeling, mining, and production

  • Integrate new data management technologies (e.g., Collibra, Informatica DQ) and software engineering tools into existing structures

  • The candidate will be responsible for participating in building a new data lake in Azure and Hadoop, expanding and optimizing our data platform and data pipeline architecture, and optimizing data flow and collection for cross-functional teams.

  • The ideal candidate is an experienced data pipeline builder who enjoys optimizing data systems and building them from the ground up.

  • The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives and will ensure that an optimal data delivery architecture is consistent throughout ongoing projects.

  • They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.

  • The right candidate will be excited by the prospect of optimizing and/or redesigning our data architecture to support the next generation of products and data initiatives.

Principal Duties and Responsibilities

  • Create and maintain optimal data pipeline architecture, and assemble large, complex data sets that meet functional and non-functional business requirements on cloud-based data platforms (Azure) and relational data systems (SQL Server, SSIS)

  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, etc.

  • Build the data infrastructure required for optimal extraction, transformation, and loading of data from traditional/legacy data sources

  • Work with stakeholders, including the management team, product owners, and architecture teams, to assist with data-related technical issues and support their data infrastructure needs

  • Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader

  • 5+ years of experience architecting and building data lakes, Azure big data architecture, and enterprise analytics solutions, and optimizing "big data" pipelines, architectures, and data sets

  • Advanced hands-on knowledge of at least two of SQL, U-SQL, Python, C#, Java, and PySpark, plus experience working with relational databases for data querying and retrieval

  • Experience with the design and architecture of Azure big data frameworks/tools: Azure Data Lake, Azure Data Factory, Azure Databricks, Azure ML, SQL Data Warehouse, HDInsight

  • Experience with the design, ETL engineering, and architecture of MS SQL Server and Cosmos DB

  • Experience with the design and architecture of SQL Server data security and Azure security, VMs, and VNets

  • Experience building processes supporting data transformation, data structures, metadata, dependency, and workload management

  • Experience working with cross-functional teams in a dynamic environment

  • Experience building big data pipelines with Java and/or Python a plus

  • Strong SQL skills on multiple platforms (MPP systems preferred)

  • Leading development of data lake architectures from scratch

  • Data modeling tools (e.g., Erwin, Visio)

  • 3-5 years of programming experience in Python and/or Java

  • Experience with continuous integration and deployment

  • Strong Unix/Linux skills

  • Experience in petabyte-scale data environments and integration of data from multiple diverse sources

  • Cloud advanced analytics: Azure ML, machine learning, text analysis, NLP

  • Healthcare experience, most notably with clinical data, Epic, payer data, and reference data, is a plus but not mandatory

Skills Required

  • Expertise in SQL Server is a must; Azure Data Lake and relational data warehouse platforms preferred

  • Demonstrated experience in Azure and Hadoop big data technologies (Cloudera, Hortonworks); data lake development is a plus

  • Experience with real-time data processing and analytics products is a plus

  • Experience with Azure big data technologies (Azure Data Lake, Azure Data Factory, Azure Databricks, Azure ML, SQL Data Warehouse, HDInsight) is preferred

  • Experience with large data warehousing environments on at least two database platforms (Oracle, SQL Server, DB2, etc.)

  • Programming experience in Python, Java, and SQL; .NET and C# are good to have

  • ETL and data processing expertise in Azure (Azure Data Factory, Databricks), Hadoop (MapReduce, Spark, Sqoop), SSIS, HealthCatalyst, and Informatica

  • Familiarity with data governance and data quality principles; experience with data quality tools is good to have

  • Ability to independently troubleshoot and performance-tune large-scale data lake and enterprise systems

  • Knowledge of data architecture principles, data lakes, data warehousing, agile development, and DevOps concepts and methodologies

  • Understanding of change management techniques and the ability to apply them

  • Excellent verbal and written communication, problem-solving, and negotiation skills

  • Ability to act as an effective, collaborative team member

Working Conditions

  • Office setting, with some local travel between Partners Healthcare System sites

  • May require occasional travel for training

Partners Healthcare System
Somerville, MA