Data Science Specialist (2019140)

Management Science Associates, Inc. Pittsburgh, PA 15201

Posted 3 months ago

Overview: Management Science Associates, Inc. (MSA) is a diversified information management company that for nearly half a century has given market leaders the competitive edge in data management, analytics and technology. We are currently seeking a Data Science Specialist (Intern) to work within MSA's Corporate Business Development Group, which partners with market-leading client companies to address their most challenging business issues and opportunities. If you're a highly talented, creative analyst looking to rapidly advance your skills and career, MSA has an opportunity for you. Our business is applying data and analytics in innovative ways to solve complex problems, often inventing tools and technology that become industry-standard solutions. You'll work with leading technology and business experts in their respective fields, both within MSA and at our clients. Because of the diversity of MSA's businesses and clients, you'll gain highly valuable experience deploying leading customer-focused analytics across a number of global industries, such as consumer products, retailing, pharmaceuticals, healthcare, and media. Join our team to contribute to creating these leading practices.

Responsibilities:

  • Develop predictive and prescriptive models, and perform data mining on very large data sets using machine learning techniques

  • Test and debug analytical code. Resolve unexpected problems resulting from unusual data and from limitations inherent in software applications and data processing resources.

  • Create visualizations for presentations to business users

  • Conduct independent and collaborative analytical work on client projects led by higher-level team members
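
To give a flavor of the modeling and visualization responsibilities above, here is a minimal, purely illustrative sketch in Python with scikit-learn. The data set and every name in it are invented for illustration and are not from MSA:

```python
# Illustrative sketch only: a minimal predictive-modeling workflow
# (train/test split, fit, evaluate) on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for a "very large data set": two numeric features, binary outcome.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {acc:.2f}")
```

In practice the held-out metric (here, accuracy) would feed the kind of visualizations and presentations for business users that the role describes.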

Required Skills:

  • Bachelor's degree in computer science, statistics, mathematics, physics, or a related discipline that includes statistics, machine learning and advanced computational concepts and techniques

  • Exposure to Data Science methods, tools and software. Some experience, classroom or real-world, with languages such as R or Python

  • Exposure to one or more commonly used machine learning libraries such as those found in R, Python's scikit-learn or Spark's MLlib

  • Experience with one or more programming languages such as Java, C#, C++, Scala, etc., scripting languages such as Python or Perl and visualization tools such as Tableau

  • Exposure to relational database technologies and SQL

  • Strong work ethic and commitment to getting the job done

  • Accepting of change and uncertainty

  • Ability to take direction and specifications and create machine learning or artificial intelligence (AI) applications quickly and efficiently

  • Ability to troubleshoot and solve routine problems and issues as they occur in the development of software and systems

  • Ability to learn and utilize new languages, tools and techniques

  • High quality of work and attention to detail with good organizational ability

  • Self-motivated for continued learning and growth


Similar Jobs

Data Scientist, Log Analytics (Computer Science Department)

Carnegie Mellon University

Posted 4 days ago

Posted 1/21/2020 12:00:00 AM (2020-04-20T00:00)

Job Function: The Technology for Effective and Efficient Learning (TEEL) Lab is seeking a Data Scientist to join our team developing a project-based Online Social Learning platform for workforce training at scale. The goal of this project in the TEEL lab is to develop a portable and interoperable online learning ecosystem that enables effective and efficient learning by leveraging social interactions between students as a substantial learning resource. Furthermore, in addition to large-scale software development, the lab conducts studies of student learning and evaluates innovative approaches for incorporating social learning as a driver for developing cognitive skills and motivation through reflection, interaction, and cohort building.

Core responsibilities will include:

  • Design and build the infrastructure to describe, collect and exchange learning activity data and Technology Enhanced Learning (TEL) tool usage data from various resources to enable research experiments

  • Support these research-driven endeavors: one of the team's goals is to use evidence-based learning science to create effective online learning, which requires ongoing data collection and analysis of the learning process

  • Requirements gathering, design, and implementation of a data pipeline to support learning research. Such a data pipeline is composed of data ingestion, data persistence technologies (in the form of a data lake, data warehouse, etc.), ETL, and analytics tools to aggregate and consume various data streams (Spark, Hadoop, etc.). You will also develop and automate reports, and iteratively build and prototype dashboards to provide insights at scale and test hypotheses

  • Design and build a logging collection, storage and analytics solution to support online education platforms, collecting course-based learning logs from both Learning Management System (LMS) microservices and external cloud platform logs. The framework will provide function-specific interactive services for data querying and visualization, report generation and notification

  • Design and adopt a logging standard (e.g., IMS Caliper) to model learning activity events with relevant context from various sources that help facilitate learning

  • Establish a common vocabulary for describing learning interactions (including social interactions between students in a variety of channels, beginning with, but not limited to, text-based interactions) to promote data interoperability and sharing

  • Collect heterogeneous logs from various sources and move them into a data lake. The logs include but are not limited to:
      - Submission attempts and grading outcomes evaluated by an auto-grading service
      - Student enrollment and demographic data
      - Student pageview logs on the LMS platform
      - Chat contributions
      - Discussion Forum contributions
      - Interaction data from Blogs (including editing behavior logs)
      - Cloud resource usage logs on cloud platforms such as Azure
      - Logs from other microservices

  • Perform Extract, Transform and Load (ETL) to transform various logs into the common logging standard and store them in a data lake using a fully managed ETL service

  • Define the data schema and store structural and operational metadata using a metadata repository

  • Embed data hooks into the LMS platform and the TEL tools to feed data to the new data pipeline

  • Analyze and visualize the transformed logs to answer research questions, which may require the ability to:
      - Annotate textual data
      - Apply and extend text mining tools
      - Validate through statistical significance which behaviors and content correlate with performance and consistently produce the desired learning outcomes
      - Compare the effectiveness of different content or interaction types through A/B testing
      - Establish predictive measures to support early intervention systems
      - Select and implement appropriate statistical models
      - Critically evaluate statistical models and communicate analyses to stakeholders
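
To illustrate the log-normalization step described above, here is a purely hypothetical sketch of mapping a raw LMS log record into a simplified, Caliper-flavored event. The field names only approximate the IMS Caliper event shape (actor, action, object, eventTime) and are not a compliant implementation; all record values are invented:

```python
# Illustrative sketch only: normalize a raw LMS log record into a
# simplified event dict loosely modeled on the IMS Caliper structure.
def to_learning_event(raw: dict) -> dict:
    return {
        "type": "Event",
        "actor": {"id": f"urn:student:{raw['student_id']}", "type": "Person"},
        "action": raw["verb"].capitalize(),   # e.g. "Viewed", "Submitted"
        "object": {"id": raw["resource"], "type": "DigitalResource"},
        "eventTime": raw["timestamp"],        # ISO-8601 string passed through
    }

# Hypothetical raw pageview record from an LMS microservice log.
raw_log = {
    "student_id": "12345",
    "verb": "viewed",
    "resource": "lms://course/example/page/intro",
    "timestamp": "2020-01-21T12:00:00Z",
}
event = to_learning_event(raw_log)
print(event["action"])  # → Viewed
```

An ETL job would apply a transform like this to each heterogeneous log stream before landing the common-format events in the data lake.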
Qualifications:

  • At least a Bachelor's degree or higher in statistics or computer science
  • At least two (2) years of professional experience, with at least one (1) year spent building, deploying and troubleshooting logging pipelines for production systems using the ELK stack (Elasticsearch, Logstash, and Kibana) and related technologies (Fluentd, Kafka, etc.)
  • At least one (1) year of experience applying basic text mining or NLP tools
  • At least one (1) year of experience with commercial cloud services including Amazon Web Services (AWS), Google Cloud Platform (GCP) or Microsoft Azure
  • Demonstrated skills in agile development for scalable ETL pipelines
  • Experience with Test-Driven Development
  • Experience with RESTful web services
  • Experience with RESTful API specification and the toolset (OpenAPI, Swagger)
  • Experience with functional programming and parallel computation (Standard ML)
  • Experience with statistical data analysis such as linear models, multivariate analysis, stochastic models, and sampling methods

Preferred qualifications:

  • Experience working in an agile development environment
  • Experience building LMS and/or TEL tools and embedding data hooks to predict student performance and evaluate the efficacy of educational experiments to improve student learning
  • Experience with quantitative and/or qualitative research methods and educational data mining, especially in the area of Discourse Analytics
  • Exposure to logging standards such as IMS Caliper and xAPI to produce learning analytics
  • Familiarity with CI/CD tools (Jenkins, Travis CI), containerized microservices (Docker, Kubernetes), serverless, and infrastructure automation tools (Terraform)
  • Familiarity with data lakes and data warehousing
  • Strong track record of developing machine learning and analytics workflows using cloud-based services
How we work in the TEEL Lab:

  • Learner-centered decision making
  • Fast-paced, research-based environment
  • Ability to work independently, take ownership of tasks and deliver high-quality work
  • Effective collaboration within a team environment
  • Effective project and time management skills
  • Ability to respond to urgent requests for deployed services
  • Ability to communicate with engineers, researchers, students, and CSP partners

Requirements:

  • Background check

More Information: Please visit "Why Carnegie Mellon" to learn more about how we challenge the curious and passionate to imagine and deliver work that matters. Our benefits philosophy encompasses three driving priorities: Choice, Control and Well-being. Learn more about our outstanding benefits here. Carnegie Mellon University is an Equal Opportunity Employer/Disability/Veteran.

FT/PT Status: Full Time
Minimal Education Level: Master's Degree or equivalent
Salary: 96302
Carnegie Mellon University, Pittsburgh, PA
