AWS Machine Learning Blog
DeepLearning.AI, Coursera, and AWS launch the new Practical Data Science Specialization with Amazon SageMaker
Amazon Web Services (AWS), Coursera, and DeepLearning.AI are excited to announce Practical Data Science, a three-course, 10-week, hands-on specialization designed for data professionals to quickly learn the essentials of machine learning (ML) in the AWS Cloud. DeepLearning.AI was founded in 2017 by Andrew Ng, an ML and education pioneer, to fill a need for world-class AI education. DeepLearning.AI teamed up with an all-female team of instructors including Amazon ML Solutions Architects and Developer Advocates to develop and deliver the three-course specialization on Coursera’s education platform. Sign up for the Practical Data Science Specialization today on Coursera.
Moving data science projects from idea to production requires a new set of skills to address the scale and operational efficiencies required by today’s ML problems. This specialization addresses common challenges we hear from our customers and teaches you the practical knowledge needed to efficiently deploy your data science projects at scale in the AWS Cloud.
Specialization overview
The Practical Data Science Specialization helps data-focused developers, scientists, and analysts who are familiar with Python learn how to build, train, and deploy scalable, end-to-end ML pipelines, both automated and human-in-the-loop, in the AWS Cloud. Each of the 10 weeks features a comprehensive, hands-on lab developed specifically for this specialization and hosted by AWS Partner Vocareum. The labs provide hands-on experience with state-of-the-art algorithms for natural language processing (NLP) and natural language understanding (NLU) using Amazon SageMaker and Hugging Face’s highly optimized implementation of the BERT algorithm.
In the first course, you learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You then perform AutoML to automatically train, tune, and deploy the best text classification algorithm for the given dataset using Amazon SageMaker Autopilot. Next, you work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.
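To give a feel for what analyzing a dataset for statistical bias involves, here is a minimal sketch of one common pre-training measure, class imbalance, computed as (n_a - n_d) / (n_a + n_d) for one group versus all others. This is plain Python for illustration only, not the course's lab code or the SageMaker Clarify API; the function name `class_imbalance` and the sample product categories are assumptions made for this example.

```python
from collections import Counter


def class_imbalance(facet_values, advantaged):
    """Return (n_a - n_d) / (n_a + n_d) for one facet value versus the rest.

    A result near 0 means the two groups are balanced; a result near +1 or -1
    means one group dominates the dataset.
    """
    counts = Counter(facet_values)
    n_a = counts[advantaged]              # records in the chosen group
    n_d = sum(counts.values()) - n_a      # records in every other group
    total = n_a + n_d
    if total == 0:
        raise ValueError("facet_values must not be empty")
    return (n_a - n_d) / total


# Hypothetical product categories attached to ten review records
categories = ["dresses"] * 6 + ["tops"] * 3 + ["pants"] * 1
print(class_imbalance(categories, "dresses"))  # 0.2
```

A value of 0.2 here says the "dresses" group is moderately over-represented, which is the kind of signal you might act on by rebalancing the dataset before training a classifier.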
In the second course, you learn to automate an NLP task by building an end-to-end ML pipeline using BERT with Amazon SageMaker Pipelines. Your pipeline first transforms the dataset into BERT-readable features and stores the features in the Amazon SageMaker Feature Store. It then fine-tunes a text classification model to the dataset using a Hugging Face pre-trained model that has learned to understand human language from millions of Wikipedia documents. Finally, your pipeline evaluates the model’s accuracy and only deploys the model if the accuracy exceeds a given threshold.
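The final step of the pipeline described above, deploying only when accuracy exceeds a threshold, can be sketched in a few lines of plain Python. In Amazon SageMaker Pipelines this kind of gate is expressed as a condition step in the pipeline definition; the sketch below does not use the SageMaker SDK, and the names `accuracy`, `evaluate_and_gate`, and the `deploy` callback are hypothetical helpers introduced only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def evaluate_and_gate(y_true, y_pred, min_accuracy, deploy):
    """Call deploy() only if accuracy strictly exceeds min_accuracy."""
    acc = accuracy(y_true, y_pred)
    deployed = acc > min_accuracy
    if deployed:
        deploy()
    return acc, deployed


# Hypothetical sentiment labels: -1 negative, 0 neutral, 1 positive
y_true = [1, 0, -1, 1, 0]
y_pred = [1, 0, -1, 0, 0]

deployments = []  # stands in for a real deployment action
acc, deployed = evaluate_and_gate(
    y_true, y_pred, min_accuracy=0.75, deploy=lambda: deployments.append("model-v1")
)
print(acc, deployed)  # 0.8 True
```

Keeping the gate as a separate, explicit step means a model that regresses on the evaluation set never reaches the endpoint without someone changing the threshold on purpose.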
In the third course, you learn a series of performance-improvement and cost-reduction techniques to automatically tune model accuracy, compare prediction performance, and generate new training data with human intelligence. After tuning your text classifier using hyperparameter tuning, you deploy two model candidates into an A/B test to compare their real-time prediction performance and automatically scale the winning model using Amazon SageMaker Hosting. Lastly, you set up a human-in-the-loop pipeline to fix misclassified predictions and generate new training data using Amazon Augmented AI (Amazon A2I) and Amazon SageMaker Ground Truth.
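As a rough illustration of the A/B test idea, the following sketch splits simulated traffic between two model candidates by weight and then picks the candidate with the higher observed accuracy. It is a toy simulation in plain Python, not the SageMaker Hosting API; the variant names, the 50/50 weights, and the simulated accuracy figures are all assumptions chosen for the example.

```python
import random


def route_request(traffic_weights, rng):
    """Pick a variant name at random, in proportion to its traffic weight."""
    names = list(traffic_weights)
    weights = [traffic_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_winner(outcomes):
    """Return the variant with the highest fraction of correct predictions."""
    rates = {
        name: sum(results) / len(results)
        for name, results in outcomes.items()
        if results  # skip variants that received no traffic
    }
    if not rates:
        raise ValueError("no variant received any traffic")
    winner = max(rates, key=rates.get)
    return winner, rates


rng = random.Random(42)  # fixed seed so the simulation is repeatable
traffic_weights = {"VariantA": 0.5, "VariantB": 0.5}
simulated_accuracy = {"VariantA": 0.70, "VariantB": 0.90}  # made-up values

# Record, per variant, whether each simulated prediction was correct.
outcomes = {name: [] for name in traffic_weights}
for _ in range(2000):
    name = route_request(traffic_weights, rng)
    outcomes[name].append(rng.random() < simulated_accuracy[name])

winner, rates = pick_winner(outcomes)
print(winner, rates)
```

In a real test you would log true prediction outcomes instead of simulating them, then shift the traffic weights toward the winning variant once the difference is convincing.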
“The field of data science is constantly evolving with new tools, technologies, and methods,” says Betty Vandenbosch, Chief Content Officer at Coursera. “We’re excited to expand our collaboration with DeepLearning.AI and AWS to help data scientists around the world keep up with the many tools at their disposal. Through hands-on learning, cutting-edge technology, and expert instruction, this new content will help learners acquire the latest job-relevant data science skills.”
Register today
The Practical Data Science Specialization from DeepLearning.AI, AWS, and Coursera teaches AI and ML essentials in the cloud. Its three courses show you how to build and operationalize data science projects efficiently with the depth and breadth of Amazon ML services. Improve your data science skills by signing up for the Practical Data Science Specialization today at Coursera!
About the Authors
Antje Barth is a Senior Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS). She is a co-author of the O’Reilly book Data Science on AWS. Antje frequently speaks at AI/ML conferences, events, and meetups around the world. Previously, Antje worked in technical evangelism and solutions engineering at Cisco and MapR, focused on data center technologies, big data, and AI applications. Antje co-founded the Düsseldorf chapter of Women in Big Data.
Chris Fregly is a Principal Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS). He is a co-author of the O’Reilly book Data Science on AWS. Chris has founded multiple global meetups focused on Apache Spark, TensorFlow, and Kubeflow. He regularly speaks at AI/ML conferences worldwide, including O’Reilly AI & Strata, Open Data Science Conference (ODSC), and GPU Technology Conference (GTC). Previously, Chris founded PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker.
Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds 6 AWS certifications and has been in technology for 23 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.
Sireesha Muppala is an Enterprise Principal SA, AI/ML at Amazon Web Services (AWS) who guides customers on architecting and implementing machine learning solutions at scale. She received her Ph.D. in Computer Science from the University of Colorado, Colorado Springs, and has authored several research papers, whitepapers, and blog articles. Sireesha frequently speaks at industry conferences, events, and meetups. She co-founded the Denver chapter of Women in Big Data.