By the end of this project, you will know how to create machine learning pipelines using Python and Spark, both free, open-source tools that you can download. You will load your dataset in Spark and apply basic cleaning techniques, such as removing columns with a high proportion of missing values and removing rows that contain missing values. You will then create a machine learning pipeline with a random forest regression model, and use cross-validation and parameter tuning to select the best model from the pipeline. Lastly, you will evaluate your model’s performance using several metrics.
Building Machine Learning Pipelines in PySpark MLlib
Taught in English
Instructor: Dr. Nikunj Maheshwari
3,770 already enrolled
Guided Project
(59 reviews)
What you'll learn
Learn how to create a Random Forest pipeline in PySpark
Learn how to choose the best model parameters using cross-validation and hyperparameter tuning in PySpark
Learn how to create predictions and assess the model's performance in PySpark
About this Guided Project
Learn step-by-step
In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:
Install Spark on Google Colab and load a dataset in PySpark
Describe and clean your dataset
Create a Random Forest pipeline to predict car prices
Create a cross validator for hyperparameter tuning
Train your model and predict test set car prices
Evaluate your model’s performance via several metrics
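The final step mentions several metrics without naming them. As an assumption, the sketch below uses three common regression metrics (RMSE, MAE, and R²) and computes them in plain Python so the arithmetic is visible. In PySpark, the same quantities are available from `RegressionEvaluator` by setting `metricName` to `"rmse"`, `"mae"`, or `"r2"`. The function name `regression_metrics` and the sample prices are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length lists of numbers."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when every true value is identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Hypothetical actual and predicted car prices.
actual = [10000.0, 15000.0, 20000.0]
predicted = [11000.0, 14000.0, 20000.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are expressed in the units of the target (here, price), while R² is unitless, which is why reporting more than one metric gives a fuller picture of model quality.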
Recommended experience
The learner should be familiar with Python and basic machine learning algorithms.
How you'll learn
Skill-based, hands-on learning
Practice new skills by completing job-related tasks.
Expert guidance
Follow along with pre-recorded videos from experts using a unique side-by-side interface.
No downloads or installation required
Access the tools and resources you need in a pre-configured cloud workspace.
Available only on desktop
This Guided Project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.
Learner reviews
59 reviews
- 5 stars: 62.71%
- 4 stars: 20.33%
- 3 stars: 6.77%
- 2 stars: 1.69%
- 1 star: 8.47%
Frequently asked questions
By purchasing a Guided Project, you'll get everything you need to complete the Guided Project including access to a cloud desktop workspace through your web browser that contains the files and software you need to get started, plus step-by-step video instruction from a subject matter expert.
Because your workspace contains a cloud desktop that is sized for a laptop or desktop computer, Guided Projects are not available on your mobile device.
Guided Project instructors are subject matter experts who have experience in the skill, tool or domain of their project and are passionate about sharing their knowledge to impact millions of learners around the world.