What is Machine Learning and What's Inside The Course

Imagine you want to know the expected salary of a developer with 5 years of experience. Also imagine you have a CSV file with the salaries of 100 developers, including a column for years of experience.

Can we make a calculation/prediction from this? Sure, we can!
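To make that concrete, here is a minimal sketch of such a prediction in plain Python. The column names (`years_experience`, `salary`) and every number in the inline CSV are made up for illustration, and the least-squares fit is written by hand only to show what a library would normally do for you.

```python
import csv
import io

# Hypothetical stand-in for the CSV file: made-up numbers, not real salary data.
CSV_TEXT = """years_experience,salary
1,40000
2,45000
3,50000
4,55000
6,65000
8,75000
"""


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Read the two columns from the (inline) CSV.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
years = [float(row["years_experience"]) for row in rows]
salaries = [float(row["salary"]) for row in rows]

# "Train" on the known salaries, then predict for 5 years of experience,
# a value that does not appear in the data.
slope, intercept = fit_line(years, salaries)
predicted = slope * 5 + intercept
print(f"Predicted salary for 5 years of experience: {predicted:.0f}")
```

The sample data lies exactly on a straight line, so the fit is perfect here; real salary data will be noisier.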

This is exactly the topic of this practical introductory course on Machine Learning. My goal here is to explain what ML is and show you two first practical examples.

Prerequisites are quite minimal: you should know the basics of Python. Even if you don't, no worries: the course is mostly about calling Python libraries, so you should be able to follow the code anyway.

If you come from a PHP background, you may want to read our course Python 101 for PHP Developers to learn the syntax differences.

In this intro lesson, I will explain what ML is and which part of it we will cover in this course.

What is Machine Learning?

It is the process of training a so-called model on existing data so that it can later make predictions, or take other actions, on new data it has not seen before.

Example 1: House Prices. You have a CSV file of house prices with location, square footage, and a few more variables. You need to build/train a model to predict likely prices for houses in other locations or with different square footage.

Example 2: Product Categories. You have a database of products with their categories and subcategories. And you need to build/train a model to auto-detect the category/subcategory for new products by their name and other parameters.

In other words, your goal is to train the algorithm(s) to work for you in the future.

So, you need to work on:

Collecting and preparing data
Choosing the correct models/algorithms/libraries
Training, evaluating and improving the models

In this course, I want to show you a simple example of this process, taken from a real-life scenario, so that you understand it in practice.

Machine Learning Types

Generally, the machine learning field is divided into three types:

Supervised learning: when the data already contains the results/outputs, and we can train a model on them to predict the outputs for new data. This is the most approachable type for beginners in ML.
Unsupervised learning: when the data doesn't contain the outputs or labels, and the model tries to find structure in the data on its own, such as groups of similar items.
Reinforcement learning: the most complex type, where the model learns from the feedback of its environment on whether the outcome of a particular action was good or bad.

Then, each of those has different problem types inside.

For beginners, I would suggest focusing on supervised learning, with two kinds of problems:

Regression (topic of this course): predicting a numeric value, like prices of products, salaries of employees, or weather measurements such as temperature. Linear regression, the simplest case, assumes a linear relationship between the input and output variables.
Classification: automatically assigning each input to one of a fixed set of options: which category, color, yes/no, male/female or other option it belongs to.
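The difference between the two is easiest to see in the type of the output. The sketch below uses made-up examples and a deliberately naive nearest-neighbour lookup (not linear regression) for both tasks; the `junior`/`senior` labels and the function names are hypothetical and exist only for this illustration.

```python
# Made-up training examples: (years_experience, salary, seniority label).
examples = [
    (1, 40000, "junior"),
    (2, 45000, "junior"),
    (6, 65000, "senior"),
    (9, 80000, "senior"),
]


def nearest(years):
    """Return the training example closest in years of experience."""
    return min(examples, key=lambda example: abs(example[0] - years))


def predict_salary(years):
    """Regression: the output is a number."""
    return nearest(years)[1]


def predict_seniority(years):
    """Classification: the output is one of a fixed set of labels."""
    return nearest(years)[2]


print(predict_salary(3))     # a numeric value
print(predict_seniority(3))  # a label
```

The input is the same in both calls; only the kind of answer changes, and that is what decides whether a problem is regression or classification.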

What's inside this course

In this course, we will take a look at one model for the regression problems described above: linear regression. We will try to predict employee salaries from various parameters, using two examples:

Artificially generated CSV file with "almost ideal" data
Real-life survey responses with "not so ideal" data that requires additional preprocessing

So, without further ado, let's dive in!