Introduction to Machine Learning

Machine learning is like teenage sex:

Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…

The statement above is adapted from a tweet by Prof. Dan Ariely, the James B. Duke Professor of Psychology and Behavioral Economics at Duke University^[1]. He originally wrote it about big data, but I think it applies to machine learning as well.

Most people who claim to be doing machine learning are just copying someone else’s work. They search Google for the best machine learning models, plug in their data, train, and get a result. But it is not such an easy task. You have to filter out unusable data (pre-processing), research and choose the best method or algorithm, and only then train. Even when there is plenty of data, it is not as simple as it sounds.
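As a rough illustration of the filtering step, here is a minimal Python sketch that drops unusable records before training. The record format, the field names `feature` and `label`, and the validity rules (a feature must be present and lie between 0 and 1, and a label must be present) are hypothetical assumptions chosen only for this example.

```python
# Hypothetical raw records: some have missing or impossible values.
raw_records = [
    {"feature": 0.8, "label": "dog"},
    {"feature": None, "label": "not dog"},  # missing value
    {"feature": 1.7, "label": "dog"},       # outside the assumed 0..1 range
    {"feature": 0.6, "label": None},        # missing label
    {"feature": 0.3, "label": "not dog"},
]

def is_usable(record):
    """Return True if the record passes the (assumed) validity rules."""
    feature = record.get("feature")
    label = record.get("label")
    return feature is not None and 0.0 <= feature <= 1.0 and label is not None

# Keep only the records that are safe to train on.
clean_records = [r for r in raw_records if is_usable(r)]
print(clean_records)
# [{'feature': 0.8, 'label': 'dog'}, {'feature': 0.3, 'label': 'not dog'}]
```

Real pre-processing usually goes further (normalizing, deduplicating, fixing labels), but the idea is the same: decide what "unusable" means for your data and remove it before training.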

When the dataset is very small, this procedure changes completely. However, the purpose of this article is not to explain the training process of a particular algorithm or neural network, but to give an introduction to machine learning.

Artificial Intelligence (AI)

In the early days, people wanted to automate what they were doing, simply because humans are lazy. So they invented machines. Then they wanted machines to think and make decisions, so they invented algorithms that give machines something that looks like intelligence.

Programmers coded these algorithms, mostly as simple logical rules, to solve the problems they faced. So people used to think that a machine could never be more intelligent than the logic its programmer gave it, and in those days that was true: if a dumb programmer writes the logic, the machine will also be dumb.

(Note: AI is not always this dumb. I put it this way only as a lead-in to machine learning.)

Machine Learning

Then people tried to teach machines how to learn. Now the programmer does not program how to make decisions, but how to learn them from examples and experience, without the decision logic being explicitly programmed. Instead of writing the decision-making code, you feed data to the algorithm, and it builds the decision-making logic from the data it is given.
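To make the contrast concrete, here is a minimal Python sketch. The task (flagging a message as spam from a single hypothetical number, the count of suspicious words) and the data are invented for illustration. The first function hard-codes the rule the way a programmer would; the second learns the threshold from labeled examples by trying every candidate value and keeping the one that makes the fewest mistakes (a one-feature "decision stump").

```python
def hand_coded_rule(suspicious_words):
    # The programmer picks the threshold; the program never changes it.
    return suspicious_words > 10

def learn_threshold(examples):
    """Learn a threshold from (value, is_spam) pairs by minimizing training errors."""
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        # Predict spam when value > t and count the mistakes on the examples.
        errors = sum((value > t) != is_spam for value, is_spam in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

# Hypothetical labeled data: (number of suspicious words, is it spam?)
training_data = [(0, False), (1, False), (2, False), (3, True), (5, True), (8, True)]

threshold = learn_threshold(training_data)
print(threshold)            # 2
print(hand_coded_rule(5))   # False: the hand-picked rule misses this spam
print(5 > threshold)        # True: the learned rule catches it
```

If the data changes, rerunning `learn_threshold` updates the rule automatically, whereas the hand-coded rule stays as smart as whoever wrote it.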

A formal definition of “Machine Learning” was given by Tom M. Mitchell in his book Machine Learning^[3].

“A computer program is said to learn from experience E with some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

Let me explain this with an example. Say you wrote a program that determines the gender of the person in a given image. In that context:

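E = Experience gathered by guessing the gender of the tagged images and comparing each guess with the correct tag
P = A measure of how well the program works, for example the percentage of images whose gender it guesses correctly
T = Guess the gender of the person in a given image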

First, you need to gather some images of people and manually tag each image with its gender. Then you feed the images to your program, which guesses the gender for each one, and you calculate how well it performs using your performance measure “P”. Then you feed those images to the program again and measure its performance once more. If the performance has increased, then by this definition your program is a “machine learning program”.
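The definition can be checked mechanically. In this minimal Python sketch, T is a two-class classification task, P is accuracy (the fraction of correct guesses), and E is repeated passes over labeled examples. The two-number feature vectors stand in for features extracted from images and are purely hypothetical, and the learner is a basic perceptron, chosen only because it is short; the article does not prescribe a specific algorithm.

```python
def predict(weights, bias, x):
    # Guess class 1 if the weighted sum is positive, otherwise class 0.
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def accuracy(weights, bias, data):
    # Performance measure P: fraction of examples guessed correctly.
    correct = sum(predict(weights, bias, x) == y for x, y in data)
    return correct / len(data)

def train_one_pass(weights, bias, data, lr=1.0):
    # One pass of experience E: nudge the weights after every wrong guess.
    for x, y in data:
        error = y - predict(weights, bias, x)  # -1, 0 or +1
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error
    return weights, bias

# Hypothetical labeled examples: (feature vector, class label)
data = [([2.0, 1.0], 1), ([3.0, 2.5], 1), ([2.5, 3.0], 1),
        ([-1.0, -2.0], 0), ([-2.0, -0.5], 0), ([-1.5, -1.5], 0)]

weights, bias = [0.0, 0.0], 0.0
history = [accuracy(weights, bias, data)]  # P before any experience
for _ in range(5):
    weights, bias = train_one_pass(weights, bias, data)
    history.append(accuracy(weights, bias, data))
print(history)  # [0.5, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Accuracy rises from 0.5 to 1.0 after the first pass, which is exactly the "performance at T, measured by P, improves with experience E" condition. In practice P should be measured on images the program was not trained on; this toy example reuses the training data for brevity.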

Machine learning is used almost everywhere to make decision making a lot easier. Taking Google as an example, it uses machine learning to:

Identify spam emails
Predict the things you like from your searches, so it can show relevant advertisements
Personalize search results
Find similar images in image search
Extract text from images in Google image search and Google Translate
Detect human faces in Google Photos and recognize each person
Automatically add filters to images
Power Google Assistant

And there are so many more ways that Google uses machine learning to improve the user experience.

Types of machine learning

For a machine learning program to learn, it needs data. As I said earlier, you need to pre-process the data to improve accuracy. You then teach your machine learning program using this cleaned data; this teaching process is called “training”. Machine learning tasks can be separated into three main categories according to how the program is trained.

Supervised learning
Unsupervised learning
Reinforcement learning

Supervised learning

This is a very popular and easily understood way of training a machine learning program. It requires labeled data. For example, suppose you need to classify whether a given image shows a dog. Then you need a dataset of images in which each image is labeled as “dog” or “not dog”.

You then have a set of images ($x$) and their labels ($y$). The machine learning program finds a function $f$ that assigns a label to a given image, such that

$$ y = f(x) $$
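One simple way to obtain such an $f$ is the k-nearest neighbor algorithm, one of the supervised algorithms this article lists: to label a new $x$, look at the $k$ labeled examples closest to it and take the majority label. The sketch below assumes each image has already been reduced to a short numeric feature vector; the vectors, the labels, and the choice $k = 3$ are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (feature vector x, label y)
training_set = [
    ([0.9, 0.8], "dog"), ([0.8, 0.9], "dog"), ([0.7, 0.7], "dog"),
    ([0.1, 0.2], "not dog"), ([0.2, 0.1], "not dog"), ([0.3, 0.2], "not dog"),
]

def f(x, k=3):
    """Predict a label for x by majority vote among its k nearest training examples."""
    by_distance = sorted(training_set, key=lambda pair: math.dist(x, pair[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(f([0.85, 0.75]))  # dog
print(f([0.15, 0.25]))  # not dog
```

Here the "learned" function is simply the stored examples plus a distance rule; other algorithms in the list learn $f$ differently, but all of them map an input $x$ to a label $y$.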

Supervised learning can be divided into two main categories:

Classification
Regression

Classification and Regression in Supervised learning

The dog example above is a classification problem. Spam detection, image classification, and activity detection are further examples. The key point is that a classification problem has a fixed set of classes, and the program has to decide which of those classes the given data belongs to.

Examples of regression problems are age detection and stock price prediction. Unlike classification, regression has to output a numeric value rather than a class. In age detection, for instance, when given an image of a person, the program has to predict the age in years.
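To show what outputting a number means in practice, here is a minimal least-squares linear regression sketch in Python. It fits $y = a x + b$ to hypothetical pairs in which $x$ is a single numeric feature assumed to be extracted from a photo and $y$ is the age in years; real age detection would use far richer features and models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: feature value -> age in years
features = [1.0, 2.0, 3.0, 4.0]
ages = [12.0, 22.0, 32.0, 42.0]

a, b = fit_line(features, ages)
print(a, b)          # 10.0 2.0
print(a * 2.5 + b)   # 27.0: a number, not a class
```

A classifier would answer with one of a fixed set of labels; the regression model can return any value on the line, such as 27.0 for an input it never saw.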

Difference between classification and regression

Some of the algorithms that use supervised learning are listed below.

Support Vector Machines
Linear regression
Logistic regression
Naive Bayes
Linear discriminant analysis
Decision trees
K-nearest neighbor algorithm
Neural Networks
Similarity learning

Unsupervised Learning

Unlike supervised learning, unsupervised learning learns from data that has not been labeled, classified or categorized. In other words, we have only the input data ($x$) and no output ($y$). Unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.

Unsupervised learning is mainly used for clustering, feature learning, and density estimation. The common idea is to learn the shared features of the inputs and group the inputs accordingly.

Classifying input using unsupervised learning

Example of an Unsupervised Learning Problem^[5]

In the example above, the task is to group images of ducks, rabbits and mice. During training we do not tell the algorithm the category of any image; instead, the algorithm discovers the groups on its own and can then assign a new image to one of them.
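Clustering is the easiest of these ideas to see in code. Below is a minimal k-means sketch in Python (k-means is one of the clustering algorithms listed in this section): it receives only unlabeled points and alternately assigns each point to its nearest center and moves each center to the mean of its points. The 2-D points, the choice of two clusters, and the starting centers are hypothetical.

```python
import math

def k_means(points, centers, iterations=10):
    """Cluster unlabeled 2-D points; `centers` gives the initial guesses."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its old center).
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled data: two visible groups, but no labels are given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
# [[(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)], [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]]
```

The algorithm finds the two groups but cannot name them; deciding that one cluster is "ducks" still takes a human or some labeled data.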

Because these algorithms learn the common features of the given data, unsupervised learning can also be used for exploratory analysis and dimensionality reduction.
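As an example of dimensionality reduction, the sketch below applies principal component analysis (PCA), a classic unsupervised technique that is not in this article's list, to hypothetical 2-D points. It finds the direction along which the data varies most (the top eigenvector of the covariance matrix, found here by power iteration to avoid external libraries) and projects each point onto it, turning two numbers per point into one.

```python
import math

def top_principal_direction(points, steps=100):
    """Return the mean and the unit vector of greatest variance for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(steps):
        # Power iteration: multiply by the covariance matrix and renormalize.
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return (mx, my), v

# Hypothetical data lying close to a straight line
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]
mean, direction = top_principal_direction(points)
# Project each centered point onto the direction: one coordinate per point.
reduced = [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1] for x, y in points]
print(direction)
print(reduced)
```

Because these points lie close to a line, the single coordinate in `reduced` keeps almost all of the spread of the original two columns.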

The main algorithms used in unsupervised learning are listed below.

Clustering
  Hierarchical clustering
  k-means
  Mixture models
  DBSCAN
  OPTICS algorithm
Neural networks
  Autoencoders
  Deep belief nets
  Hebbian learning
  Generative adversarial networks
  Self-organizing maps

Reinforcement Learning

This method falls between supervised and unsupervised learning. As in supervised learning, the algorithm needs input and output data; in this context the inputs are usually called actions and the outputs rewards. It is not strictly supervised, because it does not rely only on a set of labeled training data, but it is not unsupervised either, because each action receives a reward. The purpose of the algorithm is to find the right action to take in each situation so as to maximize the total reward.

For example, consider a mouse in a maze that can take four actions: go left, right, front or back. Each action gives the mouse a reward of -1. If the mouse reaches the cheese, it receives a large reward, say 10000.

The goal is to reach the cheese with the minimal number of actions. Because each action adds -1 to the total reward, the algorithm will try to find the shortest possible path to the cheese.

Basic reinforcement learning algorithms are based on the Markov decision process (MDP).
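The mouse-and-cheese example can be written as a small MDP and solved with value iteration, one standard MDP method (the article does not name a specific algorithm, and a learning agent such as Q-learning would have to discover these values by trial and error instead). The sketch assumes a hypothetical 3×3 grid, the -1 cost per move and the 10000 cheese reward from the text, deterministic moves, no discounting, and that a move off the grid leaves the mouse where it is.

```python
# Hypothetical 3x3 grid: the mouse starts at (2, 0), the cheese is at (0, 2).
ROWS, COLS = 3, 3
CHEESE = (0, 2)
START = (2, 0)
MOVES = {"front": (-1, 0), "back": (1, 0), "left": (0, -1), "right": (0, 1)}
STEP_REWARD = -1        # every action costs -1, as in the text
CHEESE_REWARD = 10000   # large reward for reaching the cheese

def step(state, move):
    """Apply a move; bumping into the edge leaves the mouse in place."""
    r, c = state[0] + MOVES[move][0], state[1] + MOVES[move][1]
    next_state = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    reward = STEP_REWARD + (CHEESE_REWARD if next_state == CHEESE else 0)
    return next_state, reward

states = [(r, c) for r in range(ROWS) for c in range(COLS)]
values = {s: 0.0 for s in states}

# Value iteration without discounting: V(s) = max over moves of reward + V(next state).
for _ in range(50):
    new_values = {}
    for s in states:
        if s == CHEESE:
            new_values[s] = 0.0  # the episode ends at the cheese
        else:
            new_values[s] = max(rew + values[nxt] for nxt, rew in (step(s, m) for m in MOVES))
    values = new_values

def best_move(state):
    # Greedy policy: the move with the highest reward + value of the next state.
    return max(MOVES, key=lambda m: step(state, m)[1] + values[step(state, m)[0]])

path = [START]
while path[-1] != CHEESE:
    path.append(step(path[-1], best_move(path[-1]))[0])

print(values[START])  # 9996.0
print(path)           # [(2, 0), (1, 0), (0, 0), (0, 1), (0, 2)]
```

Each state's value ends up as 10000 minus the number of moves to the cheese, so always moving to the highest-value neighbor gives a shortest path, which is the minimal-actions goal described above.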
