Machine learning terminology
Machine Learning Simplified: A terminology guide
Have you ever wondered how Netflix knows what movies and TV shows to recommend to you, or how Siri on your iPhone can understand your voice commands? The answer is machine learning!
Machine learning is a type of artificial intelligence that allows computers to learn and make decisions based on data, without being explicitly programmed. It’s like teaching a computer to think for itself by giving it examples to learn from.
Basic Machine Learning Terminology
Machine learning comes with a lot of jargon, but don’t worry: understanding the basic terminology is the first step towards grasping the key concepts of machine learning. Whether you’re just starting out in machine learning or need a refresher on the basics, this section will help you feel more confident and comfortable with the terminology. So let’s dive in and explore the key concepts of machine learning!
What is Input?
Input is the data that you give to the machine learning model to make its predictions.
Example of machine learning input:
If you’re building a model to predict housing prices, the input might be features like the number of bedrooms, the size of the house, and the location.
What is Output?
Output is the prediction that the machine learning model gives you based on the input data.
Example of machine learning output:
If you’re building a model to predict housing prices, the output might be the predicted price of the house.
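To make input and output concrete, here is a minimal Python sketch. The function `predict_price`, the feature names, and every coefficient are made up for illustration; a real model would learn its numbers from data instead of having them typed in by hand.

```python
# Input: the features describing one house (values are invented for illustration).
house = {"bedrooms": 3, "size_sqft": 1500, "location_score": 7}

def predict_price(features):
    """A stand-in 'model': a fixed formula with hypothetical coefficients."""
    return (
        50_000
        + 10_000 * features["bedrooms"]
        + 100 * features["size_sqft"]
        + 5_000 * features["location_score"]
    )

# Output: the model's prediction for that input.
predicted_price = predict_price(house)
print(predicted_price)  # 265000
```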
What is machine learning training data?
This is the set of input-output pairs that you use to train the machine learning model. The model learns from this data to make better predictions on new, unseen data.
Example of how training data is used to train a machine learning model:
Machine learning training data is like the pictures in a coloring book. You have a bunch of pictures with outlines, and you want to color them in. But you don’t know what colors to use for each part of the picture.
So what do you do? You look at previously colored pictures and see what colors other people have used. You practice coloring inside the lines until you get good at it. Then, once you’re confident, you try coloring a brand new picture that you’ve never seen before.
In machine learning, the training data is like the pictures that have already been colored in. You have a bunch of data points, each with some features (like the number of bedrooms in a house) and a label (like the price of the house). The features are the outlines, and the labels are the colors. You use this data to “practice” making predictions, adjusting the model parameters until you get good at it.
Once you’ve trained the model on the training data, you can use it to make predictions on new data that it hasn’t seen before, just like coloring in a new picture in your coloring book.
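Here is a rough sketch of that “practice” step in Python, assuming the simplest possible model: a straight line fitted by ordinary least squares to a single feature. The helper `fit_line` and all the sizes and prices are invented for illustration.

```python
# Training data: (size in square feet, price) pairs. All numbers are invented.
training_data = [(1000, 200_000), (1500, 300_000), (2000, 400_000)]

def fit_line(pairs):
    """Fit price = slope * size + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Training": learn the two parameters from the examples.
slope, intercept = fit_line(training_data)

# Prediction on a new house that was not in the training data.
new_house_size = 1800
predicted = slope * new_house_size + intercept
print(predicted)  # 360000.0
```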
What are machine learning “features”?
Features are the individual, measurable pieces of information that describe each data point. In machine learning, we use features to help us predict the labels (see below!) for new data that we haven’t seen before. For example, we might use features like the size and color of an animal to predict whether it’s a dog or a cat.
Example of features:
Let’s say you’re making sandwiches for a party, and you want to predict how many sandwiches you need to make based on some information about the guests. The information you have about each guest is the “features” of the problem. For example, you might know their age, gender, dietary restrictions, and how many parties they’ve attended before. (Read about labels for the rest of the story.)
What are machine learning “labels”?
In machine learning, labels are the output values or results that we are trying to predict based on the input features. In other words, they are the correct answers that we already know for a given set of data.
Example of labels:
Continuing the example from above, the “labels” in this case would be the actual number of sandwiches you need to make for each guest. This is the information you want your machine learning model to predict. For example, if you know that a guest is a 25-year-old male with no dietary restrictions who has attended 3 parties before, the label might be that they will eat 3 sandwiches.
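In code, the sandwich data might be split into features and labels roughly like this. The records and the key names (such as `sandwiches_eaten`) are hypothetical, chosen only to mirror the story.

```python
# Each record describes one guest from a past party (all values are invented).
past_guests = [
    {"age": 25, "gender": "male", "dietary_restrictions": "none",
     "parties_attended": 3, "sandwiches_eaten": 3},
    {"age": 40, "gender": "female", "dietary_restrictions": "vegetarian",
     "parties_attended": 1, "sandwiches_eaten": 2},
    {"age": 8, "gender": "male", "dietary_restrictions": "none",
     "parties_attended": 0, "sandwiches_eaten": 1},
]

LABEL_KEY = "sandwiches_eaten"

# Features: everything we know about a guest before the party starts.
features = [
    {key: value for key, value in guest.items() if key != LABEL_KEY}
    for guest in past_guests
]

# Labels: the known correct answer for each guest.
labels = [guest[LABEL_KEY] for guest in past_guests]

print(features[0])
print(labels)  # [3, 2, 1]
```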
What’s the difference between outputs and labels in machine learning?
Output is what the machine learning model thinks the answer is for a given input. For example, if the input is a picture of a cat, the output might be “cat”.
Labels are the correct answers for each input in our training data. For example, if we’re training a machine learning model to recognize cats and dogs in pictures, the labels for each picture would be “cat” or “dog”.
The goal of the machine learning model is to produce outputs that are as close as possible to the correct labels, so that it can make accurate predictions on new, unseen data.
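A small sketch of that comparison: the lists below are invented, and accuracy (the fraction of outputs that match their labels) is just one common way to measure how close the two are.

```python
# Labels: the correct answers we already know for four pictures (invented).
labels = ["cat", "dog", "cat", "dog"]

# Outputs: what a hypothetical model predicted for the same four pictures.
outputs = ["cat", "dog", "dog", "dog"]

# Count how often the model's output matches the label.
correct = sum(output == label for output, label in zip(outputs, labels))
accuracy = correct / len(labels)
print(accuracy)  # 0.75
```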
What is a machine learning model?
The model is a central component of machine learning. It is essentially a mathematical function that takes in input data (the features) and produces an output (the prediction).
The goal of training the model is to optimize its parameters so that it can make accurate predictions on new, unseen data. This involves finding the right balance between underfitting (when the model is too simple and unable to capture the complexity of the data) and overfitting (when the model is too complex and memorizes the training data without being able to generalize to new data).
Once the model is trained and its parameters are optimized, it can be used to make predictions on new data, which is the ultimate goal of machine learning.
Here’s an example of a machine learning model:
Maybe you’re trying to build a machine learning model to predict whether a fruit is an apple or an orange based on its color and size. You start by collecting a dataset of fruits with labels indicating whether they are apples or oranges, along with features such as color and size.
You use this data to train your machine learning model, which learns to associate certain color and size combinations with either apples or oranges. Once your model is trained, you can test it by giving it new, unseen fruit samples with unknown labels.
Your model will predict whether each fruit is an apple or an orange based on its features, such as color and size. You can then compare these predictions to the true labels to see how accurate your model is.
If your model is accurate, you can use it to predict the types of fruit in new datasets or real-world situations, such as in a grocery store or on a fruit farm.
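The fruit example can be sketched with one very simple kind of model, a nearest-neighbor classifier, which gives a new fruit the label of the most similar training fruit. This is only one possible choice of model. The redness score (0 to 10), the diameters, and the function names are assumptions made for this sketch.

```python
# Training data: ((redness score 0-10, diameter in cm), label). Values are invented.
training_fruits = [
    ((9.0, 7.0), "apple"),
    ((8.0, 7.5), "apple"),
    ((4.0, 8.0), "orange"),
    ((3.0, 8.5), "orange"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict_fruit(features):
    """Return the label of the closest training fruit (1-nearest-neighbor)."""
    _, label = min(
        training_fruits,
        key=lambda example: squared_distance(example[0], features),
    )
    return label

# New fruits the model has not seen, with true labels kept aside for checking.
new_fruits = [((8.5, 7.2), "apple"), ((3.5, 8.2), "orange")]
predictions = [predict_fruit(features) for features, _ in new_fruits]

# Compare predictions to the true labels to see how accurate the model is.
accuracy = sum(
    prediction == true_label
    for prediction, (_, true_label) in zip(predictions, new_fruits)
) / len(new_fruits)
print(predictions, accuracy)
```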
What is Testing data?
This is the set of input-output pairs that you use to test the performance of the machine learning model after it has been trained. It’s a different set of data than what’s used in training the model, so it can give you a good idea of how well the model will perform on new, unseen data.
Here’s an example of testing data:
Pretend you are learning to play a game, like soccer. Your coach teaches you how to play by showing you how to kick the ball and pass it to your teammates. You practice with your teammates, and your coach watches to see if you’re doing it correctly.
Once your coach thinks you’ve learned enough, they organize a practice game. This is the testing data. Your coach asks you to play the game without any help, to see if you can apply what you’ve learned to a real game situation.
If you play well and use the skills you’ve learned, your coach knows that you’re ready to play a real game. Similarly, in machine learning, we use testing data to see if our model can apply what it has learned to new situations and make accurate predictions.
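Setting aside testing data usually means shuffling the examples and holding some of them back before training. The helper below is a hand-rolled sketch with a hypothetical name and invented data; the 25% test fraction is an arbitrary choice.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold back a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Eight invented (size, price) examples.
examples = [(size, 200 * size) for size in range(1000, 1800, 100)]

train, test = train_test_split(examples)
print(len(train), len(test))  # 6 2
```

The model only ever “practices” on `train`. The examples in `test` stay hidden until the practice game.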
What are model parameters?
Parameters are like the knobs that you can turn to adjust how the model makes its predictions.
Here’s a simple example of a model parameter:
If you want to predict the price of a piece of land, one of the parameters might control how much each extra unit of area adds to the price.
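As a rough illustration, here is a tiny model with two such knobs. The names `price_per_sqm` and `base_price` and their values are hypothetical; in a real model, training would choose them.

```python
def predict_land_price(area_sqm, price_per_sqm, base_price):
    """A linear model whose two parameters are price_per_sqm and base_price."""
    return price_per_sqm * area_sqm + base_price

# Same input, different parameter values, different predictions.
print(predict_land_price(500, price_per_sqm=100, base_price=10_000))  # 60000
print(predict_land_price(500, price_per_sqm=120, base_price=10_000))  # 70000
```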
What are hyperparameters?
Hyperparameters are settings of a machine learning model that are chosen before the model is trained, either by the user or by an automated search, rather than learned by the model during training. These settings can have a significant impact on the performance of the model, so they need to be carefully chosen and tuned.
These hyperparameters can affect how quickly the model learns, how complex the model is, and how much the model is penalized for fitting the training data too closely. The goal of tuning hyperparameters is to find the combination of values that produces the best performance on a validation set of data.
What is an example of hyperparameters?
Imagine that five-year-old you has a toy car that you want to make go as fast as possible. You can change the wheels, the weight, the motor, and the shape of the car to try to make it faster. The things you can change to try to make the car faster are like hyperparameters.
Just like you have to try different combinations of changes to make the car go faster, in machine learning you have to try different combinations of hyperparameters to get the best results from your model.
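Here is one way that trial-and-error loop might look in Python. The sketch assumes a one-feature model `y = w * x` trained with a penalty on large weights (ridge regression), for which the learned weight has the closed form `w = sum(x * y) / (sum(x * x) + penalty)`. The weight `w` is the learned parameter and `penalty` is the hyperparameter. All data and candidate values are invented.

```python
def train(xs, ys, penalty):
    """Learn the weight w of y = w * x for a given penalty (the hyperparameter)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1.0, 2.0, 3.0], [3.0, 4.0, 8.0]  # noisy training data
valid_x, valid_y = [4.0, 5.0], [8.0, 10.0]           # held-out validation data

# Try each candidate hyperparameter value and score it on the validation set.
validation_error = {}
for penalty in [0.0, 1.0, 3.5, 10.0]:
    w = train(train_x, train_y, penalty)
    validation_error[penalty] = mean_squared_error(w, valid_x, valid_y)

best_penalty = min(validation_error, key=validation_error.get)
print(best_penalty)  # 3.5
```

Note that the candidates are scored on validation data, not on the data used for training, so the winning value is the one that generalizes best rather than the one that memorizes best.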
What is a cost function?
A cost function is like a game where you have to guess the right answer. The game tells you how close your guess is to the right answer. The closer you are, the better you’re doing. But if you’re really far off, you’re not doing very well.
When you’re training a machine learning model, you’re playing this game too. You’re trying to guess the right answer based on some inputs, like a picture of a dog or a sentence in English. But sometimes your guess is way off from the right answer, and you want to get better at the game.
To get better, you want to find the best way to guess the answer. This means finding the right set of rules that the machine learning model should follow to make good guesses. To find these rules, you need to keep playing the game and figuring out how to make your guesses better.
The cost function is like the score of the game. You want to get the lowest possible score, which means you’re guessing the answer really well. To get a lower score, you need to adjust the rules you’re following (model parameters) so that your guesses are better. The machine learning algorithm helps you figure out how to adjust the rules so that you can get a better score on the game.
The choice of cost function is an important aspect of designing a machine learning model, as it determines the type of errors that the model will try to minimize during training.
Here’s a simple example of a cost function:
Imagine you and your friends are trying to guess how much a house costs based on two things: how big it is and how many bedrooms it has. You each make your guesses, but sometimes you’re really close and sometimes you’re way off.
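To figure out who’s doing the best job of guessing, you decide to use a special score called the mean squared error (MSE). To compute it, you take the difference between each guess and the true price, square it, and average the results over all the houses. Squaring makes every error positive and punishes big misses much more than small ones. The lower your score, the better you are at guessing the price of houses.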
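As a quick sketch, here is the MSE for two friends guessing the prices of the same three houses. The prices and guesses are invented.

```python
def mean_squared_error(guesses, true_values):
    """Average of the squared differences between guesses and true values."""
    return sum((g - t) ** 2 for g, t in zip(guesses, true_values)) / len(true_values)

true_prices = [200_000, 300_000, 400_000]
friend_a = [210_000, 290_000, 400_000]  # never off by more than 10,000
friend_b = [250_000, 300_000, 350_000]  # exact once, off by 50,000 twice

mse_a = mean_squared_error(friend_a, true_prices)
mse_b = mean_squared_error(friend_b, true_prices)
print(mse_a < mse_b)  # True: friend A has the lower (better) score
```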
To make your guesses better, you start adjusting your guesses based on how far off they were from the true price. This way, you can learn to make better guesses over time. The computer does the same thing, but instead of adjusting guesses by hand, it adjusts some special numbers (weights and biases) that help it make better guesses automatically.
So the goal of the computer is to find the best set of these special numbers (weights and biases) that give it the lowest possible MSE score when guessing the price of houses. Once it finds those numbers, it will be really good at guessing the price of houses it hasn’t seen before!
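One common way a computer does this adjusting is gradient descent: it repeatedly nudges the weight and bias in the direction that lowers the MSE. The sketch below assumes that technique and, to stay short, uses a single feature (size) instead of two. The data, the rescaled units, the learning rate, and the number of steps are all chosen by hand for illustration.

```python
# Sizes in thousands of square feet, prices in units of $100,000 (invented data).
sizes = [1.0, 1.5, 2.0]
prices = [2.0, 3.0, 4.0]

def mse(weight, bias):
    """Mean squared error of the model price = weight * size + bias."""
    return sum((weight * x + bias - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)

weight, bias = 0.0, 0.0  # start from an arbitrary guess
learning_rate = 0.1      # step size for each adjustment

for _ in range(5000):
    errors = [weight * x + bias - y for x, y in zip(sizes, prices)]
    # Gradients of the MSE with respect to the weight and the bias.
    grad_weight = 2 * sum(e * x for e, x in zip(errors, sizes)) / len(sizes)
    grad_bias = 2 * sum(errors) / len(sizes)
    # Move each parameter a small step in the direction that lowers the MSE.
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

print(weight, bias, mse(weight, bias))
```

On this data the weight settles close to 2 and the bias close to 0, which is the line that passes through all three invented points.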
Want to explore more machine learning terms?
Congratulations, you’ve made it to the end of our crash course on machine learning terminology! We hope that by breaking down these key concepts in a simple and accessible way, you now have a better understanding of the foundational ideas behind machine learning.
If you’re ready to step it up a notch, we get into some more difficult definitions in Part 2: Next Level Machine Learning Terminology.
As you continue to explore the world of machine learning, you’ll undoubtedly come across more complex terms and concepts. But with a solid grasp of these basics, you’ll be well on your way to understanding more advanced topics.
Remember, the key to mastering machine learning (or any new subject) is to stay curious and keep learning. Don’t be afraid to ask questions, experiment with different tools and techniques, and seek out resources that can help you deepen your understanding.
With that said, we wish you the best of luck in your machine learning journey. Happy learning!