R - Decision Tree: A Beginner's Guide
Hello there, future data scientists! Today, we're going to embark on an exciting journey into the world of decision trees using R. Don't worry if you've never coded before – I'll be your friendly guide every step of the way. By the end of this tutorial, you'll be creating your own decision trees and feeling like a real data wizard!
What is a Decision Tree?
Before we dive into the code, let's understand what a decision tree is. Imagine you're trying to decide whether to go for a run or not. You might ask yourself:
- Is it raining?
- Do I have enough time?
- Am I feeling energetic?
Based on your answers, you make a decision. That's essentially what a decision tree does – it makes decisions based on a series of questions!
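The same idea can be written by hand as a chain of if/else questions. Here is a tiny R sketch. The three questions come from the list above, but the order of the questions and the answers that lead to a run are made up for illustration:

```r
# Hypothetical rules: the three questions come from the list above,
# but their order and the answers that lead to a run are invented.
should_i_run <- function(raining, enough_time, energetic) {
  if (raining) {
    "stay in"          # first question settles it
  } else if (!enough_time) {
    "stay in"          # dry, but no time
  } else if (energetic) {
    "go for a run"     # dry, enough time, and energetic
  } else {
    "stay in"          # dry with time, but too tired
  }
}

should_i_run(raining = FALSE, enough_time = TRUE, energetic = TRUE)
should_i_run(raining = TRUE, enough_time = TRUE, energetic = TRUE)
```

A decision tree algorithm learns questions like these from data, so we don't have to write the rules ourselves.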
Installing the Necessary R Packages
First things first, we need to equip ourselves with the right tools. In R, these tools are called packages. For our decision tree adventure, we'll need two main packages: rpart (to build the trees) and rpart.plot (to draw them).
Let's install them:
install.packages("rpart")
install.packages("rpart.plot")
Now, let's load these packages:
library(rpart)
library(rpart.plot)
Great job! You've just taken your first steps in R programming. Pat yourself on the back!
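If you rerun your script often, you may prefer to install the packages only when they are missing. The sketch below uses base R's requireNamespace(). The CRAN mirror URL is our own choice. Also note that rpart usually ships with R already as one of its "recommended" packages:

```r
pkgs <- c("rpart", "rpart.plot")

# TRUE for each package that can already be loaded
have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Install only the packages that are missing
if (any(!have)) {
  install.packages(pkgs[!have], repos = "https://cloud.r-project.org")
}

# Attach both packages, just like the library() calls above
for (p in pkgs) library(p, character.only = TRUE)
```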
Creating a Simple Dataset
Now that we have our tools ready, let's create a simple dataset to work with. Imagine we're trying to predict whether someone will buy ice cream based on the temperature and whether it's a weekend.
# Create a data frame
ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)
# View the data
print(ice_cream_data)
In this dataset:
- temperature is in Fahrenheit
- is_weekend is 1 for weekend, 0 for weekday
- buy_icecream is 1 if they bought ice cream, 0 if they didn't
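It's worth taking a quick look at the data before modelling. This base-R sketch uses str() and table(). The cross-tabulation shows that, in this small made-up dataset, buy_icecream is identical to is_weekend. That means a single question about the weekend separates buyers from non-buyers:

```r
ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

# Column names and types: all three columns are stored as numbers
str(ice_cream_data)

# Count each combination of weekend flag and purchase
table(is_weekend = ice_cream_data$is_weekend,
      buy_icecream = ice_cream_data$buy_icecream)

# TRUE: every weekend row bought ice cream and no weekday row did
identical(ice_cream_data$is_weekend, ice_cream_data$buy_icecream)
```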
Building Our First Decision Tree
Now for the exciting part – let's build our decision tree!
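# Create the decision tree model
# minsplit = 2 lets rpart split small nodes; the default (20) exceeds our 10 rows
ice_cream_tree <- rpart(buy_icecream ~ temperature + is_weekend,
                        data = ice_cream_data,
                        method = "class",
                        control = rpart.control(minsplit = 2))
# Plot the tree
rpart.plot(ice_cream_tree, extra = 106)
Let's break down what's happening here:
- rpart() is the function we use to create the decision tree.
- buy_icecream ~ temperature + is_weekend tells R that we want to predict buy_icecream based on temperature and is_weekend.
- data = ice_cream_data specifies our dataset.
- method = "class" tells R we're doing a classification task (predicting a category).
- control = rpart.control(minsplit = 2) lets rpart split any node that holds at least 2 observations. By default, a node needs 20 observations before rpart even tries to split it, so with our 10 rows the tree would have no splits at all.
- rpart.plot() creates a visual representation of our tree. The extra = 106 argument adds the probability of class 1 and the percentage of observations to each node.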
When you run this code, you'll see a beautiful tree diagram. Each node shows a decision rule, and the leaves show the predictions. It's like a flowchart of ice cream decisions!
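To see why the minimum node size matters on a dataset this small, you can fit the tree twice and count the nodes. rpart only attempts to split a node with at least minsplit observations, and the default is 20 (see ?rpart.control). The value 2 in this sketch is our choice for a toy dataset, not a general recommendation:

```r
library(rpart)

ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

# Default control (minsplit = 20): the 10-row root node is never split
default_tree <- rpart(buy_icecream ~ temperature + is_weekend,
                      data = ice_cream_data, method = "class")

# Allow splits in any node holding at least 2 observations
small_tree <- rpart(buy_icecream ~ temperature + is_weekend,
                    data = ice_cream_data, method = "class",
                    control = rpart.control(minsplit = 2))

# Each row of $frame is one node of the tree
nrow(default_tree$frame)  # 1: the root only
nrow(small_tree$frame)    # 3: the root plus two leaves

# Text version of the tree: one line per node, leaves marked with *
print(small_tree)
```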
Understanding the Tree
Let's interpret our ice cream decision tree:
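- The top node (root) shows the first split. With our toy data, it will most likely be a rule on is_weekend rather than on temperature. The reason is that buy_icecream matches is_weekend row for row, so that one question separates buyers from non-buyers perfectly.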
- If true (yes), it goes to the left branch; if false (no), it goes to the right.
- This process continues until it reaches a leaf node, which gives the final prediction.
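To check which leaf each training observation ends up in, you can use the where component of a fitted rpart object. It gives, for each observation, the row of the tree's frame table that holds its leaf. This sketch assumes the tree is grown with minsplit = 2 so that it has at least one split:

```r
library(rpart)

ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

tree <- rpart(buy_icecream ~ temperature + is_weekend,
              data = ice_cream_data, method = "class",
              control = rpart.control(minsplit = 2))

# Node number (as shown by print(tree)) of the leaf each row lands in
leaf_node <- rownames(tree$frame)[tree$where]

# Each leaf should contain only weekday rows or only weekend rows
table(leaf = leaf_node, is_weekend = ice_cream_data$is_weekend)
```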
With extra = 106, the numbers in each node represent:
- The predicted class (0 or 1)
- The probability of class 1 (buying ice cream) among the observations in that node
- The percentage of all observations that fall in that node
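The probability and the percentage in a node are simple proportions of the training rows that reach it. This base-R sketch reproduces them for a "weekend" node. It assumes the tree's first split is on is_weekend, which is what you would expect here because buy_icecream matches is_weekend exactly:

```r
ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

# Rows that would reach the assumed "weekend" node
in_node <- ice_cream_data$is_weekend == 1

# Share of those rows with buy_icecream == 1 (the node's class-1 probability)
p_buy <- mean(ice_cream_data$buy_icecream[in_node])

# Share of all rows that reach this node, as a percentage
pct_obs <- 100 * sum(in_node) / nrow(ice_cream_data)

p_buy
pct_obs
```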
Making Predictions
Now that we have our tree, let's use it to make some predictions!
# Create new data
new_data <- data.frame(
  temperature = c(70, 95),
  is_weekend = c(1, 0)
)
# Make predictions
predictions <- predict(ice_cream_tree, new_data, type = "class")
# View predictions
print(predictions)
This code predicts whether someone will buy ice cream on a 70°F weekend and a 95°F weekday.
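Besides type = "class", predict() for rpart classification trees also accepts type = "prob". This returns one column per class, holding the probability of that class. The sketch below again assumes minsplit = 2 so that the tree actually splits. The "buy"/"no buy" labels and the 0.5 cut-off are our own illustration:

```r
library(rpart)

ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

tree <- rpart(buy_icecream ~ temperature + is_weekend,
              data = ice_cream_data, method = "class",
              control = rpart.control(minsplit = 2))

new_data <- data.frame(temperature = c(70, 95), is_weekend = c(1, 0))

# One row per new observation; columns are the classes "0" and "1"
probs <- predict(tree, new_data, type = "prob")
print(probs)

# Turn the class-1 probability (second column) into labels
ifelse(probs[, 2] > 0.5, "buy", "no buy")
```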
Evaluating the Model
To see how well our model performs, we can use a confusion matrix:
# Make predictions on our original data
predictions <- predict(ice_cream_tree, ice_cream_data, type = "class")
# Create confusion matrix
confusion_matrix <- table(Actual = ice_cream_data$buy_icecream, Predicted = predictions)
# View confusion matrix
print(confusion_matrix)
# Calculate accuracy
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(paste("Accuracy:", accuracy))
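This gives us a quick view of how many predictions were correct and incorrect. Keep in mind that we are scoring the same rows the tree was trained on, so this accuracy is optimistic and says little about how the tree will do on new data.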
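One way to estimate accuracy on rows the tree has not seen, even with only ten observations, is leave-one-out cross-validation. You fit the tree on all rows but one, predict the held-out row, and repeat for every row. This technique isn't used elsewhere in this tutorial. The sketch again assumes minsplit = 2:

```r
library(rpart)

ice_cream_data <- data.frame(
  temperature = c(68, 85, 72, 90, 60, 78, 82, 75, 68, 71),
  is_weekend = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  buy_icecream = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

n <- nrow(ice_cream_data)
held_out_pred <- character(n)

for (i in seq_len(n)) {
  # Fit on every row except row i; xval = 0 turns off rpart's
  # own internal cross-validation, which we do not need here
  fit <- rpart(buy_icecream ~ temperature + is_weekend,
               data = ice_cream_data[-i, ], method = "class",
               control = rpart.control(minsplit = 2, xval = 0))
  # Predict the single row that was left out
  held_out_pred[i] <- as.character(
    predict(fit, ice_cream_data[i, ], type = "class"))
}

# Share of held-out rows predicted correctly
loo_accuracy <- mean(held_out_pred == as.character(ice_cream_data$buy_icecream))
print(paste("Leave-one-out accuracy:", loo_accuracy))
```

With ten rows this is cheap. On larger datasets, k-fold cross-validation (for example with 5 or 10 folds) is the more common choice.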
Conclusion
Congratulations! You've just built your first decision tree in R. From installing packages to making predictions, you've covered a lot of ground. Remember, practice makes perfect, so don't be afraid to experiment with different datasets and parameters.
Here's a quick recap of the methods we've used:
Method | Description |
---|---|
install.packages() | Installs R packages |
library() | Loads installed packages |
data.frame() | Creates a data frame |
rpart() | Builds a decision tree |
rpart.plot() | Visualizes the decision tree |
predict() | Makes predictions using the tree |
table() | Creates a confusion matrix |
Keep exploring, keep learning, and most importantly, have fun with data science!