Machine Learning

Decision Tree

A non-parametric supervised learning method for classification and regression.

Concept Overview

A decision tree is a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation of the target function.

Mathematical Definition

Decision trees recursively partition the feature space such that samples with the same labels or similar target values are grouped together. For classification, the quality of a split is evaluated using an impurity measure, commonly Gini Impurity or Entropy.

Gini Impurity

If a dataset T contains examples from n classes, Gini impurity measures the probability that a randomly chosen element of T would be incorrectly labeled if it were labeled randomly according to the distribution of labels in T.

Gini(T) = 1 − Σ_{i=1}^{n} (p_i)²

where p_i is the fraction of items belonging to class i in the dataset T. The algorithm seeks the feature and threshold that minimize the weighted sum of the Gini impurities of the resulting child nodes.
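The formula and the weighted-sum criterion above can be sketched in a few lines of Python (the function names are illustrative, not from any particular library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum of p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Weighted impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A pure node has impurity 0; a balanced two-class node has impurity 0.5.
print(gini(["a", "a", "a"]))       # → 0.0
print(gini(["a", "a", "b", "b"]))  # → 0.5
```

A split that separates the classes perfectly drives the weighted impurity to zero, which is exactly what the split search rewards.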

Key Concepts

Recursive Partitioning

The tree is constructed through a greedy, top-down approach known as recursive binary splitting. At each node, all features and possible split points are evaluated, and the one maximizing information gain (or minimizing impurity) is chosen.
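The exhaustive evaluation at a single node can be sketched as follows; this is a minimal illustration of the greedy search, not a full tree builder (the helper names are assumptions):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Greedy exhaustive search: evaluate every (feature, threshold) pair
    and return the one minimizing the weighted child impurity."""
    best_feature, best_threshold, best_score = None, None, float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= threshold]
            right = [y[i] for i, row in enumerate(X) if row[j] > threshold]
            if not left or not right:  # a split must produce two children
                continue
            n = len(y)
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best_feature, best_threshold, best_score = j, threshold, score
    return best_feature, best_threshold, best_score

# Example: the single feature separates the classes perfectly at threshold 2.
print(best_split([[1], [2], [3], [4]], [0, 0, 1, 1]))  # → (0, 2, 0.0)
```

Recursive binary splitting simply applies this search at the root, partitions the samples by the winning rule, and recurses on each child until a stopping criterion is met.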

Overfitting and Pruning

Decision trees are highly prone to overfitting: left unconstrained, they learn complex, highly specific rules that classify the training data perfectly but generalize poorly. This is mitigated by setting stopping criteria (such as maximum depth or minimum samples per leaf) or by pruning the tree after construction.

Feature Importance

The relative importance of a feature can be calculated as the total reduction in impurity brought by that feature, weighted by the proportion of samples routed to the nodes where the feature was used to split.
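This impurity-reduction measure is what scikit-learn reports as `feature_importances_`; a minimal sketch, again assuming scikit-learn and an arbitrary dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(data.data, data.target)

# Importances are normalized to sum to 1; a higher value means the feature
# accounted for a larger share of the total impurity reduction.
for name, imp in sorted(zip(data.feature_names, clf.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```

Note that impurity-based importances can be biased toward high-cardinality features; permutation importance is a common alternative check.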

Historical Context

Decision trees as a formal algorithmic concept gained prominence with Leo Breiman and colleagues' introduction of the CART (Classification and Regression Trees) algorithm in 1984. Closely related work includes the ID3 algorithm, developed by Ross Quinlan in the late 1970s and formally published in 1986. Decision trees remain highly influential today, primarily as the foundational building blocks for powerful ensemble methods like Random Forests and Gradient Boosting Machines.

Real-world Applications

  • Medical Diagnosis: Interpretable models to trace symptoms and test results to potential diseases.
  • Customer Churn Prediction: Identifying user attributes and behaviors that lead to service cancellation.
  • Credit Scoring: Determining risk profiles of loan applicants based on financial history.
  • Fraud Detection: Flagging suspicious transactions by routing them through learned rule paths.

Related Concepts

  • Linear Regression — A contrasting parametric method for continuous targets.
  • K-Means Clustering — An unsupervised approach to partitioning data, contrasting the supervised splits of trees.

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Decision Tree module.

Try Decision Tree on Riano →