Binary Search

Binary Search: Divide and Conquer in Sorted Data

Concept Overview

Binary search is one of the most fundamental and elegant algorithms in computer science. Given a sorted collection of elements, it finds a target value by repeatedly dividing the search space in half. At each step, it compares the target to the middle element and eliminates the half that cannot contain the target. This simple strategy achieves logarithmic time complexity, making it exponentially faster than linear search for large datasets.

Mathematical Definition

Given a sorted array A[0..n-1] and a target value T, binary search operates as follows:

BinarySearch(A, T):

lo = 0, hi = n - 1

while lo ≤ hi:

mid = ⌊lo + (hi - lo) / 2⌋

if A[mid] = T: return mid

if A[mid] < T: lo = mid + 1

else: hi = mid - 1

return NOT_FOUND

Time Complexity: O(log n)

Space Complexity: O(1) iterative, O(log n) recursive

Maximum comparisons for n elements: ⌈log₂(n + 1)⌉

Key Concepts

Logarithmic Time Complexity

The power of binary search comes from its O(log n) time complexity. Each comparison eliminates half of the remaining elements, so the number of steps grows logarithmically with input size. Searching through 1 million elements requires at most 20 comparisons; searching through 1 billion requires at most 30. This dramatic efficiency is why binary search is ubiquitous in computing.

The Sorted Precondition

Binary search requires the input to be sorted. This precondition is crucial: without it, eliminating half the search space based on a single comparison would not be valid. In practice, this means data is often sorted once (O(n log n)) and then searched many times (O(log n) each), amortizing the sorting cost over multiple queries.

Integer Overflow Bug

A subtle and historically significant bug lurks in the naive midpoint calculation mid = (lo + hi) / 2. When lo and hi are large, their sum can overflow the integer type. The correct formula is mid = lo + (hi - lo) / 2. This bug went undetected in the Java standard library for nine years before being discovered by Joshua Bloch in 2006, affecting billions of deployed programs.

Variants and Extensions

Lower bound (bisect_left): Find the first position where a value could be inserted while maintaining sorted order.
Upper bound (bisect_right): Find the last position where a value could be inserted.
Exponential search: Combine exponential doubling with binary search when the array size is unknown.
Interpolation search: Estimate the position based on the target value, achieving O(log log n) on uniformly distributed data.

Historical Context

The idea of binary search dates back to 1946, when John Mauchly described it during the Moore School Lectures on computing. However, the first correct implementation was not published until 1962. As Jon Bentley noted in Programming Pearls, while the concept is simple enough to explain in a sentence, writing a bug-free implementation is surprisingly difficult—over 90% of professional programmers failed to write a correct binary search in a study he conducted.

The algorithm's importance extends beyond arrays. Binary search is the conceptual foundation for binary search trees, B-trees (used in databases and file systems), and the bisection method in numerical analysis for finding roots of equations. The principle of halving the problem space appears throughout computer science and mathematics as one of the most powerful problem-solving strategies.

Real-world Applications

Database indexing: B-trees and B+ trees use binary search at each node to navigate indexes containing billions of records. Every SQL query using an index performs multiple binary searches.
Version control (git bisect): Git uses binary search to find the exact commit that introduced a bug, reducing the search from thousands of commits to roughly log₂(n) tests.
System libraries: Standard library functions like C's bsearch(), Python's bisect module, and Java's Arrays.binarySearch() are used throughout production software.
Network routing: IP routing tables use longest-prefix matching based on binary search through sorted prefix tables.
Numerical methods: The bisection method for root finding applies binary search to continuous functions, guaranteed to converge whenever a root exists in the interval.
Competitive programming: "Binary search the answer" is a powerful technique where the search space of possible answers is binary-searched using a feasibility check as the comparison function.

Related Concepts

Sorting Algorithms — binary search requires sorted input; the two concepts are deeply intertwined
Gradient Descent — iterative optimization that similarly narrows in on a solution, but in continuous rather than discrete space
Logistic Map & Chaos — bifurcation (splitting in two) shares the same Latin root as binary search's "divide in half" strategy
Taylor Series — polynomial approximation improves with each term, analogous to binary search narrowing the interval with each step

Binary Search: Divide and Conquer in Sorted Data

Concept Overview

Mathematical Definition

Key Concepts

Logarithmic Time Complexity

The Sorted Precondition

Integer Overflow Bug

Variants and Extensions

Historical Context

Real-world Applications

Related Concepts

Experience it interactively

More in Computer Science

Sorting Algorithms

A* Pathfinding

Dijkstra's Algorithm