Computer Science

Binary Search

Finding elements in sorted data by halving the search space.

Binary Search: Divide and Conquer in Sorted Data

Concept Overview

Binary search is one of the most fundamental and elegant algorithms in computer science. Given a sorted collection of elements, it finds a target value by repeatedly dividing the search space in half. At each step, it compares the target to the middle element and eliminates the half that cannot contain the target. This simple strategy achieves logarithmic time complexity, making it exponentially faster than linear search for large datasets.

Mathematical Definition

Given a sorted array A[0..n-1] and a target value T, binary search operates as follows:

BinarySearch(A, T):
  lo = 0, hi = n - 1
  while lo ≤ hi:
    mid = ⌊lo + (hi - lo) / 2⌋
    if A[mid] = T: return mid
    if A[mid] < T: lo = mid + 1
    else: hi = mid - 1
  return NOT_FOUND
Time Complexity: O(log n)
Space Complexity: O(1) iterative, O(log n) recursive
Maximum comparisons for n elements: ⌈log₂(n + 1)⌉

Key Concepts

Logarithmic Time Complexity

The power of binary search comes from its O(log n) time complexity. Each comparison eliminates half of the remaining elements, so the number of steps grows logarithmically with input size. Searching through 1 million elements requires at most 20 comparisons; searching through 1 billion requires at most 30. This dramatic efficiency is why binary search is ubiquitous in computing.

The Sorted Precondition

Binary search requires the input to be sorted. This precondition is crucial: without it, eliminating half the search space based on a single comparison would not be valid. In practice, this means data is often sorted once (O(n log n)) and then searched many times (O(log n) each), amortizing the sorting cost over multiple queries.

Integer Overflow Bug

A subtle and historically significant bug lurks in the naive midpoint calculation mid = (lo + hi) / 2. When lo and hi are large, their sum can overflow the integer type. The correct formula is mid = lo + (hi - lo) / 2. This bug went undetected in the Java standard library for nine years before being discovered by Joshua Bloch in 2006, affecting billions of deployed programs.

Variants and Extensions

  • Lower bound (bisect_left): Find the first position where a value could be inserted while maintaining sorted order.
  • Upper bound (bisect_right): Find the last position where a value could be inserted.
  • Exponential search: Combine exponential doubling with binary search when the array size is unknown.
  • Interpolation search: Estimate the position based on the target value, achieving O(log log n) on uniformly distributed data.

Historical Context

The idea of binary search dates back to 1946, when John Mauchly described it during the Moore School Lectures on computing. However, the first correct implementation was not published until 1962. As Jon Bentley noted in Programming Pearls, while the concept is simple enough to explain in a sentence, writing a bug-free implementation is surprisingly difficult—over 90% of professional programmers failed to write a correct binary search in a study he conducted.

The algorithm's importance extends beyond arrays. Binary search is the conceptual foundation for binary search trees, B-trees (used in databases and file systems), and the bisection method in numerical analysis for finding roots of equations. The principle of halving the problem space appears throughout computer science and mathematics as one of the most powerful problem-solving strategies.

Real-world Applications

  • Database indexing: B-trees and B+ trees use binary search at each node to navigate indexes containing billions of records. Every SQL query using an index performs multiple binary searches.
  • Version control (git bisect): Git uses binary search to find the exact commit that introduced a bug, reducing the search from thousands of commits to roughly log₂(n) tests.
  • System libraries: Standard library functions like C's bsearch(), Python's bisect module, and Java's Arrays.binarySearch() are used throughout production software.
  • Network routing: IP routing tables use longest-prefix matching based on binary search through sorted prefix tables.
  • Numerical methods: The bisection method for root finding applies binary search to continuous functions, guaranteed to converge whenever a root exists in the interval.
  • Competitive programming: "Binary search the answer" is a powerful technique where the search space of possible answers is binary-searched using a feasibility check as the comparison function.

Related Concepts

  • Sorting Algorithms — binary search requires sorted input; the two concepts are deeply intertwined
  • Gradient Descent — iterative optimization that similarly narrows in on a solution, but in continuous rather than discrete space
  • Logistic Map & Chaos — bifurcation (splitting in two) shares the same Latin root as binary search's "divide in half" strategy
  • Taylor Series — polynomial approximation improves with each term, analogous to binary search narrowing the interval with each step

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Binary Search module.

Try Binary Search on Riano →

More in Computer Science