Survival Analysis

Estimate time-to-event outcomes using the Kaplan-Meier estimator.

Concept Overview

Survival analysis is a branch of statistics for analyzing the time until one or more events happen, such as death in biological organisms or failure in mechanical systems. Its defining feature is the handling of censored data: cases where the event of interest has not occurred by the end of the study period, or where the subject drops out before the event is observed.

Mathematical Definition

The fundamental object in survival analysis is the survival function, denoted S(t). It gives the probability that a subject survives past time t, where T is the random time at which the event occurs.

S(t) = P(T > t)

The most common non-parametric method to estimate the survival function is the Kaplan-Meier Estimator:

Ŝ(t) = ∏_{t_i ≤ t} (1 - d_i / n_i)

Where:
  • t_i is a time when at least one event happened.
  • d_i is the number of events (e.g., deaths or failures) that happened at time t_i.
  • n_i is the number of individuals at risk just before time t_i, that is, those who have neither experienced the event nor been censored before t_i.
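
The product formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kaplan_meier`, the toy data, and the convention that subjects censored at t_i still count as at risk at t_i are assumptions made for the example.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    durations: observed time per subject (event time or censoring time)
    events:    1 if the event was observed at that time, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)  # d_i per time
    exits = Counter(durations)       # everyone leaving the risk set at each time
    n_at_risk = len(durations)       # n_i, updated as subjects leave
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:                        # only event times contribute a factor
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= exits[t]        # remove both events and censored subjects
    return curve

# Hypothetical toy data: 6 subjects, two of them censored (event flag 0).
durations = [2, 3, 3, 5, 6, 8]
events    = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(durations, events)
for t, s in km_curve:
    print(f"t={t}: S_hat={s:.3f}")
```

With this data the factors are 5/6, 4/5, 2/3, and 0/1, so the estimate steps down through 0.833, 0.667, 0.444, and 0. The subject censored at t = 6 contributes no factor but does shrink the risk set for the final event.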

Key Concepts

Censoring

Censoring occurs when we have partial information about an individual's survival time but do not know it exactly. The most common type is right-censoring, which happens when a subject leaves the study before the event occurs or the study ends first; we then know only that the survival time exceeds the last observed time. Techniques such as the Kaplan-Meier estimator are designed to use censored observations without biasing the results, provided the censoring is non-informative (unrelated to the subject's risk of the event).
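
To make right-censoring concrete, the sketch below builds the usual (observed time, event indicator) pairs from true event times and censoring times. The helper `right_censor` and all numbers are hypothetical and chosen only for illustration; in a real study the true times of censored subjects are never seen.

```python
def right_censor(event_times, censor_times):
    """Pair each true event time T_i with a censoring time C_i.

    Returns (observed, events) where observed[i] = min(T_i, C_i) and
    events[i] = 1 if the event was seen (T_i <= C_i), else 0.
    """
    observed = [min(t, c) for t, c in zip(event_times, censor_times)]
    events = [int(t <= c) for t, c in zip(event_times, censor_times)]
    return observed, events

# Hypothetical true event times (months). The study ends at month 12,
# and the third subject drops out early, at month 4.
true_times = [5, 20, 9, 12, 30]
censor_at  = [12, 12, 4, 12, 12]

observed, events = right_censor(true_times, censor_at)
print(observed)  # [5, 12, 4, 12, 12]
print(events)    # [1, 0, 0, 1, 0]

# Treating censored times as if they were event times understates survival.
naive_mean = sum(observed) / len(observed)      # 9.0
true_mean = sum(true_times) / len(true_times)   # 15.2
print(naive_mean, true_mean)
```

The gap between the two means shows why censored observations cannot simply be treated as events: each one is only a lower bound on the true survival time.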

Hazard Function

The hazard function, h(t), gives the instantaneous rate at which the event occurs at time t, conditional on survival up to that time:

h(t) = lim_{Δt → 0} P(t ≤ T < t + Δt | T ≥ t) / Δt

Because h(t) is a rate rather than a probability, it can exceed 1.
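
The limit definition can be checked numerically. The sketch below assumes a continuous T, for which P(t ≤ T < t + Δt | T ≥ t) = (S(t) - S(t + Δt)) / S(t), and replaces the limit with a small fixed Δt. The function name `hazard_numeric` and the exponential example are assumptions made for illustration.

```python
import math

def hazard_numeric(survival, t, dt=1e-6):
    """Approximate h(t) using a small but finite dt.

    For continuous T: P(t <= T < t + dt | T >= t) = (S(t) - S(t + dt)) / S(t).
    """
    s_t = survival(t)
    return (s_t - survival(t + dt)) / (s_t * dt)

# Assumed example: exponential survival S(t) = exp(-rate * t),
# whose hazard equals the constant `rate` at every t.
rate = 0.5

def exp_survival(t):
    return math.exp(-rate * t)

h_values = [hazard_numeric(exp_survival, t) for t in (0.1, 1.0, 5.0)]
print([round(h, 4) for h in h_values])
```

Each approximated value should sit very close to 0.5 regardless of t, which matches the constant hazard of the exponential distribution.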

Weibull Distribution

In parametric survival analysis, a specific distribution is assumed for the survival times. The Weibull distribution is widely used because its hazard rate can be increasing, decreasing, or constant depending on its shape parameter k, with λ > 0 as the scale parameter:

S(t) = exp(-(t / λ)^k)
  • k < 1: Decreasing hazard rate (e.g., high infant mortality).
  • k = 1: Constant hazard rate (Exponential distribution, random events).
  • k > 1: Increasing hazard rate (e.g., aging process or wear and tear).
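
The three regimes can be seen by evaluating the Weibull hazard directly. Applying h(t) = -d/dt ln S(t) to S(t) = exp(-(t / λ)^k) gives h(t) = (k / λ)(t / λ)^(k - 1). The sketch below uses that formula; the function names, the scale λ = 10, and the evaluation times are arbitrary choices for illustration.

```python
import math

def weibull_survival(t, k, lam):
    """S(t) = exp(-(t / lam)^k) with shape k and scale lam."""
    return math.exp(-((t / lam) ** k))

def weibull_hazard(t, k, lam):
    """h(t) = (k / lam) * (t / lam)^(k - 1), derived from -d/dt ln S(t)."""
    return (k / lam) * (t / lam) ** (k - 1)

lam = 10.0                 # assumed scale parameter
times = (1.0, 5.0, 20.0)   # arbitrary evaluation times

hazards_by_shape = {
    k: [weibull_hazard(t, k, lam) for t in times] for k in (0.5, 1.0, 2.0)
}
for k, hazards in hazards_by_shape.items():
    print(f"k={k}: " + ", ".join(f"{h:.4f}" for h in hazards))
```

For k = 0.5 the printed hazards fall over time, for k = 1 they stay at 1/λ = 0.1, and for k = 2 they rise. In every case S(λ) = exp(-1), so λ is the time by which about 63% of subjects have experienced the event.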

Historical Context

Survival analysis has roots going back several centuries, beginning with the life tables constructed by John Graunt in 1662 and Edmond Halley in 1693. These early works laid the foundation for demography and life insurance by estimating life expectancy from mortality records.

The field took a major step forward when Edward L. Kaplan and Paul Meier independently submitted papers to the Journal of the American Statistical Association proposing what is now known as the Kaplan-Meier estimator. The editor suggested they combine their work into a single paper, which was published in 1958 and became one of the most cited statistics papers of all time. It changed medical research by providing a robust way to handle censored clinical trial data.

Real-world Applications

  • Medicine and Healthcare: Estimating patient survival rates after receiving a specific treatment, such as a new cancer drug, or time until disease recurrence.
  • Engineering (Reliability Analysis): Predicting the time until mechanical failure of a machine part or electronic component to determine warranty periods and maintenance schedules.
  • Customer Retention: Analyzing churn to estimate how long a user will remain subscribed to a service before canceling.
  • Sociology and Economics: Modeling the duration of unemployment spells or the time until a convict reoffends (recidivism).

Related Concepts

  • Probability Distributions — Theoretical distributions that model failure times.
  • Monte Carlo Simulation — Used to computationally estimate complex survival probabilities.
  • Poisson Process — Relates to constant hazard rates (Exponential distribution).
