""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Likelihood Ratio Processes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Contents\n",
"\n",
"- [Likelihood Ratio Processes](#Likelihood-Ratio-Processes) \n",
" - [Overview](#Overview) \n",
" - [Likelihood Ratio Process](#Likelihood-Ratio-Process) \n",
" - [Nature Permanently Draws from Density g](#Nature-Permanently-Draws-from-Density-g) \n",
" - [Nature Permanently Draws from Density f](#Nature-Permanently-Draws-from-Density-f) \n",
" - [Likelihood Ratio Test](#Likelihood-Ratio-Test) \n",
" - [Kullback–Leibler divergence](#Kullback–Leibler-divergence) \n",
" - [Sequels](#Sequels) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from numba import vectorize, njit\n",
"from math import gamma\n",
"%matplotlib inline\n",
"from scipy.integrate import quad"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"This lecture describes likelihood ratio processes and some of their uses.\n",
"\n",
"We’ll use a setting described in [this lecture](https://python-programming.quantecon.org/exchangeable.html).\n",
"\n",
"Among things that we’ll learn are\n",
"\n",
"- A peculiar property of likelihood ratio processes \n",
"- How a likelihood ratio process is a key ingredient in frequentist hypothesis testing \n",
"- How a **receiver operator characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing \n",
"- How during World War II the United States Navy devised a decision rule that Captain Garret L. Schyler challenged and asked Milton Friedman to justify to him, a topic to be studied in [this lecture](https://python-programming.quantecon.org/wald_friedman.html) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Likelihood Ratio Process\n",
"\n",
"A nonnegative random variable $ W $ has one of two probability density functions, either\n",
"$ f $ or $ g $.\n",
"\n",
"Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either\n",
"$ f $ or $ g $.\n",
"\n",
"We will sometimes let $ q $ be the density that nature chose once and for all, so\n",
"that $ q $ is either $ f $ or $ g $, permanently.\n",
"\n",
"Nature knows which density it permanently draws from, but we the observers do not.\n",
"\n",
"We do know both $ f $ and $ g $ but we don’t know which density nature\n",
"chose.\n",
"\n",
"But we want to know.\n",
"\n",
"To do that, we use observations.\n",
"\n",
"We observe a sequence $ \\{w_t\\}_{t=1}^T $ of $ T $ IID draws\n",
"from either $ f $ or $ g $.\n",
"\n",
"We want to use these observations to infer whether nature chose $ f $ or\n",
"$ g $.\n",
"\n",
"A **likelihood ratio process** is a useful tool for this task.\n",
"\n",
"To begin, we define key component of a likelihood ratio process, namely, the time $ t $ likelihood ratio as the random variable\n",
"\n",
"$$\n",
"\\ell (w_t)=\\frac{f\\left(w_t\\right)}{g\\left(w_t\\right)},\\quad t\\geq1.\n",
"$$\n",
"\n",
"We assume that $ f $ and $ g $ both put positive probabilities on the\n",
"same intervals of possible realizations of the random variable $ W $.\n",
"\n",
"That means that under the $ g $ density, $ \\ell (w_t)=\n",
"\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)} $\n",
"is evidently a nonnegative random variable with mean $ 1 $.\n",
"\n",
"A **likelihood ratio process** for sequence\n",
"$ \\left\\{ w_{t}\\right\\} _{t=1}^{\\infty} $ is defined as\n",
"\n",
"$$\n",
"L\\left(w^{t}\\right)=\\prod_{i=1}^{t} \\ell (w_i),\n",
"$$\n",
"\n",
"where $ w^t=\\{ w_1,\\dots,w_t\\} $ is a history of\n",
"observations up to and including time $ t $.\n",
"\n",
"Sometimes for shorthand we’ll write $ L_t = L(w^t) $.\n",
"\n",
"Notice that the likelihood process satisfies the *recursion* or\n",
"*multiplicative decomposition*\n",
"\n",
"$$\n",
"L(w^t) = \\ell (w_t) L (w^{t-1}) .\n",
"$$\n",
"\n",
"The likelihood ratio and its logarithm are key tools for making\n",
"inferences using a classic frequentist approach due to Neyman and\n",
"Pearson [[NP33]](https://python-programming.quantecon.org/zreferences.html#neyman-pearson).\n",
"\n",
"To help us appreciate how things work, the following Python code evaluates $ f $ and $ g $ as two different\n",
"beta distributions, then computes and simulates an associated likelihood\n",
"ratio process by generating a sequence $ w^t $ from one of the two\n",
"probability distributionss, for example, a sequence of IID draws from $ g $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# Parameters in the two beta distributions.\n",
"F_a, F_b = 1, 1\n",
"G_a, G_b = 3, 1.2\n",
"\n",
"@vectorize\n",
"def p(x, a, b):\n",
" r = gamma(a + b) / (gamma(a) * gamma(b))\n",
" return r * x** (a-1) * (1 - x) ** (b-1)\n",
"\n",
"# The two density functions.\n",
"f = njit(lambda x: p(x, F_a, F_b))\n",
"g = njit(lambda x: p(x, G_a, G_b))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"@njit\n",
"def simulate(a, b, T=50, N=500):\n",
" '''\n",
" Generate N sets of T observations of the likelihood ratio,\n",
" return as N x T matrix.\n",
"\n",
" '''\n",
"\n",
" l_arr = np.empty((N, T))\n",
"\n",
" for i in range(N):\n",
"\n",
" for j in range(T):\n",
" w = np.random.beta(a, b)\n",
" l_arr[i, j] = f(w) / g(w)\n",
"\n",
" return l_arr"
]
},
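{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick standalone check (the per-period ratios below are hypothetical draws, not output of `simulate`), we can confirm that `np.cumprod` implements the recursion $ L(w^t) = \\ell (w_t) L(w^{t-1}) $ used throughout:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical per-period likelihood ratios for one path\n",
"np.random.seed(0)\n",
"l_vals = np.random.rand(10) + 0.5\n",
"\n",
"# Cumulative product along the path ...\n",
"L_cumprod = np.cumprod(l_vals)\n",
"\n",
"# ... agrees with the recursion L_t = l_t * L_{t-1}\n",
"L_rec = np.empty_like(l_vals)\n",
"L_rec[0] = l_vals[0]\n",
"for t in range(1, len(l_vals)):\n",
"    L_rec[t] = l_vals[t] * L_rec[t - 1]\n",
"\n",
"print(np.allclose(L_cumprod, L_rec))"
]
},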
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nature Permanently Draws from Density g\n",
"\n",
"We first simulate the likelihood ratio process when nature permanently\n",
"draws from $ g $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"l_arr_g = simulate(G_a, G_b)\n",
"l_seq_g = np.cumprod(l_arr_g, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"N, T = l_arr_g.shape\n",
"\n",
"for i in range(N):\n",
"\n",
" plt.plot(range(T), l_seq_g[i, :], color='b', lw=0.8, alpha=0.5)\n",
"\n",
"plt.ylim([0, 3])\n",
"plt.title(\"$L(w^{t})$ paths\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evidently, as sample length $ T $ grows, most probability mass\n",
"shifts toward zero\n",
"\n",
"To see it this more clearly clearly, we plot over time the fraction of\n",
"paths $ L\\left(w^{t}\\right) $ that fall in the interval\n",
"$ \\left[0, 0.01\\right] $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.plot(range(T), np.sum(l_seq_g <= 0.01, axis=0) / N)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Despite the evident convergence of most probability mass to a\n",
"very small interval near $ 0 $, the unconditional mean of\n",
"$ L\\left(w^t\\right) $ under probability density $ g $ is\n",
"identically $ 1 $ for all $ t $.\n",
"\n",
"To verify this assertion, first notice that as mentioned earlier the unconditional mean\n",
"$ E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] $ is $ 1 $ for\n",
"all $ t $:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n",
" &=\\int f\\left(w_{t}\\right)dw_{t} \\\\\n",
" &=1,\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"which immediately implies\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"E_{0}\\left[L\\left(w^{1}\\right)\\bigm|q=g\\right] &=E_{0}\\left[\\ell \\left(w_{1}\\right)\\bigm|q=g\\right]\\\\\n",
" &=1.\\\\\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Because $ L(w^t) = \\ell(w_t) L(w^{t-1}) $ and\n",
"$ \\{w_t\\}_{t=1}^t $ is an IID sequence, we have\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right] &=E_{0}\\left[L\\left(w^{t-1}\\right)\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] \\\\\n",
" &=E_{0}\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g,w^{t-1}\\right]\\bigm|q=g\\right] \\\\\n",
" &=E_{0}\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]\\bigm|q=g\\right] \\\\\n",
" &=E_{0}\\left[L\\left(w^{t-1}\\right)\\bigm|q=g\\right] \\\\\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"for any $ t \\geq 1 $.\n",
"\n",
"Mathematical induction implies\n",
"$ E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ for all\n",
"$ t \\geq 1 $."
]
},
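{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a numerical sanity check (a sketch that re-derives the two beta densities inline with `scipy.integrate.quad`, so it does not rely on the `njit`-compiled `f` and `g` above), we can verify that $ E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]=\\int f\\left(w_{t}\\right)dw_{t}=1 $:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"from math import gamma\n",
"from scipy.integrate import quad\n",
"\n",
"def beta_pdf(x, a, b):\n",
"    r = gamma(a + b) / (gamma(a) * gamma(b))\n",
"    return r * x ** (a - 1) * (1 - x) ** (b - 1)\n",
"\n",
"f_pdf = lambda x: beta_pdf(x, 1, 1)    # same parameters as F_a, F_b\n",
"g_pdf = lambda x: beta_pdf(x, 3, 1.2)  # same parameters as G_a, G_b\n",
"\n",
"# E[l(w) | q=g] = integral of (f/g) g dw = integral of f dw = 1\n",
"mean_l_g, _ = quad(lambda x: f_pdf(x) / g_pdf(x) * g_pdf(x), 0, 1)\n",
"print(round(mean_l_g, 6))"
]
},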
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Peculiar Property of Likelihood Ratio Process\n",
"\n",
"How can $ E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ possibly be true when most probability mass of the likelihood\n",
"ratio process is piling up near $ 0 $ as\n",
"$ t \\rightarrow + \\infty $?\n",
"\n",
"The answer has to be that as $ t \\rightarrow + \\infty $, the\n",
"distribution of $ L_t $ becomes more and more fat-tailed:\n",
"enough mass shifts to larger and larger values of $ L_t $ to make\n",
"the mean of $ L_t $ continue to be one despite most of the probability mass piling up\n",
"near $ 0 $.\n",
"\n",
"To illustrate this peculiar property, we simulate many paths and\n",
"calculate the unconditional mean of $ L\\left(w^t\\right) $ by\n",
"averaging across these many paths at each $ t $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"l_arr_g = simulate(G_a, G_b, N=50000)\n",
"l_seq_g = np.cumprod(l_arr_g, axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It would be useful to use simulations to verify that unconditional means\n",
"$ E_{0}\\left[L\\left(w^{t}\\right)\\right] $ equal unity by averaging across sample\n",
"paths.\n",
"\n",
"But it would be too challenging for us to that here simply by applying a standard Monte Carlo simulation approach.\n",
"\n",
"The reason is that the distribution of $ L\\left(w^{t}\\right) $ is extremely skewed for large values of $ t $.\n",
"\n",
"Because the probability density in the right tail is close to $ 0 $, it just takes too much computer time to sample enough points from the right tail.\n",
"\n",
"Instead, the following code just illustrates that the unconditional means of $ l(w_t) $ are $ 1 $.\n",
"\n",
"While sample averages hover around their population means of $ 1 $, there is evidently quite a bit\n",
"of variability."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"N, T = l_arr_g.shape\n",
"plt.plot(range(T), np.mean(l_arr_g, axis=0))\n",
"plt.hlines(1, 0, T, linestyle='--')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nature Permanently Draws from Density f\n",
"\n",
"Now suppose that before time $ 0 $ nature permanently decided to draw repeatedly from density $ f $.\n",
"\n",
"While the mean of the likelihood ratio $ \\ell \\left(w_{t}\\right) $ under density\n",
"$ g $ is $ 1 $, its mean under the density $ f $ exceeds one.\n",
"\n",
"To see this, we compute\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=f\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}f\\left(w_{t}\\right)dw_{t} \\\\\n",
" &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n",
" &=\\int \\ell \\left(w_{t}\\right)^{2}g\\left(w_{t}\\right)dw_{t} \\\\\n",
" &=E_{0}\\left[\\ell \\left(w_{t}\\right)^{2}\\mid q=g\\right] \\\\\n",
" &=E_{0}\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2}+Var\\left(\\ell \\left(w_{t}\\right)\\mid q=g\\right) \\\\\n",
" &>E_{0}\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2} = 1 \\\\\n",
" \\end{aligned}\n",
"$$\n",
"\n",
"This in turn implies that the unconditional mean of the likelihood ratio process $ L(w^t) $\n",
"diverges toward $ + \\infty $.\n",
"\n",
"Simulations below confirm this conclusion.\n",
"\n",
"Please note the scale of the $ y $ axis."
]
},
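{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also check the inequality $ E_{0}\\left[\\ell \\left(w_{t}\\right)\\mid q=f\\right]>1 $ numerically. For the particular beta parameters used in this lecture the integral $ \\int f^{2}/g\\,dw $ need not be finite, so the sketch below uses an illustrative pair of beta densities (a hypothetical choice, not the `F_a, F_b` and `G_a, G_b` from above) for which the integral has the closed-form value $ 3/2 $:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"from math import gamma\n",
"from scipy.integrate import quad\n",
"\n",
"def beta_pdf(x, a, b):\n",
"    r = gamma(a + b) / (gamma(a) * gamma(b))\n",
"    return r * x ** (a - 1) * (1 - x) ** (b - 1)\n",
"\n",
"# Illustrative densities: f = Beta(2, 2), g = Beta(2, 3),\n",
"# chosen so that f(x)**2 / g(x) = 3x, whose integral over [0, 1] is 3/2\n",
"f_pdf = lambda x: beta_pdf(x, 2, 2)\n",
"g_pdf = lambda x: beta_pdf(x, 2, 3)\n",
"\n",
"# E[l(w) | q=f] = integral of (f/g) f dw = E[l(w)**2 | q=g] > 1\n",
"mean_l_f, _ = quad(lambda x: f_pdf(x) / g_pdf(x) * f_pdf(x), 0, 1)\n",
"print(round(mean_l_f, 6))"
]
},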
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"l_arr_f = simulate(F_a, F_b, N=50000)\n",
"l_seq_f = np.cumprod(l_arr_f, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"N, T = l_arr_f.shape\n",
"plt.plot(range(T), np.mean(l_seq_f, axis=0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also plot the probability that $ L\\left(w^t\\right) $ falls into\n",
"the interval $ [10000, \\infty) $ as a function of time and watch how\n",
"fast probability mass diverges to $ +\\infty $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Likelihood Ratio Test\n",
"\n",
"We now describe how to employ the machinery\n",
"of Neyman and Pearson [[NP33]](https://python-programming.quantecon.org/zreferences.html#neyman-pearson) to test the hypothesis that history $ w^t $ is generated by repeated\n",
"IID draws from density $ g $.\n",
"\n",
"Denote $ q $ as the data generating process, so that\n",
"$ q=f \\text{ or } g $.\n",
"\n",
"Upon observing a sample $ \\{W_i\\}_{i=1}^t $, we want to decide\n",
"whether nature is drawing from $ g $ or from $ f $ by performing a (frequentist)\n",
"hypothesis test.\n",
"\n",
"We specify\n",
"\n",
"- Null hypothesis $ H_0 $: $ q=f $, \n",
"- Alternative hypothesis $ H_1 $: $ q=g $. \n",
"\n",
"\n",
"Neyman and Pearson proved that the best way to test this hypothesis is to use a **likelihood ratio test** that takes the\n",
"form:\n",
"\n",
"- reject $ H_0 $ if $ L(W^t) < c $, \n",
"- accept $ H_0 $ otherwise. \n",
"\n",
"\n",
"where $ c $ is a given discrimination threshold, to be chosen in a way we’ll soon describe.\n",
"\n",
"This test is *best* in the sense that it is a **uniformly most powerful** test.\n",
"\n",
"To understand what this means, we have to define probabilities of two important events that\n",
"allow us to characterize a test associated with a given\n",
"threshold $ c $.\n",
"\n",
"The two probabilities are:\n",
"\n",
"- Probability of detection (= power = 1 minus probability\n",
" of Type II error): \n",
" $$\n",
" 1-\\beta \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right)