{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Likelihood Ratio Processes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Likelihood Ratio Processes](#Likelihood-Ratio-Processes) \n", " - [Overview](#Overview) \n", " - [Likelihood Ratio Process](#Likelihood-Ratio-Process) \n", " - [Nature Permanently Draws from Density g](#Nature-Permanently-Draws-from-Density-g) \n", " - [Nature Permanently Draws from Density f](#Nature-Permanently-Draws-from-Density-f) \n", " - [Likelihood Ratio Test](#Likelihood-Ratio-Test) \n", " - [Kullback–Leibler divergence](#Kullback–Leibler-divergence) \n", " - [Sequels](#Sequels) " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from numba import vectorize, njit\n", "from math import gamma\n", "%matplotlib inline\n", "from scipy.integrate import quad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "This lecture describes likelihood ratio processes and some of their uses.\n", "\n", "We’ll use a setting described in [this lecture](https://python-programming.quantecon.org/exchangeable.html).\n", "\n", "Among things that we’ll learn are\n", "\n", "- A peculiar property of likelihood ratio processes \n", "- How a likelihood ratio process is a key ingredient in frequentist hypothesis testing \n", "- How a **receiver operator characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing \n", "- How during World War II the United States Navy devised a decision rule that Captain Garret L. Schyler challenged and asked Milton Friedman to justify to him, a topic to be studied in [this lecture](https://python-programming.quantecon.org/wald_friedman.html) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Likelihood Ratio Process\n", "\n", "A nonnegative random variable $W$ has one of two probability density functions, either\n", "$f$ or $g$.\n", "\n", "Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either\n", "$f$ or $g$.\n", "\n", "We will sometimes let $q$ be the density that nature chose once and for all, so\n", "that $q$ is either $f$ or $g$, permanently.\n", "\n", "Nature knows which density it permanently draws from, but we the observers do not.\n", "\n", "We do know both $f$ and $g$ but we don’t know which density nature\n", "chose.\n", "\n", "But we want to know.\n", "\n", "To do that, we use observations.\n", "\n", "We observe a sequence $\\{w_t\\}_{t=1}^T$ of $T$ IID draws\n", "from either $f$ or $g$.\n", "\n", "We want to use these observations to infer whether nature chose $f$ or\n", "$g$.\n", "\n", "A **likelihood ratio process** is a useful tool for this task.\n", "\n", "To begin, we define key component of a likelihood ratio process, namely, the time $t$ likelihood ratio as the random variable\n", "\n", "$$\n", "\\ell (w_t)=\\frac{f\\left(w_t\\right)}{g\\left(w_t\\right)},\\quad t\\geq1.\n", "$$\n", "\n", "We assume that $f$ and $g$ both put positive probabilities on the\n", "same intervals of possible realizations of the random variable $W$.\n", "\n", "That means that under the $g$ density, $\\ell (w_t)=\n", "\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}$\n", "is evidently a nonnegative random variable with mean $1$.\n", "\n", "A **likelihood ratio process** for sequence\n", "$\\left\\{ w_{t}\\right\\} _{t=1}^{\\infty}$ is defined as\n", "\n", "$$\n", "L\\left(w^{t}\\right)=\\prod_{i=1}^{t} \\ell (w_i),\n", "$$\n", "\n", "where $w^t=\\{ w_1,\\dots,w_t\\}$ is a history of\n", "observations up to and including time $t$.\n", "\n", "Sometimes for shorthand we’ll write $L_t = L(w^t)$.\n", "\n", "Notice that the likelihood process satisfies the *recursion* or\n", "*multiplicative decomposition*\n", "\n", "$$\n", "L(w^t) = \\ell (w_t) L (w^{t-1}) .\n", "$$\n", "\n", "The likelihood ratio and its logarithm are key tools for making\n", "inferences using a classic frequentist approach due to Neyman and\n", "Pearson [[NP33]](https://python-programming.quantecon.org/zreferences.html#neyman-pearson).\n", "\n", "To help us appreciate how things work, the following Python code evaluates $f$ and $g$ as two different\n", "beta distributions, then computes and simulates an associated likelihood\n", "ratio process by generating a sequence $w^t$ from one of the two\n", "probability distributionss, for example, a sequence of IID draws from $g$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Parameters in the two beta distributions.\n", "F_a, F_b = 1, 1\n", "G_a, G_b = 3, 1.2\n", "\n", "@vectorize\n", "def p(x, a, b):\n", " r = gamma(a + b) / (gamma(a) * gamma(b))\n", " return r * x** (a-1) * (1 - x) ** (b-1)\n", "\n", "# The two density functions.\n", "f = njit(lambda x: p(x, F_a, F_b))\n", "g = njit(lambda x: p(x, G_a, G_b))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@njit\n", "def simulate(a, b, T=50, N=500):\n", " '''\n", " Generate N sets of T observations of the likelihood ratio,\n", " return as N x T matrix.\n", "\n", " '''\n", "\n", " l_arr = np.empty((N, T))\n", "\n", " for i in range(N):\n", "\n", " for j in range(T):\n", " w = np.random.beta(a, b)\n", " l_arr[i, j] = f(w) / g(w)\n", "\n", " return l_arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nature Permanently Draws from Density g\n", "\n", "We first simulate the likelihood ratio process when nature permanently\n", "draws from $g$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_g = simulate(G_a, G_b)\n", "l_seq_g = np.cumprod(l_arr_g, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_g.shape\n", "\n", "for i in range(N):\n", "\n", " plt.plot(range(T), l_seq_g[i, :], color='b', lw=0.8, alpha=0.5)\n", "\n", "plt.ylim([0, 3])\n", "plt.title(\"$L(w^{t})$ paths\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evidently, as sample length $T$ grows, most probability mass\n", "shifts toward zero\n", "\n", "To see it this more clearly clearly, we plot over time the fraction of\n", "paths $L\\left(w^{t}\\right)$ that fall in the interval\n", "$\\left[0, 0.01\\right]$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "plt.plot(range(T), np.sum(l_seq_g <= 0.01, axis=0) / N)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Despite the evident convergence of most probability mass to a\n", "very small interval near $0$, the unconditional mean of\n", "$L\\left(w^t\\right)$ under probability density $g$ is\n", "identically $1$ for all $t$.\n", "\n", "To verify this assertion, first notice that as mentioned earlier the unconditional mean\n", "$E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]$ is $1$ for\n", "all $t$:\n", "\n", "\n", "\\begin{aligned}\n", "E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int f\\left(w_{t}\\right)dw_{t} \\\\\n", " &=1,\n", "\\end{aligned}\n", "\n", "\n", "which immediately implies\n", "\n", "\n", "\\begin{aligned}\n", "E_{0}\\left[L\\left(w^{1}\\right)\\bigm|q=g\\right] &=E_{0}\\left[\\ell \\left(w_{1}\\right)\\bigm|q=g\\right]\\\\\n", " &=1.\\\\\n", "\\end{aligned}\n", "\n", "\n", "Because $L(w^t) = \\ell(w_t) L(w^{t-1})$ and\n", "$\\{w_t\\}_{t=1}^t$ is an IID sequence, we have\n", "\n", "\n", "\\begin{aligned}\n", "E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right] &=E_{0}\\left[L\\left(w^{t-1}\\right)\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] \\\\\n", " &=E_{0}\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g,w^{t-1}\\right]\\bigm|q=g\\right] \\\\\n", " &=E_{0}\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]\\bigm|q=g\\right] \\\\\n", " &=E_{0}\\left[L\\left(w^{t-1}\\right)\\bigm|q=g\\right] \\\\\n", "\\end{aligned}\n", "\n", "\n", "for any $t \\geq 1$.\n", "\n", "Mathematical induction implies\n", "$E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1$ for all\n", "$t \\geq 1$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Peculiar Property of Likelihood Ratio Process\n", "\n", "How can $E_{0}\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1$ possibly be true when most probability mass of the likelihood\n", "ratio process is piling up near $0$ as\n", "$t \\rightarrow + \\infty$?\n", "\n", "The answer has to be that as $t \\rightarrow + \\infty$, the\n", "distribution of $L_t$ becomes more and more fat-tailed:\n", "enough mass shifts to larger and larger values of $L_t$ to make\n", "the mean of $L_t$ continue to be one despite most of the probability mass piling up\n", "near $0$.\n", "\n", "To illustrate this peculiar property, we simulate many paths and\n", "calculate the unconditional mean of $L\\left(w^t\\right)$ by\n", "averaging across these many paths at each $t$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_g = simulate(G_a, G_b, N=50000)\n", "l_seq_g = np.cumprod(l_arr_g, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It would be useful to use simulations to verify that unconditional means\n", "$E_{0}\\left[L\\left(w^{t}\\right)\\right]$ equal unity by averaging across sample\n", "paths.\n", "\n", "But it would be too challenging for us to that here simply by applying a standard Monte Carlo simulation approach.\n", "\n", "The reason is that the distribution of $L\\left(w^{t}\\right)$ is extremely skewed for large values of $t$.\n", "\n", "Because the probability density in the right tail is close to $0$, it just takes too much computer time to sample enough points from the right tail.\n", "\n", "Instead, the following code just illustrates that the unconditional means of $l(w_t)$ are $1$.\n", "\n", "While sample averages hover around their population means of $1$, there is evidently quite a bit\n", "of variability." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_g.shape\n", "plt.plot(range(T), np.mean(l_arr_g, axis=0))\n", "plt.hlines(1, 0, T, linestyle='--')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nature Permanently Draws from Density f\n", "\n", "Now suppose that before time $0$ nature permanently decided to draw repeatedly from density $f$.\n", "\n", "While the mean of the likelihood ratio $\\ell \\left(w_{t}\\right)$ under density\n", "$g$ is $1$, its mean under the density $f$ exceeds one.\n", "\n", "To see this, we compute\n", "\n", "\n", "\\begin{aligned}\n", "E_{0}\\left[\\ell \\left(w_{t}\\right)\\bigm|q=f\\right] &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}f\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=\\int \\ell \\left(w_{t}\\right)^{2}g\\left(w_{t}\\right)dw_{t} \\\\\n", " &=E_{0}\\left[\\ell \\left(w_{t}\\right)^{2}\\mid q=g\\right] \\\\\n", " &=E_{0}\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2}+Var\\left(\\ell \\left(w_{t}\\right)\\mid q=g\\right) \\\\\n", " &>E_{0}\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2} = 1 \\\\\n", " \\end{aligned}\n", "\n", "\n", "This in turn implies that the unconditional mean of the likelihood ratio process $L(w^t)$\n", "diverges toward $+ \\infty$.\n", "\n", "Simulations below confirm this conclusion.\n", "\n", "Please note the scale of the $y$ axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_f = simulate(F_a, F_b, N=50000)\n", "l_seq_f = np.cumprod(l_arr_f, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_f.shape\n", "plt.plot(range(T), np.mean(l_seq_f, axis=0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also plot the probability that $L\\left(w^t\\right)$ falls into\n", "the interval $[10000, \\infty)$ as a function of time and watch how\n", "fast probability mass diverges to $+\\infty$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Likelihood Ratio Test\n", "\n", "We now describe how to employ the machinery\n", "of Neyman and Pearson [[NP33]](https://python-programming.quantecon.org/zreferences.html#neyman-pearson) to test the hypothesis that history $w^t$ is generated by repeated\n", "IID draws from density $g$.\n", "\n", "Denote $q$ as the data generating process, so that\n", "$q=f \\text{ or } g$.\n", "\n", "Upon observing a sample $\\{W_i\\}_{i=1}^t$, we want to decide\n", "whether nature is drawing from $g$ or from $f$ by performing a (frequentist)\n", "hypothesis test.\n", "\n", "We specify\n", "\n", "- Null hypothesis $H_0$: $q=f$, \n", "- Alternative hypothesis $H_1$: $q=g$. \n", "\n", "\n", "Neyman and Pearson proved that the best way to test this hypothesis is to use a **likelihood ratio test** that takes the\n", "form:\n", "\n", "- reject $H_0$ if $L(W^t) < c$, \n", "- accept $H_0$ otherwise. \n", "\n", "\n", "where $c$ is a given discrimination threshold, to be chosen in a way we’ll soon describe.\n", "\n", "This test is *best* in the sense that it is a **uniformly most powerful** test.\n", "\n", "To understand what this means, we have to define probabilities of two important events that\n", "allow us to characterize a test associated with a given\n", "threshold $c$.\n", "\n", "The two probabilities are:\n", "\n", "- Probability of detection (= power = 1 minus probability\n", " of Type II error): \n", " $$\n", " 1-\\beta \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right) np.log(c)\n", " axs[nr, nc].fill_between(x[1:][ind], hist[ind], alpha=0.5, label=label)\n", "\n", " axs[nr, nc].legend()\n", " axs[nr, nc].set_title(f\"t={t}\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The graph below shows more clearly that, when we hold the threshold\n", " c  fixed, the probability of detection monotonically increases with increases in\n", " t  and that the probability of a false alarm monotonically decreases." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "PD = np.empty(T)\n", "PFA = np.empty(T)\n", "\n", "for t in range(T):\n", " PD[t] = np.sum(l_seq_g[:, t] < c) / N\n", " PFA[t] = np.sum(l_seq_f[:, t] < c) / N\n", "\n", "plt.plot(range(T), PD, label=\"Probability of detection\")\n", "plt.plot(range(T), PFA, label=\"Probability of false alarm\")\n", "plt.xlabel(\"t\")\n", "plt.title(\"c=1\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a given sample size  t , the threshold  c  uniquely pins down probabilities\n", "of both types of error.\n", "\n", "If for a fixed  t  we now free up and move  c , we will sweep out the probability\n", "of detection as a function of the probability of false alarm.\n", "\n", "This produces what is called a [receiver operating characteristic\n", "curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).\n", "\n", "Below, we plot receiver operating characteristic curves for different\n", "sample sizes  t ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "PFA = np.arange(0, 100, 1)\n", "\n", "for t in range(1, 15, 4):\n", " percentile = np.percentile(l_seq_f[:, t], PFA)\n", " PD = [np.sum(l_seq_g[:, t] < p) / N for p in percentile]\n", "\n", " plt.plot(PFA / 100, PD, label=f\"t={t}\")\n", "\n", "plt.scatter(0, 1, label=\"perfect detection\")\n", "plt.plot([0, 1], [0, 1], color='k', ls='--', label=\"random detection\")\n", "\n", "plt.arrow(0.5, 0.5, -0.15, 0.15, head_width=0.03)\n", "plt.text(0.35, 0.7, \"better\")\n", "plt.xlabel(\"Probability of false alarm\")\n", "plt.ylabel(\"Probability of detection\")\n", "plt.legend()\n", "plt.title(\"Receiver Operating Characteristic Curve\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that as  t  increases, we are assured a larger probability\n", "of detection and a smaller probability of false alarm associated with\n", "a given discrimination threshold  c .\n", "\n", "As  t \\rightarrow + \\infty , we approach the perfect detection\n", "curve that is indicated by a right angle hinging on the blue dot.\n", "\n", "For a given sample size  t , the discrimination threshold  c  determines a point on the receiver operating\n", "characteristic curve.\n", "\n", "It is up to the test designer to trade off probabilities of\n", "making the two types of errors.\n", "\n", "But we know how to choose the smallest sample size to achieve given targets for\n", "the probabilities.\n", "\n", "Typically, frequentists aim for a high probability of detection that\n", "respects an upper bound on the probability of false alarm.\n", "\n", "Below we show an example in which we fix the probability of false alarm at\n", " 0.05 .\n", "\n", "The required sample size for making a decision is then determined by a\n", "target probability of detection, for example,  0.9 , as depicted in the following graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "PFA = 0.05\n", "PD = np.empty(T)\n", "\n", "for t in range(T):\n", "\n", " c = np.percentile(l_seq_f[:, t], PFA * 100)\n", " PD[t] = np.sum(l_seq_g[:, t] < c) / N\n", "\n", "plt.plot(range(T), PD)\n", "plt.axhline(0.9, color=\"k\", ls=\"--\")\n", "\n", "plt.xlabel(\"t\")\n", "plt.ylabel(\"Probability of detection\")\n", "plt.title(f\"Probability of false alarm={PFA}\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The United States Navy evidently used a procedure like this to select a sample size  t  for doing quality\n", "control tests during World War II.\n", "\n", "A Navy Captain who had been ordered to perform tests of this kind had doubts about it that he\n", "presented to Milton Friedman, as we describe in [this lecture](https://python-programming.quantecon.org/wald_friedman.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kullback–Leibler divergence\n", "\n", "Now let’s consider a case in which neither  g  nor  f \n", "generates the data.\n", "\n", "Instead, a third distribution  h  does.\n", "\n", "Let’s watch how how the cumulated likelihood ratios  f/g  behave\n", "when  h  governs the data.\n", "\n", "A key tool here is called **Kullback–Leibler divergence**.\n", "\n", "It is also called **relative entropy**.\n", "\n", "It measures how one probability distribution differs from another.\n", "\n", "In our application, we want to measure how  f  or  g \n", "diverges from  h \n", "\n", "The two Kullback–Leibler divergences pertinent for us are  K_f \n", "and  K_g  defined as\n", "\n", "$$\n", "\\begin{aligned}\n", "K_{f} &=E_{h}\\left[\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right] \\\\\n", " &=\\int\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{f\\left(w\\right)}{h\\left(w\\right)}h\\left(w\\right)dw \\\\\n", " &=\\int\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)f\\left(w\\right)dw\n", "\\end{aligned}\n", "$$\n", "\n", "$$\n", "\\begin{aligned}\n", "K_{g} &=E_{h}\\left[\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right] \\\\\n", " &=\\int\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{g\\left(w\\right)}{h\\left(w\\right)}h\\left(w\\right)dw \\\\\n", " &=\\int\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)g\\left(w\\right)dw\n", "\\end{aligned}\n", "\n", "\n", "When $K_g < K_f$, $g$ is closer to $h$ than $f$\n", "is.\n", "\n", "- In that case we’ll find that $L\\left(w^t\\right) \\rightarrow 0$. \n", "\n", "\n", "When $K_g > K_f$, $f$ is closer to $h$ than $g$\n", "is.\n", "\n", "- In that case we’ll find that\n", " $L\\left(w^t\\right) \\rightarrow + \\infty$ \n", "\n", "\n", "We’ll now experiment with an $h$ is also a beta distribution\n", "\n", "We’ll start by setting parameters $G_a$ and $G_b$ so that\n", "$h$ is closer to $g$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "H_a, H_b = 3.5, 1.8\n", "\n", "h = njit(lambda x: p(x, H_a, H_b))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "x_range = np.linspace(0, 1, 100)\n", "plt.plot(x_range, f(x_range), label='f')\n", "plt.plot(x_range, g(x_range), label='g')\n", "plt.plot(x_range, h(x_range), label='h')\n", "\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s compute the Kullback–Leibler discrepancies by quadrature\n", "integration." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "def KL_integrand(w, q, h):\n", "\n", " m = q(w) / h(w)\n", "\n", " return np.log(m) * q(w)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "def compute_KL(h, f, g):\n", "\n", " Kf, _ = quad(KL_integrand, 0, 1, args=(f, h))\n", " Kg, _ = quad(KL_integrand, 0, 1, args=(g, h))\n", "\n", " return Kf, Kg" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "Kf, Kg = compute_KL(h, f, g)\n", "Kf, Kg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have $K_g < K_f$.\n", "\n", "Next, we can verify our conjecture about $L\\left(w^t\\right)$ by\n", "simulation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_h = simulate(H_a, H_b)\n", "l_seq_h = np.cumprod(l_arr_h, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure below plots over time the fraction of paths\n", "$L\\left(w^t\\right)$ that fall in the interval $[0,0.01]$.\n", "\n", "Notice that it converges to 1 as expected when $g$ is closer to\n", "$h$ than $f$ is." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_h.shape\n", "plt.plot(range(T), np.sum(l_seq_h <= 0.01, axis=0) / N)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also try an $h$ that is closer to $f$ than is\n", "$g$ so that now $K_g$ is larger than $K_f$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "H_a, H_b = 1.2, 1.2\n", "h = njit(lambda x: p(x, H_a, H_b))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "Kf, Kg = compute_KL(h, f, g)\n", "Kf, Kg" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "l_arr_h = simulate(H_a, H_b)\n", "l_seq_h = np.cumprod(l_arr_h, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now probability mass of $L\\left(w^t\\right)$ falling above\n", "$10000$ diverges to $+\\infty$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "N, T = l_arr_h.shape\n", "plt.plot(range(T), np.sum(l_seq_h > 10000, axis=0) / N)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sequels\n", "\n", "Likelihood processes play an important role in Bayesian learning, as described in [this lecture](https://python-programming.quantecon.org/likelihood_bayes.html)\n", "and as applied in [this lecture](https://python-programming.quantecon.org/odu.html).\n", "\n", "Likelihood ratio processes appear again in [this lecture](https://python-advanced.quantecon.org/additive_functionals.html), which contains another illustration\n", "of the **peculiar property** of likelihood ratio processes described above." ] } ], "metadata": { "date": 1615288254.264089, "filename": "likelihood_ratio_process.rst", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "next_doc": { "link": "wald_friedman", "title": "A Problem that Stumped Milton Friedman" }, "prev_doc": { "link": "odu", "title": "Job Search VII: Search with Learning" }, "title": "Likelihood Ratio Processes" }, "nbformat": 4, "nbformat_minor": 2 }