|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "### Basic settings\n", |
| 8 | + "\n", |
| 9 | + "There are $N$ members. Each member has $K$ types of pictures. Each pack consists of $M$ pictures of distinct members. We want to buy $L$ packs of pictures." |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | + "cell_type": "markdown", |
| 14 | + "metadata": {}, |
| 15 | + "source": [ |
| 16 | + "### Number of members (Uniq)\n", |
| 17 | + "\n", |
| 18 | + "We first calculate the expected number of members that you get from buying $L$ sets of pictures. Since we are interested only in the number of members, the types can be ignored.\n", |
| 19 | + "\n", |
| 20 | + "Let's consider member $i$. For each pack, the probability that you get member $i$'s picture is exactly\n", |
| 21 | + "\n", |
| 22 | + "$$M/N.$$\n", |
| 23 | + "\n", |
| 24 | + "Thus, if we buy $L$ packs, the probability that you do not get member $i$'s picture is \n", |
| 25 | + "\n", |
| 26 | + "$$(1 - M/N)^L.$$\n", |
| 27 | + "\n", |
| 28 | + "For each member $i$, let indicator random variable $X_i$ be 1 of you get at least one picture of member $i$. Therefore,\n", |
| 29 | + "\n", |
| 30 | + "$$E[X_i] = Pr[X_i=1] = 1 - (1-M/N)^L$$\n", |
| 31 | + "\n", |
| 32 | + "Let random variable $X$ be the number of members that you get at least one picture. Note that\n", |
| 33 | + "\n", |
| 34 | + "$$X = \\sum_{i=1}^{N} X_i.$$\n", |
| 35 | + "\n", |
| 36 | + "Therefore, using linearity of expectation, we have\n", |
| 37 | + "\n", |
| 38 | + "$$E[X] = E\\left[\\sum_{i=1}^{N} X_i\\right] = \\sum_{i=1}^{N} E[X_i] = N\\left(1-(1-M/N)^L\\right) = N - N(1-M/N)^L$$" |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | + "cell_type": "markdown", |
| 43 | + "metadata": {}, |
| 44 | + "source": [ |
| 45 | + "#### Let's see the numbers\n", |
| 46 | + "\n", |
| 47 | + "Let's try to plug in $N=27$ and $M=5$, and calculate $E[X]$ for various values of $L$." |
| 48 | + ] |
| 49 | + }, |
| 50 | + { |
| 51 | + "cell_type": "code", |
| 52 | + "execution_count": 1, |
| 53 | + "metadata": {}, |
| 54 | + "outputs": [], |
| 55 | + "source": [ |
| 56 | + "def uniq(n,m,l):\n", |
| 57 | + " return n - n*((1-m/n)**l)" |
| 58 | + ] |
| 59 | + }, |
| 60 | + { |
| 61 | + "cell_type": "code", |
| 62 | + "execution_count": 2, |
| 63 | + "metadata": {}, |
| 64 | + "outputs": [ |
| 65 | + { |
| 66 | + "name": "stdout", |
| 67 | + "output_type": "stream", |
| 68 | + "text": [ |
| 69 | + "#sets = 1, exp. uniq = 5.00000\n", |
| 70 | + "#sets = 2, exp. uniq = 9.07407\n", |
| 71 | + "#sets = 3, exp. uniq = 12.39369\n", |
| 72 | + "#sets = 4, exp. uniq = 15.09856\n", |
| 73 | + "#sets = 5, exp. uniq = 17.30253\n", |
| 74 | + "#sets = 6, exp. uniq = 19.09836\n", |
| 75 | + "#sets = 7, exp. uniq = 20.56163\n", |
| 76 | + "#sets = 8, exp. uniq = 21.75392\n", |
| 77 | + "#sets = 9, exp. uniq = 22.72541\n", |
| 78 | + "#sets = 10, exp. uniq = 23.51700\n", |
| 79 | + "#sets = 15, exp. uniq = 25.74903\n", |
| 80 | + "#sets = 20, exp. uniq = 26.55069\n", |
| 81 | + "#sets = 25, exp. uniq = 26.83862\n", |
| 82 | + "#sets = 30, exp. uniq = 26.94204\n", |
| 83 | + "#sets = 35, exp. uniq = 26.97918\n", |
| 84 | + "#sets = 40, exp. uniq = 26.99252\n", |
| 85 | + "#sets = 45, exp. uniq = 26.99731\n", |
| 86 | + "#sets = 50, exp. uniq = 26.99904\n", |
| 87 | + "#sets = 55, exp. uniq = 26.99965\n", |
| 88 | + "#sets = 60, exp. uniq = 26.99988\n", |
| 89 | + "#sets = 65, exp. uniq = 26.99996\n", |
| 90 | + "#sets = 70, exp. uniq = 26.99998\n", |
| 91 | + "#sets = 75, exp. uniq = 26.99999\n", |
| 92 | + "#sets = 80, exp. uniq = 27.00000\n", |
| 93 | + "#sets = 85, exp. uniq = 27.00000\n", |
| 94 | + "#sets = 90, exp. uniq = 27.00000\n", |
| 95 | + "#sets = 95, exp. uniq = 27.00000\n", |
| 96 | + "#sets = 100, exp. uniq = 27.00000\n" |
| 97 | + ] |
| 98 | + } |
| 99 | + ], |
| 100 | + "source": [ |
| 101 | + "for i,ex in [(i,uniq(27,5,i)) for i in range(1,101)]:\n", |
| 102 | + " if (i <= 10) or (i % 5 == 0):\n", |
| 103 | + " print(\"#sets = %3d, exp. uniq = %.5f\" % (i,ex))" |
| 104 | + ] |
| 105 | + }, |
| 106 | + { |
| 107 | + "cell_type": "markdown", |
| 108 | + "metadata": {}, |
| 109 | + "source": [ |
| 110 | + "### Number of members with comp (u_comp)\n", |
| 111 | + "\n", |
| 112 | + "Let's consider the number of members you get at least one complete set. The current formula is pretty involved, but it is not very hard to understand.\n", |
| 113 | + "\n", |
| 114 | + "#### Single member $i$\n", |
| 115 | + "\n", |
| 116 | + "Suppose we get $P$ pictures of member $i$, what is the probability that we get a complete set? It is hard to deal with general $K$, but for $K=3$ we have a simple formula. There are 2 cases that you do not get a complete set:\n", |
| 117 | + "\n", |
| 118 | + "**Case 1:** You only get one type. This occurs with probability $3\\cdot (1/3)^P$. (3 types, each picture has to be in this type.)\n", |
| 119 | + "\n", |
| 120 | + "**Case 2:** You only get exactly two types. Let's consider the probability that you get only type-1 and type-2. With probability $(2/3)^P$ you do not get any type-3 pictures. However, it can be the case that you get only type-1 or type-2. Excluding both possibility, you have that the desired probability is\n", |
| 121 | + "\n", |
| 122 | + "$$(2/3)^P - 2(1/3)^P.$$\n", |
| 123 | + "\n", |
| 124 | + "There are ${3 \\choose 2}=3$ possible pairs of types, thus Case 2 occurs with probability \n", |
| 125 | + "\n", |
| 126 | + "$$3\\left((2/3)^p - 2(1/3)^P\\right).$$\n", |
| 127 | + "\n", |
| 128 | + "Since both two cases are disjoint, we can add their probabilities, yielding that we get a complete set with probability\n", |
| 129 | + "\n", |
| 130 | + "$$1 - \\left(3\\cdot (1/3)^P + 3\\left((2/3)^p - 2(1/3)^P\\right)\\right).$$\n", |
| 131 | + "\n", |
| 132 | + "Let's call this quantity $q(P)$. We will use $q(P)$ to find the probability that when buying $L$ packs, you get at least one complete set. Let event $A_P$ be the event that you get $P$ pictures. Let event $B$ be the event that you get at least one complete set. We want to compute\n", |
| 133 | + "\n", |
| 134 | + "$$Pr[B]=Pr\\left[\\bigcup_{P=0}^{L} B\\cap A_P\\right].$$\n", |
| 135 | + "\n", |
| 136 | + "Since each event $B\\cap A_P$ in the union are disjoint, we have that\n", |
| 137 | + "\n", |
| 138 | + "$$Pr[B]=\\sum_{P=0}^{L} Pr\\left[B\\cap A_P\\right]=\\sum_{P=0}^{L} Pr[B|A_P]\\cdot Pr[A_P].$$\n", |
| 139 | + "\n", |
| 140 | + "Recall that $Pr[B|A_P]=q(P)$ and the number of pictures that you get is a binomial random variable with parameter $(L,M/N)$. Hence,\n", |
| 141 | + "\n", |
| 142 | + "$$Pr[B]=\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}.$$\n", |
| 143 | + "\n", |
| 144 | + "#### All members\n", |
| 145 | + "\n", |
| 146 | + "To get the expected number of members that you get complete sets, we define, for member $i$, an indicator random variable $Y_i$ to be 1 iff you get a complete set of member $i$, and let random variable $Y$ be the number of members that you get complete sets. Thus, $Y=\\sum_{i=1}^N Y_i$.\n", |
| 147 | + "\n", |
| 148 | + "From previous section, we have $Pr[Y_i=1]=Pr[B]$.\n", |
| 149 | + "\n", |
| 150 | + "We then have\n", |
| 151 | + "\n", |
| 152 | + "$$E[Y] = E\\left[\\sum_{i=1}^{N} Y_i\\right] = \\sum_{i=1}^N E[Y_i] = N\\cdot Pr[B]\n", |
| 153 | + "= N\\left(\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}\\right).$$\n", |
| 154 | + "\n", |
| 155 | + "#### Let's see the numbers\n", |
| 156 | + "\n", |
| 157 | + "We will plug in, again, $N=27$ and $M=5$." |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "code", |
| 162 | + "execution_count": 3, |
| 163 | + "metadata": {}, |
| 164 | + "outputs": [], |
| 165 | + "source": [ |
| 166 | + "def q(p):\n", |
| 167 | + " if p == 0: # the formula doesn't work when p = 0\n", |
| 168 | + " return 0\n", |
| 169 | + " else:\n", |
| 170 | + " return 1 - (3*(1/3**p) + 3*((2**p/3**p) - 2*(1/3**p)))" |
| 171 | + ] |
| 172 | + }, |
| 173 | + { |
| 174 | + "cell_type": "code", |
| 175 | + "execution_count": 4, |
| 176 | + "metadata": {}, |
| 177 | + "outputs": [ |
| 178 | + { |
| 179 | + "name": "stdout", |
| 180 | + "output_type": "stream", |
| 181 | + "text": [ |
| 182 | + "[(0, 0), (1, 0.0), (2, 0.0), (3, 0.22222222222222232), (4, 0.4444444444444444), (5, 0.6172839506172839), (6, 0.7407407407407407), (7, 0.8257887517146776), (8, 0.8834019204389575), (9, 0.9221155311690291)]\n" |
| 183 | + ] |
| 184 | + } |
| 185 | + ], |
| 186 | + "source": [ |
| 187 | + "# Let's see if q makes sense.\n", |
| 188 | + "\n", |
| 189 | + "print([(i,q(i)) for i in range(0,10)])" |
| 190 | + ] |
| 191 | + }, |
| 192 | + { |
| 193 | + "cell_type": "markdown", |
| 194 | + "metadata": {}, |
| 195 | + "source": [ |
| 196 | + "For sanity check, if $P=3$, the probability is $3!/3^3 = 6/27 = 0.222222$. So our formula seems correct." |
| 197 | + ] |
| 198 | + }, |
| 199 | + { |
| 200 | + "cell_type": "code", |
| 201 | + "execution_count": 5, |
| 202 | + "metadata": {}, |
| 203 | + "outputs": [], |
| 204 | + "source": [ |
| 205 | + "from scipy.stats import binom\n", |
| 206 | + "\n", |
| 207 | + "def prb(l):\n", |
| 208 | + " s = sum([q(i) * binom.pmf(i,l,5./27.)\n", |
| 209 | + " for i in range(0,l+1)])\n", |
| 210 | + " return s" |
| 211 | + ] |
| 212 | + }, |
| 213 | + { |
| 214 | + "cell_type": "code", |
| 215 | + "execution_count": 6, |
| 216 | + "metadata": {}, |
| 217 | + "outputs": [ |
| 218 | + { |
| 219 | + "name": "stdout", |
| 220 | + "output_type": "stream", |
| 221 | + "text": [ |
| 222 | + "#sets = 1, exp. u_comp = 0.00000\n", |
| 223 | + "#sets = 2, exp. u_comp = 0.00000\n", |
| 224 | + "#sets = 3, exp. u_comp = 0.03810\n", |
| 225 | + "#sets = 4, exp. u_comp = 0.13830\n", |
| 226 | + "#sets = 5, exp. u_comp = 0.31411\n", |
| 227 | + "#sets = 6, exp. u_comp = 0.57136\n", |
| 228 | + "#sets = 7, exp. u_comp = 0.91044\n", |
| 229 | + "#sets = 8, exp. u_comp = 1.32792\n", |
| 230 | + "#sets = 9, exp. u_comp = 1.81785\n", |
| 231 | + "#sets = 10, exp. u_comp = 2.37271\n", |
| 232 | + "#sets = 15, exp. u_comp = 5.82459\n", |
| 233 | + "#sets = 20, exp. u_comp = 9.70830\n", |
| 234 | + "#sets = 25, exp. u_comp = 13.37323\n", |
| 235 | + "#sets = 30, exp. u_comp = 16.52004\n", |
| 236 | + "#sets = 35, exp. u_comp = 19.07439\n", |
| 237 | + "#sets = 40, exp. u_comp = 21.07557\n", |
| 238 | + "#sets = 45, exp. u_comp = 22.60730\n", |
| 239 | + "#sets = 50, exp. u_comp = 23.76152\n", |
| 240 | + "#sets = 55, exp. u_comp = 24.62201\n", |
| 241 | + "#sets = 60, exp. u_comp = 25.25880\n", |
| 242 | + "#sets = 65, exp. u_comp = 25.72762\n", |
| 243 | + "#sets = 70, exp. u_comp = 26.07152\n", |
| 244 | + "#sets = 75, exp. u_comp = 26.32316\n", |
| 245 | + "#sets = 80, exp. u_comp = 26.50695\n", |
| 246 | + "#sets = 85, exp. u_comp = 26.64101\n", |
| 247 | + "#sets = 90, exp. u_comp = 26.73872\n", |
| 248 | + "#sets = 95, exp. u_comp = 26.80988\n", |
| 249 | + "#sets = 100, exp. u_comp = 26.86169\n" |
| 250 | + ] |
| 251 | + } |
| 252 | + ], |
| 253 | + "source": [ |
| 254 | + "# finally the numbers\n", |
| 255 | + "for i,ex in [(i,27 * prb(i)) for i in range(1,101)]:\n", |
| 256 | + " if (i <= 10) or (i % 5 == 0):\n", |
| 257 | + " print(\"#sets = %3d, exp. u_comp = %.5f\" % (i,ex))" |
| 258 | + ] |
| 259 | + } |
| 260 | + ], |
| 261 | + "metadata": { |
| 262 | + "kernelspec": { |
| 263 | + "display_name": "Python 3", |
| 264 | + "language": "python", |
| 265 | + "name": "python3" |
| 266 | + }, |
| 267 | + "language_info": { |
| 268 | + "codemirror_mode": { |
| 269 | + "name": "ipython", |
| 270 | + "version": 3 |
| 271 | + }, |
| 272 | + "file_extension": ".py", |
| 273 | + "mimetype": "text/x-python", |
| 274 | + "name": "python", |
| 275 | + "nbconvert_exporter": "python", |
| 276 | + "pygments_lexer": "ipython3", |
| 277 | + "version": "3.5.3" |
| 278 | + } |
| 279 | + }, |
| 280 | + "nbformat": 4, |
| 281 | + "nbformat_minor": 2 |
| 282 | +} |
0 commit comments