added BNK calculations

jittat · jittat · commit 346f5557eae5 · 2018-02-26T22:39:02.000+07:00
diff --git a/BNK-rand.ipynb b/BNK-rand.ipynb
@@ -0,0 +1,282 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Basic settings\n",
+    "\n",
+    "There are $N$ members.  Each member has $K$ types of pictures.  Each pack consists of $M$ pictures of distinct members.  We want to buy $L$ packs of pictures."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Number of members (Uniq)\n",
+    "\n",
+    "We first calculate the expected number of members that you get from buying $L$ sets of pictures.  Since we are interested only in the number of members, the types can be ignored.\n",
+    "\n",
+    "Let's consider member $i$.  For each pack, the probability that you get member $i$'s picture is exactly\n",
+    "\n",
+    "$$M/N.$$\n",
+    "\n",
+    "Thus, if we buy $L$ packs, the probability that you do not get member $i$'s picture is \n",
+    "\n",
+    "$$(1 - M/N)^L.$$\n",
+    "\n",
+    "For each member $i$, let indicator random variable $X_i$ be 1 of you get at least one picture of member $i$.  Therefore,\n",
+    "\n",
+    "$$E[X_i] = Pr[X_i=1] = 1 - (1-M/N)^L$$\n",
+    "\n",
+    "Let random variable $X$ be the number of members that you get at least one picture.  Note that\n",
+    "\n",
+    "$$X = \\sum_{i=1}^{N} X_i.$$\n",
+    "\n",
+    "Therefore, using linearity of expectation, we have\n",
+    "\n",
+    "$$E[X] = E\\left[\\sum_{i=1}^{N} X_i\\right] = \\sum_{i=1}^{N} E[X_i] = N\\left(1-(1-M/N)^L\\right) = N - N(1-M/N)^L$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Let's see the numbers\n",
+    "\n",
+    "Let's try to plug in $N=27$ and $M=5$, and calculate $E[X]$ for various values of $L$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def uniq(n,m,l):\n",
+    "    return n - n*((1-m/n)**l)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "#sets =   1, exp. uniq = 5.00000\n",
+      "#sets =   2, exp. uniq = 9.07407\n",
+      "#sets =   3, exp. uniq = 12.39369\n",
+      "#sets =   4, exp. uniq = 15.09856\n",
+      "#sets =   5, exp. uniq = 17.30253\n",
+      "#sets =   6, exp. uniq = 19.09836\n",
+      "#sets =   7, exp. uniq = 20.56163\n",
+      "#sets =   8, exp. uniq = 21.75392\n",
+      "#sets =   9, exp. uniq = 22.72541\n",
+      "#sets =  10, exp. uniq = 23.51700\n",
+      "#sets =  15, exp. uniq = 25.74903\n",
+      "#sets =  20, exp. uniq = 26.55069\n",
+      "#sets =  25, exp. uniq = 26.83862\n",
+      "#sets =  30, exp. uniq = 26.94204\n",
+      "#sets =  35, exp. uniq = 26.97918\n",
+      "#sets =  40, exp. uniq = 26.99252\n",
+      "#sets =  45, exp. uniq = 26.99731\n",
+      "#sets =  50, exp. uniq = 26.99904\n",
+      "#sets =  55, exp. uniq = 26.99965\n",
+      "#sets =  60, exp. uniq = 26.99988\n",
+      "#sets =  65, exp. uniq = 26.99996\n",
+      "#sets =  70, exp. uniq = 26.99998\n",
+      "#sets =  75, exp. uniq = 26.99999\n",
+      "#sets =  80, exp. uniq = 27.00000\n",
+      "#sets =  85, exp. uniq = 27.00000\n",
+      "#sets =  90, exp. uniq = 27.00000\n",
+      "#sets =  95, exp. uniq = 27.00000\n",
+      "#sets = 100, exp. uniq = 27.00000\n"
+     ]
+    }
+   ],
+   "source": [
+    "for i,ex in [(i,uniq(27,5,i)) for i in range(1,101)]:\n",
+    "    if (i <= 10) or (i % 5 == 0):\n",
+    "        print(\"#sets = %3d, exp. uniq = %.5f\" % (i,ex))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Number of members with comp (u_comp)\n",
+    "\n",
+    "Let's consider the number of members you get at least one complete set.  The current formula is pretty involved, but it is not very hard to understand.\n",
+    "\n",
+    "#### Single member $i$\n",
+    "\n",
+    "Suppose we get $P$ pictures of member $i$, what is the probability that we get a complete set?  It is hard to deal with general $K$, but for $K=3$ we have a simple formula.  There are 2 cases that you do not get a complete set:\n",
+    "\n",
+    "**Case 1:** You only get one type.  This occurs with probability $3\\cdot (1/3)^P$.  (3 types, each picture has to be in this type.)\n",
+    "\n",
+    "**Case 2:** You only get exactly two types.  Let's consider the probability that you get only type-1 and type-2.  With probability $(2/3)^P$ you do not get any type-3 pictures.  However, it can be the case that you get only type-1 or type-2.  Excluding both possibility, you have that the desired probability is\n",
+    "\n",
+    "$$(2/3)^P - 2(1/3)^P.$$\n",
+    "\n",
+    "There are ${3 \\choose 2}=3$ possible pairs of types, thus Case 2 occurs with probability \n",
+    "\n",
+    "$$3\\left((2/3)^p - 2(1/3)^P\\right).$$\n",
+    "\n",
+    "Since both two cases are disjoint, we can add their probabilities, yielding that we get a complete set with probability\n",
+    "\n",
+    "$$1 - \\left(3\\cdot (1/3)^P + 3\\left((2/3)^p - 2(1/3)^P\\right)\\right).$$\n",
+    "\n",
+    "Let's call this quantity $q(P)$.  We will use $q(P)$ to find the probability that when buying $L$ packs, you get at least one complete set.  Let event $A_P$ be the event that you get $P$ pictures.  Let event $B$ be the event that you get at least one complete set.  We want to compute\n",
+    "\n",
+    "$$Pr[B]=Pr\\left[\\bigcup_{P=0}^{L} B\\cap A_P\\right].$$\n",
+    "\n",
+    "Since each event $B\\cap A_P$ in the union are disjoint, we have that\n",
+    "\n",
+    "$$Pr[B]=\\sum_{P=0}^{L} Pr\\left[B\\cap A_P\\right]=\\sum_{P=0}^{L} Pr[B|A_P]\\cdot Pr[A_P].$$\n",
+    "\n",
+    "Recall that $Pr[B|A_P]=q(P)$ and the number of pictures that you get is a binomial random variable with parameter $(L,M/N)$.  Hence,\n",
+    "\n",
+    "$$Pr[B]=\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}.$$\n",
+    "\n",
+    "#### All members\n",
+    "\n",
+    "To get the expected number of members that you get complete sets, we define, for member $i$, an indicator random variable $Y_i$ to be 1 iff you get a complete set of member $i$, and let random variable $Y$ be the number of members that you get complete sets.  Thus, $Y=\\sum_{i=1}^N Y_i$.\n",
+    "\n",
+    "From previous section, we have $Pr[Y_i=1]=Pr[B]$.\n",
+    "\n",
+    "We then have\n",
+    "\n",
+    "$$E[Y] = E\\left[\\sum_{i=1}^{N} Y_i\\right] = \\sum_{i=1}^N E[Y_i] = N\\cdot Pr[B]\n",
+    "= N\\left(\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}\\right).$$\n",
+    "\n",
+    "#### Let's see the numbers\n",
+    "\n",
+    "We will plug in, again, $N=27$ and $M=5$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def q(p):\n",
+    "    if p == 0:   # the formula doesn't work when p = 0\n",
+    "        return 0\n",
+    "    else:\n",
+    "        return 1 - (3*(1/3**p) + 3*((2**p/3**p) - 2*(1/3**p)))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[(0, 0), (1, 0.0), (2, 0.0), (3, 0.22222222222222232), (4, 0.4444444444444444), (5, 0.6172839506172839), (6, 0.7407407407407407), (7, 0.8257887517146776), (8, 0.8834019204389575), (9, 0.9221155311690291)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Let's see if q makes sense.\n",
+    "\n",
+    "print([(i,q(i)) for i in range(0,10)])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For sanity check, if $P=3$, the probability is $3!/3^3 = 6/27 = 0.222222$.  So our formula seems correct."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from scipy.stats import binom\n",
+    "\n",
+    "def prb(l):\n",
+    "    s = sum([q(i) * binom.pmf(i,l,5./27.)\n",
+    "            for i in range(0,l+1)])\n",
+    "    return s"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "#sets =   1, exp. u_comp = 0.00000\n",
+      "#sets =   2, exp. u_comp = 0.00000\n",
+      "#sets =   3, exp. u_comp = 0.03810\n",
+      "#sets =   4, exp. u_comp = 0.13830\n",
+      "#sets =   5, exp. u_comp = 0.31411\n",
+      "#sets =   6, exp. u_comp = 0.57136\n",
+      "#sets =   7, exp. u_comp = 0.91044\n",
+      "#sets =   8, exp. u_comp = 1.32792\n",
+      "#sets =   9, exp. u_comp = 1.81785\n",
+      "#sets =  10, exp. u_comp = 2.37271\n",
+      "#sets =  15, exp. u_comp = 5.82459\n",
+      "#sets =  20, exp. u_comp = 9.70830\n",
+      "#sets =  25, exp. u_comp = 13.37323\n",
+      "#sets =  30, exp. u_comp = 16.52004\n",
+      "#sets =  35, exp. u_comp = 19.07439\n",
+      "#sets =  40, exp. u_comp = 21.07557\n",
+      "#sets =  45, exp. u_comp = 22.60730\n",
+      "#sets =  50, exp. u_comp = 23.76152\n",
+      "#sets =  55, exp. u_comp = 24.62201\n",
+      "#sets =  60, exp. u_comp = 25.25880\n",
+      "#sets =  65, exp. u_comp = 25.72762\n",
+      "#sets =  70, exp. u_comp = 26.07152\n",
+      "#sets =  75, exp. u_comp = 26.32316\n",
+      "#sets =  80, exp. u_comp = 26.50695\n",
+      "#sets =  85, exp. u_comp = 26.64101\n",
+      "#sets =  90, exp. u_comp = 26.73872\n",
+      "#sets =  95, exp. u_comp = 26.80988\n",
+      "#sets = 100, exp. u_comp = 26.86169\n"
+     ]
+    }
+   ],
+   "source": [
+    "# finally the numbers\n",
+    "for i,ex in [(i,27 * prb(i)) for i in range(1,101)]:\n",
+    "    if (i <= 10) or (i % 5 == 0):\n",
+    "        print(\"#sets = %3d, exp. u_comp = %.5f\" % (i,ex))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}