Skip to content

Commit 346f555

Browse files
committed
added BNK calculations
1 parent 654da75 commit 346f555

File tree

1 file changed

+282
-0
lines changed

1 file changed

+282
-0
lines changed

BNK-rand.ipynb

Lines changed: 282 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,282 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"### Basic settings\n",
8+
"\n",
9+
"There are $N$ members. Each member has $K$ types of pictures. Each pack consists of $M$ pictures of distinct members. We want to buy $L$ packs of pictures."
10+
]
11+
},
12+
{
13+
"cell_type": "markdown",
14+
"metadata": {},
15+
"source": [
16+
"### Number of members (Uniq)\n",
17+
"\n",
18+
"We first calculate the expected number of members that you get from buying $L$ sets of pictures. Since we are interested only in the number of members, the types can be ignored.\n",
19+
"\n",
20+
"Let's consider member $i$. For each pack, the probability that you get member $i$'s picture is exactly\n",
21+
"\n",
22+
"$$M/N.$$\n",
23+
"\n",
24+
"Thus, if we buy $L$ packs, the probability that you do not get member $i$'s picture is \n",
25+
"\n",
26+
"$$(1 - M/N)^L.$$\n",
27+
"\n",
28+
"For each member $i$, let indicator random variable $X_i$ be 1 of you get at least one picture of member $i$. Therefore,\n",
29+
"\n",
30+
"$$E[X_i] = Pr[X_i=1] = 1 - (1-M/N)^L$$\n",
31+
"\n",
32+
"Let random variable $X$ be the number of members that you get at least one picture. Note that\n",
33+
"\n",
34+
"$$X = \\sum_{i=1}^{N} X_i.$$\n",
35+
"\n",
36+
"Therefore, using linearity of expectation, we have\n",
37+
"\n",
38+
"$$E[X] = E\\left[\\sum_{i=1}^{N} X_i\\right] = \\sum_{i=1}^{N} E[X_i] = N\\left(1-(1-M/N)^L\\right) = N - N(1-M/N)^L$$"
39+
]
40+
},
41+
{
42+
"cell_type": "markdown",
43+
"metadata": {},
44+
"source": [
45+
"#### Let's see the numbers\n",
46+
"\n",
47+
"Let's try to plug in $N=27$ and $M=5$, and calculate $E[X]$ for various values of $L$."
48+
]
49+
},
50+
{
51+
"cell_type": "code",
52+
"execution_count": 1,
53+
"metadata": {},
54+
"outputs": [],
55+
"source": [
56+
"def uniq(n,m,l):\n",
57+
" return n - n*((1-m/n)**l)"
58+
]
59+
},
60+
{
61+
"cell_type": "code",
62+
"execution_count": 2,
63+
"metadata": {},
64+
"outputs": [
65+
{
66+
"name": "stdout",
67+
"output_type": "stream",
68+
"text": [
69+
"#sets = 1, exp. uniq = 5.00000\n",
70+
"#sets = 2, exp. uniq = 9.07407\n",
71+
"#sets = 3, exp. uniq = 12.39369\n",
72+
"#sets = 4, exp. uniq = 15.09856\n",
73+
"#sets = 5, exp. uniq = 17.30253\n",
74+
"#sets = 6, exp. uniq = 19.09836\n",
75+
"#sets = 7, exp. uniq = 20.56163\n",
76+
"#sets = 8, exp. uniq = 21.75392\n",
77+
"#sets = 9, exp. uniq = 22.72541\n",
78+
"#sets = 10, exp. uniq = 23.51700\n",
79+
"#sets = 15, exp. uniq = 25.74903\n",
80+
"#sets = 20, exp. uniq = 26.55069\n",
81+
"#sets = 25, exp. uniq = 26.83862\n",
82+
"#sets = 30, exp. uniq = 26.94204\n",
83+
"#sets = 35, exp. uniq = 26.97918\n",
84+
"#sets = 40, exp. uniq = 26.99252\n",
85+
"#sets = 45, exp. uniq = 26.99731\n",
86+
"#sets = 50, exp. uniq = 26.99904\n",
87+
"#sets = 55, exp. uniq = 26.99965\n",
88+
"#sets = 60, exp. uniq = 26.99988\n",
89+
"#sets = 65, exp. uniq = 26.99996\n",
90+
"#sets = 70, exp. uniq = 26.99998\n",
91+
"#sets = 75, exp. uniq = 26.99999\n",
92+
"#sets = 80, exp. uniq = 27.00000\n",
93+
"#sets = 85, exp. uniq = 27.00000\n",
94+
"#sets = 90, exp. uniq = 27.00000\n",
95+
"#sets = 95, exp. uniq = 27.00000\n",
96+
"#sets = 100, exp. uniq = 27.00000\n"
97+
]
98+
}
99+
],
100+
"source": [
101+
"for i,ex in [(i,uniq(27,5,i)) for i in range(1,101)]:\n",
102+
" if (i <= 10) or (i % 5 == 0):\n",
103+
" print(\"#sets = %3d, exp. uniq = %.5f\" % (i,ex))"
104+
]
105+
},
106+
{
107+
"cell_type": "markdown",
108+
"metadata": {},
109+
"source": [
110+
"### Number of members with comp (u_comp)\n",
111+
"\n",
112+
"Let's consider the number of members you get at least one complete set. The current formula is pretty involved, but it is not very hard to understand.\n",
113+
"\n",
114+
"#### Single member $i$\n",
115+
"\n",
116+
"Suppose we get $P$ pictures of member $i$, what is the probability that we get a complete set? It is hard to deal with general $K$, but for $K=3$ we have a simple formula. There are 2 cases that you do not get a complete set:\n",
117+
"\n",
118+
"**Case 1:** You only get one type. This occurs with probability $3\\cdot (1/3)^P$. (3 types, each picture has to be in this type.)\n",
119+
"\n",
120+
"**Case 2:** You only get exactly two types. Let's consider the probability that you get only type-1 and type-2. With probability $(2/3)^P$ you do not get any type-3 pictures. However, it can be the case that you get only type-1 or type-2. Excluding both possibility, you have that the desired probability is\n",
121+
"\n",
122+
"$$(2/3)^P - 2(1/3)^P.$$\n",
123+
"\n",
124+
"There are ${3 \\choose 2}=3$ possible pairs of types, thus Case 2 occurs with probability \n",
125+
"\n",
126+
"$$3\\left((2/3)^p - 2(1/3)^P\\right).$$\n",
127+
"\n",
128+
"Since both two cases are disjoint, we can add their probabilities, yielding that we get a complete set with probability\n",
129+
"\n",
130+
"$$1 - \\left(3\\cdot (1/3)^P + 3\\left((2/3)^p - 2(1/3)^P\\right)\\right).$$\n",
131+
"\n",
132+
"Let's call this quantity $q(P)$. We will use $q(P)$ to find the probability that when buying $L$ packs, you get at least one complete set. Let event $A_P$ be the event that you get $P$ pictures. Let event $B$ be the event that you get at least one complete set. We want to compute\n",
133+
"\n",
134+
"$$Pr[B]=Pr\\left[\\bigcup_{P=0}^{L} B\\cap A_P\\right].$$\n",
135+
"\n",
136+
"Since each event $B\\cap A_P$ in the union are disjoint, we have that\n",
137+
"\n",
138+
"$$Pr[B]=\\sum_{P=0}^{L} Pr\\left[B\\cap A_P\\right]=\\sum_{P=0}^{L} Pr[B|A_P]\\cdot Pr[A_P].$$\n",
139+
"\n",
140+
"Recall that $Pr[B|A_P]=q(P)$ and the number of pictures that you get is a binomial random variable with parameter $(L,M/N)$. Hence,\n",
141+
"\n",
142+
"$$Pr[B]=\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}.$$\n",
143+
"\n",
144+
"#### All members\n",
145+
"\n",
146+
"To get the expected number of members that you get complete sets, we define, for member $i$, an indicator random variable $Y_i$ to be 1 iff you get a complete set of member $i$, and let random variable $Y$ be the number of members that you get complete sets. Thus, $Y=\\sum_{i=1}^N Y_i$.\n",
147+
"\n",
148+
"From previous section, we have $Pr[Y_i=1]=Pr[B]$.\n",
149+
"\n",
150+
"We then have\n",
151+
"\n",
152+
"$$E[Y] = E\\left[\\sum_{i=1}^{N} Y_i\\right] = \\sum_{i=1}^N E[Y_i] = N\\cdot Pr[B]\n",
153+
"= N\\left(\\sum_{P=0}^{L} q(P)\\cdot {L \\choose P}(M/N)^P(1-M/N)^{L-P}\\right).$$\n",
154+
"\n",
155+
"#### Let's see the numbers\n",
156+
"\n",
157+
"We will plug in, again, $N=27$ and $M=5$."
158+
]
159+
},
160+
{
161+
"cell_type": "code",
162+
"execution_count": 3,
163+
"metadata": {},
164+
"outputs": [],
165+
"source": [
166+
"def q(p):\n",
167+
" if p == 0: # the formula doesn't work when p = 0\n",
168+
" return 0\n",
169+
" else:\n",
170+
" return 1 - (3*(1/3**p) + 3*((2**p/3**p) - 2*(1/3**p)))"
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": 4,
176+
"metadata": {},
177+
"outputs": [
178+
{
179+
"name": "stdout",
180+
"output_type": "stream",
181+
"text": [
182+
"[(0, 0), (1, 0.0), (2, 0.0), (3, 0.22222222222222232), (4, 0.4444444444444444), (5, 0.6172839506172839), (6, 0.7407407407407407), (7, 0.8257887517146776), (8, 0.8834019204389575), (9, 0.9221155311690291)]\n"
183+
]
184+
}
185+
],
186+
"source": [
187+
"# Let's see if q makes sense.\n",
188+
"\n",
189+
"print([(i,q(i)) for i in range(0,10)])"
190+
]
191+
},
192+
{
193+
"cell_type": "markdown",
194+
"metadata": {},
195+
"source": [
196+
"For sanity check, if $P=3$, the probability is $3!/3^3 = 6/27 = 0.222222$. So our formula seems correct."
197+
]
198+
},
199+
{
200+
"cell_type": "code",
201+
"execution_count": 5,
202+
"metadata": {},
203+
"outputs": [],
204+
"source": [
205+
"from scipy.stats import binom\n",
206+
"\n",
207+
"def prb(l):\n",
208+
" s = sum([q(i) * binom.pmf(i,l,5./27.)\n",
209+
" for i in range(0,l+1)])\n",
210+
" return s"
211+
]
212+
},
213+
{
214+
"cell_type": "code",
215+
"execution_count": 6,
216+
"metadata": {},
217+
"outputs": [
218+
{
219+
"name": "stdout",
220+
"output_type": "stream",
221+
"text": [
222+
"#sets = 1, exp. u_comp = 0.00000\n",
223+
"#sets = 2, exp. u_comp = 0.00000\n",
224+
"#sets = 3, exp. u_comp = 0.03810\n",
225+
"#sets = 4, exp. u_comp = 0.13830\n",
226+
"#sets = 5, exp. u_comp = 0.31411\n",
227+
"#sets = 6, exp. u_comp = 0.57136\n",
228+
"#sets = 7, exp. u_comp = 0.91044\n",
229+
"#sets = 8, exp. u_comp = 1.32792\n",
230+
"#sets = 9, exp. u_comp = 1.81785\n",
231+
"#sets = 10, exp. u_comp = 2.37271\n",
232+
"#sets = 15, exp. u_comp = 5.82459\n",
233+
"#sets = 20, exp. u_comp = 9.70830\n",
234+
"#sets = 25, exp. u_comp = 13.37323\n",
235+
"#sets = 30, exp. u_comp = 16.52004\n",
236+
"#sets = 35, exp. u_comp = 19.07439\n",
237+
"#sets = 40, exp. u_comp = 21.07557\n",
238+
"#sets = 45, exp. u_comp = 22.60730\n",
239+
"#sets = 50, exp. u_comp = 23.76152\n",
240+
"#sets = 55, exp. u_comp = 24.62201\n",
241+
"#sets = 60, exp. u_comp = 25.25880\n",
242+
"#sets = 65, exp. u_comp = 25.72762\n",
243+
"#sets = 70, exp. u_comp = 26.07152\n",
244+
"#sets = 75, exp. u_comp = 26.32316\n",
245+
"#sets = 80, exp. u_comp = 26.50695\n",
246+
"#sets = 85, exp. u_comp = 26.64101\n",
247+
"#sets = 90, exp. u_comp = 26.73872\n",
248+
"#sets = 95, exp. u_comp = 26.80988\n",
249+
"#sets = 100, exp. u_comp = 26.86169\n"
250+
]
251+
}
252+
],
253+
"source": [
254+
"# finally the numbers\n",
255+
"for i,ex in [(i,27 * prb(i)) for i in range(1,101)]:\n",
256+
" if (i <= 10) or (i % 5 == 0):\n",
257+
" print(\"#sets = %3d, exp. u_comp = %.5f\" % (i,ex))"
258+
]
259+
}
260+
],
261+
"metadata": {
262+
"kernelspec": {
263+
"display_name": "Python 3",
264+
"language": "python",
265+
"name": "python3"
266+
},
267+
"language_info": {
268+
"codemirror_mode": {
269+
"name": "ipython",
270+
"version": 3
271+
},
272+
"file_extension": ".py",
273+
"mimetype": "text/x-python",
274+
"name": "python",
275+
"nbconvert_exporter": "python",
276+
"pygments_lexer": "ipython3",
277+
"version": "3.5.3"
278+
}
279+
},
280+
"nbformat": 4,
281+
"nbformat_minor": 2
282+
}

0 commit comments

Comments
 (0)