
Commit b22e491

celic committed: Final changes
1 parent 78650e4 commit b22e491

File tree

2 files changed (+5, -3 lines)


docs/final.pdf (94 Bytes)
Binary file not shown.

docs/final.tex

Lines changed: 5 additions & 3 deletions
@@ -125,18 +125,20 @@ \section{Theoretical Analysis}

\section{Experimental Analysis}

-We developed Python code to do some simple testing of our algorithm. The code repository can be found here: \url{https://github.com/celic/alpha-sort.git}. To begin testing, we generated four different datasets of one billion numbers with $\beta \approx 0.9$. Warning: Given that we pad each number for easier seeking, generating each file to approximately 90 minutes and took up 11GB.
+We developed Python code to do some simple testing of our algorithm. The code repository can be found here: \url{https://github.com/celic/alpha-sort}. To begin testing, we generated four different datasets of one billion numbers with $\beta \approx 0.9$. Warning: given that we pad each number for easier seeking, generating each file took approximately 90 minutes and was 11GB in size.

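A minimal sketch of this generation step could look as follows, assuming $\beta$ denotes the fraction of adjacent pairs in non-decreasing order and each number is zero-padded to 10 digits (10 digits plus a newline gives 11 bytes per record, consistent with the 11GB file size above); the actual generator in the repository may differ:

\begin{verbatim}
import random

WIDTH = 10  # assumed: 10 digits + newline = 11 bytes/record

def generate(path, n=10**9, beta=0.9):
    """Write n zero-padded numbers; ~beta of adjacent pairs are sorted."""
    prev = 0
    with open(path, "w") as f:
        f.write(str(prev).zfill(WIDTH) + "\n")
        for _ in range(n - 1):
            if random.random() < beta:
                prev += random.randrange(10)   # keep this pair in order
            else:
                # jump back down to break sortedness
                prev = random.randrange(prev) if prev else 0
            f.write(str(prev).zfill(WIDTH) + "\n")
\end{verbatim}
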
We tested a couple of different scenarios on sets of 1000 simulations. Overall, our algorithm appeared to be extremely accurate and very promising. However, we noticed an oddity when testing the consistency of the program. We ran 1000 simulations with $\epsilon=0.1$, $\delta=0.25$, $\alpha=\beta$, and with one iteration. We expect 75\% of the simulations to yield $\alpha$-estimates within the error bound and 25\% to fall outside $\epsilon$. The largest error observed was $0.039$, which is well within $\epsilon$. We repeated this with several other sets of values, and the results were always within the bounds. We are unsure exactly why this occurred; perhaps $1 - \delta^i$ (the confidence after $i$ boosting iterations) is a very loose lower bound on the probability that an estimate falls within $(1 \pm \epsilon)\beta n$. This could mean that, for the same confidence, the error bounds are actually significantly tighter than we initially thought. This does not matter much, as the confidence still converges to one after a couple of boosting iterations. Given the chance, we would investigate this further; the question only arose towards the end of the assignment.
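To make the boosting claim concrete, the $1 - \delta^i$ confidence from above evaluates as follows at $\delta = 0.25$:

\[
1 - \delta^i \Big|_{\delta = 0.25} =
\begin{cases}
0.75 & i = 1, \\
0.99609\ldots & i = 4, \\
1 - 2.33 \times 10^{-10} & i = 16,
\end{cases}
\]

so even a handful of iterations drives the confidence very close to one.
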
-First, we tested the consistency of the program. We ran 1000 simulations with $\epsilon=0.1, \delta=0.25, \alpha=\beta$, and with only one iteration. We expect 75\% of the simulations to result with $\alpha$-estimates within the error bound. The largest error noticed was $0.039$, which is well within $\epsilon$. We repeated this with several other sets of values, and the results always seemed to be within the bounds. We are unsure as to why this occurred exactly.
-
Performance-wise, the code executed very quickly. For any practical set of values, a single iteration (i.e.\ no boosting) ran in just a fraction of a second; in the worst case, with 16 iterations, it still took only a few seconds. On top of this, on average over 95\% of the execution time was spent doing I/O. This makes sense, as I/O is much slower and $2k$ numbers need to be read, whereas only $k$ (much faster) calculations are performed. However, if this algorithm were put into production, it would be trivial to parallelize the I/O, providing a significant performance boost to our already very fast algorithm. Overall, we did not perform extensive testing, but from the results so far our approach seems very promising.
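As a sketch of how that I/O could be parallelized (the helper names here are hypothetical; the repository contains no such implementation), the fixed-width padding lets every sampled record be fetched by an independent seek-and-read:

\begin{verbatim}
import concurrent.futures

WIDTH = 10          # assumed record width, as in the generation sketch
RECORD = WIDTH + 1  # plus one byte for the newline

def read_record(path, index):
    """Seek directly to the index-th padded number in O(1)."""
    with open(path, "rb") as f:
        f.seek(index * RECORD)
        return int(f.read(WIDTH))

def read_records_parallel(path, indices, workers=8):
    # Every read is independent, so a thread pool overlaps the seeks.
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        return list(pool.map(lambda i: read_record(path, i), indices))
\end{verbatim}
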
\section{Conclusion}

This algorithm performs surprisingly well for how simple its design is. By pulling $k$ samples from the list of size $n$ and predicting the $\beta$ value for the list, we end up with a very accurate verification algorithm based on $\alpha$. It is important to note that $k$ does not scale with $n$, which is a very pleasing result: it allows the list to be effectively unlimited in size.
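For reference, a minimal sketch of this sampling scheme as we understand it from the description (reusing the hypothetical read\_record helper sketched above, and assuming each sample checks one adjacent pair, so $2k$ numbers are read in total):

\begin{verbatim}
import random

def estimate_beta(path, n, k):
    """Estimate the fraction of adjacent pairs in sorted order."""
    in_order = 0
    for _ in range(k):
        i = random.randrange(n - 1)   # pick a random adjacent pair
        if read_record(path, i) <= read_record(path, i + 1):
            in_order += 1
    return in_order / k               # k is fixed, independent of n

def verify_alpha(path, n, k, alpha, eps):
    # Accept when the estimate agrees with the claimed alpha within eps.
    return abs(estimate_beta(path, n, k) - alpha) <= eps
\end{verbatim}
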
+The experimental results were exceedingly successful, yielding even tighter bounds than theoretically expected. Again, this result can be improved further with boosting. In practice, the algorithm could be greatly improved by a parallelized implementation, as the majority of the execution time was spent reading values from the file.
+
+With more time and resources (especially CPU speed and HDD capacity), we would test the algorithm on more varied datasets to reveal a bit more about its behaviour. This would perhaps also clear up the questions surrounding the extremely tight bounds on predicting $\alpha$.
+
\newpage
\begin{appendices}
