-
-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Tim sort. #239
Implement Tim sort. #239
Conversation
Can you review this PR? It's been sitting here for 3 weeks, @appgurueu? |
Hey, @appgurueu. Could you please have a look at this PR? It's been here for a month. I think this PR is good enough to be accepted and merged into the master branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review, I needed to take the time to review this properly. I'm afraid there are some issues with this.
To summarize: The tests are currently insufficient and not deduplicated. The implementation is overly simple and loses defining characteristics of Timsort. (In other words, this is not quite a Timsort yet, although it is inspired by the same core idea.)
- Refactor merge function to handle merge space over head and add galloping mode - Make the comparator optional, default to an ascending comparator - Managing a stack of sorted runs with special size invariants in order to do balanced merges - Add tests (include edge cases as proposed)
Can you review the updates @appgurueu, @raklaptudirm? |
- calculates the minimum run length for Tim sort based on the length of the array - add some edge tests
I will break down the implementation in detail as follows:
Key Techniques
The optimized TimSort provides a highly efficient and robust sorting algorithm for various applications. |
Can you have a look @appgurueu? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests look fine, though please also add a test for sorting stability. (Note that this is not the same as testing duplicates, since you can't detect it if indistinguishable duplicates were swapped. What you need to do is to for example sort objects consisting of a "key" and a "value" by the "key" using a custom comparator, such that you can then check whether the order of "values" was preserved among objects of equal keys. For example if you had [{key: 42, value: 1}, {key: 42, value: 2}]
, the sorting algorithm should leave that untouched.)
It would also be good to do a couple iterations of the randomized test.
As an aside, please don't dump what looks like spammy LLM output on me.
Thanks for your helpful reviews and suggestions. I will close the PR because I can't go forward on this PR anymore. The algorithm is very complicated in detail and the way you need it is so technically restricted to an open-source project for education. |
You're welcome. Thank you for your PR.
Indeed it is. It is probably a bit of an unfortunate choice if you're looking for a particularly "elegant" algorithm to implement.
It is crucial for education that Timsort (as well as other algorithms) are represented properly; Timsort specifically lives from all these little observations enabling it to save some real-world time on real-world data. These aren't arbitrary technical restrictions, these are defining characteristics of Timsort. If you call something Timsort, it should be Timsort. If we were to misrepresent an oversimplified version of Timsort as Timsort, the issue would only get worse down the line: Each time someone makes a port of this implementation and decides to cut some corners themselves, the algorithm would stray more and more from the original. |
From a reviewer's perspective, I appreciate the steps you are taking to enhance the quality of the PRs. It helps the learner or anyone using these resources for their learning (whether studying or working) right away. |
Tim sort is a sorting algorithm that combines the techniques of insertion sort and merge sort. It was designed to perform well on many kinds of real-world data. The algorithm works by dividing the input into small pieces, sorting them using insertion sort, and then merging them using merge sort. Tim sort has a time complexity of
O(nlogn)
in the worst-case scenario and is widely used in programming languages such as Python, Java, and C++.