Is there any reason to implement my own sorting algorithm?

Sorting has been studied for decades, so surely the sorting algorithms provided by any programming platform (java, .NET, etc.) should be good by now, right? Is there a reason to override something like System.Collections.SortedList?

+4
source share
9 answers

There is absolutely a time when your intimate understanding of your data can lead to much more efficient sorting algorithms than any general-purpose algorithm available. I shared an example of this situation in another post at SO, but I will share it just to provide an example-in-point:

In the days of COBOL, FORTRAN, etc. the developer working in the telephone company had to take a relatively large piece of data consisting of active phone numbers (I believe it was in the New York area) and sort this list. The initial implementation used heap sorting (these were 7-digit phone numbers, and many sorting operations took place during the sorting process to replace disks, so heap sorting made sense).

In the end, the developer came up with a different approach: realizing that one, and only one of each phone number could exist in his data set, he realized that he did not need to store the actual phone numbers in memory. Instead, he processed the entire 7-digit number of phone numbers as a very long bitmap (8 phone numbers per byte, 10 million phone numbers require a little more than a megabyte to capture the entire space). Then he made one pass through his original data and set a bit for each phone number that he found in 1. Then he made a final pass through a bit bit, which searches for high bits and displays a sorted list of phone numbers.

This new algorithm was much faster (at least 1000 times faster) than the heap sorting algorithm, and consumed approximately the same amount of memory.

I would say that in this case, the developer absolutely needs to develop his own sorting algorithm.

If your application is related to sorting, and you really know your problem space, then it is entirely possible that you will come up with an application-specific algorithm that is superior to any general purpose algorithm.

However, if sorting is an auxiliary part of your application or you just implement a general purpose algorithm, the chances are very, very good that some extremely smart university types have already provided an algorithm that is better than anything you can come up with. Quick sorting is really difficult if you can store things in memory, and heap sorting is very effective for massive sorting of the data set (although I personally prefer to use implementations like B + Tree for the b / c heap, which they are configured for disk paging )

+16
source

Generally not.

However, you know your data better than the people who wrote these sorting algorithms. Perhaps you can come up with an algorithm that is better than a general algorithm for your specific data set.

+8
source

Implementing your own sorting algorithm is akin to optimization, and as Sir Charles Anthony Richard Hoar said: "We must forget about low efficiency, say about 97% of the time: premature optimization is the root of all evil."

+3
source

Some libraries (such as Java very own Collections.sort) implement sorting based on criteria that may or may not apply to you. For example, Collections.sort uses merge sort for O (n log (n)) efficiency, as well as the fact that it is in-place sort. If two different elements have the same value, the first element in the original collection stays ahead (useful for multi-pass sorting by different criteria (first scan by date, then by name, collection by name (then date))) However, if you want to get a little higher constants or to have a special data set, it may make more sense to implement your own sorting or notation sorting exactly the way you want.

However, all operations are performed quickly on sufficiently small n

+2
source

Short answer; no, except for academic interest.

+1
source
  • You may need to have multithreading implementation sorting.
  • You may need better performance characteristics than Quicksorts O (n log n), such as bucketsort.
  • You may need stable sorting, while the default algorithm uses quicksort. Especially for user interfaces, you want the sort order to be consistent.
  • More efficient algorithms may be available for the data structures that you use.
  • You may need an iterative implementation of the default sorting algorithm due to (e.g. sorting large datasets).

Ad infinitum.

+1
source

A few months ago, the Coding Horror blog reported on some platform with a terribly poor sorting algorithm. If you need to use this platform, you will probably want to implement your own.

0
source

The general purpose sorting problem has been investigated to hell and vice versa, so there is no point in worrying about what is beyond academic interest. However, most sorting is not performed for generic input, and often you can use data properties to increase the speed of sorting.

A common example is sorting a count. It has been proven that for general purpose comparisons, O (n lg n) is the best we can ever hope to do.

However, suppose we know a range in which the values ​​to be sorted are in a fixed range, for example [a, b]. If we create an array of size b - a + 1 (everything is set to zero by default), we can linearly scan the array using this array to store the count of each element, which leads to linear sorting by time (by data range) - violation of n lg n bound, but only because we use a special property of our data. See here for more details.

So yes, it’s useful to write your own sorting algorithms. Pay attention to what you sort, and sometimes you can come up with wonderful improvements.

0
source

If you have experience implementing sorting algorithms and understanding how data characteristics affect their performance, you already know the answer to your question. In other words, you already know things like QuickSort, it has pedestrian performance versus an almost sorted list. :-) And what if you have data in certain structures, some types of sorting are (almost) free. Etc.

Otherwise, no.

0
source

All Articles