Sorting lists in .Net - why it sucks

Today I stumbled upon a very interesting 'feature' of .net 2.0
My application uses a generic SortedList class for storing a sorted
list of objects (on a DateTime key) - that is, SortedList<DateTime,
object>. However, the program kept crashing for some time until I
debugged it to find out that it crashes on inserting data to the
sorted list. Problem was that I was inserting two items with the same
date (the same key).
Hey, how do you think, should sorted list just crash when there is a
duplicate value? Normal lists are expected to store duplicate values,
but some great inventor at Microsoft has made it better. They have
documented this behavior, but why they assume sorting shouldn't work
for duplicate entries in the list? Normal sorting algorithms are just
fine with that.
Of course, I had to use non-generic ArrayList instead with custom
IComparer for sorting.
And there goes another surprise - ArrayList.Sort implements an
unstable sort algorithm. That is, for equal objects, it does not
guarantee that after sorting they will keep the original order. Of
course this is not documented at all. And of course causes problems -
in my case it would make sequential messages with timestamp come in
wrong order.

I don't know what sort algorithm did Microsoft use, but simple
QuickSort implementation is a stable algorithm and has no problem with
sorting duplicated values. When Microsoft recruits any developer, they
ask him to implement some list sorting algorithm during job interview,
so I thought all developers in Microsoft are capable of implementing
it correctly.

Best regards,
RG
nightwatch77 [ Fr, 25 Januar 2008 14:38 ] [ ID #1915954 ]

Re: Sorting lists in .Net - why it sucks

For info, Wintellect (http://www.wintellect.com/PowerCollections.aspx) has
written a set of generic collection classes that support what you are
looking for (OrderedBag<T>).

From Jeffrey Richter Book:
---------------------------
At Microsoft's request, Wintellect has produced the "Power Collections
Library" to bring some of the C++ Standard Template Library's collection
classes to the CLR programmer.

- José

"nightwatch77" <rafal.gwizdala [at] gmail.com> wrote in message
news:8c3f6f35-5f68-4cdf-8fd5-8aa2b462c433 [at] v46g2000hsv.google groups.com...
> Today I stumbled upon a very interesting 'feature' of .net 2.0
> My application uses a generic SortedList class for storing a sorted
> list of objects (on a DateTime key) - that is, SortedList<DateTime,
> object>. However, the program kept crashing for some time until I
> debugged it to find out that it crashes on inserting data to the
> sorted list. Problem was that I was inserting two items with the same
> date (the same key).
> Hey, how do you think, should sorted list just crash when there is a
> duplicate value? Normal lists are expected to store duplicate values,
> but some great inventor at Microsoft has made it better. They have
> documented this behavior, but why they assume sorting shouldn't work
> for duplicate entries in the list? Normal sorting algorithms are just
> fine with that.
> Of course, I had to use non-generic ArrayList instead with custom
> IComparer for sorting.
> And there goes another surprise - ArrayList.Sort implements an
> unstable sort algorithm. That is, for equal objects, it does not
> guarantee that after sorting they will keep the original order. Of
> course this is not documented at all. And of course causes problems -
> in my case it would make sequential messages with timestamp come in
> wrong order.
>
> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values. When Microsoft recruits any developer, they
> ask him to implement some list sorting algorithm during job interview,
> so I thought all developers in Microsoft are capable of implementing
> it correctly.
>
> Best regards,
> RG
>
>
Alexander Gilman Carv [ Fr, 25 Januar 2008 15:41 ] [ ID #1915958 ]

Re: Sorting lists in .Net - why it sucks

ouups, the one you need is:
OrderedMultiDictionary<TKey,TValue>

- José

"José Joye" <Me [at] me.com> wrote in message
news:FF2FA6B1-C294-4013-A1DA-C444038D8FC8 [at] microsoft.com...
> For info, Wintellect (http://www.wintellect.com/PowerCollections.aspx) has
> written a set of generic collection classes that support what you are
> looking for (OrderedBag<T>).
>
> From Jeffrey Richter Book:
> ---------------------------
> At Microsoft's request, Wintellect has produced the "Power Collections
> Library" to bring some of the C++ Standard Template Library's collection
> classes to the CLR programmer.
>
> - José
>
> "nightwatch77" <rafal.gwizdala [at] gmail.com> wrote in message
> news:8c3f6f35-5f68-4cdf-8fd5-8aa2b462c433 [at] v46g2000hsv.google groups.com...
>> Today I stumbled upon a very interesting 'feature' of .net 2.0
>> My application uses a generic SortedList class for storing a sorted
>> list of objects (on a DateTime key) - that is, SortedList<DateTime,
>> object>. However, the program kept crashing for some time until I
>> debugged it to find out that it crashes on inserting data to the
>> sorted list. Problem was that I was inserting two items with the same
>> date (the same key).
>> Hey, how do you think, should sorted list just crash when there is a
>> duplicate value? Normal lists are expected to store duplicate values,
>> but some great inventor at Microsoft has made it better. They have
>> documented this behavior, but why they assume sorting shouldn't work
>> for duplicate entries in the list? Normal sorting algorithms are just
>> fine with that.
>> Of course, I had to use non-generic ArrayList instead with custom
>> IComparer for sorting.
>> And there goes another surprise - ArrayList.Sort implements an
>> unstable sort algorithm. That is, for equal objects, it does not
>> guarantee that after sorting they will keep the original order. Of
>> course this is not documented at all. And of course causes problems -
>> in my case it would make sequential messages with timestamp come in
>> wrong order.
>>
>> I don't know what sort algorithm did Microsoft use, but simple
>> QuickSort implementation is a stable algorithm and has no problem with
>> sorting duplicated values. When Microsoft recruits any developer, they
>> ask him to implement some list sorting algorithm during job interview,
>> so I thought all developers in Microsoft are capable of implementing
>> it correctly.
>>
>> Best regards,
>> RG
>>
>>
>
Alexander Gilman Carv [ Fr, 25 Januar 2008 16:05 ] [ ID #1915959 ]

RE: Sorting lists in .Net - why it sucks

"nightwatch77" wrote:

> Today I stumbled upon a very interesting 'feature' of .net 2.0
> My application uses a generic SortedList class for storing a sorted
> list of objects (on a DateTime key) - that is, SortedList<DateTime,
> object>. However, the program kept crashing for some time until I
> debugged it to find out that it crashes on inserting data to the
> sorted list. Problem was that I was inserting two items with the same
> date (the same key).
> Hey, how do you think, should sorted list just crash when there is a
> duplicate value? Normal lists are expected to store duplicate values,
> but some great inventor at Microsoft has made it better. They have
> documented this behavior, but why they assume sorting shouldn't work
> for duplicate entries in the list? Normal sorting algorithms are just
> fine with that.
> Of course, I had to use non-generic ArrayList instead with custom
> IComparer for sorting.
> And there goes another surprise - ArrayList.Sort implements an
> unstable sort algorithm. That is, for equal objects, it does not
> guarantee that after sorting they will keep the original order. Of
> course this is not documented at all. And of course causes problems -
> in my case it would make sequential messages with timestamp come in
> wrong order.
>
> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values. When Microsoft recruits any developer, they
> ask him to implement some list sorting algorithm during job interview,
> so I thought all developers in Microsoft are capable of implementing
> it correctly.
>
> Best regards,
> RG
>
>

You should create your own class that encapsulates a DateTime object and
implements IComparable. You just need to handle the equals case as you see
fit. Here is an example of what I mean:

namespace SortedListTest
{
public class ComparableDateTime : IComparable
{
private DateTime mDT;
public int CompareTo ( object obj )
{
if (mDT.Ticks < ((ComparableDateTime) obj).mDT.Ticks) return -1;
return 1;
}

public ComparableDateTime ( long milliseconds )
{
mDT = new DateTime ( milliseconds );
}
}

class Program
{
static void Main ( string [ ] args )
{
SortedList<ComparableDateTime, string> msl;
msl = new SortedList<ComparableDateTime, string> ();

msl.Add ( new ComparableDateTime ( 123456786 ), "string one" );
msl.Add ( new ComparableDateTime ( 123456786 ), "string two" );

Console.Out.WriteLine ( );

foreach (string s in MySortedList.Values)
{
Console.WriteLine ( s );
}

Console.Out.WriteLine ( );
Console.ReadLine ();
}
}
}


>
FamilyTreeMike [ Fr, 25 Januar 2008 16:26 ] [ ID #1915960 ]

Re: Sorting lists in .Net - why it sucks

When you're done with the venom, this is clearly documented:

http://msdn2.microsoft.com/en-us/library/ms132319.aspx
"Every key in a SortedList<(Of <(TKey, TValue>)>) must be unique."

(equally, the exception tells you very clearly what the problem is)

With regards to unstable sort, this too is clearly documented:
http://msdn2.microsoft.com/en-us/library/b0zbh7b6.aspx
"This implementation performs an unstable sort; that is, if two
elements are equal, their order might not be preserved."

I might be mistaken but I seem to recall that the LINQ sort is stable.
Likewise in like there is an ILookup<TKey,TValue> for duplicated keys,
but the default implementation (Lookup<TKey, TValue>) is immutable. It
took me only a few moments to write a mutable ILookup<TKey, TValue>
implementation, however.

Marc
Marc Gravell [ Fr, 25 Januar 2008 16:38 ] [ ID #1915961 ]

Re: Sorting lists in .Net - why it sucks

On Jan 25, 1:38 pm, nightwatch77 <rafal.gwizd... [at] gmail.com> wrote:

<snip>

> I don't know what sort algorithm did Microsoft use, but simple
> QuickSort implementation is a stable algorithm and has no problem with
> sorting duplicated values.

I assume by "simple" you mean "using extra memory rather than sorting
in place". The more efficient in-place quick sort isn't stable.

Jon
skeet [ Fr, 25 Januar 2008 17:20 ] [ ID #1915964 ]

Re: Sorting lists in .Net - why it sucks

Hi,

- Family Tree Mike, you probably didn't notice I have implemented
IComparer, thanks anyway.
- Marc, I know Microsoft has documented the weird SortedList behavior,
but does it add more sense to the API?
- Jon, you're right, in-place quicksort is unstable, but anyway, there
is a stable version.
Still, the sorting functionality is very essential to a collections
API, just like a good hashtable implementation. It should be done
better and programmers should have the option to select sorting
algorithm.
best regards
RG
nightwatch77 [ Fr, 25 Januar 2008 20:17 ] [ ID #1915967 ]

Re: Sorting lists in .Net - why it sucks

Check out http://www.itu.dk/research/c5/.
Great stuff.

LP,
Dejan
Dejan Stanic [ Fr, 25 Januar 2008 23:59 ] [ ID #1915983 ]

Re: Sorting lists in .Net - why it sucks

I guess what I was too subtle about was, that by

1. writing an IComparable (not IComparer) class wrapping DateTime,
2. that the comparison routine never returns equal,

the code does not crash on equal keys.

"nightwatch77" wrote:

> Hi,
>
> - Family Tree Mike, you probably didn't notice I have implemented
> IComparer, thanks anyway.
> - Marc, I know Microsoft has documented the weird SortedList behavior,
> but does it add more sense to the API?
> - Jon, you're right, in-place quicksort is unstable, but anyway, there
> is a stable version.
> Still, the sorting functionality is very essential to a collections
> API, just like a good hashtable implementation. It should be done
> better and programmers should have the option to select sorting
> algorithm.
> best regards
> RG
>
>
>
FamilyTreeMike [ Sa, 26 Januar 2008 01:15 ] [ ID #1916710 ]

Re: Sorting lists in .Net - why it sucks

On Jan 26, 1:15 am, Family Tree Mike
<FamilyTreeM... [at] discussions.microsoft.com> wrote:
> I guess what I was too subtle about was, that by
>
> 1. writing an IComparable (not IComparer) class wrapping DateTime,
> 2. that the comparison routine never returns equal,
>
> the code does not crash on equal keys.

Mike, your solution is very simple but doesn't help a bit because it
doesn't guarantee stable sorting. Equal timestamps is something I have
to handle in a certain way, that is I have to get them in the same
order as in input, and your IComparable just makes sure two timestamps
are never equal - that solves some problem, but not mine.
nightwatch77 [ Mo, 28 Januar 2008 12:32 ] [ ID #1917818 ]

Re: Sorting lists in .Net - why it sucks

To restate - the LINQ sort is stable:
http://technet.microsoft.com/en-us/library/bb549422.aspx
"This method performs a stable sort; that is, if the keys of two
elements are equal, the order of the elements is preserved."

If .NET 3.5 isn't an option, then hacky though it sounds, perhaps
consider adding a member to keep track of the original insertion order
(just an int), and then use that as a secondary after your primary
sort condition(s) - i.e. the following will give you a stable sort
(assuming you give each Foo incremental Sequence values, perhaps using
something in the ctor):

class Foo {
public int Sequence { get; set; }
public double Value { get; set; }
}
public class FooComparer : IComparer<Foo> {
int Compare(Foo x, Foo y) {
// primary sort
int result = x.Value.CompareTo(y.Value);
// backup for stability
if (result == 0) {
result = x.Sequence.CompareTo(y.Sequence);
}
return result;
}
}

Yes it is duplication...

Third option: use an external library.

Marc
Marc Gravell [ Mo, 28 Januar 2008 12:51 ] [ ID #1917820 ]
Microsoft » microsoft.public.dotnet.general » Sorting lists in .Net - why it sucks

Vorheriges Thema: Automated Install of a .NET application
Nächstes Thema: Q re Com Interop and threading