Sort question

Hello Group,

I have a file that looks something like this:

655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA

When I sort it by the 2nd field (only) with sort -k2,2, I get the
following output:

655060-17 B08068013 11.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4I B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-V5 B08068013 33.00 EA

If the value in the 2nd field is the same, why is it (re)sorting and
changing the order of the 1st field? Shouldn't the output be the same
as the input?

Thanks,
Suhas
Unix-Shell [ Thu, 21 July 2005 22:25 ] [ ID #888024 ]

Re: Sort question

It depends on which sorting algorithm is used. Some sorting algorithms don't
change the relative order of records with the same key, and some do. What sort
algorithm does your sort(1) use?

--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
Lew Pitcher [ Thu, 21 July 2005 23:00 ] [ ID #888028 ]

Re: Sort question

Is the sorting algorithm _stable_? No, not by default: when the keys you
specify compare equal, sort falls back to comparing the whole line, which
is why your ties come out ordered by the 1st field. sort from GNU
coreutils has the -s switch, which enables stable sorting.
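
For anyone who wants to see the tie-breaking for themselves, here is a
minimal sketch using the sample data from the original post. It assumes a
POSIX-conforming sort(1) and pins the locale to C so the byte order is
predictable; the scratch file and the variable names are only for
illustration.

```shell
# Sample data from the original post, written to a scratch file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA
EOF

# Every line has the same 2nd field, so -k2,2 alone cannot decide the
# order; the ties are then resolved by comparing whole lines.
by_key=$(LC_ALL=C sort -k2,2 "$tmp")
whole_line=$(LC_ALL=C sort "$tmp")

if [ "$by_key" = "$whole_line" ]; then
    echo "ties were broken by whole-line comparison"
fi
rm -f "$tmp"
```

If the two results match, the reordering seen in the question is just the
whole-line comparison at work.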

--
Feel free to correct my English
Stanislaw Klekot
dozzie [ Thu, 21 July 2005 23:10 ] [ ID #888029 ]

Re: Sort question

Not sure about the sorting algorithm, but I am doing this on AIX 5.2.
Thanks
Unix-Shell [ Thu, 21 July 2005 23:20 ] [ ID #888030 ]

Re: Sort question

One way to achieve stable sorting is to use nl (or awk/perl) to
add the line number as an extra field, then sort with the desired
keys *and* the line number as the last sort key, then use cut
(or sed etc) to remove the line number field.
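
The steps above could be sketched roughly as follows, using awk for the
numbering. The function name stable_sort_by_field is made up for this
example, and the sketch assumes blank-separated fields with no leading
whitespace on the input lines.

```shell
# stable_sort_by_field N [FILE...]
# Hypothetical helper (not a standard command): sort on field N only,
# keeping the input order for lines whose field N compares equal.
stable_sort_by_field() {
    field=$1
    shift
    # 1. Decorate: prefix each line with its line number and a tab.
    # 2. Sort: the requested key first (shifted by one because of the
    #    extra column), then the line number, numerically, as tie-breaker.
    # 3. Undecorate: drop the number column (cut splits on tabs by default).
    awk '{ print NR "\t" $0 }' "$@" |
        sort -k "$((field + 1)),$((field + 1))" -k 1,1n |
        cut -f 2-
}

# Usage with the sample data: all 2nd fields are equal, so the output
# should come back in the original order.
sample='655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA'
result=$(printf '%s\n' "$sample" | stable_sort_by_field 2)
printf '%s\n' "$result"
```

The numeric flag on the line-number key matters: without it, line 10
could sort ahead of line 2.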

--
John.
John L [ Fri, 22 July 2005 09:01 ] [ ID #889729 ]

Re: Sort question

John L wrote:
> One way to achieve stable sorting is to use nl (or awk/perl) to
> add the line number as an extra field, then sort with the desired
> keys *and* the line number as the last sort key, then use cut
> (or sed etc) to remove the line number field.
>

I was so impressed with the simplicity and elegance of this solution I
just had to applaud.

$ cat -n file | sort -k3,3 -k1,1n | cut -f2-

655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA

Having said that, GNU sort has a "-s" option for "stable" sort which
preserves the input ordering when the keys are equal:

$ sort -s -k2,2 file
655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA

so, if GNU sort is an option I'd go with that.
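
If the same script has to run on systems with and without GNU sort, one
option is to probe for -s first and fall back to the line-number trick.
This is only a sketch: it assumes that a sort without -s rejects the
option with a non-zero exit status, and the variable names are invented
for the example.

```shell
# Sample data from the original post.
input='655060-17 B08068013 11.00 EA
655060-V5 B08068013 33.00 EA
655060-3C B08068013 22.00 EA
655060-4H B08068013 11.00 EA
655060-4J B08068013 11.00 EA
655060-4I B08068013 11.00 EA'

# Probe: a sort that lacks -s is assumed to fail on the unknown option.
if sort -s /dev/null >/dev/null 2>&1; then
    have_stable=yes
    sorted=$(printf '%s\n' "$input" | sort -s -k2,2)
else
    have_stable=no
    # Fallback: number every line with nl, break ties on that number,
    # then strip the number column again.
    sorted=$(printf '%s\n' "$input" | nl -ba | sort -k3,3 -k1,1n | cut -f2-)
fi

echo "stable sort available: $have_stable"
printf '%s\n' "$sorted"
```

Either branch should leave the sample data in its original order, since
every line has the same 2nd field.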

Regards,

Ed.
Ed Morton [ Fri, 22 July 2005 16:31 ] [ ID #889739 ]

Re: Sort question

On John L's idea of adding a line number to the data:
`grep -n . foo.txt`
1:655060-17 B08068013 11.00 EA
2:655060-V5 B08068013 33.00 EA
3:655060-3C B08068013 22.00 EA
4:655060-4H B08068013 11.00 EA
5:655060-4J B08068013 11.00 EA
6:655060-4I B08068013 11.00 EA
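The -n option of grep prefixes each line with its line number.

Then the sort works the way you wanted, if I understood correctly. The extra
-k1,1n key compares the line numbers numerically, which matters once the
file has ten or more lines:
`grep -n . foo.txt | sort -k2,2 -k1,1n`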
1:655060-17 B08068013 11.00 EA
2:655060-V5 B08068013 33.00 EA
3:655060-3C B08068013 22.00 EA
4:655060-4H B08068013 11.00 EA
5:655060-4J B08068013 11.00 EA
6:655060-4I B08068013 11.00 EA

Then we only need to drop the "number:" prefix; cut can do that with a final
pipe at the end:
`| cut -d ':' -f2-`

2 cents. JB
johngnub [ Fri, 22 July 2005 20:36 ] [ ID #889753 ]