script takes long time to run when comparing digits within strings using foreach
Hi,
I have an array, @datas, and each element within @datas is a string made up of 6 numbers separated by spaces, like "1 2 3 4 5 6", so the array looks like this:
@datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');
Now I wish to compare each element of @datas with the rest of the elements in @datas so that, if 5 of the numbers match, I take note of the matching indices. The script I wrote is appended below.
However, the script takes a long time to run if @datas is huge (e.g. 30,000 elements). Is there a way to rewrite the script so that it runs faster?
Thanks
###### script below #######################

#!/usr/bin/perl
use strict;

my @matched_location = ();
my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');

my $iteration_counter = -1;
foreach (@datas){
    $iteration_counter++;
    my $reference = $_;

    my $second_iteration_counter = -1;
    my $string = '';
    foreach (@datas){
        $second_iteration_counter++;
        my @individual_digits = split / /,$_;

        my $ctr = 0;
        foreach(@individual_digits){
            if($reference =~/^$_ | $_ | $_$/){
                $ctr++;
            }
        }
        if ($ctr >= 5){
            $string = $string . "$second_iteration_counter ";
        }
    }
    $matched_location[$iteration_counter] = $string;
}

my $ctr = -1;
foreach(@matched_location){
    $ctr++;
    print "Index $ctr of \@matched_location = $_\n";
}

Re: script takes long time to run when comparing digits within strings using foreach
Hi eventual,
On Friday 27 May 2011 11:18:01 eventual wrote:
> Hi,
> I have an array , @datas, and each element within @datas is a string that's
> made up of 6 digits with spaces in between like this "1 2 3 4 5 6", so the
> array look like this @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8',
> '1 2 3 4 5 9' , '6 7 8 9 10 11'); Now I wish to compare each element
> of @datas with the rest of the elements in @datas in such a way that if 5
> of the digits match, to take note of the matching indices, and so the
> script I wrote is appended below. However, the script below takes a long
> time to run if the datas at @datas are huge( eg 30,000 elements). I then
> wonder is there a way to rewrite the script so that the script can run
> faster. Thanks
First of all, you should add "use warnings;" to your code. Then you should get
rid of the implicit $_ as the loop iterator, because it's easy to break. For more
information see:
http://perl-begin.org/tutorials/bad-elements/
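To illustrate that advice, here is a minimal sketch of the original nested loops rewritten with named lexical iterators over index ranges (for my $i (0 .. $#datas)) instead of the implicit $_ and the hand-maintained -1/++ counters. It assumes the same five sample rows; the per-number regex test is kept unchanged, so this improves readability and robustness, not speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');
my @matched_location;

# Named index variables replace the implicit $_ and the manual -1/++ counters.
for my $i (0 .. $#datas) {
    my $reference = $datas[$i];
    my $string    = '';
    for my $j (0 .. $#datas) {
        my $ctr = 0;
        for my $number (split / /, $datas[$j]) {
            # Same test as the original: $number as a whole space-delimited field.
            $ctr++ if $reference =~ /^$number | $number | $number$/;
        }
        $string .= "$j " if $ctr >= 5;
    }
    $matched_location[$i] = $string;
}

for my $i (0 .. $#matched_location) {
    print "Index $i of \@matched_location = $matched_location[$i]\n";
}
```

Because each loop variable has its own name, an inner loop or a called function that touches $_ can no longer silently change which row the outer loop is working on.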
Other than that, you should use a better algorithm. One option would be to
sort the integers in each row and then use a diff/merge-like algorithm:
http://en.wikipedia.org/wiki/Merge_algorithm
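A minimal sketch of the sort-and-merge idea, assuming the five sample rows from the original post. The helper name common_count_merge is hypothetical. Each row is split and sorted once up front, and each unordered pair is compared once (j > i), so self-matches are not reported; the "i-j" output format is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');

# Split and numerically sort every row once, before any comparison.
my @sorted = map { [ sort { $a <=> $b } split ' ' ] } @datas;

# Count the numbers two sorted rows have in common by walking both in step,
# like the merge step of merge sort (at most 12 steps for two 6-number rows).
sub common_count_merge {
    my ($x, $y) = @_;    # array refs, each sorted ascending
    my ($i, $j, $common) = (0, 0, 0);
    while ($i < @$x && $j < @$y) {
        if    ($x->[$i] < $y->[$j]) { $i++ }
        elsif ($x->[$i] > $y->[$j]) { $j++ }
        else                        { $common++; $i++; $j++ }
    }
    return $common;
}

# Compare each unordered pair once (j > i) and collect the matches.
my @pairs;
for my $i (0 .. $#sorted - 1) {
    for my $j ($i + 1 .. $#sorted) {
        push @pairs, "$i-$j" if common_count_merge($sorted[$i], $sorted[$j]) >= 5;
    }
}
print "@pairs\n";    # 0-2 0-3 2-3
```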
A different way would be to use a hash to count the number of times each
number occurred in the two sets, and then see how many of them got a value of 2
(indicating they are in both sets).
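The hash-counting idea can be sketched like this. The name common_count_hash is hypothetical, and the sketch assumes no number repeats within a single row (a repeated number would also reach a count of 2 and be miscounted as shared).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally every number from both rows, then count the tallies equal to 2.
# Assumes the numbers within one row are distinct.
sub common_count_hash {
    my ($row_a, $row_b) = @_;
    my %count;
    $count{$_}++ for split(' ', $row_a), split(' ', $row_b);
    return scalar grep { $_ == 2 } values %count;
}

print common_count_hash('1 2 3 4 5 6', '1 2 3 4 5 8'), "\n";       # 5
print common_count_hash('1 2 9 10 11 12', '6 7 8 9 10 11'), "\n";  # 3
```

A row pair then matches when common_count_hash returns 5 or more.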
But at the moment, everything is very inefficient there.
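Both suggestions above make each comparison cheaper, but every pair of rows is still compared: for 30,000 rows that is about 450 million pairs (30,000 × 29,999 / 2). A different technique, not one proposed in this thread, avoids pairwise comparison altogether: if every row holds six distinct numbers, two rows share at least five numbers exactly when they share some 5-number subset. So each row can be filed under its six "all but one" subsets in a hash, and rows filed under the same key match. A minimal sketch under those assumptions (the names %rows_with, %pair_seen and @pairs are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');

# File each row under its six 5-number subsets (sorted, joined into a key).
my %rows_with;
for my $i (0 .. $#datas) {
    my @nums = sort { $a <=> $b } split ' ', $datas[$i];
    for my $drop (0 .. $#nums) {
        # Key for "all numbers except the one at position $drop".
        my $key = join ' ', @nums[ grep { $_ != $drop } 0 .. $#nums ];
        push @{ $rows_with{$key} }, $i;
    }
}

# Rows listed under the same key share those five numbers.
my (%pair_seen, @pairs);
for my $rows (values %rows_with) {
    for my $p (0 .. $#{$rows} - 1) {
        for my $q ($p + 1 .. $#{$rows}) {
            my ($i, $j) = @{$rows}[$p, $q];    # $i < $j: rows were filed in order
            # Identical rows share all six keys; record each pair only once.
            push @pairs, [ $i, $j ] unless $pair_seen{"$i $j"}++;
        }
    }
}

@pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @pairs;
print "$_->[0] $_->[1]\n" for @pairs;
```

The work grows with the number of rows (six hash insertions each) plus the number of matching pairs, rather than with the square of the number of rows.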
Regards,
Shlomi Fish
--
------------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
"Star Trek: We, the Living Dead" - http://shlom.in/st-wtld
I often wonder why I hang out with so many people who are so pedantic. And
then I remember - because they are so pedantic.
-- Israeli Perl Monger
Please reply to list if it's a mailing list post - http://shlom.in/reply .
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: script takes long time to run when comparing digits within strings using foreach
On 2011-05-27 10:18, eventual wrote:
> I have an array , @datas, and each element within @datas is a string that's made up of 6 digits with spaces in between like this "1 2 3 4 5 6", so the array look like this
> @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9' , '6 7 8 9 10 11');
> Now I wish to compare each element of @datas with the rest of the elements in @datas in such a way that if 5 of the digits match, to take note of the matching indices, and so the script I wrote is appended below.
a. Do only once what needs doing only once. There are at least two places
where you didn't: 1. prepare @datas (split each string) once, before looping;
2. don't compare the same pair of rows more than once.
b. Assemble a result, and report it at the end. Don't use any 'shared
resources', like global counters incremented as you go along.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @data = <DATA>;
# Turn each line into a hash whose keys are the numbers on that line.
$_ = { map { $_ => 1 } split } for @data;
$ARGV[0] and print Dumper( \@data );

my @result;
for my $i ( 0 .. $#data - 1 ) {
    my @k = keys %{ $data[ $i ] };
    # Only look at later rows, so each pair is compared once.
    for my $j ( $i + 1 .. $#data ) {
        my $n = 0;
        exists $data[ $j ]{ $_ } and ++$n for @k;
        $n >= 5 and push @result, [ $i, $j ];
    }
}
print Dumper( \@result );

__DATA__
1 2 3 4 5 6
1 2 9 10 11 12
1 2 3 4 5 8
1 2 3 4 5 9
6 7 8 9 10 11
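That script reports each matching pair once, as [ $i, $j ] with $i < $j, rather than the per-index lists the original printed. If the original layout is still wanted, it can be rebuilt once at the end. A minimal sketch: @result is hard-coded to the pairs the five sample rows produce, and $row_count is a hypothetical name for the number of input rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded for illustration: the [ $i, $j ] pairs ($i < $j) for the sample rows.
my @result    = ( [ 0, 2 ], [ 0, 3 ], [ 2, 3 ] );
my $row_count = 5;    # hypothetical: number of rows in the input

# Every row matches itself, and every pair matches in both directions.
my @matched = map { [ $_ ] } 0 .. $row_count - 1;
for my $pair (@result) {
    my ($i, $j) = @$pair;
    push @{ $matched[$i] }, $j;
    push @{ $matched[$j] }, $i;
}

# Report once, at the end, in the original script's format.
for my $i (0 .. $#matched) {
    my @indices = sort { $a <=> $b } @{ $matched[$i] };
    print "Index $i of \@matched_location = @indices\n";
}
```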
--
Ruud
Re: script takes long time to run when comparing digits within strings using foreach
eventual wrote:
> Hi,
Hello,
> I have an array , @datas, and each element within @datas is a string
> that's made up of 6 digits with spaces in between like this "1 2 3 4 5
> 6", so the array look like this
> @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9' , '6 7 8 9 10 11');
> Now I wish to compare each element of @datas with the rest of the
> elements in @datas in such a way that if 5 of the digits match, to
> take note of the matching indices, and so the script I wrote is
> appended below.
> However, the script below takes a long time to run if the datas at
> @datas are huge( eg 30,000 elements). I then wonder is there a way to
> rewrite the script so that the script can run faster.
> Thanks
Your program can be reduced to:
my @matched_location;
my @datas = ( '1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11' );
for my $i ( 0 .. $#datas ) {
    for my $j ( 0 .. $#datas ) {
        $matched_location[ $i ] .= "$j " if 5 <= grep $datas[ $i ] =~ /(?:^|(?<= ))$_(?= |$)/, split ' ', $datas[ $j ]
    }
}
print map "Index $_ of \@matched_location = $matched_location[$_]\n", 0 .. $#matched_location;
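The lookarounds in that pattern make a number match only as a whole field: it must be preceded by the start of the string or a space, and followed by a space or the end of the string. A small sketch, using a hypothetical row and number, of the difference from a bare /1/:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $row = '6 7 8 9 10 11';
my $n   = 1;

# A bare /1/ also matches the "1" inside "10" and "11".
my $naive = ($row =~ /$n/) ? 1 : 0;

# The delimited pattern only matches 1 as a complete space-separated field.
my $delimited = ($row =~ /(?:^|(?<= ))$n(?= |$)/) ? 1 : 0;

print "naive=$naive delimited=$delimited\n";    # naive=1 delimited=0
```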
You should benchmark it to see if it is any faster than your original code.
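One way to benchmark is the core Benchmark module's cmpthese. A minimal sketch comparing a regex-per-number version (the same test as the reduced program) against a hash-lookup version that splits each row only once; the subroutine names and the iteration count of 2000 are arbitrary choices. The five sample rows are far too few for meaningful timings, so real measurements should use data of realistic size, such as the 30,000 rows mentioned.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @datas = ('1 2 3 4 5 6', '1 2 9 10 11 12', '1 2 3 4 5 8', '1 2 3 4 5 9', '6 7 8 9 10 11');

# Regex test per number, the same check as the reduced program.
sub regex_version {
    my @matched;
    for my $i (0 .. $#datas) {
        for my $j (0 .. $#datas) {
            $matched[$i] .= "$j "
                if 5 <= grep { $datas[$i] =~ /(?:^|(?<= ))$_(?= |$)/ } split ' ', $datas[$j];
        }
    }
    return \@matched;
}

# Hash lookup per number, with each row turned into a hash only once.
sub hash_version {
    my @sets = map { my %set = map { $_ => 1 } split ' '; \%set } @datas;
    my @matched;
    for my $i (0 .. $#sets) {
        for my $j (0 .. $#sets) {
            my $common = grep { exists $sets[$i]{$_} } keys %{ $sets[$j] };
            $matched[$i] .= "$j " if $common >= 5;
        }
    }
    return \@matched;
}

# Make sure both versions agree before timing them.
die "versions disagree\n"
    unless join('|', @{ regex_version() }) eq join('|', @{ hash_version() });

# Run each version 2000 times and print a relative-speed table.
cmpthese(2000, {
    regex => \&regex_version,
    hash  => \&hash_version,
});
```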
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein