good practice in File::Find
I want to know if doing something like what is in the code below would
be expensive or for some other reason a bad choice.
There is more code, that either feeds the `find()' function or further
processes the results of the `find()' part. The code is not in
finished form, or tested, but more to show what I'm trying to do.
There are, probably unnecessary, comments to try to show intent.
I want to be able to pass in a regex to find specific directories and
a regex to find things in the text of numeric named files in those
directories.
The passing part will probably be done with the standard getopts, or just
a shift of two expected args. That part is not what I'm asking about.
I'm more concerned with how the code would plow through a directory
hierarchy.
This would be in a directory hierarchy that would contain many levels
such as a News hierarchy, where each segment of a newsgroup name is a
level that may contain many branches and possibly many thousands of single
messages in each level and branch.
It is a place where you might want to pass in the regex `linux\.' to
search only the newsgroups below linux in the hierarchy for the
text_regex that might be in the files there.
The idea is to allow you to focus a search without having to know
the exact name of the newsgroup[s]. You would at least be searching a
group with the string `linux.' in it.
So what I'm curious about is if it would be good to `next' out if the
File::Find::dir does not contain linux\.
Like:
next if(! $File::Find::dir =~ /$dir_rgx/);
or
Like I've done in the code below. Just let the dir_rgx be a selector
and not worry about pulling the next line immediately.
I've thought about using `stat' to allow only directories into the
first directory based test as a further way to help focus things. But
not sure any of this will help speed things up.
I'm not even sure File::Find is the best way to do this. Most likely
there is still another way of doing this that would be faster or
better coded.
The search is bound to be a bit slow but here is a place where coding
for speed might really make a difference.
------- --------- ---=--- --------- --------
use strict;
use warnings;
use File::Find;
[...]
find(
    sub {
        ## if we have a directory name that matches
        if ( $File::Find::dir =~ /$dir_rgx/ ) {
            ## if that directory has files with all numeric names
            if (/^\d+$/) {
                ## Open the files and search for a regex in the text
                open( my $fh, "< $File::Find::name" )
                    or die "Can't open $File::Find::name: $!";
                while (<$fh>) {
                    if (/$text_rgx/) {
                        print $_;
                    }
                }
                ## close once the whole file has been read
                close($fh);
            }
        }
    }
)
[...]
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: good practice in File::Find
Harry Putnam <reader [at] newsguy.com> writes:
> [...]
>
> find(
> sub {
> ## if we have a directory name that matches
> if($File::Find::dir =~ /$dir_rgx/){
> ## if that directory has files with all numeric names
> if(/^\d+$/){
> ## Open the files and search for a regex in the text
> open($fh,"< $File::Find::name")
> or die "Can't open $File::Find::name: $!";
> while(<$fd>){
> if(/$text_rgx/){
> print, $_;
> }
> close($fh);
> }
> }
> }
},
## Top level dir we start at
$topdir
> )
>
> [...]
I neglected to finish that find() a little better.
Re: good practice in File::Find
Harry Putnam wrote:
> find(
> sub {
> ## if we have a directory name that matches
> if($File::Find::dir =~ /$dir_rgx/){
> ## if that directory has files with all numeric names
> if(/^\d+$/){
if( ! /\D/ ){
> ## Open the files and search for a regex in the text
> open($fh,"< $File::Find::name")
> or die "Can't open $File::Find::name: $!";
open my $fh, '<', $File::Find::name
or die "could not open $File::Find::name: $!\n";
> while(<$fd>){
while( <$fh> ){
> if(/$text_rgx/){
> print, $_;
print;
> }
> close($fh);
> }
> }
> }
},
@directory_list
> )
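Pulling the corrections above together, a minimal sketch of the whole
search might look like the following. The helper name grep_tree(), the
no_chdir option, and the -f test are assumptions added for illustration
and were not posted in this thread. no_chdir is used here so that
$File::Find::name can be opened as-is even when the top directory is
given as a relative path.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# grep_tree() is a hypothetical helper name, not something from the thread.
# It returns "path: line" strings for every matching line it finds.
sub grep_tree {
    my ( $dir_rgx, $text_rgx, @dirs ) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # stay in the current directory while walking
            wanted   => sub {
                # skip entries whose containing directory does not match
                return unless $File::Find::dir =~ /$dir_rgx/;

                # only plain files whose name is all digits
                my $base = basename($File::Find::name);
                return unless $base =~ /\A\d+\z/;
                return unless -f $File::Find::name;

                open my $fh, '<', $File::Find::name
                    or die "could not open $File::Find::name: $!\n";

                # a lexical $line leaves File::Find's $_ untouched
                while ( my $line = <$fh> ) {
                    push @hits, "$File::Find::name: $line"
                        if $line =~ /$text_rgx/;
                }
                close $fh;
            },
        },
        @dirs
    );
    return @hits;
}
```

Called as grep_tree( qr/linux\./, qr/some text/, $topdir ), it hands
back the matching lines so the caller decides how to print them.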
--
Just my 0.00000002 million dollars worth,
Shawn
Programming is as much about organization and communication
as it is about coding.
I like Perl; it's the only language where you can bless your
thingy.
Eliminate software piracy: use only FLOSS.
Re: good practice in File::Find
Harry Putnam wrote:
> So what I'm curious about is if it would be good to `next' out if the
> File::Find::dir does not contain linux\.
>
> Like:
> next if(! $File::Find::dir =~ /$dir_rgx/);
No. Because you are inside a subroutine you have to use return:
return unless $File::Find::dir =~ /$dir_rgx/;
Or perhaps:
return if $File::Find::dir !~ /$dir_rgx/;
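A side note on the quoted line, offered as a small sketch rather than
anything definitive: `!' binds more tightly than `=~', so the test as
written does not ask the intended question. The two sub names below
are made up for the illustration; the pattern is the example from the
question.

```perl
use strict;
use warnings;

my $dir_rgx = qr/linux\./;    # example pattern from the question

# Perl parses `! $dir =~ /$dir_rgx/` as the expression below: the negation
# is applied to $dir first, and its result ('' for any non-empty path) is
# what gets matched. The test is therefore false for every real directory.
# (skip_as_written is a hypothetical name used only for this sketch.)
sub skip_as_written {
    my ($dir) = @_;
    return ( ( !$dir ) =~ /$dir_rgx/ ) ? 1 : 0;
}

# Intended test: true when the directory does NOT match the pattern.
# (skip_as_intended is likewise a hypothetical name.)
sub skip_as_intended {
    my ($dir) = @_;
    return ( $dir !~ /$dir_rgx/ ) ? 1 : 0;
}
```

With the first form nothing is ever skipped; the `unless' and `!~'
forms above avoid the problem.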
> or
> Like I've done in the code below. Just let the dir_rgx be a selector
> and not worry about pulling the next line immediately.
>
> I've thought about using `stat' to allow only directories into the
> first directory based test as a further way to help focus things. But
> not sure any of this will help speed things up.
$File::Find::dir will *only* contain directory names so such a test is
not needed.
John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway
Re: good practice in File::Find
"John W. Krahn" <jwkrahn [at] shaw.ca> writes:
>> Like: next if(! $File::Find::dir =~ /$dir_rgx/);
>
> No. Because you are inside a subroutine you have to use return:
>
> return unless $File::Find::dir =~ /$dir_rgx/;
>
> Or perhaps:
>
> return if $File::Find::dir !~ /$dir_rgx/;
>
Thanks.
>> or
>> Like I've done in the code below. Just let the dir_rgx be a selector
>> and not worry about pulling the next line immediately.
>>
>> I've thought about using `stat' to allow only directories into the
>> first directory based test as a further way to help focus things. But
>> not sure any of this will help speed things up.
>
> $File::Find::dir will *only* contain directory names so such a test is
> not needed.
This isn't the kind of test I meant. I'm talking about testing to see
if the directory matches a regex of directories to search.
Re: good practice in File::Find
Harry Putnam <reader [at] newsguy.com> writes:
>> Or perhaps:
>>
>> return if $File::Find::dir !~ /$dir_rgx/;
>>
>
> Thanks.
>
>>> or
>>> Like I've done in the code below. Just let the dir_rgx be a selector
>>> and not worry about pulling the next line immediately.
>>>
>>> I've thought about using `stat' to allow only directories into the
>>> first directory based test as a further way to help focus things. But
>>> not sure any of this will help speed things up.
>>
>> $File::Find::dir will *only* contain directory names so such a test is
>> not needed.
>
> This isn't the kind of test I meant. I'm talking about testing to see
> if the directory matches a regex of directories to search.
Oh crap... sorry. I didn't really read which statement of mine you
were referencing there, and your point is well taken.
A moment's thought would have told me as much.
My script does test for the directory name as well, hence my
self-induced inability to read your reply.