file processing

Hello all,
I have a large text file with lots of spaces and newline characters in
it, which I want to remove.
After that I need to construct hash tables for the unique words
that are present in the file. That is, I need one hash for
unigrams (one word at a time), a hash for bigrams (2 words at a time),
and the same for 3 words (trigrams).
I am all lost in removing and accessing the spaces in the text file, and
I am not able to access each word one at a time.
A simple example of what I need to do:

If the text in my file is:


hello how are you all hello how are.


then my unigrams will be:
hello 2
how 2
are 2
you 1...


bigrams will be
hello how 2
how are 2
are you 1
you all 1


trigrams
hello how are 2
how are you 1
are you all 1
... and so on


Can anyone help me with this code?
-thanks
amit_h123 [ Sun, 23 April 2006 06:03 ] [ ID #1286257 ]

Re: file processing

amit_h123 [at] yahoo.co.in wrote (in alt.perl):
> Hello all,
> I have a large text file ...

You posted the same job spec. in clpmisc a couple of hours ago, and I
suggested that you learn a programming language. Was that a bad idea?

Didn't you like the hints you were given by another poster either?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Gunnar Hjalmarsson [ Sun, 23 April 2006 09:36 ] [ ID #1286259 ]

Re: file processing

amit_h123 [at] yahoo.co.in wrote:
> I am all lost in removing and accessing the spaces in the text file but

That's super trivial - it's one of the homework assignments given on
the first day in any respectable Perl class.

@words = split; # Where $_ contains the line of text
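
For anyone unsure what split does with no arguments, here is a minimal sketch. The sample line is made up for illustration; the point is only that a bare split works on $_, skips leading whitespace, and splits on runs of spaces, tabs, and newlines.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading, trailing, and repeated whitespace.
$_ = "  hello   how\tare you \n";

# With no arguments, split behaves like split(' ', $_): leading whitespace
# is skipped and the string is split on runs of whitespace.
my @words = split;

print join('|', @words), "\n";   # prints: hello|how|are|you
```

So there is no need to remove the spaces and newlines as a separate step; splitting each line already discards them.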

> And after that I need to construct the hash tables for the unique words

$unigram{$_}++ foreach @words;

> unigrams (one word at a time), a hash for bigrams (2 words at a time)
> and same as for 3 words.

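my ($prev1, $prev2) = ('', '');   # the two most recently seen words
while (<>) {
    @words = split;               # split $_ on whitespace
    foreach my $word (@words) {
        $unigram{$word}++;
        # Note: the very first words yield keys with an empty prefix,
        # e.g. " hello", because $prev1 and $prev2 start out empty.
        $bigram{"$prev1 $word"}++;
        $trigram{"$prev2 $prev1 $word"}++;
        $prev2 = $prev1;
        $prev1 = $word;
    }
}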

So what's the problem? It almost sounds as if you never heard of
the split() function, or how it works when given no arguments.

-Joe
Joe Smith [ Sun, 23 April 2006 09:37 ] [ ID #1286260 ]
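
As a self-contained variation on the loop above, the sketch below keeps the last three words in a small window and only counts a bigram or trigram once enough words have been seen, so no key starts with an empty word. The @window array and the in-memory sample text are assumptions made for illustration (the sample is the sentence from the first post, spread over several lines and without the trailing period); with a real file you would read from <> as in the post above.

```perl
use strict;
use warnings;

my (%unigram, %bigram, %trigram);
my @window;   # hypothetical helper: holds at most the last three words

# Sample text with extra spaces and blank lines, read through an
# in-memory filehandle so the example runs without an input file.
my $text = "hello how  are\nyou all\n\nhello how are\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

while (my $line = <$fh>) {
    for my $word (split ' ', $line) {
        push @window, $word;
        shift @window if @window > 3;   # keep only the last three words

        $unigram{$word}++;
        $bigram{ join ' ', @window[-2, -1] }++ if @window >= 2;
        $trigram{ join ' ', @window }++        if @window == 3;
    }
}
close $fh;

# Print each table, sorted by key.
for my $table (\%unigram, \%bigram, \%trigram) {
    print "$_ $table->{$_}\n" for sort keys %$table;
    print "\n";
}
```

Because the window is not reset at the end of a line, n-grams that span a line break are counted too, which matches the behaviour of the $prev1/$prev2 version.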

Re: file processing

Hi,
thanks for the help. I knew about split, but not without arguments.
This really helped.
amit_h123 [ Sun, 23 April 2006 14:32 ] [ ID #1286261 ]

Re: file processing

Hello,
I still have one problem. Is there a way to access the
bigram hash values on the basis of a trigram key? I have to calculate
the value as:

for each key in trigram, I have to do the following:
trigram{hello how are} / bigram{hello how}

How do I access these values simultaneously?
Any suggestions?
amit_h123 [ Sun, 23 April 2006 15:19 ] [ ID #1286262 ]
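
One possible way to do this, sketched under the assumption that the keys are space-separated words as in the earlier loop: split each trigram key, rebuild the two-word prefix, and use it to look up the bigram hash. The counts below are copied from the example in the first post; %ratio and $prefix are names invented for this sketch.

```perl
use strict;
use warnings;

# Counts taken from the example lists in the first post.
my %bigram  = ('hello how' => 2, 'how are' => 2, 'are you' => 1, 'you all' => 1);
my %trigram = ('hello how are' => 2, 'how are you' => 1, 'are you all' => 1);

my %ratio;   # hypothetical result hash: trigram key => trigram / bigram
for my $tri (keys %trigram) {
    my ($w1, $w2) = split ' ', $tri;   # first two words of the trigram
    my $prefix = "$w1 $w2";            # the matching bigram key

    next unless $bigram{$prefix};      # skip missing or zero counts
    $ratio{$tri} = $trigram{$tri} / $bigram{$prefix};
}

printf "%-15s %.2f\n", $_, $ratio{$_} for sort keys %ratio;
```

For "how are you" this gives 1 / 2 = 0.5, since "how are" occurs twice in the example.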

Re: file processing

<amit_h123 [at] yahoo.co.in> wrote in message
news:1145798342.838627.116070 [at] i39g2000cwa.googlegroups.com...
> Hello,
> I still have one problem. Is there a way to access the
> bigram hash values on the basis of a trigram key? I have to calculate
> the value as:
>
> for each key in trigram, I have to do the following:
> trigram{hello how are} / bigram{hello how}
>
> How do I access these values simultaneously?
> Any suggestions?
>

I suggest you learn to quote some context when posting so people have some
idea what you're talking about. Usenet is not a bulletin board, even if
*you* happen to be using google groups and see it that way.

Matt
Matt Garrish [ Sun, 23 April 2006 15:56 ] [ ID #1286263 ]
