Scan web pages and compose summary

Hello.

I am looking for a way to read an HTML file and create
a short summary (like the one shown in Google results, for example),
which ought to be the first few lines of welcome text or so.

Does anyone have any idea how to do this? (I searched a lot,
but all I found was simply extracting meta tags.)

Thanks
solk [ Thu, 17 January 2008 21:48 ] [ ID #1910787 ]

Re: Scan web pages and compose summary

Well, the tricky part is that you'll need to decide which text to grab
from the file and show, which is why the meta description tag exists.
I believe Google displays the text surrounding the search term when
there's no meta description tag to use, so if you're actually searching
for a term you could do something like that.
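
A rough sketch of that idea in PHP, in case it helps. The function name snippet_for_term() and the $radius parameter are made up for illustration, and the meta regex is a simplification that assumes name= comes before content= inside the tag. This is only a guess at the general approach, not how Google actually does it.

```php
<?php
// Hypothetical helper, not part of any library: pick a summary for a search hit.
function snippet_for_term(string $html, string $term, int $radius = 60): string
{
    // Prefer the meta description when the page has one.
    // Simplified regex: assumes name= precedes content= in the tag.
    if (preg_match('/<meta\s+name=(["\'])description\1\s+content=(["\'])(.*?)\2/is', $html, $m)) {
        return trim(html_entity_decode($m[3], ENT_QUOTES, 'UTF-8'));
    }

    // strip_tags() keeps the contents of <script> and <style>, so drop those first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Pad tags with a space so adjacent blocks do not fuse into one word.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Cut a window around the first case-insensitive match.
    // Note: substr() counts bytes, so multi-byte text may be cut mid-character.
    $pos = stripos($text, $term);
    if ($pos === false) {
        return '';
    }
    $start = max(0, $pos - $radius);
    return substr($text, $start, $pos - $start + strlen($term) + $radius);
}
```

Calling snippet_for_term($html, 'apples') returns the description if the page has one, otherwise up to 60 characters on either side of the first "apples", or an empty string if the term does not occur.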

adwatson [ Thu, 17 January 2008 23:11 ] [ ID #1910794 ]

Re: Scan web pages and compose summary

Hello,

I can recommend Snoopy (http://snoopy.sourceforge.net/). It can
retrieve an entire web page, follow links and so on. The result is
the HTML source you see when you do a view source in your web
browser. From there you can use strpos() and substr() to cut out the
section you want (e.g. everything right after the body tag), strip the
HTML tags and save the text output.
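
Something along these lines might do for the "first few lines" case. It is only a sketch: first_lines_summary() and $maxLen are names invented here, and it works on an HTML string you have already fetched by whatever means (Snoopy, or anything else), using only built-in PHP functions.

```php
<?php
// Hypothetical helper: the first $maxLen characters of visible body text.
function first_lines_summary(string $html, int $maxLen = 160): string
{
    // Jump to right after the opening <body ...> tag, if the page has one.
    if (preg_match('/<body\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        $html = substr($html, $m[0][1] + strlen($m[0][0]));
    }
    // strip_tags() keeps the contents of <script> and <style>, so drop those first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Pad tags with a space so adjacent blocks do not fuse into one word.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) <= $maxLen) {
        return $text;
    }
    // Cut at the last space before the limit to avoid ending mid-word.
    $cut = substr($text, 0, $maxLen);
    $space = strrpos($cut, ' ');
    if ($space !== false) {
        $cut = substr($cut, 0, $space);
    }
    return $cut . '...';
}
```

If allow_url_fopen is enabled, first_lines_summary(file_get_contents($url)) is enough to try it out. Padding every tag with a space also splits words that contain inline tags (such as a bold letter in the middle of a word), which seemed an acceptable trade-off for a summary.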

- Jensen
Jensen Somers [ Fri, 18 January 2008 11:51 ] [ ID #1911504 ]