Can I use WGET to generate a sitemap of a website based on its URL?

I need a script that can crawl a website and return a list of all the crawled pages in plain text or a similar format, which I will then submit to search engines as a sitemap. Can I use WGET to generate a sitemap of a website? Or is there a PHP script that can do the same?

php web-crawler bots wget
2 answers
wget --spider --recursive --no-verbose --output-file=wgetlog.txt http://somewebsite.com
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt | sed "s@&@\&amp;@" > sedlog.txt
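For reference, here is a minimal sketch of what the sed command extracts from the wget log. The sample log line below is an assumption about wget's `--no-verbose` output format, not taken from a real run:

```shell
# Hypothetical wgetlog.txt line (format of wget --no-verbose output is assumed here)
echo '2011-09-15 11:51:50 URL:http://somewebsite.com/page.html 200 OK' > wgetlog.txt

# The sed expression keeps only the URL field of each log line
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt
# prints http://somewebsite.com/page.html
```

The pattern matches everything up to ` URL:`, captures the non-space run that follows (the URL itself), and discards the rest of the line.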

This creates a file called sedlog.txt containing all the links found on the specified website. You can then use PHP or a shell script to convert that text sitemap into an XML sitemap. Tweak the wget options (--accept / --reject / --include-directories / --exclude-directories) to restrict the crawl to only the links you need.
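As a sketch of the conversion step, the snippet below wraps each line of sedlog.txt in the sitemaps.org XML format. The sample URLs and the output filename sitemap.xml are assumptions for illustration:

```shell
# Stand-in for the sedlog.txt produced by the wget/sed pipeline above (contents assumed)
printf '%s\n' 'http://somewebsite.com/' 'http://somewebsite.com/about' > sedlog.txt

# Wrap each URL in a <url><loc>...</loc></url> entry inside a <urlset>
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while IFS= read -r url; do
    printf '  <url><loc>%s</loc></url>\n' "$url"
  done < sedlog.txt
  echo '</urlset>'
} > sitemap.xml
```

Note that this assumes the URLs were already XML-escaped (the second sed in the answer handles the `&` case); a plain text sitemap of one URL per line is also accepted by the major search engines, so the XML wrapping is optional.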


You can use this Perl script to do the trick: http://code.google.com/p/perlsitemapgenerator/

