Home > Paid Traffic Sources > POP / PPV / Redirect

Any tools that can scrape every page of a site???? (7)


05-09-2014 09:06 AM #1 lewis69 (Member)
Basically I'm looking for a tool that can scrape every single page on a site. The site I want to target has a lot of pages (x,xxx,xxx indexed in Google). Is there anything out there that can do this already, or is it something I would need to get made?

Any and all help appreciated


05-09-2014 10:37 AM #2 angry old lady (Member)

WinHTTrack should do the job, shouldn't it?



If not, wget should do the trick using --mirror:

Code:
$ wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
--mirror : turn on options suitable for mirroring.
-p : download all files that are necessary to properly display a given HTML page.
--convert-links : after the download, convert the links in document for local viewing.
-P ./LOCAL-DIR : save all the files and directories to the specified directory.
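
With x,xxx,xxx pages, an unthrottled mirror could hammer the target for days, so it may be worth adding some politeness options. Below is a rough sketch, not a tested recipe for any particular site: the function name mirror_site, the WGET override and the 1 second / 200k limits are all made up for illustration, and example.com in the usage line is a placeholder. The flags themselves (--no-parent, --wait, --random-wait, --limit-rate) are standard wget options.

```shell
# Throttled variant of the mirror command above.
# mirror_site, WGET and the default values are illustrative assumptions.
# Set WGET=echo to preview the arguments without downloading anything.
mirror_site() {
  site="$1"                  # start URL
  out="${2:-./LOCAL-DIR}"    # download directory (same role as -P above)
  # --no-parent    : never climb above the start URL's directory
  # --wait=1       : pause 1 second between requests
  # --random-wait  : vary that pause so requests are less regular
  # --limit-rate   : cap download bandwidth
  "${WGET:-wget}" --mirror -p --convert-links \
    --no-parent --wait=1 --random-wait --limit-rate=200k \
    -P "$out" "$site"
}
```

To check what would run before committing to a long crawl: WGET=echo mirror_site http://example.com/ ./LOCAL-DIR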


05-09-2014 11:17 AM #3 richierich (Member)

Code:
wget -pkmr http://ripemeoff.com


05-09-2014 06:00 PM #4 caurmen (Administrator)

If the site is dynamic, you can often achieve a lot using Nokogiri, an HTML/XML parsing library for Ruby. You'll still need a script coded, but that combination cuts the coding time for a reasonably complex bit of scraping down to about a day, for what's otherwise potentially quite a complex job.

If the site's really heinously complicated to scrape, Selenium is your friend, but that starts to get into Deep Coding Magic pretty quickly.


05-12-2014 08:24 PM #5 MrClean (Senior Member)

ScrapeBox will do the job.


05-14-2014 10:43 AM #6 johnnygood (Member)

Wondering if scraping the sitemap with ScrapeBox would do the job?

Thanks,
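
On the sitemap idea: if the site publishes one, pulling the URL list out of it can be sketched in a few lines of shell, independent of any particular tool. This is only a sketch under assumptions: sitemap_urls is a made-up helper name, it assumes each URL sits on the same line as its opening loc tag, and it only decodes the &amp; entity. A single sitemap file is capped at 50,000 URLs by the sitemap protocol, so a site this size would normally expose a sitemap index, whose loc entries point at further sitemap files that need the same treatment.

```shell
# Print every <loc> value found in a sitemap read from stdin.
# sitemap_urls is a hypothetical helper, not part of any existing tool.
sitemap_urls() {
  # Break the input at every '<' so each tag starts its own line
  # (this also copes with minified, single-line sitemaps), keep only
  # the text that follows "loc>", then decode &amp; back to &.
  tr '<' '\n' \
    | sed -n 's/^loc>[[:space:]]*\([^[:space:]]*\).*/\1/p' \
    | sed 's/&amp;/\&/g'
}
```

The resulting list could then be fed back to wget, for example: wget -qO- SITEMAP-URL | sitemap_urls > urls.txt, followed by wget -i urls.txt (SITEMAP-URL being a placeholder, as in the earlier post).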


05-18-2014 01:46 PM #7 bbrock32 (Administrator)

The Affexpert domain expander tool is what you're looking for.

