How do I prevent HTTrack from downloading the same file again?

I am using HTTrack to download this website: http://4minutearticles.com/

The problem is that the author links back to the main page from every page of the site.

For example, on http://4minutearticles.com/ext/ the "Parent Directory" link redirects to the home page, and HTTrack starts downloading everything again.

How can I prevent this cycle?

Read the answer to this question in the HTTrack FAQ:

"I have duplicate files! What is happening?"

Link: http://www.httrack.com/html/faq.html#Q1b11

Also look at the "Filters: Advanced" section of the filters documentation:

http://www.httrack.com/html/filters.html

Together, these should help you with your problem.

You can use filters to stop HTTrack from downloading the same files or folders. Click the "Set options..." button next to the "Preferences and mirror options" label, open the "Scan Rules" tab, and use the "Exclude link(s)" button to add the rules you need.
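
To get a feel for how such scan rules decide what gets fetched, here is a rough Python approximation. It assumes rules written as +pattern or -pattern without the "http://" prefix, with * matching any run of characters, and it assumes that when several rules match, the last one wins (check the filters page linked above for HTTrack's exact behavior). It uses Python's fnmatch rather than HTTrack's own matcher, so advanced syntax such as *[...] is not handled. The function is_allowed and the example rules are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(url: str, rules: list[str], default: bool = True) -> bool:
    """Approximate an HTTrack-style +/- scan-rule check (simplified sketch).

    Assumption: the last rule whose pattern matches decides the outcome;
    if no rule matches, `default` is returned.
    """
    # Drop the scheme, since the patterns here are written without "http://".
    target = url.split("://", 1)[-1]
    decision = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {rule!r}")
        # fnmatch's "*" matches any characters, including "/".
        if fnmatchcase(target, pattern):
            decision = sign == "+"
    return decision


# Hypothetical rules: allow the site, but skip one subdirectory.
rules = ["+4minutearticles.com/*", "-4minutearticles.com/ext/*"]
print(is_allowed("http://4minutearticles.com/ext/page.html", rules))
print(is_allowed("http://4minutearticles.com/about.html", rules))
```

With these two rules the first URL is rejected (the later exclude rule matches it) and the second is accepted. Putting broad include rules first and narrower exclude rules after them keeps the intent readable under the last-match-wins assumption.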

This generally happens with top-level indexes (index.html and index-2.html).

This is a common problem, but it cannot be easily avoided!

For example, http://www.foobar.com/ and http://www.foobar.com/index.html can be identical pages. But if links on the site point to both http://www.foobar.com/ and http://www.foobar.com/index.html, both pages will be downloaded. And because http://www.foobar.com/ must be saved under some file name so that you can browse the website locally (the directory / would give a directory listing, NOT the index itself!), HTTrack has to choose one. Therefore, two index.html files will be created, one with a -2 suffix to show that the file had to be renamed.
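
The renaming step can be sketched in Python. This is a simplified model, not HTTrack's actual naming code: the hypothetical local_names function gives a directory URL the file name index.html and, when two different URLs would end up with the same local name, appends -2, -3, and so on before the extension.

```python
import posixpath
from urllib.parse import urlsplit


def local_names(urls: list[str]) -> dict[str, str]:
    """Map URLs to local file names the way a simplified mirror might."""
    names: dict[str, str] = {}
    used: set[str] = set()
    for url in urls:
        if url in names:  # the same URL is only saved once
            continue
        parts = urlsplit(url)
        path = parts.path or "/"
        # A directory URL has no file name, so one must be invented;
        # saving it as a directory would only give a listing locally.
        if path.endswith("/"):
            path += "index.html"
        candidate = parts.netloc + path
        root, ext = posixpath.splitext(path)
        n = 2
        # The mirror cannot know that "/" and "/index.html" are the same
        # page, so a clash is resolved by renaming, not by merging.
        while candidate in used:
            candidate = f"{parts.netloc}{root}-{n}{ext}"
            n += 1
        used.add(candidate)
        names[url] = candidate
    return names


for url, name in local_names(
    ["http://www.foobar.com/", "http://www.foobar.com/index.html"]
).items():
    print(url, "->", name)
```

Here http://www.foobar.com/ is saved as www.foobar.com/index.html, and http://www.foobar.com/index.html, which now collides with it, becomes www.foobar.com/index-2.html.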

You might think that http://www.foobar.com/ and http://www.foobar.com/index.html should be treated as the same link to avoid duplicate files, right? NO, because the top-level index (/) can refer to ANY file name. index.html is the usual default, but index.htm, index.php3, mydog.jpg, or anything else you can imagine may be chosen instead (some webmasters are really crazy).

Note: In some rare cases, duplicate data files can appear when a website redirects to another file. This should be rare and can be avoided by using filters.
