I am building a site that will display the privacy policies of hundreds of thousands of other sites on the Internet. Its initial content comes from running a script over the CommonCrawl 5-billion-page web dump and analyzing every privacy policy found there to flag certain characteristics (for example, "sells your personal information").
According to a guide to SEO:
Search engines tend to crawl only about 100 links on any given page. This loose restriction is necessary to keep down spam and conserve rankings.
I was wondering what a reasonable way would be to build a navigation structure that leaves no page orphaned while still avoiding the SEO penalty they describe. I have a few ideas:
- Create alphabetical index pages (or a Google sitemap.xml), such as "Sites starting with Ado*", which would then link to "Adobe.com" and so on (see the sketch after this list for how such index pages could be generated). This, or any other fairly arbitrary way of splitting up the pages, feels contrived, and I wonder whether Google would frown on it.
- Use meta keywords or descriptions to categorize the sites.
- Find a way to apply more meaningful categories, such as geographic or content-based ones. What bothers me is that I'm not sure how I could apply such categories across so many sites. If necessary, I could write another classifier to analyze the page contents from the crawl (a rough sketch of that direction is also included below), but that sounds like a lot of work in itself.
- Use the DMOZ project to help categorize pages.
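On the alphabetical-index idea: a three-level hierarchy of index pages with at most 100 links each can already reach 100^3 = 1,000,000 leaf pages, so hundreds of thousands of policy pages fit comfortably within 2-3 clicks of the home page. Below is a minimal sketch, in Python, of how a sorted list of domains could be chunked so that no index page links to more than 100 children; the domain names and the 100-link limit are just placeholders for illustration, not part of my actual pipeline.

```python
# Minimal sketch: chunk an alphabetically sorted list of domains into a
# hierarchy of index pages, each linking to at most MAX_LINKS children.
# The domain names below are illustrative, not taken from the real crawl.

MAX_LINKS = 100  # the ~100-links-per-page guideline from the SEO guide

def build_index(items):
    """Return a nested structure of index pages over `items`.

    Each node is either a leaf page (<= MAX_LINKS items) or an index page
    whose children each cover a contiguous alphabetical slice.
    """
    if len(items) <= MAX_LINKS:
        return {"type": "leaf_page", "items": items}
    # Split into at most MAX_LINKS roughly equal slices (ceiling division).
    chunk = -(-len(items) // MAX_LINKS)
    children = [build_index(items[i:i + chunk])
                for i in range(0, len(items), chunk)]
    return {"type": "index_page", "children": children}

if __name__ == "__main__":
    # Hypothetical input: sorted domains whose policies were analyzed.
    domains = sorted(f"site{i:06d}.example" for i in range(250_000))
    root = build_index(domains)
    # The root links to at most 100 sub-indexes, each of which links to at
    # most 100 leaf pages, so every domain page is reachable in 3 hops.
    print(root["type"], len(root["children"]))
```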
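And for the category-classifier idea, this is the rough direction I had in mind, assuming plain-text page content has already been extracted from the crawl and a small hand-labeled seed set exists. The categories, example texts, and model choice here are entirely hypothetical placeholders, just to show the shape of the work involved.

```python
# Rough sketch of a content-based category classifier. The labels and
# example texts are made-up placeholders, not real crawl data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hand-labeled seed set would be needed; these are invented examples.
train_texts = [
    "online photo editing software and creative tools",
    "breaking news, weather and local sports coverage",
    "buy shoes, clothing and accessories online",
]
train_labels = ["software", "news", "e-commerce"]

model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Then run it over the extracted text of each crawled site.
print(model.predict(["discount sneakers and free shipping"]))
```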
Wikipedia and StackOverflow have clearly solved this problem very well by letting users categorize or tag all of their pages. In my case I don't have that luxury, but I still want to find the best option available.
At the heart of this question is how Google responds to different navigation structures. Does it penalize sites whose navigation pages are generated programmatically and somewhat arbitrarily? Or does it not care, as long as everything is reachable through links?
babonk