I am building a site that will display the privacy policies of hundreds of thousands of other sites on the Internet. Its initial content comes from running a script over the CommonCrawl 5-billion-page web dump and analyzing every privacy policy found there to flag certain characteristics (for example, "sells your personal information").
According to a guide to SEO:
Search engines tend to crawl only about 100 links on any given page. This loose restriction is necessary to keep down spam and conserve rankings.
I was wondering what a reasonable way would be to build a navigation structure that leaves no page orphaned while still avoiding the SEO penalty they describe. I have a few ideas:
- Create alphabetical index pages (or a Google sitemap.xml), such as "Sites starting with Ado*", which would then link to "Adobe.com" and so on (see the sketch after this list for how such index pages could be generated). This, or any other fairly arbitrary way of splitting up the pages, feels contrived, and I wonder whether Google would frown on it.
- Use meta keywords or descriptions to categorize the sites.
- Find a way to apply more meaningful categories, such as geographic or content-based ones. What bothers me is that I'm not sure how I could apply such categories across so many sites. If necessary, I could write another classifier to analyze the page contents from the crawl (a rough sketch of that direction is also included below), but that sounds like a lot of work in itself.
- Use the DMOZ project to help categorize pages.
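On the alphabetical-index idea: a three-level hierarchy of index pages with at most 100 links each can already reach 100^3 = 1,000,000 leaf pages, so hundreds of thousands of policy pages fit comfortably within 2-3 clicks of the home page. Below is a minimal sketch, in Python, of how a sorted list of domains could be chunked so that no index page links to more than 100 children; the domain names and the 100-link limit are just placeholders for illustration, not part of my actual pipeline.

```python
# Minimal sketch: chunk an alphabetically sorted list of domains into a
# hierarchy of index pages, each linking to at most MAX_LINKS children.
# The domain names below are illustrative, not taken from the real crawl.

MAX_LINKS = 100  # the ~100-links-per-page guideline from the SEO guide

def build_index(items):
    """Return a nested structure of index pages over `items`.

    Each node is either a leaf page (<= MAX_LINKS items) or an index page
    whose children each cover a contiguous alphabetical slice.
    """
    if len(items) <= MAX_LINKS:
        return {"type": "leaf_page", "items": items}
    # Split into at most MAX_LINKS roughly equal slices (ceiling division).
    chunk = -(-len(items) // MAX_LINKS)
    children = [build_index(items[i:i + chunk])
                for i in range(0, len(items), chunk)]
    return {"type": "index_page", "children": children}

if __name__ == "__main__":
    # Hypothetical input: sorted domains whose policies were analyzed.
    domains = sorted(f"site{i:06d}.example" for i in range(250_000))
    root = build_index(domains)
    # The root links to at most 100 sub-indexes, each of which links to at
    # most 100 leaf pages, so every domain page is reachable in 3 hops.
    print(root["type"], len(root["children"]))
```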
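And for the category-classifier idea, this is the rough direction I had in mind, assuming plain-text page content has already been extracted from the crawl and a small hand-labeled seed set exists. The categories, example texts, and model choice here are entirely hypothetical placeholders, just to show the shape of the work involved.

```python
# Rough sketch of a content-based category classifier. The labels and
# example texts are made-up placeholders, not real crawl data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hand-labeled seed set would be needed; these are invented examples.
train_texts = [
    "online photo editing software and creative tools",
    "breaking news, weather and local sports coverage",
    "buy shoes, clothing and accessories online",
]
train_labels = ["software", "news", "e-commerce"]

model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Then run it over the extracted text of each crawled site.
print(model.predict(["discount sneakers and free shipping"]))
```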
Wikipedia and StackOverflow have clearly solved this problem very well by letting users categorize or tag all of their pages. In my case I don't have that luxury, but I still want to find the best option available.
At the heart of this question is how Google responds to different navigation structures. Does it penalize sites whose navigation pages are generated programmatically and somewhat arbitrarily? Or does it not care, as long as everything is reachable through links?
babonk