Short question:
Does anyone have C# code to parse robots.txt and then evaluate URLs against it to see if they would be excluded or not?
Longer question:
I am creating a sitemap for a new site that has not yet been released to Google. The sitemap has two modes: a user mode (i.e., a traditional sitemap) and an "admin" mode.
Admin mode will show all possible URLs on the site, including custom entry URLs or URLs for a specific external partner - for example, example.com/oprah for anyone who sees our site on Oprah. I want to track the published links somewhere other than an Excel spreadsheet.
I have to assume that someone might post the /oprah link on their blog or somewhere else. We don't actually want this "mini-site" to be indexed, because that would let visitors who didn't come via Oprah find the special Oprah offers.
So, at the same time as I was creating the sitemap, I also added URLs like /oprah to be excluded in our robots.txt file.
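The exclusion entries look roughly like this (just a sketch; /oprah is the example path from above, and the blanket User-agent rule is an assumption):

```
User-agent: *
Disallow: /oprah
```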
Then (and this is the actual question) I thought: "Wouldn't it be nice to show on the sitemap whether or not each URL is visible to robots?" It would be quite simple - just parse robots.txt and then evaluate a link against it.
However, this is a "bonus feature", and I certainly don't have time to go off and write it myself (even though it's probably not that complex), so I was wondering if anyone has already written any code to parse robots.txt?
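For what it's worth, the kind of thing I have in mind is sketched below. It is only a rough sketch, not a spec-complete parser: it assumes only the "User-agent: *" group matters, honors only Disallow rules, and treats each rule as a plain path prefix (no wildcards, no Allow overrides, no crawl-delay). The class and method names are just placeholders I made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SimpleRobotsTxt
{
    private readonly List<string> _disallowedPrefixes = new List<string>();

    public SimpleRobotsTxt(string robotsTxtContent)
    {
        bool inGlobalGroup = false;

        foreach (var rawLine in robotsTxtContent.Split('\n'))
        {
            // Strip comments and surrounding whitespace.
            var line = rawLine.Split('#')[0].Trim();
            if (line.Length == 0) continue;

            var separatorIndex = line.IndexOf(':');
            if (separatorIndex < 0) continue;

            var field = line.Substring(0, separatorIndex).Trim().ToLowerInvariant();
            var value = line.Substring(separatorIndex + 1).Trim();

            if (field == "user-agent")
            {
                // Only collect rules from the catch-all "User-agent: *" group.
                inGlobalGroup = value == "*";
            }
            else if (field == "disallow" && inGlobalGroup && value.Length > 0)
            {
                _disallowedPrefixes.Add(value);
            }
        }
    }

    // True if the given path (e.g. "/oprah") matches a Disallow prefix
    // from the "User-agent: *" group.
    public bool IsDisallowed(string path)
    {
        return _disallowedPrefixes.Any(prefix =>
            path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

Usage on the sitemap page would then be something like:

```csharp
var robots = new SimpleRobotsTxt(System.IO.File.ReadAllText("robots.txt"));
bool oprahBlocked = robots.IsDisallowed("/oprah");        // true if /oprah is excluded
bool homeVisible  = !robots.IsDisallowed("/");            // visible unless explicitly blocked
```

But if a proper library or battle-tested parser already exists, I'd rather use that than maintain this.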
c# robots.txt
Simon_Weaver