Google Custom Search Underway

Question

I am doing a redesign for a client. On the new site, I would like to use Google Custom Search (CSE) as the search engine. While the site is in development, I can't let Google index it, because that would leave a terrible mess of duplicate and half-finished pages floating around in Google's index.

So, how can I check and refine the search results from Google CSE on my development site before launch?

Thanks, Daniel

google-custom-search google-cse

Daniel Hedenström May 02, '13 at 12:48

3 answers

styks · Answer 1 · 2013-08-19T17:48:58+0000

Your sites must be added to your Webmaster Tools account. After I added my test sites, I was able to get the sites' pages crawled for custom search even with a robots.txt file in place that disallows crawling.

The page was crawled, and to verify I checked the URL in Webmaster Tools, which reports that the page is still hidden from the main Google index. I then added the same URL to the custom search index, and the custom search found it just fine.

So this lets you search your test site while keeping it hidden from the public Google search.

Jonk · Answer 2 · 2016-11-09T14:12:16+0000

As of November 2016, this is still not possible. I realize this is years after the question was asked, but I tried to do the same thing. This is the (disappointing) response I received from Google support:

Google Site Search (GSS) will only return URLs that are (1) added to the sites-to-search configuration and (2) indexed.
I would like to inform you that GSS is hosted on Google infrastructure and uses the same technology as Google.com. It is not possible to get pages indexed in GSS but not in the main Google index.
GSS can only crawl and index documents that are publicly available and accessible over the Internet.
GSS and Google.com use the same crawler and the same index servers. So if you block Google.com from accessing your pages, they will not be indexed and will not be served in the GSS results either.

fotanus · Answer 3 · 2013-05-02T12:51:42+0000

You can keep some pages from being indexed with robots.txt.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a website URL, say, http://www.example.com/welcome.html . Before it does so, it first checks http://www.example.com/robots.txt and finds:

User-agent: *
Disallow: /

"User-agent: *" means that this section applies to all robots. "Disallow: /" tells the robot that it should not visit any pages on the site.
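To see how a well-behaved robot would interpret those two lines, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the "Googlebot" user-agent string are only illustrative assumptions; this shows how the rules are evaluated in general, not how Google's crawler or CSE actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/welcome.html"
    # "Googlebot" is just an example user-agent; the "*" section covers it.
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Swapping "Disallow: /" for a narrower rule such as "Disallow: /private/" would let the same check pass for /welcome.html while still blocking anything under /private/.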

Looking at the CSE docs, I can't find anything about the robots.txt file, so I'm not sure that it is honored. The docs do say, however, that you can remove pages manually or set an expiration date in sitemap.xml.
