I manage a website that is in fact a search engine for a specific branch of industry. Because most of the pages are reached dynamically via a form (where a user can input a keyword to search for), I thought of submitting a sitemap to Google to inform their crawlers about my pages.
There are two problems:
- I have 25 million pages (and I think the limit per sitemap is ~50 MB, which would be exceeded)
- I want to remove the sitemap after submitting it, because I do not want to be crawled by competitors
Should I submit a sitemap to Google, or is it better to let the crawlers index the pages by themselves?
First up, Google doesn't want to index search results pages - if you want to see an example of just how much Google doesn't want to, go and look at what happened to Giphy recently.
Putting that to one side, you can have multiple sitemaps and a sitemap of sitemaps (a sitemap index) to help with indexing very large sites. You can also have multiple sitemap indexes; you get the idea.
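To make the sitemap-of-sitemaps idea concrete, here is a minimal sketch in Python. It assumes the sitemaps.org limit of 50,000 URLs per file (the protocol also caps each file at 50 MB uncompressed, which this sketch does not check). The function names, the `sitemap-N.xml` naming scheme and the `example.com` URLs are made up for illustration.

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol
XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def render_urlset(urls):
    """Render one sitemap file listing the given page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{XMLNS}">\n{entries}</urlset>\n'
    )


def render_index(sitemap_urls):
    """Render a sitemap index that points at the individual sitemap files."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{XMLNS}">\n{entries}</sitemapindex>\n'
    )


def build_sitemaps(urls, base_url, per_file=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml_text} for every sitemap file plus the index."""
    files = {}
    for n, chunk in enumerate(chunked(urls, per_file), start=1):
        files[f"sitemap-{n}.xml"] = render_urlset(chunk)
    names = list(files)  # snapshot of the sitemap names before adding the index
    files["sitemap-index.xml"] = render_index(f"{base_url}/{name}" for name in names)
    return files


# Hypothetical usage with a tiny per-file limit so the split is visible.
demo_urls = (f"https://example.com/product/{i}" for i in range(5))
demo = build_sitemaps(demo_urls, "https://example.com", per_file=2)
print(sorted(demo))
```

At 50,000 URLs per file, 25 million pages would come to about 500 sitemap files, which fits in a single index. A real version would write each file to disk as it goes rather than keep everything in memory.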
You seem to be stuck between two schools of thought here - get pages indexed and found, or don't get pages indexed and found just in case competitors see them. Pick one, because as soon as Google finds your pages, your competitors will be able to see them too.
- I have submitted sitemaps specifically for Google via Search Console using a unique and unpredictable file name. This will keep competitors from seeing what they should not. As well, sitemaps only help sites that cannot be properly crawled. Most sites do not need one. It drives me nuts that the SEO industry is driving the myth and misleading too many people to do unnecessary work. Nice answer! Cheers!!
- Thank you for your answers. Nice tip that I can just randomize the name of the sitemap. I don't need the search result pages to be indexed, just the detail pages of the products. Example: A user searches for `keyword` and gets the search result page with teasers of products matching `keyword`. From here he can click on `more details...` and gets to the main product page (it works like, for example, Amazon).
- One point is that Google doesn't really index pages that are ONLY listed in a sitemap (I know this from experience and discussion) ... pages need to be findable by crawling. So a sitemap in itself won't help all that much. Conversely, removing the sitemap won't really hinder anything either (Google will already have extracted the URLs from it). But because you need a crawlable website, competitors could just crawl (à la scrape) your site the same way Google does (i.e. a sitemap isn't a magic way to let only Google crawl your site).
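
The unpredictable file name suggested in the comments above could be generated along these lines. This is only a sketch: the `sitemap-` prefix and the token length are arbitrary choices, and the name stays private only as long as the file is submitted through Search Console and not referenced anywhere public, such as robots.txt.

```python
import secrets


def random_sitemap_name(prefix="sitemap-", token_bytes=16):
    """Return a file name containing an unguessable URL-safe token."""
    return f"{prefix}{secrets.token_urlsafe(token_bytes)}.xml"


print(random_sitemap_name())
```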