What is the best initial or general setup for the robots.txt to allow search engines to go through the site, but maybe restrict a few folders?
Is there a general setup that should always be used?
- 1 It's 'robots.txt' (plural)
- Thanks. I have fixed the title and question to reflect this.
Google Webmaster Tools has a section called 'Crawler access'.
This section lets you create your robots.txt very easily.
For example, to allow everything except a folder called test, your robots.txt would look something like this (paths are case-sensitive, so the rule must match the folder name exactly):
User-agent: *
Disallow: /test
Allow: /
- Make sure you also follow the link in Jason's answer for more information. webmasters.stackexchange.com/questions/89/…
- 1 There is no Allow directive in the original robots.txt standard. Some crawlers now understand that, but most don't. Since the default is crawling allowed, that line can just be omitted.
The best configuration, if you don't have any special requirements, is nothing at all. (Although you may at least want to add a blank file to avoid 404s filling up your error logs.)
To block a directory on the site, use the 'Disallow' clause:
User-agent: *
Disallow: /example/
There is also an 'Allow' clause which overrides previous 'Disallow' clauses. So if you've disallowed the 'example' folder you may wish to allow a folder like 'example/foobar'.
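If you want to check rules like these before publishing them, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs, the is_allowed helper, and the /example/foobar/ path are placeholders for illustration only. One caveat: Python's parser applies the first matching rule in file order, while some crawlers (Google, for instance) pick the most specific match, so the Allow line is placed before the Disallow line here to get the same answer under either approach.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /example/ but leave /example/foobar/ crawlable.
ROBOTS_TXT = """\
User-agent: *
Allow: /example/foobar/
Disallow: /example/
"""


def is_allowed(url, agent="*"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/example/page.html", "/example/foobar/page.html", "/index.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(url))
```

This only tells you how one particular parser reads the file; individual crawlers may interpret Allow differently or ignore it entirely.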
Remember that robots.txt doesn't prevent anyone from visiting those pages if they want to, so if some pages should remain secret you should hide them behind some kind of authentication (e.g. a username/password).
The other directive that is likely to be in many robots.txt files is 'Sitemap', which specifies the location of your XML sitemap if you have one. Put it on a line on its own:
Sitemap: http://www.example.com/sitemap.xml
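As a rough illustration of where a Sitemap line sits in a complete file, the sketch below parses a small robots.txt with Python's urllib.robotparser and reads the sitemap URLs back. The file contents, the example.com address, and the list_sitemaps helper are assumptions made for the example, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one blocked folder plus a sitemap location.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /example/

Sitemap: http://www.example.com/sitemap.xml
"""


def list_sitemaps(robots_text):
    """Return the Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(list_sitemaps(ROBOTS_WITH_SITEMAP))
```

The Sitemap line is not tied to any User-agent group, which is why it can sit on its own after a blank line.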
The official robots.txt site has lots more information on the various options. But in general, the vast majority of sites will need very little config.
Here's everything you need to know about the robots.txt file
- This link only answer is not very useful compared to other much better answers here.
You can use Google Webmaster Tools to do this. It is very helpful for creating a robots.txt file.
- 1 The accepted answer already says to use Google Webmaster Tools. It has more detail as well such as which section to use and an example robots.txt file. When posting an additional answer, you need to add something above and beyond the existing answers. Even if this were the only answer, it still isn't very high quality. A better answer would have a couple paragraphs and some links for reference.