# NOTES: Per HostGator, bots are overloading the server, causing TOS violations.
# Dealing With Google & Bing -- http://tinyurl.com/6q9s8wp
# Yahoo ("slurp") and Bing ("msnbot") support the "crawl-delay" directive
# http://tinyurl.com/89m85oq
#
# Adjust the GoogleBot crawl rate every 90 days:
# https://www.google.com/webmasters/tools/home?hl=en
# Current crawl rate set to 0.2 requests per second (5 seconds between
# requests) -- for 90 days -- Reset on 12/17/2012
#
# Adjust Bing bots here: http://tinyurl.com/Bing-Bots
#
# CRAWL-DELAY DIRECTIVE: For compatibility with robots that deviate somewhat
# from standard behaviour when processing robots.txt, the Crawl-delay
# directive must be added to the group that starts with the 'User-agent'
# entry, right after the 'Disallow' ('Allow') directive(s).
# REF: http://help.yandex.com/webmaster/?id=1113851 | See bottom of file.
#
# Robots.txt Checker:  http://tool.motoricerca.info/robots-checker.phtml
# Robots.txt Analysis: http://phpweby.com/services/robots
#
# ------------------------------------
# CODE BELOW BLOCKS MAJOR SPIDERS FROM
# China, Japan, Korea, Russia ...
# No crawling. Last resort: block in .htaccess.
# ------------------------------------

# Baidu (CN)
# Info: http://www.baidu.com/search/spider.htm
# Info: http://www.baidu.com/search/robots_english.html
# Info: http://www.baidu.com/search/spider_english.html
# Do the same for the "images2" sub-domain as it fills up:
# create a robots.txt file and place it in the sub-domain root
# (see the commented example at the bottom of this file).
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: BaiduMobaider
User-agent: BaiduImagespider
Disallow: /

# Yandex (RU)
# Info:
User-agent: Yandex
Disallow: /

# Goo (JP)
# Info (Japanese): http://help.goo.ne.jp/help/article/704/
# Info (English):  http://help.goo.ne.jp/help/article/853/
User-agent: moget
User-agent: ichiro
Disallow: /

# Naver (KR)
# Info: http://help.naver.com/customer/etc/webDocument02.nhn
User-agent: NaverBot
User-agent: Yeti
Disallow: /

# SoGou (CN)
# Info: http://www.sogou.com/docs/help/webmasters.htm#07
User-agent: sogou spider
Disallow: /

# Youdao (CN)
# Info: http://www.youdao.com/help/webmaster/spider/
#
# ------------------------------------
#
User-agent: YoudaoBot
Disallow: /

User-agent: *
Disallow: /styles/
Disallow: /cgi-bin/
Disallow: /javascripts/
Disallow: /images/
Disallow: /test/
Disallow: /go/
Disallow: /downloads/
Disallow: /audio/
Disallow: /video/
Disallow: /error_log.php
#
#sitemap: http://cdn.attracta.com/sitemap/119604.xml.gz
#
# Prevents spiders from crawling feeds and auxiliary pages [DUPLICATE CONTENT].
# NOTE: Besides the major search engines, most crawlers do not support
# wildcard matches and will most likely misunderstand or ignore them.
# REF: http://www.frobee.com/robots-txt-check
# REF: http://www.mcanerin.com/EN/search-engine/robots-txt.asp
# Commented out below -- errors returned by the validator.
#
#User-agent: *
#Disallow: /blog/wp-
#Disallow: /blog/search
#Disallow: /blog/feed
#Disallow: /blog/comments/feed
#Disallow: /blog/feed/$
#Disallow: /blog/*/feed/$
#Disallow: /blog/*/feed/rss/$
#Disallow: /blog/*/trackback/$
#Disallow: /blog/*/*/feed/$
#Disallow: /blog/*/*/feed/rss/$
#Disallow: /blog/*/*/trackback/$
#Disallow: /blog/*/*/*/feed/$
#Disallow: /blog/*/*/*/feed/rss/$
#Disallow: /blog/*/*/*/trackback/$
Crawl-delay: 120
#
# Set to the number of seconds to wait between successive requests to the same server.
# REF: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive
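#
# ------------------------------------
# EXAMPLE: Crawl-delay placement (see the CRAWL-DELAY DIRECTIVE note at the
# top of this file). A minimal, commented-out sketch -- "SomeBot" and the
# "/private/" path are placeholders, not rules used by this site. The
# Crawl-delay line sits inside the record opened by its 'User-agent' line,
# right after that record's 'Disallow'/'Allow' lines:
#
#   User-agent: SomeBot
#   Disallow: /private/
#   Crawl-delay: 10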
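#
# EXAMPLE: Separate robots.txt for the "images2" sub-domain (see the Baidu
# notes above). A commented-out sketch only -- the host name and the rules
# shown are assumptions copied from this file, not the sub-domain's live
# configuration. Save the uncommented lines as robots.txt in the sub-domain's
# document root (e.g. http://images2.<your-domain>/robots.txt):
#
#   User-agent: Baiduspider
#   Disallow: /
#
#   User-agent: *
#   Crawl-delay: 120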