Robots Dot Text plugin

Plugin details

A plugin for dynamically creating your robots.txt file with Ruby code and logging crawler user-agents

Website: http://handyrailstips.com/
Repository: git://github.com/GavinM/robots_dot_text.git
Author: Gavin Morrice
Tags: robot
License: MIT

Documentation

Install the plugin:
ruby script/plugin install git://github.com/GavinM/robots_dot_text.git

Setup Instructions
==================
1. script/plugin install http://github.com/GavinM/robots_dot_text.git
2. remove robots.txt from your /public directory
3. create a controller for your robots with an action called index

      script/generate controller robots index
   


4. add a route in routes.rb to your robots index action:

      map.connect "robots.txt", :controller => "robots"
   



Examples:
==================

Simple example
---------------------

    
class RobotsController < ActionController::Base

  def index
    respond_to do |format|
      format.text do
        log_user_agent # adds the crawler's user_agent to user_agents.log
        @page_content = robots_dot_text do |rules|
          rules.comment "Tell all crawlers to keep out of these pages"
          rules.add :all, admin_path, customers_path, log_path
          rules.br
          rules.sitemap sitemap_url
        end
        render :text => @page_content, :layout => false
      end
    end
  end

end



will render:

# Tell all crawlers to keep out of these pages
User-agent: *
Disallow: /admin
Disallow: /customers
Disallow: /log

Sitemap: http://handyrailstips.com/sitemap.xml
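
To make the block-based style above more concrete, here is a minimal plain-Ruby sketch of how a builder of this kind could produce the same output. It is not the plugin's implementation: `SketchRules` and `sketch_robots_txt` are hypothetical names, and the paths and sitemap URL are passed as literal strings instead of Rails route helpers.

```ruby
# Hypothetical stand-in for the plugin's rules object (an assumption,
# not the plugin's real code).
class SketchRules
  def initialize
    @lines = []
  end

  # Emits a "# ..." comment line.
  def comment(text)
    @lines << "# #{text}"
  end

  # Emits a User-agent line followed by one Disallow line per path.
  def add(agent, *paths)
    @lines << "User-agent: #{agent == :all ? '*' : agent}"
    paths.each { |path| @lines << "Disallow: #{path}" }
  end

  # Emits a blank line.
  def br
    @lines << ""
  end

  # Emits one Sitemap line per URL.
  def sitemap(*urls)
    urls.each { |url| @lines << "Sitemap: #{url}" }
  end

  def to_s
    @lines.join("\n") + "\n"
  end
end

# Yields a fresh rules object to the block and returns the rendered text.
def sketch_robots_txt
  rules = SketchRules.new
  yield rules
  rules.to_s
end

text = sketch_robots_txt do |rules|
  rules.comment "Tell all crawlers to keep out of these pages"
  rules.add :all, "/admin", "/customers", "/log"
  rules.br
  rules.sitemap "http://handyrailstips.com/sitemap.xml"
end

puts text
```

Collecting lines in an array and joining them at the end keeps each directive method to a single line and makes the result easy to compare against an expected string in a test.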



Complex Example
----------------------

    
class RobotsController < ActionController::Base

  def index
    respond_to do |format|
      format.text do
        log_user_agent(:short, logger) # :short is the datetime format; logger writes to Rails.logger instead of user_agents.log
        @page_content = robots_dot_text do |rules|
          rules.add :all
          rules.sitemap sitemap_url, google_news_sitemap_url
          rules.br
          rules.comment "Google ignores most directives so here are some rules for Google"
          rules.add [:google, :google_image, :google_mobile]
          rules.allow article_path("*")
          rules.block articles_path
          rules.line_break
          rules.comment "These crawlers respect the Crawl-delay directive"
          rules.add [:yahoo, :msn, :cuil, :ask], private_path, admin_path
          rules.rate "1/500s"
          rules.delay 10
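          rules.comment <<-COMMENT
            Request robots only crawl between 2am and 8am.
            (Those are our quiet times)
          COMMENT
          # (the call that emits the Visit-time line is missing from the source)
        end
        render :text => @page_content, :layout => false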
      end
    end
  end

end



will render:

User-agent: *
Sitemap: http://handyrailstips.com/sitemap.xml
Sitemap: http://handyrailstips.com/google_news_sitemap.xml

# Google ignores most directives so here are some rules for Google
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
Allow: /articles/*
Disallow: /articles

# These crawlers respect the Crawl-delay directive
User-agent: Slurp
User-agent: MSNBot
User-agent: Twiceler
User-agent: Teoma
Disallow: /private
Disallow: /admin
Request-rate: 1/500s
Crawl-delay: 10
# Request robots only crawl between 2am and 8am.
# (Those are our quiet times)
Visit-time: 0200-0800
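
The output above suggests that each crawler symbol is translated to a User-agent string. A small sketch of that lookup follows; the table is inferred only from the rendered output shown here, `SKETCH_AGENTS` and `sketch_user_agent_lines` are hypothetical names, and the plugin's own table may contain more entries or behave differently for unknown names.

```ruby
# Mapping inferred from the rendered output; an assumption, not the
# plugin's actual table.
SKETCH_AGENTS = {
  :all           => "*",
  :google        => "Googlebot",
  :google_image  => "Googlebot-Image",
  :google_mobile => "Googlebot-Mobile",
  :yahoo         => "Slurp",
  :msn           => "MSNBot",
  :cuil          => "Twiceler",
  :ask           => "Teoma"
}

# Accepts a single name or an array of names and returns one
# User-agent line per name; unknown names pass through unchanged.
def sketch_user_agent_lines(names)
  Array(names).map { |name| "User-agent: #{SKETCH_AGENTS.fetch(name, name)}" }
end

puts sketch_user_agent_lines([:google, :google_image, :google_mobile])
puts sketch_user_agent_lines(:all)
```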

Further Documentation

There is currently no advanced documentation for this plugin.
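
Since `log_user_agent` is only described in the code comments of the examples, here is a rough sketch of what a helper of that kind might do, using Ruby's standard `Logger`. The name `sketch_log_user_agent`, the `StringIO` target, and the timestamp pattern (chosen to resemble Rails' `:short` time format) are all assumptions made for illustration; the plugin's real helper may work differently.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: writes one line containing a timestamp and the
# crawler's user-agent string to the given IO or file path.
def sketch_log_user_agent(user_agent, target)
  log = Logger.new(target)
  log.formatter = proc do |_severity, time, _progname, message|
    "#{time.strftime('%d %b %H:%M')} #{message}\n"
  end
  log.info(user_agent)
end

# A StringIO keeps the sketch self-contained; a real application would
# more likely pass a path such as "log/user_agents.log".
io = StringIO.new
sketch_log_user_agent("Googlebot/2.1 (+http://www.google.com/bot.html)", io)
puts io.string
```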

Last edited by: hardway, about 1 year ago