The Crawl Robot Set-up fieldset is used to provide websites that you crawl with information about who is crawling them.
  • The Crawl Robot Name field is used to set the User-Agent header sent by your robot (see the sketch after this list). It has the format:
    <code>
     Mozilla/5.0 (compatible; NAME_FROM_THIS_FIELD; YOUR_SITES_URL/bot)
    </code>
    The value sent will be common to all fetcher traffic coming from the same queue server when downloading web pages from a site.
    If you are doing crawls using multiple queue servers, you should give each queue server the same value. The value of YOUR_SITES_URL comes from the Server Settings - Name Server URL field.
  • The Robot Instance field is used for web communication internal to a single Yioop instance to help identify which queue server, or which fetcher under that queue server, was involved. This string should be unique for each queue server in your Yioop set-up. The value of this string is written when logging requests between fetchers and queue servers and can be helpful in debugging.
  • The Robot Description field is used to specify the contents of the Public group's Bot wiki page. This page can also be accessed and edited under Manage Groups by clicking on the wiki link for the Public group and then editing its Bot page. This wiki page is what is displayed when someone goes to the URL:
    YOUR_SITES_URL/bot

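As a rough illustration of the User-Agent format described above, the following Python sketch builds such a header and sends a single request with it. The robot name, Name Server URL, and target URL are made-up placeholders; substitute your own values.
<code>
import urllib.request

# Placeholder values; substitute your own Crawl Robot Name and Name Server URL.
robot_name = "ExampleBot"
name_server_url = "https://www.example.com"

# User-Agent string in the format described above.
user_agent = "Mozilla/5.0 (compatible; %s; %s/bot)" % (robot_name, name_server_url)

# Send one request using this header (the target URL is just an example).
request = urllib.request.Request("https://www.example.org/",
                                 headers={"User-Agent": user_agent})
with urllib.request.urlopen(request) as response:
    print(response.status, len(response.read()), "bytes fetched")
</code>
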
The point of the bot page is to give website owners both contact information for your bot and a description of how your bot crawls web sites.
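Conversely, a website owner who sees this User-Agent in their access logs can recover the bot's name and the location of its information page. The sketch below shows one way to do this in Python; the regular expression and the sample User-Agent string are assumptions based only on the format given above.
<code>
import re

# Matches User-Agent strings of the form described above:
#   Mozilla/5.0 (compatible; NAME_FROM_THIS_FIELD; YOUR_SITES_URL/bot)
UA_PATTERN = re.compile(
    r"Mozilla/5\.0 \(compatible; (?P<name>[^;]+); (?P<bot_page>[^)]+/bot)\)")

# Made-up sample User-Agent, as it might appear in an access log.
sample_ua = "Mozilla/5.0 (compatible; ExampleBot; https://www.example.com/bot)"

match = UA_PATTERN.search(sample_ua)
if match:
    print("Robot name:   ", match.group("name"))
    print("Bot info page:", match.group("bot_page"))
</code>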