Disallowed to Crawl Sites are urls or domains, listed one per line, that Yioop should not crawl.

A line like:

  http://www.somewhere.com/foo/

would prevent the url

  http://www.somewhere.com/foo/goo.jpg

from being crawled.

A line like:

  domain:foo.com

would prevent the url

  http://a.b.c.foo.com/blah/

from being crawled.

It is also possible to disallow a site using a regular expression. A line like:

  regex:/foo\d+/

would disallow any url containing the string "foo" followed by 1 or more digits.
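
The three kinds of line described above can be sketched in a few lines of Python. This is only an illustration of the matching rules as stated here, not Yioop's actual implementation. It assumes that a plain url line is a prefix match, that a domain: line matches the named host and any of its subdomains, and that a regex: line is searched anywhere in the url after its surrounding slashes are removed. The function name is_disallowed and the sample rules list are hypothetical.

```python
import re
from urllib.parse import urlsplit


def is_disallowed(url, rules):
    """Return True if any rule line matches url (assumed semantics, see above)."""
    host = (urlsplit(url).hostname or "").lower()
    for line in rules:
        rule = line.strip()
        if not rule:
            continue  # skip blank lines
        if rule.startswith("domain:"):
            # Assumption: the rule covers the domain itself and every subdomain.
            domain = rule[len("domain:"):].lower()
            if host == domain or host.endswith("." + domain):
                return True
        elif rule.startswith("regex:"):
            pattern = rule[len("regex:"):]
            # Assumption: the pattern is wrapped in /.../ delimiters; drop them.
            if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
                pattern = pattern[1:-1]
            if re.search(pattern, url):
                return True
        elif url.startswith(rule):
            # Assumption: a plain url line disallows every url beginning with it.
            return True
    return False


# Hypothetical rule list built from the example lines in this section.
rules = [
    "http://www.somewhere.com/foo/",
    "domain:foo.com",
    r"regex:/foo\d+/",
]
```

With these rules, the sketch rejects http://www.somewhere.com/foo/goo.jpg by prefix, http://a.b.c.foo.com/blah/ by domain, and a url such as http://example.org/foo42.html by regular expression.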

Sites with Quotas are urls or domains from which Yioop should crawl at most a fixed number of urls per hour. These are listed in the same text area as Disallowed to Crawl Sites. To indicate the quota, append a fragment of the form #some_number to the url. For example,

  http://www.yelp.com/#100

would restrict crawling of urls from Yelp to at most 100 per hour.
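
As a rough illustration of how such a line might be told apart from an ordinary disallowed line, the Python sketch below splits a line at its last # and treats a purely numeric fragment as the hourly quota. The function name parse_quota_line is hypothetical, and treating a line without a numeric fragment as having no quota is an assumption made for this example, not a description of Yioop's own parser.

```python
def parse_quota_line(line):
    """Split 'url#number' into (url, quota); quota is None if there is no numeric fragment."""
    text = line.strip()
    site, sep, fragment = text.rpartition("#")
    if sep and fragment.isdecimal():
        # A numeric fragment is read as the maximum number of urls per hour.
        return site, int(fragment)
    # Assumption: anything else is an ordinary line with no quota attached.
    return text, None
```

For the Yelp line above this yields the url http://www.yelp.com/ together with the quota 100, while a line such as http://www.somewhere.com/foo/ yields no quota.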