Disallowed to Crawl Sites are urls or domains, listed one per line, that Yioop should not crawl.

A line like:
  http://www.somewhere.com/foo/
would prevent the url
  http://www.somewhere.com/foo/goo.jpg
from being crawled.

A line like:
  domain:foo.com
would prevent the url
  http://a.b.c.foo.com/blah/
from being crawled.

It is also possible to disallow a site using a regular expression. A line like:
  regex:/foo\d+/
would disallow any url containing the string "foo" followed by one or more digits.
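
The three kinds of line described above can be illustrated with a short sketch. This is not Yioop's own code. The function name is_disallowed is hypothetical, and the sketch assumes that a plain url line acts as a prefix, that a domain: line also covers subdomains, and that the slashes around a regex: pattern are delimiters rather than part of the pattern.

```python
import re
from urllib.parse import urlparse


def is_disallowed(url, rules):
    """Return True if url matches any rule line (hypothetical helper)."""
    # urlparse lowercases the hostname for us.
    host = urlparse(url).hostname or ""
    for rule in rules:
        rule = rule.strip()
        if not rule:
            continue
        if rule.startswith("domain:"):
            domain = rule[len("domain:"):].lower()
            # Assumption: the domain itself and any subdomain match.
            if host == domain or host.endswith("." + domain):
                return True
        elif rule.startswith("regex:"):
            pattern = rule[len("regex:"):]
            # Assumption: surrounding slashes are delimiters, not pattern text.
            if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
                pattern = pattern[1:-1]
            if re.search(pattern, url):
                return True
        elif url.startswith(rule):
            # Plain url line: treated as a prefix of the urls to skip.
            return True
    return False


rules = [
    "http://www.somewhere.com/foo/",
    "domain:foo.com",
    r"regex:/foo\d+/",
]
print(is_disallowed("http://www.somewhere.com/foo/goo.jpg", rules))  # True
print(is_disallowed("http://a.b.c.foo.com/blah/", rules))            # True
print(is_disallowed("http://example.org/foo42.html", rules))         # True
print(is_disallowed("http://example.org/bar.html", rules))           # False
```

The last two urls use example.org, which is not in the text above; it is there only to show a regex match and a url that no rule covers.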

Sites with Quotas are urls or domains from which Yioop should crawl at most a fixed number of urls per hour. These are listed in the same text area as Disallowed to Crawl Sites. To indicate the quota, append the fragment #some_number to the url. For example,
  http://www.yelp.com/#100
would restrict crawling of urls from Yelp to 100 per hour.
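
As a rough sketch of how such a line might be read, the snippet below splits a line into its url and its hourly quota. The function name parse_quota_line is hypothetical, and treating a line without a numeric fragment as having no quota is an assumption for illustration, not a description of Yioop's parser.

```python
def parse_quota_line(line):
    """Split 'url#number' into (url, quota); quota is None if absent."""
    line = line.strip()
    # Split on the last '#', so the fragment is whatever follows it.
    url, sep, fragment = line.rpartition("#")
    if sep and fragment.isdigit():
        return url, int(fragment)
    # Assumption: no numeric fragment means the line carries no quota.
    return line, None


print(parse_quota_line("http://www.yelp.com/#100"))
# ('http://www.yelp.com/', 100)
print(parse_quota_line("http://www.somewhere.com/foo/"))
# ('http://www.somewhere.com/foo/', None)
```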