A Brief Description of FindCanBot
How to identify FindCanBot
Presumably, you arrived at this site because you noticed traffic from a User-Agent that identified itself with the string:
Mozilla/5.0 (compatible; FindCanBot +https://findcan.ca/bot.php)
If the IP address was also 18.104.22.168 to 78, then you have come to the right place to find out who was probably crawling your site.
If it was a different IP address, then someone else is hijacking my crawler's name.
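A site operator could automate this two-part check (User-Agent string plus source IP address) in a log-processing script. The following is only a minimal sketch: the function name looks_like_findcanbot and the EXAMPLE_NETWORKS list are hypothetical, and the network shown is a reserved documentation range, not the crawler's real addresses, which you would need to substitute yourself.

```python
import ipaddress

# User-Agent string quoted on this page.
FINDCANBOT_UA = "Mozilla/5.0 (compatible; FindCanBot +https://findcan.ca/bot.php)"


def looks_like_findcanbot(user_agent, remote_ip, allowed_networks):
    """Return True only if both the User-Agent and the source IP match.

    allowed_networks is supplied by the caller; this sketch deliberately
    does not hard-code the crawler's real addresses.
    """
    if "FindCanBot" not in user_agent:
        return False
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)


# Placeholder taken from the 192.0.2.0/24 documentation block (RFC 5737).
# This is an assumption for illustration, NOT FindCanBot's actual range.
EXAMPLE_NETWORKS = ["192.0.2.0/28"]

if __name__ == "__main__":
    # Matching name and matching (placeholder) address.
    print(looks_like_findcanbot(FINDCANBOT_UA, "192.0.2.5", EXAMPLE_NETWORKS))
    # Matching name from an unexpected address: likely someone else.
    print(looks_like_findcanbot(FINDCANBOT_UA, "198.51.100.7", EXAMPLE_NETWORKS))
```

A request that carries the FindCanBot name but fails the address check is the "different IP address" case described above.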
Who runs FindCanBot
FindCanBot is run by Allan Pollett and Chris Pollett using technology developed at seekquarry.com.
How FindCanBot crawls a site
FindCanBot is currently run sporadically (not continuously). Each machine in a crawl runs about four fetcher processes, and each fetcher keeps at most 100-300 connections open at any given time. In a typical situation, these connections would not all be to the same host.
How to change how FindCanBot crawls your site
FindCanBot understands robots.txt files (the file must be named robots.txt, not robot.txt). It also obeys X-Robots-Tag HTTP headers, the noindex and nofollow HTML meta tags, and rel="nofollow" on anchors. FindCanBot further understands the Crawl-delay extension to the robots.txt standard, as well as the Google and Bing * and $ syntax within Allow and Disallow lines. If you want to restrict FindCanBot's access to your site, the easiest way is to add a directive for it to follow in your robots.txt file. For example, in your document root you could put a robots.txt file with a "User-agent: FindCanBot" line followed by Disallow lines for the paths it should not crawl.
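As a rough illustration of such a file, the sketch below embeds a small robots.txt as a string and checks it with Python's standard urllib.robotparser module. The /private/ path, the 10-second delay, and the example.com URLs are hypothetical placeholders. The pattern_matches helper is likewise an assumption: it shows how * and $ are commonly interpreted in Allow and Disallow lines, and is not FindCanBot's own matcher.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; adjust the path and delay for your site.
ROBOTS_TXT = """\
User-agent: FindCanBot
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Plain prefix rules and Crawl-delay are handled by the standard library.
private_ok = parser.can_fetch("FindCanBot", "https://example.com/private/report.html")
index_ok = parser.can_fetch("FindCanBot", "https://example.com/index.html")
delay = parser.crawl_delay("FindCanBot")


def pattern_matches(pattern, path):
    """Match a URL path against a rule that may use * and a trailing $.

    * stands for any run of characters; a trailing $ anchors the rule to
    the end of the path. Without $, the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    print(private_ok, index_ok, delay)
    print(pattern_matches("/*.pdf$", "/docs/a.pdf"))
    print(pattern_matches("/*.pdf$", "/docs/a.pdf?x=1"))
```

Here the standard-library parser is used only for the plain prefix rule and the Crawl-delay line; the wildcard rule is checked separately by the helper.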
Of course, if you have general robot directives using expressions like "User-Agent: *", these will be understood by FindCanBot as well. FindCanBot caches the robots.txt file for one day: it uses the cached directives for 24 hours before requesting the robots.txt file again. So if you change your robots.txt file, it might take a little while before the changes are noticed by FindCanBot.
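The caching behaviour described above can be pictured as a simple time-to-live cache. This is a minimal sketch under stated assumptions, not FindCanBot's actual code: the RobotsCache class, its fetch callback, and the injectable clock are all hypothetical names introduced for illustration.

```python
import time

ONE_DAY_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Keep each host's robots.txt text for a fixed time before refetching."""

    def __init__(self, fetch, ttl=ONE_DAY_SECONDS, clock=time.time):
        self._fetch = fetch      # callable: host -> robots.txt text
        self._ttl = ttl          # seconds a cached copy stays valid
        self._clock = clock      # injectable so the example can fake time
        self._entries = {}       # host -> (fetched_at, robots_text)

    def get(self, host):
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: no new request is made
        text = self._fetch(host)
        self._entries[host] = (now, text)
        return text


if __name__ == "__main__":
    fake_now = [0.0]
    requests = []

    def fake_fetch(host):
        # Stand-in for an HTTP request; records each call.
        requests.append(host)
        return "User-agent: *\nDisallow:\n"

    cache = RobotsCache(fake_fetch, clock=lambda: fake_now[0])
    cache.get("example.com")           # first visit: fetches
    fake_now[0] = 23 * 60 * 60
    cache.get("example.com")           # 23 hours later: served from cache
    fake_now[0] = 25 * 60 * 60
    cache.get("example.com")           # 25 hours later: fetched again
    print(len(requests))
```

In this model, an edit to robots.txt made just after a fetch would not be seen until the cached copy expires, which is why changes can take up to a day to be noticed.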
If you have any questions about FindCanBot, please feel free to contact (email@example.com).