Understanding and Managing Bots on Your Website
Have you ever considered the traffic on your website that isn’t human? Every day, hundreds of robots visit and navigate through your website. Let’s delve into the details.
What Is a Bot (aka Crawler)?
A bot, short for robot, is software that performs a given task automatically. The bots we’ll be discussing today are internet bots: programs built to perform specific tasks that visit our websites daily.
What Do Bots Do?
Bots can have various roles on internet sites. While some bots analyze our content regularly, others may be present to measure our website’s performance. Here are some of the most common tasks performed by bots:
Content Analysis
One of the most popular tasks for bots is content analysis, which can serve various purposes. For example, search engine bots from Google, Bing, Yandex, and others periodically visit your website to analyze content and include it in their search results according to their algorithms. Similarly, when you share a link on social media, you immediately see a preview box of the site; this happens because the social media platform’s bot visits and extracts data from the shared site. Content extraction is not limited to these examples and can be used for many different purposes, but these are among the most popular reasons for content analysis.
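To make the link-preview case concrete, here is a minimal sketch of what such a bot might do once it has downloaded a page: read the Open Graph `<meta>` tags that many sites publish for exactly this purpose. The sample HTML and the `extract_preview` helper are illustrative assumptions, not any platform’s actual implementation.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop[3:], content)


def extract_preview(html):
    """Return the Open Graph fields a preview box would typically use."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Hypothetical page, standing in for the HTML a preview bot would download.
SAMPLE_HTML = """
<html><head>
  <meta property="og:title" content="Understanding Bots">
  <meta property="og:description" content="How bots visit your website.">
  <meta property="og:image" content="https://example.com/cover.png">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_preview(SAMPLE_HTML))
```

A real preview bot would also fetch the page over HTTP and fall back to the `<title>` tag when no Open Graph data is present; both steps are left out here to keep the sketch short.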
Website Performance Analysis
Bots may also visit your site to gather metrics and measure its performance. Examples include bots from Observer or Google PageSpeed.
Website Testing
Another purpose of bots is to test specific functions on websites. These bots are usually prepared by the website’s software team according to scenarios. For instance, if our site has a login screen, we can assign our bot to test login scenarios. The bot will act like a human and perform the given task.
Other Purposes
Of course, the tasks that a bot can perform are not limited to these categories; bots can be developed for hundreds of different scenarios serving various purposes.
Distinguishing Between Different Bots
Just as almost everything has an identity, bots carry an identity too: the User-Agent, an HTTP header sent with every request they make. Simply put, the User-Agent is how a client introduces itself to a website. Our own browsers send one as well, and it typically includes information such as the browser version and operating system. Below are some examples for you to review.
Observer Uptime Bot
UptimeBot-Observer (+https://siteobserver.co)
Observer Browser Bot
BrowserBot-Observer (+https://siteobserver.co)
GoogleBot Desktop
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
As seen in the examples above, well-intentioned bots do not hide their identities, making it easy to distinguish them from one another. However, some bots might hide their identities. Let’s examine this in detail.
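Because well-behaved bots announce themselves, telling them apart can be as simple as looking for a known marker in the User-Agent string. The following is a minimal sketch using only the three examples above; the `KNOWN_BOT_MARKERS` table and the `identify_bot` helper are hypothetical names chosen for illustration.

```python
# Substrings taken from the User-Agent examples above, mapped to a label.
KNOWN_BOT_MARKERS = {
    "UptimeBot-Observer": "Observer Uptime Bot",
    "BrowserBot-Observer": "Observer Browser Bot",
    "Googlebot": "Googlebot",
}


def identify_bot(user_agent):
    """Return a label for a self-declared bot, or None if no marker matches."""
    lowered = user_agent.lower()
    for marker, label in KNOWN_BOT_MARKERS.items():
        if marker.lower() in lowered:
            return label
    return None


if __name__ == "__main__":
    print(identify_bot("UptimeBot-Observer (+https://siteobserver.co)"))
    print(identify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Note that this only recognizes bots that declare themselves. Any client can send any User-Agent string, so a match is a hint about identity, not proof of it.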
Are Bots Safe?
Although the bots and types of bots mentioned above are safe and necessary for our website, the internet is filled with many more harmful bots that, as you might guess, hide their identities.
Malicious bots can create spam accounts on your website, post spam if your site accepts user-submitted content, and disrupt or halt your site’s operation by overwhelming it with requests. Fortunately, there are ways to block most of these bots.
Blocking Unwanted Bots
There are several methods to protect your site from bots and attacks. Let’s explore some of the popular ones:
Using Cloudflare
Cloudflare is a service with a free plan that can be set up as a security layer in front of your website. It filters incoming requests before they reach your site, protecting it from both DDoS attacks and bots. Generally, the default setup alone significantly reduces bots and attacks. However, if you configure Cloudflare more aggressively, you might also block beneficial bots such as the Observer bot; allowing such bots through is covered later in this article. You can get started on the Cloudflare website (https://www.cloudflare.com).
Using Captcha
You’ve likely encountered “I am not a robot” tests while browsing the internet. We call these captchas. Captchas are tests designed to tell bots and humans apart based on certain signals, typically shown when a visitor interacts with a website (logging in, commenting, making appointments, etc.). One of the most popular captcha services today is Google’s free reCAPTCHA service (https://www.google.com/recaptcha/about/). By using this service, you can protect the forms on your website.
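On the server side, protecting a form with reCAPTCHA comes down to forwarding the token submitted with the form to Google’s verification endpoint and checking the `success` field of the JSON reply. Here is a minimal sketch of that check, assuming you already have a secret key for your site; the `is_human` and `post_form` helpers are hypothetical names, and the `post` parameter exists only so the network call can be swapped out in tests.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def is_human(token, secret_key, post=post_form):
    """Ask the reCAPTCHA verification endpoint whether a token is valid.

    `token` is the value the reCAPTCHA widget adds to the submitted form;
    `secret_key` is the private key issued for your site.
    """
    if not token:
        # No token submitted: treat as failed without making a request.
        return False
    result = post(VERIFY_URL, {"secret": secret_key, "response": token})
    return bool(result.get("success"))
```

In a form handler you would call `is_human(...)` before processing the submission and reject the request when it returns `False`.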
Other Methods and Conclusion
There are other ways to block bots besides those mentioned above. If these methods do not resolve your issue, we recommend seeking support from professionals. Keep in mind, however, that some bots may still bypass every security measure. With today’s technology we cannot block 100% of malicious bots, but we can significantly reduce the damage they cause.
Managing Your Website’s Bot Traffic
Managing the bot traffic on your website is crucial, especially when using an application like Observer. Sometimes you will want to let certain bots through. To do this, allow their IP address, or the User-Agent discussed earlier, in your firewall. Let’s walk through an example using Cloudflare.
Allowing Observer Bot Access Through Cloudflare
Remember, the Observer bot has its own identity, visible in its User-Agent. To permit the Observer bot to access our site through Cloudflare, we follow these steps:
- Navigate to Security -> WAF from the left menu.
- Click on the “Create Rule” button.
- In the new window:
  - Rule Name: Choose any name for the rule.
  - Field: User-Agent
  - Operator: Contains
  - Value: siteobserver.co
  - Then take action: Skip
  - WAF components to skip: Check all
With these settings, every request whose “User-Agent” field contains “siteobserver.co” is allowed through, so Observer will be able to access your site. The setup is similar for services other than Cloudflare.
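If your firewall or application lets you script this decision yourself, the same allow logic can be sketched in a few lines. This is a minimal illustration, not Cloudflare’s implementation: the `is_allowed` helper is a hypothetical name, and the network below is a documentation-only range (RFC 5737) used as a placeholder, not Observer’s real addresses.

```python
import ipaddress

# Hypothetical allowlist: the marker comes from the rule above; the network
# is a documentation-only range (RFC 5737), not Observer's real addresses.
ALLOWED_USER_AGENT_MARKERS = ["siteobserver.co"]
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(remote_ip, user_agent):
    """Return True if the request should skip bot filtering."""
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in ALLOWED_NETWORKS):
        return True
    # Case-sensitive substring match on the User-Agent header.
    return any(marker in user_agent for marker in ALLOWED_USER_AGENT_MARKERS)


if __name__ == "__main__":
    print(is_allowed("198.51.100.7", "UptimeBot-Observer (+https://siteobserver.co)"))
    print(is_allowed("198.51.100.7", "curl/8.0"))
```

Since any client can send any User-Agent, allowing by IP address is the stricter of the two checks when the bot’s operator publishes its addresses.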
Resources
- The most active crawlers and bots on the web (https://deviceatlas.com/blog/most-active-crawlers-list)