What are bots?
👉 Bots are software applications that run automated tasks over the Internet. They are used to index internet content or to automatically gather information from websites.
Some bots serve legitimate purposes, whereas others collect data for malicious purposes, such as:
- Content reselling
- Click generation
- Price undercutting
Like any client-based web solution, Didomi is affected by bot traffic, which generates “false” data. As a consequence, CMP analytics can become inaccurate.
Impact on CMP Analytics Indicators
The most impacted metric is the total number of notices, which increases in volume. This directly distorts the notice bounce rate and the addressability rate performance indicators.
Provide analytics data without bots
👉 Bots affect web data by generating false user data. They deteriorate the addressability rate, as well as the pageview consent rate, by increasing the volume of notice bounces and the number of pageviews without consent.
To preserve the compliance of your reports, we advise you not to exclude all user agents (UAs). A given UA can belong to a hiding bot, but also to real users who have given their consent.
In that case, excluding the UA represents both a compliance risk and a legal risk.
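To make the effect concrete, here is a small worked example. It assumes that the pageview consent rate is simply the number of pageviews with consent divided by the total number of pageviews (an assumed definition, used only for illustration), and the traffic figures are made up.

```python
def consent_rate(pageviews_with_consent: int, total_pageviews: int) -> float:
    """Share of pageviews that carry consent (assumed definition)."""
    if total_pageviews == 0:
        return 0.0
    return pageviews_with_consent / total_pageviews


# Illustrative, made-up traffic figures.
human_pageviews = 1000
human_pageviews_with_consent = 800
bot_pageviews = 250  # counted as pageviews without consent

rate_without_bots = consent_rate(human_pageviews_with_consent, human_pageviews)
rate_with_bots = consent_rate(
    human_pageviews_with_consent, human_pageviews + bot_pageviews
)

print(f"Pageview consent rate without bots: {rate_without_bots:.0%}")
print(f"Pageview consent rate with bots:    {rate_with_bots:.0%}")
```

With these figures, the same human behavior yields a lower rate once bot pageviews are added to the denominator.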
There are two types of bots:
Declared Bots: they can be detected thanks to their user agent (UA). They are excluded with the user agent filtering method. A few examples of declared bots:
- Scraper bots: programmed to capture the content offline, such as names, prices, and product details on e-commerce websites.
- Crawler bots: used by large companies, such as Google or Yahoo, for content indexing purposes.
- Performance/audit bots: used by website performance tools to perform SEO audits or to evaluate page loading time. Didomi also uses a bot to evaluate the compliance of websites.
Hiding Bots: they use standard user agents and therefore can’t be identified with the UA filtering method.
A specialized solution/technology is required to detect them and then exclude them from analytics data.
Examples of declared bot user agents
- Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) TagInspector/500.1 Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/85.0.4183.102 Safari/537.36
- Mozilla/5.0 (iplabel; Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36
The elements TagInspector/500.1, HeadlessChrome, and iplabel are not part of a standard user agent.
Hiding Bot User agents
- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.64
Even though the user agents above are used by bots, they are also used by regular visitors, so these user agents can’t be excluded.
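The difference between the two types can be sketched with a simple substring check. This is a minimal illustration, not Didomi's actual implementation: the function name `is_declared_bot` is hypothetical, the term list is limited to the non-standard elements found in the declared bot examples above, and case-insensitive matching is an assumption.

```python
# Non-standard terms found in the declared bot examples above.
BOT_TERMS = ["TagInspector", "HeadlessChrome", "iplabel"]


def is_declared_bot(user_agent: str, terms=BOT_TERMS) -> bool:
    """Return True if the user agent contains any bot term.

    Matching is case-insensitive here, which is an assumption.
    """
    ua = user_agent.lower()
    return any(term.lower() in ua for term in terms)


declared = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/85.0.4183.102 Safari/537.36"
)
hiding = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
)

print(is_declared_bot(declared))  # True: contains "HeadlessChrome"
print(is_declared_bot(hiding))    # False: identical to a regular browser
```

The second user agent gives the filter nothing to match on, which is why hiding bots need a different detection technique.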
Be careful with your own bots
If you are using tools to evaluate the performance of your website (page loading time, SEO audit, etc.), they probably use bots to do it. As a consequence, they generate data if they are not identified by our technology. You can:
- Check the bots we detect (see the list below).
- Check with your solution providers whether their bots have a UA pattern.
- Add these patterns to your custom bot management configuration.
Behavior of the CMP with Bots
⚙️ By default (when the "Bots" box is not ticked in the console), we consider that consent is already given for bots: all scripts are fired, along with the consent events. The banner is not deployed and doesn't collect any consent from the bot.
As long as the box is not ticked, the consent notice isn't deployed and the bot doesn't store any consent (negative consent = refusal).
If you need to collect consent for bots:
- Go to your consent notice.
- Go to the BEHAVIOR tab.
- Tick the bot box in the BOTS section. By default, it is unticked.
- Click on SAVE & PUBLISH in the top right of your screen in order to confirm your choices.
The banner is deployed for bots, but it doesn't collect any consent: the consent notice is simply displayed with the default consent string, and the bot won't be able to browse the website.
Custom bot management, bypass consent collection for bots
👉 You can customize the bot management directly with custom JSON in your SDK implementation.
The feature offers the following capabilities:
- Defining the category of bots to block
- Adding user agent patterns (terms) for exclusion purposes
All the details are available in the developer documentation.
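As a rough sketch of what such a custom JSON could look like, the snippet below builds one and prints it. The exact schema is defined in the developer documentation: the key names (`user`, `bots`, `consentRequired`, `types`, `extraUserAgents`) and the category names (`crawlers`, `performance`) are assumptions used for illustration and should be checked there. `build_bot_config` and the `MyAuditBot` pattern are hypothetical.

```python
import json


def build_bot_config(consent_required, bot_types, extra_user_agents):
    """Build a bot management config as a plain dict.

    All key names are assumptions; verify them in the developer documentation.
    """
    return {
        "user": {
            "bots": {
                # Whether consent must be collected from bots (assumed key).
                "consentRequired": consent_required,
                # Categories of bots to handle (assumed key and values).
                "types": list(bot_types),
                # Extra user agent patterns to treat as bots (assumed key).
                "extraUserAgents": list(extra_user_agents),
            }
        }
    }


config = build_bot_config(
    consent_required=False,
    bot_types=["crawlers", "performance"],
    extra_user_agents=["MyAuditBot"],  # hypothetical pattern for your own tool
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Generating the JSON from a small helper keeps the list of custom patterns in one place, which helps when you add the UA patterns of your own audit or performance tools.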
Didomi’s bot list
👉 More than 90 bots are automatically detected at the CMP level and during data cleaning processing. Below are the lists of bot patterns (terms) used to identify bot traffic. All visitors with a user agent containing one of the following terms are identified as bots.
Googlebot, adsbot, feedfetcher, mediapartners, bingbot, bingpreview, slurp, linkedin, msnbot, teoma, alexabot, exabot, facebot, facebook, twitter, yandex, baidu, duckduckbot, qwant, archive, applebot, addthis, slackbot, reddit, whatsapp, pinterest, moatbot, google-xrawler, NETVIGIE, PetalBot, PhantomJS, NativeAIBot, Cocolyzebot, SMTBot, EchoboxBot, Quora-Bot, BLP_bbot, MAZBot, ScooperBot, BublupBot, Cincraw, HeadlessChrome, diffbot, Google Web Preview, Doximity-Diffbot, Rely Bot, pingbot, cXensebot, PingdomTMS, AhrefsBot, semrush, seenaptic, netvibes, taboolabot, SimplePie, APIs-Google, Google-Read-Aloud, googleweblight, DuplexWeb-Google, Google Favicon, Storebot-Google, TagInspector, Rigor, Bazaarvoice, KlarnaBot, pageburst, naver, iplabel, plus generic terms like “robot”, “scraper”, “crawler”, “spider”, “crawling” and “oncrawl”.
Chrome-Lighthouse, gtmetrix, speedcurve, DareBoost, PTST, StatusCake_Pagespeed_Indev.
Bot management diagram
(1) SDK is loaded
(2) Notice triggering rules verification:
- SDK scans the user agent to identify if it’s a bot or not.
- If a bot is detected, the behavior of the notice is defined by the notice config (whether or not to trigger the notice).
- If the visitor is not labelled as a bot, the notice is triggered.
(3) CMP events (notice display) are triggered
(4) Data Processing (turn events into analytics)
👉 All the events (data) collected from (identified) bots are excluded from the analytics, even if the notice has been displayed to the bot on purpose.
(5) Analytics data is displayed in the dashboards
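The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SDK's real code: all names (`Event`, `is_bot`, `should_trigger_notice`, `analytics_events`) are hypothetical, the term list is a small subset of the bot list, and other notice triggering rules are ignored.

```python
from dataclasses import dataclass

# Small subset of the bot terms, lowercased for matching (illustrative only).
BOT_TERMS = ("googlebot", "headlesschrome", "crawler")


@dataclass
class Event:
    name: str
    user_agent: str


def is_bot(user_agent: str) -> bool:
    """Step 2: scan the user agent for known bot terms."""
    ua = user_agent.lower()
    return any(term in ua for term in BOT_TERMS)


def should_trigger_notice(user_agent: str, notice_for_bots: bool) -> bool:
    """Step 2: for bots, the notice config decides; other visitors get the notice."""
    if is_bot(user_agent):
        return notice_for_bots
    return True


def analytics_events(events):
    """Step 4: drop every event from an identified bot, whatever the notice config."""
    return [event for event in events if not is_bot(event.user_agent)]


events = [
    Event("notice.shown", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    Event("notice.shown", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/95.0"),
]
kept = analytics_events(events)
print(len(kept))  # 1: only the non-bot event reaches the dashboards
```

Note that the filtering in step 4 is independent of the notice decision in step 2, so showing the notice to bots on purpose does not bring their events back into the analytics.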
Bot protection tools
Some solutions specialize in bot detection and protection. They protect your website from bot traffic.
As these solutions detect bots before they reach the website (see drawing), they can prevent a bot from loading any page and therefore from impacting the CMP analytics data.
For more information, see solutions such as Datadome, Human, Cloudflare, Netacea, etc.