Author: Jamie Hoyle
Date: June 27, 2025
🧠 Summary / Purpose
This guide outlines how to configure Google Tag Manager (GTM) and Google Analytics 4 (GA4) to identify and exclude visits from specific crawler user-agents, such as MirrorWeb's. This ensures crawler traffic does not skew your analytics reporting.
🔍 Use Case / Scenario
Organizations that use crawlers (like MirrorWeb) to archive or scan their websites may see inflated or misleading metrics in GA4 if these visits are not filtered out. This process allows you to label that traffic as "crawler" and permanently exclude it from GA4 reporting.
🛠️ Step-by-Step Instructions
🔐 Prerequisites
Editor access to your GA4 property
Access and publishing rights for the GTM container that handles GA4 tagging
1. Create a Custom JavaScript Variable in GTM
In GTM, go to Variables ▸ User-defined variables ▸ New.
Select Custom JavaScript and paste the following code:
function () {
  var ua = navigator.userAgent || '';
  return /mirrorweb/i.test(ua) ? 'crawler' : undefined;
}
Name the variable:
cjs – traffic_type (crawler)
Click Save
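This function returns the string crawler only when the user-agent contains "mirrorweb" (the match is case-insensitive). For every other visitor it returns undefined, so ordinary traffic is never labeled.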
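To check the matching logic outside GTM, the same test can be written as a standalone function that takes the user-agent string as an argument. This is only a minimal sketch: the name classifyTraffic and the sample user-agent strings are hypothetical, and the exact string MirrorWeb's crawler sends should be confirmed against your own server logs.

```javascript
// Hypothetical standalone version of the GTM variable, for testing in
// Node or a browser console. It takes the user-agent as an argument
// instead of reading navigator.userAgent.
function classifyTraffic(userAgent) {
  var ua = userAgent || '';
  // Case-insensitive substring match, same pattern as the GTM variable.
  return /mirrorweb/i.test(ua) ? 'crawler' : undefined;
}

// Sample strings are invented for illustration only.
var samples = [
  'Mozilla/5.0 (compatible; MirrorWeb)',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0 Safari/537.36',
  ''
];

samples.forEach(function (ua) {
  console.log(JSON.stringify(ua), '->', classifyTraffic(ua));
});
```

Only the first sample should be labeled crawler; the other two should come back undefined.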
2. Pass the Variable to GA4 as a Parameter
Open your GA4 Configuration tag in GTM (named "Google tag" in current GTM versions)
Under Configuration Parameters, click Add Parameter:
Field / Parameter Name:
traffic_type
Value:
{{cjs – traffic_type (crawler)}}
Save and Publish the container after testing
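Conceptually, this step attaches traffic_type to the tag's configuration only when the variable produces a value. The sketch below illustrates that idea under stated assumptions: buildConfigParams is a hypothetical helper, G-XXXXXXXXXX is a placeholder measurement ID, and it assumes that a parameter with an undefined value is simply left out of the configuration.

```javascript
// Hypothetical helper: builds the parameter object that a GA4 config
// call would receive for a given user-agent.
function buildConfigParams(userAgent) {
  var params = {};
  var trafficType = /mirrorweb/i.test(userAgent || '') ? 'crawler' : undefined;
  // Attach the parameter only when a value exists, so normal visits
  // stay unlabeled.
  if (trafficType !== undefined) {
    params.traffic_type = trafficType;
  }
  return params;
}

// Invented user-agent strings, for illustration only.
console.log(buildConfigParams('Mozilla/5.0 (compatible; MirrorWeb)'));
console.log(buildConfigParams('Mozilla/5.0 (Windows NT 10.0)'));

// On a page that loads gtag.js directly (rather than through GTM),
// the result could be passed along like this:
// gtag('config', 'G-XXXXXXXXXX', buildConfigParams(navigator.userAgent));
```

The first call should yield an object containing traffic_type set to crawler, and the second an empty object.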
3. (Optional) Define a Rule in GA4 for Organization
In Google Analytics, navigate to:
Admin ▸ Data Streams ▸ Web stream ▸ Configure tag settings ▸ Show all ▸ Define internal traffic ▸ Create
Set:
Rule name: Crawler traffic
traffic_type value:
crawler
Leave IP conditions empty
Click Create
This step helps organize internal filters but is not mandatory.
4. Create a Data Filter in GA4 to Exclude Crawler Traffic
Navigate to Admin ▸ Data Filters ▸ Create Filter
Set the following:
Filter type: Internal traffic
Filter name: Exclude crawler traffic
Filter operation: Exclude
traffic_type equals:
crawler
Filter state: Start with Testing, verify results, then change to Active
Click Create
⚠️ Once the filter is active, GA4 permanently excludes traffic labeled with traffic_type=crawler. Excluded data cannot be recovered, so verify the filter in the Testing state first.
5. Validate and Monitor
Publish your GTM container
Wait for the crawler to revisit the site (MirrorWeb crawlers usually scan daily)
Use DebugView in GA4 to verify that the traffic_type=crawler parameter is being sent
Allow 24–36 hours for the exclusion filter to take full effect
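Rather than waiting for a real crawl, one option is to override the user-agent in your browser's developer tools so that it contains "mirrorweb", then inspect the outgoing GA4 request in the Network tab. The sketch below is a hypothetical helper that takes a copied request URL and lists any query parameters whose value is crawler. It assumes the parameter travels in the query string, and the sample URL, including its key name, is invented for illustration.

```javascript
// Hypothetical helper: lists the query parameters in a request URL
// whose value is exactly "crawler".
function findCrawlerParams(requestUrl) {
  var matches = [];
  new URL(requestUrl).searchParams.forEach(function (value, key) {
    if (value === 'crawler') {
      matches.push(key);
    }
  });
  return matches;
}

// Invented example URL; the real host and key name may differ.
var sample = 'https://example.com/g/collect?v=2&en=page_view&ep.traffic_type=crawler';
console.log(findCrawlerParams(sample));
```

An empty result for a request made with the overridden user-agent suggests the variable or the tag parameter is not wired up as intended.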