The hidden tricks websites use to catch scraping bots

A deep dive into the fingerprints, leaks, and behavioral signals websites use to unmask automation.

Aug 19, 2025

Websites today don’t just look at what requests are coming in — they look at how the browser behaves. Traditional signals like IP addresses or cookies are no longer enough to separate humans from automation. Instead, detection systems combine low-level browser fingerprints with behavioral analysis to build a unique profile of each visitor.

These methods rely on subtle differences in how a browser renders graphics, handles audio, reports system settings, or even how a user moves their mouse and types on a keyboard. For scraping bots, it’s no longer just about fetching data — it’s about imitating normal browsing behavior without leaving signs of automation.

Browser automation detection methods

Canvas 2D fingerprint

The Canvas API, which is designed for drawing graphics via JavaScript and HTML, can also be used for online tracking via browser fingerprinting. This technique relies on variations in how canvas images are rendered on different web browsers and platforms to create a personalized digital fingerprint of a user's browser.

The way an image is rendered on a canvas can vary based on the web browser, operating system, graphics card, and other factors, resulting in a unique image that can be used to create a fingerprint. The way that text is rendered on a canvas can also vary based on the font rendering settings and anti-aliasing algorithms used by different web browsers and operating systems.
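
A minimal sketch of the idea, assuming a browser `document` is passed in. The drawing commands, the test string, and the FNV-1a hash are illustrative choices, not what any particular vendor ships.

```javascript
// FNV-1a 32-bit hash over a string; any stable hash would work here.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Draws a shape and some text, then hashes the encoded image.
// `doc` is the page's document object.
function canvasFingerprint(doc) {
  const canvas = doc.createElement('canvas');
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext('2d');
  if (!ctx) return null; // canvas blocked or unavailable
  ctx.textBaseline = 'top';
  ctx.font = '16px Arial';
  ctx.fillStyle = '#f60';
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = '#069';
  ctx.fillText('Cwm fjordbank glyphs vext quiz', 4, 20);
  // Anti-aliasing and font rasterization differences end up in these bytes.
  return fnv1a(canvas.toDataURL());
}
```

In a page this would be called as `canvasFingerprint(document)`. Two machines with different GPUs, drivers, or font stacks will usually produce different hashes for the same drawing commands.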

WebGL fingerprint

WebGL (Web Graphics Library) fingerprinting involves creating a unique identifier based on the rendering characteristics and capabilities of a device’s graphics hardware using the WebGL API. This fingerprint can be used to track users across different websites and sessions.

The entire WebGL fingerprinting process starts with a rendering test and ends with hashing and identification. Here's the typical WebGL fingerprinting workflow:

Rendering test: The browser loads the WebGL fingerprinting script from the website and creates the WebGL context within a hidden HTML Canvas. The script then prompts the browser to produce a specific 2D or 3D graphic on the Canvas, with a scene capturing subtle details in rendering behavior.
Reading pixel data: The next step is to extract pixel data from the rendered output. This includes capturing shader and texture information to analyze pixel color, image patterns, and shape gradients.
WebGL parameter collection: The fingerprinting script queries various WebGL parameters generated throughout the rendering pipeline. These include supported extensions, maximum texture size, shading language version, vendor and renderer strings, and more.
Fingerprint hashing: The generated parameters are combined and hashed into a unique value, making the fingerprint transferable to a server.
Fingerprint identification: The hashed fingerprint is sent to a target website's server, which creates a unique identity for the incoming request. This way, the server can recognize the browser on subsequent visits.
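
The parameter collection step can be sketched as follows. This assumes `gl` is a WebGL context obtained from `canvas.getContext('webgl')`; the function names `collectWebGLParams` and `serializeParams` are made up for this example.

```javascript
// Collects the parameters named in the workflow from a WebGL context.
function collectWebGLParams(gl) {
  const params = {
    vendor: gl.getParameter(gl.VENDOR),
    renderer: gl.getParameter(gl.RENDERER),
    shadingLanguageVersion: gl.getParameter(gl.SHADING_LANGUAGE_VERSION),
    maxTextureSize: gl.getParameter(gl.MAX_TEXTURE_SIZE),
    extensions: (gl.getSupportedExtensions() || []).slice().sort(),
  };
  // The unmasked strings are only exposed through this extension,
  // and a browser may decline to provide it.
  const dbg = gl.getExtension('WEBGL_debug_renderer_info');
  if (dbg) {
    params.unmaskedVendor = gl.getParameter(dbg.UNMASKED_VENDOR_WEBGL);
    params.unmaskedRenderer = gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL);
  }
  return params;
}

// Serializes with sorted keys so equal inputs give equal strings,
// ready to be fed into whatever hash the script uses.
function serializeParams(params) {
  return Object.keys(params)
    .sort()
    .map((key) => `${key}=${JSON.stringify(params[key])}`)
    .join(';');
}
```

Sorting the extension list and the keys keeps the serialized string stable, so only genuine differences in hardware or drivers change the resulting hash.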

GPU fingerprint mismatch

Bots may spoof WebGL vendor/renderer strings, but they often:

Forget to spoof the related shader behavior
Report rendering performance values that are inconsistent with the claimed GPU
Skip GPU-intensive tests that run slowly or fail in headless environments
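
As a rough illustration of a consistency check, the sketch below compares a renderer string with the user agent. The rules are hypothetical heuristics written for this example; real systems use far larger rule sets plus rendering benchmarks.

```javascript
// Returns reasons why a renderer string looks inconsistent with the
// claimed platform. The patterns are illustrative assumptions only.
function gpuMismatchReasons(userAgent, renderer) {
  const reasons = [];
  const r = renderer.toLowerCase();
  // Software rasterizers are common in headless and virtualized setups.
  if (/swiftshader|llvmpipe|software/.test(r)) {
    reasons.push('software renderer');
  }
  const isMacUa = /Macintosh/.test(userAgent);
  const isWindowsUa = /Windows NT/.test(userAgent);
  // Direct3D backends only exist on Windows.
  if (isMacUa && /direct3d/.test(r)) {
    reasons.push('Direct3D renderer on a macOS user agent');
  }
  if (isWindowsUa && /apple/.test(r)) {
    reasons.push('Apple GPU on a Windows user agent');
  }
  return reasons;
}
```

A spoofed user agent that claims macOS while the renderer string still mentions Direct3D would be caught by the second rule.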

Audio fingerprint

Audio fingerprinting is a browser-based tracking technique that leverages the subtle differences in how each device processes sound to generate a unique identifier. Unlike cookies or IP-based tracking, it does not rely on stored client data. Instead, it exploits the Web Audio API, specifically the AudioContext interface, which allows JavaScript to synthesize, process, and analyze sound in the browser.

Here’s how it works:

A script generates an inaudible sound or waveform using the OscillatorNode and GainNode.
The output is processed by an audio graph, often through filters or AnalyserNodes.
The resulting values (e.g., from getChannelData or getFloatFrequencyData) are hashed into a fingerprint.
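
The steps above can be sketched with an `OfflineAudioContext`, which renders audio to a buffer without playing anything. The oscillator settings, the sample window, and the helper name `summarizeSamples` are assumptions for this example; production scripts often add more nodes and then hash the result.

```javascript
// Reduces rendered samples to one number by summing absolute values
// over a window. The window bounds are arbitrary illustrative defaults.
function summarizeSamples(samples, start = 4500, end = 5000) {
  let sum = 0;
  for (let i = start; i < end && i < samples.length; i++) {
    sum += Math.abs(samples[i]);
  }
  return sum;
}

// Browser-only: renders one second of a triangle wave through a gain node.
async function audioFingerprint() {
  const ctx = new OfflineAudioContext(1, 44100, 44100);
  const osc = ctx.createOscillator();
  osc.type = 'triangle';
  osc.frequency.value = 10000;
  const gain = ctx.createGain();
  gain.gain.value = 0.5;
  osc.connect(gain);
  gain.connect(ctx.destination);
  osc.start(0);
  const buffer = await ctx.startRendering();
  // Tiny floating-point differences between audio stacks show up here.
  return summarizeSamples(buffer.getChannelData(0));
}
```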

More info:

Audio Fingerprinting: Browser-Based Device Tracking Method

WebRTC IP leak

WebRTC is a set of JavaScript APIs that establish a peer-to-peer connection between two browsers to exchange data such as audio and video, which makes it possible to build applications with audio and video calling features.

What makes WebRTC special is that once a connection is established, data can be transmitted directly between browsers in real time without touching the server. By bypassing the server, we reduce latency since the data doesn’t have to go to the server first. This makes WebRTC great for exchanging audio and video.

Any two devices talking to each other directly via WebRTC, however, need to know each other’s real IP addresses. This allows a third-party website to exploit the WebRTC in your browser to detect your real IP address and use it to identify you. This is what we call a WebRTC leak.
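
A minimal sketch of how a page could read addresses from ICE candidates. The STUN URL is a placeholder, and `parseIceCandidate` and `collectCandidateAddresses` are names invented for this example. Note that current browsers often replace local addresses with mDNS names, so the public (`srflx`) candidate is usually the interesting one.

```javascript
// Parses an ICE candidate line such as:
// "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx ..."
function parseIceCandidate(candidate) {
  const parts = candidate.trim().split(/\s+/);
  const typIndex = parts.indexOf('typ');
  if (parts.length < 8 || typIndex === -1) return null;
  return {
    address: parts[4],
    port: Number(parts[5]),
    type: parts[typIndex + 1], // host, srflx, relay, ...
  };
}

// Browser-only: gathers candidates and reports each parsed address.
function collectCandidateAddresses(onAddress) {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.example.org:3478' }], // placeholder server
  });
  pc.createDataChannel('probe'); // something to negotiate
  pc.onicecandidate = (event) => {
    if (!event.candidate) return; // gathering finished
    const parsed = parseIceCandidate(event.candidate.candidate);
    if (parsed) onAddress(parsed);
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
  return pc;
}
```

If the address reported by a `srflx` candidate differs from the IP the HTTP request came from, the visitor is probably behind a proxy that does not cover WebRTC traffic.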

Fonts fingerprinting

Font fingerprinting techniques rely on measuring the dimensions of HTML elements filled with text or single Unicode glyphs. However, font rendering in web browsers can be affected by multiple factors, leading to subtle differences in these measurements.

Font metrics measurement is a brute force method that tries different fonts from a large dictionary of known typefaces. By comparing the size of the rendered element to the default values, this method can determine if a font is present on the system.

The Unicode glyph measurement technique renders special Unicode characters at a large font size using the default font family, then hashes the measured dimensions into a fingerprint.
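
The font metrics approach can be sketched like this. The measuring function is injected so the comparison logic stays separate from the DOM; `detectFonts`, `makeCanvasMeasurer`, the test string, and the font size are all assumptions for this example.

```javascript
// A font counts as installed when text set in it measures differently
// from the same text set in the fallback family alone.
function detectFonts(measure, candidates, fallback = 'monospace') {
  const baseline = measure(fallback);
  return candidates.filter(
    (font) => measure(`"${font}", ${fallback}`) !== baseline
  );
}

// Browser-only helper: measures text width with a canvas 2D context.
function makeCanvasMeasurer(doc, text = 'mmmmmmmmmmlli', size = '72px') {
  const ctx = doc.createElement('canvas').getContext('2d');
  return (fontFamily) => {
    ctx.font = `${size} ${fontFamily}`;
    return ctx.measureText(text).width;
  };
}
```

In a page, `detectFonts(makeCanvasMeasurer(document), ['Arial', 'Calibri', 'Helvetica Neue'])` would return the subset of those fonts that changed the measured width. A font whose metrics happen to match the fallback is missed, which is why real scripts test against several fallback families.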

Plugin detection

Browser plugins—such as Flash, Java, or PDF viewers—can be queried via JavaScript using navigator.plugins or navigator.mimeTypes. The presence, order, and metadata of these plugins vary across operating systems, browsers, and user settings, making them useful as part of a browser fingerprint.

Even though modern browsers have phased out many traditional plugins, this method still contributes to the uniqueness of a browser environment and is often used in combination with other fingerprinting techniques.

More info:

JavaScript Browser Information

Media devices detection

JavaScript APIs such as navigator.mediaDevices.enumerateDevices() can reveal detailed information about connected audio and video hardware, including:

Number and types of microphones, webcams, and audio outputs
Device IDs (pseudonymous but persistent)
Labels (if media permissions were previously granted)

These attributes vary significantly across systems and can be used to fingerprint a user.

More info:

MediaDevices: enumerateDevices() method - Web APIs | MDN

WebDriver flag detection

Headless browsers or automation frameworks like Selenium or Playwright often expose telltale signs. One of the most commonly checked is the navigator.webdriver flag:

console.log(navigator.webdriver); // true if under automation

Other anti-bot systems check for:

Presence of automation-specific properties (e.g., window.__nightmare, window.domAutomation)
Overridden or missing native functions (e.g., toString, permissions.query)
Unusual behavior of navigator.languages, navigator.plugins, and screen properties
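
The checks above can be combined into a small probe. This is a sketch under the assumption that the page's `window` is passed in; the signal names are invented here, and the native-function test is easy to defeat, so treat it as an illustration of the idea only.

```javascript
// True when the function's source text looks like a browser built-in.
function isNativeFunction(fn) {
  return (
    typeof fn === 'function' &&
    /\{\s*\[native code\]\s*\}/.test(Function.prototype.toString.call(fn))
  );
}

// Collects simple automation hints from a window-like object.
function automationSignals(win) {
  const nav = win.navigator || {};
  const signals = [];
  if (nav.webdriver === true) signals.push('navigator.webdriver');
  if ('__nightmare' in win) signals.push('window.__nightmare');
  if ('domAutomation' in win) signals.push('window.domAutomation');
  if (Array.isArray(nav.languages) && nav.languages.length === 0) {
    signals.push('empty navigator.languages');
  }
  // Stealth plugins often replace permissions.query with a JS wrapper.
  if (nav.permissions && !isNativeFunction(nav.permissions.query)) {
    signals.push('patched permissions.query');
  }
  return signals;
}
```

Calling `automationSignals(window)` in a regular browser session should return an empty array.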

More info:

It is *not* possible to detect and block Chrome headless

Profile-Based browsing vs Incognito detection

Most anti-bot systems—including Google reCAPTCHA v3—assign higher trust to sessions that exhibit “aged” user profiles. Launching a browser with a clean slate each time (i.e., no persistent storage) is a strong signal of automation. Using a userDataDir mimics real users better by maintaining state across sessions.

A userDataDir persists browser state between sessions. This includes:

Cookies and session storage
Local storage
Installed extensions
Login sessions (e.g., Google, Facebook)
Cache, autofill, and browsing history

More info:

BrowserType | Playwright

Timezone & locale fingerprinting with proxies

Modern anti-bot systems often cross-check IP-based geolocation (from your proxy) against browser-level locale settings to detect inconsistencies that indicate automation.

Key Fingerprint Signals:

Intl.DateTimeFormat().resolvedOptions().timeZone: Reveals the browser's configured timezone (e.g., "America/New_York")
navigator.language & navigator.languages: Report the browser’s primary and preferred languages (e.g., "en-US")
new Date().getTimezoneOffset(): Indicates local time difference from UTC (in minutes)
Accept-Language HTTP header: Often used to validate browser language settings against UI behavior

Detection Example:

If your proxy IP is from Germany, but your browser reports:

Timezone: America/Los_Angeles
Language: en-US
Offset: UTC-8

…you’ll likely be flagged due to geolocation mismatch.
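
A sketch of that cross-check. The `EXPECTED` table is a tiny hypothetical lookup written for this example; a real system would use a full geolocation database, and would treat a language mismatch as a weaker hint than a timezone mismatch, since plenty of people browse in a language other than their country's.

```javascript
// Hypothetical lookup: country code from IP geolocation mapped to
// plausible timezones and primary languages.
const EXPECTED = {
  DE: { timeZones: ['Europe/Berlin', 'Europe/Busingen'], languages: ['de'] },
  US: {
    timeZones: ['America/New_York', 'America/Chicago', 'America/Denver', 'America/Los_Angeles'],
    languages: ['en', 'es'],
  },
};

// Reads the client-side signals listed above.
function readLocaleSignals() {
  const nav = typeof navigator !== 'undefined' ? navigator : {};
  return {
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    language: nav.language || null,
    offsetMinutes: new Date().getTimezoneOffset(),
  };
}

// Lists which signals disagree with the country implied by the IP.
function localeMismatches(ipCountry, signals) {
  const expected = EXPECTED[ipCountry];
  if (!expected) return []; // unknown country: nothing to compare
  const mismatches = [];
  if (!expected.timeZones.includes(signals.timeZone)) mismatches.push('timezone');
  const primary = (signals.language || '').split('-')[0].toLowerCase();
  if (primary && !expected.languages.includes(primary)) mismatches.push('language');
  return mismatches;
}
```

For the German proxy in the example, `localeMismatches('DE', { timeZone: 'America/Los_Angeles', language: 'en-US' })` reports both the timezone and the language.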

Storage probing

Storage probing is a technique used to detect or fingerprint browsers based on the availability, size, and behavior of different client-side storage mechanisms. This method leverages inconsistencies or limitations in storage APIs across browsers, devices, and privacy modes.

The technique typically checks for:

LocalStorage and SessionStorage availability and quota
IndexedDB existence and performance
WebSQL support (deprecated but still used for detection)
Storage quota and usage reported by navigator.storage.estimate() (the StorageManager API)

These APIs may behave differently in incognito or automation environments. For example, some headless contexts may report reduced storage quotas or fail to initialize IndexedDB, which can reveal the presence of automation.
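
A minimal probing sketch, assuming a `window`-like object is passed in. The 200 MB threshold in `quotaLooksRestricted` is an arbitrary value chosen for illustration; actual quotas depend on the browser, disk size, and browsing mode.

```javascript
// Checks which storage mechanisms respond and what quota is reported.
async function probeStorage(win) {
  const report = {
    localStorage: false,
    indexedDB: typeof win.indexedDB !== 'undefined',
    quotaBytes: null,
  };
  try {
    win.localStorage.setItem('__probe', '1');
    win.localStorage.removeItem('__probe');
    report.localStorage = true;
  } catch (err) {
    // Storage is blocked, full, or absent in this context.
  }
  const storage = win.navigator && win.navigator.storage;
  if (storage && typeof storage.estimate === 'function') {
    const estimate = await storage.estimate();
    report.quotaBytes = estimate.quota;
  }
  return report;
}

// Hypothetical heuristic: a very small quota hints at a private or
// sandboxed profile. The threshold is an assumption, not a standard.
function quotaLooksRestricted(quotaBytes, thresholdBytes = 200 * 1024 * 1024) {
  return quotaBytes !== null && quotaBytes < thresholdBytes;
}
```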

`crypto.subtle` fingerprint

The window.crypto.subtle API, part of the Web Cryptography API, is intended for performing low-level cryptographic operations like hashing, encryption, and key generation. However, its behavior can also be used to detect automation or identify inconsistencies in the execution environment.

Fingerprinting techniques exploit:

Timing discrepancies in hashing operations (e.g., subtle differences in how long SHA-256 takes)
Feature availability in different browsers or automation environments
Promise resolution behavior and stack traces in headless or stealth contexts
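
A timing probe could look roughly like the sketch below. The round count and payload size are arbitrary assumptions, and a single median says little on its own; a detector would compare it against what the claimed device normally produces.

```javascript
// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Times repeated SHA-256 digests of a fixed-size buffer.
// `subtle` is expected to be window.crypto.subtle.
async function sha256Timing(subtle, rounds = 20, sizeBytes = 1 << 20) {
  const data = new Uint8Array(sizeBytes);
  const samples = [];
  for (let i = 0; i < rounds; i++) {
    const start = performance.now();
    await subtle.digest('SHA-256', data);
    samples.push(performance.now() - start);
  }
  return { medianMs: median(samples), samples };
}
```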

Window & Screen dimension mismatches

Bots often run in non-standard resolutions or minimized/hidden windows.

Detection includes:

window.outerHeight vs innerHeight mismatch
screen.width inconsistent with user-agent (e.g. mobile UA with desktop screen)
window.screen.availWidth = 0 (common in virtual/remote sessions)
Missing or unrealistic values for DPI and pixel ratio
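
These checks translate into a few comparisons. The sketch assumes a `window`-like object; the 1400-pixel cutoff for a "desktop-sized" screen and the user-agent pattern are rough assumptions for illustration.

```javascript
// Lists dimension anomalies for a window-like object.
function screenAnomalies(win) {
  const issues = [];
  const ua = (win.navigator && win.navigator.userAgent) || '';
  if (win.outerWidth === 0 || win.outerHeight === 0) {
    issues.push('zero outer window size');
  }
  if (win.outerHeight < win.innerHeight) {
    issues.push('outerHeight smaller than innerHeight');
  }
  if (win.screen.availWidth === 0) {
    issues.push('screen.availWidth is 0');
  }
  if (!(win.devicePixelRatio > 0)) {
    issues.push('missing or invalid devicePixelRatio');
  }
  // Assumed cutoff: phones do not report screens this wide.
  if (/Mobi|Android|iPhone/.test(ua) && win.screen.width > 1400) {
    issues.push('mobile user agent with desktop-sized screen');
  }
  return issues;
}
```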

More info:

Screen Resolution Stats Worldwide | Statcounter Global Stats

Touch Capabilities and Pointer Events

Mobile users typically have touch input, and automation often fails to emulate this correctly.

Detection includes:

ontouchstart in window
navigator.maxTouchPoints = 0 on mobile UA
No PointerEvent or TouchEvent support
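
The same signals as a sketch, again taking a `window`-like object. The user-agent pattern is a simplification made for this example.

```javascript
// Returns a reason string when a mobile user agent lacks touch support,
// or null when nothing looks off.
function touchMismatch(win) {
  const nav = win.navigator;
  const mobileUa = /Mobi|Android|iPhone|iPad/.test(nav.userAgent);
  if (!mobileUa) return null;
  const hasTouch = 'ontouchstart' in win || nav.maxTouchPoints > 0;
  if (!hasTouch) return 'mobile user agent without touch support';
  const hasEvents =
    typeof win.PointerEvent !== 'undefined' ||
    typeof win.TouchEvent !== 'undefined';
  if (!hasEvents) return 'mobile user agent without PointerEvent or TouchEvent';
  return null;
}
```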

Battery API & Hardware APIs

Some headless or emulated environments don’t support APIs like:

navigator.getBattery()
navigator.deviceMemory
navigator.connection

Inconsistent values (e.g., 0 GB RAM, no battery, no network info) can reveal non-standard environments.

Unusual Behavior in Web Animations & Timers

Headless browsers sometimes throttle requestAnimationFrame, setTimeout, or CSS transitions.

Detection may involve:

Measuring animation jitter
Timing resolution anomalies
Long delays in event execution
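
One way to measure this is to record frame timestamps and look at the spread of the gaps between them. In the sketch below the scheduler is injected, so the same code works with `requestAnimationFrame` in a page; the function names are invented for this example.

```javascript
// Mean, standard deviation, and maximum of gaps between timestamps.
function intervalStats(timestamps) {
  const gaps = [];
  for (let i = 1; i < timestamps.length; i++) {
    gaps.push(timestamps[i] - timestamps[i - 1]);
  }
  if (gaps.length === 0) return { mean: 0, stdDev: 0, max: 0 };
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  const variance =
    gaps.reduce((acc, g) => acc + (g - mean) ** 2, 0) / gaps.length;
  return { mean, stdDev: Math.sqrt(variance), max: Math.max(...gaps) };
}

// Collects `count` frame timestamps via `raf`, then reports the stats.
function sampleFrames(raf, count, done) {
  const stamps = [];
  const tick = (time) => {
    stamps.push(time);
    if (stamps.length < count) raf(tick);
    else done(intervalStats(stamps));
  };
  raf(tick);
}
```

In a browser this could be run as `sampleFrames((cb) => requestAnimationFrame(cb), 60, console.log)`. A visible tab on a 60 Hz display tends to show a mean near 16.7 ms, while throttled or headless contexts often drift far from that.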

Human-like Interaction Detection

Beyond static fingerprinting, many advanced anti-bot systems now evaluate how a user behaves on a page. Instead of just checking browser attributes, they monitor real-time interactions to determine if a visitor is human or scripted. These behavioral signals are harder to spoof and are often used in tandem with fingerprinting to boost detection accuracy.

Mouse movements

Human cursor movement tends to be nonlinear, imprecise, and often exhibits slight jitter and hesitation. Bots, in contrast, often move the mouse in straight lines, jump directly to coordinates, or follow perfectly smooth paths.

Detection techniques include:

Velocity and acceleration profiles (bots often move with a fixed speed)
Hover behavior over interactive elements
Path curvature and entropy (low entropy = likely automation)
Movement granularity and frequency of mousemove events
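
Two of these features are easy to compute from recorded `mousemove` samples. The sketch assumes each point has `x`, `y`, and a timestamp `t` in milliseconds; "straightness" and "speedCv" are names chosen for this example, and any threshold applied to them would be a tuning decision.

```javascript
// Computes path straightness (1 = perfectly straight) and the
// coefficient of variation of speed (0 = constant speed).
function pathStats(points) {
  let length = 0;
  const speeds = [];
  for (let i = 1; i < points.length; i++) {
    const step = Math.hypot(
      points[i].x - points[i - 1].x,
      points[i].y - points[i - 1].y
    );
    const dt = points[i].t - points[i - 1].t;
    length += step;
    if (dt > 0) speeds.push(step / dt);
  }
  const first = points[0];
  const last = points[points.length - 1];
  const direct =
    points.length > 1 ? Math.hypot(last.x - first.x, last.y - first.y) : 0;
  const n = speeds.length || 1;
  const mean = speeds.reduce((a, b) => a + b, 0) / n;
  const variance = speeds.reduce((acc, s) => acc + (s - mean) ** 2, 0) / n;
  return {
    straightness: length > 0 ? direct / length : 1,
    speedCv: mean > 0 ? Math.sqrt(variance) / mean : 0,
  };
}
```

A scripted move along a straight line at fixed speed yields a straightness of 1 and a speed variation of 0, a combination that human movement rarely produces.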

Typing patterns

Humans type with natural variations — including pauses, mistakes, and corrections. Bots, however, tend to inject characters instantly, with no delay or typos.

Detection patterns:

Keypress timing (keystroke dynamics): Measured via keydown, keypress, and keyup event intervals
Use of backspace and delete: Humans often correct mistakes
Typing latency and total input duration
Copy-paste detection via paste events

Some systems even use keystroke cadence as a biometric signature.
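
A rough sketch of the timing features, assuming keyboard events were recorded as objects with `type`, `key`, and a millisecond timestamp `t`. The field names and the function `keystrokeStats` are assumptions for this example.

```javascript
// Summarizes inter-key timing and corrections from recorded events.
function keystrokeStats(events) {
  const downs = events.filter((e) => e.type === 'keydown');
  const gaps = [];
  for (let i = 1; i < downs.length; i++) {
    gaps.push(downs[i].t - downs[i - 1].t);
  }
  const n = gaps.length || 1;
  const meanGapMs = gaps.reduce((a, b) => a + b, 0) / n;
  const variance =
    gaps.reduce((acc, g) => acc + (g - meanGapMs) ** 2, 0) / n;
  const corrections = downs.filter(
    (e) => e.key === 'Backspace' || e.key === 'Delete'
  ).length;
  return { meanGapMs, stdDevMs: Math.sqrt(variance), corrections };
}
```

Input injected by a script tends to show a tiny mean gap with almost no variance and no corrections, whereas human typing spreads over tens to hundreds of milliseconds per key.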

Scrolling behavior

Scroll events can reveal whether a visitor is manually exploring the page or scrolling in a scripted or robotic manner.

Human scroll characteristics:

Inertial scrolling with deceleration
Mouse wheel vs trackpad deltas
Scroll pauses and random interruptions
Scroll direction changes and overshooting

Bots often scroll to the bottom instantly, at fixed intervals, or in uniform chunks.
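
Uniform chunks are simple to spot from sampled scroll positions. The sketch below assumes an array of `{ y, t }` samples taken on each `scroll` event; `scrollProfile` and its output fields are names invented for this example.

```javascript
// Profiles scroll steps: how many there were, what share of them
// used the single most common delta, and how often direction flipped.
function scrollProfile(samples) {
  const deltas = [];
  for (let i = 1; i < samples.length; i++) {
    deltas.push(samples[i].y - samples[i - 1].y);
  }
  const counts = new Map();
  for (const d of deltas) counts.set(d, (counts.get(d) || 0) + 1);
  const mostCommon = deltas.length ? Math.max(...counts.values()) : 0;
  return {
    steps: deltas.length,
    uniformShare: deltas.length ? mostCommon / deltas.length : 0,
    directionChanges: deltas.filter(
      (d, i) => i > 0 && d * deltas[i - 1] < 0
    ).length,
  };
}
```

A script that scrolls 100 pixels at a time ends up with a `uniformShare` of 1 and no direction changes; inertial trackpad scrolling produces many distinct deltas.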

Form submission timing

Human users typically take a few seconds (or more) to fill out a form. Bots, however, often complete all fields and submit within milliseconds — sometimes without triggering focus or input events.

Common form submission signals:

Time between form load and submit
Field focus/blur event sequences
Use of autofill or programmatic .value assignment
Absence of typing or change events

Detection logic may flag forms submitted without interaction or with identical submission timing across sessions.
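
A sketch of how those signals might be evaluated once a page has logged form events. The `session` shape and the 2000 ms minimum are assumptions for this example; in a real page, note that `focus` does not bubble, so it would be captured per field or via `focusin`.

```javascript
// Flags suspicious form submissions from a recorded session:
// { loadedAt, submittedAt, events: [{ type }] }, times in milliseconds.
function formSignals(session, minFillMs = 2000) {
  const seen = new Set(session.events.map((e) => e.type));
  const signals = [];
  if (session.submittedAt - session.loadedAt < minFillMs) {
    signals.push('submitted too fast');
  }
  if (!seen.has('focus')) {
    signals.push('no focus events');
  }
  // Programmatic .value assignment fires neither of these.
  if (!seen.has('keydown') && !seen.has('input')) {
    signals.push('no typing or input events');
  }
  return signals;
}
```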

Focus and visibility detection

Many bots run in background tabs or off-screen.

Detection systems can check for:

document.visibilityState !== 'visible'
document.hasFocus() = false
Lack of expected focus / blur event sequence

Browser automation with anti-detection capabilities

As detection techniques grow more advanced, automation tools have evolved to stay under the radar. Simply launching a headless browser is no longer enough — effective scraping now requires frameworks and plugins that simulate human behavior, spoof system attributes, and avoid common fingerprinting traps.

This section outlines the most widely used browser automation frameworks, scraping libraries, and stealth-enhancing tools, along with benchmarks and notes on their detection profiles. Whether you're working with Playwright, Puppeteer, or hybrid solutions like Camoufox or Botasaurus, the tools listed below represent the current landscape of stealth automation.

Benchmark: GitHub - techinz/browsers-benchmark: Browser automation engine benchmark - Test bypass rates, performance & stealth against Cloudflare, DataDome, reCAPTCHA and other bot detection systems

Automation frameworks:

playwright: Fast and reliable end-to-end testing for modern web apps | Playwright
- Multi-language library to control Chromium, Firefox, and WebKit
puppeteer: Puppeteer | Puppeteer
- JavaScript library that provides a high-level API to control Chrome
selenium: Selenium
- The long-established standard library. Its default setup is easy for anti-bot systems to flag, so it is a poor fit for stealth scraping.

Scraping frameworks

botasaurus: GitHub - omkarcloud/botasaurus: The All in One Framework to Build Undefeatable Scrapers
- Undetectable scraping framework with Human-like interaction that bypasses bot detections.
scrapy: Scrapy
- An application framework for crawling websites and extracting structured data without the need for a browser

Playwright + plugin options

patchright: GitHub - Kaliiiiiiiiii-Vinyzu/patchright: Undetected version of the Playwright testing and automation library.
- Works well in JS with Playwright when using Chrome as the browser (instead of Chromium)
- Leaks the WebRTC IP; this can be worked around with: improvement: prevent webrtc ip leak · Issue #47 · Kaliiiiiiiiii-Vinyzu/patchright-python
tf-playwright-stealth: GitHub - tinyfish-io/tf-playwright-stealth: A fork of https://github.com/AtuboDad/playwright_stealth

Browser compiled from source with stealth patches

Camoufox: Introduction | Camoufox
- Based on Firefox, it only works with Python.
- Has a humanize feature to make the mouse cursor movement look like a human.
- There is a JS port, but it does not work properly yet (July 2025): GitHub - apify/camoufox-js: Experimental Camoufox JS port.
BotBrowser: GitHub - botswin/BotBrowser: 🤖 Bypasses Cloudflare, Shape, PerimeterX, Datadome, Akamai, Kasada, hCaptcha, FunCaptcha and reCAPTCHA with unmatched reliability - powered by a modified Chromium core
- Requires a license. I contacted the author, and this was the response: "Hello thanks for interested in BotBrowser, we charge for licenses. The starter license is $299 and includes 30 profiles, the pro license is $499 and includes 100 profiles. The ent license is customized based on specific requirements. The profiles can run on any devices without restrictions. (lifetime usage), no instances limited, no windows limited. Payment can be usdt via erc20 or trc20"

Test how detectable your browser really is

Browser Bot Detection Software | Fingerprint

My Fingerprint- Am I Unique ?

Browserleaks - Check your browser for privacy leaks: a suite of tools that offers a range of tests to evaluate the security and privacy of your web browser
IP/DNS Detect: check your IP address for WebRTC leaks

BrowserScan - Robot Detection/WebDriver

Bot / Headless Chrome Detection Tests: This page attempts to detect if you are a Bot or Not.
CreepJS: Creepy device and browser fingerprinting
Know Your Visitors | OverpoweredJS: bot detection tool
detectIncognito - JavaScript Private Browsing Detection : detect if you are using incognito mode

What this means for bots and defenders

The detection methods outlined here — Canvas, WebGL, audio, fonts, plugins, media devices, and beyond — show how websites collect small signals and combine them into a unique profile. Modern anti-bot systems don’t rely on a single indicator but on consistency across dozens of parameters.

For scraping bots, this means the challenge goes far beyond rotating proxies or tweaking headers. Effective automation has to mimic a full browsing environment, maintain persistent state, and reproduce human interaction patterns to avoid being flagged.

For defenders, these techniques highlight why combining fingerprinting with behavioral analysis provides stronger protection against automated traffic. The focus of detection has already shifted from static browser properties to real-time user behavior — and that’s where the next phase of the bot vs. detection arms race will take place.

Franco Morero
