security, ai-security, web-scraping-defense, bot-detection, data-protection, deception-technology

Miasma: A Tool to Trap AI Web Scrapers

Miasma detects AI web scrapers and traps them by serving deceptive content that is expensive to crawl. The goal is to make unauthorized data extraction costly and unrewarding, protecting your website's content and intellectual property from malicious AI agents.

intermediate, 1 hour, 5 steps
The play
  1. Identify AI Scraper Signatures
    Analyze incoming traffic for patterns that indicate AI scrapers. Look for headless-browser user agents (e.g., 'HeadlessChrome', 'Puppeteer'), non-human behavior (rapid page access, no interaction with UI elements), and requests for paths that 'robots.txt' disallows. User-agent strings are easy to spoof, so treat them as one signal among several.
  2. Implement HTML Honeypots
    Embed links or form fields in your HTML that are styled so human users never see them. A client that follows such a link or fills in such a field is very likely an automated scraper.
  3. Deploy JavaScript Challenges
    Integrate client-side JavaScript challenges, such as CAPTCHAs, proof-of-work puzzles, or complex DOM manipulation checks. Basic HTTP clients cannot run JavaScript at all, and headless browsers can pass these checks only at significant computational cost per page.
  4. Generate Deceptive Content (The 'Poison Pit')
    Once an AI scraper is detected, serve it dynamically generated, low-value content or create 'infinite loops' through pagination or recursive links. This wastes the scraper's resources and bandwidth, providing no valuable data.
  5. Block or Rate-Limit Scrapers
    Based on detection, implement server-side rules to block the scraper's IP address, serve a CAPTCHA, or severely rate-limit its access to protect your valuable content and server resources.
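
Steps 1, 2, and 5 can be combined into one per-request decision. The sketch below is framework-independent and uses only the standard library. The names 'classify' and 'SlidingWindowLimiter', the honeypot path, and the thresholds are all assumptions chosen for illustration, not part of Miasma itself.

```python
import time
from collections import defaultdict, deque

# Assumed values for illustration; tune them against real traffic.
SCRAPER_UA_TOKENS = ("headlesschrome", "puppeteer")
HONEYPOT_PATHS = {"/private-archive/"}  # hypothetical target of a hidden link
MAX_REQUESTS = 5                        # requests allowed per window
WINDOW_SECONDS = 10.0


class SlidingWindowLimiter:
    """Tracks recent request timestamps per client IP."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)

    def over_limit(self, ip, now=None):
        """Record one request and report whether the IP exceeded the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_requests


def classify(ip, user_agent, path, limiter, now=None):
    """Return 'trap', 'throttle', or 'allow' for a single request."""
    ua = user_agent.lower()
    if path in HONEYPOT_PATHS:
        return "trap"      # step 2: the client followed a hidden link
    if any(token in ua for token in SCRAPER_UA_TOKENS):
        return "trap"      # step 1: known automation signature
    if limiter.over_limit(ip, now):
        return "throttle"  # step 5: too many requests too quickly
    return "allow"
```

A request handler could call 'classify' first and then serve the poison pit for 'trap', a CAPTCHA or an HTTP 429 response for 'throttle', and the real page for 'allow'. The limiter keeps its state in process memory, so a multi-process deployment would need a shared store instead.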
Starter code
from flask import Flask, request, Response

app = Flask(__name__)

@app.route('/')
def index():
    user_agent = request.headers.get('User-Agent', '').lower()
    if 'headlesschrome' in user_agent or 'puppeteer' in user_agent:
        # Serve deceptive content for detected AI scraper
        return Response("<h1>Welcome to the Endless Maze!</h1><p>Keep searching for your prize...</p><a href='/trap'>Next Page</a>", mimetype='text/html')
    else:
        # Serve regular content for legitimate users
        return "<h1>Welcome to our site!</h1><p>Enjoy our content.</p>"

@app.route('/trap')
def trap():
    # Simulate an endless loop for scrapers
    return Response("<h1>You found another page!</h1><p>The journey never ends...</p><a href='/trap'>Next Page</a>", mimetype='text/html')

if __name__ == '__main__':
    # Debug mode enables an interactive debugger; keep it off outside local development.
    app.run(debug=False)
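
The '/trap' route in the starter code links back to its own URL, and a crawler that remembers visited URLs would likely stop after one page. One possible variation, sketched below, gives every trap page a unique token and derives fresh child tokens from it, so each page links to URLs the crawler has not seen. The names 'SECRET', 'child_tokens', and 'trap_page', along with the fan-out of three links, are hypothetical choices for illustration.

```python
import hashlib
import hmac
import html

SECRET = b"change-me"  # hypothetical key; makes the tokens hard to predict
LINKS_PER_PAGE = 3     # assumed fan-out per trap page


def child_tokens(token, count=LINKS_PER_PAGE):
    """Derive `count` child tokens deterministically from a parent token."""
    return [
        hmac.new(SECRET, f"{token}:{i}".encode(), hashlib.sha256).hexdigest()[:16]
        for i in range(count)
    ]


def trap_page(token):
    """Render a maze page whose links all point at distinct child pages."""
    links = "".join(
        f"<li><a href='/trap/{child}'>Next page</a></li>"
        for child in child_tokens(token)
    )
    # The token comes from the URL, so escape it before echoing it back.
    return f"<h1>Page {html.escape(token[:6])}</h1><ul>{links}</ul>"
```

In the Flask app, a route such as '/trap/<token>' could return 'trap_page(token)'. Because the pages are derived from the token rather than stored, the maze costs the server no storage, and the same URL always renders the same page.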