Crawl Behavior
Access Frequency
Assaybot is designed to minimize impact on publisher infrastructure:
Deduplication: Assaybot maintains a multi-layer deduplication system to prevent redundant requests. Each URL is analyzed at most once within a 30-day rolling window: a short-term cache prevents duplicate requests within the same day, and a long-term filter blocks re-crawls for the remainder of the 30-day window
Per-Domain Concurrency: Assaybot limits the number of simultaneous requests to any single domain, ensuring no individual site experiences excessive load
Timeout Period: Each page request has a 30-second timeout — if your server does not respond within that window, Assaybot moves on
Retry Logic: Failed requests (server errors or rate limit responses) are retried up to 3 times with exponential backoff, increasing the delay between each attempt to avoid adding pressure to an already-strained server
Rate Limit Compliance: If your server returns a 429 Too Many Requests response, Assaybot will back off automatically and retry with increasing delays
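The two-layer deduplication described above can be sketched as follows. This is a minimal illustration of the documented behavior, not Assaybot's actual implementation; the class and method names are hypothetical, and a single last-seen map stands in for both the same-day cache and the 30-day filter.

```python
from datetime import date, timedelta

class UrlDeduper:
    """Sketch of a 30-day rolling dedup window.

    One last-seen map covers both checks: a URL seen today fails the
    same-day check, and a URL seen within the last 30 days fails the
    long-term check.
    """
    WINDOW = timedelta(days=30)

    def __init__(self):
        self._last_seen = {}  # url -> date last crawled

    def should_crawl(self, url: str, today: date) -> bool:
        seen = self._last_seen.get(url)
        if seen is not None and today - seen < self.WINDOW:
            return False  # crawled today or within the last 30 days
        self._last_seen[url] = today
        return True
```

A URL admitted on day 0 is rejected for the next 29 days and becomes eligible again on day 30.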
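The retry and rate-limit policy above amounts to: retry on server errors (5xx) and 429 responses, up to 3 times, doubling the delay each attempt. The sketch below illustrates that policy only; the function names, the 1-second base delay, and the injectable `fetch` callable are assumptions for the example, not Assaybot internals.

```python
import time

MAX_RETRIES = 3
BASE_DELAY = 1.0  # seconds; the actual base delay is not documented here

def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 1s, 2s, 4s for attempts 0, 1, 2."""
    return BASE_DELAY * (2 ** attempt)

def fetch_with_retries(url: str, fetch):
    """Retry failed requests up to MAX_RETRIES times with backoff.

    `fetch` is an injected callable returning an object with a
    `.status` attribute, so the retry policy can be exercised
    without network access.
    """
    for attempt in range(MAX_RETRIES + 1):
        response = fetch(url)
        if response.status != 429 and response.status < 500:
            return response  # success, or a non-retryable client error
        if attempt < MAX_RETRIES:
            time.sleep(backoff_delay(attempt))  # back off before retrying
    return None  # give up after the final retry
```

Increasing the delay between attempts keeps retries from piling onto a server that is already struggling, which is the stated goal of the backoff.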
What Assaybot Accesses
Assaybot analyzes URLs that appear in ad request traffic flowing through Index Exchange. The system:
Processes page URLs and referrer URLs observed in ad request data
Makes a single HTTP GET request per URL to retrieve the page content
Extracts visible text content for brand safety analysis
Stores analysis results internally for quality assurance reporting
Does not index content for public search or external redistribution
Does not execute JavaScript, submit forms, or interact with page elements
Does not follow links on the page to discover new URLs — it only visits URLs already observed in ad traffic
Content Analysis Method
Assaybot uses a straightforward content retrieval approach:
Makes standard HTTP GET requests using the documented user-agent string
Extracts visible text content from the HTML response
Strips scripts, styles, navigation elements, and other non-visible content
Follows redirects automatically (up to 5 hops)
Timeout: 30 seconds per request
Cookies are disabled — Assaybot does not send or store cookies
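The extraction step above can be sketched with the Python standard library's HTML parser: keep text nodes, and skip anything inside non-visible containers. This is a simplified illustration; the exact set of elements Assaybot strips is not enumerated in this guide, so the tag list below is an assumption.

```python
from html.parser import HTMLParser

# Simplified set of tags whose contents are treated as non-visible.
# (Assumption for this sketch; the real strip list is not documented.)
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}

class VisibleTextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/navigation content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    return " ".join(parser._chunks)
```

For example, a page containing a nav menu, a paragraph, and an inline script yields only the paragraph text.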
The extracted text is then passed through a multi-stage AI classification pipeline to assess brand safety. No images, videos, or other media are downloaded or analyzed as of the last update to this guide.
Domain Safe List and Block List
Assaybot maintains curated domain lists to optimize system resources and focus analysis where it is most needed:
Safe List: Well-known, trusted publisher domains (such as major news outlets) are automatically classified as safe and are not crawled, saving resources for both Assaybot and the publisher
Block List: Domains that are already known to be non-compliant are excluded from crawling
These lists are maintained by Index Exchange and updated on a regular basis. If you believe your domain has been incorrectly categorized, please contact your account representative.
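The list handling above reduces to a three-way decision made before any crawl. The sketch below uses hypothetical example domains; the real lists and their storage are internal to Index Exchange.

```python
# Hypothetical example domains -- the real lists are maintained
# by Index Exchange and are not published in this guide.
SAFE_LIST = {"trusted-news.example"}
BLOCK_LIST = {"known-bad.example"}

def crawl_decision(domain: str) -> str:
    """Return 'skip-safe', 'skip-blocked', or 'crawl' for a domain."""
    if domain in SAFE_LIST:
        return "skip-safe"     # auto-classified safe, never crawled
    if domain in BLOCK_LIST:
        return "skip-blocked"  # known non-compliant, excluded from crawling
    return "crawl"             # fetched and analyzed per the rules above
```

Only domains that fall into neither list are actually fetched, which is how the lists conserve resources for both Assaybot and publishers.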