About Assaybot
Information for web publishers on Index Exchange's site crawler bot
Assaybot is Index Exchange's AI-powered automated content analysis crawler, designed to ensure brand safety across our advertising exchange. The bot analyzes web page content to detect potential brand safety violations, helping to maintain high-quality inventory standards and protect advertiser interests.
Purpose
Assaybot operates as part of Index Exchange's quality assurance infrastructure. The system:
Analyzes publisher page content for brand safety compliance
Identifies potential violations including adult content, hate speech, and other harmful material
Maintains inventory quality standards across Index Exchange's supply network
Operates independently of the latency-sensitive ad serving path
Assaybot does not affect your site's search engine rankings or visibility. It is exclusively used for content moderation and brand safety assessment within Index Exchange's advertising ecosystem.
User Agent
Assaybot identifies itself using the following HTTP user-agent request header:
Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)
Important Security Note: The HTTP user-agent request header can be spoofed by other crawlers. For verification, publishers should validate requests by checking the source IP address, or contact Index Exchange directly through their account representative.
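As an illustration of the IP-based check, the following Python sketch treats a request as Assaybot only when the user-agent names Assaybot and the client address falls inside the authorized range published in this document (192.168.80.0/24). The helper name is hypothetical, and how you obtain the real client IP (for example, behind a CDN or reverse proxy) depends on your own infrastructure.

```python
import ipaddress

# Authorized range as published under "Authorized IP Address CIDR".
ASSAYBOT_NETWORK = ipaddress.ip_network("192.168.80.0/24")


def is_verified_assaybot(client_ip: str, user_agent: str) -> bool:
    """Hypothetical helper: True only if the UA claims Assaybot AND the IP is in range."""
    if "Assaybot/" not in user_agent:
        return False  # not claiming to be Assaybot at all
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address string
    # IPv6 addresses are never members of an IPv4 network, so they fail here.
    return addr in ASSAYBOT_NETWORK
```

Because the user-agent alone can be spoofed, the address test is the deciding condition; the user-agent test only filters out unrelated traffic.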
Authorized IP Address CIDR
Requests identifying as Assaybot that originate outside this address space can be considered user-agent spoofed requests.
192.168.80.0/24
Crawl Behavior
Access Frequency
Assaybot is designed to minimize impact on publisher infrastructure:
Deduplication: Each distinct URL is analyzed at most once per 24-hour period, preventing excessive load and redundant processing
Concurrent Requests: The system operates with controlled concurrency to distribute load
Timeout Period: Each page request has a 30-second timeout
Retry Logic: Failed requests are retried up to 2 times with exponential backoff
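The 24-hour deduplication rule can be pictured as a per-URL time window. The sketch below is a hypothetical in-memory version for illustration only; this document does not describe how Assaybot actually stores its deduplication state, and a production system would likely use shared storage rather than a Python dictionary.

```python
import time
from typing import Callable, Dict


class UrlDeduplicator:
    """Hypothetical in-memory sketch: admit each URL at most once per window."""

    def __init__(self, window_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._last_analyzed: Dict[str, float] = {}

    def should_analyze(self, url: str) -> bool:
        now = self.clock()
        last = self._last_analyzed.get(url)
        if last is not None and now - last < self.window_seconds:
            return False  # analyzed within the window; skip this URL
        self._last_analyzed[url] = now  # record the new analysis time
        return True
```

A skipped URL does not reset its window, so a page seen repeatedly in ad traffic is still re-analyzed once the 24 hours have passed.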
What Assaybot Accesses
Assaybot processes URLs that appear in ad request traffic flowing through Index Exchange's supply ingress point. It:
Extracts page URLs and referrer URLs from ad request data
Analyzes the rendered text content of web pages
Stores analysis results for quality assurance reporting
Does not index content for public search or external redistribution
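The first step, pulling page and referrer URLs out of ad request data, might look like the sketch below. This document does not specify the ad request format; the sketch assumes an OpenRTB 2.x-style bid request already decoded into a dictionary, where the page URL is carried in site.page and the referrer in site.ref. The function name is hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlsplit


def extract_urls(bid_request: Dict[str, Any]) -> List[str]:
    """Return page and referrer URLs from an OpenRTB-style request (assumed format)."""
    site = bid_request.get("site") or {}  # app traffic has no "site" object
    urls: List[str] = []
    for key in ("page", "ref"):  # OpenRTB 2.x Site fields
        value = site.get(key)
        if not isinstance(value, str):
            continue
        if urlsplit(value).scheme not in ("http", "https"):
            continue  # ignore empty or non-web values
        if value not in urls:  # page and referrer can be identical
            urls.append(value)
    return urls
```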
Content Analysis Method
Primary Method:
Makes standard HTTP GET requests
Extracts visible text content from HTML
Strips scripts, styles, and non-visible elements
Timeout: 30 seconds
Follows redirects automatically
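The primary method's text extraction (keep visible text, drop scripts, styles, and non-visible elements) can be approximated with Python's standard html.parser module. The set of skipped tags is an assumption, and this sketch does not evaluate CSS or the hidden attribute, so it approximates rather than reproduces what a browser would render.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside elements that browsers do not render as page text."""

    SKIP_TAGS = {"script", "style", "noscript", "template", "head"}  # assumed list

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```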
Advanced Secondary Methods (Deep Scanning):
Uses headless Chrome browser for JavaScript-heavy sites
Renders dynamic content
Captures screenshots for future image analysis capabilities
Reserved for specialized scanning needs
The scraping method can be configured based on publisher risk thresholds.
Domain Blocklist
Assaybot automatically excludes certain high-traffic domains from analysis to optimize system resources:
Major social media platforms (facebook.com, twitter.com/x.com, instagram.com, linkedin.com, tiktok.com)
Video platforms (youtube.com)
Search engines (google.com)
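A blocklist check along these lines could decide whether a URL is skipped. The sketch assumes that subdomains of a listed domain (for example www.facebook.com) are excluded too, and it covers only the domains named above; the actual list and matching rules may differ.

```python
from urllib.parse import urlsplit

# Domains named in this section; the production list may be longer.
BLOCKED_DOMAINS = frozenset({
    "facebook.com", "twitter.com", "x.com", "instagram.com",
    "linkedin.com", "tiktok.com", "youtube.com", "google.com",
})


def is_blocklisted(url: str) -> bool:
    """True if the host is a listed domain or one of its subdomains (assumed rule)."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
```

Matching on a leading dot keeps look-alike hosts such as notfacebook.com from being treated as facebook.com.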
Data Collection & Privacy
Information Collected
URL: The full URL of the analyzed page
Domain: The root domain of the page
Publisher ID: Internal Index Exchange identifier linking to your account
Raw Text Content: Extracted visible text for content analysis
HTTP Status: Response code from the page request
Analysis Results: Brand safety assessment produced by AI inference
Timestamp: Date and time of analysis
Data Retention
Analysis results are stored in Elasticsearch with monthly indices (format: assay-moderation-YYYY-MM)
Historical data is retained for reporting, trend analysis, and quality improvement
Data is accessible only to authorized Index Exchange personnel and relevant publisher account teams
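The collected fields and the monthly index naming can be tied together as below. Field names and types are assumptions drawn from the list under Information Collected, and deriving the month from the UTC timestamp is also an assumption; this document does not say which time zone defines the month boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class AssayRecord:
    """Hypothetical shape of one stored analysis result."""
    url: str                          # full URL of the analyzed page
    domain: str                       # root domain of the page
    publisher_id: str                 # internal Index Exchange identifier
    raw_text: str                     # extracted visible text
    http_status: int                  # response code from the page request
    analysis_results: Dict[str, Any]  # brand safety assessment output
    timestamp: datetime               # time of analysis (timezone-aware)

    def index_name(self) -> str:
        """Monthly index name in the format assay-moderation-YYYY-MM."""
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        utc = self.timestamp.astimezone(timezone.utc)  # assumed UTC months
        return f"assay-moderation-{utc:%Y-%m}"
```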
Data Usage
Analysis results are used exclusively for:
Brand safety quality assurance
Publisher account management
Advertiser protection
System performance optimization
Regulatory compliance reporting
Technical Specifications
Request Characteristics
Protocol: HTTPS only
HTTP Method: GET
Connection: Keep-alive
Accept-Encoding: gzip
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
DNT: 1 (Do Not Track enabled)
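Publishers debugging WAF or CDN behavior may want to reproduce a request with these characteristics. A minimal sketch using Python's urllib.request follows; the helper names are hypothetical, and because of the Accept-Encoding: gzip header the response body may need decompressing.

```python
import gzip
import urllib.request
from typing import Optional

ASSAYBOT_UA = ("Mozilla/5.0 (compatible; Assaybot/0.1; "
               "+http://www.indexexchange.dev/bot.html)")

# Header values as listed under Request Characteristics.
ASSAYBOT_HEADERS = {
    "User-Agent": ASSAYBOT_UA,
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip",
    "DNT": "1",
    # Connection: keep-alive is omitted because urllib.request always
    # sends "Connection: close"; keep-alive needs a pooling HTTP client.
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request resembling Assaybot's (HTTPS only)."""
    if not url.lower().startswith("https://"):
        raise ValueError("Assaybot requests are HTTPS only")
    return urllib.request.Request(url, headers=ASSAYBOT_HEADERS, method="GET")


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip content encoding if the server applied it."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw
```

To send it, pass the request to urllib.request.urlopen(req, timeout=30), mirroring the 30-second timeout, and run the response bytes through decode_body together with the response's Content-Encoding header. urlopen follows redirects by default, as Assaybot does.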
Content Size Limits
HTML Processing: Assaybot processes the full HTML response up to reasonable limits
Text Extraction: Extracted text content is analyzed in full for moderation
HTTP Status Handling
2xx Success: Content is analyzed normally
3xx Redirects: Followed automatically (up to standard limits)
4xx Client Errors: Logged and not retried
5xx Server Errors: Retried up to 2 times with backoff and logged
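The status-handling rules above, combined with the retry policy under Access Frequency (up to 2 retries with exponential backoff), can be sketched as follows. The 1-second base delay and the injected fetch and sleep functions are assumptions for illustration; the actual delays are not documented.

```python
import time
from typing import Callable


def fetch_with_retries(fetch: Callable[[str], int], url: str,
                       max_retries: int = 2, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the final HTTP status for url.

    fetch is a hypothetical callable that performs the GET (following
    redirects) and returns the status code.
    """
    attempt = 0
    while True:
        status = fetch(url)
        if status < 500:
            # 2xx: analyze; 3xx were already followed by the client;
            # 4xx: log and do not retry.
            return status
        if attempt >= max_retries:
            return status  # 5xx after the last retry: give up and log
        sleep(base_delay * 2 ** attempt)  # 1 s, then 2 s with the defaults
        attempt += 1
```

Injecting sleep keeps the backoff schedule testable without real waiting.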
robots.txt Compliance
Current Status: robots.txt compliance is under active development and planned for implementation.
Future Behavior: Once implemented, Assaybot will respect standard robots.txt directives including:
User-agent: Assaybot and User-agent: * rules
Disallow and Allow directives
Crawl-delay specifications
Publisher Recommendation: Publishers may prepare for future robots.txt support by adding appropriate rules to their robots.txt file. However, these rules are not currently enforced. Note that restricting Assaybot may affect eligibility to transact on Index Exchange for certain publishers. Exceptions to the robots.txt policy will be handled on a case-by-case basis.
Important: If you need to restrict Assaybot access before robots.txt compliance is implemented, please contact your Index Exchange account representative to discuss alternative solutions.
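Publishers preparing rules in advance can check how a draft robots.txt treats Assaybot using Python's standard urllib.robotparser module. The paths below are hypothetical examples. urllib.robotparser applies the first matching rule in file order, and this document does not describe the parser Assaybot will use, so treat the result as an approximation. Pass the product token "Assaybot" rather than the full user-agent string, because robotparser matches only the part before the first "/".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules: keep Assaybot out of /members/, slow it down.
DRAFT_ROBOTS_TXT = """\
User-agent: Assaybot
Disallow: /members/
Crawl-delay: 10

User-agent: *
Allow: /
"""


def check_rules(robots_txt: str, url: str, agent: str = "Assaybot"):
    """Return (allowed, crawl_delay) for agent under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)
```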
Managing Assaybot Access
Allowing Access
To ensure optimal brand safety monitoring and maintain good standing in Index Exchange's supply network, we recommend allowing Assaybot access to your content.
Benefits of allowing access:
Proactive identification of potential content issues
Faster resolution of brand safety concerns
Maintained access to demand across Index Exchange's network
Transparency in content quality assessment
Blocking Access (Not Recommended)
Publishers who choose to block Assaybot should be aware:
Quality Assurance Impact: Manual review processes may be required, potentially causing delays
Operational Communication: Blocking may necessitate additional coordination with your account team
Future robots.txt Support: Once implemented, standard robots.txt rules can be used for access control
How to Block (When robots.txt is supported):
Add the following to your robots.txt file:
User-agent: Assaybot
Disallow: /
To block specific sections while allowing others:
Current Workaround: Contact your Index Exchange account representative for alternative blocking arrangements.
Troubleshooting & Common Issues
High Request Volume
If you notice unexpectedly high request volume from Assaybot:
Verify Authenticity: Confirm requests are legitimate by checking user-agent and timing patterns. Index Exchange can provide a list of IP address exit nodes to assist with any investigations.
Check Deduplication: The system should not request the same URL more than once per 24 hours
Contact Support: Reach out to your account representative if issues persist
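To check deduplication and authenticity against your own access logs, a sketch like the following flags URLs that verified Assaybot traffic fetched more than once within 24 hours. It assumes log lines have already been parsed into (timestamp, client IP, user-agent, URL) tuples, and it uses the authorized range published in this document; the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

ASSAYBOT_NETWORK = ipaddress.ip_network("192.168.80.0/24")
WINDOW = timedelta(hours=24)


def find_repeat_fetches(entries: Iterable[Tuple[datetime, str, str, str]]) -> List[str]:
    """URLs fetched by verified Assaybot traffic twice within WINDOW."""
    seen = defaultdict(list)
    for ts, client_ip, user_agent, url in entries:
        if "Assaybot/" not in user_agent:
            continue  # unrelated traffic
        try:
            if ipaddress.ip_address(client_ip) not in ASSAYBOT_NETWORK:
                continue  # claims Assaybot but is outside the range: spoofed
        except ValueError:
            continue  # malformed address in the log
        seen[url].append(ts)
    flagged = []
    for url, stamps in seen.items():
        stamps.sort()
        # Any two consecutive fetches closer than 24 hours break deduplication.
        if any(later - earlier < WINDOW for earlier, later in zip(stamps, stamps[1:])):
            flagged.append(url)
    return sorted(flagged)
```

If spoofed requests account for most of the volume, they can be blocked at the WAF without affecting Assaybot itself.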
Access Errors
If Assaybot encounters repeated access errors (403, 401, etc.):
WAF/CDN Blocking: Check if your Web Application Firewall or CDN is blocking the bot
Rate Limiting: Verify rate limits are not overly restrictive for automated access
Authentication: Ensure publicly advertised pages are accessible without authentication
IP Allowlisting: Contact Index Exchange for IP ranges if allowlisting is required
Content Analysis Issues
If you believe Assaybot is incorrectly flagging content:
Review Flagged Content: Index Exchange can provide specific examples of flagged content for your review
Understand Criteria: Brand safety criteria include explicit sexual content, hate speech, violence, and illegal content
Request Review: Contact your account representative to request a manual review
Appeal Process: Work with the Exchange Quality team for remediation guidance
Privacy & Compliance
Assaybot's content analysis is designed to comply with:
GDPR: No personal data is intentionally collected; analysis focuses on published content
CCPA: Text content analysis falls under business operations exemptions
Industry Standards: Aligned with IAB brand safety guidelines and frameworks
Publishers with specific privacy concerns should contact their Index Exchange account representative.
System Status & Updates
Current Version
Assaybot v0.1 - Initial production release
This documentation is maintained by Index Exchange and reflects the current state of the Assaybot system. Publishers will be notified of significant changes to crawl behavior or capabilities.

