About Assaybot

Information for web publishers on Index Exchange's site crawler bot

Assaybot is Index Exchange's AI-powered automated content analysis crawler, designed to ensure brand safety across our advertising exchange. The bot analyzes web page content to detect potential brand safety violations, helping to maintain high-quality inventory standards and protect advertiser interests.

Purpose

Assaybot operates as part of Index Exchange's quality assurance infrastructure. The system:

  • Analyzes publisher page content for brand safety compliance

  • Identifies potential violations including adult content, hate speech, and other harmful material

  • Maintains inventory quality standards across Index Exchange's supply network

  • Operates independently of the latency-sensitive ad serving path

Assaybot does not affect your site's search engine rankings or visibility. It is exclusively used for content moderation and brand safety assessment within Index Exchange's advertising ecosystem.

User Agent

Assaybot identifies itself using the following HTTP user-agent request header:

Mozilla/5.0 (compatible; Assaybot/0.1; +http://www.indexexchange.dev/bot.html)

Important Security Note: The HTTP user-agent request header can be spoofed by other crawlers. For verification purposes, publishers should validate requests using IP address verification or contact Index Exchange directly through your account representative.

Authorized IP Address CIDR

Requests that claim to be Assaybot but originate outside the following address space can be treated as user-agent spoofing.

192.168.80.0/24
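The verification described above can be sketched in a few lines of Python. This is a minimal illustration, not Index Exchange's own tooling: the function name `is_genuine_assaybot` is hypothetical, and the range is copied verbatim from this page. Because 192.168.80.0/24 falls in private (RFC 1918) address space, confirm the production range with your account representative before enforcing it.

```python
import ipaddress

# Range exactly as published on this page; confirm before enforcing it.
ASSAYBOT_NETWORK = ipaddress.ip_network("192.168.80.0/24")
# Product token taken from the documented user-agent string.
ASSAYBOT_TOKEN = "assaybot"


def is_genuine_assaybot(remote_ip: str, user_agent: str) -> bool:
    """Return True only if the UA claims Assaybot AND the IP is in range."""
    if ASSAYBOT_TOKEN not in user_agent.lower():
        return False  # not claiming to be Assaybot at all
    try:
        return ipaddress.ip_address(remote_ip) in ASSAYBOT_NETWORK
    except ValueError:
        return False  # malformed address: treat as not verified
```

Pass the real client address to `remote_ip`. Behind a CDN or load balancer that is usually the address your proxy forwards, not the socket peer.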

Crawl Behavior

Access Frequency

Assaybot is designed to minimize impact on publisher infrastructure:

  • Deduplication: Each distinct URL is analyzed at most once per 24-hour period, which prevents excessive load and redundant processing

  • Concurrent Requests: The system operates with controlled concurrency to distribute load

  • Timeout Period: Each page request has a 30-second timeout

  • Retry Logic: Failed requests are retried up to 2 times with exponential backoff
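The deduplication and retry rules above can be modeled roughly as follows. This is a sketch for reasoning about expected load, not Assaybot's actual implementation: the class `DedupWindow`, the function `backoff_delays`, and the 1-second base delay are all assumptions, since the real backoff parameters are not published.

```python
import time
from typing import Dict, List, Optional

DEDUP_WINDOW_SECONDS = 24 * 60 * 60  # one analysis per URL per 24 hours
MAX_RETRIES = 2                      # retries after the first failed attempt
BASE_DELAY_SECONDS = 1.0             # assumption: real base delay is not published


def backoff_delays(max_retries: int = MAX_RETRIES,
                   base: float = BASE_DELAY_SECONDS) -> List[float]:
    """Delay before each retry, doubling every time: base, 2*base, ..."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]


class DedupWindow:
    """Remembers when each URL was last analyzed."""

    def __init__(self, window: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window = window
        self._last_seen: Dict[str, float] = {}

    def should_fetch(self, url: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self._last_seen.get(url)
        if last is not None and now - last < self.window:
            return False  # analyzed within the window; skip
        self._last_seen[url] = now
        return True
```

Under these assumptions a single URL produces at most three requests per 24 hours: the initial attempt plus two retries.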

What Assaybot Accesses

Assaybot processes URLs that appear in ad request traffic flowing through Index Exchange's supply ingress point. It:

  • Extracts page URLs and referrer URLs from ad request data

  • Analyzes the rendered text content of web pages

  • Stores analysis results for quality assurance reporting

  • Does not index content for public search or external redistribution

Content Analysis Method

Primary Method:

  • Makes standard HTTP GET requests

  • Extracts visible text content from HTML

  • Strips scripts, styles, and non-visible elements

  • Timeout: 30 seconds

  • Follows redirects automatically
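To preview roughly what text the primary method would see on a page, the extraction step can be approximated with Python's standard `html.parser`. This is a sketch only: the class `VisibleTextExtractor`, the function `extract_visible_text`, and the set of skipped tags are assumptions, and Assaybot's actual extractor may treat elements differently (for example, content hidden by CSS).

```python
from html.parser import HTMLParser
from typing import List

# Elements whose text is not rendered to the reader (an assumed list).
SKIPPED_TAGS = {"script", "style", "noscript", "template", "head"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that sit outside the skipped elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0          # > 0 while inside a skipped element
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Running `extract_visible_text` on your own page source gives a quick approximation of the text that would be submitted for brand safety analysis.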

Advanced Secondary Methods (Deep Scanning):

  • Uses headless Chrome browser for JavaScript-heavy sites

  • Renders dynamic content

  • Captures screenshots for future image analysis capabilities

  • Reserved for specialized scanning needs

The scraping method can be configured based on publisher risk thresholds.

Domain Blocklist

Assaybot automatically excludes certain high-traffic domains from analysis to optimize system resources:

  • Major social media platforms (facebook.com, twitter.com/x.com, instagram.com, linkedin.com, tiktok.com)

  • Video platforms (youtube.com)

  • Search engines (google.com)

Data Collection & Privacy

Information Collected

  • URL: The full URL of the analyzed page

  • Domain: The root domain of the page

  • Publisher ID: Internal Index Exchange identifier linking to your account

  • Raw Text Content: Extracted visible text for content analysis

  • HTTP Status: Response code from the page request

  • Analysis Results: Brand safety assessment produced by AI inference

  • Timestamp: Date and time of analysis

Data Retention

  • Analysis results are stored in Elasticsearch with monthly indices (format: assay-moderation-YYYY-MM)

  • Historical data is retained for reporting, trend analysis, and quality improvement

  • Data is accessible only to authorized Index Exchange personnel and relevant publisher account teams
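The monthly index naming above follows a simple pattern. The helper below is a hypothetical illustration of it, and the choice of UTC for the month boundary is an assumption that this page does not confirm.

```python
from datetime import datetime, timezone

INDEX_PREFIX = "assay-moderation"


def monthly_index_name(analyzed_at: datetime) -> str:
    """Build the index name for the month containing `analyzed_at`."""
    # Assumption: months are bucketed in UTC; the page does not say.
    utc = analyzed_at.astimezone(timezone.utc)
    return f"{INDEX_PREFIX}-{utc:%Y-%m}"
```

For example, an analysis timestamped 9 March 2025 (UTC) would map to `assay-moderation-2025-03`.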

Data Usage

Analysis results are used exclusively for:

  • Brand safety quality assurance

  • Publisher account management

  • Advertiser protection

  • System performance optimization

  • Regulatory compliance reporting

Technical Specifications

Request Characteristics

  • Protocol: HTTPS only

  • HTTP Method: GET

  • Connection: Keep-alive

  • Accept-Encoding: gzip

  • Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8

  • Accept-Language: en-US,en;q=0.5

  • DNT: 1 (Do Not Track enabled)
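To check how your WAF or CDN responds to a request shaped like the one described above, you can replay the documented headers against your own site. The sketch below uses Python's `urllib.request`; the name `build_probe_request` is hypothetical, and this is not the client Assaybot itself uses. Such a probe will not come from the authorized IP range, so IP-based rules will still treat it as spoofed.

```python
import urllib.request

# Header values copied from this page. The Connection header is omitted
# because urllib always sends "Connection: close", so keep-alive cannot
# be reproduced with this client.
ASSAYBOT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; Assaybot/0.1; "
                  "+http://www.indexexchange.dev/bot.html)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip",
    "DNT": "1",
}


def build_probe_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the documented Assaybot headers."""
    if not url.lower().startswith("https://"):
        raise ValueError("Assaybot requests are HTTPS only")
    return urllib.request.Request(url, headers=ASSAYBOT_HEADERS, method="GET")
```

Send it with `urllib.request.urlopen(build_probe_request(url), timeout=30)` to mirror the 30-second timeout. Because the request advertises gzip, the response body may need to be decompressed with the `gzip` module before reading.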

Content Size Limits

  • HTML Processing: Assaybot processes the full HTML response up to reasonable limits

  • Text Extraction: Extracted text content is analyzed in full for moderation

HTTP Status Handling

  • 2xx Success: Content is analyzed normally

  • 3xx Redirects: Followed automatically (up to standard limits)

  • 4xx Client Errors: Logged and not retried

  • 5xx Server Errors: Retried up to 2 times with backoff and logged
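The status rules above amount to a small decision table. The sketch below expresses it in Python; the `Action` enum and `next_action` function are hypothetical names used for illustration, not part of any Index Exchange API.

```python
from enum import Enum

MAX_RETRIES = 2  # documented retry limit for server errors


class Action(Enum):
    ANALYZE = "analyze"                  # 2xx: content is analyzed
    FOLLOW_REDIRECT = "follow_redirect"  # 3xx: redirect is followed
    LOG_ONLY = "log_only"                # 4xx: logged, never retried
    RETRY = "retry"                      # 5xx: retried with backoff
    GIVE_UP = "give_up"                  # 5xx after retries are exhausted


def next_action(status: int, retries_so_far: int = 0) -> Action:
    """Map an HTTP status code to the handling described above."""
    if 200 <= status < 300:
        return Action.ANALYZE
    if 300 <= status < 400:
        return Action.FOLLOW_REDIRECT
    if 400 <= status < 500:
        return Action.LOG_ONLY
    if 500 <= status < 600:
        return Action.RETRY if retries_so_far < MAX_RETRIES else Action.GIVE_UP
    raise ValueError(f"unexpected HTTP status: {status}")
```

One practical consequence: a page that answers 403 or 429 to the bot is logged once and not retried, whereas a 503 can trigger up to two further requests.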

robots.txt Compliance

Current Status: robots.txt compliance is under active development and planned for implementation.

Future Behavior: Once implemented, Assaybot will respect standard robots.txt directives including:

  • User-agent: Assaybot and User-agent: * rules

  • Disallow and Allow directives

  • Crawl-delay specifications
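Publishers preparing rules in advance can test how a standards-following parser reads them using Python's built-in `urllib.robotparser`. The rules and paths below are hypothetical placeholders, and how Assaybot will resolve overlapping Allow and Disallow rules once support ships is not yet documented. Python's parser applies the first matching rule, so the narrower Allow line is listed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Assaybot
Crawl-delay: 10
Allow: /private/press/
Disallow: /private/

User-agent: *
Disallow: /admin/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(EXAMPLE_ROBOTS_TXT)
for path in ("/articles/1", "/private/notes", "/private/press/launch"):
    allowed = rules.can_fetch("Assaybot", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
print("crawl delay:", rules.crawl_delay("Assaybot"))
```

Note that a crawler with its own `User-agent` group ignores the `*` group, so in this example the `/admin/` rule would not apply to Assaybot unless it is repeated under `User-agent: Assaybot`.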

Publisher Recommendation: Publishers may prepare for future robots.txt support by adding appropriate rules to their robots.txt file. However, these rules are not currently enforced. Note that restricting Assaybot access may affect eligibility to transact on Index Exchange for certain publishers. Exceptions to the robots.txt policy will be handled on a case-by-case basis.

Important: If you need to restrict Assaybot access before robots.txt compliance is implemented, please contact your Index Exchange account representative to discuss alternative solutions.

Managing Assaybot Access

Allowing Access

To ensure optimal brand safety monitoring and maintain good standing in Index Exchange's supply network, we recommend allowing Assaybot access to your content.

Benefits of allowing access:

  • Proactive identification of potential content issues

  • Faster resolution of brand safety concerns

  • Maintained access to demand across Index Exchange's network

  • Transparency in content quality assessment

Publishers who choose to block Assaybot should be aware:

  • Quality Assurance Impact: Manual review processes may be required, potentially causing delays

  • Operational Communication: Blocking may necessitate additional coordination with your account team

  • Future robots.txt Support: Once implemented, standard robots.txt rules can be used for access control

How to Block (When robots.txt is supported):

Add the following to your robots.txt file:
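A minimal sketch in standard robots.txt syntax, assuming Assaybot will match the product token `Assaybot` from its user-agent string (the exact token is not confirmed until support ships):

```text
User-agent: Assaybot
Disallow: /
```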

To block specific sections while allowing others:
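A sketch in standard robots.txt syntax, again assuming the `Assaybot` token. The paths `/members/` and `/drafts/` are hypothetical placeholders for your own restricted sections, and anything not listed remains accessible:

```text
User-agent: Assaybot
Disallow: /members/
Disallow: /drafts/
```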

Current Workaround: Contact your Index Exchange account representative for alternative blocking arrangements.

Troubleshooting & Common Issues

High Request Volume

If you notice unexpectedly high request volume from Assaybot:

  1. Verify Authenticity: Confirm requests are legitimate by checking user-agent and timing patterns. Index Exchange can provide a list of IP address exit nodes to assist with any investigations.

  2. Check Deduplication: The system should not request the same URL more than once per 24 hours

  3. Contact Support: Reach out to your account representative if issues persist

Access Errors

If Assaybot encounters repeated access errors (403, 401, etc.):

  • WAF/CDN Blocking: Check if your Web Application Firewall or CDN is blocking the bot

  • Rate Limiting: Verify rate limits are not overly restrictive for automated access

  • Authentication: Ensure publicly advertised pages are accessible without authentication

  • IP Allowlisting: Contact Index Exchange for IP ranges if allowlisting is required

Content Analysis Issues

If you believe Assaybot is incorrectly flagging content:

  1. Review Flagged Content: Index Exchange can provide specific examples of flagged content for your review

  2. Understand Criteria: Brand safety criteria include explicit sexual content, hate speech, violence, and illegal content

  3. Request Review: Contact your account representative to request a manual review

  4. Appeal Process: Work with the Exchange Quality team for remediation guidance

Privacy & Compliance

Assaybot's content analysis is designed to comply with:

  • GDPR: No personal data is intentionally collected; analysis focuses on published content

  • CCPA: Text content analysis falls under business operations exemptions

  • Industry Standards: Aligned with IAB brand safety guidelines and frameworks

Publishers with specific privacy concerns should contact their Index Exchange account representative.

System Status & Updates

Current Version

Assaybot v0.1 - Initial production release


This documentation is maintained by Index Exchange and reflects the current state of the Assaybot system. Publishers will be notified of significant changes to crawl behavior or capabilities.
