What you can do
- Import single pages or crawl up to 100 pages from a domain
- Scrape JavaScript-rendered and formatted content
- Auto-sync to keep website content up-to-date
- Organize crawled sites with automatic folder structures
Setup
1. Select Website as your source. When adding documents to a knowledge folder, choose Website from the available sources.
2. Enter the URL. Provide the URL of the webpage you want to import.
3. Enable crawling (optional). Toggle crawling on to import up to 100 pages from the same domain, or leave it off to import only the single page.
4. Import content. Content is scraped and imported from the specified URL.
Learn more about managing knowledge folders
How it works
Browser rendering scrapes website content for accurate extraction:
- Main page content and text.
- Formatted content (headings, lists, paragraphs).
- JavaScript-rendered content.
- Publicly accessible information only.
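To make the rendering step concrete, here is a minimal sketch in Python using Playwright as a stand-in headless browser (an assumption for illustration; the product's actual scraping stack is not documented). The point it shows is why browser rendering matters: a plain HTTP fetch returns only the initial HTML, while a rendered page includes content that JavaScript injects after load.

```python
# Illustrative sketch only. Playwright is an assumed stand-in,
# not necessarily the product's scraping stack.
from playwright.sync_api import sync_playwright

def render_page(url: str) -> str:
    """Return fully rendered HTML, including JavaScript-injected content."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait for network activity to settle so client-side rendering finishes.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```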
Single page import
By default, content is imported from only the single URL you provide. This is ideal for:
- Specific documentation pages.
- Help articles.
- FAQ pages.
- Individual blog posts.
Website crawling
Enable crawling to import entire documentation sites automatically:
- Automatic discovery: Links within the same domain are followed to discover and import connected pages.
- Breadth-first crawling: Pages are crawled level by level for comprehensive coverage.
- Page limit: Maximum of 100 pages crawled to prevent overloading your website.
- Folder structure: Crawled sites are organized under a root folder named after the domain.
- Same-domain only: Crawling stays within the original domain; external links are automatically filtered out.
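The crawl behavior described above can be sketched in a few lines. This is an illustration under stated assumptions, not the product's implementation: requests and BeautifulSoup stand in for whatever fetching and parsing the real crawler uses.

```python
# Breadth-first crawl sketch: same-domain only, capped at 100 pages.
# requests and BeautifulSoup are illustrative stand-ins, not the
# product's actual crawler.
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup

MAX_PAGES = 100  # the documented crawl limit

def crawl(start_url: str) -> list[str]:
    domain = urlparse(start_url).netloc
    queue = deque([start_url])   # FIFO queue gives level-by-level traversal
    visited: set[str] = set()
    while queue and len(visited) < MAX_PAGES:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = requests.get(url, timeout=10).text
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            target, _ = urldefrag(urljoin(url, a["href"]))
            # Same-domain filter: external links are dropped here.
            if urlparse(target).netloc == domain and target not in visited:
                queue.append(target)
    return sorted(visited)
```

The FIFO queue is what makes the traversal breadth-first: every page at one link depth is fetched before any page one level deeper, which is why coverage is described as level by level.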
Requirements
- Pages must be publicly accessible; only publicly available information can be scraped.
Managing imported content
After import:
- Archive pages to exclude them from agent responses.
- Move pages between knowledge folders.
- Delete pages that are no longer needed.
Learn more about managing documents
Auto-sync
When auto-sync is enabled, website content stays up-to-date:
- Page content is re-scraped during sync to capture updates.
- Changes to the webpage are reflected in your knowledge base.
- If the page becomes unavailable, the sync fails and you’re notified.
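As a rough illustration of one sync pass, the sketch below re-scrapes a stored URL and compares a content hash to detect changes; a failed request surfaces as an exception, mirroring the failure notification described above. The sync_page function and its hash-comparison scheme are hypothetical, not the product's actual sync mechanism.

```python
# Hypothetical sketch of one auto-sync pass: re-scrape, hash, compare.
import hashlib

import requests

def sync_page(url: str, stored_hash: str) -> str:
    """Re-scrape one page and return its new content hash."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # an unavailable page fails the sync
    new_hash = hashlib.sha256(resp.text.encode()).hexdigest()
    if new_hash != stored_hash:
        print(f"{url}: content changed; knowledge base entry should be refreshed")
    return new_hash
```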