What you can do
- Import single pages or crawl up to 100 pages from a domain
- Scrape JavaScript-rendered and formatted content
- Auto-sync to keep website content up-to-date
- Organize crawled sites with automatic folder structures
Setup
1. Select Website as your source. When adding documents to a knowledge folder, choose Website from the available sources.
2. Enter the URL. Provide the URL of the webpage you want to import.
3. Enable crawling (optional). Toggle crawling on to import up to 100 pages from the same domain, or leave it off to import only the single page.
4. Import content. Content is scraped and imported from the specified URL.
Learn more about managing knowledge folders
How it works
Browser rendering scrapes website content for accurate extraction:
- Main page content and text.
- Formatted content (headings, lists, paragraphs).
- JavaScript-rendered content.
- Publicly accessible information only.
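To make the rendering step concrete, here is a minimal sketch in Python using Playwright as a stand-in headless browser (an assumption for illustration; the product's actual scraping stack is not documented). The point it shows is why browser rendering matters: a plain HTTP fetch returns only the initial HTML, while a rendered page includes content that JavaScript injects after load.

```python
# Illustrative sketch only. Playwright is an assumed stand-in,
# not necessarily the product's scraping stack.
from playwright.sync_api import sync_playwright

def render_page(url: str) -> str:
    """Return fully rendered HTML, including JavaScript-injected content."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait for network activity to settle so client-side rendering finishes.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```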
Single page import
By default, content is imported from only the single URL you provide. This is ideal for:
- Specific documentation pages.
- Help articles.
- FAQ pages.
- Individual blog posts.
Website crawling
Enable crawling to import entire documentation sites automatically:
- Automatic discovery: Links within the same domain are followed to discover and import connected pages.
- Breadth-first crawling: Pages are crawled level by level for comprehensive coverage.
- Page limit: Maximum of 100 pages crawled to prevent overloading your website.
- Folder structure: Crawled sites are organized under a root folder named after the domain.
- Same-domain only: Crawling stays within the original domain; external links are automatically filtered out.
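The crawl behavior described above can be sketched in a few lines. This is an illustration under stated assumptions, not the product's implementation: requests and BeautifulSoup stand in for whatever fetching and parsing the real crawler uses.

```python
# Breadth-first crawl sketch: same-domain only, capped at 100 pages.
# requests and BeautifulSoup are illustrative stand-ins, not the
# product's actual crawler.
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup

MAX_PAGES = 100  # the documented crawl limit

def crawl(start_url: str) -> list[str]:
    domain = urlparse(start_url).netloc
    queue = deque([start_url])   # FIFO queue gives level-by-level traversal
    visited: set[str] = set()
    while queue and len(visited) < MAX_PAGES:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = requests.get(url, timeout=10).text
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            target, _ = urldefrag(urljoin(url, a["href"]))
            # Same-domain filter: external links are dropped here.
            if urlparse(target).netloc == domain and target not in visited:
                queue.append(target)
    return sorted(visited)
```

The FIFO queue is what makes the traversal breadth-first: every page at one link depth is fetched before any page one level deeper, which is why coverage is described as level by level.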
Requirements
- Pages must be publicly accessible; only publicly available information can be scraped.
Managing imported content
After import:
- Archive pages to exclude them from agent responses.
- Move pages between knowledge folders.
- Delete pages that are no longer needed.
Learn more about managing documents
Auto-sync
When auto-sync is enabled, website content stays up-to-date:
- Page content is re-scraped during sync to capture updates.
- Changes to the webpage are reflected in your knowledge base.
- If the page becomes unavailable, the sync fails and you’re notified.
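As a rough illustration of one sync pass, the sketch below re-scrapes a stored URL and compares a content hash to detect changes; a failed request surfaces as an exception, mirroring the failure notification described above. The sync_page function and its hash-comparison scheme are hypothetical, not the product's actual sync mechanism.

```python
# Hypothetical sketch of one auto-sync pass: re-scrape, hash, compare.
import hashlib

import requests

def sync_page(url: str, stored_hash: str) -> str:
    """Re-scrape one page and return its new content hash."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # an unavailable page fails the sync
    new_hash = hashlib.sha256(resp.text.encode()).hexdigest()
    if new_hash != stored_hash:
        print(f"{url}: content changed; knowledge base entry should be refreshed")
    return new_hash
```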