Nbliq Crawler
About Nbliq Crawler
Nbliq Crawler is the web crawling system used by Nbliq Search to discover,
fetch, and index publicly available web pages. It helps Nbliq understand website content and
make relevant pages available in Nbliq Search results.
The crawler visits websites, reads publicly accessible pages, follows
allowed links, and stores useful information such as page title, description, headings, text
content, images with alt text, and other metadata.
Nbliq Crawler is designed to respect website rules, including robots.txt,
crawl permissions, and standard web crawling best practices.
Why Nbliq Crawls Websites
Nbliq crawls websites to improve search discovery and provide users with
accurate, fresh, and useful results.
The crawler may visit your website to:
- Discover new pages
- Update existing indexed pages
- Detect changed or removed content
- Understand page title, headings, and descriptions
- Improve search result quality
- Identify useful public content for AI-powered search answers
Nbliq crawls only publicly available pages and honors any access restrictions set by the
website owner.
Nbliq Crawler User-Agent
Website owners can identify Nbliq Crawler by its user-agent token: NbliqBot
Recommended full user-agent format:
Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)
You may see requests from Nbliq Crawler in your server logs with this user-agent.
How Nbliq Crawler Works
Nbliq Crawler works by fetching public web pages and analyzing their content for
search indexing.
The crawler may process:
- HTML pages
- Meta descriptions
- Headings such as H1, H2, and H3
- Public images with alt text
- Sitemap URLs
- Page titles
- Canonical URLs
- Internal links
- Structured data where available
- Last modified information
Nbliq may use sitemap files when available to better understand your website structure.
Example sitemap location:
https://example.com/sitemap.xml
Controlling Nbliq Crawler with robots.txt
Website owners can control Nbliq Crawler using the standard robots.txt file.
To allow Nbliq Crawler:
User-agent: NbliqBot
Allow: /
To block Nbliq Crawler completely:
User-agent: NbliqBot
Disallow: /
To block specific folders:
User-agent: NbliqBot
Disallow: /private/
Disallow: /admin/
To allow all crawlers, including NbliqBot:
User-agent: *
Allow: /
Your robots.txt file should be placed at:
https://example.com/robots.txt
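Before deploying rules like the ones above, it can help to test them locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the sample URLs are illustrative assumptions, and Python's parser is not necessarily how Nbliq Crawler itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the "block specific folders" example above.
ROBOTS_TXT = """\
User-agent: NbliqBot
Disallow: /private/
Disallow: /admin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/page.html", "/private/report.html", "/admin/"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "NbliqBot", url))
```

Note that Python's parser applies rules in file order, which may differ from how a particular crawler resolves overlapping Allow and Disallow rules, so treat the result as a sanity check rather than a guarantee.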
Sitemap Support
Nbliq Crawler supports sitemap discovery through robots.txt and common sitemap
locations.
Example:
Sitemap: https://example.com/sitemap.xml
Using a sitemap helps Nbliq discover your important pages faster and understand
when content was last updated.
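To confirm that a Sitemap line in robots.txt is readable by a standard parser, a short check like the one below may be useful. It is a sketch that assumes Python 3.8 or later (for RobotFileParser.site_maps); the helper name sitemaps_from_robots is hypothetical and does not describe Nbliq's own discovery logic.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemaps_from_robots(sample))
```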
A sitemap may include:
<url>
<loc>https://example.com/page.html</loc>
<lastmod>2026-05-20</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
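A sitemap entry like this one can be validated locally before publishing. The sketch below uses Python's standard xml.etree.ElementTree module and assumes the entry sits inside a urlset element with the standard sitemaps.org namespace; the function name read_sitemap is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page.html</loc>
    <lastmod>2026-05-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return a list of {'loc', 'lastmod'} dicts, one per <url> entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS),
        })
    return entries


if __name__ == "__main__":
    for entry in read_sitemap(SITEMAP_XML):
        print(entry["loc"], entry["lastmod"])
```

If parsing fails or loc comes back empty, the sitemap is likely malformed or missing the namespace declaration.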
Crawl Frequency
Nbliq Crawler may visit websites periodically based on page importance, update frequency, website structure, and crawl permissions.
Frequently updated pages may be checked more often, while rarely changed pages may be crawled less frequently.
Nbliq tries to avoid unnecessary load on websites and may reduce crawl frequency if a website responds slowly or returns errors.
Blocking Nbliq from Indexing a Page
To prevent a page from appearing in Nbliq Search results, you can use a noindex robots meta tag.
Add this inside the <head> section of your page:
<meta name="robots" content="noindex">
To block indexing by NbliqBot specifically:
<meta name="NbliqBot" content="noindex">
You can also use HTTP headers:
X-Robots-Tag: noindex
Preventing Link Following
To prevent crawler link following on a page:
<meta name="robots" content="nofollow">
To prevent indexing and link following:
<meta name="robots" content="noindex, nofollow">
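To check which robots directives a page actually exposes, the meta tags can be extracted with Python's standard html.parser module. This is a minimal sketch; the class and function names are hypothetical, and it simply collects directives from both the generic robots tag and the NbliqBot-specific tag without claiming to reproduce how Nbliq Crawler combines them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and bot-specific meta tags."""

    def __init__(self, bot_name="NbliqBot"):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", self.bot_name):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html, bot_name="NbliqBot"):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, nofollow"></head>'
    print(sorted(robots_directives(page)))
```

The X-Robots-Tag HTTP header is not covered by this sketch; it would need to be read from the response headers instead.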
Recommended Website Settings for Nbliq Search
To help Nbliq understand your website better, we recommend the following:
- Keep your robots.txt file updated
- Use clear page titles
- Use proper H1 and H2 headings
- Avoid duplicate pages
- Use structured data where relevant
- Avoid blocking important CSS or page content
- Submit or maintain a valid sitemap
- Add unique meta descriptions
- Use canonical URLs
- Add alt text for important images
- Return proper HTTP status codes
HTTP Status Codes
Nbliq Crawler respects standard HTTP status codes.
Common examples:
| STATUS CODE | MEANING |
| 200 | Page is available and may be indexed |
| 301 / 308 | Permanent redirect |
| 302 / 307 | Temporary redirect |
| 404 | Page not found |
| 410 | Page permanently removed |
| 429 | Too many requests |
| 500 | Server error |
| 503 | Service unavailable |
If a page returns 404 or 410, Nbliq may remove it from search results over time.
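When auditing which responses your server sends, a small helper can flag the codes that the text above ties to removal. This sketch uses Python's standard http.HTTPStatus enum; the function name summarize and the wording of the notes are illustrative assumptions, not a description of Nbliq's internal handling.

```python
from http import HTTPStatus

# Codes the text above associates with removal from search results.
REMOVAL_CODES = {HTTPStatus.NOT_FOUND, HTTPStatus.GONE}


def summarize(code):
    """Return the code, its standard reason phrase, and a removal note."""
    status = HTTPStatus(code)
    if status in REMOVAL_CODES:
        note = "may be removed from search results over time"
    else:
        note = "not a removal signal in the text above"
    return f"{status.value} {status.phrase}: {note}"


if __name__ == "__main__":
    for code in (200, 301, 404, 410, 503):
        print(summarize(code))
```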
Server Load and Crawl Rate
Nbliq Crawler is designed to crawl responsibly. If your website experiences high traffic from NbliqBot, you can limit access using robots.txt or server-level rules.
Example using crawl delay:
User-agent: NbliqBot
Crawl-delay: 10
Note: Crawl-delay support may vary depending on crawler configuration, but Nbliq aims to respect reasonable crawl rate instructions where possible.
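To verify that a Crawl-delay value is written in a form a standard parser can read, the rule can be checked with Python's urllib.robotparser (crawl_delay is available in Python 3.6 and later). This is a sketch; the helper name crawl_delay_seconds and its default value are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the crawl delay example above.
ROBOTS_WITH_DELAY = """\
User-agent: NbliqBot
Crawl-delay: 10
"""


def crawl_delay_seconds(robots_txt, user_agent="NbliqBot", default=0):
    """Return the Crawl-delay for user_agent, or default if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else delay


if __name__ == "__main__":
    print(crawl_delay_seconds(ROBOTS_WITH_DELAY))
```

Python's parser only accepts whole-number delays, so a value such as 0.5 would be ignored by this check.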
Example robots.txt File
User-agent: NbliqBot
Allow: /
Disallow: /admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
Disallow: /user/
Disallow: /account/
Sitemap: https://example.com/sitemap.xml
Identifying Nbliq Crawler in Server Logs
You can identify Nbliq Crawler by checking your server access logs for the NbliqBot user-agent.
Example log entry:
GET /page.html HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)
For security, do not rely only on user-agent strings, because they can be copied by other clients. If needed, Nbliq may provide additional verification methods in the future.
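To find these requests in practice, access logs can be filtered by the NbliqBot token. The sketch below assumes logs in the common combined format with the user-agent in the last quoted field; the sample lines, IP addresses, and the function name nbliq_requests are hypothetical. As noted above, a match only shows that a client claimed to be NbliqBot.

```python
import re

# Matches the product token and version, e.g. "NbliqBot/1.0".
NBLIQ_UA = re.compile(r"\bNbliqBot/\d+(?:\.\d+)*")

# Hypothetical access log lines in combined log format.
SAMPLE_LOG = [
    '203.0.113.7 - - [20/May/2026:10:00:00 +0000] "GET /page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)"',
    '198.51.100.4 - - [20/May/2026:10:00:05 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def nbliq_requests(log_lines):
    """Return the log lines whose text contains the NbliqBot user-agent token."""
    return [line for line in log_lines if NBLIQ_UA.search(line)]


if __name__ == "__main__":
    for line in nbliq_requests(SAMPLE_LOG):
        print(line)
```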
Contact Nbliq
If you are a website owner and have questions about Nbliq Crawler, crawl behavior, indexing, or removal requests,
please contact us.
Email: [email protected]
Website: https://nbliq.com
Crawler Info: https://nbliq.com/crawler