Nbliq Crawler

About Nbliq Crawler

Nbliq Crawler is the web crawling system used by Nbliq Search to discover, fetch, and index publicly available web pages. It helps Nbliq understand website content and make relevant pages available in Nbliq Search results.

The crawler visits websites, reads publicly accessible pages, follows allowed links, and stores useful information such as page title, description, headings, text content, images with alt text, and other metadata.

Nbliq Crawler is designed to respect website rules, including robots.txt, crawl permissions, and standard web crawling best practices.

Why Nbliq Crawls Websites

Nbliq crawls websites to improve search discovery and provide users with accurate, fresh, and useful results.

The crawler may visit your website to:

  • Discover new pages
  • Update existing indexed pages
  • Detect changed or removed content
  • Understand page titles, headings, and descriptions
  • Improve search result quality
  • Identify useful public content for AI-powered search answers

Nbliq crawls only publicly available pages and honors any access restrictions set by the website owner.

Nbliq Crawler User-Agent

Website owners can identify Nbliq Crawler by its user-agent token: NbliqBot

Recommended full user-agent format:

Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)

You may see requests from Nbliq Crawler in your server logs with this user-agent.

How Nbliq Crawler Works

Nbliq Crawler works by fetching public web pages and analyzing their content for search indexing.

The crawler may process:

  • HTML pages
  • Meta descriptions
  • Headings such as H1, H2, and H3
  • Public images with alt text
  • Sitemap URLs
  • Page titles
  • Canonical URLs
  • Internal links
  • Structured data where available
  • Last modified information

Nbliq may use sitemap files when available to better understand your website structure.

Example sitemap location:
https://example.com/sitemap.xml

Controlling Nbliq Crawler with robots.txt

Website owners can control Nbliq Crawler using the standard robots.txt file.

To allow Nbliq Crawler:

User-agent: NbliqBot
Allow: /

To block Nbliq Crawler completely:

User-agent: NbliqBot
Disallow: /

To block specific folders:

User-agent: NbliqBot
Disallow: /private/
Disallow: /admin/

To allow all crawlers including NbliqBot:

User-agent: *
Allow: /

Your robots.txt file should be placed at:
https://example.com/robots.txt
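
Before publishing rules like the ones above, it can help to test them locally. The following is a minimal sketch that uses Python's standard urllib.robotparser module; the helper name is_allowed is hypothetical, and the standard-library parser is not Nbliq's own parser, so edge cases (for example, how overlapping Allow and Disallow rules are resolved) may differ.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the "block specific folders" example above.
ROBOTS_TXT = """\
User-agent: NbliqBot
Disallow: /private/
Disallow: /admin/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "NbliqBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it relies on Python's standard parser, which only
    approximates how any particular crawler interprets the file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the short token ("NbliqBot"), not the full user-agent string.
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/", "/blog/post.html", "/private/report.html", "/admin/"):
        url = "https://example.com" + path
        print(path, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Note that the check uses the short token NbliqBot rather than the full user-agent string, because robots.txt groups are matched against the crawler's product token.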

Sitemap Support

Nbliq Crawler supports sitemap discovery through robots.txt and common sitemap locations.

Example robots.txt entry:
Sitemap: https://example.com/sitemap.xml

Using a sitemap helps Nbliq discover your important pages faster and understand when content was last updated.

A sitemap may include:

<url>
 <loc>https://example.com/page.html</loc>
 <lastmod>2026-05-20</lastmod>
 <changefreq>monthly</changefreq>
 <priority>0.8</priority>
</url>
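
If you generate your sitemap programmatically, entries like the one above can be produced with a short script. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap and the choice to emit only loc and lastmod are assumptions made for illustration, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap XML string from (absolute_url, lastmod) pairs.

    Hypothetical helper for illustration; lastmod is expected as YYYY-MM-DD.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([("https://example.com/page.html", "2026-05-20")]))
```

Each loc value should be a full absolute URL, including the https:// scheme.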

Crawl Frequency

Nbliq Crawler may visit websites periodically based on page importance, update frequency, website structure, and crawl permissions.

Frequently updated pages may be checked more often, while rarely changed pages may be crawled less frequently.

Nbliq tries to avoid unnecessary load on websites and may reduce crawl frequency if a website responds slowly or returns errors.

Blocking Nbliq from Indexing a Page

To prevent a page from appearing in Nbliq Search results, you can use a noindex robots meta tag.

Add this inside the <head> section of your page:

<meta name="robots" content="noindex">

To prevent NbliqBot specifically:

<meta name="NbliqBot" content="noindex">

You can also use HTTP headers:

X-Robots-Tag: noindex

Preventing Link Following

To prevent the crawler from following links on a page:

<meta name="robots" content="nofollow">

To prevent indexing and link following:

<meta name="robots" content="noindex, nofollow">
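
To confirm that a page actually serves the directives you intended, you can extract them from the HTML yourself. The sketch below uses Python's standard html.parser module; the class and function names are hypothetical, and it only illustrates how such tags can be read, not how Nbliq Crawler processes them internally.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="NbliqBot">."""

    def __init__(self, names=("robots", "nbliqbot")):
        super().__init__()
        self.names = names          # meta names to inspect, lowercase
        self.directives = set()     # e.g. {"noindex", "nofollow"}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in self.names:
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, nofollow"></head>'
    print(sorted(robots_directives(page)))
```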

Recommended Website Settings for Nbliq Search

To help Nbliq understand your website better, we recommend that you:

  • Keep your robots.txt file updated
  • Use clear page titles
  • Use proper H1 and H2 headings
  • Avoid duplicate pages
  • Use structured data where relevant
  • Avoid blocking important CSS or page content
  • Submit or maintain a valid sitemap
  • Add unique meta descriptions
  • Use canonical URLs
  • Add alt text for important images
  • Return proper HTTP status codes

HTTP Status Codes

Nbliq Crawler respects standard HTTP status codes.
Common examples:

Status code   Meaning
200           Page is available and may be indexed
301 / 308     Permanent redirect
302 / 307     Temporary redirect
404           Page not found
410           Page permanently removed
429           Too many requests
500           Server error
503           Service unavailable

If a page returns 404 or 410, Nbliq may remove it from search results over time.
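
The status codes listed above can be read as a simple decision table. The following sketch shows one way a crawler might map a response code to a follow-up action; the function name and the action labels are hypothetical, and treating 429, 500, and 503 as "retry later" is an assumption, not a description of Nbliq Crawler's actual behavior.

```python
def crawl_action(status: int) -> str:
    """Map an HTTP status code to an illustrative crawler action.

    The action names are hypothetical labels chosen for this sketch.
    """
    if status == 200:
        return "index"                      # page is available
    if status in (301, 308):
        return "follow_permanent_redirect"  # target replaces the old URL
    if status in (302, 307):
        return "follow_temporary_redirect"  # original URL stays canonical
    if status in (404, 410):
        return "remove_over_time"           # page is missing or gone
    if status in (429, 500, 503):
        return "retry_later"                # assumed: back off and retry
    return "unknown"


if __name__ == "__main__":
    for code in (200, 301, 302, 404, 410, 429, 503):
        print(code, crawl_action(code))
```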

Server Load and Crawl Rate

Nbliq Crawler is designed to crawl responsibly. If your website experiences high traffic from NbliqBot, you can limit access using robots.txt or server-level rules.

Example using crawl delay:

User-agent: NbliqBot
Crawl-delay: 10

Note: Crawl-delay support may vary depending on crawler configuration, but Nbliq aims to respect reasonable crawl rate instructions where possible.
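
To check that a Crawl-delay line is written in a form a parser can read, you can load the file with Python's standard urllib.robotparser module. This is a minimal sketch; the helper name crawl_delay_for is hypothetical, and a value read by the standard-library parser does not guarantee that any particular crawler will apply it.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt: str, agent: str = "NbliqBot"):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None.

    Hypothetical helper built on Python's standard robots.txt parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    rules = "User-agent: NbliqBot\nCrawl-delay: 10\n"
    print(crawl_delay_for(rules))
```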

Example robots.txt File

The following robots.txt allows NbliqBot on public pages while blocking private areas:

User-agent: NbliqBot
Disallow: /admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
Disallow: /user/
Disallow: /account/
Allow: /

Sitemap: https://example.com/sitemap.xml

Verifying Nbliq Crawler

You can identify Nbliq Crawler by checking your server access logs for the NbliqBot user-agent.
Example log entry:

GET /page.html HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)

For security, do not rely only on user-agent strings, because they can be copied by other clients. If needed, Nbliq may provide additional verification methods in the future.
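
To find requests that claim to come from Nbliq Crawler, you can scan your access logs for the NbliqBot token. The sketch below is a minimal example with hypothetical function names; it makes no assumption about your log format beyond the user-agent string appearing somewhere in each line, and a match only shows that a client claims to be NbliqBot, since the string can be copied by other clients.

```python
import re

# Match the product token as a whole word, with or without "/1.0" after it.
NBLIQ_TOKEN = re.compile(r"\bNbliqBot\b", re.IGNORECASE)


def claims_nbliqbot(text: str) -> bool:
    """Return True if a user-agent string or log line mentions NbliqBot."""
    return bool(NBLIQ_TOKEN.search(text))


def count_nbliq_requests(log_lines) -> int:
    """Count the log lines whose text contains the NbliqBot token."""
    return sum(1 for line in log_lines if claims_nbliqbot(line))


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)"
    print(claims_nbliqbot(ua))
```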

Contact Nbliq

If you are a website owner and have questions about Nbliq Crawler, crawl behavior, indexing, or removal requests, please contact us.

Email: [email protected]
Website: https://nbliq.com
Crawler Info: https://nbliq.com/crawler


Copyright © 2026, Airo Global Software (P) Ltd. All rights reserved.