Nbliq Crawler

About Nbliq Crawler

Nbliq Crawler is the web crawling system used by Nbliq Search to discover, fetch, and index publicly available web pages. It helps Nbliq understand website content and make relevant pages available in Nbliq Search results.

The crawler visits websites, reads publicly accessible pages, follows allowed links, and stores useful information such as page title, description, headings, text content, images with alt text, and other metadata.

Nbliq Crawler is designed to respect website rules, including robots.txt, crawl permissions, and standard web crawling best practices.

Why Nbliq Crawls Websites

Nbliq crawls websites to improve search discovery and provide users with accurate, fresh, and useful results.

The crawler may visit your website to:

Discover new pages
Update existing indexed pages
Detect changed or removed content
Understand page titles, headings, and descriptions
Improve search result quality
Identify useful public content for AI-powered search answers

Nbliq crawls only publicly available pages and honors any access restrictions set by the website owner.

Nbliq Crawler User-Agent

Website owners can identify Nbliq Crawler by its user-agent token: NbliqBot

Recommended full user-agent format:


    Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)

You may see requests from Nbliq Crawler in your server logs with this user-agent.
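
As an illustration, a site owner could recognize this format in application code. The following is a minimal sketch, not an official Nbliq tool: the function name and the regular expression are assumptions built only from the user-agent string shown above, and the version number may differ in practice.

```python
import re

# Pattern derived from the user-agent string documented above.
# The group names are illustrative; the version may change over time.
NBLIQ_UA = re.compile(
    r"\(compatible;\s*NbliqBot/(?P<version>[\d.]+);\s*\+(?P<info_url>\S+?)\)"
)


def parse_nbliq_user_agent(user_agent: str):
    """Return (version, info_url) if the string matches the NbliqBot format, else None."""
    match = NBLIQ_UA.search(user_agent)
    if match is None:
        return None
    return match.group("version"), match.group("info_url")
```

A match only shows that the client claims to be NbliqBot; the string itself can be sent by any client.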

How Nbliq Crawler Works

Nbliq Crawler works by fetching public web pages and analyzing their content for search indexing.

The crawler may process:

HTML pages
Meta descriptions
Headings such as H1, H2, and H3
Public images with alt text
Sitemap URLs
Page titles
Canonical URLs
Internal links
Structured data where available
Last modified information

Nbliq may use sitemap files when available to better understand your website structure.

Example sitemap location:
https://example.com/sitemap.xml

Controlling Nbliq Crawler with robots.txt

Website owners can control Nbliq Crawler using the standard robots.txt file.

To allow Nbliq Crawler:


    User-agent: NbliqBot
    Allow: /

To block Nbliq Crawler completely:


    User-agent: NbliqBot
    Disallow: /

To block specific folders:


    User-agent: NbliqBot
    Disallow: /private/
    Disallow: /admin/

To allow all crawlers including NbliqBot:


    User-agent: *
    Allow: /

Your robots.txt file should be placed at:
https://example.com/robots.txt
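
Before publishing rules like these, it can help to check them locally. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; it approximates how a compliant crawler reads the file and is not Nbliq's own parser. The helper name `is_allowed` and the sample URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The "block specific folders" example from this section.
ROBOTS_TXT = """\
User-agent: NbliqBot
Disallow: /private/
Disallow: /admin/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "NbliqBot") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, `is_allowed(ROBOTS_TXT, "https://example.com/private/report.html")` returns `False`, while a URL under `/blog/` returns `True`.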

Sitemap Support

Nbliq Crawler supports sitemap discovery through robots.txt and common sitemap locations.

Example:
Sitemap: https://example.com/sitemap.xml

Using a sitemap helps Nbliq discover your important pages faster and understand when content was last updated.

A sitemap entry may include:

<url>
  <loc>https://example.com/page.html</loc>
  <lastmod>2026-05-20</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
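
A sitemap in this shape can be generated from a list of pages. The following is a minimal sketch using Python's standard library; the function name `build_sitemap` is an assumption, the namespace is the one defined by the sitemaps.org protocol, and the optional `changefreq` and `priority` fields are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (absolute_url, lastmod) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # must be a full URL with scheme
        ET.SubElement(url, "lastmod").text = lastmod  # date in YYYY-MM-DD form
    return ET.tostring(urlset, encoding="unicode")
```

Each `loc` value should be an absolute URL including the scheme, such as `https://example.com/page.html`.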

Crawl Frequency

Nbliq Crawler may visit websites periodically based on page importance, update frequency, website structure, and crawl permissions.

Frequently updated pages may be checked more often, while rarely changed pages may be crawled less frequently.

Nbliq tries to avoid unnecessary load on websites and may reduce crawl frequency if a website responds slowly or returns errors.
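
To make the idea concrete, here is a hypothetical sketch of how a polite crawler might adjust its delay between requests. It is not Nbliq's actual scheduling logic: the function name, the doubling and halving policy, and every threshold below are assumptions chosen for illustration.

```python
def next_delay(current_delay, status, response_seconds,
               slow_threshold=2.0, min_delay=1.0, max_delay=3600.0):
    """Return the delay (seconds) to wait before the next request to a site.

    All parameters and defaults are illustrative assumptions.
    """
    overloaded = status == 429 or status >= 500
    slow = response_seconds > slow_threshold
    if overloaded or slow:
        # Back off: double the delay, up to a ceiling.
        return min(current_delay * 2, max_delay)
    # Healthy response: ease back toward the minimum delay.
    return max(current_delay / 2, min_delay)
```

Under these assumptions, a 503 response or a slow page doubles the wait, and a run of fast 200 responses gradually restores the normal rate.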

Blocking Nbliq from Indexing a Page

To prevent a page from appearing in Nbliq Search results, you can use a noindex robots meta tag.

Add this inside the <head> section of your page:


    <meta name="robots" content="noindex">

To block only NbliqBot from indexing the page:

    <meta name="NbliqBot" content="noindex">

You can also use HTTP headers:


    X-Robots-Tag: noindex

Preventing Link Following

To prevent the crawler from following links on a page:


    <meta name="robots" content="nofollow">

To prevent both indexing and link following:


    <meta name="robots" content="noindex, nofollow">
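
To check which directives a page actually exposes, the meta tags and the X-Robots-Tag header can be read programmatically. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it shows how a crawler could collect the directives, not how Nbliq necessarily does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="NbliqBot"> tags."""

    def __init__(self, bot_name="NbliqBot"):
        super().__init__()
        self.names = {"robots", bot_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in self.names:
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def page_directives(html, x_robots_tag=""):
    """Return the combined set of directives from the HTML and the header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {p.strip().lower() for p in x_robots_tag.split(",") if p.strip()}
    return parser.directives | header
```

For a page containing the combined tag above, `page_directives` returns a set with both `noindex` and `nofollow`.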

Recommended Website Settings for Nbliq Search

To help Nbliq understand your website better, we recommend:

Keep your robots.txt file updated
Use clear page titles
Use proper H1 and H2 headings
Avoid duplicate pages
Use structured data where relevant
Avoid blocking important CSS or page content
Submit or maintain a valid sitemap
Add unique meta descriptions
Use canonical URLs
Add alt text for important images
Return proper HTTP status codes

HTTP Status Codes

Nbliq Crawler respects standard HTTP status codes.
Common examples:

STATUS CODE	MEANING
200	Page is available and may be indexed
301 / 308	Permanent redirect
302 / 307	Temporary redirect
404	Page not found
410	Page permanently removed
429	Too many requests
500	Server error
503	Service unavailable

If a page returns 404 or 410, Nbliq may remove it from search results over time.
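
On the website side, returning proper status codes mostly means distinguishing moved, removed, and unknown pages. The sketch below shows one way to pick a status for a request path; the path tables and the function name `status_for` are hypothetical examples, not part of any Nbliq API.

```python
from http import HTTPStatus

# Hypothetical site data, for illustration only.
MOVED = {"/old-page.html": "/new-page.html"}  # permanently moved pages
REMOVED = {"/discontinued.html"}              # intentionally deleted pages
PAGES = {"/", "/new-page.html"}               # pages that currently exist


def status_for(path):
    """Return (status, headers) a server could send for the given path."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    if path in REMOVED:
        return HTTPStatus.GONE, {}
    if path in PAGES:
        return HTTPStatus.OK, {}
    return HTTPStatus.NOT_FOUND, {}
```

Here a permanently moved page answers with 301 and a `Location` header, a deliberately removed page with 410, and anything unknown with 404.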

Server Load and Crawl Rate

Nbliq Crawler is designed to crawl responsibly. If your website experiences high traffic from NbliqBot, you can limit access using robots.txt or server-level rules.

Example using crawl delay:


    User-agent: NbliqBot
    Crawl-delay: 10

Note: Crawl-delay support may vary depending on crawler configuration, but Nbliq aims to respect reasonable crawl rate instructions where possible.
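
To confirm that a Crawl-delay line is written so a parser can read it, the value can be extracted with Python's standard `urllib.robotparser` (Python 3.6 or later). This is a sketch of a local check only; the helper name is an assumption, and it says nothing about how Nbliq itself interprets the value.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt: str, agent: str = "NbliqBot"):
    """Return the Crawl-delay in seconds declared for `agent`, or None if absent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)
```

For the example above, `crawl_delay_for("User-agent: NbliqBot\nCrawl-delay: 10\n")` returns `10`.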

Recommended robots.txt Example


    User-agent: NbliqBot
    Allow: /
    Disallow: /admin/
    Disallow: /login/
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /user/
    Disallow: /account/

    Sitemap: https://example.com/sitemap.xml
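
A file that mixes Allow and Disallow lines depends on how the crawler resolves overlapping rules. This page does not state which precedence NbliqBot applies; the sketch below assumes the longest-match rule from RFC 9309 (the most specific path wins, and Allow wins a tie) and handles plain path prefixes only, without wildcards. The function name and rule list are illustrative.

```python
# Rules from the example above, as (kind, path_prefix) pairs.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("disallow", "/login/"),
    ("disallow", "/cart/"),
    ("disallow", "/checkout/"),
    ("disallow", "/user/"),
    ("disallow", "/account/"),
]


def is_path_allowed(path, rules=RULES):
    """Apply longest-match precedence (assumed, per RFC 9309) to a URL path."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        more_specific = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and kind == "allow"
        if more_specific or tie_prefers_allow:
            best_len, allowed = len(prefix), kind == "allow"
    return allowed
```

Under this assumption `/admin/users` is blocked and `/blog/post` is allowed. Some parsers apply the first matching rule instead, in which case a leading `Allow: /` would shadow the Disallow lines, so listing the Disallow lines first is the safer ordering.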

Identifying Nbliq Crawler in Server Logs

You can identify Nbliq Crawler by checking your server access logs for the NbliqBot user-agent.
Example log entry:


    GET /page.html HTTP/1.1
    User-Agent: Mozilla/5.0 (compatible; NbliqBot/1.0; +https://nbliq.com/crawler)

For security, do not rely only on user-agent strings, because they can be copied by other clients. If needed, Nbliq may provide additional verification methods in the future.
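
For a quick view of crawler activity, access logs can be scanned for the NbliqBot token. The sketch below assumes the common "combined" log format used by many web servers, where the user-agent is the last quoted field; the function name and regular expression are illustrative and may need adjusting for your server.

```python
import re
from collections import Counter

# Request line and trailing user-agent field of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" .* "(?P<agent>[^"]*)"$'
)


def nbliqbot_hits(lines):
    """Count requests per path whose user-agent contains the NbliqBot token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and "NbliqBot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits
```

The counts reflect only the claimed user-agent, so they are a traffic estimate and not proof that the requests came from Nbliq.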

Contact Nbliq

If you are a website owner and have questions about Nbliq Crawler, crawl behavior, indexing, or removal requests, please contact us.

Email: [email protected]
Website: https://nbliq.com
Crawler Info: https://nbliq.com/crawler