StrandVision Digital Signage

715-235-SIGN (7446) | www.StrandVision.com


SVDSBot

A friendly web information gathering system

What is it

SVDSBot is an automated robot system (sometimes also called a "spider") that visits web site pages to gather pieces of information, much like the robots used by major search engine companies.

The SVDSBot crawler identifies itself with a user-agent of the following form:
Mozilla/5.0 (compatible; SVDSBot/1.0; +http://www.strandvision.com/bot)

SVDSBot is designed to be distributed across several servers to improve performance and to scale as needed. To cut down on bandwidth usage, we also run many crawlers on servers located close, in network terms, to the sites they are indexing. Your logs may therefore show visits from a variety of sources, all with the user-agent SVDSBot. Our goal is to crawl one page from your site on each visit so that we do not overwhelm your server's bandwidth.

We care about your site's performance and will never hurt it!

SVDSBot is a very site-friendly crawler. We made it as "gentle" as possible: for most sites, our system should not access your site more than once every 5 minutes. If your site has a lot of information to gather from multiple pages, the rate may increase to at most one request every 3 seconds, or stay lower if your robots.txt file specifies a longer crawl delay. SVDSBot respects the rules you specify in your robots.txt file.

If any problems arise, they may be due to peculiarities of your particular site, or to a bug on another site linking to you. If you notice any problem with SVDSBot, please report it to strandadmin at StrandVision.com. We will quickly apply custom settings for your site so that crawling does not affect its performance.

Why is it crawling my site

The StrandVision Digital Signage system allows users to request information from other web sites. The SVDSBot works in the background to gather this information on a consistent and site-friendly basis.

As our system crawls your site, any new and updated information it discovers is added to the StrandVision Digital Signage cache, where it is available to all electronic signage users accessing that information. In most cases, this cache is not updated more than once every 15 minutes, and often it is updated only once per day. If the system determines that the information is not being used, it will stop crawling those pages of your site.

In some situations, SVDSBot will add other web sites and pages to the list of pages to crawl based on links (SRC and HREF) that it finds on each page. When it does, it also notes new sites, changes to existing sites, and dead links.

The crawler systems are engineered to be as friendly as possible. They limit the request rate to any specific site (SVDSBot doesn't make more than one hit per 3 seconds) and automatically back off if a site is down or slow.

For WebMasters

Blocking with robots.txt

Please note that SVDSBot collects only publicly available information that any visitor can access. If you think the crawler is collecting sensitive information, please remove that information from public access. SVDSBot is designed to be very polite and makes at most 1 hit per 3 seconds. You can easily slow SVDSBot, along with any other robot or crawler that takes directions from robots.txt, by using the robots.txt file that should be on your site.

With a robots.txt file you may block the SVDSBot crawler from parts or all of your site or slow it, as shown in the following examples:

Block specific parts of your site:
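
A minimal sketch, assuming SVDSBot matches the "SVDSBot" token from its user-agent string. The two directory names are hypothetical placeholders; substitute the paths you want to keep the crawler out of.

```text
# Hypothetical paths; replace with directories on your own site.
User-agent: SVDSBot
Disallow: /private/
Disallow: /cgi-bin/
```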

Block entire site:
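
A minimal sketch, again assuming the crawler matches the "SVDSBot" user-agent token. A Disallow rule of "/" covers every path on the host, while other crawlers are unaffected because the rule applies only to this user-agent group.

```text
# Applies only to SVDSBot; other robots are not affected.
User-agent: SVDSBot
Disallow: /
```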

Slow the crawler:
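
A minimal sketch, assuming SVDSBot reads the widely used (but non-standard) Crawl-delay directive as a number of seconds between requests. The value 60 is only an illustration; choose the delay that suits your server.

```text
User-agent: SVDSBot
# Assumed meaning: wait at least 60 seconds between requests.
Crawl-delay: 60
```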

Note: Once you've created your robots.txt file, there may be a delay of several minutes before SVDSBot discovers your changes. If our bot is still crawling content you've blocked in robots.txt, check that the file is in the correct location. It must be in the top directory of the server (for example, www.example.com/robots.txt); placing the file in a subdirectory won't have any effect.

For a general introduction to the robots.txt protocol, please see http://www.robotstxt.org. Please also see the Wikipedia article for more details and examples of robots.txt rules.

Contact us

All that said, we take seriously any request to stop crawling a site or parts of a site, as well as any other feedback on the crawler's operation, and we will act on it promptly and appropriately.

If this applies to you, please don't hesitate to contact us at strandadmin at StrandVision.com, and we will be happy to exclude your site or otherwise investigate immediately.

 
