Command Line Arguments:

  • --index, -i [URL]
    Indexes the website specified with [URL]
  • --add-hostlist
    Doesn’t index the [URL] specified with “--index”; it simply adds the hostname to the list of hosts (hostlist), prints its ID and exits. This is useful if you want to use the power of the hostlist_extras table [Read more about: hostlist_extras regex] and you need the ID of a host, whether or not it is already present in the hosts table.
  • --threads, -t [1-100]
    Sets the number of threads
  • -s
    Single Mode: On (Default: Off). When Single Mode is on, the spider indexes the website specified with “--index” and then exits.
  • --cache
    Saves a copy of each indexed page (Default: no cache is saved)
  • --cache-compressed
    Saves a compressed copy of each indexed page (Default: no cache is saved)
  • --rels, -r [1,2]
    Saves the relationships between pages (Default: relationships are not saved). This can be really useful to generate a map of who links to whom and who is linked by whom.
    1: saves only hostnames (Example: www.example.com links to www.test.com; www.example.com links to www.domain.net)
    2: saves hostnames and pages (Example: www.example.com/index.html links to: www.example.com/download.html, www.example.com/test.html and www.test.com/docs.php; …)
    Let’s see an example of what you can do with this feature:
    http://lab.openwebspider.org/8like.php
  • --add-external, -e
    Adds external hosts (Default: external hosts are not added).
    If this option is not specified, all external hosts found in crawled pages are ignored by the crawler and won’t be indexed in the future.
  • --conf-file [filename]
    Sets a configuration file (Default: openwebspider.conf)
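
To make the difference between the two --rels levels concrete, here is a minimal Python sketch that takes page-level link pairs (the kind of data level 2 records) and collapses them into host-level pairs (the kind level 1 records). The function name `to_host_rels` and the in-memory list of tuples are hypothetical illustrations, not how OpenWebSpider stores relationships, and skipping links from a host to itself is an assumption based on the level 1 example above.

```python
from urllib.parse import urlsplit


def to_host_rels(page_rels):
    """Collapse (source_url, target_url) page links into unique
    (source_host, target_host) pairs, keeping first-seen order.

    Hypothetical helper illustrating --rels 2 (pages) vs --rels 1 (hostnames).
    """
    seen = set()
    host_rels = []
    for source, target in page_rels:
        pair = (urlsplit(source).hostname, urlsplit(target).hostname)
        # Assumption: a host linking to itself is not recorded at level 1.
        if pair[0] == pair[1] or pair in seen:
            continue
        seen.add(pair)
        host_rels.append(pair)
    return host_rels


# Page-level relationships, modeled on the --rels 2 example above.
page_rels = [
    ("http://www.example.com/index.html", "http://www.example.com/download.html"),
    ("http://www.example.com/index.html", "http://www.example.com/test.html"),
    ("http://www.example.com/index.html", "http://www.test.com/docs.php"),
    ("http://www.example.com/about.html", "http://www.test.com/"),
]

for source, target in to_host_rels(page_rels):
    print(source, "links to", target)
# www.example.com links to www.test.com
```

Four page-level links shrink to a single host-level link, which is why level 1 is the lighter choice when you only need a map between sites.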



Limits:

  • --max-depth, -m [0-1000]
    Sets the maximum depth level of the pages to index (Default: -1, index all pages)
    Depth Level = 0: index only the home page
    Depth Level = 1: index the home page and all pages directly linked from the home page
  • --max-pages, -l [1-1000000]
    Sets the maximum number of pages to index (per domain)
  • --max-seconds, -c [1-100000]
    Sets the maximum number of seconds to spend crawling (per domain)
  • --max-kb, -k [1-100000]
    Sets the maximum number of kilobytes to download (per domain)
  • --errors [1-1000]
    Sets the maximum number of HTTP errors allowed (per domain)
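
One way to picture --max-depth and --max-pages is a breadth-first traversal that tracks how many links away from the home page each page is. The Python sketch below walks a hypothetical in-memory link graph (`site`) instead of fetching pages over HTTP, never expands pages beyond the depth limit, treats -1 as "no limit" like the default above, and stops after `max_pages` pages for a single site. It assumes depth means the shortest link distance from the home page; the real crawler may define it differently.

```python
from collections import deque


def crawl_order(links, home, max_depth=-1, max_pages=None):
    """Return pages in breadth-first order, honoring depth and page limits.

    links: dict mapping a page to the pages it links to (hypothetical
    stand-in for downloading and parsing real pages).
    max_depth: -1 means no limit; 0 means the home page only.
    max_pages: optional cap on the number of pages returned.
    """
    visited = {home}
    queue = deque([(home, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if max_pages is not None and len(order) >= max_pages:
            break
        # Pages at the depth limit are indexed but their links are not followed.
        if max_depth != -1 and depth >= max_depth:
            continue
        for target in links.get(page, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return order


site = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": ["/e"],
}

print(crawl_order(site, "/", max_depth=0))  # ['/']
print(crawl_order(site, "/", max_depth=1))  # ['/', '/a', '/b']
print(crawl_order(site, "/"))               # ['/', '/a', '/b', '/c', '/d', '/e']
```

Breadth-first order matters here: it guarantees each page is first reached by its shortest path, so a page linked from both the home page and a deep page is counted at depth 1.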

Help:

  • --help, -h

New Features in OpenWebSpiderCS v0.1.1

  • --crawl-delay [seconds]
    Number of seconds to wait between downloading one page and the next (Default: 0 seconds)
  • --req-timeout [seconds]
    HTTP Request Timeout (in seconds) (Default: 60 seconds)
  • --stress-test [value]
    Downloads the same page (specified with --index) [value] times and exits
    Useful for stress-testing your web server
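
The three options above can be pictured together in one small loop. The Python sketch below downloads the same URL a given number of times (like --stress-test), sleeps between downloads (like --crawl-delay) and passes a per-request timeout (like --req-timeout), counting the HTTP status codes it receives. `stress_test` and `fetch_status` are hypothetical names, and whether OpenWebSpiderCS honors --crawl-delay during a stress test is an assumption, not something this page states. Point the real fetcher only at your own web server.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout):
    """Download url once and return its HTTP status code.

    Connection failures and timeouts are not caught and will propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def stress_test(url, times, crawl_delay=0.0, req_timeout=60.0, fetch=fetch_status):
    """Download the same url `times` times and count the status codes.

    crawl_delay: seconds to sleep between one download and the next.
    req_timeout: per-request timeout in seconds, passed to fetch.
    fetch: function (url, timeout) -> status code; replaceable for testing.
    """
    statuses = Counter()
    for i in range(times):
        if i > 0 and crawl_delay > 0:
            time.sleep(crawl_delay)
        statuses[fetch(url, req_timeout)] += 1
    return statuses


def always_ok(url, timeout):
    """Offline stand-in fetcher that never touches the network."""
    return 200


print(stress_test("http://www.example.com/", times=3, fetch=always_ok))
# Counter({200: 3})
```

Passing the fetcher as a parameter keeps the loop testable offline; for a real run, call `stress_test("http://localhost:8080/", times=100, crawl_delay=0.5, req_timeout=5)` against a server you control.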

New Features in OpenWebSpiderCS v0.1.2

  • −−images
    Indexes images
  • --stress-test [value]
    Now improved: in stress-test mode, OpenWebSpider# no longer requires a configuration file or a MySQL server, and it doesn’t check robots.txt
  • --no-index
    Doesn’t index crawled pages
    Useful when you only want to index images or to create a map of a website (using --rels)
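
Combining --images with --no-index means the crawler still walks pages but only the images are of interest. As a hedged illustration of the first step, the Python sketch below collects the image URLs referenced by `<img>` tags in a page's HTML and resolves relative paths against the page URL. `ImageCollector` and `find_images` are hypothetical names; how OpenWebSpiderCS actually discovers and indexes images is not documented on this page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.images.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


page = '<p><img src="/logo.png"><IMG SRC="pics/a.jpg" alt="a"><img alt="no src"></p>'
print(find_images(page, "http://www.example.com/docs/index.html"))
# ['http://www.example.com/logo.png', 'http://www.example.com/docs/pics/a.jpg']
```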

New Features in OpenWebSpiderCS v0.1.3

  • --keep-dup
    Doesn’t delete duplicate pages
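
This page doesn’t say how OpenWebSpiderCS decides that two pages are duplicates. As a hedged sketch, the Python below assumes that "duplicate" means byte-identical content and detects it with a SHA-256 digest of each page body, keeping the first copy it sees. `drop_duplicates` and its `keep_dup` flag (standing in for --keep-dup) are hypothetical names.

```python
import hashlib


def drop_duplicates(pages, keep_dup=False):
    """Return (url, body) pairs, dropping pages whose body is identical
    to an earlier page's body, unless keep_dup is True.

    Assumption: duplicates are byte-identical, detected via SHA-256.
    """
    if keep_dup:
        return list(pages)
    seen = set()
    unique = []
    for url, body in pages:
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((url, body))
    return unique


pages = [
    ("http://www.example.com/", b"<h1>Home</h1>"),
    ("http://www.example.com/index.html", b"<h1>Home</h1>"),
    ("http://www.example.com/about.html", b"<h1>About</h1>"),
]
print([url for url, _ in drop_duplicates(pages)])
# ['http://www.example.com/', 'http://www.example.com/about.html']
print(len(drop_duplicates(pages, keep_dup=True)))  # 3
```

Storing only digests keeps memory use small even for large crawls; a real crawler might also normalize whitespace or compare near-duplicates, which this sketch deliberately does not do.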

New Features in OpenWebSpiderCS v0.1.4

  • --pdf
    Indexes PDFs
  • --mp3
    Indexes MP3s