Released OpenWebSpider v0.1.3

CHANGELOG:

  • New feature: CRAWLER NAME and CRAWLER VERSION are now used in the User-Agent string of HTTP requests
  • New feature: new configuration file field: sql_hostlist_where
  • New feature: new command-line argument: --keep-dup
  • BUG: fixed the regex used to extract URLs from <BASE>
  • BUG: fixed a bug in the function that extracts URLs
  • BUG: fixed a bug in page.cs::normalizePage()
  • BUG: fixed minor bugs
  • BUG: fixed a bug in the robots.txt parser
  • BUG: fixed a bug in the page-rels handler
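
To make two of the items above more concrete, here is a minimal sketch in Python (not the project's C# source) of how a crawler name and version might be combined into a User-Agent string, and how a <BASE> URL might be pulled out of a page with a regex. The function names, the "name/version" format and the regex itself are assumptions made for illustration; the actual OpenWebSpider code may differ.

```python
import re
from urllib.parse import urljoin

# Hypothetical values; in a real crawler these would come from its configuration.
CRAWLER_NAME = "OpenWebSpider"
CRAWLER_VERSION = "0.1.3"


def build_user_agent(name, version):
    """Return a 'name/version' product token for the User-Agent header (assumed format)."""
    return f"{name}/{version}"


# Illustrative pattern, not the project's actual regex: finds the href of a <base> tag,
# case-insensitively, with or without quotes around the value.
BASE_HREF_RE = re.compile(
    r"""<base\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)""",
    re.IGNORECASE,
)


def extract_base_url(html, page_url):
    """Return the URL that relative links on the page should be resolved against."""
    match = BASE_HREF_RE.search(html)
    if match is None:
        # No <base> tag: relative links resolve against the page's own URL.
        return page_url
    # A <base> href may itself be relative, so resolve it against the page URL.
    return urljoin(page_url, match.group(1))
```

A crawler could send build_user_agent(CRAWLER_NAME, CRAWLER_VERSION) as the User-Agent header of each request and pass the result of extract_base_url() to urljoin() when resolving the links it finds on a page.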

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1
