OpenWebSpiderCS v0.1.4
- MySQL Connector/NET upgraded to 5.2.5.0
- Enhanced encoding support
- New feature: Support for META "robots" (NOINDEX, NOFOLLOW)
- New feature: New configuration file field: crawler_id
- New field "crawler_id" in table "hostlist"
- New table: crawler_act
- New feature: Remote actions over running crawlers [Status, Play, Pause, Kill]
- New file support: PDFs [Using PDFBox and IKVM]
- New table: pdf
- New file support: MP3s [Using UltraID3Lib]
- New table: mp3
- New feature: new command-line argument: --pdf
- New feature: new command-line argument: --mp3

OpenWebSpiderCS v0.1.3
- New feature: CRAWLER NAME and CRAWLER VERSION used in the User-Agent string in HTTP Requests
- New feature: New configuration file field: sql_hostlist_where
- New feature: new command-line argument: --keep-dup
- BUG: fixed the regex used to extract URLs from
- BUG: fixed in the function that extracts URLs
- BUG: fixed a bug in page.cs::normalizePage()
- BUG: fixed minor bugs
- BUG: fixed a bug in robots.txt's parser
- BUG: fixed a bug in page-rels handler

OpenWebSpiderCS v0.1.2
- BUG: fixed the regex used to extract URLs from (I)FRAME
- New feature: OpenWebSpider# can index images (new table: images)
- New feature: new command-line argument: --images
- Improved Stress-test facility: in stress-test mode, OpenWebSpider# no longer requires a configuration file or a MySQL Server, and it doesn't check robots.txt
- Timeout in execution of SQL queries set to 120 seconds (2 minutes)
- New feature: new configuration file fields: CRAWLER NAME and CRAWLER VERSION
- New feature: CRAWLER NAME used over robots.txt

OpenWebSpiderCS v0.1.1
- New feature: new command-line argument: --req-timeout
- New feature: new command-line argument: --stress-test
- BUG: fixed a bug in http.cs::getURL()
- New feature: preprocessing all pages by removing all unwanted characters