OpenWebSpider# v0.1.4

Released OpenWebSpider# v0.1.4, now with MP3 and PDF support! New tables have been added; please refer to this page to learn more: Database Structure

This is the complete CHANGELOG:

  • MySQL Connector/NET upgraded to 5.2.5.0
  • Enhanced encoding support
  • New feature: Support for META “robots” (NOINDEX, NOFOLLOW)
  • New feature: New configuration file field: crawler_id
  • New field “crawler_id” in table “hostlist”
  • New table: crawler_act
  • New feature: Remote actions over running crawlers [Status, Play, Pause, Kill]
  • New file support: PDFs [Using PDFBox and IKVM]
  • New table: pdf
  • New file support: MP3s [Using UltraID3Lib]
  • New table: mp3
  • New feature: new command-line argument: --pdf
  • New feature: new command-line argument: --mp3
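
To illustrate what META “robots” support involves, here is a minimal sketch in Python (not the project's C# code). It reads the robots META tag from a page and reports whether the page may be indexed and whether its links may be followed. The names RobotsMetaParser and robots_meta_policy are hypothetical and do not come from OpenWebSpider#; the actual implementation may behave differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated, case-insensitive list.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_policy(html):
    """Return (may_index, may_follow) for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow
```

A crawler built this way would skip storing a page when may_index is false and skip queueing its links when may_follow is false.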

Go to the DOWNLOAD page

OpenWebSpider# explained with 4 videos: Compile, Configure and RUN!

OpenWebSpider Shen139 07 May 2009 5 Comments

Mono 2.4 has been released


Mono 2.4 has been released! The Mono Project aims to make developers productive and happy: Mono 2.4 is our gift to the world. Sponsored by Novell, the Mono open source project has an active and enthusiastic contributing community and is positioned to become the leading choice for development of Linux applications.


News & Release Shen139 11 Apr 2009 Comments Off

OpenWebSpider# v0.1.3

Released OpenWebSpider v0.1.3

CHANGELOG:

  • New feature: CRAWLER NAME and CRAWLER VERSION used in the User-Agent string in HTTP Requests
  • New feature: New configuration file field: sql_hostlist_where
  • New feature: new command-line argument: --keep-dup
  • BUG: fixed the regex used to extract URLs from <BASE>
  • BUG: fixed a bug in the function that extracts URLs
  • BUG: fixed a bug in page.cs::normalizePage()
  • BUG: fixed minor bugs
  • BUG: fixed a bug in robots.txt’s parser
  • BUG: fixed a bug in page-rels handler
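
As an illustration of the first item in this changelog, the sketch below (in Python, not the project's C# code) builds a User-Agent value from a crawler name and version and attaches it to an HTTP request. The "name/version" format, the constant values and the function names are assumptions made for the example; the exact string OpenWebSpider# sends is not documented here.

```python
import urllib.request

# Hypothetical values; in OpenWebSpider# they come from the configuration file.
CRAWLER_NAME = "ExampleSpider"
CRAWLER_VERSION = "0.1.3"


def build_user_agent(name, version):
    """Compose a product/version token in the usual User-Agent style."""
    return f"{name}/{version}"


def build_request(url, name=CRAWLER_NAME, version=CRAWLER_VERSION):
    """Create (but do not send) a request carrying the crawler's User-Agent."""
    headers = {"User-Agent": build_user_agent(name, version)}
    return urllib.request.Request(url, headers=headers)
```

Identifying the crawler this way lets site owners recognise it in their logs and address it by name in robots.txt.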

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release Shen139 05 Nov 2008 2 Comments

Mono 2.0 has been released

The Mono Project aims to make developers productive and happy: Mono 2.0 is our gift to the world.
Sponsored by Novell (http://www.novell.com), the Mono open source project has an active and enthusiastic contributing community and is positioned to become the leading choice for development of Linux applications.

Feature Highlights

  • Multi-Platform: Runs on Linux, OS X, BSD, and Microsoft Windows, including x86, x86-64, ARM, s390, PowerPC and more
  • Multi-Language: Develop in C# 3.0 (including LINQ), VB 8, Java, Python, Ruby (http://www.ironruby.net/), Eiffel (http://www.eiffel.com/), F# (http://research.microsoft.com/fsharp/), Oxygene (http://remobjects.com/oxygene), and more
  • Based on ECMA Standards: Built on an implementation of the ECMA Common Language Infrastructure and C#
  • Microsoft Compatible API: Run ASP.NET, ADO.NET, and Windows.Forms 2.0 applications without recompilation
  • Open Source, Free Software: Mono’s runtime, compilers, and libraries are distributed under OSI approved licenses and are available for dual-licensing
  • Comprehensive Technology Coverage: Bindings and managed implementations of many popular libraries and protocols

News & Release Shen139 07 Oct 2008 Comments Off

OpenWebSpider# v0.1.2

Released OpenWebSpider v0.1.2

CHANGELOG:

  • BUG: fixed the regex used to extract URLs from (I)FRAME
  • New feature: OpenWebSpider# can index images (new table: images)
  • New feature: new command-line argument: --images
  • Improved stress-test facility: in stress-test mode, OpenWebSpider# no longer requires a configuration file or a MySQL server, and it does not check robots.txt
  • Timeout in execution of SQL queries set to 120 seconds (2 minutes)
  • New feature: new configuration file fields: CRAWLER NAME and CRAWLER VERSION
  • New feature: CRAWLER NAME used when processing robots.txt
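
One plausible reading of the last item is that the crawler's name is matched against the User-agent lines in robots.txt to pick the rules that apply to it. The sketch below shows that idea in Python using the standard library's urllib.robotparser; it is not the project's C# parser, and the crawler name "ExampleSpider", the sample rules and the can_crawl helper are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: one rule set for a named crawler, one for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleSpider
Disallow: /private/

User-agent: *
Disallow:
"""


def can_crawl(robots_txt, crawler_name, url):
    """Return True if crawler_name may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler_name, url)
```

With these sample rules, a crawler named ExampleSpider is kept out of /private/, while a crawler with any other name falls back to the "*" entry and is allowed everywhere.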

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

OpenWebSpider Shen139 09 Sep 2008 24 Comments

OpenWebSpider# v0.1.1

Released OpenWebSpider v0.1.1

CHANGELOG:

  • New feature: new command-line argument: --req-timeout
  • New feature: new command-line argument: --stress-test
  • BUG: fixed a bug in http.cs::getURL()

[New features here: OpenWebSpider# v0.1 Command Line Arguments/Usage]

[ Read more about: Why C#? Why .NET Framework? ]

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release Shen139 21 Aug 2008 22 Comments

OpenWebSpider# v0.1

Released the first public version of OpenWebSpider entirely written in C#
[ Read more about: Why C#? Why .NET Framework? ]

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release Shen139 29 Jul 2008 19 Comments

OpenWebSpider v0.7

Released OpenWebSpider v0.7 Source Code + Win32 Binary [Download]

News & Release Shen139 21 Jul 2008 Comments Off
