OpenWebSpider XMas edition

I am very pleased to announce a new phantasmagorical version of OpenWebSpider!!!
Bazinga! To be honest, this is just an announcement that I’m working on it and that things are going quite well.
I don’t know when I’ll be able to make a release, but hopefully by January.

It all started with C, then moved to C#, and finally to JavaScript.
JavaScript??? Why JavaScript?
Well, lately I’ve worked a lot in JS, I’ve come to love Node.js, and I wanted to have fun with yet another rewrite of a project that would otherwise have fallen into oblivion.
And so… here’s to you (right now I can only show you some screenshots).

Merry Christmas dudes!

[Screenshot: OpenWebSpider mirrorjs UI]

[Screenshot: Active Workers]

[Screenshot: Workers history (log)]

[Screenshot: Search (inside the UI)]

[Screenshot: Pages Map (who links to and who is linked by)]

[Screenshot: Database settings]

What’s behind?
JavaScript, Node.js, mirrorjs and MySQL.

Special thanks go to “relane” of mirrorjs, who helped me a lot to understand the magic behind this cool new JavaScript framework, on which this version of OpenWebSpider is based.

News & OpenWebSpider | Shen139 | 24 Dec 2014 | No Comments

OpenWebSpider# v0.1.4

OpenWebSpider# v0.1.4 has been released, now with MP3 and PDF support! New tables have been added; refer to the Database Structure page to learn more.

This is the complete CHANGELOG:

  • MySQL Connector/NET upgraded to 5.2.5.0
  • Enhanced encodings support
  • New feature: Support for the “robots” META tag (NOINDEX, NOFOLLOW)
  • New feature: New configuration file field: crawler_id
  • New field “crawler_id” in table “hostlist”
  • New table: crawler_act
  • New feature: Remote actions over running crawlers [Status, Play, Pause, Kill]
  • New file support: PDFs [Using PDFBox and IKVM]
  • New table: pdf
  • New file support: MP3s [Using UltraID3Lib]
  • New table: mp3
  • New feature: new command-line argument: --pdf
  • New feature: new command-line argument: --mp3
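
As an illustration of the “robots” META support listed above, here is a minimal sketch of how a crawler might detect NOINDEX and NOFOLLOW in a page. It is written in JavaScript for convenience and is not OpenWebSpider#'s actual code; the function name `parseRobotsMeta` and the regex-based approach are assumptions made for this example.

```javascript
// Hypothetical helper, not taken from OpenWebSpider#.
// Reports whether a page asks crawlers not to index it or not to follow its links.
function parseRobotsMeta(html) {
  const result = { noindex: false, nofollow: false };
  // Collect every <meta ...> tag, then inspect its attributes one by one.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag);
    if (!name || name[1].toLowerCase() !== "robots") continue;
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (!content) continue;
    // Directives are comma-separated and case-insensitive.
    const directives = content[1].toLowerCase().split(",").map((d) => d.trim());
    if (directives.includes("noindex")) result.noindex = true;
    if (directives.includes("nofollow")) result.nofollow = true;
  }
  return result;
}

const flags = parseRobotsMeta('<meta name="robots" content="NOINDEX, NOFOLLOW">');
console.log(flags); // { noindex: true, nofollow: true }
```

A crawler would typically skip storing the page when `noindex` is set and skip queuing its links when `nofollow` is set. A real implementation would more likely use an HTML parser than regular expressions.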

Go to the DOWNLOAD page

OpenWebSpider# explained with 4 videos: Compile, Configure and RUN!

OpenWebSpider | Shen139 | 07 May 2009 | 5 Comments

Mono 2.4 has been released

Mono 2.4 has been released! The Mono Project aims to make developers productive and happy: Mono 2.4 is our gift to the world. Sponsored by Novell, the Mono open source project has an active and enthusiastic contributing community and is positioned to become the leading choice for development of Linux applications.

News & Release | Shen139 | 11 Apr 2009 | Comments Off

OpenWebSpider# v0.1.3

Released OpenWebSpider v0.1.3

CHANGELOG:

  • New feature: CRAWLER NAME and CRAWLER VERSION used in the User-Agent string in HTTP Requests
  • New feature: New configuration file field: sql_hostlist_where
  • New feature: new command-line argument: --keep-dup
  • BUG: fixed the regex used to extract URLs from <BASE>
  • BUG: fixed a bug in the function that extracts URLs
  • BUG: fixed a bug in page.cs::normalizePage()
  • BUG: fixed minor bugs
  • BUG: fixed a bug in robots.txt’s parser
  • BUG: fixed a bug in page-rels handler

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release | Shen139 | 05 Nov 2008 | 2 Comments

Mono 2.0 has been released

The Mono Project aims to make developers productive and happy: Mono 2.0 is our gift to the world.
Sponsored by Novell (http://www.novell.com), the Mono open source project has an active and enthusiastic contributing community and is positioned to become the leading choice for development of Linux applications.

Feature Highlights

  • Multi-Platform: Runs on Linux, OS X, BSD, and Microsoft Windows, including x86, x86-64, ARM, s390, PowerPC and much more
  • Multi-Language: Develop in C# 3.0 (including LINQ), VB 8, Java, Python, Ruby (http://www.ironruby.net/), Eiffel (http://www.eiffel.com/), F# (http://research.microsoft.com/fsharp/), Oxygene (http://remobjects.com/oxygene), and more
  • Based on ECMA Standards: Built on an implementation of the ECMA Common Language Infrastructure and C#
  • Microsoft Compatible API: Run ASP.NET, ADO.NET, and Windows.Forms 2.0 applications without recompilation
  • Open Source, Free Software: Mono’s runtime, compilers, and libraries are distributed under OSI approved licenses and are available for dual-licensing
  • Comprehensive Technology Coverage: Bindings and managed implementations of many popular libraries and protocols

News & Release | Shen139 | 07 Oct 2008 | Comments Off

OpenWebSpider# v0.1.2

Released OpenWebSpider v0.1.2

CHANGELOG:

  • BUG: fixed the regex used to extract URLs from (I)FRAME
  • New feature: OpenWebSpider# can index images (new table: images)
  • New feature: new command-line argument: --images
  • Improved stress-test facility: in stress-test mode, OpenWebSpider# no longer requires a configuration file or a MySQL server, and it doesn’t check robots.txt
  • Timeout in execution of SQL queries set to 120 seconds (2 minutes)
  • New feature: new configuration file fields: CRAWLER NAME and CRAWLER VERSION
  • New feature: CRAWLER NAME used over robots.txt
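
The last item presumably means that the configured crawler name is matched against the `User-agent` groups of robots.txt. The following JavaScript sketch shows one way such matching can work. It is not OpenWebSpider#'s parser: the function name `disallowedPaths`, the case-insensitive substring match, and the fallback to the `*` group are all assumptions made for this example.

```javascript
// Hypothetical helper, not taken from OpenWebSpider#.
// Returns the Disallow prefixes that apply to the given crawler name.
function disallowedPaths(robotsTxt, crawlerName) {
  const groups = []; // each group: { agents: [...], disallow: [...] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed", so it is skipped.
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  const name = crawlerName.toLowerCase();
  // Prefer groups that name this crawler; otherwise fall back to the "*" group.
  const specific = groups.filter((g) =>
    g.agents.some((a) => a !== "*" && a !== "" && name.includes(a))
  );
  const chosen = specific.length > 0 ? specific : groups.filter((g) => g.agents.includes("*"));
  return chosen.flatMap((g) => g.disallow);
}

const robots = [
  "User-agent: *",
  "Disallow: /private/",
  "",
  "User-agent: OpenWebSpider",
  "Disallow: /tmp/",
].join("\n");
console.log(disallowedPaths(robots, "OpenWebSpider")); // [ '/tmp/' ]
console.log(disallowedPaths(robots, "OtherBot"));      // [ '/private/' ]
```

With this approach, changing the crawler name in the configuration file also changes which robots.txt rules the crawler obeys.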

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

OpenWebSpider | Shen139 | 09 Sep 2008 | 24 Comments

OpenWebSpider# v0.1.1

Released OpenWebSpider v0.1.1

CHANGELOG:

  • New feature: new command-line argument: --req-timeout
  • New feature: new command-line argument: --stress-test
  • BUG: fixed a bug in http.cs::getURL()

[New features here: OpenWebSpider# v0.1 Command Line Arguments/Usage]

[ Read more about: Why C#? Why .NET Framework? ]

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release | Shen139 | 21 Aug 2008 | 22 Comments

OpenWebSpider# v0.1

Released the first public version of OpenWebSpider entirely written in C#
[ Read more about: Why C#? Why .NET Framework? ]

Source code and binary are available in the package: Download

Documentation of OpenWebSpider# v0.1

News & OpenWebSpider & Release | Shen139 | 29 Jul 2008 | 19 Comments
