<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: OpenWebSpider# v0.1</title>
	<atom:link href="http://www.openwebspider.org/2008/07/29/openwebspider-v01/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/</link>
	<description>Open Source Web Spider</description>
	<lastBuildDate>Tue, 19 May 2009 07:59:47 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: neddy</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-620</link>
		<dc:creator>neddy</dc:creator>
		<pubDate>Wed, 03 Dec 2008 05:51:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-620</guid>
		<description>webie, regarding Sphinx: I have been using OWS + Sphinx since about OWS v1.1 (sometime early last year) and it works great! If any of you need a hand configuring your Sphinx setup with OWS, flick me an email and I&#039;ll be happy to help.

Check out http://www.hfvs.net/ if you want to see a working search spidered by OWS and indexed by Sphinx. Also, the database of sites I use for spidering, at http://cyberdawn.w3dt.net/, contains about 90 million TLDs.</description>
		<content:encoded><![CDATA[<p>webie, regarding Sphinx: I have been using OWS + Sphinx since about OWS v1.1 (sometime early last year) and it works great! If any of you need a hand configuring your Sphinx setup with OWS, flick me an email and I&#8217;ll be happy to help.</p>
<p>Check out <a href="http://www.hfvs.net/" rel="nofollow">http://www.hfvs.net/</a> if you want to see a working search spidered by OWS and indexed by Sphinx. Also, the database of sites I use for spidering, at <a href="http://cyberdawn.w3dt.net/" rel="nofollow">http://cyberdawn.w3dt.net/</a>, contains about 90 million TLDs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-178</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Thu, 21 Aug 2008 13:38:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-178</guid>
		<description>Hi,
Call me Stefano! That&#039;s my name :-)

You have my email; send me your work whenever you like and I&#039;ll publish it!

;-)</description>
		<content:encoded><![CDATA[<p>Hi,<br />
Call me Stefano! That&#8217;s my name <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>You have my email; send me your work whenever you like and I&#8217;ll publish it!</p>
<p> <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webie</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-177</link>
		<dc:creator>webie</dc:creator>
		<pubDate>Thu, 21 Aug 2008 11:51:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-177</guid>
		<description>Hi Shen,

I have no problem with you adding the front-end source code to your project.

When I get time, I want to create some ready-made Smarty templates for the OWS front end; putting all this together would make a great free search engine project.

Regards

Darren

PS: Big thanks to you, Shen, for creating OWS! Without your skills and the time you give up, we could not follow our online adventures!</description>
		<content:encoded><![CDATA[<p>Hi Shen,</p>
<p>I have no problem with you adding the front-end source code to your project.</p>
<p>When I get time, I want to create some ready-made Smarty templates for the OWS front end; putting all this together would make a great free search engine project.</p>
<p>Regards</p>
<p>Darren</p>
<p>PS: Big thanks to you, Shen, for creating OWS! Without your skills and the time you give up, we could not follow our online adventures!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-175</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Thu, 21 Aug 2008 11:06:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-175</guid>
		<description>Hi,
I really like Sphinx!
It&#039;s a hundred times faster than the MySQL full-text index, and I recommend it to everyone who needs good performance over a big index!

I think http://www.linuxhostuk.co.uk/searchdemo really rocks :-) It&#039;s fast and cool!
I&#039;m interested in your front-end, I&#039;ll be very happy to publish it here!

Please contact me @ shen139 (at) openwebspider [.] org</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I really like Sphinx!<br />
It&#8217;s a hundred times faster than the MySQL full-text index, and I recommend it to everyone who needs good performance over a big index!</p>
<p>I think <a href="http://www.linuxhostuk.co.uk/searchdemo" rel="nofollow">http://www.linuxhostuk.co.uk/searchdemo</a> really rocks <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> It&#8217;s fast and cool!<br />
I&#8217;m interested in your front-end, I&#8217;ll be very happy to publish it here!</p>
<p>Please contact me @ shen139 (at) openwebspider [.] org</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webie</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-174</link>
		<dc:creator>webie</dc:creator>
		<pubDate>Thu, 21 Aug 2008 10:45:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-174</guid>
		<description>Hi Shen,

I forgot to say: I had a new front end made for OWS that uses Sphinx, and it makes a powerful alternative to the standard search. It&#039;s free (I was trying to give back). Would you like to take a look and maybe add it to the OWS site as an add-on for OWS users? The OWS &amp; Sphinx search demo is at www.linuxhostuk.co.uk/searchdemo; let me know your thoughts.

Regards

Darren</description>
		<content:encoded><![CDATA[<p>Hi Shen,</p>
<p>I forgot to say: I had a new front end made for OWS that uses Sphinx, and it makes a powerful alternative to the standard search. It&#8217;s free (I was trying to give back). Would you like to take a look and maybe add it to the OWS site as an add-on for OWS users? The OWS &amp; Sphinx search demo is at <a href="http://www.linuxhostuk.co.uk/searchdemo" rel="nofollow">http://www.linuxhostuk.co.uk/searchdemo</a>; let me know your thoughts.</p>
<p>Regards</p>
<p>Darren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webie</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-173</link>
		<dc:creator>webie</dc:creator>
		<pubDate>Thu, 21 Aug 2008 10:37:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-173</guid>
		<description>Hi Shen,

I thought I was going mad! Many thanks, I look forward to the fix.

Regards

Darren</description>
		<content:encoded><![CDATA[<p>Hi Shen,</p>
<p>I thought I was going mad! Many thanks, I look forward to the fix.</p>
<p>Regards</p>
<p>Darren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-170</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Thu, 21 Aug 2008 08:47:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-170</guid>
		<description>Hi,
the error can&#039;t be caused by www.fedora.org, because that site has a robots.txt that disallows everything. Please tell me the exact website that makes OWS stop working. (Please use -s (single host mode) to perform tests.)
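
A minimal sketch of why a disallow-everything robots.txt stops a polite crawler before it fetches any page. It is written in Python with the standard urllib.robotparser module, not taken from the OWS code. The robots.txt text and the agent name "OpenWebSpider" are assumptions for illustration; the real www.fedora.org file is not reproduced here.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="OpenWebSpider"):
    """Return True if robots_txt permits agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, like a downloaded robots.txt
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that disallows everything for every user agent
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(allowed(BLOCK_ALL, "http://www.fedora.org/"))          # False
print(allowed(BLOCK_ALL, "http://www.fedora.org/any/page"))  # False
```

With a file like that, every URL on the host is skipped, so a download error cannot come from that host.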

I&#039;ve tested the crawler (a version slightly modified from the public one) on &quot;download.opensuse.org&quot;, which serves a lot of unhandled content types (ISO, torrent, ...).

Command-line arguments:
--index http://download.opensuse.org/  --threads 6 --add-external -s

--[
...
T[0]            Downloading... [ http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-i386.iso ] [Depth Level: 1]
T[0]            Downloaded 0 Kb (0 bytes) in 4609 ms
T[0]            HTTP Status Code: 200 -][- Content-Type: application/x-iso9660-image
T[0]            Indexing...NOT INDEXED [0 ms ]

T[2]            Downloading... [ http://download.opensuse.org:80/distribution/11.0/iso/cd/openSUSE-11.0-GNOME-LiveCD-i386.iso ] [Depth Level: 1]
T[2]            Downloaded 0 Kb (0 bytes) in 53015 ms
T[2]            HTTP Status Code: 200 -][- Content-Type: application/octet-stream
T[2]            Indexing...NOT INDEXED [0 ms ]

T[2]            Downloading... [ http://download.opensuse.org:80/distribution/11.0/iso/torrent/openSUSE-11.0-DVD-i386.torrent ] [Depth Level: 1]
T[2]            Downloaded 0 Kb (0 bytes) in 56609 ms
T[2]            HTTP Status Code: 200 -][- Content-Type: application/x-bittorrent
T[2]            Indexing...NOT INDEXED [0 ms ]

T[2]            Downloading... [ http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-x86_64.iso.metalink ] [Depth Level: 2]
T[2]            Downloaded 0 Kb (0 bytes) in 953 ms
T[2]            HTTP Status Code: 200 -][- Content-Type: application/metalink+xml; charset=UTF-8
T[2]            Indexing...NOT INDEXED [0 ms ]
...
]--
everything seems to work well!!!
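
A rough Python sketch of the behaviour visible in the log above. It checks the Content-Type header first and reads the body only for types the indexer handles, which matches unhandled files showing "Downloaded 0 Kb" and "NOT INDEXED". The HANDLED_TYPES set and the helper names are hypothetical; the post does not say which types OWS actually indexes.

```python
import urllib.request

# Hypothetical list of media types the indexer can parse
HANDLED_TYPES = {"text/html", "text/plain"}


def media_type(content_type):
    """Drop parameters: 'text/html; charset=UTF-8' becomes 'text/html'."""
    return content_type.split(";", 1)[0].strip().lower()


def should_download(content_type):
    return media_type(content_type) in HANDLED_TYPES


def fetch_if_handled(url, timeout=60):
    """Return the page body, or None when the Content-Type is not handled."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if not should_download(resp.headers.get("Content-Type", "")):
            return None  # headers were read, but the body is never read
        return resp.read()


print(should_download("application/x-iso9660-image"))  # False
print(should_download("text/html; charset=UTF-8"))     # True
```

Closing the response unread means the client never pulls the body of an ISO image. A HEAD request would ask the server not to send the body at all, at the cost of one extra round trip per URL.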

(few minutes later)

oh s..t!
I&#039;ve tested the same website with OpenWebSpiderCS v0.1 and it doesn&#039;t work well :-)
I&#039;ll fix the code, add 2 new features and publish it!!!

Thanks for the suggestion!
;-)</description>
		<content:encoded><![CDATA[<p>Hi,<br />
the error can&#8217;t be caused by <a href="http://www.fedora.org" rel="nofollow">http://www.fedora.org</a>, because that site has a robots.txt that disallows everything. Please tell me the exact website that makes OWS stop working. (Please use -s (single host mode) to perform tests.)</p>
<p>I&#8217;ve tested the crawler (a version slightly modified from the public one) on &#8220;download.opensuse.org&#8221;, which serves a lot of unhandled content types (ISO, torrent, &#8230;).</p>
<p>Command-line arguments:<br />
--index <a href="http://download.opensuse.org/" rel="nofollow">http://download.opensuse.org/</a>  --threads 6 --add-external -s</p>
<p>--[<br />
...<br />
T[0]            Downloading... [ <a href="http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-i386.iso" rel="nofollow">http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-i386.iso</a> ] [Depth Level: 1]<br />
T[0]            Downloaded 0 Kb (0 bytes) in 4609 ms<br />
T[0]            HTTP Status Code: 200 -][- Content-Type: application/x-iso9660-image<br />
T[0]            Indexing...NOT INDEXED [0 ms ]</p>
<p>T[2]            Downloading... [ <a href="http://download.opensuse.org:80/distribution/11.0/iso/cd/openSUSE-11.0-GNOME-LiveCD-i386.iso" rel="nofollow">http://download.opensuse.org:80/distribution/11.0/iso/cd/openSUSE-11.0-GNOME-LiveCD-i386.iso</a> ] [Depth Level: 1]<br />
T[2]            Downloaded 0 Kb (0 bytes) in 53015 ms<br />
T[2]            HTTP Status Code: 200 -][- Content-Type: application/octet-stream<br />
T[2]            Indexing...NOT INDEXED [0 ms ]</p>
<p>T[2]            Downloading... [ <a href="http://download.opensuse.org:80/distribution/11.0/iso/torrent/openSUSE-11.0-DVD-i386.torrent" rel="nofollow">http://download.opensuse.org:80/distribution/11.0/iso/torrent/openSUSE-11.0-DVD-i386.torrent</a> ] [Depth Level: 1]<br />
T[2]            Downloaded 0 Kb (0 bytes) in 56609 ms<br />
T[2]            HTTP Status Code: 200 -][- Content-Type: application/x-bittorrent<br />
T[2]            Indexing...NOT INDEXED [0 ms ]</p>
<p>T[2]            Downloading... [ <a href="http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-x86_64.iso.metalink" rel="nofollow">http://download.opensuse.org:80/distribution/11.0/iso/dvd/openSUSE-11.0-DVD-x86_64.iso.metalink</a> ] [Depth Level: 2]<br />
T[2]            Downloaded 0 Kb (0 bytes) in 953 ms<br />
T[2]            HTTP Status Code: 200 -][- Content-Type: application/metalink+xml; charset=UTF-8<br />
T[2]            Indexing...NOT INDEXED [0 ms ]<br />
...<br />
]--<br />
everything seems to work well!!!</p>
<p>(few minutes later)</p>
<p>oh s..t!<br />
I&#8217;ve tested the same website with OpenWebSpiderCS v0.1 and it doesn&#8217;t work well <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
I&#8217;ll fix the code, add 2 new features and publish it!!!</p>
<p>Thanks for the suggestion!<br />
 <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webie</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-167</link>
		<dc:creator>webie</dc:creator>
		<pubDate>Wed, 20 Aug 2008 23:06:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-167</guid>
		<description>Hi Shen,

Going back to my post: when OWS sees a zip file etc., it gives me &quot;HTTP Status Code: 0 -] Error: Response received from server was null&quot;, then stops crawling, and every link returns the same error.

I have reduced the threads to only 6 and also set a crawl delay, but it still will not index after hitting a zip file, ISO file, etc.

These are my command-line arguments:

--index www.fedora.org  --threads 6 --add-external --crawl-delay 1

Maybe you can point me in the right direction for finding what may be causing this error.

Regards

Darren</description>
		<content:encoded><![CDATA[<p>Hi Shen,</p>
<p>Going back to my post: when OWS sees a zip file etc., it gives me &#8220;HTTP Status Code: 0 -] Error: Response received from server was null&#8221;, then stops crawling, and every link returns the same error.</p>
<p>I have reduced the threads to only 6 and also set a crawl delay, but it still will not index after hitting a zip file, ISO file, etc.</p>
<p>These are my command-line arguments:</p>
<p>--index <a href="http://www.fedora.org" rel="nofollow">http://www.fedora.org</a>  --threads 6 --add-external --crawl-delay 1</p>
<p>Maybe you can point me in the right direction for finding what may be causing this error.</p>
<p>Regards</p>
<p>Darren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-166</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Wed, 20 Aug 2008 09:20:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-166</guid>
		<description>OpenWebSpiderCS v0.1 has a hard-coded timeout of 60 seconds for each request.
If an HTTP request takes longer than that, the crawler ignores that page.
(I&#039;ve added to my TODO list a new argument for specifying that timeout on the command line, e.g. --timeout 180.)

Another possibility is that the web server is under load and can&#039;t serve pages within 60 seconds!
I suggest that you:
- use a crawl delay (--crawl-delay &lt;seconds[1-20]&gt;)
   the number of seconds between the download of a page and the next one
- use a lower number of threads
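
A minimal Python sketch of the two knobs described above, not the OpenWebSpiderCS code. It uses a fixed per-request timeout, where a page that times out or hits a network error is simply ignored, and a crawl delay of 1-20 seconds between one download and the next. The function names, and the choice to reject out-of-range delays, are assumptions.

```python
import time
import urllib.request

REQUEST_TIMEOUT = 60  # seconds, like the hard-coded OWS value


def check_crawl_delay(seconds):
    """Accept only 1-20 seconds, the range quoted for --crawl-delay."""
    if max(1, min(20, seconds)) != seconds:
        raise ValueError("crawl delay must be between 1 and 20 seconds")
    return seconds


def default_fetch(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls, crawl_delay, fetch=default_fetch, sleep=time.sleep):
    """Fetch urls one after another; timeouts and network errors map to None."""
    delay = check_crawl_delay(crawl_delay)
    pages = {}
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # wait between one download and the next
        try:
            pages[url] = fetch(url, REQUEST_TIMEOUT)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            pages[url] = None  # ignore the page and move on
    return pages
```

urllib applies the timeout to each blocking socket operation rather than to the whole transfer. A slow but steady download can therefore take longer than 60 seconds in total.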

When OWS hits files with an unhandled content-type, it doesn&#039;t download them at all, so I think your case is only a coincidence.</description>
		<content:encoded><![CDATA[<p>OpenWebSpiderCS v0.1 has a hard-coded timeout of 60 seconds for each request.<br />
If an HTTP request takes longer than that, the crawler ignores that page.<br />
(I&#8217;ve added to my TODO list a new argument for specifying that timeout on the command line, e.g. --timeout 180.)</p>
<p>Another possibility is that the web server is under load and can&#8217;t serve pages within 60 seconds!<br />
I suggest that you:<br />
- use a crawl delay (--crawl-delay &lt;seconds[1-20]&gt;)<br />
   the number of seconds between the download of a page and the next one<br />
- use a lower number of threads</p>
<p>When OWS hits files with an unhandled content-type, it doesn&#8217;t download them at all, so I think your case is only a coincidence.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/07/29/openwebspider-v01/comment-page-1/#comment-165</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Wed, 20 Aug 2008 08:51:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=37#comment-165</guid>
		<description>Hi,
Yes, OWS v0.7 is a C project.
I suggest compiling it with Microsoft Visual C++ (the Express Edition is free to download and use from the MS website).

&quot;openwebspider.vcproj&quot; is the project file for OWS v0.7

MonoDevelop is an IDE for building .NET projects under Linux!</description>
		<content:encoded><![CDATA[<p>Hi,<br />
Yes, OWS v0.7 is a C project.<br />
I suggest compiling it with Microsoft Visual C++ (the Express Edition is free to download and use from the MS website).</p>
<p>&#8220;openwebspider.vcproj&#8221; is the project file for OWS v0.7</p>
<p>MonoDevelop is an IDE for building .NET projects under Linux!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

