<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: OpenWebSpider# v0.1.2</title>
	<atom:link href="http://www.openwebspider.org/2008/09/09/openwebspider-v012/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/</link>
	<description>Open Source Web Spider</description>
	<lastBuildDate>Tue, 19 May 2009 07:59:47 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: SHL</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-277</link>
		<dc:creator>SHL</dc:creator>
		<pubDate>Sat, 01 Nov 2008 19:47:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-277</guid>
		<description>[cite=shen139]
OWS is structured so that it uses the host_id (the ID of the current domain) for many tasks, so using a regex or any other trick would destroy the fundamentals of its core.
[/cite]

Ok, then I understand why it&#039;s not an easy solution.

[cite=shen139]
I’m planning a new version of OWS with this feature… 
[/cite]

Looking forward to that!

Thanks for your replies!

Kind Regards
// Samuel</description>
		<content:encoded><![CDATA[<p>[cite=shen139]<br />
OWS is structured so that it uses the host_id (the ID of the current domain) for many tasks, so using a regex or any other trick would destroy the fundamentals of its core.<br />
[/cite]</p>
<p>Ok, then I understand why it&#8217;s not an easy solution.</p>
<p>[cite=shen139]<br />
I’m planning a new version of OWS with this feature…<br />
[/cite]</p>
<p>Looking forward to that!</p>
<p>Thanks for your replies!</p>
<p>Kind Regards<br />
// Samuel</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bernd</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-268</link>
		<dc:creator>bernd</dc:creator>
		<pubDate>Wed, 29 Oct 2008 13:16:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-268</guid>
		<description>Many thanks. I have sent you an email.</description>
		<content:encoded><![CDATA[<p>Many thanks. I have sent you an email.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-266</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Wed, 29 Oct 2008 10:11:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-266</guid>
		<description>Hi bernd,
if a domain has status = 1, it means that the domain has been indexed.
OWS inserts new domains with status = 0; status = 1 -&gt; indexed; status = 2 -&gt; indexing!

If you have 0 indexed_pages per domain, it means that OWS has not been able to index those sites, probably because there was an error somewhere.
Maybe OWS had an error indexing the first domain and then wasn&#039;t able to index the rest of the domains.

I don&#039;t speak German and I don&#039;t know what “Der Wartezustand wurde aufgrund eines abgebrochenen Mutex beendet?” is! That error message isn&#039;t inside OWS itself but in your Framework.

I can read Mutex... so I guess that you had a problem with mutexes.

I can only suggest that you wait for the next version (many bug fixes in that version) or email me and I&#039;ll send you a copy.

Stefano</description>
		<content:encoded><![CDATA[<p>Hi bernd,<br />
if a domain has status = 1, it means that the domain has been indexed.<br />
OWS inserts new domains with status = 0; status = 1 -> indexed; status = 2 -> indexing!</p>
<p>If you have 0 indexed_pages per domain, it means that OWS has not been able to index those sites, probably because there was an error somewhere.<br />
Maybe OWS had an error indexing the first domain and then wasn&#8217;t able to index the rest of the domains.</p>
<p>I don&#8217;t speak German and I don&#8217;t know what “Der Wartezustand wurde aufgrund eines abgebrochenen Mutex beendet?” is! That error message isn&#8217;t inside OWS itself but in your Framework.</p>
<p>I can read Mutex&#8230; so I guess that you had a problem with mutexes.</p>
<p>I can only suggest that you wait for the next version (many bug fixes in that version) or email me and I&#8217;ll send you a copy.</p>
<p>Stefano</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bernd</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-264</link>
		<dc:creator>bernd</dc:creator>
		<pubDate>Wed, 29 Oct 2008 05:23:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-264</guid>
		<description>What does &quot;Der Wartezustand wurde aufgrund eines abgebrochenen Mutex beendet&quot; mean?

On any domains I have this error.</description>
		<content:encoded><![CDATA[<p>What does &#8220;Der Wartezustand wurde aufgrund eines abgebrochenen Mutex beendet&#8221; mean?</p>
<p>On any domains I have this error.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bernd</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-262</link>
		<dc:creator>bernd</dc:creator>
		<pubDate>Tue, 28 Oct 2008 18:30:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-262</guid>
		<description>Perhaps my English is too bad. I don&#039;t know.

&quot;If all the domains in hostlist are indexed OWS won’t have anything to do.&quot;

My hostlist contains 15 domains with status 1 but 0 indexed_pages. When will the spider crawl these? These domains were found during a crawl, but the spider only inserted them and did not index them.

&quot;If example.de contains only: example1.de and example2.de OWS will only index these websites, no others! OK?&quot;

OK, but what if example1.de contains example1A.de?
example.de
     example1.de
         example1A.de

Does the spider crawl example.de, example1.de and example1A.de?


Many thanks</description>
		<content:encoded><![CDATA[<p>Perhaps my English is too bad. I don&#8217;t know.</p>
<p>&#8220;If all the domains in hostlist are indexed OWS won’t have anything to do.&#8221;</p>
<p>My hostlist contains 15 domains with status 1 but 0 indexed_pages. When will the spider crawl these? These domains were found during a crawl, but the spider only inserted them and did not index them.</p>
<p>&#8220;If example.de contains only: example1.de and example2.de OWS will only index these websites, no others! OK?&#8221;</p>
<p>OK, but what if example1.de contains example1A.de?<br />
example.de<br />
     example1.de<br />
         example1A.de</p>
<p>Does the spider crawl example.de, example1.de and example1A.de?</p>
<p>Many thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-259</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Tue, 28 Oct 2008 13:41:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-259</guid>
		<description>With --add-external, OWS will add all hosts other than the current one to the hostlist table!
If example.de contains only: example1.de and example2.de OWS will only index these websites, no others! OK?

What uncrawled domain? What tells OWS what to crawl? (The hostlist table.)
If all the domains in hostlist are indexed OWS won&#039;t have anything to do.
You should set the status of the domains to 0,
or run OWS on each domain with: openwebspider --index example.de, then: openwebspider --index example1.de and openwebspider --index example2.de

Do you understand?
(Maybe I misunderstood)</description>
		<content:encoded><![CDATA[<p>With --add-external, OWS will add all hosts other than the current one to the hostlist table!<br />
If example.de contains only: example1.de and example2.de OWS will only index these websites, no others! OK?</p>
<p>What uncrawled domain? What tells OWS what to crawl? (The hostlist table.)<br />
If all the domains in hostlist are indexed OWS won&#8217;t have anything to do.<br />
You should set the status of the domains to 0,<br />
or run OWS on each domain with: openwebspider --index example.de, then: openwebspider --index example1.de and openwebspider --index example2.de</p>
<p>Do you understand?<br />
(Maybe I misunderstood)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bernd</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-258</link>
		<dc:creator>bernd</dc:creator>
		<pubDate>Tue, 28 Oct 2008 13:28:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-258</guid>
		<description>Thanks for the reply. I have one more question. The spider finds URLs on a host and inserts them into the database, but not every domain gets crawled by the spider. Why?

I used --add-external.

Example:
spider --&gt; example.de (finds external example1.de and example2.de); all internal links are spidered.

In the database, the hostlist table now contains:
23  	example.de  	80  	1  	2008-10-28  	45
24  	example1.de  	80  	1  	2008-10-28  	0
25  	example2.de  	80  	1  	2008-10-28  	0

After the crawl, the spider says that no more domains (or hosts?) were found, etc.
bye bye

Could you add a function so that the spider crawls an uncrawled domain from the database if it finds no link on the currently spidered site?

Sorry for my bad English.</description>
		<content:encoded><![CDATA[<p>Thanks for the reply. I have one more question. The spider finds URLs on a host and inserts them into the database, but not every domain gets crawled by the spider. Why?</p>
<p>I used --add-external.</p>
<p>Example:<br />
spider --&gt; example.de (finds external example1.de and example2.de); all internal links are spidered.</p>
<p>In the database, the hostlist table now contains:<br />
23  	example.de  	80  	1  	2008-10-28  	45<br />
24  	example1.de  	80  	1  	2008-10-28  	0<br />
25  	example2.de  	80  	1  	2008-10-28  	0</p>
<p>After the crawl, the spider says that no more domains (or hosts?) were found, etc.<br />
bye bye</p>
<p>Could you add a function so that the spider crawls an uncrawled domain from the database if it finds no link on the currently spidered site?</p>
<p>Sorry for my bad English.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-257</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Tue, 28 Oct 2008 09:38:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-257</guid>
		<description>Hi,
Thanks (for the cool spider)!

This is a known bug, fixed in OpenWebSpider v0.1.3.

I&#039;ll release it as soon as possible (I hope within this week). It fixes many bugs.

I&#039;ve tested your command line arguments with the new version and they work fine!

Stefano</description>
		<content:encoded><![CDATA[<p>Hi,<br />
Thanks (for the cool spider)!</p>
<p>This is a known bug, fixed in OpenWebSpider v0.1.3.</p>
<p>I&#8217;ll release it as soon as possible (I hope within this week). It fixes many bugs.</p>
<p>I&#8217;ve tested your command line arguments with the new version and they work fine!</p>
<p>Stefano</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bernd</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-256</link>
		<dc:creator>bernd</dc:creator>
		<pubDate>Mon, 27 Oct 2008 17:38:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-256</guid>
		<description>Hello, 

this is a cool Spider ;)

I&#039;ve tested it, but I get an error message: Getting Urls...Error: Der Index war außerhalb des Arraybereiches. (In English: Index was outside the bounds of the array.)

I tested with this command line:
--index one3p.de -t 10 -r 1 --crawl-delay 5 --req-timeout 30 -l 3000 -s

Another URL works fine, but this one does not (http://one3p.de). What&#039;s wrong?

many thanks</description>
		<content:encoded><![CDATA[<p>Hello, </p>
<p>this is a cool Spider <img src='http://www.openwebspider.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>I&#8217;ve tested it, but I get an error message: Getting Urls&#8230;Error: Der Index war außerhalb des Arraybereiches. (In English: Index was outside the bounds of the array.)</p>
<p>I tested with this command line:<br />
--index one3p.de -t 10 -r 1 --crawl-delay 5 --req-timeout 30 -l 3000 -s</p>
<p>Another URL works fine, but this one does not (<a href="http://one3p.de" rel="nofollow">http://one3p.de</a>). What&#8217;s wrong?</p>
<p>many thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shen139</title>
		<link>http://www.openwebspider.org/2008/09/09/openwebspider-v012/comment-page-1/#comment-253</link>
		<dc:creator>Shen139</dc:creator>
		<pubDate>Wed, 22 Oct 2008 14:35:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.openwebspider.org/?p=62#comment-253</guid>
		<description>Hi,
you are right, a regex would solve the problem, but OWS is structured so that it uses the host_id (the ID of the current domain) for many tasks, so using a regex or any other trick would destroy the fundamentals of its core.

www.domain.com is different from domain.com! Many web servers could serve different pages for a domain with and without WWW. Do you understand? So it&#039;s not simple to solve it your way!

I&#039;m planning a new version of OWS with this feature... OWS 0.7 had a command-line argument:
--[
-F (Free Indexing mode)

      This is a new feature in OpenWebSpider v0.6! Free indexing mode means that
      the web spider will index pages as it encounters them!
      For example, if we have a web site with the following structure:

                home_page
              /      &#124;    \
           link1   link2  http://www.example.com/linkX

      the web spider will index (in order): home_page, link1, link2 and http://www.example.com/linkX
      Whereas without this argument the web spider will index only: home_page, link1 and link2
]--
but it wasn&#039;t safe to use and created many problems!</description>
		<content:encoded><![CDATA[<p>Hi,<br />
you are right, a regex would solve the problem, but OWS is structured so that it uses the host_id (the ID of the current domain) for many tasks, so using a regex or any other trick would destroy the fundamentals of its core.</p>
<p><a href="http://www.domain.com" rel="nofollow">http://www.domain.com</a> is different from domain.com! Many web servers could serve different pages for a domain with and without WWW. Do you understand? So it&#8217;s not simple to solve it your way!</p>
<p>I&#8217;m planning a new version of OWS with this feature&#8230; OWS 0.7 had a command-line argument:<br />
--[<br />
-F (Free Indexing mode)</p>
<p>      This is a new feature in OpenWebSpider v0.6! Free indexing mode means that<br />
      the web spider will index pages as it encounters them!<br />
      For example, if we have a web site with the following structure:</p>
<p>                home_page<br />
              /      |    \<br />
           link1   link2  <a href="http://www.example.com/linkX" rel="nofollow">http://www.example.com/linkX</a></p>
<p>      the web spider will index (in order): home_page, link1, link2 and <a href="http://www.example.com/linkX" rel="nofollow">http://www.example.com/linkX</a><br />
      Whereas without this argument the web spider will index only: home_page, link1 and link2<br />
]--<br />
but it wasn&#8217;t safe to use and created many problems!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

