<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Ways to process and use Wikipedia dumps</title>
	<atom:link href="http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/</link>
	<description>( to ) ? be : ! be;</description>
	<pubDate>Tue, 06 Jan 2009 23:43:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: b.b.goyal, barnala</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-10138</link>
		<dc:creator>b.b.goyal, barnala</dc:creator>
		<pubDate>Fri, 07 Nov 2008 07:13:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-10138</guid>
		<description>Sir, I want to display Wikipedia content on my site. Is it possible? If yes, how? Regards.</description>
		<content:encoded><![CDATA[<p>Sir, I want to display Wikipedia content on my site. Is it possible? If yes, how? Regards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Bailey</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-7924</link>
		<dc:creator>Andy Bailey</dc:creator>
		<pubDate>Mon, 29 Sep 2008 11:00:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-7924</guid>
		<description>Prashanth,
I managed to load the SQL imports into the table 'text'. It now holds just over 10,000,000 records, but I can't do anything with it. Every time I try a SELECT COUNT(*) or a SELECT * LIMIT 0,1, I get no response from MySQL. I may have to abandon this approach and use a MediaWiki method instead. Did you have any trouble accessing the data? One thing I've noticed is that the primary key index (old_id) is taking up no disk space, which seems a little weird.</description>
		<content:encoded><![CDATA[<p>Prashanth,<br />
I managed to load the SQL imports into the table &#8216;text&#8217;. It now holds just over 10,000,000 records, but I can&#8217;t do anything with it. Every time I try a SELECT COUNT(*) or a SELECT * LIMIT 0,1, I get no response from MySQL. I may have to abandon this approach and use a MediaWiki method instead. Did you have any trouble accessing the data? One thing I&#8217;ve noticed is that the primary key index (old_id) is taking up no disk space, which seems a little weird.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Bailey</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5875</link>
		<dc:creator>Andy Bailey</dc:creator>
		<pubDate>Mon, 25 Aug 2008 00:11:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5875</guid>
		<description>Prashanth,
Thanks for a great post. I found I had to make some small alterations for MySQL 5.0.21:

CREATE DATABASE wiki;
USE wiki;
CREATE TABLE text (
old_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
old_text MEDIUMBLOB NOT NULL,
old_flags TINYBLOB NOT NULL,
PRIMARY KEY (old_id)
) MAX_ROWS=10000000 AVG_ROW_LENGTH=10240;</description>
		<content:encoded><![CDATA[<p>Prashanth,<br />
Thanks for a great post. I found I had to make some small alterations for MySQL 5.0.21:</p>
<p>CREATE DATABASE wiki;<br />
USE wiki;<br />
CREATE TABLE text (<br />
old_id INT UNSIGNED NOT NULL AUTO_INCREMENT,<br />
old_text MEDIUMBLOB NOT NULL,<br />
old_flags TINYBLOB NOT NULL,<br />
PRIMARY KEY (old_id)<br />
) MAX_ROWS=10000000 AVG_ROW_LENGTH=10240;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5263</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Mon, 04 Aug 2008 15:57:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5263</guid>
		<description>Jared, the intra-wiki links are internal automatically.</description>
		<content:encoded><![CDATA[<p>Jared, the intra-wiki links are internal automatically.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jared</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5202</link>
		<dc:creator>Jared</dc:creator>
		<pubDate>Fri, 01 Aug 2008 20:52:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5202</guid>
		<description>Does anyone know how to make all the links internal to my domain after we do the dump?</description>
		<content:encoded><![CDATA[<p>Does anyone know how to make all the links internal to my domain after we do the dump?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4769</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Thu, 17 Jul 2008 16:09:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4769</guid>
		<description>Indrajeet, I did not understand your question. What are you trying to do?</description>
		<content:encoded><![CDATA[<p>Indrajeet, I did not understand your question. What are you trying to do?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4768</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Thu, 17 Jul 2008 16:08:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4768</guid>
		<description>Rich, way to go! That's a wonderful way to learn. I am glad I was of help. Enjoy!
RK, I will get back to you by mail shortly.</description>
		<content:encoded><![CDATA[<p>Rich, way to go! That&#8217;s a wonderful way to learn. I am glad I was of help. Enjoy!<br />
RK, I will get back to you by mail shortly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RK</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4567</link>
		<dc:creator>RK</dc:creator>
		<pubDate>Tue, 08 Jul 2008 17:47:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4567</guid>
		<description>I mailed you but couldn't get a reply. You are the only person I know who can get this done :)

RK's last blog post: &lt;a href="http://www.managementparadise.com/forums/introduce-yourself/34682-naishadh86-intro.html" rel="nofollow"&gt;naishadh86 Intro&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>I mailed you but couldn&#8217;t get a reply. You are the only person I know who can get this done <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
RK&#8217;s last blog post: <a href="http://www.managementparadise.com/forums/introduce-yourself/34682-naishadh86-intro.html" rel="nofollow" onclick="javascript:urchinTracker ('/outbound/comment/www.managementparadise.com');">naishadh86 Intro</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rich</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4562</link>
		<dc:creator>rich</dc:creator>
		<pubDate>Tue, 08 Jul 2008 13:13:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4562</guid>
		<description>THANK YOU!!  This is extremely helpful.  I downloaded one of the dumps and had no clue what to do with it.  I figured I would just start and tinker with it along the way (that's how I learn everything for computers, treat it like a puzzle to solve and I end up teaching myself just about anything)  but when I saw the dump would decompress into massively large files and take a while to do it, I decided to take a step back and slow down before I blow up my computer in the process.  This helped me more than you can know, thanks so much!!!!</description>
		<content:encoded><![CDATA[<p>THANK YOU!!  This is extremely helpful.  I downloaded one of the dumps and had no clue what to do with it.  I figured I would just start and tinker with it along the way (that&#8217;s how I learn everything for computers, treat it like a puzzle to solve and I end up teaching myself just about anything)  but when I saw the dump would decompress into massively large files and take a while to do it, I decided to take a step back and slow down before I blow up my computer in the process.  This helped me more than you can know, thanks so much!!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: indrajeet</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4538</link>
		<dc:creator>indrajeet</dc:creator>
		<pubDate>Mon, 07 Jul 2008 11:32:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4538</guid>
		<description>Please tell me how to dump the database after installing MediaWiki.
I am waiting for your reply.

Thanks and regards,
Indrajeet Dhanjode</description>
		<content:encoded><![CDATA[<p>Please tell me how to dump the database after installing MediaWiki.<br />
I am waiting for your reply.</p>
<p>Thanks and regards,<br />
Indrajeet Dhanjode</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4339</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Sun, 15 Jun 2008 05:12:29 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4339</guid>
		<description>RK, I prefer communicating via email. Please mail me at the same address and we can have a discussion.</description>
		<content:encoded><![CDATA[<p>RK, I prefer communicating via email. Please mail me at the same address and we can have a discussion.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RK</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4334</link>
		<dc:creator>RK</dc:creator>
		<pubDate>Sat, 14 Jun 2008 15:35:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4334</guid>
		<description>Hello,

I've sent you an add request on Gmail about a site we need built, integrated with MediaWiki and a Wikipedia database dump. Please accept the add request so that we can discuss it in detail.</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>I&#8217;ve sent you an add request on Gmail about a site we need built, integrated with MediaWiki and a Wikipedia database dump. Please accept the add request so that we can discuss it in detail.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4146</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Fri, 23 May 2008 14:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4146</guid>
		<description>Thanks for the info, Totic.</description>
		<content:encoded><![CDATA[<p>Thanks for the info, Totic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: totic</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4076</link>
		<dc:creator>totic</dc:creator>
		<pubDate>Sun, 18 May 2008 07:06:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4076</guid>
		<description>I also had the problem with wget; it seems to always fail when the files are incredibly large. My solution was to just use curl.

Example:

curl http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2 -o enwiki-20070206-pages-articles.xml.bz2</description>
		<content:encoded><![CDATA[<p>I also had the problem with wget; it seems to always fail when the files are incredibly large. My solution was to just use curl.</p>
<p>example:</p>
<p>curl <a href="http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2" rel="nofollow" onclick="javascript:urchinTracker ('/outbound/comment/download.wikimedia.org');">http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2</a> -o enwiki-20070206-pages-articles.xml.bz2</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3654</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Sun, 27 Apr 2008 06:28:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3654</guid>
		<description>Great! It worked for me, so I assume it is correct. I have not run it against the latest XML, though.</description>
		<content:encoded><![CDATA[<p>Great! It worked for me, so I assume it is correct. I have not run it against the latest XML, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: someone</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3556</link>
		<dc:creator>someone</dc:creator>
		<pubDate>Fri, 25 Apr 2008 13:23:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3556</guid>
		<description>Hi

I used your split script. _Thanks_ a lot for it!
But are you sure that it is accurate? Does it need to be updated?</description>
		<content:encoded><![CDATA[<p>Hi</p>
<p>I used your split script. _Thanks_ a lot for it!<br />
But are you sure that it is accurate? Does it need to be updated?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-198</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Wed, 24 Oct 2007 14:55:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-198</guid>
		<description>Hi Satheesh,

My email is prashanthBLAHellina AT gmail DOTT com (remove BLAH).</description>
		<content:encoded><![CDATA[<p>Hi Satheesh,</p>
<p>My email is prashanthBLAHellina AT gmail DOTT com (remove BLAH).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: satheesh nair</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-197</link>
		<dc:creator>satheesh nair</dc:creator>
		<pubDate>Wed, 24 Oct 2007 14:24:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-197</guid>
		<description>Prashanth, can we discuss a deal to develop a Wikipedia mirror for a project of mine? Please give me your contact details. I am in Bangalore.</description>
		<content:encoded><![CDATA[<p>Prashanth, can we discuss a deal to develop a Wikipedia mirror for a project of mine? Please give me your contact details. I am in Bangalore.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: senddesks &#187; Ways to process and use Wikipedia dumps</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-195</link>
		<dc:creator>senddesks &#187; Ways to process and use Wikipedia dumps</dc:creator>
		<pubDate>Wed, 24 Oct 2007 02:59:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-195</guid>
		<description>[...] here for full [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] here for full [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-138</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Thu, 18 Oct 2007 02:25:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-138</guid>
		<description>Interesting. I tried wget from my local machine and it was downloading all right.

I want the processed data to be accessible through some kind of API so anybody can "query" the database over HTTP. Instead of downloading 5GB to my disk and then uploading it back to Dreamhost, I felt it was better to do the processing on Dreamhost itself. I have a 128kbps "broadband" connection, so you can imagine the upload rate.</description>
		<content:encoded><![CDATA[<p>Interesting. I tried wget from my local machine and it was downloading all right.</p>
<p>I want the processed data to be accessible through some kind of API so anybody can &#8220;query&#8221; the database over HTTP. Instead of downloading 5GB to my disk and then uploading it back to Dreamhost, I felt it was better to do the processing on Dreamhost itself. I have a 128kbps &#8220;broadband&#8221; connection, so you can imagine the upload rate.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
