<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Ways to process and use Wikipedia dumps</title>
	<atom:link href="http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/</link>
	<description>( to ) ? be : ! be;</description>
	<lastBuildDate>Thu, 11 Feb 2010 04:06:39 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: bhupinder</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-27716</link>
		<dc:creator>bhupinder</dc:creator>
		<pubDate>Wed, 03 Jun 2009 19:27:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-27716</guid>
		<description>Hi everybody. Could someone help me with the structure of the revision and page tables too? I was able to get the three files, named text, page and revision, using the xmltosql tool. I have found the &quot;text&quot; table structure here, but not the structure of the other two. If anybody can help me with them, please just reply to my query here.

I will be thankful to that person. I know I can figure out something from the data itself, but if somebody has something close to their structure, please let me know.

I am still waiting for some kind of help.

thanks,
bhupinder</description>
		<content:encoded><![CDATA[<p>Hi everybody. Could someone help me with the structure of the revision and page tables too? I was able to get the three files, named text, page and revision, using the xmltosql tool. I have found the &#8220;text&#8221; table structure here, but not the structure of the other two. If anybody can help me with them, please just reply to my query here.</p>
<p>I will be thankful to that person. I know I can figure out something from the data itself, but if somebody has something close to their structure, please let me know.</p>
<p>I am still waiting for some kind of help.</p>
<p>thanks,<br />
bhupinder</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: b.b.goyal, barnala</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-10138</link>
		<dc:creator>b.b.goyal, barnala</dc:creator>
		<pubDate>Fri, 07 Nov 2008 07:13:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-10138</guid>
		<description>Sir, I want to display Wikipedia content on my site. Is it possible? If yes, how? Regards.</description>
		<content:encoded><![CDATA[<p>Sir, I want to display Wikipedia content on my site. Is it possible? If yes, how? Regards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Bailey</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-7924</link>
		<dc:creator>Andy Bailey</dc:creator>
		<pubDate>Mon, 29 Sep 2008 11:00:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-7924</guid>
		<description>Prashanth,
I managed to load the SQL imports into the table &#039;text&#039;. It now holds just over 10,000,000 records, but I can&#039;t do anything with it. Every time I try a SELECT COUNT(*) or a SELECT * FROM text LIMIT 0,1, I get nothing back from MySQL. I may have to abandon all this and use a MediaWiki method instead. Did you have any trouble accessing the data? One thing I&#039;ve noticed is that the primary key index (old_id) is taking up no disk space, which seems a little weird.</description>
		<content:encoded><![CDATA[<p>Prashanth,<br />
I managed to load the SQL imports into the table &#8216;text&#8217;. It now holds just over 10,000,000 records, but I can&#8217;t do anything with it. Every time I try a SELECT COUNT(*) or a SELECT * FROM text LIMIT 0,1, I get nothing back from MySQL. I may have to abandon all this and use a MediaWiki method instead. Did you have any trouble accessing the data? One thing I&#8217;ve noticed is that the primary key index (old_id) is taking up no disk space, which seems a little weird.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy Bailey</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-5875</link>
		<dc:creator>Andy Bailey</dc:creator>
		<pubDate>Mon, 25 Aug 2008 00:11:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5875</guid>
		<description>Prashanth,
Thanks for a great post. I found I had to make some small alterations for MySQL 5.0.21:

CREATE DATABASE wiki;
USE wiki;
CREATE TABLE text (
old_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
old_text MEDIUMBLOB NOT NULL,
old_flags TINYBLOB NOT NULL,
PRIMARY KEY (old_id)
) MAX_ROWS=10000000 AVG_ROW_LENGTH=10240;</description>
		<content:encoded><![CDATA[<p>Prashanth,<br />
Thanks for a great post. I found I had to make some small alterations for MySQL 5.0.21:</p>
<p>CREATE DATABASE wiki;<br />
USE wiki;<br />
CREATE TABLE text (<br />
old_id INT UNSIGNED NOT NULL AUTO_INCREMENT,<br />
old_text MEDIUMBLOB NOT NULL,<br />
old_flags TINYBLOB NOT NULL,<br />
PRIMARY KEY (old_id)<br />
) MAX_ROWS=10000000 AVG_ROW_LENGTH=10240;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-5263</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Mon, 04 Aug 2008 15:57:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5263</guid>
		<description>Jared, the intra-wiki links are internal automatically.</description>
		<content:encoded><![CDATA[<p>Jared, the intra-wiki links are internal automatically.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jared</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-5202</link>
		<dc:creator>Jared</dc:creator>
		<pubDate>Fri, 01 Aug 2008 20:52:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-5202</guid>
		<description>Does anyone know how to make all the links internal to my domain after we do the dump?</description>
		<content:encoded><![CDATA[<p>Does anyone know how to make all the links internal to my domain after we do the dump?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4769</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Thu, 17 Jul 2008 16:09:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4769</guid>
		<description>Indrajeet, I did not understand your question. What are you trying to do?</description>
		<content:encoded><![CDATA[<p>Indrajeet, I did not understand your question. What are you trying to do?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4768</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Thu, 17 Jul 2008 16:08:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4768</guid>
		<description>Rich, way to go! That&#039;s a wonderful way to learn. Am glad I was of help. Enjoy!
RK, will get back to you by mail shortly.</description>
		<content:encoded><![CDATA[<p>Rich, way to go! That&#8217;s a wonderful way to learn. Am glad I was of help. Enjoy!<br />
RK, will get back to you by mail shortly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RK</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4567</link>
		<dc:creator>RK</dc:creator>
		<pubDate>Tue, 08 Jul 2008 17:47:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4567</guid>
		<description>I mailed you but couldn&#039;t get a reply. You are the only person I know who can get this done :)

RKs last blog post..&lt;a href=&quot;http://www.managementparadise.com/forums/introduce-yourself/34682-naishadh86-intro.html&quot; rel=&quot;nofollow&quot;&gt;naishadh86 Intro&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>I mailed you but couldn&#8217;t get a reply. You are the only person I know who can get this done <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>RKs last blog post..<a href="http://www.managementparadise.com/forums/introduce-yourself/34682-naishadh86-intro.html" rel="nofollow">naishadh86 Intro</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rich</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4562</link>
		<dc:creator>rich</dc:creator>
		<pubDate>Tue, 08 Jul 2008 13:13:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4562</guid>
		<description>THANK YOU!! This is extremely helpful. I downloaded one of the dumps and had no clue what to do with it. I figured I would just start and tinker with it along the way (that&#039;s how I learn everything about computers: I treat it like a puzzle to solve and end up teaching myself just about anything), but when I saw that the dump would decompress into massively large files and take a while to do it, I decided to take a step back and slow down before I blow up my computer in the process. This helped me more than you can know, thanks so much!!!!</description>
		<content:encoded><![CDATA[<p>THANK YOU!! This is extremely helpful. I downloaded one of the dumps and had no clue what to do with it. I figured I would just start and tinker with it along the way (that&#8217;s how I learn everything about computers: I treat it like a puzzle to solve and end up teaching myself just about anything), but when I saw that the dump would decompress into massively large files and take a while to do it, I decided to take a step back and slow down before I blow up my computer in the process. This helped me more than you can know, thanks so much!!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: indrajeet</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4538</link>
		<dc:creator>indrajeet</dc:creator>
		<pubDate>Mon, 07 Jul 2008 11:32:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4538</guid>
		<description>Tell me how to dump the database after installing MediaWiki.
Please tell me; I am waiting for your reply.

thanks and regards
Indrajeet Dhanjode</description>
		<content:encoded><![CDATA[<p>Tell me how to dump the database after installing MediaWiki.<br />
Please tell me; I am waiting for your reply.</p>
<p>thanks and regards<br />
Indrajeet Dhanjode</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4339</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Sun, 15 Jun 2008 05:12:29 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4339</guid>
		<description>RK, I prefer communicating via email. Please mail me at the same address and we can have a discussion.</description>
		<content:encoded><![CDATA[<p>RK, I prefer communicating via email. Please mail me at the same address and we can have a discussion.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RK</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4334</link>
		<dc:creator>RK</dc:creator>
		<pubDate>Sat, 14 Jun 2008 15:35:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4334</guid>
		<description>Hello,

I&#039;ve sent you an add request on Gmail for a site we need built, integrated with MediaWiki and a Wikipedia database dump. Please accept the add request so that we can discuss it in detail.</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>I&#8217;ve sent you an add request on Gmail for a site we need built, integrated with MediaWiki and a Wikipedia database dump. Please accept the add request so that we can discuss it in detail.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4146</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Fri, 23 May 2008 14:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4146</guid>
		<description>Thanks for the info, Totic.</description>
		<content:encoded><![CDATA[<p>Thanks for the info, Totic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: totic</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-4076</link>
		<dc:creator>totic</dc:creator>
		<pubDate>Sun, 18 May 2008 07:06:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-4076</guid>
		<description>I also had the problem with wget; it seems to always fail when the files are incredibly large. My solution was to just use curl.

example:

curl http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2 -o enwiki-20070206-pages-articles.xml.bz2</description>
		<content:encoded><![CDATA[<p>I also had the problem with wget; it seems to always fail when the files are incredibly large. My solution was to just use curl.</p>
<p>example:</p>
<p>curl <a href="http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2" rel="nofollow">http://download.wikimedia.org/enwiki/20070206/enwiki-20070206-pages-articles.xml.bz2</a> -o enwiki-20070206-pages-articles.xml.bz2</p>
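<p>If a transfer does die partway through, the bytes already on disk need not be wasted. Below is a minimal Python sketch (standard library only) of resuming a partial download with an HTTP <code>Range</code> header. It assumes the dump server honours range requests, and the function names are hypothetical, not part of curl, wget or any Wikipedia tool.</p>

```python
import os
import shutil
import urllib.request


def range_header_for(dest_path):
    """Return a Range header that resumes after the bytes already on disk, or {} if there are none."""
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            return {"Range": "bytes=%d-" % offset}
    return {}


def resume_download(url, dest_path, chunk_size=1024 * 1024):
    """Fetch url into dest_path, continuing from any partial file left by an earlier attempt."""
    headers = range_header_for(dest_path)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content: the server resumed, so append to the partial file.
        # 200 OK: the server ignored Range and sent the whole file, so start over.
        mode = "ab" if headers and response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            # Stream in chunks so a multi-gigabyte dump never sits in memory.
            shutil.copyfileobj(response, out, chunk_size)
```

<p>For example, <code>resume_download(url, "enwiki-20070206-pages-articles.xml.bz2")</code> can simply be re-run after an interruption. If the file is already complete, the server typically answers 416 (Range Not Satisfiable), which <code>urlopen</code> raises as an <code>HTTPError</code>.</p>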
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-3654</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Sun, 27 Apr 2008 06:28:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3654</guid>
		<description>Great! It worked for me so I assume it is correct. I have not run this against the latest xml though.</description>
		<content:encoded><![CDATA[<p>Great! It worked for me so I assume it is correct. I have not run this against the latest xml though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: someone</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-3556</link>
		<dc:creator>someone</dc:creator>
		<pubDate>Fri, 25 Apr 2008 13:23:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-3556</guid>
		<description>Hi

I used your split script, _thanks_ a lot for it!
But are you sure that it is accurate? Does it need to be updated?</description>
		<content:encoded><![CDATA[<p>Hi</p>
<p>I used your split script, _thanks_ a lot for it!<br />
But are you sure that it is accurate? Does it need to be updated?</p>
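<p>The split script itself is not reproduced in these comments, so the following is not it. As a hedged sketch, assuming a pages-articles XML export, Python&#8217;s standard-library <code>xml.etree.ElementTree.iterparse</code> can stream the dump one page element at a time, which is a common starting point for splitting it. The helper names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each page in a MediaWiki XML dump, streaming from disk.

    source may be a filename or a binary file object (e.g. one opened with bz2.open).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # With full-history dumps a page has several revisions; the last one wins here.
                text = child.text or ""
        yield title, text
        # Drop finished pages from the root so memory use does not grow with the dump size.
        root.clear()
```

<p>Each (title, text) pair could then be written to its own file or database row; passing <code>bz2.open("enwiki-20070206-pages-articles.xml.bz2")</code> as the source streams straight from the compressed download.</p>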
]]></content:encoded>
	</item>
	<item>
		<title>By: prashanthellina</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-198</link>
		<dc:creator>prashanthellina</dc:creator>
		<pubDate>Wed, 24 Oct 2007 14:55:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-198</guid>
		<description>Hi Satheesh,

My email is prashanthBLAHellina AT gmail DOTT com (remove BLAH).</description>
		<content:encoded><![CDATA[<p>Hi Satheesh,</p>
<p>My email is prashanthBLAHellina AT gmail DOTT com (remove BLAH).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: satheesh nair</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-197</link>
		<dc:creator>satheesh nair</dc:creator>
		<pubDate>Wed, 24 Oct 2007 14:24:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-197</guid>
		<description>Prashanth, can we discuss a deal to develop a Wikipedia mirror for a project of mine? Please give me your contact details; I am in Bangalore.</description>
		<content:encoded><![CDATA[<p>Prashanth, can we discuss a deal to develop a Wikipedia mirror for a project of mine? Please give me your contact details; I am in Bangalore.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: senddesks &#187; Ways to process and use Wikipedia dumps</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/comment-page-1/#comment-195</link>
		<dc:creator>senddesks &#187; Ways to process and use Wikipedia dumps</dc:creator>
		<pubDate>Wed, 24 Oct 2007 02:59:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comment-195</guid>
		<description>[...] here for full [...]</description>
		<content:encoded><![CDATA[<p>[...] here for full [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
