<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Prashanth Ellina &#187; programming</title>
	<atom:link href="http://blog.prashanthellina.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.prashanthellina.com</link>
	<description>In Pursuit of Truth</description>
	<lastBuildDate>Sun, 28 Nov 2010 09:35:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>All your aliases are belong to you</title>
		<link>http://blog.prashanthellina.com/2009/08/28/all-your-aliases-are-belong-to-you/</link>
		<comments>http://blog.prashanthellina.com/2009/08/28/all-your-aliases-are-belong-to-you/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 02:47:06 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[veveo]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=129</guid>
		<description><![CDATA[I like setting up shortcuts to frequently used commands whether I use Windows or Linux. I use the terminal often and create shortcuts to frequently used commands using the &#8220;alias&#8221; feature of Bash. This has saved me considerable time in the past. However, I recently felt that if I could have a helper tool to monitor [...]]]></description>
			<content:encoded><![CDATA[<p>I like setting up shortcuts to frequently used commands whether I use Windows or Linux. I use the terminal often and create shortcuts to frequently used commands using the &#8220;alias&#8221; feature of Bash. This has saved me considerable time in the past. However, I recently felt that if I could have a helper tool to monitor my usage of commands and automatically suggest candidates for aliasing, that would be useful. The result is Aliaser.</p>
<p>Aliaser works by monitoring your bash history. It analyses command frequency and suggests candidates for aliasing. It also manages the aliases created this way. The feature I like most in Aliaser is that it reminds you to use the aliases you created by showing tips when you open a new terminal session.</p>
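<p>To make the idea concrete, here is a minimal Python 3 sketch of the frequency analysis step. It is not Aliaser's actual code: the function name suggest_alias_candidates, both thresholds and the assumption that the history is a plain list of command lines are mine.</p>

```python
from collections import Counter

def suggest_alias_candidates(history_lines, min_count=3, min_length=8):
    """Return (command, count) pairs worth aliasing, most frequent first.

    history_lines: iterable of command lines, e.g. the lines of ~/.bash_history.
    min_count, min_length: hypothetical cut-offs, since rare or very short
    commands gain little from an alias.
    """
    counts = Counter(line.strip() for line in history_lines if line.strip())
    candidates = [(cmd, n) for cmd, n in counts.items()
                  if n >= min_count and len(cmd) >= min_length]
    # sort by frequency (descending), then alphabetically for stable output
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

<p>Feeding it the lines of your history file would give a ranked list of commands that are both long and frequently typed.</p>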
<p>Download Aliaser from <a href="http://aliaser.googlecode.com">http://aliaser.googlecode.com</a>.</p>
<p><a href="http://aliaser.googlecode.com"><br />
<img align="center" src="http://aliaser.googlecode.com/files/aliaser_tips.png"/><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2009/08/28/all-your-aliases-are-belong-to-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Command-line language translation</title>
		<link>http://blog.prashanthellina.com/2009/08/18/command-line-language-translation/</link>
		<comments>http://blog.prashanthellina.com/2009/08/18/command-line-language-translation/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 14:15:42 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[tool]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=117</guid>
		<description><![CDATA[Here is a simple utility created using Python for translating text from various languages into English. It uses the Google AJAX API to do this. Usage prashanth@prashanth-desktop:~$ translate bonjour hello prashanth@prashanth-desktop:~$ translate guten morgen Good morning Code #!/usr/bin/env python ''' Translates text into english using Google Translate. Usage: python translate.py (or) echo &#124; python translate.py [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a simple utility created using Python for translating text from various languages into English. It uses the Google AJAX API to do this.</p>
<p><strong>Usage</strong></p>
<pre lang="bash">
prashanth@prashanth-desktop:~$ translate bonjour
hello
prashanth@prashanth-desktop:~$ translate guten morgen
Good morning
</pre>
<p><br/></p>
<p><strong>Code</strong></p>
<pre lang="python">
#!/usr/bin/env python
'''
Translates text into english using Google Translate.
Usage: python translate.py &lt;text&gt;
        (or)
       echo &lt;text&gt; | python translate.py

For convenience, make a symlink to this file from /usr/bin/translate.
'''
# derived from : http://code.google.com/p/py-gtranslate/source/browse/trunk/gtrans.py

import sys
import urllib2
import urllib
import simplejson as json

FROM_LANGUAGE = ''
TO_LANGUAGE = 'en'
BASE_URL = 'http://ajax.googleapis.com/ajax/services/language/translate'

def translate(from_language, to_language, text):
    langpair = '%s|%s' % (from_language, to_language)
    params = {'v': '1.0', 'langpair': langpair, 'q': urllib.quote_plus(text)}

    params = '%s' % ('&#038;'.join(['%s=%s' % (k,v) for (k,v) in params.items()]))

    url = '%s?%s' % (BASE_URL, params)
    resp = json.load(urllib2.urlopen(url))
    try:
        return resp['responseData']['translatedText']
    except (KeyError, TypeError):
        # fall back to the original text if the response has no translation
        return text

def main(text):
    if text:
        print translate(FROM_LANGUAGE, TO_LANGUAGE, text)

    else:
        lines = [l.strip() for l in sys.stdin.readlines()]
        for line in lines:
            if line:
                text = translate(FROM_LANGUAGE, TO_LANGUAGE, line)
                print '[%s]' % line
                print text
            else:
                print

if __name__ == '__main__':
    args = sys.argv[1:]
    text = ' '.join(args)
    main(text)
</pre>
<p><br/></p>
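<p>The query string in translate() is assembled by hand. As an alternative, here is a Python 3 sketch that lets the standard library do the escaping. The function name build_translate_url is hypothetical, and the sketch only builds the URL; it does not call the service.</p>

```python
from urllib.parse import urlencode

BASE_URL = 'http://ajax.googleapis.com/ajax/services/language/translate'

def build_translate_url(text, from_language='', to_language='en'):
    # urlencode escapes every value, including the "|" in langpair
    # and the spaces in the text to translate
    params = {
        'v': '1.0',
        'langpair': '%s|%s' % (from_language, to_language),
        'q': text,
    }
    return '%s?%s' % (BASE_URL, urlencode(params))
```

<p>Because the escaping happens in one place, there is no need to call quote_plus on individual values.</p>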
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2009/08/18/command-line-language-translation/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Extracting relevant text from HTML pages</title>
		<link>http://blog.prashanthellina.com/2009/07/27/extracting-relevant-text-from-html-pages/</link>
		<comments>http://blog.prashanthellina.com/2009/07/27/extracting-relevant-text-from-html-pages/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 11:28:09 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=99</guid>
		<description><![CDATA[Some time back I had done some work on extracting topics from an arbitrary piece of text using Wikipedia data. Recently I thought of a concept to put that algorithm to work. As a part of this project, I need to extract relevant text from an arbitrary HTML page. By relevant I mean the &#8220;meat&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>Some time back I had done some work on <a href="http://blog.prashanthellina.com/2007/12/21/topic-extraction-using-wikipedia-data/">extracting topics</a> from an arbitrary piece of text using Wikipedia data. Recently I thought of a concept to put that algorithm to work. As a part of this project, I need to extract <strong>relevant</strong> text from an arbitrary HTML page. By relevant I mean the &#8220;meat&#8221; of the page devoid of navigation links and side-content.</p>
<p>This algorithm has the following <strong>steps</strong>:</p>
<ul>
<li>make doc from html data (clean html)
<li>identify content nodes (nodes having substantial content)
<li>prune xml tree to remove irrelevant nodes
<li>get the most linked node from pruned tree (subtree contains relevant text)
<li>make the dot graph
</ul>
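<p>The steps above can be sketched compactly in Python 3. This is only an illustration under stated assumptions: it uses the standard library's ElementTree on well-formed markup instead of lxml's HTML parser, it skips the dot graph, and the helper names own_text and most_linked_node are mine.</p>

```python
import xml.etree.ElementTree as ET
from collections import Counter

MIN_TEXT_LEN = 50
IGNORABLE_TAGS = {'script', 'a'}

def own_text(node):
    """Text directly inside node: its own text plus its children's tails."""
    parts = [node.text or ''] + [child.tail or '' for child in node]
    return '\n'.join(parts).strip()

def most_linked_node(root):
    # map every element to its parent (ElementTree has no getparent())
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # content nodes: elements whose own text is long enough
    content_nodes = [n for n in root.iter()
                     if n.tag.lower() not in IGNORABLE_TAGS
                     and len(own_text(n)) >= MIN_TEXT_LEN]

    # pruned tree: child-to-parent links for content nodes and their ancestors
    links = {}
    for node in content_nodes:
        while node in parent_of:
            links[node] = parent_of[node]
            node = parent_of[node]

    # most linked node: the parent with the largest number of incoming links
    inlinks = Counter(links.values())
    return max(inlinks, key=inlinks.get) if inlinks else None
```

<p>On a page whose main column holds several long paragraphs, the element wrapping those paragraphs collects the most incoming links and is returned.</p>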
<p>I&#8217;ve pasted the relevant python module below for easy reading. However, if you want to download the code and hack it, you can get all the files from <a href="http://code.prashanthellina.com/code/content_extraction">here</a>.</p>
<p><strong>Code files</strong></p>
<ul>
<li>content_extract.py &#8211; actual work gets done here (file pasted below)
<li>cextract.py &#8211; cgi front-end which fetches the url content and feeds it to the above script.
<li>cextract_config.py &#8211; cgi script configuration file. You have to adjust this to your environment.
</ul>
<p><strong>Try it right here and right now</strong></p>
<form action="http://www.prashanthellina.com/cgi-bin/cextract.py" method="GET">
url:<br />
<input type="text" name="url" size="60"/>
<input type="submit" value="extract text"/>
</form>
<p><strong>Some samples</strong></p>
<ul>
<li><a href="http://www.prashanthellina.com/cextract_data/510fbe51d89334aecb70d9b1d1635711.html">http://news.bbc.co.uk/sport2/hi/motorsport/formula_one/8169436.stm</a>
<li><a href="http://www.prashanthellina.com/cextract_data/640ca0b7c6e818b2b1bf952a206a6388.html">http://www.prashanthellina.com/cextract_data/640ca0b7c6e818b2b1bf952a206a6388.html</a>
<li><a href="http://www.prashanthellina.com/cextract_data/8408efd51b4f1f6b91650d4ea3ce8924.html">http://www.telegraph.co.uk/news/worldnews/europe/france/5913494/Nicolas-Sarkozy-to-slow-down-after-collapsing-while-jogging.html</a>
</ul>
<p>Please let me know if you find cases for which the algorithm does not work. Even better would be to download the code and hack it up and post back. I am eager to see what you can come up with.</p>
<pre lang="python">
#!/usr/bin/env python

import sys
from cStringIO import StringIO

from lxml import etree #http://codespeak.net/lxml/

IGNORABLE_TAGS = set(['script', 'a'])
MIN_TEXT_LEN = 50

def get_text(node):
    '''
    Given an XML node, extract all the text it contains.
    (does not recurse into children)
    '''
    text = [node.text or '']
    for cnode in node.getchildren():
        tail = cnode.tail
        if tail is not None:
            text.append(cnode.tail)

    text = '\n'.join(text).strip()
    return text

def get_xml(node):
    '''
    Convert the sub-tree from node downwards
    into string XML representation.
    '''
    return etree.tostring(node)

def create_doc(data):
    '''
    Construct XML tree datastructure from xml string representation.
    '''
    parser = etree.HTMLParser()
    doc = etree.parse(StringIO(data), parser)
    return doc

def get_content_nodes(doc):
    '''
    Identify nodes in the XML document that
    have substantial text.
    '''
    nodes = []

    for n in doc.xpath('//*'):
        tag = n.tag

        if tag.lower() in IGNORABLE_TAGS:
            continue

        text = get_text(n)
        if not text:
            continue

        if len(text) < MIN_TEXT_LEN:
            continue

        nodes.append(n)

    return nodes

def make_pruned_tree(content_nodes):
    '''
    Prune the whole XML tree by removing nodes
    other than content nodes and their ancestors.
    '''
    nodes = {}
    links = {}

    for node in content_nodes:

        nodes[id(node)] = node

        parent = node.getparent()
        if parent is not None:
            links[id(node)] = id(parent)

        for anode in node.iterancestors():
            _id = id(anode)
            parent = anode.getparent()
            if parent is not None:
                links[_id] = id(parent)

            if _id not in nodes:
                nodes[_id] = anode

    return nodes, links

def get_inlink_counts(links):
    '''
    Given the inter-node links, find out which
    node has maximum number of links coming into it.
    '''
    counts = {}

    for from_id, to_id in links.iteritems():
        count = counts.setdefault(to_id, 0)
        counts[to_id] = count + 1

    return counts

def get_most_linked_node(nodes, links):
    '''
    Identify the node which is most linked.
    (i.e. the node with the most inlinks).
    '''
    inlink_counts = get_inlink_counts(links)

    mcount, mid = max([(count, _id) for _id, count in inlink_counts.iteritems()])
    node = nodes[mid]
    return node

def make_dot_graph(nodes, links, chosen_node, stream):
    '''
    Construct the dot format graph representation
    so that graphviz can render the tree for visualization.
    '''
    o = stream

    print >> o, "digraph G {"

    for _id, node in nodes.iteritems():

        tlen = len(get_text(node))
        tag = node.tag

        if tlen:
            text = '%s (%d)' % (tag, tlen)
        else:
            text = tag

        if _id == chosen_node:
            attrs = 'style=filled color=lightblue'
        else:
            attrs = ''

        print >> o, "%s [label=\"%s\" %s];" % (_id, text, attrs)

    for fid, tid in links.iteritems():
        print >> o, "%d -> %d;" % (fid, tid)

    print >> o, "}"

def main():
    # make doc from html data (cleans html)
    doc = create_doc(sys.stdin.read())

    # identify content nodes
    content_nodes = get_content_nodes(doc)

    # prune xml tree to remove irrelevant nodes
    nodes, links = make_pruned_tree(content_nodes)

    # get the most linked node from pruned tree
    mnode = get_most_linked_node(nodes, links)

    # make the dot graph
    make_dot_graph(nodes, links, id(mnode), sys.stdout)

if __name__ == '__main__':
    #Eg: wget "http://blog.prashanthellina.com" -O - | python thisscript.py | dot -Tpng -o /tmp/test.png ; eog /tmp/test.png
    main()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2009/07/27/extracting-relevant-text-from-html-pages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clustering Data using Python</title>
		<link>http://blog.prashanthellina.com/2009/07/25/clustering-data-using-python/</link>
		<comments>http://blog.prashanthellina.com/2009/07/25/clustering-data-using-python/#comments</comments>
		<pubDate>Sat, 25 Jul 2009 04:06:43 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=93</guid>
		<description><![CDATA[As a part of a project I am working on, I had to cluster urls on a page. After some light googling I found python-cluster. You can find below a simple python script to illustrate the usage of the python-cluster library. Code import pprint from difflib import SequenceMatcher # http://python-cluster.sourceforge.net/ from cluster import HierarchicalClustering # input [...]]]></description>
			<content:encoded><![CDATA[<p>As a part of a project I am working on, I had to cluster urls on a page. After some light googling I found <a href="http://python-cluster.sourceforge.net/">python-cluster</a>. You can find below a simple python script to illustrate the usage of the python-cluster library.</p>
<p><strong>Code</strong></p>
<pre lang="python">
import pprint
from difflib import SequenceMatcher

# http://python-cluster.sourceforge.net/
from cluster import HierarchicalClustering

# input urls to be clustered
urls = [
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814385',
    '#articles',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814335',
    'http://yro.slashdot.org/~drDugan/',
    'http://web.sourceforge.com/privacy.php',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28815123',
    'http://slashdot.org//slashdot.org/~Darkness404',
    'http://slashdot.org//radio.slashdot.org',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814429',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814457',
    'http://slashdot.org//slashdot.org/article.pl?sid=09/07/24/1545238',
    'http://slashdot.org//slashdot.org/comments.pl?sid=09/07/24/1545238&#038;cid=28810581',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28815269',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814657',
    'http://web.sourceforge.com/terms.php'
    'http://slashdot.org//it.slashdot.org/search',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814581',
    'http://xkcd.com/612/',
    'http://web.sourceforge.com/advertising',
    'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814785',
]

# distance function compares two urls and finds the distance
# uses SequenceMatcher from python standard module difflib
def distance(url1, url2):
    ratio = SequenceMatcher(None, url1, url2).ratio()
    return 1.0 - ratio

# Perform clustering
hc = HierarchicalClustering(urls, distance)
clusters = hc.getlevel(0.2)

pprint.pprint(clusters)
</pre>
<p><br/></p>
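<p><strong> Output </strong></p>
<p>Note that the terms.php entry in the list above has no trailing comma, so Python joins it with the next string literal into a single url. That is why the two appear fused in the output below.</p>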
<pre lang="python">
[['#articles'],
 ['http://xkcd.com/612/'],
 ['http://web.sourceforge.com/privacy.php'],
 ['http://web.sourceforge.com/advertising'],
 ['http://web.sourceforge.com/terms.phphttp://slashdot.org//it.slashdot.org/search'],
 ['http://yro.slashdot.org/~drDugan/'],
 ['http://slashdot.org//slashdot.org/~Darkness404'],
 ['http://slashdot.org//radio.slashdot.org'],
 ['http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814785',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814429',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;op=Reply&#038;threshold=1&#038;commentsort=0&#038;mode=thread&#038;pid=28814457'],
 ['http://slashdot.org//slashdot.org/article.pl?sid=09/07/24/1545238',
  'http://slashdot.org//slashdot.org/comments.pl?sid=09/07/24/1545238&#038;cid=28810581',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28815123',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28815269',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814385',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814335',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814657',
  'http://slashdot.org//it.slashdot.org/comments.pl?sid=1314601&#038;cid=28814581']]
</pre>
<p><br/></p>
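<p>To get a feel for the distance function used above, here is a small Python 3 sketch that needs only the standard library. The helper is_same_cluster is hypothetical and a simplification: hierarchical clustering merges clusters step by step rather than comparing single pairs, so this only shows how the 0.2 threshold relates to the distance values.</p>

```python
from difflib import SequenceMatcher

def distance(url1, url2):
    # 0.0 for identical strings, approaching 1.0 for unrelated ones
    return 1.0 - SequenceMatcher(None, url1, url2).ratio()

def is_same_cluster(url1, url2, threshold=0.2):
    # simplified pairwise check against the clustering threshold
    return distance(url1, url2) <= threshold
```

<p>Two comment urls that differ only in the cid value are very close to each other, while urls from different sites are far apart.</p>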
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2009/07/25/clustering-data-using-python/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Determining the difficulty of Arithmetic Operations</title>
		<link>http://blog.prashanthellina.com/2008/07/27/determining-the-difficulty-of-arithmetic-operations/</link>
		<comments>http://blog.prashanthellina.com/2008/07/27/determining-the-difficulty-of-arithmetic-operations/#comments</comments>
		<pubDate>Sun, 27 Jul 2008 16:51:22 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[arithmetic]]></category>
		<category><![CDATA[arithmetic algorithms]]></category>
		<category><![CDATA[borrow]]></category>
		<category><![CDATA[carry]]></category>
		<category><![CDATA[long division]]></category>
		<category><![CDATA[problem difficulty]]></category>
		<category><![CDATA[problem generator]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=69</guid>
		<description><![CDATA[I am trying to write a program to test my arithmetic skills. The program should pose arithmetic problems involving the four basic operations &#8211; addition, subtraction, multiplication and division. When the testing session starts, the program should issue problems of less difficulty and the difficulty should be ramped up gradually. A score should be computed [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.prashanthellina.com/images/kid_math_problem.gif" align="left" alt="kid math problem" padding="5"/>I am trying to write a program to test my arithmetic skills. The program should pose arithmetic problems involving the four basic operations &#8211; addition, subtraction, multiplication and division. When the testing session starts, the program should issue problems of less difficulty and the difficulty should be ramped up gradually. A score should be computed based on number of questions and the difficulty of questions. The hope is that if I keep using this program for a little time every day, I&#8217;ll be able to improve my abysmal arithmetic performance <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Note that you will need to have <a href="http://www.python.org">Python</a> installed to try out the script and the examples in this article. I believe that the code is simple enough to be understood with minimal or no understanding of the Python language. If any part of the article is not clear, feel free to point it out to me. Thank you.</p>
<h2>Estimating difficulty</h2>
<p>Let us take a look at the following problems.</p>
<p><strong>Problem 1</strong></p>
<pre lang="python">
5 +
3
</pre>
<p><br/></p>
<p><strong>Problem 2</strong></p>
<pre lang="python">
99999 +
12345
</pre>
<p><br/></p>
<p>Now, which problem is more difficult? It is quite obvious that Problem 2 is more difficult. But why? One answer would be &#8211; &#8220;because Problem 2 involves the addition of two large numbers&#8221;. Fine. Consider this now &#8230;</p>
<p><strong>Problem 3</strong></p>
<pre lang="python">
1000000 +
1000001
</pre>
<p><br/></p>
<p>Is Problem 3 more difficult than Problem 2 because the numbers undergoing addition are bigger? The intuitive answer is &#8220;No&#8221;. But why? Because it takes less time to compute the result. Why? Because solving Problem 3 involves fewer operations than solving Problem 2? No.</p>
<p>Problem 3 is easier than Problem 2 because of the nature of the operations performed in Problem 3. The operations there are simpler.</p>
<pre lang="python">
0 + 1,
0 + 0,
1 + 1
</pre>
<p><br/></p>
<p>are arguably simpler operations than</p>
<pre lang="python">
9 + 5,
9 + 2,
4 + 9
</pre>
<p><br/></p>
<p>I&#8217;ve attempted to capture the basic operations we perform while doing arithmetic and assign a level of difficulty to each. I&#8217;ve then emulated the algorithms we follow to compute solutions to such problems. By doing this, it becomes possible to estimate the &#8220;difficulty&#8221; of an arithmetic problem (involving the basic operations).</p>
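<p>The comparison of Problem 2 and Problem 3 comes down to the digit pairs that are added in each column. Here is a short Python 3 sketch that lists them; the function name digit_pairs is mine and it assumes non-negative integers.</p>

```python
def digit_pairs(a, b):
    """Column-wise digit pairs of a + b, least significant column first."""
    s1, s2 = str(a), str(b)
    width = max(len(s1), len(s2))
    # pad the shorter number with leading zeros so the columns line up
    s1, s2 = s1.zfill(width), s2.zfill(width)
    return [(int(c1), int(c2)) for c1, c2 in zip(reversed(s1), reversed(s2))]
```

<p>For Problem 3 almost every pair contains a zero, while every pair in Problem 2 contains a 9.</p>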
<p><center><img src="http://www.prashanthellina.com/images/math_cartoon_yesterday_x.jpg" alt="math cartoon yesterday x"/></center></p>
<h2>Operation difficulties</h2>
<p><br/></p>
<h3>Addition</h3>
<pre lang="python">
addition_difficulties = {
    'digit_zero' : 1,   # any digit added to zero
    'even_even'  : 2,   # sum of even digits
    'odd_odd'    : 2,   # sum of odd digits
    'even_odd'   : 3,   # sum of even and odd digits
    'carry'      : 2    # difficulty of carry (for remembering and then adding)
}
</pre>
<p><br/></p>
<p>Here is the function which computes the addition difficulty. It is a recursive function that can compute the addition difficulty for n numbers. It computes the sum and difficulty of the first two numbers and then inserts the sum at the head of the list of n numbers in place of the two numbers just summed. It then calls itself with the modified list.</p>
<pre lang="python">
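from itertools import izip

def is_even(n):
    # helper assumed by the functions in this article
    return n % 2 == 0

def compute_addition_difficulty(*numbers):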
    '''
    Generates a difficulty number and sum for given
    list of numbers for the addition operation
    *args - integers
    returns: (sum, difficulty)

    >>> compute_addition_difficulty(0, 0)
    (0, 1)

    >>> compute_addition_difficulty(0, 1)
    (1, 1)

    >>> compute_addition_difficulty(1, 1)
    (2, 2)

    >>> compute_addition_difficulty(1, 2)
    (3, 3)

    >>> compute_addition_difficulty(2, 2)
    (4, 2)

    >>> compute_addition_difficulty(19, 9)
    (28, 5)

    >>> compute_addition_difficulty(99999, 12345)
    (112344, 22)
    '''

    numbers = list(numbers)
    if len(numbers) == 0: return 0, 0

    elif len(numbers) == 1: return numbers[0], 0

    num1, num2 = numbers[:2]
    num1 = str(num1)
    num2 = str(num2)

    max_length = max([len(num1), len(num2)])
    num1 = num1.rjust(max_length, '0')
    num2 = num2.rjust(max_length, '0')

    difficulty = 0
    carry = 0
    r = reversed
    ad = addition_difficulties
    sum = []

    for index, (d1, d2) in enumerate( izip( r(num1), r(num2) ) ):
        d1 = int(d1)
        d2 = int(d2)

        d1_is_even = is_even(d1)
        d2_is_even = is_even(d2)

        if not d1 or not d2: difficulty += ad['digit_zero']
        elif d1_is_even and d2_is_even: difficulty += ad['even_even']
        elif not d1_is_even and not d2_is_even: difficulty += ad['odd_odd']
        elif d1_is_even != d2_is_even: difficulty += ad['even_odd']

        dsum = d1 + d2 + carry

        if dsum > 9:
            carry = 1
            difficulty += ad['carry']
        else:
            carry = 0

        sum.append( str(dsum % 10) )

    if carry:
        sum.append( str(carry) )

    sum.reverse()
    sum = ''.join(sum)
    sum = int (sum)

    numbers = [sum] + numbers[2:]

    sum, sub_difficulty = compute_addition_difficulty(*numbers)

    difficulty += sub_difficulty

    return sum, difficulty
</pre>
<p><br/></p>
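<p>For readers on Python 3, here is a condensed sketch of the same scoring for exactly two non-negative integers. It is not a replacement for the function above: the names pair_difficulty and addition_difficulty are mine, and the recursion over more than two numbers is left out.</p>

```python
ADDITION_DIFFICULTIES = {
    'digit_zero': 1, 'even_even': 2, 'odd_odd': 2, 'even_odd': 3, 'carry': 2,
}

def pair_difficulty(d1, d2):
    # difficulty of adding two single digits, ignoring any carry
    ad = ADDITION_DIFFICULTIES
    if d1 == 0 or d2 == 0:
        return ad['digit_zero']
    if d1 % 2 != d2 % 2:
        return ad['even_odd']
    return ad['even_even'] if d1 % 2 == 0 else ad['odd_odd']

def addition_difficulty(a, b):
    s1, s2 = str(a), str(b)
    width = max(len(s1), len(s2))
    s1, s2 = s1.zfill(width), s2.zfill(width)
    difficulty = carry = 0
    # walk the columns from the least significant digit
    for c1, c2 in zip(reversed(s1), reversed(s2)):
        d1, d2 = int(c1), int(c2)
        difficulty += pair_difficulty(d1, d2)
        carry = 1 if d1 + d2 + carry > 9 else 0
        difficulty += ADDITION_DIFFICULTIES['carry'] * carry
    return a + b, difficulty

print(addition_difficulty(99999, 12345))  # (112344, 22)
```

<p>Each of the five columns in 99999 + 12345 costs either 2 or 3 for the digit pair plus 2 for the carry, which adds up to 22.</p>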
<h3>Subtraction</h3>
<pre lang="python">
subtraction_difficulties = {
    'digit_zero' : 1,    # difference of zero and any digit
    'same_digits': 1,    # difference of identical digits
    'even_even'  : 2,    # difference of even digits
    'odd_odd'    : 2,    # difference of odd digits
    'even_odd'   : 3,    # difference of even and odd digits
    'borrow'     : 2,    # doing a borrow
    'twodigit_digit' : 4 # difference of two-digit number and digit
}
</pre>
<p><br/></p>
<p>The subtraction function is also recursive and operates similarly to addition in that respect. Before looking at the subtraction algorithm, take a look at how &#8220;borrowing&#8221; is implemented. Let us say we are trying to perform the following:</p>
<pre lang="python">
200005 -
     6
</pre>
<p><br/></p>
<p>After borrowing, the number becomes 19999(1)5. We would then subtract 6 from 15 and proceed. The do_borrow() function captures this aspect of subtraction.</p>
<pre lang="python">
def do_borrow(num, index):
    if index == len(num)-1: raise Exception('cannot perform borrow')

    num_borrows = 1
    next_digit = int(num[index+1])

    if next_digit > 0:
        num[index+1] = str(next_digit-1)

    else:
        num[index+1] = str(9)
        num_borrows += do_borrow(num, index+1)

    return num_borrows
</pre>
<p><br/></p>
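<p>For the 200005 - 6 problem, the units digit is the one that needs to borrow, so the chain starts at index 0 of the reversed digit list. The Python 3 sketch below repeats the same logic as do_borrow() so that it runs on its own, and traces that case.</p>

```python
def do_borrow(num, index):
    # same logic as the function above, repeated so this snippet is self-contained
    if index == len(num) - 1:
        raise Exception('cannot perform borrow')
    next_digit = int(num[index + 1])
    if next_digit > 0:
        num[index + 1] = str(next_digit - 1)
        return 1
    # a zero cannot lend, so it becomes 9 and borrows from its own neighbour
    num[index + 1] = '9'
    return 1 + do_borrow(num, index + 1)

digits = list('200005')
digits.reverse()                      # least significant digit first
num_borrows = do_borrow(digits, 0)
after_borrow = ''.join(reversed(digits))
print(num_borrows, after_borrow)      # 5 199995
```

<p>The units column is then computed as 10 + 5 - 6 = 9, which matches the 19999(1)5 picture: five digits were touched by the borrow and the remaining digits read 19999.</p>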
<p>This implementation is a bit confusing, so check out this example, which applies do_borrow() to the digits of the number above with the borrow starting at index 1 (the tens place).</p>
<pre lang="python">
>>> x = list('200005')
>>> x.reverse()
>>> x
['5', '0', '0', '0', '0', '2']
>>> do_borrow(x, 1)
4
>>> ''.join(x)
'509991'
>>> x.reverse()
>>> x
['1', '9', '9', '9', '0', '5']
</pre>
<p><br/></p>
<p>Note that do_borrow returned 4, which is the number of digits the borrow had to pass through. The subtraction function multiplies this count by the 'borrow' difficulty. Also note that do_borrow takes a list of digits in reversed order. It takes a list because the result is provided in-place (remember that Python strings are immutable). Now that you have seen how the borrowing works, I'll introduce the subtraction function.</p>
<pre lang="python">
def compute_subtraction_difficulty(*numbers):
    '''
    Generates a difficulty number and result for given
    list of numbers for the subtraction operation
    *args (tuple) - integers
    returns: (difference, difficulty)

    >>> compute_subtraction_difficulty(0,0)
    (0, 1)

    >>> compute_subtraction_difficulty(1,0)
    (1, 1)

    >>> compute_subtraction_difficulty(1,1)
    (0, 1)

    >>> compute_subtraction_difficulty(2,1)
    (1, 3)

    >>> compute_subtraction_difficulty(4,2)
    (2, 2)

    >>> compute_subtraction_difficulty(4,3)
    (1, 3)

    >>> compute_subtraction_difficulty(100,1)
    (99, 10)

    >>> compute_subtraction_difficulty(5000007,9)
    (4999998, 22)
    '''

    numbers = list(numbers)
    if len(numbers) == 0: return 0, 0
    elif len(numbers) == 1: return numbers[0], 0

    num1, num2 = numbers[:2]
    if num1 < num2: num1, num2 = num2, num1

    num1 = str(num1)
    num2 = str(num2)

    max_length = max([len(num1), len(num2)])
    num1 = list(num1.rjust(max_length, '0'))
    num2 = list(num2.rjust(max_length, '0'))

    difficulty = 0
    borrow = 0
    sd = subtraction_difficulties
    difference = []
    num1.reverse()
    num2.reverse()

    for index, (d1, d2) in enumerate( izip( num1, num2 ) ):
        d1 = int(d1)
        d2 = int(d2)

        d1_is_even = is_even(d1)
        d2_is_even = is_even(d2)

        if d1 > d2:
            if not d1 or not d2: difficulty += sd['digit_zero']
            elif d1_is_even and d2_is_even: difficulty += sd['even_even']
            elif not d1_is_even and not d2_is_even: difficulty += sd['odd_odd']
            elif d1_is_even != d2_is_even: difficulty += sd['even_odd']
            ddiff = d1 - d2

        elif d1 < d2:
            num_borrows = do_borrow(num1, index)
            difficulty += sd['borrow']*num_borrows
            ddiff = 10 + d1 - d2
            difficulty += sd['twodigit_digit']

        elif d1 == d2:
            difficulty += sd['same_digits']
            ddiff = 0

        difference.append( str(ddiff) )

    difference.reverse()
    difference = ''.join(difference)
    difference = int (difference)

    numbers = [difference] + numbers[2:]

    difference, sub_difficulty = compute_subtraction_difficulty(*numbers)

    difficulty += sub_difficulty

    return difference, difficulty
</pre>
<p><br/></p>
<h3>Multiplication</h3>
<pre lang="python">
multiplication_difficulties = {
    'carry'  : 1,   # carry operation, summation difficulty is separate
    'offset' : 1,
}
</pre>
<p><br/></p>
<p>The basic operation in multiplying big numbers is the multiplication of one digit by another. I've tried to capture the difficulty of this operation as follows:</p>
<pre lang="python">
import math

def is_odd(n):
    # helper assumed by the functions in this article
    return n % 2 == 1

def get_single_digit_multiplication_difficulty(d1, d2):
    '''
    Max value of multiplication of two digits is 81 (9*9).
    Difficulty of d1*d2 increases as the resulting value
    of the multiplication increases. Thus 9*9=81 is the most difficult
    and 0*0 is the least difficult.

    We will represent this difficulty with a value from 1 to 4 (inclusive).

    Note that the difficulty will be incremented by 1 for the presence
    of an odd digit in the operands (except for 1 and 5 because they are
    arguably easier to multiply by)

    >>> get_single_digit_multiplication_difficulty(0, 1)
    (0, 1)
    >>> get_single_digit_multiplication_difficulty(1, 1)
    (1, 1)
    >>> get_single_digit_multiplication_difficulty(2, 5)
    (10, 1)
    >>> get_single_digit_multiplication_difficulty(2, 7)
    (14, 2)
    >>> get_single_digit_multiplication_difficulty(2, 6)
    (12, 1)
    >>> get_single_digit_multiplication_difficulty(8, 9)
    (72, 5)
    >>> get_single_digit_multiplication_difficulty(8, 8)
    (64, 4)
    '''

    if not d1 or not d2: return 0, 1 

    res = d1 * d2
    difficulty = math.ceil((4/81.) * res)

    # odd numbers are harder to multiply except 1 and 5
    if d1 not in [1,5] and d2 not in [1,5] and (is_odd(d1) or is_odd(d2)):
        difficulty += 1

    return res, int(difficulty)
</pre>
<p><br/></p>
<p>At the next level, the operation is multiplying a multi-digit number by a single digit, which I call simple multiplication. Here I also take into account the difficulty of the carry. I hope the following code is self-explanatory.</p>
<pre lang="python">
def compute_simple_multiplication_difficulty(num, digit):
    '''
    >>> compute_simple_multiplication_difficulty(1, 0)
    (0, 1)
    >>> compute_simple_multiplication_difficulty(1, 1)
    (1, 1)
    >>> compute_simple_multiplication_difficulty(2, 1)
    (2, 1)
    >>> compute_simple_multiplication_difficulty(10, 1)
    (10, 2)
    >>> compute_simple_multiplication_difficulty(15, 1)
    (15, 2)
    >>> compute_simple_multiplication_difficulty(15, 5)
    (75, 7)
    >>> compute_simple_multiplication_difficulty(999, 7)
    (6993, 25)
    '''
    num = str(num)

    result = []
    md = multiplication_difficulties
    carry = 0
    difficulty = 0

    for index, d in enumerate(reversed(num)):
        d = int(d)
        res, s_diff = get_single_digit_multiplication_difficulty(d, digit)
        difficulty += s_diff

        if (carry):
            res, carry_sum_difficulty = compute_addition_difficulty(res, carry)
            difficulty += carry_sum_difficulty + md['carry']

        carry = res // 10  # integer division, also under Python 3
        result_digit = int(str(res)[-1:])
        result.append( str(result_digit) )

    if carry:
        result.append( str(carry) )

    result.reverse()
    result = ''.join(result)
    result = int(result)

    return result, difficulty
</pre>
<p><br/></p>
<p>Simple multiplication is the basic operation of complex multiplication (multi-digit multiplied by multi-digit). Here is an example of "complex multiplication" laid out the way the multiplication code performs it.</p>
<pre lang="python">
    345 x
    123
  -----
   1035
   690x  # result is "offset" or multiplied by 10
  345xx  # result is "offset" or multiplied by 100
  -----
  42435  # result of summation of above numbers
  -----
</pre>
<p><br/></p>
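<p>The layout above boils down to offset partial products followed by a sum. Here is a rough sketch of just that arithmetic, written for Python 3. The helper name partial_products is made up for this illustration and is not part of the original program, and it ignores difficulty scoring entirely.</p>

```python
def partial_products(num1, num2):
    """Return the offset partial products of num1 x num2, lowest digit of num2 first."""
    products = []
    for index, digit in enumerate(reversed(str(num2))):
        # multiply by one digit, then "offset" by 10**index
        products.append(num1 * int(digit) * 10 ** index)
    return products


parts = partial_products(345, 123)
print(parts)       # [1035, 6900, 34500]
print(sum(parts))  # 42435
```

<p>The three entries correspond to the rows 1035, 690x and 345xx in the layout, and their sum is the final product.</p>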
<p>The first part of the multiplication is to perform simple multiplications - 345x3, 345x2, 345x1 and to offset the numbers accordingly. The second part of the operation is to sum the numbers. I used the addition algorithm to achieve the second part. Check out the multiplication function.</p>
<pre lang="python">
def compute_multiplication_difficulty(*numbers):
    '''
    Generates a difficulty number and result for given
    list of numbers for the multiplication operation
    *args (tuple) - integers
    returns: (product, difficulty)

    >>> compute_multiplication_difficulty(0,0)
    (0, 1)
    >>> compute_multiplication_difficulty(1,0)
    (0, 1)
    >>> compute_multiplication_difficulty(1,1)
    (1, 1)
    >>> compute_multiplication_difficulty(2,1)
    (2, 1)
    >>> compute_multiplication_difficulty(5,1)
    (5, 1)
    >>> compute_multiplication_difficulty(5,2)
    (10, 1)
    >>> compute_multiplication_difficulty(17,2)
    (34, 7)
    >>> compute_multiplication_difficulty(17,29)
    (493, 20)
    >>> compute_multiplication_difficulty(17,29,3)
    (1479, 31)
    >>> compute_multiplication_difficulty(1776,29,3)
    (154512, 77)
    '''

    numbers = list(numbers)
    if len(numbers) == 0: return 1, 0
    elif len(numbers) == 1: return numbers[0], 0

    num1, num2 = numbers[:2]
    if num1 < num2: num1, num2 = num2, num1

    num2 = str(num2)

    difficulty = 0
    borrow = 0
    md = multiplication_difficulties
    m_numbers = []

    for index, d in enumerate(reversed(num2)):

        d = int(d)

        m_number, m_diff = compute_simple_multiplication_difficulty(num1, d)
        difficulty += m_diff

        if index:
            m_number = m_number * 10 ** index  # stays an exact integer, no float round-trip
            difficulty += md['offset']

        m_numbers.append(m_number)

    m_numbers_sum, m_numbers_diff = compute_addition_difficulty(*m_numbers)
    difficulty += m_numbers_diff

    numbers = [m_numbers_sum] + numbers[2:]
    product, m_diff = compute_multiplication_difficulty(*numbers)

    difficulty += m_diff

    return product, difficulty
</pre>
<p><br/></p>
<h3>Division</h3>
<pre lang="python">
division_difficulties = {
    # long division
    'use_digit'       : 1,  # bringing down digit from dividend
    'multiple_lookup' : 1,  # looking up precomputed multiple of divisor
    'quotient_update' : 1,  # updating quotient with digit or period
}
</pre>
<p><br/></p>
<p>Implementing long division posed a problem because my understanding of the mechanics behind the long division algorithm was minimal, if not nonexistent. I spent some time trying to "reverse-engineer" it. I came across "Egyptian Division", which made things clearer. With a little help, I managed to implement the following division algorithm. Please let me know if you come up with a better approach.</p>
<pre lang="python">
def compute_division_difficulty(dividend, divisor, precision):
    '''
    Generates a difficulty number, quotient and remainder
    for dividend / divisor
    @precision (int) -- max required number of digits after decimal point in quotient
    returns: (quotient, remainder, difficulty)

    >>> compute_division_difficulty(54, 5, 0)
    (10.0, 4, 6)
    >>> compute_division_difficulty(50, 5, 0)
    (10.0, 0, 6)
    >>> compute_division_difficulty(575, 6, 0)
    (95.0, 5, 60)
    >>> compute_division_difficulty(575, 6, 1)
    (95.829999999999998, 20, 112)
    >>> compute_division_difficulty(6, 9, 1)
    (0.66000000000000003, 60, 50)
    >>> compute_division_difficulty(410, 2, 1)
    (205.0, 0, 54)
    '''

    if not divisor: raise Exception('Division by Zero')
    if not dividend: return 0, 0, 1

    dividend = str(dividend)

    divisor  = str(divisor)

    difficulty = 0
    previous_multiples_difficulty = 0
    precision_reached = 0
    decimal_point_used = 0
    dd = division_difficulties
    num = []
    quotient = []

    num = dividend[0]
    difficulty += dd['use_digit']
    index = 0

    while 1:
        q_digit = int(num) // int(divisor)  # integer division, also under Python 3

        multiple, multiple_difficulty = compute_multiples_difficulty(int(divisor), q_digit)
        difficulty += int(math.fabs(multiple_difficulty - previous_multiples_difficulty))
        if previous_multiples_difficulty: difficulty += dd['multiple_lookup']
        previous_multiples_difficulty = max(multiple_difficulty, previous_multiples_difficulty)

        quotient.append( str(q_digit) )
        difficulty += dd['quotient_update']

        if decimal_point_used:
            precision_reached += 1

        num, sub_difficulty = compute_subtraction_difficulty(int(num), multiple)
        difficulty += sub_difficulty
        num = str(num)

        index += 1

        if index == len(dividend):
            if precision == 0: break
            quotient.append('.')
            difficulty += dd['quotient_update']
            decimal_point_used = 1

        if not decimal_point_used:
            num += dividend[index]
        else:
            num += '0'

        difficulty += dd['use_digit']

        if (len(num) == 1 and not int(num)) or precision_reached >= precision + 1:
            break

    remainder = int(num)
    quotient = float(''.join(quotient))

    return quotient, remainder, difficulty
</pre>
<p><br/></p>
<p>You must've noticed the use of compute_multiples_difficulty(). In long division, at every step you look for the largest multiple of the divisor that is less than or equal to the number in hand. Multiples worked out earlier in the division can be reused: if you have already computed, say, the 5th multiple of the divisor with difficulty N, then computing the 6th multiple at a later point in the division costs difficulty(lookup of 5th multiple) + difficulty(6-5th multiple).</p>
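<p>To make the "largest multiple of the divisor" step concrete, here is a minimal sketch of plain long division that records what happens each time a digit is brought down. It is written for Python 3, the function name long_division_steps is hypothetical, and it deliberately leaves out both the difficulty scoring and the decimal places handled by the real function.</p>

```python
def long_division_steps(dividend, divisor):
    """Return one (number_in_hand, quotient_digit, multiple_used, remainder)
    tuple for each digit brought down from the dividend."""
    steps = []
    remainder = 0
    for digit in str(dividend):
        in_hand = remainder * 10 + int(digit)  # bring down the next digit
        q_digit = in_hand // divisor           # picks the largest multiple <= in_hand
        multiple = q_digit * divisor
        remainder = in_hand - multiple
        steps.append((in_hand, q_digit, multiple, remainder))
    return steps


for step in long_division_steps(575, 6):
    print(step)
# (5, 0, 0, 5)
# (57, 9, 54, 3)
# (35, 5, 30, 5)
```

<p>Reading the quotient digits top to bottom gives 095, that is 95 with remainder 5, which agrees with the quotient and remainder in the doctest for 575 / 6 with precision 0.</p>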
<h2>Results</h2>
<pre lang="python">
#Problem 1
>>> compute_addition_difficulty(5, 3)
(8, 2)

#Problem 2
>>> compute_addition_difficulty(99999, 12345)
(112344, 22)

#Problem 3
>>> compute_addition_difficulty(1000000, 1000001)
(2000001, 8)
</pre>
<p><br/></p>
<p>Problem 1: <strong>2</strong><br />
Problem 2: <strong>22</strong><br />
Problem 3: <strong>8</strong></p>
<h2>The End</h2>
<p>I wish I could have written a more detailed explanation instead of sprinkling this article with code. Time, however, prevents me from doing so <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> . It is some relief that Python is a <a href="/2008/07/11/even-a-python-can-be-abused/">wonderful language for readability</a> and the code samples above are pretty close to pseudo-code.</p>
<p>I will post back when I complete the program to generate arithmetic problems by order of difficulty. Until then, Adios amigos. Almost forgot! Here is the <a href="http://code.prashanthellina.com/code/atrainer.py">code</a>.</p>
<p><center><img src="http://www.prashanthellina.com/images/meaningoflife.gif" alt="meaning of life math cartoon"/></center></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/07/27/determining-the-difficulty-of-arithmetic-operations/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Even a python can be abused</title>
		<link>http://blog.prashanthellina.com/2008/07/11/even-a-python-can-be-abused/</link>
		<comments>http://blog.prashanthellina.com/2008/07/11/even-a-python-can-be-abused/#comments</comments>
		<pubDate>Fri, 11 Jul 2008 03:16:47 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[veveo]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[readability]]></category>
		<category><![CDATA[VB]]></category>
		<category><![CDATA[visual basic]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=67</guid>
		<description><![CDATA[The first programming language I coded in is QuickBasic. I loved the simplicity and especially the IDE. It made things simple for a starter. Later I discovered Visual Basic which extended the same simplicity and added the &#8220;Visual&#8221; element with a splendid editor for GUI. In between I did some projects using Java, C#, C, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.prashanthellina.com/images/python_abuse.jpg" alt="python abuse" align="left"/>The first programming language I coded in is <strong>QuickBasic</strong>. I loved the simplicity and especially the IDE. It made things simple for a starter. Later I discovered <strong>Visual Basic</strong> which extended the same simplicity and added the &#8220;Visual&#8221; element with a splendid editor for GUI.</p>
<p>In between I did some projects using Java, C#, C, C++. None of these impressed me too much. I hated Java&#8217;s imposition of stiff rules and its dogged adherence to &#8220;everything in a class&#8221; attitude. C# was better. C++ just turned me off because of the monster it is. I did not like C at all because of its total lack of automated memory handling (like GC). I&#8217;ve been doing a lot of coding in C nowadays as part of my job and I must admit that I like it a lot for its simplicity in primitives and promise of &#8220;closeness to hardware&#8221; and hence the predictability and performance.</p>
<p>I did a small part of my final year project using Python. However, for some unfathomable reason, Python did not impress me at all then. When I started working at Veveo I used Python for a project and got hooked. Its simplicity and &#8220;readability&#8221; got me. The power of wielding this tool got me drunk <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Python was designed from the beginning to be an &#8220;easy to read&#8221; language. Most, if not all, of the syntax is intuitive. The indentation adds to the readability aspect. The policy of &#8220;only one way to do a thing&#8221; does wonders for readability. Everyone does a certain thing only the &#8220;one&#8221; way. If you are wondering why the heck I am talking so much about readability, you should consider the fact that an average programmer spends <strong>most</strong> of his time &#8220;reading&#8221; code. You have to read your code after you&#8217;ve just written it. You have to read your code the next day when you resume work. You have to read your code the moment a bug is found. You have to read your code when someone asks you how some aspect of it works a couple of months later. You have to read your code when making a teeny-weeny feature addition. I just cannot emphasize enough how much time is spent just reading. There have been times when I would spend a whole day just reading code and finally make &#8220;a single line of code change&#8221; at the end of the day!</p>
<p>So there it is. Python makes it possible to write readable code and that does wonders to programmer productivity. Maintaining your code becomes easier. Understanding your colleagues&#8217; code becomes easier and most of all understanding code written by someone across the world becomes easier &#8211; so you can start reusing components more quickly and with more confidence than ever before.</p>
<p><strong><big>I simply love Python.</big></strong></p>
<p>You must be wondering where I am taking this discussion&#8230; because the title says something about a Python being abused. Who is abusing the Python anyway?</p>
<p>I&#8217;ve noticed how newbies code in Python and found it particularly intriguing. What is interesting here is that every person comes from a certain programming background and is used to doing things in a certain way. When they are thrown into a situation where they have to learn a new language and write some code in it, they invariably apply the vast experience they have in their language of choice <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I&#8217;ve had the opportunity of observing people from Java, C, Perl, VB (yes, Perl) backgrounds writing code in Python. The Java guys stress test multiple inheritance in Python and bring the much cherished &#8220;everything in a class&#8221; practice to the Python. The C guys who are more often than not obsessed with performance and optimization put their brains to work and implement a strcpy using a &#8220;for loop&#8221; and insist on doing a &#8220;shift&#8221; instead of &#8220;division/multiplication&#8221;. The Perl guys just don&#8217;t seem to like the alphanumerics. They craft Python code with ingenious application making it look very concise. The more characters in one line the better the code. The more non-alphanumerics the better coder you are. That&#8217;s the way of the &#8220;Perl&#8217;ies&#8221;. The VB guys languish for a while complaining constantly about the lack of a proper IDE and after trying out various Python editors, decide to call it quits and go home to comfortable VB. I know, I know &#8230;.. I was a VB guy too and I did search for IDEs too &#8230; But then I found VIM and everything was good <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I&#8217;ve had the good (snigger) fortune of maintaining some of these brilliant artifacts and had my share of nightmares and laugh-outs. I thought I had seen it all, until I saw something today. I told myself &#8212; &#8220;Never underestimate a brilliant C programmer who has found exec and eval in Python&#8221; &#8230; Yes, you can quote me on this <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<pre lang="Python">
    guido = "is speechless"
    larry = "went nuts"
    sergey = "has seen it all now"
    ...
    ...
    for idx in ['guido','larry','sergey']:
        idxv=eval(idx)
        if(not idxv and dd.has_key(idx)):
            idxv='%s="%s"' %(idx,dd[idx])
            exec(idxv)

    ...
    ...
</pre>
<p><br/><br />
<strong>No&#8230;.. I don&#8217;t think I&#8217;ve seen it all&#8230;. <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </strong></p>
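<p>For contrast, here is a rough sketch of what I read the snippet as trying to do: fill in any empty value from the fallback dict dd. This is my interpretation of the intent, not the original author's fix, and the function name fill_missing and the sample values are made up. Keeping the values in a dict instead of separate variables removes the need for eval and exec altogether.</p>

```python
def fill_missing(values, fallbacks):
    """Return a copy of values in which every empty entry that has a
    fallback is replaced by that fallback."""
    filled = dict(values)
    for name, value in values.items():
        if not value and name in fallbacks:
            filled[name] = fallbacks[name]
    return filled


# Hypothetical data: two of the three names start out empty.
people = {'guido': 'is speechless', 'larry': '', 'sergey': None}
dd = {'larry': 'went nuts', 'sergey': 'has seen it all now'}

print(fill_missing(people, dd))
# {'guido': 'is speechless', 'larry': 'went nuts', 'sergey': 'has seen it all now'}
```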
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/07/11/even-a-python-can-be-abused/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Nose &#8211; TDD &#8211; Python</title>
		<link>http://blog.prashanthellina.com/2008/05/22/nose-tdd-python/</link>
		<comments>http://blog.prashanthellina.com/2008/05/22/nose-tdd-python/#comments</comments>
		<pubDate>Thu, 22 May 2008 15:02:08 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[nose]]></category>
		<category><![CDATA[tdd]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=65</guid>
		<description><![CDATA[What, why I&#8217;ve been reading up on TDD and it has struck me as particularly useful methodology to achieve &#8220;clean code that works&#8221;. TDD encourages writing unit tests to cover all the code (because by definition, you write a test before a line of code is written). Because all your code is covered you are [...]]]></description>
			<content:encoded><![CDATA[<h3>What, why</h3>
<p>I&#8217;ve been reading up on <a href="http://en.wikipedia.org/wiki/Test-driven_development">TDD</a> and it has struck me as a particularly useful methodology to achieve &#8220;clean code that works&#8221;. TDD encourages writing unit tests to cover all the code (because by definition, you write a test before a line of code is written). Because all your code is covered you are freed from the fear of breakage due to change and can instantly be more confident and productive. Also, the test cases act as a specification in code &#8211; very useful.</p>
<p>Python has standard modules, <a href="http://docs.python.org/lib/module-unittest.html">unittest</a> and <a href="http://docs.python.org/lib/module-doctest.html">doctest</a> to help you write test cases. I simply love doctest. It alleviates much of the pain of writing a test case (setup and all) besides acting as &#8220;executable documentation&#8221;. The unittest module has a Java legacy and is not to my taste. Also, I wanted to find a solution that would help in automated test enumeration (discovery) in my source directories without having to write any &#8220;infrastructure&#8221; code. One more thing I was looking for was a way to run both unit tests and doc tests together.</p>
<p>After a bit of searching, I found &#8220;<a href="http://somethingaboutorange.com/mrl/projects/nose/">Nose</a>&#8221;. Nose is a clone of &#8220;<a href="http://codespeak.net/py/dist/test.html">py.test</a>&#8221; which I liked better than the original (subjectively). To get a feel for &#8220;Nose&#8221;, I set up some Python test files.</p>
<p>The following is the directory structure and the contents of the files. I&#8217;ve put in both unit tests and doc tests in the files to see how &#8220;Nose&#8221; handles them. Also, the tests are spread across directories. Note that I had to put an &#8220;__init__.py&#8221; to allow &#8220;Nose&#8221; to import tests in a subdirectory.</p>
<h3>The setup</h3>
<p><strong>The directory structure</strong></p>
<pre lang="bash">
prashanth@prashanth-desktop:~/tmp$ tree
.
|-- bingo.py
|-- somedir
|   |-- __init__.py
|   `-- test_another.py
`-- test_prashanth.py

1 directory, 4 files
</pre>
<p><br/></p>
<p><strong>bingo.py</strong></p>
<pre lang="python">
def boing(a, b):
    '''
    >>> boing(10, 20)
    30
    '''
    return a+b

def boing1(a, b):
    '''
    >>> boing1(10, 20)
    40
    '''
    return a+b
</pre>
<p><br/></p>
<p><strong>test_prashanth.py</strong></p>
<pre lang="python">
def test_a():
    assert 1

def test_b():
    print "hello"
    assert 0
</pre>
<p><br/></p>
<p><strong>somedir/test_another.py</strong></p>
<pre lang="python">
def test_bingo():
    raise Exception('hgello')
</pre>
<p><br/></p>
<h3>Installing &#8220;Nose&#8221;</h3>
<pre lang="bash">
sudo easy_install nose
</pre>
<p><br/></p>
<p>If you don&#8217;t have easy_install, head over <a href="http://somethingaboutorange.com/mrl/projects/nose/">here</a> to get information on installation.</p>
<h3>Running the tests</h3>
<p>Now that &#8220;Nose&#8221; is installed, let us run the tests,</p>
<pre lang="bash">
nosetests --with-doctest
</pre>
<p><br/></p>
<p>The output is</p>
<pre lang="bash">
..E.F
======================================================================
ERROR: somedir.test_another.test_bingo
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/nose-0.10.2-py2.5.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/home/prashanth/tmp/somedir/test_another.py", line 2, in test_bingo
    raise Exception('hgello')
Exception: hgello

======================================================================
FAIL: test_prashanth.test_b
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/nose-0.10.2-py2.5.egg/nose/case.py", line 182, in runTest
    self.test(*self.arg)
  File "/home/prashanth/tmp/test_prashanth.py", line 7, in test_b
    assert 0
AssertionError:
-------------------- >> begin captured stdout << ---------------------
hello

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 5 tests in 0.057s

FAILED (errors=1, failures=1)
</pre>
<p><br/></p>
<p>The first line in the output is the "test progress" indication (..E.F). When a test succeeds, a '.' is written. When a test fails, an 'F' is written. When a test raises an exception, an 'E' is written. This is very useful for getting a sense of progress while a huge test suite is being executed.</p>
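<p>The difference between 'F' and 'E' comes from the standard unittest machinery: an AssertionError is counted as a failure, any other exception as an error. The following is a small sketch using only the standard library (not Nose itself), written for Python 3 instead of the Python 2.5 used above. The names run_demo and Demo are made up for the illustration.</p>

```python
import io
import unittest


def run_demo():
    """Run three tiny tests and return (result, text written by the runner)."""

    class Demo(unittest.TestCase):
        def test_pass(self):
            self.assertEqual(1 + 1, 2)

        def test_fail(self):
            self.assertEqual(1 + 1, 3)   # AssertionError: reported as a failure

        def test_error(self):
            raise Exception('hgello')    # any other exception: reported as an error

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    stream = io.StringIO()               # capture the runner's report instead of printing it
    result = unittest.TextTestRunner(stream=stream).run(suite)
    return result, stream.getvalue()


result, output = run_demo()
print(output.splitlines()[0])                    # progress line, one character per test
print(len(result.failures), len(result.errors))  # 1 1
```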
<p>"Nose" captures the stdout and stderr when a test case fails to help you debug the issue. To <a href="http://ivory.idyll.org/articles/nose-intro.html">learn more about using "Nose" go here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/05/22/nose-tdd-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>N-gram data from Project Gutenberg</title>
		<link>http://blog.prashanthellina.com/2008/05/04/n-gram-data-from-project-gutenberg/</link>
		<comments>http://blog.prashanthellina.com/2008/05/04/n-gram-data-from-project-gutenberg/#comments</comments>
		<pubDate>Sun, 04 May 2008 16:40:14 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[data mining]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[gutenberg]]></category>
		<category><![CDATA[ngrams]]></category>
		<category><![CDATA[project gutenberg]]></category>
		<category><![CDATA[text parsing]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=63</guid>
		<description><![CDATA[I&#8217;ve been working on Wordza.com for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using Wikipedia data which I already have on my disk, but decided on using Project Gutenberg data as it is more representative of the general usage of English language. Get Project Gutenberg Ngram data The [...]]]></description>
			<content:encoded><![CDATA[<p>
  I&#8217;ve been working on <A href="http://www.wordza.com" name="Wordza">Wordza.com</A> for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using Wikipedia data which I already <A href="/2007/12/21/topic-extraction-using-wikipedia-data/">have on my disk</A>, but decided on using <A href="http://www.gutenberg.org">Project Gutenberg</A> data as it is more representative of the general usage of the English language.
</p>
<h2>Get Project Gutenberg Ngram data</h2>
<p>
The Ngram data contains bi-grams and tri-grams for now. I plan to generate uni-grams soon. I&#8217;ve made the data available here so you can download and use it! This data contains all of the e-books hosted by Project Gutenberg (which means the data contains English, French, German and other languages). If you want an English only dataset, check back in a week or two. I am in the process of generating the same.
</p>
<p>
  The Ngram data contains bi-grams and tri-grams. Each line is prefixed with the occurrence count.<br/><br />
<A href="http://www.prashanthellina.com/docs/gutenberg_data/gutenberg_ngrams.tar.bz2">gutenberg_ngrams.tar.bz2</A> (<strong>624 MB</strong>)<br/></p>
<p><br/></p>
<p>This is the compressed tarball of all the txt files in Project Gutenberg (as of a week before this blog post). Note that you don&#8217;t need this file unless you want to generate the Ngrams yourself using the scripts provided below.<br/><br />
<A href="http://www.prashanthellina.com/docs/gutenberg_data/gutenberg_files.tar.bz2.0">gutenberg_files.tar.bz2.0</A>,<br />
<A href="http://www.prashanthellina.com/docs/gutenberg_data/gutenberg_files.tar.bz2.1">gutenberg_files.tar.bz2.1</A>,<br />
<A href="http://www.prashanthellina.com/docs/gutenberg_data/gutenberg_files.tar.bz2.2">gutenberg_files.tar.bz2.2</A> (<strong>5.3 GB</strong>)<br/></p>
<p>My webserver (Apache) has a problem serving out files bigger than 2GB, so I had to split the file up. After you download the splits, you have to join them like this.</p>
<pre lang="BASH">
mv gutenberg_files.tar.bz2.0 gutenberg_files.tar.bz2
cat gutenberg_files.tar.bz2.1 >> gutenberg_files.tar.bz2
cat gutenberg_files.tar.bz2.2 >> gutenberg_files.tar.bz2
</pre>
<p><br/></p>
<p>To decompress the files, you will need bunzip2 on *nix/Cygwin. On Windows, use 7zip.
</p>
<h2>Generate the data yourself</h2>
<p>
In case you want to generate the Ngrams yourself by processing the Project Gutenberg data files, follow these instructions. You will have to get the Project Gutenberg data files. Use the following command to get all the English language files in txt format.</p>
<pre lang="bash">
mkdir gutenberg
cd gutenberg
wget -w 2 -m "http://www.gutenberg.org/robot/harvest?filetypes[]=txt&#038;langs[]=en"
</pre>
<p><br/></p>
<p>The txt files are compressed and stored in files ending with .zip extension. These zip files are spread across multiple directories. The following command will move the zip files into the &#8220;gutenberg&#8221; directory you created in the above step.</p>
<pre lang="BASH">
for i in `find . -name "*.zip"`; do mv $i . ; done;
</pre>
<p><br/></p>
<p>Now that all the zip files are in the same directory, unzip the zip files. Some zip files may contain files other than .txt&#8217;s. The following command extracts only .txt&#8217;s in the zip files.</p>
<pre lang="BASH">
cd ..
mkdir gutenberg_txt
for i in `find gutenberg -name "*.zip"`; do unzip $i \*.txt -d gutenberg_txt/ ; done;
cd gutenberg_txt
for i in `find . -name "*.txt"`; do mv $i . ; done;
cd ..
</pre>
<p><br/></p>
<p>The gutenberg txt files have gutenberg headers and footers which should be removed lest they skew the frequency of Ngrams. The script &#8220;remove_gutenberg_text.py&#8221; does exactly this. The &#8220;generate_ngrams.py&#8221; script creates uni, bi and tri-grams of whatever text is piped into it. The following command pipes all the txt files through both the scripts to create the ngrams file.</p>
<pre lang="BASH">
for i in `find gutenberg_txt/ -name "*.txt"`; \
do cat $i | python remove_gutenberg_text.py | \
grep -i -v "project gutenberg" |\
 python generate_ngrams.py >> gutenberg_ngrams; done;
</pre>
<p><br/></p>
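<p>The real generate_ngrams.py is linked further down. As a rough idea of what such a script does, here is a minimal Python 3 sketch. The tokenization rule (lowercase runs of letters and apostrophes) and the function name ngrams are my assumptions for this illustration and may well differ from the actual script.</p>

```python
import re


def ngrams(text, n):
    """Return the n-grams of text as space-joined strings, in order of appearance."""
    # Assumed tokenization: lowercase words made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "The quick brown fox"
print(ngrams(sample, 2))  # ['the quick', 'quick brown', 'brown fox']
print(ngrams(sample, 3))  # ['the quick brown', 'quick brown fox']
```

<p>Writing one n-gram per line gives a file that can be fed to the counting step.</p>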
<p>Now you have to count the number of times an ngram occurs. The following sequence of commands processes the ngrams file generated above and produces a file with the frequency counts of the ngrams. Note that the &#8220;512K&#8221; option to sort is because I had to run these scripts on my host which kills processes that take too much memory. If you have a machine with a lot of memory, sorting can be significantly faster if you use a higher value, say &#8220;1G&#8221;.</p>
<pre lang="BASH">
sort -S 512K -T tmp_sort/ gutenberg_ngrams > gutenberg_ngrams.sorted
uniq -c gutenberg_ngrams.sorted > gutenberg_ngrams.counted
sort -S 512K -T tmp_sort/ gutenberg_ngrams.counted > gutenberg_ngrams.counted.sorted
</pre>
<p><br/>
</p>
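<p>For a small file that fits in memory, the same counting can be sketched in Python 3 with collections.Counter. This is an illustration of what the sort and uniq pipeline computes, not a replacement for it: the full Gutenberg ngrams file is far too large for this approach on a memory-limited host. The function name count_ngrams is made up.</p>

```python
from collections import Counter


def count_ngrams(lines):
    """Count identical lines and return (ngram, count) pairs, rarest first."""
    counts = Counter(line.strip() for line in lines if line.strip())
    # Sort by count, then alphabetically so that ties have a stable order.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


lines = ['of the', 'in the', 'of the', 'project gutenberg', 'of the', 'in the']
for ngram, count in count_ngrams(lines):
    print(count, ngram)
# 1 project gutenberg
# 2 in the
# 3 of the
```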
<h3>Gutenberg data processing scripts</h3>
<ul>
<li><A href="http://code.prashanthellina.com/code/remove_gutenberg_text.py">remove_gutenberg_text.py</A> &#8212; removes Project Gutenberg header and footer from txt files</li>
<li><A href="http://code.prashanthellina.com/code/generate_ngrams.py">generate_ngrams.py</A> &#8212; generate uni, bi and tri-grams for any text</li>
</ul>
<h2>Do get back</h2>
<p>If you use this data, I would really appreciate it if you got back to me with details about how you used it in the context of your project.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/05/04/n-gram-data-from-project-gutenberg/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Alexa rank: A script to get the rank for any site</title>
		<link>http://blog.prashanthellina.com/2008/04/22/alexa-rank-a-script-to-get-the-rank-for-any-site/</link>
		<comments>http://blog.prashanthellina.com/2008/04/22/alexa-rank-a-script-to-get-the-rank-for-any-site/#comments</comments>
		<pubDate>Tue, 22 Apr 2008 17:49:55 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[alexa]]></category>
		<category><![CDATA[alexa rank]]></category>
		<category><![CDATA[programmatically]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=61</guid>
		<description><![CDATA[What is Alexa rank? Alexa collects statistics about visits by internet users to websites through the Alexa Toolbar. Based on the collected data, Alexa computes site ranking. By examining the Alexa rank of a site, you can get a rough idea of how popular the site is. Many argue that Alexa rank is misleading but [...]]]></description>
			<content:encoded><![CDATA[<h2>What is Alexa rank?</h2>
<p><a href="http://www.alexa.com">Alexa</a> collects statistics about visits by internet users to websites through the <a href="http://download.alexa.com/">Alexa Toolbar</a>. Based on the collected data, Alexa computes site ranking. By examining the Alexa rank of a site, you can get a rough idea of how popular the site is. Many argue that <a href="http://www.mattcutts.com/blog/thoughts-on-alexa-data/">Alexa rank is misleading</a> but it has its uses.</p>
<h2>The Alexa rank script</h2>
<p>You can find out the Alexa rank for any site by using this page. However, if you want to programmatically get the Alexa rank, you can do it using <a href="http://code.prashanthellina.com/code/get_alexa_rank.py">this script</a>.</p>
<p><strong><a href="http://code.prashanthellina.com/code/get_alexa_rank.py">Get the Alexa rank script</a></strong></p>
<h2>Using the script</h2>
<p>After downloading the script, make it executable with the command below. You will need to have Python installed.</p>
<pre lang="BASH">
chmod +x get_alexa_rank.py
</pre>
<p><br/></p>
<pre lang="BASH">
$ ./get_alexa_rank.py google.com
popularity rank = 2
reach_rank = 1

$ ./get_alexa_rank.py wikipedia.com
popularity rank = 7
reach_rank = 6

$ ./get_alexa_rank.py blog.prashanthellina.com
popularity rank = 557287
reach_rank = 482289

$ ./get_alexa_rank.py www.inexistantsite.com
popularity rank = -1
reach_rank = -1
</pre>
<p><br/></p>
<h2>How does the script work?</h2>
<p>If you make an HTTP request for the following URL,</p>
<blockquote><p>http://data.alexa.com/data?cli=10&#038;dat=snbamz&#038;url=$URL</p></blockquote>
<p>after replacing $URL with the URL of the site for which you need the Alexa rank, an XML response like the following is sent back. I tried it with &#8220;http://blog.prashanthellina.com&#8221;.</p>
<pre lang="xml">
<ALEXA VER="0.9" URL="blog.prashanthellina.com/" HOME="0" AID="=">
<RLS PREFIX="http://" more="0">
</RLS>
	<SD TITLE="A" FLAGS="">
<POPULARITY URL="prashanthellina.com/" TEXT="557287"/>
<RANK DELTA="+70225"/>
<REACH RANK="482289"/>
</SD>
</ALEXA>
</pre>
<p><br/></p>
<p>The script parses the XML response and extracts POPULARITY/@TEXT and REACH/@RANK.</p>
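<p>The parsing step can be sketched roughly as follows. This is not the code of the linked script, only a minimal illustration that runs against the sample response shown above. The function name <em>parse_alexa_ranks</em> is made up for this example, and returning -1 for missing values is an assumption based on the output shown earlier for a nonexistent site.</p>
<pre lang="python">
import xml.etree.ElementTree as ET

# Sample response copied from this post; a live response may differ.
SAMPLE_RESPONSE = """<ALEXA VER="0.9" URL="blog.prashanthellina.com/" HOME="0" AID="=">
<RLS PREFIX="http://" more="0">
</RLS>
<SD TITLE="A" FLAGS="">
<POPULARITY URL="prashanthellina.com/" TEXT="557287"/>
<RANK DELTA="+70225"/>
<REACH RANK="482289"/>
</SD>
</ALEXA>"""


def parse_alexa_ranks(xml_text):
    """Return (popularity_rank, reach_rank); -1 stands for a missing value."""
    root = ET.fromstring(xml_text)
    # these paths correspond to POPULARITY/@TEXT and REACH/@RANK
    popularity = root.find('SD/POPULARITY')
    reach = root.find('SD/REACH')
    popularity_rank = int(popularity.get('TEXT')) if popularity is not None else -1
    reach_rank = int(reach.get('RANK')) if reach is not None else -1
    return popularity_rank, reach_rank


popularity_rank, reach_rank = parse_alexa_ranks(SAMPLE_RESPONSE)
print("popularity rank = %d" % popularity_rank)
print("reach_rank = %d" % reach_rank)
</pre>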
<p>If you are looking for a PHP script for doing the same, <a href="http://googlepagerankin.wordpress.com/2008/02/01/alexa-rank-checking-scriptalexa-rank-checker-script/">check this out</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/04/22/alexa-rank-a-script-to-get-the-rank-for-any-site/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Selecting a random row from a table in mysql</title>
		<link>http://blog.prashanthellina.com/2008/04/08/selecting-a-random-row-from-a-table-in-mysql/</link>
		<comments>http://blog.prashanthellina.com/2008/04/08/selecting-a-random-row-from-a-table-in-mysql/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 16:12:42 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[random record]]></category>
		<category><![CDATA[tip]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/?p=60</guid>
		<description><![CDATA[I have come across more than one instance when I had to select a random record from a table in a MySQL database. Here is how to do it. The simple but slow method SELECT * FROM mytable ORDER BY RAND() LIMIT 1; Although simple, the above query can be very slow on tables which [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.prashanthellina.com/images/database_symbol.png" alt="database" align="right" style="border:0px"/></p>
<blockquote><p><strong>I have come across more than one instance when I had to select a random record from a table in a MySQL database. Here is how to do it.</strong></p></blockquote>
<h2>The simple but slow method</h2>
<p><br/></p>
<pre lang="sql">
SELECT * FROM mytable ORDER BY RAND() LIMIT 1;
</pre>
<p><br/></p>
<p>Although simple, the above query can be very slow on tables which have a large number of records. This happens because MySQL makes a temporary table with a random number assigned to each row. It then sorts the whole table by that number and returns only the first record from the sorted list.</p>
<h2>Random record selection by id generation : A better and faster method</h2>
<p>The table will need to have an &#8216;id&#8217; field which is an auto-incrementing integer. The approach is to generate a random number which falls in the range of the values of the &#8216;id&#8217; field. MySQL&#8217;s MAX() and MIN() aggregate functions give you the maximum and minimum values of a field.</p>
<h3> The Python approach </h3>
<p>Sample Python code for the random record selection:</p>
<pre lang="python">
import random
...
# fetchone() returns a tuple, so take its first element
cursor.execute('SELECT MIN(id) FROM mytable')
min_value = cursor.fetchone()[0]
cursor.execute('SELECT MAX(id) FROM mytable')
max_value = cursor.fetchone()[0]

random_id = random.randint(min_value, max_value)

# use >= so that a gap left by a deleted row still yields a record
cursor.execute('SELECT * FROM mytable WHERE id >= %s ORDER BY id LIMIT 1', (random_id,))
random_record = cursor.fetchone()
</pre>
<p><br/></p>
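<p>To try the id-based idea without a MySQL server, here is a small self-contained sketch. It uses Python&#8217;s built-in sqlite3 module purely as a stand-in, so the placeholder style is ? instead of %s, and the table contents and the helper name <em>random_record</em> are invented for the example. Some ids are left out on purpose to show what happens when rows have been deleted.</p>
<pre lang="python">
import random
import sqlite3

# sqlite3 is only a stand-in here so that the sketch runs anywhere
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)')
# ids 3, 4 and 7 are missing on purpose to simulate deleted rows
cursor.executemany('INSERT INTO mytable (id, name) VALUES (?, ?)',
                   [(i, 'row %d' % i) for i in (1, 2, 5, 6, 8)])


def random_record(cursor):
    """Pick a random id in [MIN(id), MAX(id)] and return the first row at or after it."""
    cursor.execute('SELECT MIN(id), MAX(id) FROM mytable')
    min_id, max_id = cursor.fetchone()
    if min_id is None:
        return None  # the table is empty
    random_id = random.randint(min_id, max_id)
    cursor.execute('SELECT * FROM mytable WHERE id >= ? ORDER BY id LIMIT 1',
                   (random_id,))
    return cursor.fetchone()


print(random_record(cursor))
</pre>
<p>Note that with gaps the selection is not perfectly uniform: a row that follows a gap is picked more often than its neighbours, because every missing id before it maps to it.</p>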
<h3> Pure SQL approach </h3>
<p>MySQL offers a RAND() function which generates a random floating point number that is at least 0.0 and less than 1.0. We scale this value to fit the range between the minimum and maximum values of the id field.<br />
<br/></p>
<pre lang="sql">
SELECT * FROM mytable WHERE id = (SELECT MIN(id) + FLOOR((MAX(id) - MIN(id) + 1) * RAND()) FROM mytable ) LIMIT 1;
</pre>
<p><br/></p>
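<p>The scaling arithmetic is easy to get wrong by one, so here it is worked through in Python. The helper name <em>scale_to_id_range</em> and the id range 10 to 19 are made up for illustration. The value r plays the role of RAND(), which is at least 0.0 and strictly less than 1.0.</p>
<pre lang="python">
import math
import random


def scale_to_id_range(r, min_id, max_id):
    """Map r in [0.0, 1.0) to an integer in [min_id, max_id]."""
    return min_id + int(math.floor((max_id - min_id + 1) * r))


# with ids running from 10 to 19, the extremes of r map to the extremes of the range
print(scale_to_id_range(0.0, 10, 19))       # 10
print(scale_to_id_range(0.999999, 10, 19))  # 19
print(scale_to_id_range(random.random(), 10, 19))
</pre>
<p>The &#8220;+ 1&#8221; is what makes the maximum id reachable; without it the last row could never be selected.</p>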
<p>So next time you have to select a random record, you know what to do <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>Additional information</strong></p>
<ul>
<li><a href="http://www.phptoys.com/e107_plugins/content/content.php?content.28">http://www.phptoys.com/e107_plugins/content/content.php?content.28</a>
<li><a href="http://www.carlj.ca/2007/12/16/selecting-random-records-with-sql/">http://www.carlj.ca/2007/12/16/selecting-random-records-with-sql/</a>
<li><a href="http://akinas.com/pages/en/blog/mysql_random_row/">http://akinas.com/pages/en/blog/mysql_random_row/</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/04/08/selecting-a-random-row-from-a-table-in-mysql/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Interfacing Python with C using ctypes</title>
		<link>http://blog.prashanthellina.com/2008/01/07/interfacing-python-with-c-using-ctypes/</link>
		<comments>http://blog.prashanthellina.com/2008/01/07/interfacing-python-with-c-using-ctypes/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 17:03:31 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[c-api]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[ctypes]]></category>
		<category><![CDATA[introduction]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[swig]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2008/01/07/interfacing-python-with-c-using-ctypes/</guid>
		<description><![CDATA[Python is a wonderful &#8220;very high level&#8221; language with an elegant design. It is an ultimate tool to rapidly develop applications. However, when it comes to performance (speed and memory), Python sucks. It is not meant for performance. So what do you do after building a quick prototype in python if you want it to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.python.org">Python</a> is a wonderful &#8220;very high level&#8221; language with an elegant design. It is a great tool for rapidly developing applications. However, <strong>when it comes to performance (speed and memory), Python sucks</strong>. It is not meant for performance. So what do you do after building a quick prototype in Python if you want it to be lean and mean?</p>
<p>From the beginning, Python&#8217;s designers understood this use-case and exposed a &#8220;C&#8221; API. <strong>Using the Python C-API, one can write &#8220;modules&#8221; in C</strong> which can then be imported into Python. This solution is not bad and provided you have enough patience, works great.  There is a tool called <a href="http://www.swig.org/">SWIG</a> which can generate the &#8220;glue&#8221; code around C code. It <strong>automates writing of code using C-API</strong> and makes it easier for one to maintain the C &#8220;module&#8221;. However, since SWIG generates code, when some problem occurs, it is quite painful to debug through the wrapper code. For the lazy developers out there (like me <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ), this can be quite a deterrent.</p>
<p>I&#8217;ve been working on a project for using <a href="/2007/10/17/ways-to-process-and-use-wikipedia-dumps/">Wikipedia data</a> to <a href="/2007/12/21/topic-extraction-using-wikipedia-data/">assign &#8220;topics&#8221; to arbitrary pieces of English text</a>. The code written in pure Python takes about 3 minutes to run. When I profiled this code, I found that most of the time was being spent in disk reads (through the db). I decided, after examining the data I had, to load all of it into memory. I tried doing this in pure Python and saw the memory usage creeping up beyond the 4GB capacity when I decided to save my poor machine from thrashing by killing the hog!</p>
<p>On doing some tests, I found that <strong>Python consumes about 32 bytes to store an integer</strong>! Hmmm&#8230;. time to move the data structures to C for more efficient memory usage. I started looking around for tools to interface C code with Python when I came across &#8220;ctypes&#8221; and immediately fell in love with it.</p>
<p><strong>ctypes lets you load any &#8220;shared object&#8221; or &#8220;dynamic link library&#8221; in a Python program</strong> and unobtrusively call functions. You can construct C datatypes required by C functions (pointers, longs, ints, chars, arrays, structs) and even specify callback functions to be passed to C code! I will give you a basic introduction to ctypes to get you started with Python &#8211; C interfacing.</p>
<h3>Code</h3>
<p><big><strong>test.c</strong></big></p>
<pre lang="C">
#include <stdio.h>

// you can initialize stuff in this function
// it is called when the so is loaded
void _init()
{
    printf("Initialization of shared object\n");
}

// you can do final clean-up in this function
// it is called when the so is getting unloaded
void _fini()
{
    printf("Clean-up of shared object\n");
}

int add(int a, int b)
{
    return(a+b);
}

int sum_values(int *values, int n_values)
{
    int i;
    int sum = 0;

    for (i=0; i<n_values; i++)
    {
        sum += values[i];
    }

    return sum;
}
</pre>
<hr/>
<p>We have to compile test.c to create the shared object.</p>
<pre lang="bash">
gcc -fPIC -c test.c
ld -shared -soname libtest.so.1 -o libtest.so.1.0 -lc test.o
</pre>
<hr/>
<p><big><strong>test.py</strong></big></p>
<pre lang="python">
from ctypes import *

# load the shared object
libtest = cdll.LoadLibrary('./libtest.so.1.0')

# call the function, yes it is as simple as that!
print libtest.add(10, 20)

# call the sum_values() function
# we have to create a c int array for this
array_of_5_ints = c_int * 5
nums = array_of_5_ints()

# fill up array with values
for i in xrange(5): nums[i] = i

# since the function expects an array pointer, we pass it byref (provided by ctypes)
print libtest.sum_values(byref(nums), 5)
</pre>
<hr/>
<pre lang="bash">
python test.py
</pre>
<h3>Output</h3>
<div style="background-color: black; color: white">
Initialization of shared object<br />
30<br />
10<br />
Clean-up of shared object
</div>
<p><br/><br />
How much simpler can Python - C interfacing become? ctypes has been part of the Python standard library since Python 2.5.</p>
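<p>Callbacks were mentioned above but not shown, so here is a small sketch along the lines of the example in the ctypes documentation. It sorts a C int array with the C library&#8217;s qsort(), passing a Python function as the comparison callback. It assumes a Unix-like system where the C library can be located through find_library; the names <em>CMPFUNC</em> and <em>compare_ints</em> are chosen just for this example.</p>
<pre lang="python">
from ctypes import CDLL, CFUNCTYPE, POINTER, c_int, sizeof
from ctypes.util import find_library

# assumption: a Unix-like system where the C library can be found this way
libc = CDLL(find_library('c'))

# C signature of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))


def compare_ints(a, b):
    # a and b are pointers to c_int; index 0 dereferences them
    return a[0] - b[0]


values = (c_int * 5)(5, 1, 7, 33, 99)
libc.qsort.restype = None  # qsort() returns void
libc.qsort(values, len(values), sizeof(c_int), CMPFUNC(compare_ints))

print(list(values))
</pre>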
<h3>Resources</h3>
<ul>
<li><a href="http://starship.python.net/crew/theller/ctypes/">ctypes</a>
<li><a href="http://www.ibm.com/developerworks/library/l-shobj/">Shared objects for the object disoriented! - IBM Developer Works</a>
<li><a href="http://www.swig.org/">swig</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2008/01/07/interfacing-python-with-c-using-ctypes/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Topic extraction using Wikipedia data</title>
		<link>http://blog.prashanthellina.com/2007/12/21/topic-extraction-using-wikipedia-data/</link>
		<comments>http://blog.prashanthellina.com/2007/12/21/topic-extraction-using-wikipedia-data/#comments</comments>
		<pubDate>Fri, 21 Dec 2007 11:49:14 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[wikipedia]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[graphviz]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic analysis]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/12/21/topic-extraction-using-wikipedia-data/</guid>
		<description><![CDATA[In an earlier article, I mentioned that I was trying to use Wikipedia data to do news article clustering to make it easy for me to follow news feeds. I have made some progress. I&#8217;ve written an algorithm to produce a list of Wikipedia articles relevant to the input text. Input text has to be in [...]]]></description>
			<content:encoded><![CDATA[<p><center><br />
    <img src="http://www.prashanthellina.com/images/wiki_topic_graph_header.png" alt="decorative graph header"/><br />
</center></p>
<p><br/></p>
<p>In an earlier <a href="/2007/10/17/ways-to-process-and-use-wikipedia-dumps/">article</a>, I mentioned that I was trying to use Wikipedia data to do <strong>news article clustering</strong> to make it easier for me to follow news feeds. I have made some progress: I&#8217;ve written an algorithm that produces a list of Wikipedia articles relevant to the input text. The input text has to be in English. The algorithm will not work well for very short pieces of text; at least a paragraph or two of sizable text is required. The list of Wikipedia articles represents the &#8220;topic&#8221; of the input text.</p>
<h3>Test run</h3>
<p>To test the algorithm, I gave the text from an earlier article from this blog (<a href="/2007/12/10/accessing-your-home-computer-from-the-internet/">Accessing your home computer from the internet</a>). The top Wikipedia articles in the output are</p>
<ul>
<li>Internet</li>
<li>Domain Name System (DNS)</li>
<li>IP Address</li>
<li>Hypertext Transfer Protocol (HTTP)</li>
<li>Modem</li>
<li>World Wide Web (WWW)</li>
<li>Domain Name</li>
<li>Dynamic Host Configuration Protocol (DHCP)</li>
<li>Internet Service Provider</li>
<li>Network Address Translation (NAT)</li>
<li>Firewall</li>
</ul>
<h3>How it works</h3>
<p>The basis of the algorithm is to find Wikipedia article titles occurring in the input text. The &#8220;found&#8221; set of Wikipedia articles is then used to construct a sub-graph of the Wikipedia graph (formed by the links between Wikipedia pages). The most interconnected nodes turn out to be the relevant ones. However, as I have not applied any filtering to the input text, a lot of &#8220;junk&#8221; matches occur. For example, the word &#8220;let&#8221; is picked up because it matches a Wikipedia article of the same title, which redirects to Lashkar-e-Toiba. This is totally irrelevant to the input text. To remove such spurious matches, I dropped the least interconnected nodes and constructed a sub-graph with the remaining nodes. Within this sub-graph, I recomputed the interconnection of each node.</p>
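<p>The two phases described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the actual implementation. The title set, the link table, the function names and the <em>min_degree</em> cut-off are all invented for the example, and &#8220;interconnection&#8221; is approximated by simply counting links within the set of found articles. Matching is done on whitespace-separated lowercase words, so punctuation is not handled.</p>
<pre lang="python">
def find_titles(text, titles):
    """Return the known titles that occur as words or phrases in the text."""
    words = text.lower().split()
    found = set()
    max_len = max(len(t.split()) for t in titles)
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            phrase = ' '.join(words[start:start + length])
            if phrase in titles:
                found.add(phrase)
    return found


def degrees(nodes, links):
    """Count, for each node, its links to or from other nodes of the same set."""
    counts = dict((node, 0) for node in nodes)
    for source in nodes:
        for target in links.get(source, ()):
            if target in nodes and target != source:
                counts[source] += 1
                counts[target] += 1
    return counts


def extract_topics(text, titles, links, min_degree=1):
    found = find_titles(text, titles)
    # phase 1: interconnection within the full set of matches
    first_pass = degrees(found, links)
    # phase 2: drop the least interconnected nodes, then recompute on the rest
    kept = set(n for n, d in first_pass.items() if d >= min_degree)
    second_pass = degrees(kept, links)
    return sorted(kept, key=lambda n: (-second_pass[n], n))


# toy data, invented for illustration only
titles = set(['internet', 'modem', 'ip address', 'let'])
links = {
    'internet': ['modem', 'ip address'],
    'modem': ['internet'],
    'ip address': ['internet'],
    'let': [],
}
text = "Let your modem get an IP address and connect to the internet"
print(extract_topics(text, titles, links))
</pre>
<p>In this toy run the spurious match &#8220;let&#8221; has no links to the other found articles, so it is dropped in the second phase, while &#8220;internet&#8221; comes out on top as the most interconnected node.</p>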
<p>Below is the output of the first phase. This graph contains all nodes found from matching phrases in the input text. The <strong>nodes of darker blue are more relevant than lighter ones</strong>. The <strong>darker and thicker a link is, the more relevant</strong> it is.<br />
<a href="http://www.prashanthellina.com/images/wiki_topic_full_graph_big.png"><br />
    <img src="http://www.prashanthellina.com/images/wiki_topic_full_graph.png" alt="full graph with all found wikipedia titles"/><br />
</a><br />
Download higher resolution image <a href="http://www.prashanthellina.com/images/wiki_topic_full_graph_big.png">here</a>. <strong>8.2MB</strong></p>
<p>A lot of extracted articles are not relevant to the input text. Some of these spurious nodes are totally <strong>disconnected from the main body</strong> of the graph.<br />
<img src="http://www.prashanthellina.com/images/wiki_topic_disconnected_nodes.png" alt="disconnected nodes in the full graph"/></p>
<p>This is a slightly higher resolution picture of a <strong>section of the full graph</strong> above.<br />
<img src="http://www.prashanthellina.com/images/wiki_topic_full_graph_section.png" alt="section of the full graph containing some relevant nodes"/></p>
<p>Below is the <strong>output of second phase</strong> of the algorithm where relevant nodes are extracted and a <strong>sub-graph</strong> computed. Node and edge relevances are recomputed within this set.<br />
<a href="http://www.prashanthellina.com/images/wiki_topic_sub_graph_big.png"><br />
    <img src="http://www.prashanthellina.com/images/wiki_topic_sub_graph.png" alt="sub graph containing only relevant nodes"/><br />
</a><br />
Download the higher resolution image <a href="http://www.prashanthellina.com/images/wiki_topic_sub_graph_big.png">here</a>. <strong>1.9MB</strong></p>
<p>All the graphs above were produced using <a href="http://www.graphviz.org">Graphviz</a>.</p>
<h3>What next</h3>
<p>I tried applying the logic to some sample input texts and the results look very encouraging. The next step towards news article clustering would be to apply the topic extraction algorithm to multiple news articles and look for common Wikipedia articles (maybe a plain intersection). I still have not given much thought to this stage. Once I do, I will post back.</p>
<p>As I said before, Wikipedia amazes me every time I use it. The wealth of information (both as text and as interconnects) is astounding. As a token of appreciation, I&#8217;ve donated a small amount to the current Wikipedia donation round. If you like Wikipedia and have used it, do consider making a donation.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/12/21/topic-extraction-using-wikipedia-data/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Language People &#8211; Interesting picture</title>
		<link>http://blog.prashanthellina.com/2007/11/29/language-people-interesting-picture/</link>
		<comments>http://blog.prashanthellina.com/2007/11/29/language-people-interesting-picture/#comments</comments>
		<pubDate>Thu, 29 Nov 2007 06:50:37 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[picture]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/11/29/language-people-interesting-picture/</guid>
		<description><![CDATA[I like the representation for Logo, Machine Language, Prolog and Ada. Wonder what &#8220;N.W&#8221; is&#8230; (the Modula-2 guy is holding it). I wish Python was featured too but the picture says &#8220;&#8217;85&#8221;. Python did not even exist then! original from here]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.prashanthellina.com/images/language_family.gif"><img src="http://www.prashanthellina.com/images/language_family_small.png" alt="language people thumbnail"/></a></p>
<p>I like the representation for Logo, Machine Language, Prolog and Ada. Wonder what &#8220;N.W&#8221; is&#8230; (the Modula-2 guy is holding it). I wish Python was featured too <img src='http://blog.prashanthellina.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  but the picture says &#8220;&#8217;85&#8221;. Python did not even exist then!</p>
<p><a href="http://luisguillermo.com/PL_ega.gif">original from here</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/11/29/language-people-interesting-picture/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ways to process and use Wikipedia dumps</title>
		<link>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/</link>
		<comments>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/#comments</comments>
		<pubDate>Wed, 17 Oct 2007 16:44:13 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/</guid>
		<description><![CDATA[&#160; Wikipedia is a superb resource for reference (taken with a pinch of salt of course). I spend hours at a time spidering through its pages and always come away amazed at how much information it hosts. In my opinion this ranks amongst the defining milestones of mankind&#8217;s advancement. Apart from being available through http://www.wikipedia.org, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.prashanthellina.com/images/wikipedia_logo_small.png" align="left" alt="http://en.wikipedia.org"/> &nbsp;<br />
<strong><a href="http://www.wikipedia.org">Wikipedia</a></strong> is a superb resource for reference (taken with a pinch of salt of course). I spend hours at a time spidering through its pages and always come away amazed at how much information it hosts. In my opinion this ranks amongst the defining milestones of mankind&#8217;s advancement.</p>
<p>Apart from being available through http://www.wikipedia.org, the data is provided for download so that you can create a mirror locally for quicker access. This is very convenient when you are not connected to the internet, say when you are on the move.</p>
<h4>Setting up a local copy of Wikipedia</h4>
<p><strong>Windows</strong><br />
If you have Windows installed, Webaroo is an easy way to get Wikipedia locally as a &#8220;web pack&#8221;. Check out Webaroo <strong><a href="http://www.webaroo.com">here</a></strong>. Another way on Windows is to use <strong><a href="http://wikifilter.sourceforge.net/">WikiFilter</a></strong>. I tried WikiFilter and found it a good option (It is open source, so you can tweak it). It takes up around 3 to 3.5GB on your disk.</p>
<p><strong>Linux</strong><br />
This <strong><a href="http://www.pilhokim.com/pilhowiki/index.php?title=EChronicle:Importing_Wikipedia">page</a></strong> has instructions for setting it up on Linux. &#8220;<strong><a href="http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html">Building a (fast) Wikipedia offline reader</a></strong>&#8221; is a good option too.</p>
<p><strong>Any operating system</strong><br />
Wikipedia provides <strong><a href="http://static.wikipedia.org/">static wiki dumps</a></strong> for download which should work fine on any operating system that supports a decent web browser. Although I have not tried it, I have heard that the dumps can take up as much as 80GB of space on your disk.</p>
<p><strong>Windows Mobile, iPhone and Blackberry</strong><br />
To access Wikipedia from your mobile, check out <strong><a href="http://vtap.com">vTap</a></strong> from <strong><a href="http://corporate.veveo.net/">Veveo</a></strong>. I must tell you that I work for this company but I am being very objective in suggesting this service to you. A Java version is being developed and will be out soon. Since the space on mobile devices is very limited, the data is hosted on vTap servers and network connectivity is required.</p>
<h4>Other uses for Wikipedia data dumps</h4>
<p>Being such a vast repository of knowledge, Wikipedia is useful in many other ways. I want to use Wikipedia&#8217;s data to handle the feeds I read every day. The same news article comes in from different sources, and multiple times from the same source, and I end up reading all of them. I am going to try and use Wikipedia to help me automatically pull together these news articles and cluster them around topics.</p>
<p>The first step in this experiment is to get the data dumps from Wikipedia and process them so that they can be loaded into a MySQL database. Once we get the data into a database, things become more manageable.</p>
<p><strong>Getting the dumps</strong></p>
<p>Wikipedia is <strong>huge</strong> and this is reflected in the size of the data dumps. It took me about 40 hours to get the articles XML alone on my home connection. Put together, all the relevant files of the Wikipedia dumps come to about 5 GB. I got the dumps from <strong><a href="http://download.wikimedia.org/enwiki/20070908/">here</a></strong>. You can check for more recent ones <strong><a href="http://download.wikimedia.org/enwiki/">here</a></strong>.</p>
<p><strong>Files to get</strong></p>
<ul>
<li> pages-articles.xml.bz2 (2.8 GB) &#8211; xml containing page texts
<li> redirect.sql.gz (10.5 MB) &#8211; sql for redirected pages info
<li> page.sql.gz (318.9 MB) &#8211; minimal page information (id, page title)
<li> externallinks.sql.gz (439.6 MB) &#8211; links from pages to external sites
<li> categorylinks.sql.gz (235.5 MB) &#8211; categories
<li> pagelinks.sql.gz (1.2 GB) &#8211; inter wiki page links
</ul>
<p>Since DreamHost (My Web host) offers a lot of disk space and the network connection is way better than the one I have at home, I downloaded the files to my DreamHost space. (read more about DreamHost <strong><a href="http://blog.prashanthellina.com/2007/10/13/dreamhost-my-wonderful-web-host/">here</a></strong>)</p>
<p>I was able to download all the files using wget except for &#8220;pages-articles.xml.bz2&#8221;. For some reason I cannot understand, wget was bailing out after downloading a few bytes (this seems to be a DreamHost specific issue). To work around this issue, I wrote this Python script:</p>
<p><strong>get_wiki_file.py</strong></p>
<pre lang="python">
import urllib2

url = "http://download.wikimedia.org/enwiki/20070908/enwiki-20070908-pages-articles.xml.bz2"
outfname = 'enwiki-20070908-pages-articles.xml.bz2'
CHUNK_SIZE = 10240

o = open(outfname, 'wb')
r = urllib2.urlopen(url)
total_bytes = 0
counter = 0

while 1:
        # a short chunk does not necessarily mean the end of the stream,
        # so stop only when read() returns nothing at all
        chunk = r.read(CHUNK_SIZE)
        if not chunk: break
        total_bytes += len(chunk)
        o.write(chunk)
        counter += 1
        # report progress every 10 chunks (about 100 KB)
        if counter % 10 == 0:
                print "%.2f MB" % (total_bytes/1024.0/1024.0)
o.close()
</pre>
<p><strong>Preparing dumps</strong></p>
<p>The next step was to extract all the archives. I used <em>bunzip2</em> for .bz2 files and <em>gunzip</em> for the .gz files. I tried loading one of the large .sql files into the database and the process was killed (I guess this is because DreamHost does not like resource hungry processes running for a long time). To work around this I had to split all the big .sql files into smaller chunks.</p>
<p>for example</p>
<pre lang="bash">
split -d -l 50 ../enwiki-20070908-page page.input.
</pre>
<p>The -l option tells split how many lines to put in each output file, and -d tells it to use numeric suffixes (which will be useful soon).</p>
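<p>If split is not available, roughly the same thing can be done in Python. The sketch below is an approximation rather than a replacement for the command: the function name <em>split_by_lines</em> and the sample file are made up, and it always writes two-digit numeric suffixes (00, 01, ...), which is what split -d starts with by default.</p>
<pre lang="python">
import os
import tempfile


def split_by_lines(path, prefix, lines_per_file):
    """Write consecutive groups of lines_per_file lines to prefix00, prefix01, ..."""
    out_names = []
    out = None
    with open(path, 'rb') as source:
        for line_number, line in enumerate(source):
            # start a new output file every lines_per_file lines
            if line_number % lines_per_file == 0:
                if out is not None:
                    out.close()
                out_name = '%s%02d' % (prefix, len(out_names))
                out_names.append(out_name)
                out = open(out_name, 'wb')
            out.write(line)
    if out is not None:
        out.close()
    return out_names


# build a small throwaway input file with 120 lines to demonstrate the split
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'sample.sql')
with open(sample, 'wb') as f:
    for i in range(120):
        f.write(('INSERT %d;\n' % i).encode('ascii'))

chunks = split_by_lines(sample, os.path.join(workdir, 'page.input.'), 50)
print([os.path.basename(c) for c in chunks])
</pre>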
<p>However, the pages-articles file is XML, not SQL. To load it into the database, I first had to convert it to a .sql dump. xml2sql is a handy program for doing this. You can get it <strong><a href="http://meta.wikimedia.org/wiki/Xml2sql">here</a></strong>.</p>
<pre lang="bash">
xml2sql -v -m pages-articles.xml
</pre>
<p>This command will produce text.sql, page.sql and revision.sql. However, I ran into a problem here: the memory usage of xml2sql kept growing, and when it reached about 88 MB of resident memory it was killed by a DreamHost process. I tried running <strong>valgrind</strong> on it but could not find any leaks (the memory must be getting freed on exit).</p>
<p>This forced me to split the huge XML file into manageable parts so that xml2sql would stay within 88 MB. I wrote this Python script for splitting the XML.</p>
<p><strong>split_xml_dump.py</strong></p>
<pre lang="python">
import os, os.path

fname = "enwiki-20070908-pages-articles.xml"
total_page_counter = cur_page_counter = 0
o = None
outf_counter = 0
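outdir = 'pages_xml_splits/'

# make sure the output directory exists before writing chunks into it
if not os.path.isdir(outdir):
        os.makedirs(outdir)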

for line in open(fname):
        if line.startswith("<mediawiki") or line.startswith("</mediawiki>"):
                continue

        if o is None or cur_page_counter == 250000:
                cur_page_counter = 0
                outf_counter += 1
                outfname = 'pagexml.%d' % outf_counter
                outfname = os.path.join(outdir, outfname)
                print outfname

                if o:
                        o.write('</mediawiki>\n')
                        o.close()

                o = open(outfname, 'wb')
                o.write('<mediawiki>\n')

        if line == '  </page>\n':
                cur_page_counter += 1
                total_page_counter += 1
                if total_page_counter % 10000 == 0: print total_page_counter

        o.write(line)

if o:
        o.write('</mediawiki>\n')
        o.close()

print total_page_counter
</pre>
<p>Once the file was split into chunks of 250,000 articles each, I ran xml2sql on each chunk to get the corresponding text.sql.</p>
<p><strong>Loading the dumps into the database</strong></p>
<p>I had now readied all the input (a bunch of .sql files) and had to load them into the database. Before I started the load, I had to create a database to hold the tables and the "<strong>text</strong>" table.</p>
<pre lang="sql">
mysql -h hostname -u username -ppassword
> create database wiki;
> use wiki;
> create table `text` (
  old_id int unsigned NOT NULL auto_increment,
  old_text mediumblob NOT NULL,
  old_flags tinyblob NOT NULL,
  PRIMARY KEY old_id (old_id)
) MAX_ROWS=10000000 AVG_ROW_LENGTH=10240;
</pre>
<p>The splitting took about 90 minutes to finish. This script loads the .sql files into the database one after the other.</p>
<p><strong>load_splits.py</strong></p>
<pre lang="python">
import os
import os.path
import glob
import shutil

while 1:
        fnames = [os.path.basename(f) for f in glob.glob('splits/*.input.*')]
        fnames = [(int(f.split('.')[2]), f) for f in fnames]
        fnames.sort()

        if len(fnames) == 0: break
        print "found %d files" % len(fnames)

        fname = os.path.join('splits/', fnames[0][1])
        to_fname = os.path.join('processed_splits/', fnames[0][1])
        error_fname = os.path.join('processed_splits/', fnames[0][1] + '.error')

        print "processing %s" % fname

        cmd = 'mysql -h hostname -ppassword -u username wiki < "%s"' % fname
        result = os.system(cmd)
        if result != 0:
                shutil.move(fname, error_fname)
        else:
                shutil.move(fname, to_fname)

        print "processed %s" % fname
</pre>
<p>Loading the .sql files into the database will take a long long time. I started it yesterday morning and it is still running! As the data is loading, you can check out this <a href="http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png">Wikipedia database schema diagram</a>.</p>
<p>In a follow-up to this article, I will write about how I will use the Wikipedia database to streamline my news feeds.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>vTap Windows Mobile source code</title>
		<link>http://blog.prashanthellina.com/2007/09/29/vtap-windows-mobile-source-code/</link>
		<comments>http://blog.prashanthellina.com/2007/09/29/vtap-windows-mobile-source-code/#comments</comments>
		<pubDate>Sat, 29 Sep 2007 14:52:35 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[veveo]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/09/29/vtap-windows-mobile-source-code/</guid>
		<description><![CDATA[Veveo has released the source code for the Windows Mobile client application. This is great because it lets you fine-tune our app to suit your needs. You can sign up for the developer program to receive updates from Veveo. Get the source here.]]></description>
			<content:encoded><![CDATA[<p>Veveo has released the source code for the Windows Mobile client application. This is great because it lets you fine-tune our app to suit your needs. You can sign up for the <strong><a href="http://vtap.com/developer.html">developer program</a></strong> to receive updates from Veveo. Get the source <strong><a href="http://vtap.com/vtap_wince_release_122.zip">here</a></strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/09/29/vtap-windows-mobile-source-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Word arithmetic puzzle generator</title>
		<link>http://blog.prashanthellina.com/2007/08/26/word-arithmetic-puzzle-generator/</link>
		<comments>http://blog.prashanthellina.com/2007/08/26/word-arithmetic-puzzle-generator/#comments</comments>
		<pubDate>Sun, 26 Aug 2007 13:21:34 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[puzzle]]></category>
		<category><![CDATA[word arithmetic]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/08/26/word-arithmetic-puzzle-generator/</guid>
		<description><![CDATA[I was digging through some old code of mine when I came across this script. To me, this script was a demonstration of the beauty and elegance of the Python language. You might have come across a puzzle like this: AID + ICED = IDEA. What digits would you assign to the characters A,C,D,E,I so [...]]]></description>
			<content:encoded><![CDATA[<p>I was digging through some old code of mine when I came across this script. To me, this script was a demonstration of the beauty and elegance of the Python language.</p>
<p>You might have come across a puzzle like this:<br />
AID + ICED = IDEA. Which digits would you assign to the letters A, C, D, E, I so that the arithmetic works?<br />
One answer is A=0, C=4, D=5, E=1, I=9, which gives 95 + 9415 = 9510.</p>
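<p>If you want to check a puzzle like this by brute force, here is a minimal Python sketch. It is not part of the script linked in this post, and the function names <code>word_value</code> and <code>solve</code> are made up for illustration. It tries every assignment of distinct digits to the letters and, like the puzzle above, allows a word to start with zero.</p>

```python
from itertools import permutations


def word_value(word, digits):
    """Read a word as a base-10 number using a letter -> digit mapping."""
    value = 0
    for ch in word:
        value = value * 10 + digits[ch]
    return value


def solve(addend1, addend2, total):
    """Yield every assignment of distinct digits that makes the sum hold."""
    letters = sorted(set(addend1 + addend2 + total))
    for combo in permutations(range(10), len(letters)):
        digits = dict(zip(letters, combo))
        left = word_value(addend1, digits) + word_value(addend2, digits)
        if left == word_value(total, digits):
            yield digits


if __name__ == "__main__":
    # Print each letter -> digit assignment that satisfies the puzzle.
    for digits in solve("AID", "ICED", "IDEA"):
        print(digits)
```

<p>A puzzle can have more than one valid assignment, which is why <code>solve</code> yields all of them instead of stopping at the first.</p>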
<p>The &#8220;word arithmetic&#8221; script can generate such puzzles given a word list and a string of letters.<br />
Here is the <a href="http://code.prashanthellina.com/code/word_arithmetic.py">code</a> and a <a href="http://code.prashanthellina.com/code/words">sample word list</a>.</p>
<p>Running the script:</p>
<pre lang="BASH">
python word_arithmetic.py --words-file words --letters AEIBCDFGHI
</pre>
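<p>The position of each letter in the --letters string is the digit it stands for, so the command above asks the script to generate word combinations where A=0, E=1, I=2, B=3 and so on. Note that I appears twice in this string, at positions 2 and 9.</p>
<p>Sample output:</p>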
<pre lang="BASH">
letters = A=0, E=1, I=2, B=3, C=4, D=5, F=6, G=7, H=8, I=9
FEE + AHA = FIE
DACCA + BEEBE = HEDGE
ACHED + AHA = CHID
BIG + BAH = GAD
BIB + ACED = HAH
DADA + HIE = DICE
BIG + CAF = AHAB
ABB + CHE = DEC
AID + ICED = IDEA
...
...
</pre>
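<p>To give an idea of how such lines can be produced, here is a minimal Python sketch. It is my own illustration and not the linked script: the names <code>letter_digits</code>, <code>word_value</code> and <code>generate</code> are hypothetical, the word list is a tiny stand-in, words are assumed to be upper case, and a letter listed twice is assumed to keep its last position.</p>

```python
from collections import defaultdict
from itertools import combinations


def letter_digits(letters):
    """Map each letter to its position; a repeated letter keeps its last position."""
    return {ch: i for i, ch in enumerate(letters)}


def word_value(word, digits):
    """Read a word as a base-10 number using a letter -> digit mapping."""
    value = 0
    for ch in word:
        value = value * 10 + digits[ch]
    return value


def generate(words, letters):
    """Yield 'X + Y = Z' lines where all three words come from the word list."""
    digits = letter_digits(letters)
    # Only words spelled entirely with the given letters can be used.
    usable = [w for w in words if all(ch in digits for ch in w)]
    # Index the usable words by numeric value so sums can be looked up directly.
    by_value = defaultdict(list)
    for w in usable:
        by_value[word_value(w, digits)].append(w)
    for a, b in combinations(usable, 2):
        total = word_value(a, digits) + word_value(b, digits)
        for c in by_value.get(total, []):
            yield "%s + %s = %s" % (a, b, c)


if __name__ == "__main__":
    sample_words = ["FEE", "AHA", "FIE", "BIG", "BAH", "GAD"]  # stand-in word list
    for line in generate(sample_words, "AEIBCDFGHI"):
        print(line)
```

<p>Indexing the words by value means each pair of words needs only one dictionary lookup, instead of a scan over the whole word list.</p>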
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/08/26/word-arithmetic-puzzle-generator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Matrix Desktop</title>
		<link>http://blog.prashanthellina.com/2007/08/22/matrix-desktop/</link>
		<comments>http://blog.prashanthellina.com/2007/08/22/matrix-desktop/#comments</comments>
		<pubDate>Wed, 22 Aug 2007 16:32:26 +0000</pubDate>
		<dc:creator>prashanthellina</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[desktop]]></category>
		<category><![CDATA[gnome]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://blog.prashanthellina.com/2007/08/22/matrix-desktop/</guid>
		<description><![CDATA[What you see above is how my desktop looks now. You need to be running GNOME for this to work. Nautilus draws the desktop (including the icons) for you in GNOME by default. We have to tell it to stop doing that, so we can run the Matrix animation in its place. Nautilus can be [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.prashanthellina.com/images/matrix_desktop.gif" alt="Matrix Desktop Animation" /></p>
<p>What you see above is how my desktop looks now. You need to be running GNOME for this to work.</p>
<p>Nautilus draws the desktop (including the icons) for you in GNOME by default. We have to tell it to stop doing that, so we can run the Matrix animation in its place. Nautilus can be configured using gconf.</p>
<pre lang="BASH">gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop false</pre>
<p>Now that we have the desktop to ourselves, let us ask xscreensaver&#8217;s &#8216;glmatrix&#8217; to draw itself in the desktop window (the &#8216;root&#8217; window).</p>
<pre lang="BASH">/usr/lib/xscreensaver/glmatrix -root</pre>
<p>If you want the animation every time you log in, open &#8220;~/.config/autostart/glmatrix.desktop&#8221; in your text editor and paste the following.</p>
<pre lang="BASH">
[Desktop Entry]
Version=1.0
Encoding=UTF-8
Name=No name
Name[en_IN]=Desktop matrix
Exec=/usr/lib/xscreensaver/glmatrix -root
X-GNOME-Autostart-enabled=true
</pre>
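<p>If you would rather script this step than paste by hand, here is a minimal Python sketch that writes the same entry. The function name <code>write_autostart</code> and its <code>config_home</code> parameter are made up for illustration; by default it assumes the autostart directory lives under &#8220;~/.config&#8221;, as in the path above.</p>

```python
import os

# The same entry as shown above, kept verbatim.
DESKTOP_ENTRY = """[Desktop Entry]
Version=1.0
Encoding=UTF-8
Name=No name
Name[en_IN]=Desktop matrix
Exec=/usr/lib/xscreensaver/glmatrix -root
X-GNOME-Autostart-enabled=true
"""


def write_autostart(config_home=None):
    """Write glmatrix.desktop under <config_home>/autostart and return its path."""
    if config_home is None:
        config_home = os.path.expanduser("~/.config")
    autostart_dir = os.path.join(config_home, "autostart")
    # Create the autostart directory if it does not exist yet.
    os.makedirs(autostart_dir, exist_ok=True)
    path = os.path.join(autostart_dir, "glmatrix.desktop")
    with open(path, "w") as f:
        f.write(DESKTOP_ENTRY)
    return path


if __name__ == "__main__":
    print(write_autostart())
```

<p>Passing a different <code>config_home</code> lets you try the function on a scratch directory before touching your real settings.</p>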
<p>(<small>suggested by ElecBoy</small>) After playing around, if you want to get back to your default desktop, run:</p>
<pre lang="BASH">gconftool-2 --type bool --set /apps/nautilus/preferences/show_desktop true &#038;&#038; nautilus</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.prashanthellina.com/2007/08/22/matrix-desktop/feed/</wfw:commentRss>
		<slash:comments>41</slash:comments>
		</item>
	</channel>
</rss>

