All your aliases are belong to you

I like setting up shortcuts to frequently used commands, whether I am on Windows or Linux. I use the terminal often and create these shortcuts using the “alias” feature of BASH. This has saved me considerable time in the past. However, I recently felt that a helper tool that monitors my usage of commands and automatically suggests candidates for aliasing would be useful. The result is Aliaser.

Aliaser works by monitoring your bash history. It analyses command frequency and suggests candidates for aliasing. It also manages the aliases you create this way. The feature I like most in Aliaser is that it reminds you to use the aliases you created by showing tips whenever you open a new terminal session.
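To give a feel for the frequency analysis, here is a minimal Python sketch. It is not Aliaser's actual code: the function name `suggest_aliases` and its thresholds (`min_count`, `min_length`, `top`) are assumptions made up for illustration. It simply counts identical history lines and keeps the ones that are both frequent and long enough to be worth shortening.

```python
from collections import Counter


def suggest_aliases(history_lines, min_count=3, min_length=8, top=5):
    """Return (command, count) pairs that look worth aliasing.

    Hypothetical helper: a command qualifies when it was typed at least
    `min_count` times and is at least `min_length` characters long.
    """
    counts = Counter(line.strip() for line in history_lines if line.strip())
    candidates = [
        (cmd, n)
        for cmd, n in counts.most_common()
        if n >= min_count and len(cmd) >= min_length
    ]
    return candidates[:top]


# A tiny made-up history; "ls" is frequent but too short to bother aliasing.
sample_history = [
    "git status", "ls", "git status", "sudo apt-get update",
    "git status", "ls", "sudo apt-get update", "ls", "sudo apt-get update",
]
suggestions = suggest_aliases(sample_history)
for cmd, n in suggestions:
    print(f"{n:3d}  {cmd}")
```

To try it on real data, the lines of `~/.bash_history` could be passed in instead of `sample_history`.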

Download Aliaser from
Read More »

Query Wikipedia from your terminal

I refer to Wikipedia frequently. I use this BASH function to help me do that from the terminal. For an explanation of how this works, head over here.

BASH function

# wiki 
# eg: wiki India
#     wiki Apple_Inc
#     wiki Anglo_Saxon
    dig +short txt $

Example usage

prashanth@prashanth-desktop:~$ wiki India
"India, officially the Republic of India ( '\; see also other Indian languages), is a country in South Asia.
It is the seventh-largest country by geographical area, the second-most populous country, and the most
populous democracy in the world. Bounded by t" "he Indian Ocean on the south, the Arabian Sea on
the west, and the Bay of Bengal on the east, India has a coastline of ..."

prashanth@prashanth-desktop:~$ wiki Anglo_Saxon
"Anglo-Saxons (or Anglo-Saxon) is the term usually used to describe the invading tribes in the south
and east of Great Britain starting from the early 5th century AD, and their creation of the English
nation, lasting until the Norman conquest of 1066. The " "Benedictine monk, Bede, identified
them as the descendants of three Germanic tribes:"

Command-line language translation

Here is a simple utility created using Python for translating text from various languages into English. It uses the Google AJAX API to do this.


prashanth@prashanth-desktop:~$ translate bonjour
prashanth@prashanth-desktop:~$ translate guten morgen
Good morning

Read More »

On setting up USB RAID

I bought two Dane-Elec 8GB USB drives recently. Flash memory (as opposed to Hard disk storage) has faster “seek” capability. This is inherent in the design as flash memory is solid state whereas hard disks are electro-mechanical with a “head” that needs to be moved around using a “drive” mechanism. Since seek times are better on flash drives, they are faster when you are reading or writing a lot of small files.

However, flash drives do not offer the sustained data transfer rates (i.e. throughput) that hard disks have. My thinking was that the throughput gap can be made up by slapping together two or more USB drives and applying software RAID 0 over them. Below are some performance results, and they look encouraging.
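Why striping can help with throughput is easier to see in code. The sketch below is a simplified model of RAID 0 with equal-sized drives, not the kernel's implementation; `locate_chunk` is a hypothetical name. Consecutive chunks land on different drives in round-robin order, so a long sequential transfer can keep every drive busy at once.

```python
def locate_chunk(logical_chunk, num_drives):
    """Map a logical chunk number to (drive index, chunk index on that drive).

    Simplified round-robin striping model, assuming equal-sized drives.
    """
    return logical_chunk % num_drives, logical_chunk // num_drives


# Where the first six chunks of a two-drive array would be placed.
layout = [locate_chunk(i, 2) for i in range(6)]
print(layout)
```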
Read More »

Extracting relevant text from HTML pages

Some time back I did some work on extracting topics from an arbitrary piece of text using Wikipedia data. Recently I thought of a way to put that algorithm to work. As a part of this project, I need to extract relevant text from an arbitrary HTML page. By relevant I mean the “meat” of the page, devoid of navigation links and side-content.

This algorithm has the following steps:

  • make doc from html data (clean html)
  • identify content nodes (nodes having substantial content)
  • prune xml tree to remove irrelevant nodes
  • get the most linked node from pruned tree (subtree contains relevant text)
  • make the dot graph
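The first three steps can be illustrated with a rough sketch using only Python's standard library. This is a simplified stand-in and not the algorithm from the full post: it builds a small tree with `html.parser`, measures how much non-link text each subtree holds, and walks down into whichever child holds most of it. The "most linked node" and dot graph steps are not reproduced. The names `Node`, `TreeBuilder`, `main_content` and the `share` threshold are all assumptions made for this example.

```python
from html.parser import HTMLParser

IGNORED = {"a", "script", "style"}   # text here does not count as content
VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.items = []  # child Nodes and text strings, in document order

    def content_length(self):
        """Number of non-link text characters in this subtree."""
        if self.tag in IGNORED:
            return 0
        return sum(
            len(i.strip()) if isinstance(i, str) else i.content_length()
            for i in self.items
        )

    def text(self):
        """Whitespace-normalised text of this subtree (links included)."""
        if self.tag in {"script", "style"}:
            return ""
        parts = [i if isinstance(i, str) else i.text() for i in self.items]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.items.append(data)


def main_content(html, share=0.6):
    """Descend while a single child holds at least `share` of the content."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while True:
        total = node.content_length()
        children = [i for i in node.items if isinstance(i, Node)]
        dominant = [c for c in children
                    if total and c.content_length() >= share * total]
        if not dominant:
            return node.text()
        node = dominant[0]


page = """
<html><body>
  <div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
  <div id="post"><p>Flash drives seek quickly.</p><p>Hard disks stream data faster.</p></div>
  <div id="side"><a href="/x">Related posts</a></div>
</body></html>
"""
print(main_content(page))
```

Ignoring text inside links is what makes navigation bars and sidebars score close to zero, which is the pruning idea in miniature.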

Read More »

Clustering Data using Python

As a part of a project I am working on, I had to cluster URLs on a page. After some light googling I found python-cluster. Below is a simple Python script that illustrates the usage of the python-cluster library.
Read More »

XMonad: A Window Manager for “real” people :)

I have been a happy Gnome user for many years now and only recently started thinking about switching to KDE 4.2 when Ubuntu 9.04 (Jaunty Jackalope) comes out. However, it so happened that I bought two new widescreen monitors and set up a dual-monitor environment. This is when I started realizing that Gnome is clumsy at best when it comes to managing windows across monitors.

The reason I bought multiple monitors is to maximize my work area so I do not have to keep switching between overlapping windows. Gnome, it seems, is ill-suited to managing space effectively and effortlessly.
Read More »

Microsoft Surface: Some Videos

I had written earlier about my experience with Microsoft Surface. I’ve captured some videos of me using it. Here they are …
Read More »

My Dual Monitor Setup

Greetings webizen, I tried hard to get back to my blogging schedule but my laziness got the better of me. I am truly back now, with a new batch of posts which I will publish over the next few days.

Recently, I went to Veveo’s main office near Boston, USA and had the privilege of experiencing a finger freezing winter! (Not to mention a three day power-cut which I spent under multiple layers of blankets).

A few weeks before I was going to return to India, I started shopping. One of the things I bought was a refurbished NVidia GeForce 9800 GTX+ by EVGA. It was the best deal I could find on NewEgg in terms of value for money. I bought it primarily to have a better experience when flying over Paris … in Google Earth.
Read More »

Microsoft Surface Unboxing

Today, we received the shipment from Microsoft at Veveo. If you have not heard of Microsoft Surface before, it is a touch-screen-based computer embedded in a table. The surface of the table is illuminated from underneath by a projector (rear-projection), and touch input is implemented by reflecting IR radiation off the fingers and capturing it with five IR cameras hidden inside the unit.

To learn more about Microsoft Surface head over to:

Read More »

Determining the difficulty of Arithmetic Operations

I am trying to write a program to test my arithmetic skills. The program should pose arithmetic problems involving the four basic operations – addition, subtraction, multiplication and division. When the testing session starts, the program should issue problems of less difficulty and the difficulty should be ramped up gradually. A score should be computed based on number of questions and the difficulty of questions. The hope is that if I keep using this program for a little time every day, I’ll be able to improve my abysmal arithmetic performance :)
Read More »

Even a python can be abused

The first programming language I coded in was QuickBasic. I loved the simplicity and especially the IDE. It made things simple for a starter. Later I discovered Visual Basic, which extended the same simplicity and added the “Visual” element with a splendid editor for GUI.

In between I did some projects using Java, C#, C and C++. None of these impressed me too much. I hated Java’s imposition of stiff rules and its dogged adherence to an “everything in a class” attitude. C# was better. C++ just turned me off because of the monster it is. I did not like C at all because of its total lack of automated memory handling (like GC). I’ve been doing a lot of coding in C nowadays as part of my job and I must admit that I like it a lot for its simplicity in primitives and promise of “closeness to hardware” and hence the predictability and performance.
Read More »

Look who’s downloaded Firefox 3!

Firefox 3 has been getting rave reviews ever since it went into beta. The blogosphere was abuzz with reports on how much more efficient and snappier FF3 is compared to its earlier incarnations, and even more so compared to the competition (Opera, IE7, Safari).

Features like “Places” (Bookmarks on steroids), Cairo for rendering and OS specific widgets have made the best browser better. At the time of writing of this blog post 6 million plus downloads from around the world have already happened.

Everywhere I look at work, I see the “Download Day” certificate from Mozilla corp … I got one myself too :)

But the question is ….

Do you know who else got the certificate ???

Read More »

Nose – TDD – Python

What, why

I’ve been reading up on TDD and it has struck me as a particularly useful methodology to achieve “clean code that works”. TDD encourages writing unit tests to cover all the code (because by definition, you write a test before a line of code is written). Because all your code is covered, you are freed from the fear of breakage due to change and can instantly be more confident and productive. Also, the test cases act as a specification in code – very useful.
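As a small illustration of the test-first rhythm, here is a sketch in the plain-function style that nose collects (functions whose names start with `test_`). The `slugify` function and its behaviour are made up for this example; in a real TDD session the two tests would be written first, watched to fail, and only then would `slugify` be filled in.

```python
def slugify(title):
    """Turn a post title into a URL-friendly slug (hypothetical example)."""
    return "-".join(title.lower().split())


# The tests double as a specification of what slugify should do.
def test_lowercases_and_joins_words():
    assert slugify("Nose TDD Python") == "nose-tdd-python"


def test_collapses_extra_whitespace():
    assert slugify("  Hello   World ") == "hello-world"
```

Saved in a file such as `test_slugify.py` (an assumed name), running `nosetests` in that directory should pick up both test functions.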
Read More »

Project Gutenberg Ngram data: English only

In my earlier post, I’d posted links to the Project Gutenberg Ngram data I had computed for e-books of all languages. If you are interested in only the English data, get these files instead.

These two files are splits of a compressed file which contains all of the Project Gutenberg English e-books downloaded about a week before the date of this post.

gutenberg_en_files.tar.bz2.0 (2.0GB)

gutenberg_en_files.tar.bz2.1 (1.4GB)

Unigrams along with frequency count from the text data above

gutenberg_en_unigrams.tar.gz (7.4MB)

Bi-grams and Tri-grams along with frequency count from the text data above

gutenberg_en_bi_tri_grams.tar.gz (493MB)

I had to split the files because my webserver has a limitation in serving out files larger than 2GB. After downloading the files, do this

mv gutenberg_en_files.tar.bz2.0 gutenberg_en_files.tar.bz2
cat gutenberg_en_files.tar.bz2.1 >> gutenberg_en_files.tar.bz2
rm gutenberg_en_files.tar.bz2.1

If you find the data useful, I’d be delighted to hear the context in which you made use of it.

N-gram data from Project Gutenberg

I’ve been working on a project for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using Wikipedia data which I already have on my disk, but decided on using Project Gutenberg data as it is more representative of the general usage of the English language.
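For readers new to the term, an n-gram is just a run of n consecutive words, and the data consists of each n-gram with its frequency count. A minimal sketch of the counting follows. The tokenisation rule (lowercase, letters and apostrophes only) and the name `ngram_counts` are assumptions for illustration and may well differ from how the published files were produced.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams (runs of n consecutive words) in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the word list yields every n-word window
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "the quick brown fox jumps over the quick brown dog"
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(2))
```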

Read More »

Wordza – A Smart Word Quizzer

I’d thought of making a word quizzer as a web application to improve my vocabulary when I took the GRE test a couple of years back. I’d written one in Visual Basic 6 when I took the SAT :), but desktop applications are boring!

I got inspired to bring my long standing idea to fruition and the outcome is Wordza.

Read More »

Alexa rank: A script to get the rank for any site

What is Alexa rank?

Alexa collects statistics about visits by internet users to websites through the Alexa Toolbar. Based on the collected data, Alexa computes site ranking. By examining the Alexa rank of a site, you can get a rough idea of how popular the site is. Many argue that Alexa rank is misleading but it has its uses.

The Alexa rank script

You can find out the Alexa rank for any site by using this page. However, if you want to get the Alexa rank programmatically, you can do it using this script.
Read More »

Selecting a random row from a table in mysql


I have come across more than one instance when I had to select a random record from a table in a MySQL database. Here is how to do it.
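One common approach is to count the rows and then fetch a single row at a random offset. The sketch below shows the idea with Python's built-in `sqlite3` module as a stand-in for MySQL, purely so that the example is self-contained; the query in the full post may differ. The table `words` and the helper `random_row` are made up for illustration.

```python
import random
import sqlite3


def random_row(conn, table):
    """Return one randomly chosen row from `table`, or None if it is empty.

    The table name is interpolated directly into the SQL, so only pass
    trusted names here.
    """
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if count == 0:
        return None
    offset = random.randrange(count)
    query = f"SELECT * FROM {table} LIMIT 1 OFFSET ?"
    return conn.execute(query, (offset,)).fetchone()


# Throwaway in-memory database with three rows to pick from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)",
                 [("alpha",), ("beta",), ("gamma",)])
row = random_row(conn, "words")
print(row)
```

In MySQL itself, `ORDER BY RAND() LIMIT 1` is the shortest way to write this, though it has to assign a random value to every row first, which gets slow on large tables.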

Read More »

How to generate domain names, place names … product names ?

One of the trickiest and most enjoyable parts of starting something new (be it a website, project or band) is naming it! Sometimes a good name can be quite elusive and cause more than its deserved share of brain ache. Here is a list of automated services around the internet that will help you get name suggestions.

Let us name a domain!

domain tools

DomainTools takes a concept as input and comes up with domain name suggestions. Let us say you are starting a website about “Vacations in Mexico”. Go to their website and type in “Mexico Vacations” in the text box and click on the button to get suggestions.
Read More »

Visualizing mpeg4 motion compensation vectors using mplayer

The MPEG4 video encoding process makes use of block motion compensation to achieve compression. The motion compensation process serves to produce the inter frames (predicted frames), which are the frames between keyframes. I’ve always been fascinated by this process and was delighted to find out that my favorite video player, mplayer, allows one to visualize it. I tried it and it is wonderful!
Read More »

Creating video thumbnails using ffmpeg

Generating thumbnails/screenshots of a video is useful in many ways. Youtube and many other video sites use this to show a preview of the video as a small thumbnail. Google video captures a series of thumbnails from a video at various time intervals to show a better video preview.
Read More »

Songza – music search engine and jukebox


Every once in a while, someone comes up with a way of doing things in an extremely obvious and simple way. When this happens, a zillion others say, “of course that’s the way to do it!”. Songza is a music search engine and jukebox that is dead simple to use. You should try it to really grok how simple the interface is.
Read More »

Rendezvous with Rama – Goodbye Mr. Clarke

I am a huge fan of the science fiction genre. Arthur C Clarke is one of my favorite science fiction writers after Isaac Asimov. It saddens me to learn that he has passed away. Most people are reminded of “2001: A Space Odyssey” when they hear the name Arthur C Clarke. I am reminded of “Rendezvous with Rama”, a brilliantly conceived novel that set my imagination on fire. For all the Clarke fans out there, “Rendezvous with Rama”.
Read More »

Watching Television on Linux: setting up a TV Tuner card

A couple of weeks back, I went shopping for a TV tuner card that is compatible with Linux. Googling had told me that “Hauppauge” cards were known to be compatible. However, I could not find one anywhere in the market (SP Road, Bangalore, India). At one of the shops, I found a “Pinnacle PCTV 50i” card. I had heard from many people before that Pinnacle was a good card for Windows both in terms of quality of decoding and software provided. I checked on Google to ensure that the Pinnacle card would work on Linux. I found that the card uses Philips’ SAA7134 chipset, for which drivers are available in Linux. I went on and bought the card for Rs 2000 ($50).

Pinnacle PCTV 50i

The card is available through
Pinnacle PCTV Analog PCI 50i – TV / radio tuner / video input adapter – PCI – SECAM, PAL
Read More »

All NCERT text books!

I recently came to know that NCERT was providing all the text books (class 1 – 12) for download. However, I found their interface hard to use for browsing through. So I wrote a crawler in python to get all the books to my webserver.

Along with storing the books, I generated thumbnails for every chapter. Navigation pages have also been generated to help browse through the books easily. You can access this dump here.

The crawler code is here.
Read More »

FOSSConf08 – A disappointment

I gave a talk at FOSSConf08 yesterday and came away feeling very disappointed and let down! It was a poorly organized event where total chaos prevailed. My friend Venkat and I found ourselves unable to show our presentation slides because of an overheating projector whose brightness could not out-match the sunlight coming in from the open windows!

Venkat’s talk got delayed by nearly 30 minutes because the machine provided to him was ancient running a version of Fedora which could not automount the USB drive he had.

The only thing we were happy about was the audience who patiently listened to chalk screeching on the blackboard!

To any of you out there organizing a conference, here is a piece of advice,

The contributors spend their time and money visiting such events to interact with fellow enthusiasts. The least, you, as an organizer can do is to make sure everything is smooth and taken care of. I am not saying, you should bear the travel expenses etc. Just make sure something as basic as a projector works!

Here is my poor presentation which did not see the light of the day:)

Create PDF thumbnails using ImageMagick on Linux

I have a bunch of PDF files for which I wanted to generate thumbnails. On looking around a bit, I found “ImageMagick”. Since I have Ubuntu installed, I did

sudo apt-get install imagemagick

It is not too big a download at around 740KB.

To create thumbnails for all pages in a PDF document (say test.pdf, which has 3 pages), do

convert -thumbnail x300 test.pdf test.png
> test.pdf test-0.png test-1.png test-2.png

Read More »

Apple’s MacBook Air compared to a Sony Vaio TZ

A couple of months back, a colleague and I were looking around for the thinnest ultra-portable laptop available and landed on a Sony Vaio. We however felt unconvinced after checking out the specifications. We thought “There is only so much you can fit into that space”. Apple astonished us today with the introduction of the “MacBook Air” to their product line-up. It’s as if Apple is telling us “There is a lot that can go into that”.
Read More »

KDE4 on Kubuntu – Impressions and Screenshots

I love Gnome and its simplicity and use it regularly. I loathe KDE and its complexity. Although I used KDE about 5 years back, ever since Ubuntu was released, I’ve been using Gnome. However, when KDE4 was announced, I decided to check it out with an open mind and re-evaluate.
Read More »

Marvin – The manically depressed robot!

The first time I read The Hitchhiker’s Guide to the Galaxy, I got bored mid-way and stopped reading. I thought it was one totally pointless, nonsensical, rambling story. The movie version of it was played recently on TV and I watched from somewhere in the middle. The first character I saw was Marvin! I got hooked. If I ever build a thinking robot, I’m going to call it Marvin!

Here are a couple of Marvin videos.

Interfacing Python with C using ctypes

Python is a wonderful “very high level” language with an elegant design. It is an ultimate tool to rapidly develop applications. However, when it comes to performance (speed and memory), Python sucks. It is not meant for performance. So what do you do after building a quick prototype in python if you want it to be lean and mean?

From the beginning, Python’s designers understood this use-case and exposed a “C” API. Using the Python C-API, one can write “modules” in C which can then be imported into Python. This solution is not bad and provided you have enough patience, works great. There is a tool called SWIG which can generate the “glue” code around C code. It automates writing of code using C-API and makes it easier for one to maintain the C “module”. However, since SWIG generates code, when some problem occurs, it is quite painful to debug through the wrapper code. For the lazy developers out there (like me :) ), this can be quite a deterrent.
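For contrast with hand-written C-API modules and SWIG wrappers, here is about the smallest ctypes example I can think of: calling `strlen` from the C standard library without writing or compiling any glue code. It is only a sketch and assumes a Unix-like system where the C library can be located and loaded this way.

```python
import ctypes
import ctypes.util

# Load the C standard library (assumes a Unix-like system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Marvin")
print(length)
```

Declaring `argtypes` and `restype` is optional for a call this simple, but it lets ctypes check the arguments instead of guessing at the conversion.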
Read More »

Inconvenient Truth – Al Gore on Global Warming

I just finished watching “An Inconvenient Truth” – a documentary film by Al Gore detailing the rationale behind the truth of global warming. I found the film captivating and moving. I personally believe that human activities contribute to global warming and that we all should consciously do our part to counter this threat.

According to Wikipedia,

An Inconvenient Truth is an American Academy Award-winning documentary film about global warming, presented by former United States Vice President Al Gore and directed by Davis Guggenheim.

An Inconvenient Truth focuses on Al Gore and his travels in support of his efforts to educate the public about the severity of the climate crisis. Gore says, “I’ve been trying to tell this story for a long time and I feel as if I’ve failed to get the message across.” The film closely follows a Keynote presentation (dubbed “the slide show”) that Gore presented throughout the world. It intersperses Gore’s exploration of data and predictions regarding climate change and its potential for disaster with Gore’s life story.


Topic extraction using Wikipedia data

In an earlier article, I mentioned that I was trying to use Wikipedia data to do news article clustering to make it easy for me to follow news feeds. I have made some progress. I’ve written an algorithm to produce a list of Wikipedia articles relevant to the input text. The input text has to be in English. The algorithm will not work well for very short pieces of text; at least a paragraph or two of sizable text is required. The list of Wikipedia articles represents the “topic” of the input text.
Read More »

Google’s Knol – A new Wikipedia?


Google has announced “Knol”. In their words

a new, free tool that we are calling “knol”, which stands for a unit of knowledge. Our goal is to encourage people who know a particular subject to write an authoritative article about it.
Read More »

Accessing your home computer from the internet

I recently bought a computer to use at home for development. Sometimes I have to access stuff (code, pictures, bittorrent) on my machine when I am away from home. I keep my machine running all the time and recently upgraded my internet connection from 128 kbps to 512 kbps. If you are in Bangalore, India, Airtel offers unlimited bandwidth at 512 kbps for Rs 1499 ($38) a month.

Since I had the essentials in place, I started setting up things to make my machine accessible from anywhere. These are the goals I had in mind.

  1. A domain name – to access my machine without having to remember an ip
  2. ssh access – so I could login and muck around
  3. http access – so I could host a mini-site with photos etcetera

Read More »

Firefox tattoo from FOSS.IN 2007

Firefox tattoo

My colleague got me a tattoo from FOSS.IN which I promptly wore. Go Firefox, go!

Language People – Interesting picture

language people thumbnail

I like the representation for Logo, Machine Language, Prolog and Ada. Wonder what “N.W” is… (the Modula-2 guy is holding it). I wish python was featured too :) but the picture says “’85”. Python did not even exist then!

original from here

Building a low-cost bad-ass “server” machine

I have been playing around with Wikipedia data and tried doing some byte pushing on my Dreamhost web space. Since this is shared web space, the processing power and memory available are limited. I was able to create database tables in MySQL by parsing the wiki XML dump, and to do some extra processing to construct some custom derived tables, but I had to constantly write code keeping the resource constraints in mind. Although it is fun doing this, it detracts from my actual goal (Wikipedia data). I decided to build my own “server” for doing stuff like this, which would double as a “home theatre”.
Read More »

Generating call graphs for understanding and refactoring python code

I have always been a fan of visualizations as I believe firmly that it is easier to crunch visual information than anything else. Visualizations are especially helpful for finding out patterns in data that are not expected and for patterns that are difficult to express textually in a concise manner.

The beast

A couple of weeks back I had the task of refactoring a very badly written piece of python code which violated tons of programming guidelines because of rampant usage of global variables, horrendous variable names and functions whose body extended to a couple of hundred lines of code!
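To show what a call graph of Python code looks like, here is a small static sketch built on the standard `ast` module. It is not necessarily the tool used in the full post: it only looks at top-level functions and at calls made through a plain name (so method calls like `text.split()` are skipped), and the helper names `call_graph` and `to_dot` are my own. The output is Graphviz dot text.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the sorted names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            graph[node.name] = sorted(calls)
    return graph


def to_dot(graph):
    """Render the graph as Graphviz dot text, one edge per call."""
    lines = ["digraph calls {"]
    for caller, callees in graph.items():
        for callee in callees:
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


example = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''
graph = call_graph(example)
print(to_dot(graph))
```

Feeding the printed text to Graphviz (for example `dot -Tpng`) turns it into a picture, which is where functions with suspiciously many outgoing edges start to stand out.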
Read More »

Wikipedia Category Graph Generator

I was in the process of trying to understand the classification schemes available in Wikipedia (categories, lists and navigation maps) when I came across this nifty tool. It is very useful to understand the inter-relationships between Wikipedia categories.

You can check it out here:

– vTap for any device with a browser

Veveo released a mobile-browser-friendly version of the vTap service a couple of days back. The cool thing about this is that it will work on any device with support for a basic browser and RealPlayer (count most mobile phones in). Since it is meant to enable vTap at the lowest common denominator level, incremental search does not work. If you miss incremental search, you just have to wait a little longer for the Java version to come out.

I was able to search videos on a “Sony Ericsson k710i” on Airtel in Chennai, India but video playback did not work as Airtel is blocking “rtsp” streams. If it works for you, do drop me a line with your phone model and name of network operator.

Making Ubuntu 7.10 (Gutsy) look slicker

It has been three weeks since I upgraded to Gutsy from the development repositories. Gutsy got released just a little more than a day back. While going through the news from the blogosphere about this event, I wished Ubuntu had released a “non-brown” desktop. I don’t like brown and have seen quite a few others complaining. Brown is unique but to me it smells of being different for the sake of being different.

The good thing about being on Linux, as most of us know, is the possibilities it offers for customization. A few years back, customization would have required a few hours of focussed hacking, but I am surprised at how easily I was able to purge the brown look and create a slick black setup. OS X and Vista, please step aside: the position of “OS with Best Customizable Eye Candy” is taken.

Read More »

Ways to process and use Wikipedia dumps

Wikipedia is a superb resource for reference (taken with a pinch of salt of course). I spend hours at a time spidering through its pages and always come away amazed at how much information it hosts. In my opinion this ranks amongst the defining milestones of mankind’s advancement.

Apart from being available through, the data is provided for download so that you can create a mirror locally for quicker access. This is very convenient when you are not connected to the internet, say when you are on the move.
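The dumps are large XML files, so one way to process them is to stream through the file rather than load it whole. Below is a minimal sketch using `xml.etree.ElementTree.iterparse` from the standard library. The tiny sample document is hand-made to imitate the page/title/text layout of a dump and is an assumption, as is the helper name `iter_pages`; a real dump has many more fields per page.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) pairs one page at a time from a dump-like stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


# Hand-made sample imitating the structure of a dump (not real dump data).
sample_dump = io.BytesIO(
    b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>India</title><revision><text>India is a country in South Asia.</text></revision></page>
  <page><title>Bangalore</title><revision><text>Bangalore is a city in India.</text></revision></page>
</mediawiki>"""
)
pages = list(iter_pages(sample_dump))
for page_title, page_text in pages:
    print(page_title, "->", len(page_text), "characters")
```

Since the downloads are bz2-compressed, a file object from `bz2.open(path)` could be passed to `iter_pages` in place of the in-memory sample.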

Read More »

DreamHost: My wonderful web host

I was hosted with “routhost” until April this year, when I decided I needed more features: SSH access, a build environment so I would be able to download and compile applications and, most of all, more disk space. After much hunting around I discovered DreamHost. The feature list is astounding. Here are a few to give you an idea

Read More »

Ubuntu Gutsy Gibbon and Linux on the Desktop

I’ve been using Ubuntu Feisty and waiting to get Gutsy when the release comes out. However my curiosity got the better of me and I could not resist upgrading from the beta repositories. The first thing I noticed after upgrade was the amount of polish and attention to detail. Everything looks slick (thanks to Compiz).

Instructions to upgrade from feisty.

Read More »

Embed vTap in your page

The vTap widget is finally out!

Read More »

vTap Windows Mobile source code

Veveo has released the source code for the windows mobile client application. This is great because it gives you a way to fine tune our app to suit your needs. You can sign up for the developer program here to receive updates from Veveo. Get the source here.

“vTap is dope!” – Phenomenal Feedback

Post-launch is a very exciting time. Listening to users and incorporating feedback into the product has been our primary activity for two weeks now. Feedback has been overwhelmingly positive and it is quite clear that we have a winning technology on our hands. Bug reports have been pouring in as expected, and what has surprised me is how fast we have been reacting to these. The credit goes not only to Veveo’s management and of course to us devs :) but also to the thousands of beta-testing users around the world. Thank you very much, people, for your valuable feedback, and do keep it coming. If you’ve got an itch, we will scratch it for you :)

If you still haven’t seen our product, check it out here.

Read More »

vTap launched!

After months of tinkering away, we have finally launched vTap – a ground-breaking mobile search engine for videos. It’s been a very exciting time for all of us at Veveo, especially the past few days. The joy of launching the product and watching people play around with it is simply inexpressible!

Read More »