I like setting up shortcuts to frequently used commands, whether I am using Windows or Linux. I use the terminal a lot and define shortcuts for my most common commands with the “alias” feature of Bash. This has saved me considerable time in the past. However, I recently felt that if I could have a helper tool to monitor [...]
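The details of the helper tool are cut off in this excerpt, so the following is not that tool. It is a minimal, hypothetical Python sketch of a related first step: finding the commands you run most often by counting lines in the Bash history file (assumed to be at the default `~/.bash_history`) and printing candidate `alias` lines for them. The function names and the `c1`, `c2`, ... alias names are illustrative placeholders.

```python
from collections import Counter
from pathlib import Path


def top_commands(history_lines, n=5):
    """Return the n most frequent commands from Bash history lines.

    Blank lines and timestamp comments (lines starting with '#', which
    Bash writes when HISTTIMEFORMAT is set) are skipped.
    """
    commands = (
        line.strip()
        for line in history_lines
        if line.strip() and not line.startswith("#")
    )
    return Counter(commands).most_common(n)


def suggest_aliases(history_lines, n=5):
    """Build candidate alias lines for the most frequent commands.

    The alias names (c1, c2, ...) are hypothetical placeholders.
    """
    suggestions = []
    for i, (cmd, count) in enumerate(top_commands(history_lines, n), start=1):
        # Escape single quotes so the command survives inside '...' in Bash.
        quoted = cmd.replace("'", "'\\''")
        suggestions.append(f"alias c{i}='{quoted}'  # seen {count}x")
    return suggestions


if __name__ == "__main__":
    history = Path.home() / ".bash_history"  # assumed default location
    if history.exists():
        lines = history.read_text(errors="ignore").splitlines()
        for line in suggest_aliases(lines):
            print(line)
```

Bash normally writes the history file only when a shell exits (or when `history -a` is run), so the most recent commands of an open session may be missing from the count.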
Categories: linux,programming,python,text processing,Uncategorized,veveo
Tagged: bash, linux, productivity, python, script
- Published: August 28, 2009 – 8:05 am
- Author: prashanthellina
Some time back, I did some work on extracting topics from an arbitrary piece of text using Wikipedia data. Recently, I came up with an idea for putting that algorithm to work. As part of this project, I need to extract the relevant text from an arbitrary HTML page. By relevant I mean the “meat” [...]
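The extraction method itself is not described in this excerpt. As an illustration only, here is a minimal Python sketch of one common heuristic (not necessarily the one used in this project): split the page into block-level text chunks with the standard library's `html.parser`, then keep chunks that are long enough and not dominated by link text, since navigation bars and footers tend to be short and link-heavy. The class and function names and the `min_words` and `max_link_ratio` thresholds are hypothetical and would need tuning.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "blockquote",
              "pre", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style", "noscript"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks, tracking how much text sits in links."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars) tuples
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Collapse whitespace; inline tags become word breaks (a rough cut).
        text = " ".join(" ".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS or tag == "br":
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:          # ignore script/style contents
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_ratio=0.3):
    """Keep blocks that are long enough and not mostly link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        link_ratio = link_chars / max(len(text), 1)
        if len(text.split()) >= min_words and link_ratio <= max_link_ratio:
            kept.append(text)
    return "\n\n".join(kept)
```

Real pages need more care (deeply nested layouts, boilerplate paragraphs, comment sections), but text length and link density are a reasonable starting point for separating the “meat” from the chrome.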
Categories: programming,python,text processing,web
- Published: July 27, 2009 – 4:46 pm
- Author: prashanthellina
As part of a project I am working on, I had to cluster the URLs on a page. After some light googling, I found python-cluster. Below is a simple Python script that illustrates the usage of the python-cluster library.
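The python-cluster script itself is not part of this excerpt. To show the idea without relying on that library's API, here is a self-contained Python sketch (an assumption, not python-cluster's code) of single-linkage hierarchical clustering cut at a distance threshold, with URL similarity measured by the standard library's `difflib.SequenceMatcher`. The `url_distance` and `cluster_urls` names and the 0.3 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def url_distance(a, b):
    """Distance in [0, 1]: 0 for identical URLs, near 1 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_urls(urls, threshold=0.3):
    """Single-linkage agglomerative clustering with a distance cut-off.

    Two URLs land in the same cluster if a chain of pairs, each at most
    `threshold` apart, connects them. Uses union-find over all O(n^2)
    pairs, which is fine for the few hundred links on a typical page.
    """
    parent = list(range(len(urls)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            if url_distance(urls[i], urls[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, url in enumerate(urls):
        clusters.setdefault(find(i), []).append(url)
    return list(clusters.values())
```

Cutting the hierarchy at a fixed distance avoids having to choose the number of clusters up front, which suits pages where the number of URL groups (navigation, article links, ads) is not known in advance.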
Categories: programming,python,text processing
Tagged: clustering, python, script
- Published: July 25, 2009 – 9:24 am
- Author: prashanthellina
Today, we received the shipment from Microsoft at Veveo. If you have not heard of Microsoft Surface before, it is a touchscreen-based computer embedded in a table. The surface of the table is illuminated from underneath by a projector (rear projection), and touch input is implemented by reflecting IR radiation off the fingers and then [...]
Categories: computer hardware,text processing,veveo
Tagged: computer, gadget, interface, microsoft, surface, touch, unboxing
- Published: December 30, 2008 – 5:16 am
- Author: prashanthellina
In my earlier post, I posted links to the Project Gutenberg Ngram data I had computed for e-books in all languages. If you are interested only in the English data, get these files instead. These two files are splits of a compressed file that contains all of the Project Gutenberg English e-books downloaded about a [...]
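The excerpt does not say how the two files were split. Assuming they are plain byte-level splits (for example, produced by the Unix `split` utility), reassembling them is just a matter of concatenating the parts in order, as in this minimal Python sketch. The function name and any part file names passed to it are hypothetical; on Linux, `cat part1 part2 > whole` does the same thing.

```python
import shutil
from pathlib import Path


def join_parts(part_paths, output_path, chunk_size=1 << 20):
    """Concatenate byte-level split parts, in the order given, into one file.

    Assumes the parts were produced by a plain byte split; split archive
    formats with their own headers need their own tools instead.
    Returns the size of the reassembled file in bytes.
    """
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Copy in chunks so multi-GB parts never sit fully in memory.
                shutil.copyfileobj(src, out, chunk_size)
    return Path(output_path).stat().st_size
```

The order of `part_paths` matters: the parts must be passed in the same order in which they were split.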
Categories: data mining,linux,text processing
Tagged: data mining, ngram, project gutenberg, text processing
- Published: May 13, 2008 – 9:54 pm
- Author: prashanthellina
I’ve been working on Wordza.com, for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using the Wikipedia data I already have on disk, but decided to use Project Gutenberg data instead, as it is more representative of general English usage.
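As a minimal Python sketch of what computing Ngram counts involves (the regex tokenizer and the function names are assumptions, not the code used for Wordza.com), the text is lower-cased, split into word tokens, and every contiguous run of n tokens is counted:

```python
import re
from collections import Counter

# Hypothetical tokenizer: lower-case letters, optionally with one
# apostrophe-joined suffix so contractions like "don't" stay whole.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lower-case the text and return its word tokens (digits/punctuation dropped)."""
    return WORD_RE.findall(text.lower())


def count_ngrams(tokens, n):
    """Count every contiguous run of n tokens, keyed by tuples of words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The cat ran!")
    print(count_ngrams(tokens, 2).most_common(3))
```

This version treats the whole text as one token stream, so n-grams can span sentence boundaries. For a corpus the size of Project Gutenberg, you would also process the e-books one file at a time and merge the per-file `Counter`s, rather than holding all the text in memory at once.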
Categories: data mining,linux,programming,python,text processing
Tagged: gutenberg, ngrams, project gutenberg, python, text parsing
- Published: May 4, 2008 – 9:58 pm
- Author: prashanthellina