INCLUDE_DATA
( to ) ? be : ! be;
sidebar left sidebar right

Recent text processing

  • All your aliases are belong to you

    I like setting up shortcuts to frequently used commands whether I used Windows or Linux. I use the terminal often and create shortcuts to frequently used commands using “alias” feature of BASH. This has saved me considerable time in the past. However, I recently felt that if I could have a helper [...]

  • Extracting relevant text from HTML pages

    Some time back I had done some work on extracting topics from an arbitrary piece of text using Wikipedia data. Recently I thought of a concept to put that algorithm to work. As a part of this project, I need to extract relevant text from an arbitrary HTML page. By relevant I mean the “meat” [...]

  • Clustering Data using Python

    As a part of a project I am working on, I had to cluster urls on a page. After some light googling I found, python-cluster. You can find below a simple python script to illustrate the usage of python-cluster library.

  • Microsoft Surface Unboxing

    Today, we received the shipment from Microsoft at Veveo. If you have not heard of Microsoft Surface before, It is a touch screen based computer embedded in a table. The surface of table is illuminated from underneath by a projector (rear-projection) and touch input is implemented by reflecting IR radiation off the fingers and then [...]

  • Project Gutenberg Ngram data: English only

    In my earlier post, I’d posted links to the Project Gutenberg Ngram data I had computed for e-books of all languages. If you are interested in only the English data, get these files instead.

    These two files are splits of a compressed file which contains all of the Project Gutenberg English e-books downloaded about a week [...]

  • N-gram data from Project Gutenberg

    I’ve been working on Wordza.com for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using Wikipedia data which I already have on my disk, but decided on using Project Gutenberg data as it is more representative of the general usage of English language.

555481 views

Prashanth Ellina is powered by WordPress

No Complaints Shifter Series Theme by Buzzdroid.com
Computers blogarama - the blog directory Blog Flux Directory Blog Directory & Search engine Computer Blogs - Blog Catalog Blog Directory Computers blogs Bloggeries Blog Directory blog directory Computers Blog Blog Search, Blog Directory p Listed in LS Blogs the Blog Directory and Blog Search Engine Blog Review Blog search - categorized blog directory Link With Us - Web Directory Find Blogs in the Blog
Directory Blog Directory