What, why: I’ve been reading up on TDD, and it strikes me as a particularly useful methodology for achieving “clean code that works”. TDD encourages writing unit tests that cover all the code (because, by definition, you write a test before a line of code is written). Because all your code is covered you are [...]
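To make the test-first order concrete, here is a minimal sketch. The `slugify` function and its expected behaviour are hypothetical, made up purely for illustration and not taken from the post. The tests are plain functions whose names start with `test_`, a convention that runners such as nose use to discover tests. In TDD the two tests would be written first and seen to fail, and only then would the implementation below them be added.

```python
# Hypothetical example: slugify() is an illustrative function,
# not something from the original post.

# Step 1: write the tests first. They describe the behaviour we want.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Clean Code That Works") == "clean-code-that-works"


def test_slugify_collapses_extra_whitespace():
    assert slugify("  clean   code ") == "clean-code"


# Step 2: write just enough code to make the tests pass.
def slugify(title):
    """Lowercase the title and join its whitespace-separated words with '-'."""
    return "-".join(title.lower().split())
```

Every line of `slugify` exists because a test demanded it, which is where the coverage claim comes from.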
Categories: linux, programming, python
Tagged: linux, nose, programming, python, tdd, testing, tools
- Published: May 22, 2008 – 8:20 pm
- Author: prashanthellina
In my earlier post, I posted links to the Project Gutenberg Ngram data I had computed for e-books in all languages. If you are interested only in the English data, get these files instead. The two files are parts of a single split compressed file which contains all of the Project Gutenberg English e-books downloaded about a [...]
Categories: data mining, linux, text processing
Tagged: data mining, ngram, project gutenberg, text processing
- Published: May 13, 2008 – 9:54 pm
- Author: prashanthellina
I’ve been working on Wordza.com, for which I needed Ngram data from a sufficiently large corpus. Initially, I thought of using the Wikipedia data I already have on my disk, but I decided on Project Gutenberg data because it is more representative of general usage of the English language.
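For readers unfamiliar with the term, an Ngram is a run of n consecutive words, and "Ngram data" is a table of how often each run occurs in a corpus. The sketch below shows the idea on a single string. The function name and the tokenizing regular expression are assumptions chosen for illustration; the post does not say how the Gutenberg texts were actually tokenized or counted.

```python
import re
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive words in text.

    Tokenization is an assumption made for this sketch: lowercase the
    text and keep runs of letters and apostrophes as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of width n over the word list.
    windows = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(windows)


bigrams = count_ngrams("The cat sat on the cat", 2)
print(bigrams.most_common(1))
```

A corpus the size of Project Gutenberg would need the counts streamed to disk or merged in batches rather than held in one in-memory `Counter`.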
Categories: data mining, linux, programming, python, text processing
Tagged: gutenberg, ngrams, project gutenberg, python, text parsing
- Published: May 4, 2008 – 9:58 pm
- Author: prashanthellina