
All NCERT textbooks!

I recently learned that NCERT provides all of its textbooks (classes 1 to 12) for download. Their interface is hard to browse, though, so I wrote a crawler in Python to copy all the books to my web server.

Along with storing the books, I generated a thumbnail for every chapter and a set of navigation pages to make browsing easier. You can access this dump here.

The crawler code is here.

To run it yourself, follow these steps.

# scrape ncert site to get details about book pdf urls
python > ncert_books_data

# create a directory to store all downloaded files
mkdir ncert_books

# download all the pdfs and cover page images
cat ncert_books_data | python ncert_books

# generate thumbnails for each pdf (requires ImageMagick)
find ncert_books/ -iname "*.pdf" > all_pdf_files
sed -i "s/^/0\t/" all_pdf_files

# resize the book cover images
python ncert_books

# generate the navigation pages
python ncert_books
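
As a rough illustration of the download step above, a script that reads PDF URLs from standard input and saves them under a target directory might look like the sketch below. This is not the crawler's actual code: the one-URL-per-line input format, the function names `local_path_for` and `download_all`, and the mirrored directory layout are all assumptions made for the example.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_path_for(url, dest_dir):
    """Map a URL to a file path inside dest_dir.

    Assumed layout: the URL's path is mirrored under dest_dir.
    """
    rel = urlparse(url).path.lstrip("/")
    return os.path.join(dest_dir, *rel.split("/"))


def download_all(lines, dest_dir, fetch=urllib.request.urlretrieve):
    """Download every URL in lines (assumed one per line).

    Files that already exist are skipped, so an interrupted run can be
    restarted. Returns the list of local paths.
    """
    saved = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        target = local_path_for(url, dest_dir)
        if not os.path.exists(target):
            os.makedirs(os.path.dirname(target), exist_ok=True)
            fetch(url, target)
        saved.append(target)
    return saved
```

Calling `download_all(sys.stdin, "ncert_books")` from a small wrapper would reproduce the `cat ... | python ...` shape of the step. The `fetch` parameter exists only so the function can be exercised without network access.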

For the thumbnail generation to work, you will need ImageMagick. Read this for more information.
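
The `all_pdf_files` list built in the steps above holds one tab-separated line per PDF, with `0` in the first column. A minimal sketch of how such a list could be turned into ImageMagick calls follows. It assumes the first column is a zero-based page index and that each thumbnail is written next to its PDF with a `.png` suffix; neither is confirmed by the original scripts, and `thumbnail_command`, `parse_line` and `make_thumbnails` are hypothetical names. Note that ImageMagick relies on Ghostscript to read PDFs.

```python
import subprocess


def thumbnail_command(page, pdf_path, width=200):
    """Build the ImageMagick command that renders one PDF page as a PNG."""
    # "file.pdf[0]" asks ImageMagick for that page only (0 = first page).
    # "convert" is the ImageMagick 6 entry point; ImageMagick 7 uses "magick".
    # Assumed naming: book.pdf -> book.pdf.png, next to the PDF.
    return [
        "convert",
        f"{pdf_path}[{page}]",
        "-thumbnail", f"{width}x",  # fixed width, height scaled to match
        f"{pdf_path}.png",
    ]


def parse_line(line):
    """Split a 'page<TAB>path' line as written by the sed step."""
    page, pdf_path = line.rstrip("\n").split("\t", 1)
    return int(page), pdf_path


def make_thumbnails(list_file, run=subprocess.check_call):
    """Run one convert call per non-empty line of list_file."""
    with open(list_file) as fh:
        for line in fh:
            if line.strip():
                run(thumbnail_command(*parse_line(line)))
```

With this in place, `make_thumbnails("all_pdf_files")` would process the whole list. The default width of 200 pixels is an arbitrary choice for the example.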

© Prashanth Ellina.