
All NCERT textbooks!

I recently learned that NCERT provides all of its textbooks (classes 1 to 12) for download. However, I found their interface hard to browse, so I wrote a crawler in Python to fetch all the books to my web server.

Besides storing the books, I generated thumbnails for every chapter, along with navigation pages that make the books easy to browse. You can access the dump here.

ncertbooks.prashanthellina.com

The crawler code is here.

These are the steps to run it yourself.

# scrape ncert site to get details about book pdf urls
python get_ncert_books_data.py > ncert_books_data

# create a directory to store all downloaded files
mkdir ncert_books

# download all the pdfs and cover page images
cat ncert_books_data | python download_ncert_books.py ncert_books

# generate thumbnails for each pdf (requires ImageMagick)
find ncert_books/ -iname "*.pdf" > all_pdf_files
sed -i "s/^/0\t/" all_pdf_files
python generate_thumbnails.py

# resize the book cover images
python resize_image_thumbnails.py ncert_books

# generate the navigation pages
python generate_navigation_pages.py ncert_books
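The find and sed step above can also be done in Python, which may help on systems where sed -i behaves differently. This is only a sketch: the function name list_pdfs_with_prefix is made up for illustration, and it simply reproduces the "0, tab, path" line format from the commands above without assuming what the leading 0 means to generate_thumbnails.py.

```python
import os


def list_pdfs_with_prefix(root, prefix="0"):
    """Return one '<prefix><TAB><path>' line for every PDF found under root.

    Roughly mirrors: find root -iname "*.pdf" | sed "s/^/0\t/"
    (hypothetical helper, not part of the original crawler code).
    """
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # -iname matches case-insensitively, so compare in lower case
            if name.lower().endswith(".pdf"):
                lines.append(f"{prefix}\t{os.path.join(dirpath, name)}")
    return lines
```

Writing "\n".join(list_pdfs_with_prefix("ncert_books")) to all_pdf_files should then give a file in the same format as the shell version, though the order of the lines may differ from what find produces.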

Thumbnail generation requires ImageMagick. Read this for more information.
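To give an idea of what an ImageMagick-based thumbnail step can look like, here is a minimal sketch. It is not the code from generate_thumbnails.py: the function names, the output file name and the default width are assumptions chosen for illustration. It relies on ImageMagick's convert command, the "[N]" page selector and the -thumbnail option; rendering PDFs with ImageMagick typically also needs Ghostscript installed.

```python
import subprocess


def thumbnail_command(pdf_path, page=0, width=200):
    """Build an ImageMagick command that renders one PDF page as a PNG."""
    out_path = f"{pdf_path}.thumb.png"  # hypothetical naming scheme
    return [
        "convert",
        f"{pdf_path}[{page}]",       # "[N]" selects a single page of the PDF
        "-thumbnail", f"{width}x",   # fixed width, height follows the aspect ratio
        out_path,
    ]


def make_thumbnail(pdf_path, page=0, width=200):
    """Run the command; raises CalledProcessError if convert fails."""
    subprocess.run(thumbnail_command(pdf_path, page, width), check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or log the command without having ImageMagick installed. On ImageMagick 7 the same arguments can be passed to the magick command instead of convert.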

© Prashanth Ellina. Built using Pelican. Theme by Giulio Fidente on github.