Catching Flies: Digital Library of India

IMPORTANT NOTE (3 May 2015) - The Digital Library of India has a new interface, one that works!
http://www.dli.ernet.in:8080/dlix/
THE FOLLOWING POST WAS WRITTEN IN 2012 AND IS RETAINED FOR HISTORY

This post is about the Digital Library of India, a rather poorly designed website born of excellent intentions but languishing in an unusable state, in spite of being associated with one of the most enlightened of Indian educational institutions. Attempts to contact them by email have produced no response, so here goes one more attempt at communication.

The sad thing about this site is that there is actually good material underneath, some of which is not even available on the Internet Archive (although in some cases I have been copying content to that site, reasons for which will become apparent below).

Reactions to the home page, which could easily be improved with a bit of thinking (aloud perhaps, so the public can help)

The first page itself appears rather poorly designed, although it follows the same style as some of the NIC-designed government sites from northern India. This is rather unfortunate given that it is hosted by perhaps one of India's best educational institutions (IISc). If the first page is a letdown (several of its links, including those to partner sites in Hyderabad and Noida, are dead), the results of a search can be even more annoying. I have no idea how it searches Indic text, but for English it is marginally functional for words in the title; the same cannot be said of searches by author. The metadata on the site is often misspelt, which makes search quite useless. When you do find material, it is a good idea to check the Internet Archive: the scans made there from American libraries are of much better quality, not to mention better presented.


The metadata is often misspelt or missing

Each search result offers two links, BookReader-1 and BookReader-2, and taking either route will bring you to the point where 99% of the audience will exit the website. Here is what clicking BookReader-1 does in my Mozilla Firefox browser: it tries to download a TIF file (the first page, which is often blank) and tells me that I need to install additional software.

BookReader-1: a dead end for most lay users

BookReader-2 shows a rather promising layout in a new window, but that window says you need to install Apple QuickTime. I do not think most people will make any headway with this interface.

Exit point 2

All this is a terrible pity, because it could easily be improved. There are books here that are really rare and worthy of readership. A bit of ingenuity is needed to extract the material. I sometimes extract material and re-upload it to the Internet Archive, where the site runs OCR (at least for English and other Latin-script languages) to make the text searchable. The Internet Archive does not require any plugins, although its online reader needs JavaScript.

Here are a few tips for getting material from the Digital Library of India.

First, you need to generate a list of links to the pages of a book. The format is easy to figure out, but for Windows users who lack the time or inclination, here is a little utility to help. If it does not run, you may need to install the Visual Basic (VB) runtime, available from the Microsoft website. If you see errors about a missing comdlg32.ocx, follow the instructions here. Now run the application. Right-click and copy the link next to BookReader-1 (it says "click here", but do not click it).

Copy the BookReader-1 link into this utility and create a list of links in a text file
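For those not on Windows, the same idea can be sketched in a few lines of Python. This is a minimal sketch, not the utility's actual logic: the URL template is hypothetical, and the real page-URL pattern (including how page numbers are padded) has to be worked out from the BookReader-1 link of the book in question.

```python
from pathlib import Path

# HYPOTHETICAL page-URL template: work out the real pattern from the
# BookReader-1 link of your book. {page:08d} zero-pads the page number
# to eight digits (also an assumption).
URL_TEMPLATE = "http://example.org/book/PTIFF/{page:08d}.tif"


def make_download_list(template: str, first_page: int, last_page: int) -> list[str]:
    """Return one URL per page, first_page to last_page inclusive."""
    return [template.format(page=p) for p in range(first_page, last_page + 1)]


def write_download_list(urls: list[str], path: str) -> None:
    """Save the URLs as a plain text file, one per line."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    urls = make_download_list(URL_TEMPLATE, 1, 3)
    write_download_list(urls, "download_list.txt")
    print(urls[0])  # http://example.org/book/PTIFF/00000001.tif
```

The last page number is supplied by hand here; nothing in the sketch discovers how many pages a book has.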

Now click "Make Download list" to write the links to a text file. Then give this list to a download manager; I use Free Download Manager.

Provide the list of downloads to Free Download Manager
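Free Download Manager does the fetching in this workflow. As an alternative, here is a minimal standard-library Python sketch that works through a list of links, assuming (an assumption about the list's format) a plain text file with one URL per line and that the last segment of each URL is a usable file name. It skips files already on disk and writes each download to a temporary `.part` file first, so an interrupted session can simply be re-run.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def local_name(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urllib.parse.urlparse(url).path)


def download_all(list_file: str, out_dir: str) -> list[str]:
    """Fetch every URL listed in list_file into out_dir.

    Returns the URLs that failed. Files already present are skipped, so
    the function can be re-run after an interrupted session.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(list_file, encoding="utf-8") as fh:
        urls = [line.strip() for line in fh if line.strip()]

    failed = []
    for url in urls:
        target = os.path.join(out_dir, local_name(url))
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        partial = target + ".part"
        try:
            urllib.request.urlretrieve(url, partial)
            os.replace(partial, target)  # only complete files get the real name
        except (OSError, ValueError):  # URLError is an OSError; bad URLs raise ValueError
            failed.append(url)
            if os.path.exists(partial):
                os.remove(partial)
    return failed
```

Called as `download_all("download_list.txt", "pages")`, it returns the URLs that could not be fetched, which can be retried later by running it again.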

Download all the files to a directory of your choice. Once the downloads finish, which may take a while, you will have one TIF image file per page. Any image viewer will let you go through them; IrfanView is a good option. If you are more savvy, you can combine all the page files into a single PDF or another favourite file format.
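Before combining the pages, it is worth checking that none are missing and that they sort in the right order. The sketch below assumes, hypothetically, that each downloaded file is named by its page number (for example `12.tif` or `00000012.tif`). It sorts numerically rather than alphabetically, so `10.tif` comes after `2.tif`, and it reports gaps in the numbering.

```python
import os
import re

# HYPOTHETICAL naming scheme: each page file is named by its page number,
# e.g. "12.tif" or "00000012.tif" (.tif or .tiff, any case).
PAGE_RE = re.compile(r"^(\d+)\.tiff?$", re.IGNORECASE)


def pages_in_order(folder: str) -> tuple[list[str], list[int]]:
    """Return (page files sorted by page number, page numbers missing)."""
    numbered = {}
    for name in os.listdir(folder):
        match = PAGE_RE.match(name)
        if match:  # ignores .part files, notes and anything else
            numbered[int(match.group(1))] = os.path.join(folder, name)
    if not numbered:
        return [], []
    # Numeric sort: "10.tif" comes after "2.tif", unlike a plain string sort.
    ordered = [numbered[n] for n in sorted(numbered)]
    missing = [n for n in range(min(numbered), max(numbered) + 1)
               if n not in numbered]
    return ordered, missing
```

Gaps are counted from the lowest page number found, so a missing first page is not detected; compare against the book's page count for that. The ordered list can then be handed to whatever tool builds the PDF; the Pillow imaging library, for instance, can write a multi-page PDF through `Image.save` with `save_all=True` and `append_images`.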

If you make a PDF out of it, you can upload it to the Internet Archive as I sometimes do. Examples include a translation of the Gajasasthra and a work on the Sanskrit names of Indian birds by Raghuvira. These are little-known pamphlets published on a small scale in India and therefore not easily available. They are what scholars in the western world would call grey literature, but they need not remain so.

The quality of the scans on the site is poor. The Digital Library of India could well scan material in colour and use more modern formats such as JPEG2000, which supports lossless compression and streaming at variable levels of detail. According to a 2006 paper describing the project, getting books to scan is the hard part, which makes it all the more sensible to make a good copy once a book is in hand. The project team need only learn from the excellent, large-scale Biodiversity Heritage Library. The cost of all this scanning is ultimately far less than that of acquiring printed material for public libraries across India. If one argues that the site is indeed a library, one could even surmount certain problems with the Copyright Act. After all, libraries may lend books to their patrons without paying extra royalties to the authors, so the principle of first sale could perhaps be applied. Learning about the amounts currently spent by government public libraries, the kinds of books purchased and the quality of service provided can only pain any citizen with a conscience.

The Digital Library of India is thus an amazing resource, but it could easily be run in a way that makes it far more useful. The site could use some expertise on copyright: a lot of the material may not technically be copyright-free, but they appear to be working on the principle that orphan works are OK to copy. That is indeed a very good idea, but it would be even better if they legitimized it by working with the lawyers who draft amendments to the copyright law. The Indian Copyright Act stands in contravention of the spirit of the RTI Act. The RTI Act says that all information generated by public bodies should be made available through the cheapest medium, while the Copyright Act happily goes on to state that all Government works will remain under copyright for 60 years. My attempts to point this out to a law-related organization in Bangalore that had proposed amendments to the Copyright Act failed, and comments on an NIC website seeking feedback on the proposed amendments also went nowhere. If the government ever cleans up its act (or Acts), it should scan all the material in its public libraries, state archives and government bodies and put it in a digital library, which could also serve as a storehouse for the born-digital documents and other material it produces in the course of its work. I am sure it could easily work at the same scale as the Internet Archive. Even attempts to communicate with enlightened organizations such as the IISc, which hosts the Digital Library of India, seem futile. So perhaps we just have to trudge along and make improvements the hard way, fixing things as mere individuals.

Some other digital libraries