Digitizing the Library of Congress

The Library of Congress is the oldest federal cultural institution in the United States. It was established in 1800 by an act of Congress, signed into law by President John Adams, and was designed to be the country's leading federal cultural institution.

The library grew rapidly during the 19th century and had become one of the leading research libraries in the world by the 1850s. By the 1990s, the Library of Congress was cited as the largest library in the world, a title it still holds. Housed in multiple buildings in Washington, D.C., as well as in other buildings in Virginia, the library has universal collections that span an enormous variety of subjects, formats, and origins. The library is estimated to hold more than 150 million items, including 22 million books, 5 million maps, and 13 million photographs. Research materials are available in more than 450 languages, and an estimated two-thirds of the materials acquired are in languages other than English.

The Digitization Process

With the advent of the digital revolution, this massive collection became the target of many digitization programs, a number of which were abandoned before they even began. The digitization process, now in full swing, is expected to take decades and will give scientists, scholars, and the general public easy access to this huge resource.

After a five-year pilot program, which ran from 1995 to 2000, the Library of Congress National Digital Library Program (NDLP) came into effect. The program, managed by the Preservation Research and Testing Division in Washington, D.C., and the National Audio-Visual Conservation Center in Culpeper, Virginia, aims to reproduce the collection in various digital formats. The technology used during the pilot program was extended to the whole collection. It includes scanners, audio and video digitization devices, and digital cameras, along with human labor for editing, re-keying, and encoding texts.

Cutting-edge technology is also sometimes employed for older maps and documents. For instance, scanning electron microscopy helps preservation specialists reconstruct damaged manuscripts, torn maps, and other old documents. The technology identifies the micro-chemical nature of the inks and pigments used, which makes it possible to recreate the whole structure of the text. Similarly, modern X-ray technology is employed to detect otherwise imperceptible changes made by the original artists or writers.

The digitization process uses industry-standard formats, such as SGML (Standard Generalized Markup Language), PDF, and JPEG files, all of which can be viewed with conventional software products.

Access to the Digital Content of the Library of Congress

Access to the library's bibliographic database will be free to the public and will employ a new way of searching. Users will no longer have to start their visit by searching the catalog, as in the past. Instead, librarians will help users create a set of favorite journals that will ease access to the best information and the latest acquisitions. Special indexes will be established for most types of media or collections, making browsing easier and more efficient.

The new Digital Library of Congress is designed to be a rich extension to every desktop or notebook, classroom or personal library, and will make use of cloud computing technology. Much as in private organizations, where multiple employees can work simultaneously on the same project, teachers, students, and scholars will be able to draw on the multitude of resources to create their own works.