Summary of application for funding presented to the Mellon Foundation 2007
The Access to Music Archives (AMA) project aims to create a portal giving access to music archival collections regardless of their geographic location. The project will develop software to harvest metadata from existing online catalogues of music archival materials and, at the same time, create a prototype database for institutions that lack the ability to develop their own, in order to ensure broad participation. The individual databases will be maintained by the institutions owning the archives, so the project will not be responsible for updating the content in perpetuity. This pilot project is the cooperative work of major musical institutions in Europe and North America under the auspices of IAML (the International Association of Music Libraries, Archives, and Documentation Centres). IAML President Massimo Gentili-Tedeschi says, “This is a project of paramount importance to music scholars, as there is no current resource that allows for locating music primary resources in archives easily.”
The Current Situation
Music archives vary so widely in location, format, provenance, and access that a project is needed to unify the sources and make them easily accessible to users. Music archives may exist within music institutions, historical societies, museums, libraries, or other archives unrelated to music. Music archival collections contain documents in a variety of formats and media, such as paper (music manuscripts, documents, and letters, for example), recordings (wax cylinders, 78s, LPs, cassettes, videocassettes, CDs, DVDs, etc.), and photographs (or other visual formats); to date, only a fraction of this priceless material is available through online catalogues.
The archival collections may be generated by individuals or institutions, such as composers, choirs, orchestras, opera companies, universities and conservatories, churches, military groups, publishers, and any other music-producing institutions. The level of description of these archives can also vary widely: full cataloguing and/or EAD (Encoded Archival Description) finding aids, minimal cataloguing, hand-written card catalogues, or no access at all. By our estimation, fewer than 5% of the finding aids are available online. When online access is available, different institutions may use different metadata, online catalogue systems, or search engines. Additionally, archival material may be included within much larger collections which are not archives or music archives. Even well-established, standards-based databases such as Kalliope in Germany, Cecilia in the United Kingdom, and NUCMC in the United States each require a user to apply a different search strategy to glean information on the same subject.
This state of affairs creates barriers for researchers trying to locate archives on topics of interest, as they are faced with a multiplicity of distinct search interfaces and unknown locations. Existing search engines have two drawbacks: they do not allow for domain-specific searches, and so return too many irrelevant results; in addition, many music archives of value are not yet accessible or referenced on the internet, which further reduces the number of relevant results.
The Pilot Project
Considering the current state of music archives, the AMA Working Group is embarking on a project to build a registry that will provide single-stop federated searches for users. This registry will periodically harvest metadata from the existing archival databases taking part in the project (henceforth called data providers). Dynamic harvesting will keep the data in the registry current and will obviate manual imports, duplicate cataloguing, and other operations which are not sustainable in the long run. At the same time, AMA will create a database for institutions which cannot afford or are unable to establish one, so that any willing institution will be able to participate. This database, too, will be harvested like the others. We anticipate this project will take two years, beginning in Summer 2007.
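The periodic harvesting described above can be illustrated with a minimal sketch. It assumes the registry keeps the date of the last successful harvest for each data provider and re-harvests only changes since that date; the function name, the provider names, and the seven-day interval are hypothetical and are not part of the proposal.

```python
from datetime import date, timedelta

def providers_due(last_harvested, today, interval_days=7):
    """Return (provider, from_date) pairs for providers that are due for harvesting.

    last_harvested maps a provider name to the date of its last successful
    harvest, or None if it has never been harvested. from_date is an ISO date
    string to harvest changes from, or None to request a full harvest.
    The 7-day default interval is an illustrative assumption.
    """
    due = []
    for provider, last in sorted(last_harvested.items()):
        if last is None:
            # Never harvested: ask for everything.
            due.append((provider, None))
        elif today - last >= timedelta(days=interval_days):
            # Harvest only records changed since the last run.
            due.append((provider, last.isoformat()))
    return due

# Hypothetical harvest state for three data providers.
state = {
    "provider-a": date(2007, 6, 1),
    "provider-b": None,
    "provider-c": date(2007, 6, 28),
}
due = providers_due(state, today=date(2007, 7, 1))
```

Harvesting incrementally in this way is one option for keeping the registry current without repeatedly transferring every record.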
AMA plans to carry out a survey to identify existing music archives and collections, as well as their existing cataloguing infrastructure (if any) and its suitability for OAI (Open Archives Initiative) access. The network¹ and members of the IAML national branches will help publicize the survey and encourage participation by their fellow archivists. The survey will be carried out via a web form on the Society's home page, which is convenient for the survey taker and simplifies gathering the survey data. The aim of the survey is twofold: to identify the specifics of the collections (form of access, scope of materials, links to finding aids and/or digital images, ISAD compliance, technical information, and physical condition) and to determine the eventual scope and size of the project. A survey carried out by the Society in 1988 revealed great variety in the registration and processing of music archival materials across the different types of musical and non-musical institutions, such as libraries, museums, and archival institutions. The new survey will update us on their current technologies and indicate the technical approach our project should take given such a diverse group. It will also allow participants to comment on the survey and the project, and it will inform us about the scope and scalability of our final project. The survey will be one of the first steps of our pilot project, and the survey design is already underway. Finally, the survey will show where national archives groups may need to be established to manage the full project and ensure its sustainability in the future.
Some groundwork has already been laid in establishing the metadata model for the registry. AMA members have agreed that the principal fields in the database will be based on ISAD(G) (the General International Standard Archival Description) as laid out by ICA (the International Council on Archives). Most of the established archival databases already use this standard, including the UK's Cecilia, Germany's Kalliope, and the US's NUCMC. We have agreed that allowing a core bibliographic record of only 4 basic fields, as originally suggested by ICA, with the option of including more fields, will encourage more archives to join the project and allow access to in-process collections as well. These minimal-level records can always be filled in with more information at a later date. Adjustments will have to be made as the survey results come in and when the database starts harvesting actual content with different kinds of data. Other issues that have yet to be addressed include multiple languages and name authorities. We will look to IAML's other projects, such as RISM and RILM, as models because they have a proven record in resolving these particular issues.
Basic Architecture and Technologies
The server will harvest metadata from the participating institutions using the OAI protocol. Other protocols (such as Z39.50) may have to be considered in those cases where adding an OAI access point is technically or financially unfeasible.
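As an illustration of what harvesting over the OAI protocol involves, the following minimal sketch builds a ListRecords request for the OAI Protocol for Metadata Harvesting (OAI-PMH) and parses a response. The base URL, identifiers, set name, and sample response are invented for illustration, and unqualified Dublin Core (oai_dc) is used only because every OAI-PMH repository must support it; the project's common format has not yet been defined. No network access is performed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec      # selective harvesting by set
        if from_date:
            params["from"] = from_date    # only records changed since this date
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)

# Invented sample response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:coll-1</identifier>
        <datestamp>2007-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Papers of a hypothetical composer</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("https://archive.example.org/oai", set_spec="music")
records, token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with each returned resumption token until the provider stops supplying one, which is how OAI-PMH pages through large result lists.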
The server will provide an interface to search or browse the collected metadata. Each metadata record will point to the original record in the participating institution, which may contain more information than the one in the registry does. The underlying software system we envision is an open source framework called SDX², which is currently being used in two other portals at the technical lead's institution, IRCAM (the Institut de Recherche et Coordination Acoustique/Musique).
Each data provider will expose its metadata in a common format by putting it in a repository for the central registry to harvest via the selected protocol, while continuing to administer its catalogue using its own formats. In other words, every data provider must enable harvesting via the selected protocol and map its metadata to the common format. Those providers in the pilot project whose archives contain other material in addition to music archives will have to group them in distinct sets³ to allow for selective harvesting. The common metadata model will be designed in adherence to international archive standards. It will allow for the description of multi-level collections down to the item level. The registry will provide searching and browsing across all data providers, with the capability of restricting the scope to specific providers. Results will include the appropriate normalized records, which will contain the descriptions and locations of the corresponding collections or items. Each such record will also provide access to the original local metadata, which may contain more detailed information, and to the digitized item if available and accessible. Each data provider will be able to attach optional access rights to any level of its metadata (collections, items) and data (if available), thereby controlling access. The registry will also provide a basic web site to organizations that have no existing catalogue or database, allowing them to catalogue their collections. This site will then be harvested like the others.
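The mapping and searching steps described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the field-map approach, the provider names, and the boolean access flag are hypothetical stand-ins, since the common metadata model and the access-rights scheme are still to be designed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NormalizedRecord:
    """A hypothetical normalized record held by the registry."""
    provider: str                 # data provider that owns the record
    identifier: str
    title: str
    level: str                    # e.g. "collection" or "item"
    source_url: str               # link back to the original local record
    parent: Optional[str] = None  # identifier of the enclosing collection
    restricted: bool = False      # simplified stand-in for access rights

def map_local_record(provider, local, field_map):
    """Map a provider's local record to the common format.

    field_map maps each common field name to the provider's local field name.
    """
    values = {common: local.get(name) for common, name in field_map.items()}
    return NormalizedRecord(provider=provider, **values)

def search(records, term, providers=None, include_restricted=False):
    """Search titles across all providers, or only the given providers."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower()
        and (providers is None or r.provider in providers)
        and (include_restricted or not r.restricted)
    ]

# Two hypothetical providers with different local field names.
rec_a = map_local_record(
    "library-a",
    {"id": "A1", "titel": "Nachlass Beispiel", "stufe": "collection",
     "url": "https://a.example.org/A1"},
    {"identifier": "id", "title": "titel", "level": "stufe", "source_url": "url"},
)
rec_b = map_local_record(
    "museum-b",
    {"ref": "B7", "name": "Beispiel letter", "lvl": "item",
     "link": "https://b.example.org/B7", "coll": "B1"},
    {"identifier": "ref", "title": "name", "level": "lvl",
     "source_url": "link", "parent": "coll"},
)
rec_b.restricted = True

all_hits = search([rec_a, rec_b], "beispiel", include_restricted=True)
public_hits = search([rec_a, rec_b], "beispiel")
scoped_hits = search([rec_a, rec_b], "beispiel", providers={"museum-b"},
                     include_restricted=True)
```

Keeping a per-provider field map lets each institution continue to use its own catalogue format while the registry sees only the common one.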
The user query interface will be web-based, making it widely accessible.
One of the principal tasks of the pilot project will be the development of the harvesting software. It will also include collaboration with the participating institutions in the adaptation of their systems to allow for harvesting.
Several organizations, all IAML members owning significant music archives, have agreed to participate in the pilot project: the British Library, the Bibliothèque nationale de France, the Staatsbibliothek zu Berlin, the Den Haag Gemeente Museum (NIMI), and a few smaller institutions. We expect a total of eight institutions of varying types to participate in the pilot project. This defined number of collections with diverse types of metadata (different countries, and different kinds and sizes of institutions) will be a good test, allowing us to tease out the main design issues of our final database: the diversity in its platforms, languages and data models, access rights, scalability, and sustainability.
For the full project, we are seeking participation from all institutions with holdings of music archive collections.
1 IAML has members in 52 countries and 25 established national branches.
2 SDX is based on open source software (Apache Tomcat and Cocoon). We are currently also considering a more recent environment, the PKP Open Archives Harvester, which IRCAM is using in the French contemporary music interinstitutional portal project.
3 An OAI set is an optional construct for grouping items for the purpose of selective harvesting.