"Digital Libraries: Advanced Methods and Technologies, Digital Collections"
Report on the 4th All-Russian Scientific Conference (RCDL'2002)
On October 15 - 17, 2002, RCDL'2002 - the Fourth All-Russian Scientific Conference "Digital Libraries: Advanced Methods and Technologies", took place at the Joint Institute for Nuclear Research, Dubna.
Digital Libraries (DL) is a field of research and development that aims to advance the theory and practice of processing, disseminating, storing, searching and analysing various digital data. Digital libraries acting as knowledge repositories can be regarded as complex information systems whose development and use require the solution of numerous scientific, technological, methodological, economic, legal and other issues. Digital library technologies are developing rapidly. Challenges in semantics, in the integration of information and in the presentation and perception of various kinds of data call for significant innovations. The development of digital library technologies is becoming more and more essential for raising the standards of health care, education, science and the economy, as well as the quality of life as a whole. Projects devoted to converting into digital form the information accumulated by humanity about the Earth, the Universe, literature, art, the environment and humans are examples of the intensive development of global information repositories.
RCDL'2002 is the fourth conference on this subject (1999 - St. Petersburg, 2000 - Protvino, 2001 - Petrozavodsk). The principal objective of the conference series is to help form a community of Russian experts involved in research and development related to digital libraries. The Conference offers this community an opportunity to discuss ideas and results and to make contacts for closer cooperation. Much attention is focused on advanced applications and technologies. In addition, the pilot applications and digital collections developed within the framework of the RFBR grants on digital libraries and other programs are also discussed at the conferences. The Conference also promotes the study of international experience and the development of international cooperation on digital libraries. The Conference languages are Russian and English. In 2002 Yannis Ioannidis (University of Athens, Greece) helped to prepare the Conference, acting as its European Coordinator. Christine Borgman (UCLA, USA) provided a liaison with the ACM SIGIR. The Program Committee of RCDL'2002 included PC members from abroad.
97 Extended Abstracts were submitted for the conference. The Program Committee reviewed all of them and selected 59 submissions for regular sessions and 13 for poster presentations. The Conference Proceedings, including full papers, were published before the conference. RCDL'2002 was supported by grants of the Russian Foundation for Basic Research and the Ministry for Science of the Russian Federation.
104 specialists from 16 Russian cities and 15 foreign attendees from Germany, Hungary, Latvia, Moldova, Ukraine and the USA took part in the conference.
2. The Program of the Conference
2.1. Conference Structure
2.2. General features of the program
Digital libraries are increasingly coming into use in scientific organizations and universities. At present, one can hardly imagine a western university without a well-developed digital library. Digital libraries built on licence agreements with publishing houses are displacing journal subscriptions. At the same time, most university digital libraries functionally resemble traditional ones or are combined with them into hybrid libraries. In the long term, digital libraries should become repositories of knowledge. The program of RCDL'2002 reflects this wide range of interpretations of the "digital library" concept. The problems of developing hybrid libraries appear in the program alongside presentations of digital libraries and information systems for the organization of science. The examples of specialized scientific collections that were considered show that, from the informational, structural and functional viewpoints, they go far beyond the capabilities of a traditional library. The problems of developing a virtual astronomical observatory are considered in detail in the context of international cooperation. Significant attention in the program is given to the creation of integrated repositories of scientific information - global ones in particular areas of knowledge, built both with advanced technologies (mediators) and with traditional approaches. Problems of creating digital archives were also discussed. Specialized research papers are collected in the sessions on semantics of information resources and on methods of representation, retrieval and indexing of documents. The conference emphasized digital libraries in education as elements of the virtual educational environment. Particular expectations are placed on them in connection with the global transformation of education under the influence of information technologies.
Besides the special session devoted to these issues, the International Expert Meeting of the UNESCO Institute for Information Technologies in Education was organized at the conference in the form of a round-table discussion considering the state of digital libraries in education.
2.3. Characterization of Presentations
Session 1. The talks given by the representatives of several large Russian libraries discussed the status, the technologies applied and the prospects for development of the digital component in their hybrid libraries.
Session 2 (Digital libraries for education). Mary Marlino (UCAR, USA) reviewed the professional-community aspects of creating narrowly focused digital libraries for education, using DLESE - the Digital Library for Earth System Education - as an example. The role of the professional community in the development of such a subject-oriented library, and the mutual influence of the community and the library, were explained. A new approach to organizing educational courses on the basis of materials accumulated in digital libraries was presented in the talk given by Alex Ushakov on behalf of a team from the University of California at Santa Barbara working on the well-known Alexandria Digital Library Project. This approach emphasizes the conceptual medium of a subject domain. Following this approach and using a course in physical geography as an example, the talk showed how to build a teaching environment in which concepts and digital libraries play the leading role in training. The engineering aspects of creating a digital library for aerospace education were surveyed in the talk given by E.B. Kudashev.
Session 3 (Semantic aspects of information resources). A talk given by L.A. Kalinichenko and N.A. Skvortsov focused on the use of the DAML+OIL ontological model, a draft standard developed by W3C, for subject mediation, applying a reversible mapping of this ontology model into the canonical model of the mediator. The ideas of a "visual thesaurus", "visual" metadata and the indexing of visual data, put forward in the paper presented by I.M. Zatsmann, appeared open to debate and were not sufficiently motivated.
Session 4. A brief overview of the problems of developing digital libraries for the organization of science was presented by JINR, by a group of physical institutes of RAS and by the Kazan State University. E.N. Filinov and A.V. Boychenko once again attempted to consider the standards for representing digital library resources simultaneously for science, culture and education. The presented material was not well aligned with the actual state of resource representations used in modern digital libraries, reflecting a gap with the world community in this intensively developing area of technology.
Session 5. Guenther Eichhorn presented his paper on a large digital library of publications in the field of astronomy - the Astrophysics Data System (ADS). It is an impressive collection. O.B. Dluzhnevskaya and O.Yu. Malkov described plans to integrate the Russian astronomical community into the international movement towards the Virtual Astronomical Observatory (VAO). For this purpose, the project of the Russian Virtual Observatory is being worked out as a component for integration into the International VAO. The talk by V.V. Vitkovsky and his colleagues presented information on the contribution of the Special Astrophysical Observatory of RAS to the VAO. Talks on astronomical collections were also presented at Sessions 7 and 9. At Session 7 the talks from Russia and Ukraine were devoted to the creation of databases of the archives of photographic plates accumulated at the Pulkovskaya and Crimean observatories. Session 9 dealt with the application of various technologies to the creation of astronomical collections: object models for pulsar data in an object-oriented environment (A.E. Avramenko's paper) and XML for various observational data (V.V. Vitkovsky's paper). The first talk described the use of object interoperability on the basis of CORBA/DCOM; the second was devoted to the use of Web services and their interoperation on the basis of the SOAP, WSDL and UDDI technologies.
Session 6, devoted to Data Grid and the prospects of using this architecture in digital libraries, played an important role in the structure of the conference. The invited talk by Ilya Zaslavsky from the San Diego Supercomputer Centre contained a brief survey of the technologies developed at this centre - the Storage Resource Broker (SRB), a representative of Data Grid, and MIX, a mediator implementing the Global as View approach to the integration of heterogeneous data sources. These architectures are still considered separately, though their integration is expected in the future. The talk given by V.V. Korenkov explained the structure of the large European Union project on Data Grid and the involvement of Russia in this project. These two talks allowed the conference attendees to compare various Data Grid architectures being developed by the global community.
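The Global as View idea mentioned above can be illustrated with a minimal sketch: each relation of the global schema is defined as a view over the local sources, and a query against the global schema is answered by unfolding that view. The sketch below is only an illustration under stated assumptions. The source contents, the `publication` relation and the `query` helper are hypothetical, and the code does not reproduce the MIX mediator or its interfaces.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]

# Two hypothetical sources with different local schemas (made-up sample data).
SOURCE_A: List[Row] = [{"title": "Data Grid overview", "creator": "Smith"}]
SOURCE_B: List[Row] = [{"name": "Mediator design", "author": "Jones"}]


def publication_view() -> Iterable[Row]:
    """Global relation publication(title, author), defined as a view over the sources."""
    for row in SOURCE_A:
        yield {"title": row["title"], "author": row["creator"]}
    for row in SOURCE_B:
        yield {"title": row["name"], "author": row["author"]}


# Global as View: every global relation is mapped to a view definition.
GLOBAL_SCHEMA: Dict[str, Callable[[], Iterable[Row]]] = {
    "publication": publication_view,
}


def query(relation: str, **conditions: str) -> List[Row]:
    """Answer a selection query on a global relation by unfolding its view."""
    rows = GLOBAL_SCHEMA[relation]()
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]
```

With this arrangement a user asks only about `publication`, for example `query("publication", author="Jones")`, and never sees the local schemas. Adding a source means editing the view definition, which is the usual trade-off of the Global as View style.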
Sessions 10 and 13 were devoted to methods of representation and retrieval of documents. Benjamin M. Gross (UIUC, USA) analyzed how people work with e-mail (ways of choosing addresses, sorting messages into categories, etc.) and proposed a prototype system that uses a message store implemented as a relational database at the lower layer and a set of services at the upper layer (for example, a text and metadata indexing service) to improve the storage, selection, addressing and navigation of messages. Many of the proposed solutions can also be applied to the organization of digital collections of other types. We would like to note the talks presented by specialists from the St. Petersburg State University on activities supported by RFBR grants. They were devoted to research on the automatic detection of HTML documents having a similar structure (i.e. obtaining information that facilitates the creation of wrappers) and on the use of information about the content of documents in the neighborhood of identified Web pages to improve retrieval quality. The paper presented by B.V. Dobrov and N.V. Lukashevich was dedicated to the development of multilingual information systems, including facilities for automatic processing, indexing and retrieval of documents in "multilingual" collections. The principles of developing the Integrated Distributed Information System (IDIS) of the Siberian Branch of RAS, which applies an extended document object model (DOM), and of populating it with scientific information from various areas of science were presented in the talk given by Yu.I. Shokin, A.M. Fedotov and Yu.V. Leonov. The paper by M.V. Gubin presented the results of research aimed at choosing a method for compressing inverted files (the basic index structure used for text retrieval).
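To make the index compression problem concrete, the following sketch shows one widely used technique: storing the gaps between sorted document identifiers in a posting list and encoding each gap with a variable-byte code. The report does not say which methods the paper compared, so the choice of technique here is an assumption, and the function names are hypothetical.

```python
from typing import List


def vbyte_encode(numbers: List[int]) -> bytes:
    """Encode non-negative integers, 7 bits per byte; the high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers can be encoded")
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)      # final byte of this number
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:           # last byte of the current number
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(doc_ids: List[int]) -> bytes:
    """Compress a sorted posting list by encoding the gaps between identifiers."""
    gaps = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    return vbyte_encode(gaps)


def decompress_postings(data: bytes) -> List[int]:
    """Rebuild the posting list by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Gaps in a dense posting list are small, so most of them fit into a single byte, whereas a fixed-width representation spends the same number of bytes on every identifier.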
Session 11 (Integrated scientific repositories). Professor Bernd Wegner (Institute of Mathematics of the Technical University in Berlin) noted that developing knowledge bases in the form of a digital library requires the creation of global repositories. This, in turn, involves three aspects: storing the accessible digital materials, archiving such materials in order to preserve them for future generations, and transforming published materials into a digital form that provides good access and retrieval possibilities for potential readers. The paper was devoted to some details of this activity, in particular for EMANI - the Electronic Mathematics Archives Network Initiative (an international project) and ERAM - the Electronic Research Archive in Mathematics (a German project), both of which apply a distributed network architecture. In addition, a plan for developing the global Electronic Library on Mathematics (DML and RusDML) was described.
Two papers presented at this session (with the involvement of specialists from the Institute of Mathematics of RAS, the Institute of Informatics Problems of RAS, the Institute of Cytology and Genetics of the Siberian Branch of RAS and the Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of RAS) were devoted to various issues of implementing distributed digital libraries in the area of molecular biology, biotechnology and medicine, and, in particular, to the implementation of the Gene Discovery / GeneExpress system, which involves the TRRD and SWISSPROT databases (structure and functions of proteins, their classification, etc.), EMBL/GenBank (sequences of DNA and RNA), and Medline. Regrettably, the presentation of the material relied heavily on familiarity with the terminology and concepts of the related research area.
Session 12 (Integration of heterogeneous collections). Yu.S. Zatuliveter underlined the forthcoming problem of transforming the Internet into a programmable metacomputer by activating the functionalities of networked computers for global system tasks (suppression of information noise, structuring and integration of information resources, automatic control over computing resources) and for users' tasks. It was noted that Grid technology is the first serious step in this direction.
Two papers of this session (presented by V.A. Kapustin and O.L. Zhizhimov with their co-authors) were devoted to the possibilities and tools of applying the Z39.50 protocol to the creation of profiled distributed information systems (standardization of metadata and data schemes). Last but not least, the session demonstrated the Library Subsystem of the Integrated System of Information Resources of RAS as a medium of library registries providing access to the materials of the libraries of RAS institutes (this is a joint work of the Computing Centre of RAS and the Centre of Scientific Telecommunications and Information Technologies of RAS).
In Session 14 (Archives), the talk given by Paul Braslavsky (Ural Branch of RAS) and Tomas Krichel (USA) should be noted. It was devoted to a technology for organizing repositories accessible through the Web, and to the formats and use of the Dublin Core metadata standard in accordance with the OAI (Open Archives Initiative) protocol for academic organizations, their documents and collections. The papers by the team of authors from the Institute of problems of information transfer of RAS and the Institute of Informatics Systems of the Siberian Branch of RAS described the technologies for creating and using a text and graphics database on the history of Russian fundamental science, drawing on the archive of RAS and on personal archives.
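As a rough illustration of how Dublin Core metadata is exposed under the OAI protocol, the sketch below builds a ListRecords request that asks for the `oai_dc` metadata format and extracts the Dublin Core fields from one record. The repository address and the sample record are made up for the example, and the helper names are hypothetical. The sketch does not describe the system presented in the talk.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI ListRecords request for the given repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Made-up sample record in the oai_dc format (not taken from any real archive).
SAMPLE_RECORD = f"""<oai_dc:dc xmlns:oai_dc="{OAI_DC_NS}" xmlns:dc="{DC_NS}">
  <dc:title>Sample archive document</dc:title>
  <dc:creator>First Example Author</dc:creator>
  <dc:creator>Second Example Author</dc:creator>
  <dc:date>2002-10-15</dc:date>
</oai_dc:dc>"""


def parse_dublin_core(xml_text: str) -> Dict[str, List[str]]:
    """Collect Dublin Core elements into a dict of lists (elements may repeat)."""
    fields: Dict[str, List[str]] = {}
    prefix = "{" + DC_NS + "}"
    for element in ET.fromstring(xml_text):
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Every Dublin Core element is optional and repeatable, which is why the parser keeps a list of values per field instead of a single string.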
Session 15 (Document indexing). Two papers from St. Petersburg were presented in this session. A. Koryavko and I. Nekrestyanov considered the problem of building Web retrieval systems with alternative approaches to rating the "usefulness" of Web pages for a particular user, applying not only the content of a document but also metainformation on both the document and the user (including the user's previous queries, which documents were read after a query and for how long, etc.). This approach provides for more effective ranking of documents. The capabilities of one representative of the page ranking methods based on information about relationships between Web pages (the Kleinberg algorithm) were analyzed and extended. Facilities for searching in an environment of semi-structured data were discussed in the talk given by B.S. Khvostichenko and B.A. Novikov.
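For readers unfamiliar with the Kleinberg algorithm (also known as HITS), the sketch below shows its basic form: every page receives an authority score, computed from the hub scores of the pages linking to it, and a hub score, computed from the authority scores of the pages it links to, and the two are updated in turn until they stabilize. This is only the textbook scheme, with a hypothetical function name and a fixed iteration count chosen for simplicity. The extensions analyzed in the paper are not described in this report and are not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
    return {page: value / norm for page, value in scores.items()}


def hits(links: Dict[str, List[str]],
         iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores for a link graph given as page -> linked pages."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages that link to the page.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub update: sum the new authority scores of all pages the page links to.
        new_hub = {
            page: sum(new_auth[target] for target in links.get(page, []))
            for page in pages
        }
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth
```

In a toy graph such as `{"a": ["c"], "b": ["c"], "c": []}`, page `c` ends up as the only authority, while `a` and `b` receive equal hub scores.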
3. Expert Meeting "Digital Libraries in Education" (UNESCO IITE)
During the conference, on October 15, 2002, the Institute for Information Technologies in Education of UNESCO, in cooperation with RCDL'2002, JINR and IPI RAS, organized an International Expert Meeting "Digital Libraries in Education". According to its plan of activities, the IITE UNESCO is developing a project on applying digital libraries in education. The purpose of the Expert Meeting was to present the Analytical Survey "Digital Libraries in Education" prepared by an international group of experts. The content of the Survey was presented at the Meeting by Professor L.A. Kalinichenko.
The Survey considers some technological aspects of developing digital libraries, summarizing selected projects in the USA and Europe. For instance, in the USA the National Digital Library in the field of science, technology, engineering and mathematics (NSDL) is being developed, oriented towards use in education and science. NSDL (a first version of the system becomes operational in December 2002) is developed as an integrated distributed information environment. NSDL provides access to a large volume of heterogeneous digital objects, including multimedia, georeferenced objects, objects representing measurement data, samples under study and even expensive instruments for remote access (such as an electron microscope). In view of such a variety of information objects, NSDL supports multiple metadata standards. Interfaces of such systems are evolving from the traditional ones, based on keywords, towards more semantic interfaces (for example, using the benchmarks of the Atlas of Literacy, developed recently in the USA, in queries). Quite intensive development of NSDL is planned, including the possible positioning of this library as a sub-structure of the federal government.
CITIDEL, an example of an NSDL component, is an interactive digital library in the field of computer science and information technologies. Another example is the Networked Digital Library of Theses and Dissertations (NDLTD). These are distributed infrastructures with multilingual access and support for multiple methods of information retrieval and metadata acquisition. NDLTD is supported at the state level in a number of countries - Australia, Brazil, Germany, India, Korea and the USA - as well as by some national libraries (including the British Library). An interesting example of a high-quality educational library for a specific subject domain is DLESE (the Digital Library for Earth System Education).
It is important to note that, alongside support for traditional digital objects, infrastructures are being developed in which the information objects are data streams measured in real time (for example, results of atmospheric measurements at the Earth's surface and in higher layers of the atmosphere, radar measurements, satellite observations, etc.). In the USA, data networks have been developed that deliver such real-time measurements to hundreds of universities. There are projects intended to make such data streams a part of the NSDL information. New "cyber-infrastructures" for science and education are emerging, providing new approaches to the creation of digital libraries. In data grids the term xGrid, where x denotes a data domain, designates a structure (for example, BioGrid) that brings together the experts, information and tools of the subject area. One of the purposes of such grids is the open publication of scientific information.
The Analytical Survey considers the evolution of these projects over at least a five-year horizon. In addition, advanced approaches to developing such libraries for educational purposes are analyzed. The Survey concludes with guidelines for the next stage of the UNESCO project - the development of educational modules oriented towards various groups of learners in developing countries.
The following conference attendees took part in the discussion of the Analytical Survey: Dr. Mary Marlino (UCAR, Boulder, CO, USA), Alex Ushakov (UC in Santa Barbara, CA, USA), Prof. Bernd Wegner (TU Berlin, Germany), Dr. Stephan Koernig (TU Darmstadt, Germany), Prof. V.P. Shirikov (JINR, Dubna, Russia), Dr. S.A. Khristochevsky (IITE UNESCO), Prof. A.G. Marchuk (ISI SB RAS, Novosibirsk, Russia), Dr. V.N. Zakharov (IPI RAS), and others. The Meeting recommended publishing and widely disseminating the Analytical Survey and proceeding to the next phase of the UNESCO project.
4. The Conference Recommendations
At the closing session of the conference the participants approved the following recommendations:
RCDL'2002 Organizing Committee Chair
© Joint Institute for Nuclear Research, 2001 - 2002