M. CHEYNEY1, P. GLOOR2, D.B. JOHNSON1, F. MAKEDON1,
J. MATTHEWS1 AND P. METAXAS3
1 Department of Mathematics and Computer Science
Dartmouth College, 6211 Sudikoff Laboratory, Hanover, NH 03755, USA
E-Mail: {cheyney,djohnson,makedon,James.W.Matthews}@dartmouth.edu
2 Union Bank of Switzerland
8021 Zurich, Bahnhofstrasse 45, Switzerland
E-Mail: Peter.Gloor@zh014.ubs.ubs.ch
3 Department of Computer Science
Wellesley College, 106 Central Street, Wellesley, MA 02181, USA
E-Mail: PMetaxas@lucy.wellesley.edu
Abstract: Academic conferences are a long-standing and effective form of multimedia communication. Conference participants can transmit and receive information through sight, speech, gesture, text and touch. This same-time, same-place communication is sufficiently valuable to justify large investments in time and travel funds. Printed conference proceedings are attempts to recapture the value of a live conference, but they are limited by a fragmented and inefficient approach to the problem. We addressed this problem in the multimedia proceedings of the DAGS'92 conference. The recently published CD-ROM delivers text, graphic, audio, and video information as an integrated whole, with extensive provisions for random access and hypermedia linking. We believe that this project provides a model for future conference publications and highlights some of the research issues that must be resolved before similar publications can be quickly and inexpensively produced.
In preparing the DAGS'92 Multimedia Conference Proceedings we aimed to address these shortcomings and thereby deliver more of the value of an academic conference to our audience. Conference organizers have recently begun to recognize the potential of multimedia productions, and some first efforts at multimedia proceedings have already been published (Rada, 1993; MacSciTech, 1992). Although these efforts are steps in the right direction, we believe that they fall short of what multimedia can offer.
The DAGS'92 CD-ROM (Gloor, Makedon & Matthews, 1993) delivers text, graphic, audio, and video information as an integrated whole, with extensive provisions for random access and hypermedia linking. We believe that this project provides a model for future conference publications and highlights some of the research issues that must be resolved before similar publications can be quickly and inexpensively produced. The experience gained from this effort will be applied not only to multimedia conference proceedings of the future, but also to multimedia textbooks and learning environments. We view this as the strongest point of our experiment and research, and a direction that should be pursued further.
Background
In June 1992 Dartmouth College hosted the first annual Dartmouth Institute for Advanced Graduate Studies (DAGS) symposium, on the topic of "Issues and Obstacles in the Practical Implementation of Parallel Algorithms and the Use of Parallel Machines." The symposium program consisted of eight talks by invited speakers plus thirteen contributed talks, presenting a total of twenty-two papers (one invited talk spanned two papers). Since the topic is considered the central problem in the area of parallel computation today, it was the intention of the organizers to make the results as widely available and accessible as possible to the parallel computing research and teaching communities. In addition to the usual printed Proceedings (Johnson, Makedon & Metaxas, 1992), it was decided to publish multimedia proceedings that would capture as much as possible of the conference atmosphere.

Figure 1: The Talks interface.
The CD-ROM includes a navigation shell that facilitates hierarchical navigation in hyperspace, hypertext of the papers presented, movies of the invited speakers delivering their conference talks along with their slides being marked during the talk, hyperlinks connecting relevant parts of the proceedings, and a bibliography. In the future, we would like to include animations of some of the algorithms presented. Furthermore, the system is extensible in the sense that the user is able to create his/her own hyperlinks among objects that he/she deems relevant, search on keywords, and keep notes on the documents.

Figure 2: The hypertext interface.
In a typical session, a user could first get a quick overview of the contents of the CD-ROM. Then he/she could follow a talk on a particular topic that seems interesting, by watching the movies of the slides and of the speaker in separate windows. The user could also get an overview of the talk by using the pop-up menu containing the section titles of the talk, or by skimming through the slides. If, at some particular point, the speaker mentions a theorem without proof, as is often the case at conferences, the user could jump into the hypertext to read the omitted proof in detail. Then he/she could go back to the talk, continue reading the hypertext while bringing up several windows containing relevant information, or search on some keyword to find out who else mentioned it during the conference. Assuming that the search brought up several candidate sections of papers, the user could jump into a new paper and continue reading from there, or even jump into the movie of the second speaker and see how the material was presented during that talk. At any point, the user could make his/her own remarks on the subject being read and keep them for later examination or for filling in gaps in background.
There is considerable flexibility built into our system; evaluating which parts are crucial and should be kept and enhanced, and which could be dispensed with without affecting the overall performance of the multimedia proceedings, is part of future research and evaluation.
The first step in producing the multimedia proceedings was to collect the raw material. The speakers' presentations were videotaped, their overhead slides were copied, and their papers were collected. To make the synchronization easier, we used two video cameras for the videotaping, one focusing on the speaker and one on the projected slides. The second turned out to be by far the more useful. After collection, all these materials were converted to digital form for computer-based processing. Because of the space constraints of CD-ROM, our intended delivery medium, we decided to deliver only the eight invited talks in full audio/visual form.
Despite the large storage capacity of Compact Disc technology, we knew that we could not fit even the eight one-and-a-half-hour video tracks on a CD-ROM. We therefore decided to display a one-minute video loop of each speaker. We believe this was the right decision because little valuable information was lost this way: a minute of video is sufficient to convey a sense of a speaker's appearance and mannerisms, and at the same time it is less distracting to the person studying the talk. Furthermore, breaking the synchronization between the audio and video tracks allowed us to edit the audio without introducing video skipping. Editing both the audio and the video data of the speakers in a way that preserves synchronization and smooth transitions between edited frames seemed a formidable task, which we did not have the tools to undertake.
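The storage constraint can be illustrated with a rough calculation. The sketch below is not taken from the production itself: it assumes a nominal CD-ROM capacity of about 650 MB (a figure not stated in the text) and asks how many bytes per second of talk would be available if the disc held nothing but the eight full-length talks.

```python
# Assumption: nominal CD-ROM capacity of about 650 MB (not stated in the text).
CD_CAPACITY_BYTES = 650 * 1024 * 1024
TALKS = 8                # invited talks, from the text
TALK_SECONDS = 90 * 60   # one and a half hours each, from the text


def average_budget(capacity_bytes: int, talks: int, seconds_per_talk: int) -> float:
    """Bytes per second of talk available if the disc held nothing else."""
    return capacity_bytes / (talks * seconds_per_talk)


budget = average_budget(CD_CAPACITY_BYTES, TALKS, TALK_SECONDS)
print(f"{budget / 1024:.1f} KB per second of talk")
```

Under that assumption the average budget is roughly 15 KB per second, and it would have to cover audio, slides, and video together, before any space is set aside for the papers and the interface.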
The audio track of each talk was digitized and then edited to remove pauses and noise words such as "umm"s and "ahh"s. The edited talks were roughly half as long as the originals, and much more listenable. To improve the quality, we amplified most of them using a commercial sound processing application.
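The pause-removal part of this editing can be sketched in a few lines. The following is only an illustration under stated assumptions, not the procedure or the commercial tools used for the CD-ROM: the function name remove_pauses, the amplitude threshold, and the duration parameters are all hypothetical, and the sketch handles silent pauses only, not noise words.

```python
from typing import List


def remove_pauses(samples: List[float], rate: int, threshold: float = 0.02,
                  min_pause_s: float = 0.5, keep_s: float = 0.15) -> List[float]:
    """Shorten every quiet run lasting at least min_pause_s down to keep_s."""
    min_len = int(min_pause_s * rate)   # shortest run treated as a pause
    keep_len = int(keep_s * rate)       # length of the gap left in its place
    out: List[float] = []
    quiet: List[float] = []             # current run of below-threshold samples

    def flush() -> None:
        # Long quiet runs are pauses and are trimmed; short ones are the
        # natural gaps between words and are copied unchanged.
        out.extend(quiet[:keep_len] if len(quiet) >= min_len else quiet)
        quiet.clear()

    for s in samples:
        if abs(s) < threshold:
            quiet.append(s)
        else:
            flush()
            out.append(s)
    flush()
    return out


# Toy clip at 10 samples per second: speech, a 1 s pause, speech, a 0.2 s gap, speech.
clip = [0.5] * 5 + [0.0] * 10 + [0.5] * 5 + [0.0] * 2 + [0.5] * 3
shortened = remove_pauses(clip, rate=10)
```

A short gap is kept in place of each pause so that sentences do not run together; removing noise words automatically would require recognizing them, which this sketch does not attempt.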
The overhead slides were scanned and edited for clarity and contrast; they were also reduced in size to fit in the appropriate window of the interface program. One problem we encountered in this process was that poorly handwritten slides became hard to read after reduction, so some of the slides had to be retyped.
Then, using the original videotapes as guides, we synchronized the slides to the edited audio tracks using a commercial video editor. The resulting "movie" reproduces the most important features of a talk, the speaker's words and slides, and preserves their temporal connection. These "movies" were indexed to allow random access to a list of primary topics, and to allow more sensitive linking between the papers and the talks.
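One way to picture the synchronization and indexing described above is as a sorted list of cue points plus a topic table. The sketch below is a hypothetical data layout, not the format produced by the video editor; the names SLIDE_CUES, TOPICS, slide_at, and seek_topic, and all times and titles, are invented for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (start time in seconds within the edited audio, slide identifier),
# sorted by start time. Values are invented for illustration.
SLIDE_CUES: List[Tuple[float, str]] = [
    (0.0, "slide-01"),
    (42.5, "slide-02"),
    (130.0, "slide-03"),
    (305.0, "slide-04"),
]

# Topic index for random access: section title -> start time in seconds.
TOPICS: Dict[str, float] = {"Introduction": 0.0, "Model": 130.0, "Results": 305.0}


def slide_at(cues: List[Tuple[float, str]], t: float) -> str:
    """Return the slide that should be on screen at audio time t."""
    starts = [start for start, _ in cues]
    i = bisect_right(starts, t) - 1     # last cue starting at or before t
    return cues[max(i, 0)][1]


def seek_topic(cues: List[Tuple[float, str]], topics: Dict[str, float],
               title: str) -> Tuple[float, str]:
    """Jump to a topic: return its start time and the slide shown there."""
    t = topics[title]
    return t, slide_at(cues, t)
```

With such a table, a link from a paper into a talk only needs to store a time offset; the slide to display follows from the cue list.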
The video loops were digitized with a low-end video capture board and compressed to provide efficient playback from a CD-ROM. The loops were selected to have similar opening and closing frames, so that the loop transition would not be distracting to viewers. The loops were kept small to enhance playback performance and to keep the lack of synchronization between the speaker's lips and words from becoming a distraction.
Given the variety of playback speeds of the commercial CD-ROM drives available, some computer configurations will have a hard time displaying both the slide movie and the speaker's video loop on the screen at a comfortable speed. For that reason, we have given the user the ability to stop the speaker's video loop and replace it with a static color picture of the speaker.
The final, and by far the most time-consuming stage, was to prepare the twenty-two papers. Wanting to produce a truly hypermedia product and not to sacrifice its usability, we decided to present the papers in a hypertext form, using an advanced hypertext engine. We used the Gloor/Dynes hypertext system (Gloor, Dynes & Lee, 1993) that was developed at MIT for the CD-ROM version of the "Introduction to Algorithms" textbook (Cormen, Leiserson & Rivest, 1990) and is based on Apple's HyperCard software.
We first broke the twenty-two papers into hypertext "nodes," and assigned each node a "node level" that reflected its degree of generality. For example, all the abstracts of the papers are on level 1. This way, a user can quickly become familiar with the themes of the presented papers by visiting all the first level nodes of the Proceedings. This "chunking" process was handled manually by computer scientists with expertise in the subject area. It appears that one cannot automate this process unless the authors have written their papers following some carefully predefined specifications. We also tried to have authors provide chunking information, but the results were not sufficiently consistent from paper to paper, so in the end all chunking was performed at Dartmouth.
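The node-and-level organization can be sketched as a small data structure. This is a minimal illustration and does not reflect the internals of the Gloor/Dynes hypertext engine; the Node class, its fields, the overview function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    paper: str    # paper the node was cut from
    title: str    # heading shown to the reader
    level: int    # 1 = most general (for example, an abstract)
    text: str = ""
    links: List[str] = field(default_factory=list)  # titles of linked nodes


def overview(nodes: List[Node], max_level: int = 1) -> List[Node]:
    """Return the nodes a reader would visit for a quick survey."""
    return [n for n in nodes if n.level <= max_level]


# Invented sample entries; real nodes were produced by manual chunking.
nodes = [
    Node("Paper A", "Abstract", 1),
    Node("Paper A", "Model", 2, links=["Proof of Theorem 1"]),
    Node("Paper A", "Proof of Theorem 1", 3),
    Node("Paper B", "Abstract", 1),
]
first_level = overview(nodes)
```

Filtering on the level field gives the survey of abstracts described above, and raising max_level widens the survey to progressively more detailed nodes.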
We then converted all the papers to HyperCard text. Every author provided PostScript versions of the papers, and half of them provided electronic versions in TeX or LaTeX format; the latter were converted to HyperCard form with homemade utilities and manual cleanup. Unfortunately, we could not use PostScript files for the hypertext engine, so the remaining papers were scanned, processed by optical character recognition software, and then manually corrected. A great number of errors were introduced by the scanning and recognition process, and some pages were simply retyped. A number of text features required special treatment. HyperCard does not support subscripts or superscripts, so special fonts were used in their place. Uncommon symbols were similarly provided by custom fonts. This still left out very complex equations, which were scanned and displayed as graphics using custom software.
Figures were scanned and edited for clarity. The text was manually marked to provide links to citations, tables, and figures. Each hyperlink in the text appears as a bold-faced word. The system supports multiple windows containing scanned figures, tables, or bibliographic information. Finally, hyperlinks leading to referenced sections were introduced.
The final step was to integrate all these elements into a single user interface, and to test the resulting system. The interface was designed to be simple, usable, and attractive. Extensive color graphics and on-line help facilities were built into the navigation shell. The system was tested on a number of machines with different capabilities and configurations. Special care was taken to optimize the data transfer rate so that the system performs well on a variety of commercially available CD-ROM drives.
The production of the DAGS'92 Multimedia Proceedings has confronted us with two subjects for future research. One is the question of how such projects can be assembled in a short period of time, with less manual labor. Some early production delays were caused by the need to evaluate alternative solutions to the problems we faced; the experience gained has resolved most of these problems, and we now need to focus on automating the manual steps.
Some of the time-consuming steps, such as digitizing papers and slides, could be eliminated if the source material were available in electronic form. Removing pauses and noise words from audio sources automatically should also be possible, with sufficiently sophisticated techniques. We are currently experimenting with more advanced software that can help in that direction (Matthews, Gloor & Makedon, 1993).
Converting linear text to a hierarchy of hypertext nodes may be the most intractable problem; it is difficult to see how to replace the human expert in the near term. It is clear that projects such as this will not be undertaken if they require the multiple man-years of labor we invested, so progress must be made in automating the process. We are currently working on a carefully specified list of rules and instructions that could help authors divide an article into hypernodes. Ultimately, we hope that they will use this list in preparing the final version of the paper. This approach, if successful, will eliminate the most difficult problem of future productions.
The second subject for research concerns the effectiveness of the final result. We believe that the multimedia proceedings provide most of the content of live talks and linear papers, with the advantage of random access and hypertext linking. Of particular interest is the use of this technology in the production of hypermedia books. It is often the case that a good textbook author is also a good speaker; we expect hypermedia textbooks based on the results of our experiment to have a positive impact on the writing of future textbooks, and this deserves further exploration. This proposition needs careful evaluation, however, and the cost/benefit of certain features (such as video loops and hypertext) needs to be scrutinized in order to justify the production resources they require.
We believe that the DAGS'92 Multimedia Proceedings is a step towards academic publications that more fully reproduce the experience of a live conference or classroom. Currently such publications are one of a kind, expensive to produce, and with clear but unmeasured advantages over their traditional counterparts. Our experience highlights these shortcomings, but also suggests that with further research improved systems can be built with less effort, and greater rewards.
References
Rada, R. (prod. chair) (1993). Proceedings CD-ROM of the First ACM International Conference on Multimedia, 1-6 August, Anaheim, California.
MacSciTech (1992). Proceedings CD-ROM of the 1992 MacSciTech Conference on Scientific and Engineering Applications for the Macintosh, 15-17 January, San Francisco, California.
Johnson, D.B., Makedon, F. & Metaxas, P. (eds.) (1992). Proceedings of the 1992 DAGS Conference, 23-27 June, Hanover, New Hampshire.
Gloor, P., Dynes, S. & Lee, I. (1993). Animated Algorithms, MIT Press, Cambridge, Massachusetts.
Cormen, T., Leiserson, C. & Rivest, R. (1990). Introduction to Algorithms, MIT Press, Cambridge, Massachusetts.
Matthews, J., Gloor, P. & Makedon, F. (1993). VideoScheme: A Programmable Video Editing System for Automation and Media Recognition. In Proceedings of the ACM Multimedia '93, Anaheim, California.
Gloor, P., Makedon, F. & Matthews, J. (eds.) (1993). Multimedia Proceedings of the DAGS '92 Conference, TELOS/Springer-Verlag.