From an Early Electronic-Publishing Concept towards an Advanced Electronic Information Handling

André HECK
Strasbourg Astronomical Observatory
11, rue de l'Université
F-67000 Strasbourg, France

Abstract

The current dramatic evolution in information technology is bringing major modifications in the way scientists communicate. The concept of `electronic publishing' is too restrictive and often has different, sometimes conflicting, interpretations. It is giving way to the broader notion of `electronic information handling' encompassing the diverse types of information, the different media, as well as the various communication methodologies and technologies. New problems and challenges also result from this new information culture, especially on legal, ethical, and educational grounds. The procedures for validating `published material' and for evaluating scientific activities will have to be adjusted too. `Fluid' information is becoming a common concept. Electronic publishing cannot be conceived without links to knowledge bases and information resources, nor without intelligent information retrieval tools.

Setting the Background

It is a truism to say that we are undergoing nowadays a kind of `revolution' in information technology (Heck 1995a) with far-reaching impacts. As far as communication is concerned, many consider we are currently living in a period which is as important for mankind as the XVth century that saw Gutenberg's invention of the movable-type printing process. The popularization of hypertextual structures has also added new degrees of freedom. Information has lost its essentially frozen nature and is becoming increasingly fluid. We are all becoming authors of sets of documents immediately highly visible and retrievable.

Maybe because of the cosmic character of their disciplines and their daily planetary exchanges, astronomers and space scientists have certainly been among the first to realize the potentialities of these new methodologies and technologies. The aim of this paper is to share some views from the experience gained in these fields and from the changes and trends noticed. Let me start, however, with a couple of remarks.

First of all, when one has the honor to be an invited speaker at such a prestigious conference before such a distinguished audience, one has to be very careful not to succumb to the temptation of playing the guru, by predicting the future and putting forward statements that could turn out completely wrong later on - or by totally omitting something that would come into universal use. History is full of such situations and those of you who have read the recent book by Bill Gates, ``The Road Ahead'' (1995), know that he is himself quite cautious in this respect, actually quoting a few juicy examples. Close to us and in line with the theme of this meeting, how many people foresaw the emergence of today's omnipresent World-Wide Web (hereafter WWW or `the web')? Therefore, let me take an important precaution here: I reserve the right to be wrong and let the future decide.

The second important caveat to be brought up from the beginning is that electronic publishing (EP) is often misunderstood and/or interpreted in various, sometimes conflicting, ways (Heck 1992b, 1992c), and I am sure this will be the case at this conference too.

First of all, there is still too frequently a confusion made between electronic and desktop publishing (DTP). The latter can be understood as a way of producing locally, through relatively sophisticated software packages and laser printers, high-quality printed material ready for reproduction by a publisher with the traditional camera-ready technique. The former term would rather concern the electronic submission of material straight to the publisher, who will work directly on the electronic files and get the `documents' ready for being `published' through a succession of computer-assisted steps.

Additionally, timorous and/or conservative attitudes are still too frequent in view of what is possible with the current development of technologies and methodologies. Too many people fall short of the potentialities of the new medium and still see EP as little more than putting on line an electronic version of something that also exists on paper. This leads to semantically conflicting expressions such as `electronic preprints'!

Do not misunderstand me. I am not saying that putting a printed document on line is wrong, but this is by far insufficient. Why? Simply because the electronic medium is exactly what that means, a new medium per se, complementary to the existing ones, and because its usage should imply - and even require - dedicated techniques, policies, and strategies. Thus we should be careful not to extrapolate deeply-rooted old habits too much to new methodologies and communication tools, nor to remain in a - maybe understandable, but regrettable - inertia. This is such an obvious statement that I wonder whether it is really necessary to illustrate it. Comparisons are often drawn with the advent of radio, or better, television. The introduction of a new medium does not lead to the disappearance of former ones (in this case newspapers and magazines). It calls however for a specific approach tailored to it, in the same way that, on TV, they do not zoom in on newspaper pictures or broadcast people reading magazines (cf. also the description by Ritchie 1992 of the canal technology used by the railway and the split bell-shaped curve of the adoption life cycle of new technologies by Moore 1991).

The emergence of that new medium is currently best represented by the WWW (but what will it be tomorrow?) using Internet and associated networks as vectors. WWW is based on hypertext and hypermedia (see e.g. Nielsen 1990 & Landow 1992), which are closer to the mental structure of many people, who then find a natural response on the web. Hypertext (a term coined by Nelson 1967) has been around for quite some time already (see e.g. Smith & Weiss 1988 and the subsequent papers of the special issue of the Communications of the ACM on hypertext that they edited).

The WWW is a magnificent communication tool that has been called the `fourth media' (Internet World, April 1995) and which is de facto a fantastic cross-disciplinary, cross-educational and cross-social meeting ground (see also Erickson 1996). It is a highly dynamic domain evolving rapidly. The explosion of electronic documents is actually not a bed of roses, as a new medium, new facilities and new possibilities naturally bring in new questions, new challenges and new problems. There is plenty of work ahead on the grounds of ethics, law, security, fragility, education, and so on. We shall tackle only a few specific points here (for more ample discussions, refer to Heck 1995a,b,d).

Briefly speaking, a new culture is taking place (see also Drucker 1993, Rutkowski 1994, Soros 1994, ...) and this context must be taken into account in order to tackle EP with an appropriate perspective well tuned on all levels.

Early EP launch in astronomy and space communities

The ultimate aim of astronomers or space scientists is to contribute to a better understanding of the universe and consequently to a better comprehension of the place and role of man in it. To this purpose, together with theoretical studies, they carry out observations to obtain data that will undergo treatments and studies leading to the publication of results. The whole procedure can include several internal iterations or interactions between the various steps as well as with external fields (instrumental technology, ...), scientific disciplines and information handling methodologies. The trend is also clearly towards panchromatic astronomy as opposed to `photonic provincialism' (Wells 1992).

In the astronomy and space community, things have been moving early. In 1991, Strasbourg Astronomical Observatory hosted the first international colloquium on desktop and electronic publishing (Heck 1992a). At the same time, steps were also taken on the other side of the Atlantic thanks to colleagues of the `American Astronomical Society' (AAS). If you wish to get a precise idea of their achievements, please refer to the excellent paper by Peter Boyce and Heather Dalterio in the January 1996 issue of Physics Today.

A look at the expansion of the WWW (Rutkowski 1994) and a reference to the words of Joseph Hardin from NCSA at ADASS 1993 indicate that astronomers and related space scientists have surely been among the first to realize the potentialities of the web, which is currently considered the most efficient way of sharing electronic information. Strasbourg Astronomical Observatory also hosted last year the first international meeting on the WWW in astronomy and related sciences (Egret & Heck 1995a, Heck 1995c). Another meeting on `Strategies and Techniques of Information for Astronomy' will be held next June at the European Science Foundation (ESF).

The role of data centers such as the `Centre de Données astronomiques de Strasbourg' (CDS) (see e.g. Egret et al. 1995) and other similar resources (refer to descriptions in Egret & Albrecht 1995) is significantly changing. Their activities have been evolving from mere compilation, critical evaluation, archival and dissemination of data towards those of ever more complex electronic information hubs adding links and structure to the inter-related pieces of information, and towards distributed specialized repositories of different types of data and material. The example of CDS is representative in the sense that, originally a world-wide reference for data on astronomical objects, it is now also providing links to abstracts, published papers and papers in press, yellow-page services, and so on.

As astronomy librarians are becoming increasingly involved too, it is appropriate to mention also the conference entitled ``Library and Information Services in Astronomy II'' (LISA-II) (Murtagh et al. 1995) aiming, inter alia, at `discussing interface areas between astronomical libraries and the wide range of online and other astronomical computer-based services which are becoming ever more widespread'.

Electronic authoring

In case this did not appear clearly from the discussions above, a `publication' is to be understood here as a `public announcement' (Webster 1976), no implicit assumption being made as to the medium used. Since, in other words, `publishing' is making information public, documents on the WWW have to be considered as `publications'. In our understanding, the concept of `information' covers the observational material, the more or less reduced data extracted from it, the scientific results, as well as the accessory material used by the scientists in their work (bibliographical resources, increasingly important yellow-page services, and so on).

The earlier comparison with TV falls short however when it is realized that each electronic-network user can become ipso facto an author/creator on the web. A phenomenon that has to be appreciated is that, independently from any validation procedure, servers and web documents of persons and organizations with notoriety and reputation will be visited regularly with preference and confidence, so disrupting the current chronology preprint-submission-publication.

One could even wonder whether servers of `preprints', of conferences, not to forget those of personal documents and productions, will not take over if the procedures of the learned societies, the commercial publishers and other traditional channels remain slow and heavy, failing to respond to the dynamism, the fluidity and the visibility available via the electronic vector.

Fluidity of today's information - EIH

Unfortunately, most of what can be read about EP still relies implicitly on fixed information, not taking into account that we have entered for good the era of fluid information, i.e. a material that can be continuously updated, upgraded, enlarged, improved, modified, and so on. This new concept implies of course the subsidiary ones of document (in)stability and of document genetics: beyond its own permanent possible evolution, a document can give birth to subsidiary ones, first linked to itself; the relevance of some of these can then supplant with time that of the original document, which would virtually `die'.

Here we have a real challenge to the conventional approaches of information handling and to the usual legal policies (copyright, ...) and financial ones (subscriptions, ...). Forgetting this `fluidity' aspect would be equivalent to staying with CD-ROMs, which are frozen repositories of fixed information, or vice versa.

Mentalities, habits and policies will have to adjust themselves progressively, with the usual human delayed reaction time. Each step contributing to the enlargement of virtual libraries will bring us closer to Ted Nelson's original (1981) vision of a virtual encyclopaedic library (Xanadu project). Refer also to Nelson's comments in the November 1995 issue of Byte.

In everyday life, EP is evolving towards `Electronic Information Handling' (EIH), a broader and more flexible concept with additional degrees of freedom, better adapted to the fluid and living nature of today's information material. The classical scheme involving authors, editors, referees and publishers is also changing and can be very complex (see for instance the working diagramme of the Star*s Family of Astronomy and Related Resources, Heck 1994 & 1995c).

It is obvious that EP policies have not yet reached a final degree of maturity in spite of the fact that some of them are already quite elaborate (see e.g. Denning & Rous 1995 on the ACM electronic publishing plan). Few of them go beyond an `electronization' of a paper document and of the previous procedures used to deal with it. Certainly these are made faster and more flexible, but they still fall short of satisfactorily answering the fluid nature of today's information and the living character of information retrieval (IR) on the WWW. Authors themselves have often not yet fully realized the extremely high visibility that a document they put on the web can quickly reach, well beyond the usual circles, so powerful are the tools currently available to search for information on the networks. It is important to become fully conscious of this and to prepare accordingly the documents put on line.

Evaluation - Recognition - Validation

The phenomenology of publishing is not only motivated by the need to share information, but also strongly conditioned by recognition, a necessity that should not be underestimated. Recognition is sought for getting positions (grants and salaries), for obtaining acceptance of proposals (leading to data collection), and for achieving funding of projects (allowing materialization of ideas). The general evaluation process applied for financing research, which conditions the need for recognition, will unavoidably have to be adapted to the emergence of the new electronic medium.

The traditional media will have to leave a slot for the newcomer which will progressively reach its deserved importance and naturally become part of the evaluation process, i.e. the assessment of activities for individuals and organizations. Funding institutions, expert committees, learned societies, and so on, will have to get ready for it.

This implies of course another step: the adaptation of validation procedures (`refereeing' material) or quality assurance (not to be confused with `quality control'). As was already pointed out at the 1991 DTP Colloquium (Heck 1992b), it has become increasingly difficult to distinguish between the so-called grey literature and the formal one. Thus reliable validation procedures are more than ever necessary ... if the same cycles validation-evaluation-recognition are kept, together with their underlying philosophy, culture, and subsequent policies.

However, are these really compatible with the very dynamic nature of electronic information handling (EIH)? It will be difficult for people to refrain from putting finalized electronic documents on line without waiting for the unavoidable delays in approval and release (on paper or on a server). How will this be compatible in turn with the copyright policies and the financial aspects (subscriptions, invoicing of downloads, ...)? I did not find satisfactory answers to these questions in the EP literature or in the EP plans I am aware of. We shall need to be inventive in this respect.

Other issues and last comments

There are many other issues at stake that could be discussed in the light of the experience gained and of the intuition from the changes and trends noticed over the past years (refer also for more ample discussions to Heck 1995a,b,d).

The maintenance process of information resources must be continuously improved from lessons learned with time and by using the most adequate tools. Generally speaking, information has to be collected, verified, de-biased, homogenized, and made available not only in an efficient way, but also through operationally reliable means (it becomes useless if plugged into a confidential network or reachable only through deficient routers). Redundancies have to be avoided; precision is, and details can be, extremely important. Last but not least, the continuous political evolution of the world has also to be taken into account and one must be permanently on the alert on practical aspects (refer e.g. to Heck 1995c for detailed considerations specific to the `Star*s Family of Astronomy and Related Resources').

If scientists have a natural tendency to design projects and software packages involving the most advanced techniques and tools, there is in general less enthusiasm for the painstaking and meticulous long-term maintenance which nevertheless builds up the real substance of the resources. This also has to be carried out by knowledgeable scientists or documentalists and cannot be delegated to inexperienced clerks.

Information retrieval per se is raising a number of `evaluation' issues (see e.g. Harman 1992 and the subsequent papers of the corresponding special issue). The fashion is now shifting towards designing and experimenting with quality control processes. This might be a very serious matter or a big joke. Until further evidence is brought up, we believe that the best quality assurance (accuracy, homogeneity, consistency, exhaustivity, ...) has to be achieved when collecting and entering the data themselves. None of the algorithms currently available has really convinced us of its absolute necessity and satisfactory efficiency. Again here, developing such processes is an appealing challenge for scientists, but most of the algorithms designed work statistically. For a resource, it does not matter much whether the material queried is accurate up to 95% or 98%. The user wants to find the piece of information he/she is looking for, and, if found, this has to be accurate.

This explosion of documents on the web is not a bed of roses. New facilities and new possibilities naturally bring in new questions and new problems. Some WWW servers have already reached a fair degree of maturity. Others are still in a somewhat wild stage for lack of structure and homogeneity or simply because they offer, let us say it frankly, a significant amount of rubbish of little interest. Although quite a few features have been adopted de facto by the developers of documents on the web, there will be, sorry, there is, a definite need for a WWW ethical charter. It could concern quite a number of features, from the substance of the documents themselves to their aesthetic presentation and a number of recommended functionalities.

Hypertext itself is too often badly used to structure the documents. Pointing to external resources should also be preferred to cutting and pasting them in, which might even be considered a criminal act. In any case, proper credit to the material used should always be clearly indicated. Only well-tested documents should be put on line. It is easy to create working directories on which browsers can be run locally. URLs should not be changed unless absolutely necessary and, in such cases, links should be provided from the obsolete ones to the new ones (see more on web policies in Heck 1995b).
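
The recommendation on changed URLs can be illustrated by a minimal sketch, assuming a hand-maintained table that forwards each obsolete address to its replacement. The table `MOVED`, the function `resolve` and all addresses below are hypothetical and only serve the illustration; they do not describe any actual server.

```python
# Hypothetical forwarding table: obsolete URL -> replacement URL.
# All addresses are invented for illustration.
MOVED = {
    "http://example.org/old/starpages.html": "http://example.org/resources/starpages.html",
    "http://example.org/resources/starpages.html": "http://example.org/starpages/",
}


def resolve(url, moved=MOVED, max_hops=10):
    """Follow forwarding links from an obsolete URL to its current location.

    A URL that was never moved is returned unchanged. A forwarding loop,
    or a chain longer than max_hops, raises ValueError.
    """
    seen = set()
    while url in moved:
        if url in seen or len(seen) >= max_hops:
            raise ValueError("forwarding loop or chain too long at " + url)
        seen.add(url)
        url = moved[url]
    return url
```

A document moved twice is thus still reachable from its oldest address, while addresses that never changed pass through untouched.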

Since WWW browsers make it so easy to download the original files, crediting the sources appropriately becomes critically important. It is actually smarter and more elegant to insert a hyperlink to the original document since it will then always point to the freshest version of the file. This brings us to security issues, involving monitoring, restricted access, confidentiality and so on. Away from governmental policies (such as the Clipper chip project in the US that is raising substantial controversy), there is no golden rule on security issues: it is up to each local `webmaster' to put in place the appropriate safeguards according to the material concerned. Some resources require appropriate clearance (password, account number and so on); others will be only partially retrievable in a specific query (such as large copyrighted databases); finally, other documents are freely accessible and usable, subject to a minimum of ethical behavior (see above).

Legal aspects (copyright, electronic signatures, ...) are also extremely important and jurists are busy setting up references for the computerized material. Particularly in this case, there might be variations from country to country where the law already exists. However, with the worldwide globalization of electronic communications, one can expect - and hope for - a quick harmonization of the various references and procedures. On such matters, refer to Pam Samuelson's very interesting regular column `Legally Speaking' in the Communications of the ACM (particularly Samuelson 1993 & 1996).

At a time when authors/creators of electronic documents are increasingly worried about the easy possible alteration of their work, proper credit to the material used should always be clearly indicated.

Non-negligible educational aspects will have to be taken into account as to the introduction and training of young and not-so-young people to the new technologies within the various communities. This is true not only for scientists, but also for librarians and documentalists who will see their role significantly changing within their institution and who will be increasingly dealing with a virtual material.

To the frequent question `Do we still need the classical publishers?', our answer is `Yes', simply because we still see the electronic medium as complementary to the traditional ones. Of course, there are quite a few valuable arguments raised against it (see e.g. Berners-Lee 1992 & Heck 1992b), especially at a time when library costs are spiraling upwards as opposed to the electronic material where the costs are widely distributed and where there is apparently a dilution of the overall invested energy and manpower. There are also quite a few arguments in favor, such as the expertise (and the means) of commercial publishers to protect copyrighted works. But how often did this procedure actually have to be applied in the past, and shall we really need it in the future?

It is a fact that creators, authors, contributors, and so on, are worried by the fragility of their work under electronic format (making illicit copying easier, etc. - in this respect, see e.g. Samuelson 1994), but this fear is basically linked to the aspect of recognition. This, and more generally the ethical behavior, will have to be adapted to the new communication practices.

It is also clear that the advent of the web makes interdisciplinary communications much easier, more stimulating and more inspirational. One must however be cautious with encyclopaedic tendencies resulting from the electronic evolution and, even within a specific scientific discipline, one must refrain from sinking enormous amounts of energy and manpower into oversized endeavours with questionable return.

Since this conference is also linked to UNESCO, let me finish by saying that, judging from a number of resources we are maintaining on the web, the world-wide distribution of servers and facilities is far from being homogeneous. If one can rightfully question the policy of maintaining some peoples in a quasi-permanent status of assisted entities, it is certainly part of our duty to reduce such inequalities.

Bibliography

Abbreviations and acronyms

  AAS       American Astronomical Society (USA) 
  ACM       Association for Computing Machinery (USA) 
  ADASS     Astronomical Data Analysis Software and Systems 
  ALD       Astronomy from Large Databases 
  CDS       Centre de Données astronomiques de Strasbourg (France) 
  DTP       DeskTop Publishing 
  ECHT      European Conference on Hypertext 
  EIH       Electronic Information Handling 
  EP        Electronic Publishing 
  ESF       European Science Foundation 
  ESO       European Southern Observatory 
  ICSU      International Council of Scientific Unions 
  IR        Information Retrieval 
  ISBN      International Standard Book Number 
  ISDN      Integrated Services Digital Network 
  IT        Information Technology 
  LISA      Library and Information Services in Astronomy 
  NCSA      National Center for Supercomputing Applications 
  UNESCO    United Nations Educational, Scientific and Cultural Organization 
  URL       Uniform Resource Locator 
  WAW       Weaving the Astronomy Web 
  WWW       World-Wide Web

[Talk delivered at the ICSU Press/UNESCO Conf. on Electronic Publishing in Science (Paris, 19-23 Feb. 1996)]
© Copyright André HECK, current year.