UDC Consortium


UDC Consortium, PO Box 90407, 2509 LK The Hague, The Netherlands
Tel.: (+31) 70 314 0509    Fax: (+31) 70 314 0667    E-mail: udc@kb.nl



The UDC MRF Database Development and Design – a historical review

by P. D. Strachan and F. M. H. Oomes, UDC Consortium, The Hague
(Written 1993, chapter 'Content of the MRF' updated Nov 2001)


 

Introduction

In March 1990 the Task Force on UDC System Development, which the UDC Management Board had established in October 1988, submitted its final report. Its scope had been defined as follows:

... to advise the UDC Management Board ... concerning appropriate long-term, strategic development of the Universal Decimal Classification as in its entirety an effective, flexible and durable system for use in classifying recorded information and knowledge.

Given this broad and flexible definition, it may come as a surprise, and it shows the pragmatic approach of the Task Force, that its first and primary recommendation was:

A "standard version" of c. 60.000 subdivisions, in English, in machine readable format should be created. It should be supported by a semantic network and have a much more consistently faceted structure than at present.

This database should be completed within two years and provide the individual publishers of UDC versions with the material for compiling their editions. It should also be the basis for revision of the schedules and the starting point for "Extensions and Corrections to the UDC". To achieve this, a consortium of interested institutions should be set up. The Management Board accepted the recommendations of the Task Force, and with effect from 1 January 1992 FID transferred the intellectual ownership of the UDC, and the responsibility for its maintenance and development, to the UDC Consortium.


Sources

At that time the creation of the machine-readable version of the schedules, in the meantime named the Master Reference File (MRF), had already started. For practical reasons the International Medium Edition (IME), published by BSI Standards, was selected as the basis for this database. The text was already available in digitized form and, allowing for the necessary updating, its size corresponded fairly well to the 60,000 notations recommended by the Task Force. This basis has been modified and supplemented by:

  1. All revisions authorized in Extensions and Corrections to the UDC, Series 10:1 (1978) up to 14:3 (1992).
  2. Entries selected from a number of editions of about medium size that were published after the International Medium Edition. These included the Japanese Medium Edition (1984) (especially for science and technology), the Hungarian Large Abridged Edition (1991), the Serbo-Croatian Medium Edition (1991) and the French Medium Edition (first volume 1990).
  3. Additions to fill gaps in hierarchies and arrays that resulted from the selective nature of medium editions and were incompatible with the consistency required of the MRF.

 

Compilation process

The database was compiled using UNESCO's Micro CDS/ISIS version 3.0. The database design and the formats ("worksheets") for printing, editing and display were developed by Gerhard Riesthuis, senior lecturer at the University of Amsterdam, and David Strachan. Drs. Riesthuis also wrote the various programs for converting the sources to CDS/ISIS files.

The design had to accommodate a highly complicated process, because the database had to be compiled from various sources: first, the files of the International Medium Edition, which were converted to CDS/ISIS; second, the material from Extensions and Corrections before Series 13, which had to be keyed in and converted; third, the existing text files of Extensions and Corrections Series 13 and 14 up to E&C 14:3, published in October 1992, which were also converted.

In addition, there were separate lists of cancellations and modifications to the IME (including replacements for cancelled entries) and, finally, lists of the selections made from the more recent medium editions.

For practical reasons the entire UDC was divided into ca. 30 subject sections, each of which was completed and edited separately. For almost every one of those sections, separate databases had to be built for the material from each of the applicable sources. The diagram below is a very simplified representation of the compilation process, and one should realize that this process had to be carried out for each of the ca. 30 sections mentioned above.

The last stage, final editing, included, among other things, checking references, translating entries for which only a German (or in some cases French) text was available, and expanding the IME in line with more recent medium editions.
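The order in which the sources were combined for one subject section can be illustrated roughly as follows. This is only a sketch in Python, not the CDS/ISIS procedure that was actually used; the function name, the dictionary layout and the rule that additions never overwrite existing entries are assumptions made for the example.

```python
def compile_section(base, revisions, cancellations, additions):
    """Combine the sources for one subject section (illustrative only).

    base, revisions and additions map a UDC number to its description;
    cancellations maps a cancelled number to its replacement (or None).
    Returns (merged records, replacement numbers that have no record).
    """
    merged = dict(base)                   # 1. start from the medium edition
    merged.update(revisions)              # 2. apply authorized revisions
    for number in cancellations:          # 3. remove cancelled entries
        merged.pop(number, None)
    for number, text in additions.items():
        merged.setdefault(number, text)   # 4. add only what is still missing
    dangling = sorted(
        repl for repl in cancellations.values()
        if repl is not None and repl not in merged
    )
    return merged, dangling
```

The second return value lists replacements that do not (yet) exist as records, which is the kind of inconsistency that had to be resolved during final editing.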

 

Database structure

The field structure of the CDS/ISIS database had to account for the various components of an entry in the UDC schedules, for the individual sources of the database content, and for the different types of intervention during editing. During compilation, some of the original fields turned out to be superfluous or impractical, so they were never used. Some fields were declared only because they allowed selections that supported the editorial work or the production of printed output in a certain format. The design went through several versions, the final one of which is listed almost completely below. Some fields were divided into subfields that could be accessed individually by the CDS/ISIS software. For many fields, a table of codes was defined for filling them in.

Field     Function and explanation
10        Validation (codes).
          Used for selecting and deleting cancelled entries.
20        Source of original entry (codes).
          E.g. International Medium Edition, English Full Edition, E&C.
21        Changes made to source entry (codes).
          E.g. translated into English, updated from a later revision.
24        Type of special auxiliary, if applicable (codes).
          I.e. hyphen, point-nought, apostrophe, other (e.g. ...0/ ... 9).
25        Derived from parallel subdivision (Y/N).
26        UDC source of the parallel instruction for field 25.
31        Stage code for database creation (codes).
33        Changed at this stage (Y/N).
          Used for special selections for editing.
40        Table (codes).
          To allow for output in the correct sequence, each of the Tables of
          the Common Auxiliaries, and the Main Tables as a whole, had to be
          individually coded.
45        Application of special auxiliary: note (coded).
          If an entry was accompanied by a note concerning the application of
          special auxiliaries, this was indicated by its code (see field 24).
46        Application of special auxiliary: parallel subdivision (coded).
          Analogous to field 45, if the auxiliary was introduced by the
          parallel division.
50        Language (coded).
          Indicated the availability of English and/or German text.
55        Added by other medium editions? (codes).
          Codes for the more recent medium editions mentioned above.
100       UDC number.
109       Index only.
          Used for indexing UDC numbers that were not covered by the
          selection tables defined for other fields.
110       UDC description: definition.
112       UDC description: verbal examples.
          To separate examples in the description from the core concept;
          these often contain entries from hierarchically lower levels in
          full editions.
120       References.
          With subfields for notation and accompanying text.
130       Notes explaining the application or scope of the entry.
140       Combination examples.
          With subfields for the notation of the example, its description,
          annotation and references.
160       Parallel division note.
          For UDC numbers that are divided in parallel to a certain other
          UDC number (the source). With subfields for the notation and
          accompanying text.
162       Examples of the parallel division of field 160.
          With subfields for notation and description.
210-262   Same function as 110-162, but with German text, if available.
900-      Fields used for different types of editorial annotations
          (special characters etc.).
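In CDS/ISIS a subfield is marked inside a field by a caret followed by a one-character code. How a field such as 120 (References) might be split into its subfields can be sketched in Python as below. The subfield codes used in the MRF design are not listed here, so any particular codes (for instance n for the notation and t for the accompanying text) are assumptions for illustration only.

```python
def split_subfields(value):
    """Split a CDS/ISIS-style field value into {code: text}.

    Text before the first '^' is returned under the key ''.
    If a code occurs twice, the last occurrence wins.
    """
    head, *parts = value.split("^")
    subfields = {"": head} if head else {}
    for part in parts:
        if part:                      # ignore an empty piece after a stray '^'
            subfields[part[0].lower()] = part[1:]
    return subfields
```

With the assumed codes, a value written as `^n<notation>^t<text>` would be returned as a dictionary with the keys `n` and `t`.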

UNESCO's CDS/ISIS software proved to be a very reliable, though often rather demanding and unfriendly, tool for compiling, editing and managing the databases. Its main advantage appeared to be its flexibility in the output and display of the database content, and in converting databases from one format to another.

However, it would be very useful if it offered facilities for automated checking of references, which at present can only be done by hand using printed lists.
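What such an automated check would amount to can be sketched outside CDS/ISIS, for instance in Python on an exported copy of the data. The sketch below is not an existing facility; the function name and the input layout are assumptions, and it only tests for exact matches, whereas a real check would also have to handle ranges and composite notations.

```python
def find_broken_references(records):
    """Report references that point to a notation with no record.

    records maps a UDC number to the list of notations it refers to.
    Returns sorted (source number, missing target) pairs.
    """
    known = set(records)
    return sorted(
        (number, target)
        for number, targets in records.items()
        for target in targets
        if target not in known
    )
```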

A minor but awkward problem is that Micro CDS/ISIS uses the apostrophe to delimit the search argument, so apostrophe auxiliaries disrupt the search facility. For the time being this problem has been circumvented by replacing the apostrophe with an inverted comma; for printed output the substitution has to be reversed by a search-and-replace operation in the text-processing software.
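The substitution and its reversal form a simple round trip, sketched here in Python. The stand-in character is an assumption (a grave accent is used below); the text does not specify which character was actually chosen, and the function names are hypothetical.

```python
APOSTROPHE = "'"
SUBSTITUTE = "`"   # assumed stand-in; the character actually used is not specified

def for_storage(notation):
    """Replace apostrophes so that the notation can be searched."""
    if SUBSTITUTE in notation:
        # otherwise the reversal for printing would be ambiguous
        raise ValueError("notation already contains the substitute character")
    return notation.replace(APOSTROPHE, SUBSTITUTE)

def for_print(stored):
    """Restore the apostrophes for printed output."""
    return stored.replace(SUBSTITUTE, APOSTROPHE)
```

The round trip is only safe as long as the stand-in character never occurs in genuine data, which is why the first function refuses such input.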

Merging

In the last stage of the creation of the Master Reference File the separate databases for the various sections had to be merged into one database file.

Before this could be done, a new database design had to be developed. Some fields in the former design were no longer functional, while others had to be added to register revisions and revision history. A copy of the original database files has of course been kept for future reference and to keep track of the sources.

The new design is so far more or less experimental (as noted above, converting a database to a new format is relatively easy in Micro CDS/ISIS). It has the following field structure:

Descriptive fields

Field     Function and explanation
1         UDC number.
2         Table (codes).
          To allow for output in the correct sequence, each of the Tables of
          the Common Auxiliaries, and the Main Tables as a whole, had to be
          individually coded.
3         Type of special auxiliary, if applicable (codes).
          I.e. hyphen, point-nought, apostrophe, other (e.g. ...0/ ... 9).
4         Combination type (codes).
          For composed notations this field should indicate the type of
          combination, i.e. with a colon or with a certain type of special
          auxiliary.
5         If applicable, the UDC number from which the number was derived by
          parallel division.
11        If applicable, the UDC number that is the source for parallel
          division of the number.
          With subfields for the notation and accompanying text.
12        Type of special auxiliary (coded) introduced by the parallel
          division.
13        Type of special auxiliary introduced by an application note
          (see field 111).
100       Description: definition.
          With subfields for language versions.
105       Description: verbal examples.
          With subfields for language versions.
110       Scope note.
          To explain the semantic content of the description.
          With subfields for language versions.
111       Application note.
          For technical details about the application (e.g. applicable
          special auxiliaries).
115       Combination examples.
          With subfields for the notation of the example, and language
          versions of its description, annotation and references.
120       Examples of the parallel division of field 11.
          With subfields for notation and language versions of its
          description.
125       References.
          With subfields for notation and language versions of accompanying
          text.


Administrative fields

Field     Function and explanation
901       Date of introduction.
903       Source of introduction.
904       Comments on introduction.
911       Date of cancellation.
912       Replacement(s).
913       Source for cancellation.
914       Comments on cancellation.
921       Date of last revision.
922       Specification of revision, indicated by the number(s) of the
          revised field(s).
923       Source for revision.
924       Comments on revision.
925       Revision history, indicated by date and number of revised field.
951       Index only.
          Used for indexing UDC numbers that are not covered by the
          selection tables defined for other fields.
952       Note concerning the use of special characters.
          In CDS/ISIS many diacritics and special signs have to be coded,
          and this cannot be done in the description itself because it may
          disturb the searching facilities. Coding therefore has to be done
          in a separate field.
955       Editorial annotations and comments.
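Converting a record from the former design to the new one is essentially a renumbering of fields. The Python sketch below illustrates the idea. The correspondence between old and new tags is not documented in this review; the table in the code was inferred by matching the field descriptions in the two listings and should be read as an assumption. Fields without an evident counterpart are reported rather than guessed.

```python
# Old tag -> new tag, inferred from the field descriptions in the two
# listings. This correspondence is an assumption, not a documented mapping.
TAG_MAP = {
    100: 1,     # UDC number
    40: 2,      # Table
    24: 3,      # Type of special auxiliary
    110: 100,   # Description: definition
    112: 105,   # Description: verbal examples
    140: 115,   # Combination examples
    162: 120,   # Examples of the parallel division
    120: 125,   # References
    109: 951,   # Index only
}

def convert_record(old):
    """Return (record in the new design, old tags left unconverted)."""
    new, leftover = {}, []
    for tag, value in old.items():
        if tag in TAG_MAP:
            new[TAG_MAP[tag]] = value
        else:
            leftover.append(tag)
    return new, sorted(leftover)
```

Because the new record is built separately from the old one, the reuse of tag 100 with a different meaning in the two designs causes no clash.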

Content of the Master Reference File

As of its last update in 2007, the database contains a total of ca. 67,770 records, each corresponding to one UDC class number. The data file (.MST) occupies ca. 15,000 KB. The records are distributed over sections and subject fields as follows (updated in July 2008):

 

SUBJECT COVERAGE

Table       Description (shortened)                               UDC numbers
Ia/k        Common auxiliaries                                         12,993
  including:
  Ic        Language                                                    1,364
  Id        Form                                                          362
  Ie        Place                                                       9,384
  If        Ethnic                                                         33
  Ig        Time                                                          284
  Ik        Properties                                                    805
  Ik        Materials                                                     152
  Ik        Processes                                                     333
  Ik        Persons                                                       267
            Main table                                                 54,777
0           Generalities ... Documentation. Librarianship etc.          1,800
1           Philosophy. Psychology                                        824
2           Religion                                                    2,419
3           Social Sciences                                             6,813
  including:
  30/32     General. Statistics. Sociology. Demography. Politics          962
  33        Economics                                                   2,004
  34        Law                                                         1,826
  35        Public Administration. Government                           1,010
  36        Public Welfare                                                581
  37        Education                                                     234
  39        Folklore. Ethnography                                         194
5           Mathematics. Natural Sciences                              11,176
  including:
  50        Environment science                                            49
  51        Mathematics                                                 1,033
  52        Astronomy                                                     625
  53        Physics                                                     1,846
  54        Chemistry. Mineralogical Sciences                           3,305
  55        Earth Sciences                                              1,497
  56/59     Palaeontology. Biological Sciences                          2,820
6           Applied Sciences. Medicine. Technology                     27,486
  including:
  61        Medical Sciences                                            3,170
  62/621.2  Technology in general. Heat Engines. Hydraulics             1,575
  621.3     Electrical Engineering                                      1,698
  621.4/.6  Heat Engines. Pneumatic Energy. Fluids Handling               477
  621.7/.9  Mechanical Technology                                       1,486
  622       Mining                                                        679
  623       Military Engineering                                          618
  624/627   Civil Engineering                                           1,522
  628       Public Health Engineering                                     628
  629       Transport Vehicle Engineering                               1,779
  63        Agricultural Sciences                                       2,273
  64        Home Economics                                                718
  65        Management and Organisation of Industry                     1,387
  66        Chemical Technology                                         4,455
  67/68     Various Industries and Crafts                               4,576
  69        Building                                                      705
7           Arts. Recreation. Entertainment. Sport                      2,596
8           Language. Linguistics. Literature                             616
9           Geography. Biography. History                                 435


Further developments

As a database the MRF will certainly be useful in automated systems for cataloguing and information retrieval. Therefore, the UDC Consortium decided to make it available as such to interested libraries and documentation institutes.

At this stage of its development the MRF can be delivered as a database in Micro CDS/ISIS, as a file in ISO 2709 interchange format and as a plain ASCII text file that can be loaded into a text processor. Other ways of distribution, including special user applications for accessing the MRF, will be developed if they clearly respond to users' needs and the necessary funding is available.

In this regard the UDC Consortium would be very grateful for suggestions from users.
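For users who intend to load the ISO 2709 file into their own systems, the general layout of such a record (a 24-character leader, a directory of fixed-width entries, then the field data) can be sketched in Python as follows. The sketch follows the generic ISO 2709 structure with the usual control characters as terminators and counts lengths in characters, which is adequate for plain ASCII only. Whether the MRF export uses exactly these terminators, tag widths and line conventions is an assumption that should be checked against the delivered file; the function names are hypothetical.

```python
FIELD_END = "\x1e"    # field terminator
RECORD_END = "\x1d"   # record terminator

def build_record(fields):
    """Serialize [(tag, value), ...] with integer tags 0-999."""
    directory, data = "", ""
    for tag, value in fields:
        body = value + FIELD_END
        # directory entry: 3-char tag, 4-digit length, 5-digit start offset
        directory += f"{tag:03d}{len(body):04d}{len(data):05d}"
        data += body
    directory += FIELD_END
    base = 24 + len(directory)           # the leader is always 24 characters
    total = base + len(data) + 1         # + 1 for the record terminator
    leader = f"{total:05d}{'':7}{base:05d}{'':3}4500"
    return leader + directory + data + RECORD_END

def parse_record(raw):
    """Read one record back into [(tag, value), ...]."""
    base = int(raw[12:17])               # base address of the data
    len_w, start_w, impl_w = int(raw[20]), int(raw[21]), int(raw[22])
    entry_w = 3 + len_w + start_w + impl_w
    directory = raw[24:base - 1]         # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), entry_w):
        entry = directory[i:i + entry_w]
        length = int(entry[3:3 + len_w])
        start = int(entry[3 + len_w:3 + len_w + start_w])
        value = raw[base + start:base + start + length - 1]
        fields.append((int(entry[:3]), value))
    return fields
```

The parser takes the widths of the directory entries from the leader instead of hard-coding them, so it also reads records written with a different entry map.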

The MRF database will be the core material for all editions of the UDC in whatever language, size and form, and on whatever medium. It is also the starting point for all future revisions and enhancements of the UDC. The UDC Consortium intends to approach the revision process in a more structured way and to shorten the revision procedures.

The needs and wishes of the users of the UDC will remain the most important source for revision, for the UDC should be their tool and not a purpose in itself. User clubs might be the vehicle for their comments and suggestions.

However, users should realize that maintaining and enhancing the UDC requires not only the involvement and enthusiasm of users, but also money for commissioning revision work, staffing and equipment.

It would be disappointing for all those involved in the creation of the MRF if this project could not be continued and further developed, if this first step were not followed by many others.

Last updated: 18 November 2008