Chapter 4


Data interchange technologies


This chapter describes and discusses the technical and organizational aspects of data interchange in general, and engineering materials data interchange in particular. Materials property data is exchanged for several purposes:

  1. Data presentation for human readability.
  2. Data editing and database editing for update and maintenance.
  3. Data interchange between databases and specific software packages.
  4. Data interchange between two or more databases.

The first two classes, presentation and update, appear very different to the people who use them, but both can be seen as special cases of the third. Presentation and editing interact with a user via a user interface program which communicates (invisibly) with the database management program. Thus the situation is really the same as an exchange between a database and any other software package. Database management programs are themselves just software but have complex and many-layered means of communication limited in complexity only by the complexity of the information they contain. Therefore the fourth class of data interchange is actually a more general case than the third, and this is the subject of this chapter.

This chapter reviews the two main classes of materials data interchange techniques: the item based and the table based methods, with examples given of each. The chapter ends with a description of current activities in materials data interchange and explains their relevance and importance in the context of data transfer activities for other engineering and economic data.

Background

It is generally accepted that appropriate and timely materials data is essential to maintaining the competitive edge of engineering based industries, and that the establishment of a free market in materials data would be extremely useful. It has become apparent that successful materials data interchange is a technical and organizational problem that requires solution before an information market can be established. Data capture and measurement is also increasingly seen as an expensive activity and effective data interchange is seen as a way of alleviating these costs by sharing the use of materials data after computerization.

Those designing materials interchange formats have much in common with those who are developing standard ways of defining databases for engineering information. These are often 'self describing' databases and clearly a completely self describing database would necessarily be itself a valid materials data communication. There is thus significant overlap between the data model and database schemata developers and data interchange developers.

Generic interchange problems

An interchange format must be expressive enough to be able to represent all the information in all the databases between which it is used for data communication. Translation from a database to the interchange format will usually be straightforward (whatever the format used) because the translation is from a less general to a more general representation (and thus it might, in fact, only use a subset of the interchange format). However, interpreting interchange format data into a database will always be harder to do (the computer programs will be more complex). This is true whatever format is used for interchange. In general interpreters will always have to be capable of handling the full complexity of the interchange format and some information will have to be discarded because it does not 'fit' the target database. The conclusion is that interpreting interchange formats into any particular database will always be an intrinsically complex task, requiring a computer program of similar complexity.

The design of a materials data interchange format is therefore strongly constrained in practice by requirements on the programs that translate into, and interpret from, the interchange format: since an interpreter program is inherently expensive to produce, it follows that (a) it should be constructed so that one interpreter can be easily reconfigured to interpret into other database formats, and that (b) it should be as simple as possible.
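Requirement (a) can be sketched as a single interpreter routine driven by a per-database mapping table. This is only a minimal illustration in Python and is not part of any format discussed in this chapter: the function `interpret`, the field names and both mapping tables are hypothetical.

```python
def interpret(record, mapping):
    """Interpret one interchange record into a row for a target database.

    `mapping` maps interchange field names to the target's column names.
    Fields with no counterpart in the target are returned separately, so
    that the information which does not 'fit' is visible rather than lost
    silently.
    """
    row, discarded = {}, {}
    for name, value in record.items():
        if name in mapping:
            row[mapping[name]] = value
        else:
            discarded[name] = value
    return row, discarded


# Hypothetical interchange record (all names are assumptions).
record = {"material": "A", "temperature_K": 300, "time_hr": 0.3,
          "hardness": 150, "test_operator": "unknown"}

# Reconfiguring for another database means supplying another mapping,
# not writing another interpreter.
mapping_db1 = {"material": "MAT_ID", "temperature_K": "TEMP", "hardness": "HV"}
mapping_db2 = {"material": "name", "time_hr": "t", "hardness": "hardness"}

row1, lost1 = interpret(record, mapping_db1)
row2, lost2 = interpret(record, mapping_db2)
```

A real interpreter would also have to convert units and restructure associations, which is where most of the complexity described above arises; the sketch shows only the reconfiguration idea.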

In the real world every database is likely to be constructed using proprietary database packages. These types of package were formerly supported on only a few computers but the trend today is for proprietary databases to be supported on an extremely wide range of hardware and operating systems, so translators and interpreter programs will, even for a particular database, have to be similarly portable if they are to be used at all. This means that either only one interpreter will ever be written, and that it will be easily configurable for different databases, or the syntax of the interchange format will have to be extremely regular so that new interpreters can be written easily. For maximum flexibility elements of both should be true and interpreters should also be constructable from automatic compiler generating tools [Aho86].

All data interchange formats should have a syntax defined by a formal, stated grammar because otherwise it is impossible to prove that the syntax is expressive enough to be able to represent all the materials data structures, the database schemata, that the format might be required to transmit. This can only be done by a formal examination of the grammar, preferably involving a mathematical proof that the grammar is adequate. The MISL, Express and EDIF formats (see below) are based on grammars similar to those of programming languages but it is also possible to use instead the 'grammar' of database representations: the relational algebra.

Capability for general expression

In devising a data interchange format to be used between databases it is necessary to ensure that the format is at least as capable of expressing associativities (defined in the previous chapter on information representation) as are any of the participating databases. Many formats have been proposed for materials data, but the specification of the degree of associativity is often not well defined, or is extremely restricted. In some cases an arbitrary degree of 'nesting' is permitted and it is naively assumed that this is generally adequate. What is actually required is some more formal argument which will convince potential users that the format's capabilities have been precisely specified.

Figure 4.1 A useful format has a superset of capabilities

Figure 4.1 illustrates that while any individual database can have a coherent view of how to represent materials data (shown as circles, ellipses or lozenges), any useful format must be able to encompass a superset of all these views, both for names and for associations.

This chapter argues that if the databases involved have the same expressiveness as relational databases, i.e. can express the range of associativity that can be defined in relational databases, then the interchange format too must be designed to have the same capability as the relational algebra.

The data input problem

There is a minor conflict in the materials database community about the data (metadata) that describes the circumstances and conditions under which the numerical data was originally obtained.

Database workers are convinced that data without the associated descriptive data is functionally useless but many experimentalists are concerned that the full experimental data approach is unworkable because so much 'extraneous' information seems to be required. The former attitude is surely correct: data as published in reports and academic papers assumes much information implicitly; this cannot be done in a computerized system and so it must be represented explicitly. If it is represented explicitly, then at some point it must have been entered explicitly.

The High Temperature Materials database at the CEC research centre at Petten, Netherlands, is a disciplined database and can accept up to 400 items of data in association. The early version of the data input system, using paper forms, was intimidating to experimentalists who thought that they were required to enter everything. A later personal computer software system asks only the questions necessary to describe the data the experimentalist wishes to enter, which solves the problem.

CIM, JIT, MRP and EDI

Computer Integrated Manufacturing includes many technologies, many of which are organizational, such as Just In Time (JIT) re-ordering of stock parts. The design of products together with manufacturing (CAD/CAM) is now being further integrated with the order/re-order cycle of raw materials in the company (MRP: Materials Requirements Planning and MRPII: Manufacturing Resource Planning). These latter systems rely increasingly heavily on electronic ordering and invoicing (EDI) and hence there will be a continuing need to integrate technical data interchange (such as STEP formats) with business data interchange (such as EDIFACT formats). This chapter ends with a description of EDIFACT's capabilities.

Individual CAD/CAM companies are already pushing the technology with proprietary linked-in small materials databases to permit analysis of manufacturability [Dea90, Sdrc89, Fin90], and research in CIM and 'manufacturing intelligence' has indicated a crucial role for materials data [Aro88, Wil88, Wri88, Mey90, Enc90].

The provision of manufacturing materials information and constraints is likely to be a major delivery vehicle of materials property data to the engineer. It is significant that the INFOS metal cutting database has been one of the more successful of those participating in the European materials databank project [Krö87c]. This is probably attributable to its close ties with computer aided manufacturing: the automatic generation of numerically controlled cutting machine programs requires automatic access of appropriate data.

STEP

Whereas past projects have addressed the need to exchange computerized engineering drawings, the new international Standard for the Exchange of Product Data (STEP) is focused on exchanging entire product descriptions ('product models') which contain sufficient information to be used directly by CAD/CAM systems. An international coordinated effort is developing a wide family of representation techniques, standards and software tools. STEP is expected to dominate CAD/CAM integration for the next two decades.

Geometry and topology descriptions form the core of STEP. In addition to geometry, it will support tolerances, material properties, design 'features' and surface finish specifications. This will enable the standard to be used to communicate an 'informationally complete' description of an engineering component part or assembly of parts for the purposes of design, analysis, manufacture, quality control, test, inspection, maintenance and final disposal.

The material properties activity for the first version of the STEP standard (which will probably be issued in 1992) is limited to the subset of information that is needed to support only linear elastic finite element analysis. There is a growing realization of the importance of materials property information in advanced integrated manufacturing, so for STEP version 2 the scope will be expanded to include all material property information needed to support the entire life cycle of a product.

Levels of abstraction

Figure 4.2 attempts to show, roughly, how different data interchange formats relate to each other in their degree of detail and level of abstraction. The highest level of abstraction is 'semantic' and this can be achieved by building on top of 'logical' descriptions which may be built on either 'fixed format' (e.g. IGES) or 'relational' (e.g. SQL) database structures. These can in turn be based on either specific codings defined as bit patterns or in terms of characters and digits.

In general, the fewer levels that a format covers the more effective it is because more abstract issues are not confused by lower level issues. This reduces the effort required to produce usable systems. The MAP/TOP and OSI protocols are effective precisely because each level has its own standard formats and defined interfaces [Tho89].

Figure 4.2 Types of data interchange

Item based formats

The basic principle that distinguishes item based formats is that they start by associating one name with one value and then extend the concept to allow lists of values and sometimes lists of names by making ad hoc additions to the permitted syntax. They are therefore well suited to situations where the data consists of a few, individual values.

Item based formats are based on the NAME = VALUE system where the VALUE can, in some formats, consist of lists and other name/value pairs. If only value lists are permitted then the structure of the associations can only be tree-like (branching from a common root) although several distinct trees in any set of data are usually permitted.

A tree can be represented in an item format as a linear string of terms with appropriate nesting of brackets. There are several different ways to do this. The example in Figure 4.3 is represented in two ways: as a list where the first term in every list (in brackets) is the parent of the rest of the terms in that list, and as a tree.

(A, (B, (C, D, (E, F), G)), H, I, J)

Figure 4.3 Tree structure
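The bracketed list of Figure 4.3 can be read mechanically. The following Python sketch parses the list and recovers the parent/child links of the tree, using the convention stated above that the first term of every list is the parent of the rest. The token rules (names are any run of characters other than brackets, commas and spaces) are an assumption made for the illustration.

```python
import re


def parse_list(text):
    """Parse a bracketed list such as '(A, (B, C))' into nested Python lists."""
    tokens = re.findall(r"[(),]|[^\s(),]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_term():
        nonlocal pos
        tok = peek()
        if tok is None or tok in (")", ","):
            raise ValueError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        if tok != "(":
            return tok                      # a plain name
        items = [parse_term()]              # first term is the parent
        while peek() == ",":
            pos += 1
            items.append(parse_term())
        if peek() != ")":
            raise ValueError("missing closing bracket")
        pos += 1
        return items

    result = parse_term()
    if pos != len(tokens):
        raise ValueError("trailing text after the list")
    return result


def edges(term):
    """Return (parent, child) pairs; the first term of each list is the parent."""
    if isinstance(term, str):
        return []
    parent, *children = term
    found = []
    for child in children:
        child_name = child if isinstance(child, str) else child[0]
        found.append((parent, child_name))
        found.extend(edges(child))
    return found


nested = parse_list("(A, (B, (C, D, (E, F), G)), H, I, J)")
tree_edges = edges(nested)
```

Every name other than the root appears exactly once as a child, which is what makes the structure a tree rather than a lattice.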

Simple tree-like (hierarchical) item formats have inadequate expressiveness for the required associativity for materials information. What is required is some way of generalizing from them without losing the simplicity, but that is not possible.

Possible associativities

More general associativities can be thought of as multiply-connected lattices rather than trees. This capability can be achieved in a list-oriented (item based) interchange format only if extra 'reference names' are introduced which allow cross-references between distinct trees. This corresponds to an interchange format where a name is permitted anywhere a value is required. However, there are other common cases, such as the orthogonal materials information concepts discussed in Chapter 2, which even this technique cannot handle.
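The effect of reference names can be sketched in Python as follows. The '@' prefix used to mark a reference, the name `HT1` and the data values are all hypothetical; no format described in this chapter is being reproduced.

```python
# One named list and two otherwise distinct trees that both refer to it.
named = {
    "HT1": ["HeatTreatment", ["Temperature", 600], ["Time", 2.0]],
}
trees = [
    ["Hardness", 150, "@HT1"],
    ["YieldStrength", 410, "@HT1"],
]


def resolve(term, named):
    """Replace every '@name' term by the list that the name refers to."""
    if isinstance(term, str) and term.startswith("@"):
        return named[term[1:]]          # KeyError signals a dangling reference
    if isinstance(term, list):
        return [resolve(t, named) for t in term]
    return term


resolved = [resolve(t, named) for t in trees]

# Both trees now share one and the same sub-list: the structure is a
# lattice, since the heat treatment node has two parents.
shared = resolved[0][2] is resolved[1][2]
```

Allowing a name wherever a value is required is the whole of the extension; the cost falls on the interpreter, which must resolve references and detect dangling or circular ones.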

This discussion of trees, lattices and tables is a recapitulation of the historical comparisons of hierarchical, network and relational databases, but from the point of view of their descriptive power for materials information [Dat90, Rum90].

It is possible to extend an item format from simple hierarchical descriptions to arbitrary levels of complexity by introducing upwards-compatible additions to the format definition, but every level of capability adds increasing complexity to such a degree, in both concepts and extra syntax, that it becomes easier to use a tabular format. Tabular (relational) formats have the advantage that a decade or more of experience with relational databases can be brought in to help.

MISL example

MISL is described here not as an example of a potential data interchange format, but as a vehicle for exploring what kinds of extensions are required to any item based format if it is to cope with some common materials data representation problems.

MISL is an item based interchange format based on a formal grammar defining the syntax [Sar88]. No 'materials semantics' whatsoever are embodied in the format; it merely allows fields to be defined and assumes that some other document is a reference for the meaning of the fields. In this sense it differs from STEP and EDIF and resembles the Cambridge tabular format CTDIF and the data file interchange format ISO 8211.

The MISL approach treats materials data items as statements in a language rather than entries in a database. Although in some sense the two viewpoints are equivalent there is something to be gained in bringing the apparatus and technology of lexical analysis, parsers, interpreters and compilers to the problem [Aho86, Som89, Tho89].

The MISL language for describing materials information consists of statements (sentences) constructed from phrases describing the material identity, condition, and test measurement. The test measurement phrase consists of sub-phrases describing test name, results, accuracy and validity range. MISL is defined (using Backus Naur Form) as a sequence of associated statements:

statement ::=
material-phrase, data-phrase {, phrase}

where the material-phrase identifies the material, and the data-phrase describes everything that is required about the data. The association of data items, such as that between a set of creep measurements and the test temperature, is described only by the depth of binding nesting [Sar89a]. This way of expressing associativities is simple and powerful but, as we have seen, on its own it can only handle tree-like data structures.

Tabular data, where a number of data points are distinguished by varying metadata, can be described completely (if repetitively) using the same syntax as for single data points. In order to cope with complex associativities without repeating information, MISL requires some further enhancements to make the structure 'lattice-like'. Required first is a way of 'naming' a list so that it can be referred to from many different parts of the tree of nested data lists.

Although MISL with named lists can describe lattice-like associations, some things that are easily expressed in a table are difficult to express succinctly in MISL. What is required is a facility for 'projecting' ('casting') one data structure (a list) over another so that corresponding values are associated. For example if a list of three tempering temperature name/value pairs is cast over a list of three values of tempering time, then the resulting table can represent the heat treatment independent variables. Each position in Table 4.1 records a value of the hardness.

Table 4.1 Data for material A

y = HARDNESS   t = 0.3 hr   t = 0.5 hr   t = 0.7 hr
x = 300 K      150          100          45
x = 335 K      null         120          35
x = 400 K      145          null         null
x = 420 K      53           null         null
x = 440 K      null         47           null

The casting operation is a cross-product, similar to the 'outer join' of relational algebra [Dat90, vol.2]. The same data in an extended version of MISL would look like the example below.

(Material = 'A'),
(Time = (0.3, 0.5, 0.7)),
(Temperature = (300, 335)),
(Hardness = ((CAST Time -> Temperature ((150, 100, 45), (NULL, 120, 35))),
  (145, (Time = 0.3), (Temperature = 400)),
  (53, (Time = 0.3), (Temperature = 420)),
  (47, (Time = 0.5), (Temperature = 440))))
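The casting operation can be sketched in Python. The function `cast` below is a hypothetical illustration, not a MISL implementation: it projects a grid of values over two lists so that each value becomes associated with one temperature and one time. The values are those of Table 4.1, with Python's `None` standing for NULL.

```python
def cast(columns, rows, values):
    """Associate values[i][j] with the pair (rows[i], columns[j])."""
    if len(values) != len(rows) or any(len(line) != len(columns) for line in values):
        raise ValueError("the shape of the values does not match the cast lists")
    return {(row, col): value
            for row, line in zip(rows, values)
            for col, value in zip(columns, line)}


times = (0.3, 0.5, 0.7)          # hours
temperatures = (300, 335)        # kelvin

# The regular part of the table is expressed once, by a cast.
hardness = cast(times, temperatures, ((150, 100, 45), (None, 120, 35)))

# The remaining isolated points are added as individual items.
hardness.update({(400, 0.3): 145, (420, 0.3): 53, (440, 0.5): 47})
```

The cast carries six of the nine associations without repeating any temperature or time; the three isolated points still need their metadata spelled out individually.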

For completely general associativity the format should permit reference names for casts too ('named casts'), so that the same table layout could then be used again for another material's data.

The named casts method of expressing arbitrary associations is powerful but hard to use with even ordinary complex relationships. It is capable enough, but it is ultimately limiting because people find it too hard to use in practice, even though software would have no trouble interpreting it.

STEP and Express

The STEP standard is written in a formal 'informational modelling' language, Express (see Glossary). The Express language is not an interchange format; it is just a way of specifying the logical structures which can be interchanged.

The use of Express enables standards designers to perform data modelling: to separate abstract and concrete ideas, to describe items of interest, to define constraints and to define operations which may be performed on these items. Express defines a hierarchical view of data but uses a rich variety of basic types, including strings and lists of arbitrary length and sets of any size. Also any entity can be used as a type, as shown in the example.

ENTITY Point_3D;
X, Y, Z: REAL;
END_ENTITY;
ENTITY Line;
p0, p1: Point_3D;
END_ENTITY;
ENTITY Triangle;
q1, q2, q3: Line;
WHERE
q1.p0 = q3.p1;
q2.p0 = q1.p1;
q3.p0 = q2.p1;
END_ENTITY;

This shows the use of embedded rules (the WHERE clauses) to ensure consistency and integrity of data stored in the database schema described by this Express definition.
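The effect of the embedded rules can be imitated in a general purpose language. The Python sketch below is an analogy only, not a translation defined by STEP: the class names mirror the entities above, and the check in `__post_init__` plays the part of the WHERE clauses by refusing to construct a triangle whose lines do not join end to end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Line:
    p0: Point3D
    p1: Point3D


@dataclass(frozen=True)
class Triangle:
    q1: Line
    q2: Line
    q3: Line

    def __post_init__(self):
        # Same three conditions as the WHERE clauses of the Express entity.
        if not (self.q1.p0 == self.q3.p1
                and self.q2.p0 == self.q1.p1
                and self.q3.p0 == self.q2.p1):
            raise ValueError("triangle lines do not join end to end")


a, b, c = Point3D(0, 0, 0), Point3D(1, 0, 0), Point3D(0, 1, 0)
closed = Triangle(Line(a, b), Line(b, c), Line(c, a))
```

In Express the rule is part of the schema itself, so every system that reads the schema inherits the constraint; in the sketch it is merely code attached to one class.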

The next example shows the hierarchical (inheritance) nature of Express through its supertype mechanism which here defines a material as either composite or homogeneous. Both of these inherit the references to the material_id and the set of material properties.

ENTITY material
SUPERTYPE OF (composite XOR homogeneous);
material_id : STRING;
properties : SET[1:#] OF material_property;
END_ENTITY;

ENTITY composite
SUBTYPE OF (material);
number_plies : INTEGER;
ply_type : STRING;
layup : LIST[1:#] OF REAL;
WHERE IF (number_plies <=0) VIOLATION;
END_ENTITY;
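The inheritance shown above has a close analogue in class inheritance. In the following Python sketch the class names follow the Express entities, but the details are assumptions: material properties are represented simply as a set of strings because the `material_property` entity is not defined in this chapter, and `Homogeneous` is given no attributes of its own.

```python
from dataclasses import dataclass


@dataclass
class Material:
    material_id: str
    properties: frozenset          # stands in for SET[1:#] OF material_property

    def __post_init__(self):
        # The supertype is abstract: an instance must be one of the subtypes.
        if type(self) is Material:
            raise TypeError("a material must be composite or homogeneous")
        if len(self.properties) < 1:
            raise ValueError("at least one material property is required")


@dataclass
class Homogeneous(Material):
    pass


@dataclass
class Composite(Material):
    number_plies: int
    ply_type: str
    layup: list                    # stands in for LIST[1:#] OF REAL

    def __post_init__(self):
        super().__post_init__()
        if self.number_plies <= 0:
            raise ValueError("number_plies must be positive")
        if len(self.layup) < 1:
            raise ValueError("layup must contain at least one value")


# Illustrative values only.
laminate = Composite("C1", frozenset({"density"}), 4, "carbon/epoxy",
                     [0.0, 45.0, -45.0, 90.0])
```

The subtype inherits `material_id` and `properties` without restating them, exactly as the composite entity does.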

STEP separates the logical and byte level parts of the standard: all logical relationships are defined in Express whose translation into the physical file format (bytes on a magnetic tape) is defined only once in the standard. This is the same sort of layered approach as used by OSI: by defining interfaces correctly it is possible to reuse software for a number of different purposes [Tho89].

The definition of names in STEP is unfortunately only implicit in the Express definitions. The meaning of the name of each entity or attribute is strongly affected by how it is used and referred to by other Express statements, such as the definition of 'composite' in the example above. Thus the meaning of the names is defined entirely in terms of their associations with other names and also by brief comments in the standard on their intended use. This naming scheme is a severe limitation since it means that STEP can only be used to transmit types of materials data that have been completely data modelled by the standardization teams.

The STEP project has identified the need to represent data fields which are not defined in the standard glossary, and has proposed a solution: a 'preamble' written in Express that precedes the data in any transmission and which describes the format of the numeric data that follows. The idea is to be able to transfer in a STEP standard manner new (non-STEP) entities, attributes and relations, with reference to STEP defined counterparts, without violating the integrity of the data.

EDIF

The EDIF standard for the exchange of data in electronic engineering was proposed and developed by an independent group of American semiconductor companies [Hil86]. EDIF's great advantage is that it has an extremely simple syntax and logical structure based purely on lists of lists (modelled on the programming language LISP). This makes it easy for companies to create interfaces to it and has greatly influenced its practical success.

Since it is based on a 'list of lists' approach, its most natural use is for describing pure tree structures, but subtrees can be named and referenced so its associativity is of the lattice type.

Conclusions on item based formats

Since item based formats start by associating one name with one value and then extend the concept to allow lists of values, they are well suited to situations where the data consist of a few, individual values and relatively simple functional dependencies. The tree-like structure, especially if extended using cross-references to subtrees, is perfectly adequate for many types of data.

The named casts method for expressing arbitrary associations is powerful but hard to use with even ordinary complex relationships. Software would have no trouble interpreting named references or named casts in a data message, but translating from a database into a message format that used them could be complex and software to do it would probably have to be written specially for each database installation.

Item formats can do a simple job well and can be extended to more complex tasks, but every level of capability adds increasing complexity to such a degree, in both concepts and extra syntax, that it eventually becomes easier to use a tabular format.

Tabular (relational) formats have the advantages of a sound mathematical basis and a decade or more of experience with relational databases. Thus unless the designer of an interchange format is prepared at the outset to handle the naming and casting issues, it is recommended that all item based formats be avoided unless their requirements are (and will remain) extremely modest.

The exception to the above rules is the item-based STEP family of standards where most of the complexity will have already been dealt with by the standardization committee's hard work. The disadvantage is that the information that can be transferred is limited to what has already been standardized (unless the user is an expert in Express and can write appropriate preambles and application protocols). For very complex functional dependencies some kind of formal data modelling language with associated diagramming notation becomes a necessity.

Table based formats

Whereas item based formats are concerned with relating individual items of data to each other and to metadata and later extend the concepts to lists of items, table based formats take as a starting point the relationship between sets of data. They are therefore obviously suited to bulk data transfer where a large number of values have the same structure of associations.

A relational database can always be thought of as a set of tables with labelled column headings and unlabelled rows together with a set of integrity constraints. If multiple tables form a single data communication then the complexity of associations that can be expressed is the same as for relational databases.
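How several tables in one communication express associations can be sketched with a natural join on shared column names. The Python below is a minimal illustration: the table contents and column names are hypothetical, apart from two hardness values borrowed from Table 4.1.

```python
def natural_join(left, right):
    """Join two tables (lists of dicts) on every column name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined


# Table 1: one row per material (the 'form' column is an assumption).
materials = [
    {"material": "A", "form": "bar"},
    {"material": "B", "form": "sheet"},
]

# Table 2: one row per measurement.
measurements = [
    {"material": "A", "temperature_K": 300, "time_hr": 0.3, "hardness": 150},
    {"material": "A", "temperature_K": 335, "time_hr": 0.5, "hardness": 120},
]

joined = natural_join(measurements, materials)
```

The shared column `material` carries the association, so the description of material A is stated once however many measurements refer to it.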

dBase

The dBase database file format (the '.dbf' file) is simply a means of encoding a single table as a single file of bytes, together with a large number of implementation related detailed restrictions (see [Sar89c]). The format was originally defined for files on MS-DOS personal computers but is equally valid for any computer which regards files as a simple sequence of bytes. A set of dBase files in which common fieldnames make cross-references can contain data with any degree of associativity expressible in the relational algebra. The dBase language however does not provide integrity and consistency support, and functional dependency can only be expressed through hand built catalogue files.

Most important for materials data, dBase does not permit any kind of NULL value for numeric fields. Most significant for usability is that the format is in binary. It cannot therefore be read by people and requires software at each end of the transmission for any kind of communication whatsoever.

SAE aerospace standard 4159

This draft standard has been submitted to ANSI and proposes a standard way of encoding a single table of materials data [SAE88]. It imposes the requirement that the table be almost completely unnormalized, a 'flat-file' (in 'First Normal Form' [Dat90]). The standard defines a new type of datafile, a 'table file' which contains a single table of numerical data (so in associativity capability it resembles a single dBase file) with initial label fields in accordance with MIL-STD-1840A. This military standard for data interchange merely requires certain formatting restrictions and additional labels which describe the source and destination of the data [DD88a].

There is no defined method for associating several tables. It also defines syntax for the addition of footnotes which means that it is really only suitable for information oriented directly at human beings who can read and interpret the footnotes.

CTDIF

The Cambridge Tabular Data Interchange Format CTDIF has been designed specifically as a table based format for materials data transfer using the relational model of data representation and associativity which can be directly read by people without software support [Sar89c]. The format has two forms: the first is a single 'flat-file' table format and the second is a 'multi-tabular' format which describes how to associate a number of the single tables. The collection of a number of tables gives the same capability for expressing associativities as the relational database.

Two levels are also defined. In the lower level the single tables are each completely interconvertible with a dBase format file and thus inherit the same limitations. The 'extended' level is the same except that the restrictions imposed by dBase compatibility are removed.

CTDIF-1 0.1
implementation 'PMS dBase Converter v0.1 21-July-1989'
name NIMONICB updated 89/7/21
fieldlist 'sample_no' weight length strength_MPa elongation_to_fracture endfields
#1-fred 3 5.0e-4 200.3 0.23
#2BA 3.2 1e-3 205.2 0.235
'#3Z ++' 3.333 1e-3 205.3 0.236
FIDTC-1

An important aspect of this format is that all values are either numerics or arbitrary length strings, which considerably simplifies both the syntax and all the software that processes it. A translator has been written which converts dBase files into the flat-file CTDIF format and a translator for the reverse is under development. All varieties of the format are written in plain text and so can be edited with a word processor.
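To suggest how little software a format of this kind needs, the following Python sketch reads the flat-file example above. It is not the CTDIF definition [Sar89c]: the rules it applies (field names lie between `fieldlist` and `endfields`, data rows follow until `FIDTC-1`, quoted strings may contain spaces, other header lines are ignored) are assumptions inferred from the example alone.

```python
import shlex

SAMPLE = """\
CTDIF-1 0.1
implementation 'PMS dBase Converter v0.1 21-July-1989'
name NIMONICB updated 89/7/21
fieldlist 'sample_no' weight length strength_MPa elongation_to_fracture endfields
#1-fred 3 5.0e-4 200.3 0.23
#2BA 3.2 1e-3 205.2 0.235
'#3Z ++' 3.333 1e-3 205.3 0.236
FIDTC-1
"""


def to_value(token):
    """Every value is either a numeric or a string (assumed rule)."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_flat_table(text):
    fields, rows = None, []
    for line in text.splitlines():
        tokens = shlex.split(line)        # handles the quoted strings
        if not tokens:
            continue
        if tokens[0] == "fieldlist":
            fields = tokens[1:tokens.index("endfields")]
        elif tokens[0] == "FIDTC-1":      # end marker, as in the example
            break
        elif fields is not None:
            if len(tokens) != len(fields):
                raise ValueError(f"wrong number of values in row: {line!r}")
            rows.append(dict(zip(fields, map(to_value, tokens))))
    return fields, rows


fields, rows = parse_flat_table(SAMPLE)
```

One limitation of this sketch is that a quoted string which happens to look like a number would be converted to a numeric, because the quoting is discarded before the values are classified.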

SQL

Structured Query Language is an ISO standard for communications between a user or some software and a relational database. SQL is a data sublanguage designed specifically to define (create), to manipulate (update, delete) and to query (interrogate) multiuser, concurrent access relational databases. It is intended to be generated by software, not written directly by human users, although it can be if necessary. SQL is based on the relational algebra, thus SQL has the same capability in describing the associativities between data sets as a relational database.

SQL is not a data interchange language in that while it defines precisely how a query can be used to refer to a subset of data in a database, it does not define in what form this data is then transferred to the user [Dat90].

SQL is defined in ISO 9075:1987 for two levels of implementation. Level 1 does not support NULL values for any field in the database, although support for NULL values is a vital requirement for most materials data. SQL does not define at all what happens when errors occur or how error messages should be phrased or handled; these will be added in later versions of the standard.
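The importance of NULL for materials data can be illustrated with part of Table 4.1. The sketch below uses the SQLite engine supplied with Python, whose SQL dialect is not the same as either level of ISO 9075:1987, so it shows the principle only; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hardness ("
    " temperature_K REAL NOT NULL,"
    " time_hr REAL NOT NULL,"
    " value REAL)"                      # NULL allowed: not measured
)

# The first two rows of Table 4.1; None is stored as NULL.
points = [
    (300, 0.3, 150), (300, 0.5, 100), (300, 0.7, 45),
    (335, 0.3, None), (335, 0.5, 120), (335, 0.7, 35),
]
conn.executemany("INSERT INTO hardness VALUES (?, ?, ?)", points)

# Which combinations have no measurement?
missing = conn.execute(
    "SELECT temperature_K, time_hr FROM hardness WHERE value IS NULL"
).fetchall()

# Aggregates skip NULL instead of treating it as zero.
mean_335 = conn.execute(
    "SELECT AVG(value) FROM hardness WHERE temperature_K = 335"
).fetchone()[0]
```

Without NULL the missing point would have to be stored as some numeric value such as zero, which would silently corrupt the average.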

Conclusions on format types

It has been shown that it is easier to demonstrate the expressiveness of tabular formats compared with the expressiveness of item based formats, but that an item based format with the right kind of formally specified grammar can be capable of all required associativities.

Both types of format can theoretically handle the associativity required for materials data (though individual formats may have limitations) and it is shown that they are complementary: tabular formats are more natural to use for bulk data transfer and complex associations, item formats are easier for point data and simple associations. Only the most complex of the existing proposals for item based formats are able to duplicate the expressive power of the simplest multi-tabular format, but for problems that require such expressive power additional systems analysis techniques are required.

History of data interchange

Successive workshops and symposia have identified data interchange as a critical need, but it was at the VAMAS workshop in Petten in November 1988 that several implications for materials data interchange resulting from the activities of materials database users became apparent. There were three main results [Krö88]:

  1. The needs for data interchange were articulated
  2. Self education was acknowledged as a necessary precursor to intelligent discussion of data interchange formats, both technically and with respect to other relevant organizations
  3. It appeared that two or three formats was a feasible goal to aim for.

A meeting was organized in September 1989 at Rolls-Royce plc in Derby to identify a workable format. It recommended that the Cambridge tabular format (CTDIF) be used for a trial experiment and suggested further study of STEP/Express. At the second international symposium on the computerization of materials property data in Orlando, Florida, in November 1989, representatives of ASTM, commercial companies and several attendees at the Petten and Derby workshops met and agreed a timetable for a round robin test of data interchange using CTDIF. During 1990 the round robin attracted only a poor response from its participants and it was suspended indefinitely at a poorly attended ASTM committee meeting in June, much to the disappointment of European participants.

Several important lessons were learned from the experiment, nearly all of which were independent of the format that was notionally the subject of the test. There were problems of communication within individual participating organizations as to the precise purpose of the test. Commercial organizations, particularly the small companies which typify those involved with materials databases, find it very hard to provide data modelling and software support for a project with no immediate profitable outcome. The round robin was considered too 'applied' for research funding, but significant research was nevertheless required since the different data schemata of the communicating databases presented a significant and intractable problem.

A new data interchange experiment has recently been proposed to VAMAS which will involve the 1991 developments in the materials module of STEP.

CALS and SGML

Computer Aided Acquisition and Logistic Support (CALS) is an American Department of Defense and industry programme to aid the integration of technical information for technical system design, manufacturing, purchasing and maintenance. The purpose is to improve productivity, quality and military readiness [DD88a].

The CALS standards support the dissemination of all types of manuals, instructional material, and national and international standards in computer readable form. Although the purpose of the programme is for weapon systems, the CALS standards are also directly applicable to non-weapons commercial engineering products where 'response to the market' substitutes for military preparedness. The CALS standards themselves are unclassified and publicly available world-wide [Sta89].

CALS uses Standard Generalized Markup Language (SGML, see Glossary) as a 'glue' to associate files of data in several different formats: text (also in SGML), line illustrations (computer graphics metafile), bitmap graphics (tiled-raster), engineering drawings (IGES) and numerical data (MIL-STD-1840A [DD88b]). These are ISO or ANSI standards but CALS also restricts the user to using a particular subset of each. The list will be extended to include STEP as well as IGES, and also X.12 or possibly EDIFACT (see below) [DD88a]. Using SGML itself to encode materials data is theoretically possible (since SGML has general computing capability) but it does not provide the necessary abstractions directly and would be hard to use.

CALS is important because it is the method by which a very large procurement organization is beginning to insist that technical information, including materials data, is communicated in standard, software processable formats [Sta89]. CALS has also been enthusiastically received by the non-defense related industrial data management community.

Techno-economic materials data

The acronym EDI (Electronic Data Interchange) is commonly used to describe the interchange of techno-economic and business data in computer readable form, e.g. invoices, delivery notes, customs and other official documents. These are often transmitted by data communications links over the public telecommunications network, but EDI data can also be transferred by physical delivery of magnetic tape or disc.

There is significant penetration of EDI into several vertical markets in the USA and UK, less in other countries. In the USA 5000 companies are regularly using EDI and the UK has 1000 users, more than any other European country. The industry programmes are Odette (automotive), EDIFICE (electronics), DISH/SHIPNET (shipping), EDICON (construction) and CEFIC/EDI (chemicals).

X.12 is the ANSI standard for EDI. It is similar to EDIFACT, and ANSI is encouraging users to migrate to EDIFACT. EDIFACT and X.12 are inappropriate for general materials data transfer because they require every facet of information to be predefined in detail for each specific type of message.

The case which needs to be considered is the shipping documentation and delivery notes attached to a shipment of an engineering material where the measured properties of the material are part of the documentation, e.g. actual chromium and nickel contents, and measured strain to failure of a batch of stainless steel. This 'delivery data' in EDI formats could be a valuable resource for the corporate technical databases, a resource which is currently completely unused. Translation would only be achievable through carefully coordinated reference vocabulary and data dictionary work.
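The role of the data dictionary in such a translation can be illustrated with a minimal sketch. It assumes that the measured values have already been extracted from the EDI message into name and value pairs; the field names, property names and conversion factors below are hypothetical and are not taken from X.12, EDIFACT or any real database.

```python
# Hypothetical data dictionary: maps a supplier's delivery-note field names
# to (database property name, target unit, conversion factor).
DATA_DICTIONARY = {
    "CR_PCT": ("chromium_content", "wt%", 1.0),
    "NI_PCT": ("nickel_content", "wt%", 1.0),
    "ELONG_FRAC": ("strain_to_failure", "%", 100.0),  # fraction -> percent
}


def translate_delivery_data(fields):
    """Split delivery-note fields into translated records and unknown names."""
    translated = {}
    unmapped = []
    for name, value in fields.items():
        entry = DATA_DICTIONARY.get(name)
        if entry is None:
            # No agreed vocabulary entry: the field cannot be translated.
            unmapped.append(name)
            continue
        prop, unit, factor = entry
        translated[prop] = (value * factor, unit)
    return translated, unmapped


# Illustrative values for one batch of stainless steel (invented numbers).
delivery_note = {
    "CR_PCT": 18.2,
    "NI_PCT": 8.1,
    "ELONG_FRAC": 0.5,
    "PO_NUMBER": 12345,
}
records, unknown = translate_delivery_data(delivery_note)
```

Fields without a dictionary entry are reported and not guessed at, which reflects the point that translation depends on the vocabulary having been agreed beforehand.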

Open distributed processing

Data interchange by floppy disc or electronic mail is only the beginning. The Open Systems Interconnection (OSI) protocols are already planned to be supplemented by Open Distributed Processing (ODP) where not just data, but also computation is communicated and distributed across networks of computers [Tho89]. All data transfer so far discussed has been passive in that it is designed for transmission without explicit acknowledgement. Future systems will require dynamic data interchange whereby the meaning of communicated information can be elaborated or confirmed by means of two-way conversation.

Up until now we have been concerned only with the transmission of data and information; we have not addressed any issues of distributed systems where the processing of the information also changes location. Architectures and means for defining the necessary authority, responsibility, reliability and security for distributed processing are now under development [Apm89].

Distributed systems will need to communicate concepts at a higher level than bits and bytes. An industry consortium, the Object Management Group (OMG), has already been formed to produce schemes for naming and managing such messages.

Materials data index interchange

These new technologies will be useful for dynamic materials data interchange and vital to materials index interchange. The notion of setting up software conversations in which information can be traded about which databases contain the materials data relevant to a query falls firmly in the province of ODP.

If an information market in materials properties is to be achieved, then in addition to mechanisms for data interchange, a method is needed for querying distant databases to find whether they contain data of interest. It is the responses to such queries that need to be in a standard form. The query would ask a database to send its catalogue and indexes and an index interchange format is required for the response. Such a capability would be of use even in the absence of data interchange since it would enable a simple gateway computer to provide a uniform interface to a user from a set of disparate databases.
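The gateway idea can be sketched in a few lines, assuming that every database answers an index query in one agreed form. The form chosen here, a sorted list of (material, property) pairs, is purely illustrative, as are the class, function and database names; no existing index interchange format is implied.

```python
class MaterialsDatabase:
    """Stand-in for a remote database; all names here are hypothetical."""

    def __init__(self, name, holdings):
        self.name = name
        # holdings: material name -> set of property names held
        self._holdings = holdings

    def catalogue(self):
        """Answer an index query in the agreed form:
        a sorted list of (material, property) pairs."""
        return sorted(
            (material, prop)
            for material, props in self._holdings.items()
            for prop in props
        )


def find_sources(databases, material, prop):
    """Gateway step: name the databases whose catalogue lists the pair."""
    return [db.name for db in databases if (material, prop) in db.catalogue()]


db_a = MaterialsDatabase(
    "A", {"316 stainless steel": {"yield_strength", "density"}}
)
db_b = MaterialsDatabase(
    "B",
    {"316 stainless steel": {"density"}, "Ti-6Al-4V": {"yield_strength"}},
)
sources = find_sources([db_a, db_b], "316 stainless steel", "density")
```

The gateway needs to understand only the catalogue form, not the internal schema of each database, which is why such a capability would be useful even without full data interchange.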

The exchange of indexes in a machine-interpretable form is quite a different problem from interchange of actual data although many of the component problems are similar. The transmission of a data catalogue requires an almost identical capability.

