Chapter 6
Materials information systems, especially those used in engineering design, are largely felt to be less useful than they could be. One major reason is that the quality of the systems, and more especially the quality of the data within them, is felt to be either low or indeterminate. What this really means is harder to determine.
Traditional ways of using materials information in design and analysis have not led to the concern that appears when computerized systems are considered. This is peculiar since it is often the same data in both cases. This survey takes the view that traditional methods inevitably involve extensive manual re-evaluation of the data, and that simplistic computerized methods, by speeding up the process, reduce the degree of personal involvement and the 'feeling' for the validity of the data.
The quality of materials property data is intimately bound up with the use and perception of data by organizations in the real world.
Perceptions
It is largely the uncertainty in the perceived quality of data from modern information systems that restricts their use among practising engineers [Wes86, Rum87, Swi90]. This could be due to the delivery of data via 'information brokerage' companies whereby the end user of the data does not receive it directly from the organization that evaluated it [Bam89]. The middle organization adds value to the data by changing format, making several sets of data available together, increasing speed of access etc., and may add extra data consistency checks on the data before it is delivered. Nevertheless, in the users' view, the reputation of the data depends on the reputation of the evaluating organization.
Fit for use
Quality is often confused with precision, accuracy and even luxury but here it is taken to mean 'fitness for use'. This is a little wider than 'fitness for purpose' since that is often interpreted to mean how well a product meets its design and performance specifications [Die83]. Wherever software systems are concerned, we now know that incomplete specification is the dominant difficulty [Win90] and many systems which meet specifications perfectly are in fact useless.
Precision and accuracy
Accuracy is often quoted as meaning how close a measurement is to the right answer, whereas precision means how closely repeated measurements agree with one another, even if they cluster around the wrong answer. This is the same difference as exists between systematic error and random error. A precise numerical datum is one which has been measured to several significant figures, usually arrived at by a statistical analysis of many individual measurements. However a systematic error in all those measurements may mean that this precise datum is wrong. An accurate datum is one that is near the 'right' answer, irrespective of its precision. A common complaint about materials databases is that they appear to be more precise than they really are because they quote too many 'significant' figures for numerical data.
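The distinction can be made concrete with a small numerical sketch. The readings and the 'true' value below are invented purely for illustration, and the function name `bias_and_spread` is an assumption of this example: the bias estimates the systematic error (accuracy) and the spread estimates the random error (precision).

```python
from statistics import mean, stdev

def bias_and_spread(readings, true_value):
    """Return (bias, spread): bias is the offset of the mean from the
    true value; spread is the sample standard deviation of the readings."""
    return mean(readings) - true_value, stdev(readings)

# Hypothetical modulus readings in GPa; the 'true' value is assumed to be 200.
TRUE_VALUE = 200.0
precise_but_inaccurate = [210.1, 210.0, 209.9, 210.0]  # tight scatter, offset mean
accurate_but_imprecise = [195.0, 205.0, 198.0, 202.0]  # wide scatter, centred mean

for name, data in [("precise but inaccurate", precise_but_inaccurate),
                   ("accurate but imprecise", accurate_but_imprecise)]:
    bias, spread = bias_and_spread(data, TRUE_VALUE)
    print(f"{name}: bias = {bias:+.2f} GPa, spread = {spread:.2f} GPa")
```

Quoting the first set to four significant figures would look impressive but would still be wrong by about 10 GPa.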
Measurements and properties
All materials property tests face the following problem: well-characterized laboratory tests based on the physics and materials science of the problem produce good data for comparing different materials but poor correlation with properties of end-products in service. Conversely service-simulation tests, which test complex and ill-defined combinations of properties, produce good correlation with end-use conditions but are highly specific to a particular application and so are performed for few materials [PRI89]. The data quality for these two extremes is different in type. High quality data should also be concise and compatible with design procedures. This demonstrates that any linear, one-dimensional system of quality levels is inadequate because quality has at least three dimensions: appropriateness, accuracy and compatibility with design.
It must be remembered that the idea that materials have independently identifiable properties is a convenient fiction (as discussed in the chapter on representing materials information). In practice there are only specimens and measurements. Any technique of assigning data quality which is not fundamentally based on this realization will be of limited practical benefit.
Statistical data
Statistical measures for data quality are necessary, but agreed reference points of statistical accuracy are not common. The American military handbook and British Engineering Sciences Data Unit (ESDU) organizations define two categories: category A indicates that there is a 95% probability that 99% of measurements exceed a particular value and B indicates a 95% probability that 90% of measurements exceed a particular value.
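As a rough illustration of what the two categories imply, the sketch below estimates A and B values from a small sample. It assumes normally distributed measurements and uses a widely quoted closed-form approximation to the one-sided tolerance factor; it is not the procedure prescribed by the handbooks, which should be consulted for real design values. The strength data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sided_k(n, coverage, confidence=0.95):
    """Approximate one-sided normal tolerance factor for sample size n.
    Exact factors require the non-central t distribution."""
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1.0 - z_g**2 / (2.0 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a

def basis_value(measurements, coverage):
    """Value that `coverage` of the population is expected to exceed,
    with 95% confidence, under the normality assumption."""
    k = one_sided_k(len(measurements), coverage)
    return mean(measurements) - k * stdev(measurements)

# Hypothetical tensile strengths in MPa, invented for illustration.
strengths = [512, 498, 505, 520, 490, 508, 515, 501, 495, 510]
a_value = basis_value(strengths, 0.99)  # category A: 99% exceed, 95% confidence
b_value = basis_value(strengths, 0.90)  # category B: 90% exceed, 95% confidence
print(f"A = {a_value:.1f} MPa, B = {b_value:.1f} MPa, mean = {mean(strengths):.1f} MPa")
```

The approximation is poor for very small samples, which is one reason the handbooks specify minimum numbers of specimens.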
The importance of statistical measures can be judged from their use in MIL-HDBK-17A where all the data is presented with statistical analyses. This handbook specifies minimum numbers of specimens, how to deal with outliers and preferences between Weibull, normal and log-normal probability distributions and non-parametric methods for evaluating batch to batch variability [New87].
A number of different measures have been proposed as a uniform way to specify both the validity of experimental measurements and the confidence of certain evaluation procedures [Hop91]. The 'reliability' or 'R' parameter is a value between 1.0 and an arbitrarily large number: the larger the number the less the reliability [Sar87]. For example, if an elastic modulus value is 100 GPa with an R value of 1.12, then this is interpreted to mean that values very probably fall between 100/1.12 GPa and 100*1.12 GPa, i.e. between 89 and 112 GPa, or -11% to +12%. For statistical data 'very probably' is interpreted to mean within two standard errors in the mean, corresponding to about 95% of measurements.
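The multiply-and-divide interpretation of R is simple enough to state as code. This is a minimal sketch; the function name `r_bounds` is an assumption of this example and does not come from [Sar87].

```python
def r_bounds(value, r):
    """Return the (lower, upper) interval implied by a reliability factor R >= 1."""
    if r < 1.0:
        raise ValueError("R must be at least 1.0")
    return value / r, value * r

# The elastic modulus example: 100 GPa with R = 1.12.
low, high = r_bounds(100.0, 1.12)
print(f"{low:.1f} GPa to {high:.1f} GPa")
print(f"{low / 100.0 - 1.0:+.0%} to {high / 100.0 - 1.0:+.0%}")
```

Note that the interval is symmetric on a logarithmic scale, not on a linear one, which is why the lower and upper percentages differ.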
The multiplicator-divisor behaviour is a good representation, but the 'R' method is realistically no better than many ad hoc procedures such as the 'certainty factors' used in some expert systems. 'Support logic' is a modern and mathematically better-founded system which may be useful within evaluation systems but is too complicated for user interaction and everyday use [Wu91].
Perceived and consistent quality
No designer or analyst will use a database only once except for the very lowest quality data. Users form their own views of the usefulness of any data collection in terms of how they have used data from it in the past and in what ways their experiences were successful. Thus it can be seen that consistent quality in a data collection is more important than high quality as such. Consistency means that a user can assess a collection's quality in his or her own terms based on some of the data, and then use that assessment confidently when accessing new data from the collection.
Users have high confidence in using consistent databases and this is why specific databases, such as those purely for thermal properties or purely for thermoplastics, find such favour. It is much easier to ensure consistent quality in a tightly focused data collection. In the user's mind each database assumes its own 'personality' with particular strengths and weaknesses. If the data is evaluated by an identifiable person or organization then the user already has a strong guide to the personality of the data collection, which is why the 'grandfathered' database approach works well and why merging large numbers of different databases into a single, undifferentiated collection is looked upon with suspicion.
Design and numerical analysis
When materials data is used as input to numerical analysis (for example when creep data is used to predict component distortion in service by non-linear finite element methods), the usual means of ensuring a valid analysis is to use upper and lower bounds. Thus the analysis is made twice, using as bounds parameter values for which it is 'inconceivable' that the true values could be outside. However this approach only really works for single parameters such as elastic modulus or density.
Where a number of parameters is required to predict a single property, as is the case for creep, it is not clear what combinations of values form valid bounds. Even when valid combinations can be derived, taking extreme values for each parameter does not take into account the covariance that usually exists between them, and so predictions become unduly conservative. The implications for data quality are that evaluated materials parameters, intended to be used in the analysis of proposed designs, should include not just range and bounds attributes, but also valid covariance information.
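The effect of ignoring covariance can be sketched for the simplest case, a prediction that depends linearly on two parameters. The sensitivities, standard deviations and correlation coefficient below are invented for illustration; real creep laws are non-linear and would need a fuller treatment.

```python
from math import sqrt

def linear_spread(grads, sigmas, corr):
    """Standard deviation of y = sum(g_i * p_i), given parameter standard
    deviations `sigmas` and their correlation matrix `corr`."""
    n = len(grads)
    variance = sum(grads[i] * grads[j] * sigmas[i] * sigmas[j] * corr[i][j]
                   for i in range(n) for j in range(n))
    return sqrt(variance)

def worst_case_spread(grads, sigmas):
    """Spread obtained by pushing every parameter to its extreme at once."""
    return sum(abs(g) * s for g, s in zip(grads, sigmas))

# Hypothetical fitted parameters that are strongly negatively correlated,
# as often happens when two constants are fitted to the same curve.
grads = [1.0, 1.0]
sigmas = [1.0, 1.0]
correlated = [[1.0, -0.9], [-0.9, 1.0]]
independent = [[1.0, 0.0], [0.0, 1.0]]

print("with covariance:   ", round(linear_spread(grads, sigmas, correlated), 3))
print("ignoring covariance:", round(linear_spread(grads, sigmas, independent), 3))
print("extreme-value sum:  ", worst_case_spread(grads, sigmas))
```

In this example the extreme-value combination overstates the spread several times over, which is the sense in which such predictions become unduly conservative.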
The same situation arises when single parameters are used for a number of components of a mechanism which is analysed as a whole. Summing the extreme values is unrealistic, the combinatorial problems become immense and automatic constraint satisfaction is the only approach with long-term prospects of success.
It is rare for supplied data to be immediately suitable for numerical analysis of a component. It is almost inevitable that the analyst will have to perform some data evaluation before analysis can proceed and this is currently an important area of expertise that the analyst has to acquire. Unfortunately the methods of these evaluations are poorly standardized so quality can be compromised at this last hurdle that materials information has to pass before it is finally used [Nis90].
New materials
New materials where the characterization, processing and testing are under development and interwoven present difficulties which are only resolvable by considerable effort and the passage of sufficient time for new materials to be produced with consistent (if not understood) properties and characterization. Even established materials actually change slowly with time but 'advanced' materials change faster than the organizational systems for data evaluation can currently cope with. Future software assistance for (at least) the clerical tasks necessary for data evaluation will make the response time better, but it will always lag somewhere behind new material development.
Automated consistency-checking
The use of automated consistency-checking software is not widespread and it is not mentioned in current guide-lines [CEC86]. This is partly because this is an area where database maintainers see critical commercial advantages. Checking for typographic errors in numerical data is inordinately time-consuming and expensive.
Two types of rule are used: simple bounding values and physics-based interdependences. The first type is typified by checking assertions of the form 'all density values are positive and between 10 and 23000 kg/m^3', or 'all Young's moduli are between 1 MPa and 1000 GPa'.
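A bounding-value checker of the first type might look like the following sketch. The limits are taken from the two example assertions above; the property names, record layout and function name are assumptions made for this illustration.

```python
# Limits from the example assertions in the text, expressed in SI units.
BOUNDS = {
    "density": (10.0, 23000.0),         # kg/m^3
    "youngs_modulus": (1.0e6, 1.0e12),  # Pa, i.e. 1 MPa to 1000 GPa
}

def check_bounds(record):
    """Return a message for every property value that lies outside its bounds.
    Properties without a rule are ignored."""
    problems = []
    for prop, value in record.items():
        if prop not in BOUNDS:
            continue
        low, high = BOUNDS[prop]
        if not low <= value <= high:
            problems.append(f"{prop} = {value} is outside [{low}, {high}]")
    return problems

# A hypothetical steel record, and the same record with the density
# mistakenly entered in g/cm^3 instead of kg/m^3.
good = {"density": 7850.0, "youngs_modulus": 2.1e11}
bad = {"density": 7.85, "youngs_modulus": 2.1e11}
print(check_bounds(good))
print(check_bounds(bad))
```

Checks of this kind catch unit mix-ups and gross typographic errors, but not a value that is wrong yet still plausible.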
The second variety are based on interdependences between different properties, such as the correlations between melting point and elastic modulus which arise because both depend on the strength of atomic bonds. This latter type can be highly specific if account is taken of the classification of a material by its 'isomechanical group' [Fro82, Ash89b]. There are many of these rules with numerous interactions which can, if not used carefully, lead to circular reasoning, so devising useful automatic checkers is not straightforward [Sar87]. Such tools are particularly useful where users can add their own data to a database which is then used as input to a modelling program, since inconsistent input can otherwise cause the software to crash.
When materials property data is exchanged or sold, this can only be achieved if there is agreement on the value of the data. The quality of the information is an essential factor in calculating the value and equally there has to be agreement on the process which assesses the quality [Bam89].
Precision and range
There is a technical problem with transferring information about precision between databases, just part of the wider problem of transferring information describing the data itself (i.e. transferring 'metadata' [Sar89c, Sar90c]). Range is a related problem which defines the largest and smallest numbers, both positive and negative, that can be communicated.
Several materials properties are conventionally measured in units which give rise to very large or very small numbers and some have a large spread between materials whatever units they are measured in. Viscosity can vary by a factor of 10^16 for the same material over a narrow temperature range and the electrical conductivities of porcelain and copper differ by an even greater factor. Range problems can be exacerbated if data is transformed into a different set of units for user convenience.
One practical aid to most range problems is to employ a convention where certain properties are stored as base-10 logarithms of their true value. This works well for properties which can only take positive values, such as conductivity, but negative property values require an extra variable to record their sign, in addition to the log. of the absolute value.
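The logarithm-plus-sign convention can be sketched as a pair of functions. The names `encode` and `decode` are assumptions of this example, and the treatment of zero is left open because the text does not specify one.

```python
from math import log10

def encode(value):
    """Store a non-zero value as (sign, base-10 log of its magnitude)."""
    if value == 0:
        raise ValueError("zero has no logarithm and needs a separate convention")
    sign = 1 if value > 0 else -1
    return sign, log10(abs(value))

def decode(sign, log_magnitude):
    """Recover the original value from its stored sign and logarithm."""
    return sign * 10.0 ** log_magnitude

# A large positive value (conductivity of copper, roughly 5.96e7 S/m) and a
# small negative value (a hypothetical negative expansion coefficient).
for original in (5.96e7, -2.5e-6):
    sign, log_mag = encode(original)
    print(original, "->", (sign, round(log_mag, 4)), "->", decode(sign, log_mag))
```

Both stored logarithms are numbers of modest size, so they fit comfortably in formats with a limited range.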
A data interchange format is not necessarily limited in the precision it can convey but in practice many of them are unintentionally so limited [Sar89a, Sar89b, Sar90a]. Commercial database systems (all versions of dBase™, for example) often only support in their data files a 10-byte Binary Coded Decimal (BCD) format which gives 18 digit accuracy for positive and negative numbers between 1.0 x 10^-63 and 1.0 x 10^63.
A common problem is that many database systems save storage space by storing numbers as integer values. These integers nearly always have an implicit range from -32768 to 32767. Integers can be quite adequate to represent yield stresses of materials if units of MPa are used but not if psi are required.
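The MPa versus psi problem is easy to demonstrate. The sketch below uses Python's standard `struct` module to test whether a value can be packed as a signed 16-bit integer; the 350 MPa yield stress is an invented example and the conversion factor is rounded.

```python
import struct

PSI_PER_MPA = 145.038  # rounded conversion factor

def fits_int16(value):
    """True if the rounded value can be stored as a signed 16-bit integer."""
    try:
        struct.pack("<h", round(value))
        return True
    except struct.error:
        return False

yield_mpa = 350.0                     # hypothetical yield stress
yield_psi = yield_mpa * PSI_PER_MPA   # about 50 763 psi

print("MPa fits:", fits_int16(yield_mpa))
print("psi fits:", fits_int16(yield_psi))
```

The same material property is therefore storable or not depending only on the unit system chosen.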
In a database or an exchange format it is not sufficient to specify only the precision of individual measurements or the precision of sets of measurements; it is also necessary to specify the complete range that measurements of a particular type are likely to have, and the precision that is required to represent significant differences between materials. Anything less sophisticated compromises the quality of all materials information systems which may use that database or interchange format as a component.
Concerted attempts to improve the state of materials information systems began in the early 1980s. Since then more than a dozen international workshops and symposia have met and published their findings, and several other institutions have issued reports. Two of the major organizations active in materials information systems have activities in data quality. ASTM Committee E-49 has a subcommittee E-49.05 specifically targeted at data quality, and BSI Panel AMT/4-6 (see Glossary) has a sub-project on quality standards for materials data.
Fairfield Glade and Schluchsee
In the report of the NBS-organized meeting at Fairfield Glade [Wes82] it was (implicitly) accepted that there were differences between computerized and paper-based systems in that computer systems tended to give the user only precisely what was asked for. Therefore it was considered important to the quality of the information supplied to the user that the computer system should present the data evaluation methods used, and requirements and warnings on the use of a material, even when this information was not requested.
The CODATA workshop at Schluchsee in 1985 marked a stage where the statement of the problems of data quality had become largely complete and where practical approaches could be suggested. The discussions of greatest relevance to data quality were possibly those which elucidated the 'data chain', the sequential relationships and distinctions between raw, validated, evaluated and standard data. Engineering materials property data flows through several processes and can be characterized as being of one of seven types [Wes86, Kau87, Krö87]: raw data, validated data, evaluated data, standard (certified) data, catalogue data, aggregated data, and application data.
These categories are poorly distinguished and for many applications are probably not entirely distinct: for example, the process of evaluation usually involves some aggregation. They do not form a simple linear sequence from 'bad' to 'good' as Figure 6.1 demonstrates.
Validation is usually taken to mean the preliminary processing of raw data to remove bad data caused by machine failure or calibration errors and to ensure that the test met the conditions specified. The process of evaluation (data reduction or data analysis), is taken to mean the critical assessment of raw data by expert judgement, statistical analysis and comparison with the predictions of physically-based models and theories [Kau89]. It is considered possible that judicious evaluation can convert incomplete and inconsistent data into useful information [PRI89].
The Versailles Project on Advanced Materials and Standards (VAMAS) has a number of activities in the area of materials information [Rey87], and has recently completed a round-robin study of data evaluation methods in which sets of raw experimental creep and fatigue data were distributed to test laboratories for evaluation using several behaviour models and empirical curve fits [Nis90].
Figure 6.1 Types of materials information (after [Krö87])
The exercise demonstrated the variability of the materials parameters returned and showed the need for more detailed guide-lines and standards for data evaluation, even for laboratories with current quality accreditation and even where the measurements are made according to international standards. Parameters derived from curve fits were generally shown to have unreliable quality.
The process of evaluation is expensive and slow because it relies on a manual process performed by rare experts [Rum90]. The difficulty of consistently evaluating a broad range of data types implies that databases should be narrowly focused and that a wide range of information be made available only by networking such databases [Krö87, Sar90d].
Standard, certified data has been through a formal process whereby a group of experts evaluate it with respect to certain specific uses. Catalogue data is that produced by companies as an aid to sales and customer support. Aggregation may be performed to create a useful database of a number of materials. Application data is re-evaluated for specialized applications.
The real flow of data is much more complex than that shown in the figure, but it is still sufficient to make it clear that a simple linear sequence of quality codes is inadequate to represent important quality differences.
Proper quality assurance is indivisible from the establishment and maintenance of secure audit trails, and bibliographic information should therefore be available even in ostensibly purely numerical databases [Gra86].
The question of data integrity is also dependent on the security of underlying operating systems, database management systems and telecommunications systems, but that is of concern (in commercial databases) mostly for military security or commercial confidence.
MPD Network
The Material Properties Data Network is an organization which provides access to materials data whose quality as datasets is assured by the data supplier organizations and not by MPD Network [Kau89]. (This is termed 'grandfathered' data.) The following databases are currently available and many more will be added.
| ALFRAC | The Aluminum Association Inc.'s fracture toughness database. |
| STEELTUF | Impact energy and tensile properties data for pressure vessel steels. |
| MARTUF | Fracture toughness and welding of steels database. |
| MIL-HDBK-5 | Design property specifications for metallic materials for aerospace vehicles. |
| AAASD | The Aluminum Association Inc.'s handbook Aluminum Standards and Data. |
| PLASPEC | Bill Communications Inc.'s polymer manufacturers' data on properties, applications and prices. |
| IPS | Information Handling Services Inc.'s compendium of polymer manufacturers' data. |
| METALS DATAFILE | The Institute of Metals (UK) and American Society of Metals' numeric database compiled from bibliographic sources. |
The quality of MPD Network's databases, and the quality of the data within them, display most of the aspects inherent in materials information quality in a condensed form.
ALFRAC, STEELTUF and MARTUF are collections of experimental data, evaluated by the organizations supplying the information. MIL-HDBK-5 and AAASD are handbook data compiled and edited by an organization different from that which performed the testing and initial evaluation. PLASPEC and IPS are compendiums of manufacturers' data so the reliability of different parts of those datasets will be different depending on the manufacturer. METALS DATAFILE contains data of every kind of quality since it is extracted from the entire technical literature and the original bibliographic reference must be checked for each datum.
CAD/CAM and polymer properties
Polymer design and processing information is arguably more integrated than is the case for metals and there has been serious effort devoted to evaluating the quality of design data and to specifying quality requirements [PRI89]. This could be due to a number of factors: the unfamiliarity of polymers to most engineers and their consequent reliance on computerized systems, the availability of good processing models for injection moulding, or the perceived reduced diversity in polymer information quality compared with that for metals.
Material identification and test conditions are more important for polymers, where for example the degree of orientation has a greater effect on properties than does texture in metals. Particular attention must be paid to recording the thermal and rheological history of test specimens [Dea90].
The CAMPUS consortium of manufacturers led by four German companies has established a standard set of polymer tests to aid comparison of different grades of polymers both within a company's product range and between companies. The problem was that well over 5000 grades of polymer were available, and data was available using hundreds of different test methods (more than 35 variants for impact toughness alone [PRI89]). This consortium now includes nearly all the major polymer manufacturers including those in the USA. In the first stage the standards were all for single-point data and were provided, on a floppy disc using an agreed common database system, free by each company as part of their marketing efforts.
Currently there is some disagreement within ISO between BSI (which has an existing standard BS 7008) and the latest CAMPUS proposals over developments of standard sets of multi-point data (where there is dependence on independent variables such as temperature or humidity).
Three levels of data quality were suggested in the mid-1980s: highly qualified data, qualified data, and limited use data [Wes89, Gra86]. The lowest of these represents the minimum requirements before data is permitted in a system, and the highest is a stringent list of requirements from 'the stand-point of maximum scientific and engineering integrity'. Each level includes all the requirements of its predecessor and thus the levels form a linear, sequential scale. Such scales were shown to be inadequate earlier in this chapter. The Dowty-Rotol company has suggested a longer list of levels (see Table 6.1).
Table 6.1 Dowty quality codes
| A | '95% probability that 99% of measurements exceed this value' |
| B | '95% probability that 90% of measurements exceed this value' |
| C | ESDU category bracketed values |
| D | Company design data |
| E | Interpolated data from categories A, B, C and D |
| F | Factored (evaluated) data from above sets using approved techniques |
| G | ESDU 'Grey' category |
| H | Handout from material supplier |
| I | Extrapolated data from categories A-H |
| K | Carried over from a 'similar' material |
| L | Preliminary data on new materials |
| M | Made up, guess-work |
| S | Specification data. |
Table 6.2 Rolls Royce quality codes
| 10 | Perfect data | This is accurate by definition, e.g. physical constants or unit conversion factors. |
| 9 | Best Practical | The best practical quality based directly on the best possible measurements. |
| 8 | Safety Critical | Where human life is at stake, this data would be sufficient without further prototype tests to overcheck the results. |
| 7 | Critical | Sufficient without further overcheck, unless human life is at risk. |
| 6 | Major Risk | Overchecks will be incorporated to back up the use of this data. |
| 5 | General Engineering | From a reputable source where analysis techniques are audited. |
| 4 | Reputable Source | From a reputable source, but analysis techniques are either unknown or dated. |
| 3 | Incomplete | From a reputable source, but incomplete. |
| 2 | General Knowledge | Based on general knowledge from the class of materials. |
| 1 | Estimate | Order of magnitude or source unknown. |
| 0 | Unknown | No guarantee, e.g. extrapolated beyond limits of theory. |
Dowty is an aerospace supplier and therefore it uses several datasets and design guide-lines produced by ESDU, a commercial organization which fulfils a role in the UK similar to, but in a much smaller way than, the USA Department of Defense's military handbooks office. The A, B and S grades are from ESDU and correspond exactly to MIL-HDBK grades [New87].
Figure 6.2 Rolls Royce quality and status
Note that the list in Table 6.1 again takes the simple view that data quality can be measured on a single linear scale, with the addition of a single special category for specification data. This absolute difference between design 'specification-type' data and 'quality assurance type' data, where the latter is not derived from experimental measurements, has been noted repeatedly [Swi89].
Rolls Royce proposal
Aero-engine manufacturers have to design safety-critical systems and are also subject to independent audit control by the aviation authorities. All materials use has to be recorded and traceable for decades, even for obsolete materials.
Bamkin and Butler have shown that material property data quality cannot in practice be separated from the 'status' of the material within such organizations as both contribute to the 'value' of the information, where the quality is interpreted as the 'level of guarantee' of the accuracy [Bam89]. This relationship between quality and status is shown in Figures 6.2 and 6.3.
In design it is also often much more important that data is consistent rather than accurate when a company uses it for many different analyses or for different parts of a single product. This means that control of an 'agreed opinion' is sometimes what is required, rather than high quality.
Table 6.2 is presented as a linear list, but the higher codes are not super-sets of the lower codes and so they are not strict quality 'levels'. There are distinctions which could be drawn, for instance a slight degree of incompleteness might degrade a data set from Code 5 to Code 3, whereas it might still be better than Code 4 for all practical purposes.
Material categories
The overlapping classifications of quality and status produce a list of standard material categories shown in Table 6.3. Note that although this list is a linearly numbered sequence, the meaning of the categories is more complex and is represented by Figure 6.3. Although theoretically all possible combinations of quality and status could be considered to give rise to a material category, in practice certain categories make more sense than others.
Information class
In addition to quality and status, Bamkin and Butler identified that different items of information will have a different 'class' which reflects how widely that information is known, and whether knowledge of that particular information confers particular technical or commercial advantages on the organization [Bam89].
Table 6.3 Rolls Royce materials categories
| 10 | Selector | Preferred materials for current designs. |
| 20 | Present Selector Only | Currently preferred materials, but pending replacement when new materials become available. |
| 30 | Lower Risk Approved | Currently usable materials, but not cleared for high risk designs. |
| 40 | Proposed Selector | New materials expected to become available at a designated future date. |
| 50 | Not Selector | Materials which have been used for high risk designs, and may be used again only if a strong case can be made. |
| 60 | Lower Risk Not Approved | Materials which have been used for low risk designs, and may be used again only if a strong case can be made. |
| 70 | Not in Use by Company | Available materials which could be considered for approval. |
| 80 | Not Usable, Hazardous | Not usable because of health hazard or company policy. |
| 90 | Not in Use | Archive data on obsolete materials. |
Table 6.4 Materials' value class
| 1 | Strategic | Central to the company's business and not to be traded. |
| 2 | Original Research | Valuable information which could be traded or exchanged. |
| 3 | Public Domain | Market value only because of a particular method of presentation and not because of intrinsic value. |
The additional classifications in Table 6.4 are only useful if the effect of data quality on the value of data in an information market is being considered.
Figure 6.3 Rolls Royce data categories
Most of the efforts of ASTM Committee E-49 for the computerization of materials property data could be described as aimed towards increasing the quality of materials data and databases. Increasing the detail of both materials identification and test reporting is particularly important to increasing data quality, as a decade of study has shown [Kau89].
The stage is set by ASTM standard E1314 which defines a 'Standard Practice for Structuring Terminological Records Relating to Computerized Test Reporting and Designation Formats'. This is followed by the 'Standard Guide for the Development of Standard Data Records for Computerization of Material Property Data' (E1313) for describing test measurements, and there is planned a parallel 'Standard Generic Guideline for the Designation of Materials' for identifying the material tested.
Each of the general guide-lines is followed by standard guide-lines for successively more specific materials and tests [Kau89]. Of particular importance is the mandatory reporting of test parameters which determine whether the test was adequate, for example whether fatigue failure occurred in the gauge length of a specimen or in the grips, or whether the measured thickness/length ratio of a specimen was within the valid range for a plane-strain fracture toughness determination.
It is widely agreed that if a completely standard material or a completely standard test is not appropriate or available, then an audit trail of information relating to the history of the material and the description of the test (the complete 'metadata') is required. It is also appreciated that this should not require every individual data item to be accompanied by vast quantities of metadata. This does, however, have a purely technical solution. Careful data modelling and the judicious use of data normalization and default value assignments are capable of removing redundant information completely. The difficulty is really one of designing a user-interface so that this information is retrieved when necessary but does not otherwise intrude [Lau90].
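The idea of default value assignment can be sketched in a few lines: metadata common to a whole dataset is stored once, and each record carries only what differs. The field names and values below are hypothetical, and `ChainMap` from the Python standard library is just one convenient way to express the lookup.

```python
from collections import ChainMap

# Metadata shared by every record in a hypothetical dataset, stored once.
dataset_defaults = {
    "test_standard": "EXAMPLE-STD-1",   # placeholder, not a real standard
    "temperature_C": 23,
    "specimen_type": "standard bar",
}

# Individual records carry only their values and any departures from defaults.
records = [
    {"strength_MPa": 61.0},
    {"strength_MPa": 48.5, "temperature_C": 80},
]

def full_metadata(record, defaults):
    """Resolve a record against the dataset defaults; record entries win."""
    return dict(ChainMap(record, defaults))

for record in records:
    print(full_metadata(record, dataset_defaults))
```

The complete audit trail is recoverable for every datum on request, yet nothing redundant is stored or shown unless the user asks for it.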
ASTM subcommittee E-49.05 has been working on producing standards aimed directly at defining and recording data and database quality explicitly. Here a proposal for explicit quality indicators is described. Work on defining quality indices for system management is described below.
The principle is that no explicit data quality level is allocated (unlike the Dowty and Rolls Royce proposals) but that sufficient information is given in a standard form for users to assess themselves what features are significant in determining whether the data will satisfy the user's needs. This is an important difference as in theory it means that this approach is more general than Rolls Royce's categorization scheme.
Any group of like-minded organizations could construct their own specific scheme based on the same universal basic indicators that all other industries use for their schemes. Thus individual sets of data could be transferred between quite different industries, retaining their basic indicators, but taking on the appropriate categorization in their new commercial environment.
Table 6.5 shows indicators for individual data values or sets and presents early ideas, not a finished standard [Kau87].
Table 6.5 Indicators for data
| 1 | Data Source | (unknown, primary producer, intermediate producer, government laboratory, technical committee, bibliographic source etc.) |
| 2 | Data Class | (specification minimum value, typical value, single value, statistically processed value, design data etc.) |
| 3 | Material Status | (production, research, obsolete) |
| 4 | Evaluating authority | (if evaluated) |
| 5 | Evaluating method | (if evaluated) |
| 6 | Certification authority | (if certified) |
Note that the concept of status is similar to that of Rolls Royce, but not as detailed, whereas the meaning of data class is quite different and corresponds to a more detailed level of definition. ASTM is also considering similar indicators for data collections (databases).
Table 6.6 Indicators for databases
| 1 | Material Identification Completeness | (little or none, form and condition only, full processing history) |
| 2 | Test Procedure Documentation | (not documented, non-standard test, standard test) |
| 3 | Currency | (not updated, updated infrequently, updated regularly) |
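The idea that industry-specific categorizations can be layered on top of universal indicators can be sketched as follows. The record types mirror Tables 6.5 and 6.6, but the two categorization functions, their category labels and their rules are entirely hypothetical; they do not represent the Rolls Royce, Dowty or ASTM schemes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataIndicators:
    """Indicators for a data value or set, after Table 6.5."""
    data_source: str
    data_class: str
    material_status: str
    evaluating_authority: Optional[str] = None
    evaluating_method: Optional[str] = None
    certification_authority: Optional[str] = None


@dataclass(frozen=True)
class DatabaseIndicators:
    """Indicators for a data collection, after Table 6.6."""
    material_identification: str
    test_procedure_documentation: str
    currency: str


def aerospace_category(ind: DataIndicators) -> str:
    # Hypothetical scheme: certified design data ranks highest.
    if ind.certification_authority and ind.data_class == "design data":
        return "A"
    if ind.evaluating_authority:
        return "B"
    return "C"


def consumer_goods_category(ind: DataIndicators) -> str:
    # Hypothetical scheme: only the material status matters.
    return "reject" if ind.material_status == "obsolete" else "accept"


record = DataIndicators(
    data_source="government laboratory",
    data_class="typical value",
    material_status="production",
    evaluating_authority="example evaluation centre",
    evaluating_method="peer review",
)
```

The same record keeps its basic indicators unchanged when it moves between industries; only the function applied to it differs.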
A practical study of how materials engineers used databases indicated a great number of quality failures in materials database systems and user interfaces, irrespective of the quality of the data itself [Amm89].
A truly high-quality information system should make it difficult for improper use to be made of the data it contains [PRI89]. User expectations of the sophistication, flexibility and ease of use of materials databases have become more concrete over the last decade as examples of well-designed software have become familiar from other fields (such as modern word-processing). These matters have long been of concern and their resolution will not be quick or easy.
Materials database management
As part of the European Community's five-year Materials Demonstrator Programme, a Code of Practice for the participating databases became necessary. This has since been published as a CODATA guide and is currently being approved as ASTM standard E1407 [CEC86, Krö87, Swi89]. It covers the areas of operations management, data and data management, system capabilities and data security.
These guide-lines do not lay down specific procedures, but require that the database managers do have a 'regular set of procedures' to ensure consistent quality of operations and that the staff are appropriately qualified.
The guide-lines implicitly support the principle that data quality can only be assured if audit trails are established and if procedures are in place to ensure that they are used and that they are secure and tamper-proof. (Large volumes of retired audit data do not present any problem in terms of media cost; magnetic tape is cheap and ICI's new optical paper is even cheaper: the cost is more in the time taken to do archiving. Also the cost of a tape drive or WORM disc drive is not insignificant for personal computer systems.)
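One common way of making an audit trail resistant to undetected alteration is to chain the entries together with cryptographic hashes. The sketch below is an illustration of that general technique only; it is not a procedure prescribed by the guide-lines, and the entry fields are invented. Strictly it makes the log tamper-evident rather than tamper-proof, since anyone able to rewrite the whole log could recompute every hash, so the latest hash would also need to be held somewhere secure.

```python
import hashlib
import json


def _digest(entry: dict, previous_hash: str) -> str:
    # Serialize with sorted keys so the same entry always hashes identically.
    payload = json.dumps(entry, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    """Add an entry whose hash also covers the hash of its predecessor."""
    previous_hash = log[-1]["hash"] if log else ""
    log.append({"entry": entry, "hash": _digest(entry, previous_hash)})


def verify(log: list) -> bool:
    """Recompute the chain; any altered or removed entry breaks it."""
    previous_hash = ""
    for record in log:
        if record["hash"] != _digest(record["entry"], previous_hash):
            return False
        previous_hash = record["hash"]
    return True


audit_log = []
append_entry(audit_log, {"action": "insert", "item": "K-001", "user": "analyst1"})
append_entry(audit_log, {"action": "revise", "item": "K-001", "user": "analyst2"})
```

Calling `verify(audit_log)` after archiving, and again after retrieval, gives a simple check that the retired audit data have not been changed in storage.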
At the final workshop of the CEC demonstrator programme a number of suggestions were made towards the content of a further code of practice. These included recommendations that in addition to users and materials data re-sellers, database technologists and user interface interaction technologists should be requested to review and expand the scope of the code of practice and to produce a new code that could be upgraded in a systematic manner [Swi90].
Software quality
There is a vast technical literature on the difficulties of specifying, implementing and delivering software that is fit for use [Bro75, Bro87, Som89, LwO90, Schl90, Win90]. There is another vast commercial literature concerning specifying and implementing database-centred systems in organizations [Avi88, Dat90, Enc90]. The software industry unfortunately suffers from a great spread in standards of 'best practice', even within the same industry segment, and well-established and proven techniques such as code inspection are still not accepted by many programmers. This serious problem has been addressed by direct action by the UK government, which now requires all software that it procures to be developed according to the Structured Systems Analysis and Design Method (SSADM), the latest version of which was developed according to ISO 9000 guidelines.
There are few ISO standards in this area, but a great number of American IEEE, ASTM, DoD and ANSI standards relate to assuring the quality of software, its accurate specification and documentation both for use and for maintenance. Recent lists are given by [Gla89] and [Sar91a].
In the UK the British Computer Society and Institution of Electrical and Electronic Engineers have a joint working party studying the use of software engineering tools and their relationship to achieving general engineering quality standards as embodied in BS 5750 (ISO 9000 and EN 29000, 'Quality Systems') [She90].
In the long run, the best hope is that model procedures for database systems, calculation and analysis software, and user-interface design will be produced by the software engineering community to aid conformance to ISO 9000, and that further, more specific guide-lines will then be provided for materials database providers by organizations such as CODATA, BSI or ASTM.
The technical problems associated with producing quality materials property information systems are now largely understood and it is possible to begin to estimate the time and cost required to assure that new systems have some specific quality characteristics.
Quality in materials information systems can be assured in regard to several distinct areas, and these are not independent. The quality of an individual data item depends on the quality of the collection of which it is a part, and the quality of a data collection depends on the quality of the individual items and on the organization which collected it. Individual data items have several different types of quality depending on whether the item is a standard, a specification or a measurement.
Work must proceed in producing guide-lines and standards in each of these areas, but each development must also be cognisant of the important interrelationships.
Materials information is required for a large variety of uses and these varieties of use require different kinds of data quality. It is to be hoped that a universal set of basic quality indicators can be agreed that can be variously assembled to form a number of industry-specific data quality frameworks within each of which the processes of evaluation and material categorization could be appropriately defined.
The most significant problem is the lack of appreciation by designing organizations of the true value and cost of high quality materials property information [Rum87]. This leads to a reluctance to pay realistic prices and the result that the burden is placed on public bodies, which are then given inadequate resources (and incentive) to produce quality systems.
The full benefit of computer-aided engineering (CAD/CAM) techniques relies on the availability of high-quality and appropriate materials processing and property data. Making this known, together with the true worth of the information, must be a high priority for governmental bodies and design education.