Chapter 7
Knowledge-based systems
Introduction
Given the complexity and difficulty of representing and accessing materials property information, as discussed in earlier chapters, an obvious conclusion would be that knowledge-based systems could provide solutions. Such a conclusion would be premature.
There have been many attempts at constructing materials knowledge-based systems, a few of which have found successful use. The reasons for success and failure can be found by examining the precise type of materials information which the systems have attempted to encapsulate. Exactly the same issues as affect databases apply: issues of data quality, of stable versus capricious properties, of the predictability of property software models, of deep physical models versus superficial summary models, of numeric versus symbolic information, of materials designation via applicable properties and of the balance between material state and material process.
As we have seen throughout this book, non-trivial use of materials information requires sophisticated, clear thinking; simple procedures are only sometimes useful.
Useful systems
In practice most useful knowledge-based systems consist mostly of conventional software; typically the knowledge and knowledge manipulation system comprise only a tenth of the system. The user-interface, 'help' and tutorial sub-systems, file handling, graphics and database (if present) dominate the designing and programming effort. Conversely, software techniques originally developed for artificial intelligence projects, such as abstraction, inheritance, delegation, tree-navigation etc., are now routinely used in many entirely conventional software products such as database management and version control systems.
Thus distinguishing between knowledge-based and data-based materials information systems is often both difficult and pointless.
Maintenance and explanation
Two myths concerning rule-based expert systems were common in the mid-1980s: that they can explain their reasoning by listing the relevant rules, and that the knowledge they contain is easily extensible, since more rules can be added without requiring existing ones to be modified [Sar85].
Both myths are false for anything other than trivial problems, and for the same reason. Nearly all areas of knowledge of the physical world require that many rules represent both factual knowledge and control information: how next to proceed with the solution to the users' problems. These control aspects mean that a simple list of the rules that have been followed to reach a conclusion is only understandable by the original programmer, not by the casual user. It also means that rules are interwoven in an implicit causal network and that new rules can be added only when the causality relationships are also updated.
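The interweaving of factual and control rules can be sketched with a toy forward-chaining engine. Everything here is an assumption made for illustration: the rule names, the "phase:" convention for control facts and the corrosion facts themselves are invented, and no particular expert system shell is being reproduced.

```python
# Hypothetical rule base, invented for illustration only.
# Each rule is (name, premises, conclusion). Facts beginning "phase:" are
# control information: they say what to do next, not anything about corrosion.
RULES = [
    ("R1", {"start"}, "phase:identify-environment"),
    ("R2", {"phase:identify-environment", "chloride present"}, "environment:marine"),
    ("R3", {"environment:marine"}, "phase:check-alloy"),
    ("R4", {"phase:check-alloy", "alloy:304 stainless"}, "risk:pitting"),
]


def forward_chain(facts, rules):
    """Fire rules until nothing new can be derived; return (facts, trace)."""
    facts = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace.append(name)
                changed = True
    return facts, trace


given = {"start", "chloride present", "alloy:304 stainless", "coating:none"}
facts, trace = forward_chain(given, RULES)
print(trace)  # ['R1', 'R2', 'R3', 'R4'] is the "explanation" offered to the user

# A new rule that is factually reasonable but waits on a phase nobody sets.
R5 = ("R5", {"phase:check-coating", "coating:none"}, "risk:general corrosion")
facts_extended, trace_extended = forward_chain(given, RULES + [R5])
print("R5" in trace_extended)  # False: the new rule is silently inert
```

Two of the four rules in the trace exist only to sequence the work, so the listing means little to anyone who does not know the phase convention. The added rule R5 never fires until some existing rule is changed to conclude "phase:check-coating", which is the maintenance problem in miniature.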
Control information is especially important when the portion of the real world modelled by the knowledge-base is a process rather than a system of static states. For this reason rule-based control systems and formulations of simulations that evolve with time are notoriously complex.
Uncertainty
One area where knowledge-based systems can claim to offer unique capabilities is in their handling of uncertainty: probability measures using Bayesian updating, certainty factors, fuzzy sets, and evidence rules have all been presented as potentially useful techniques. When examined more closely, however, there is no reason to suppose that any of these techniques could not also be provided in sufficiently modular conventional software. Certainty factors are probably the easiest to use from the knowledge-engineer's point of view, but not everyone will have the same opinion of what values are appropriate for each fact or rule. In this they strongly resemble the weights in multi-attribute value functions, which in a sense is what they are.
Strict probability is of course already handled perfectly well by statistical software. The other techniques are not only highly approximate, mostly providing methods for stating degrees of ignorance about probabilities, but are also not supported by sound mathematical foundations. Systems built using them can produce uncertainty estimates which do not logically follow from the premises used as input. (The Dempster-Shafer evidence rules are perhaps an exception since they treat uncertainty through the use of upper and lower bounds on strict probabilities [Wu91, Hop91].)
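The idea of bounding a probability from below and above can be sketched as follows. This is a minimal illustration of the standard belief and plausibility functions of evidence theory, not code from any system cited here; the corrosion hypotheses and the mass values are invented.

```python
# Frame of discernment: the mutually exclusive hypotheses under consideration.
FRAME = frozenset({"pitting", "crevice", "uniform"})

# Hypothetical basic mass assignment (invented numbers; masses sum to 1).
# Mass on a set means "the evidence supports this set but cannot split it".
MASS = {
    frozenset({"pitting"}): 0.5,
    frozenset({"pitting", "crevice"}): 0.3,
    FRAME: 0.2,  # mass on the whole frame expresses plain ignorance
}


def belief(hypothesis, mass):
    """Lower bound: total mass of focal sets wholly inside the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(hypothesis, mass):
    """Upper bound: total mass of focal sets that overlap the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


pitting = frozenset({"pitting"})
crevice = frozenset({"crevice"})
print(belief(pitting, MASS), plausibility(pitting, MASS))
print(belief(crevice, MASS), plausibility(crevice, MASS))
```

With these assumed masses the probability of pitting is bounded between 0.5 and 1.0, and that of crevice corrosion between 0 and 0.5. The width of each interval is an explicit statement of how much is not known.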
The network of uncertainty relationships between rules is similar to that of control in its implications for maintenance: adding new rules frequently upsets uncertainty estimates. This is because systems without a sound mathematical basis can produce different answers when a single situation is broken up into rules in two different ways. This is irrational.
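A small calculation shows how the answer can depend on the way rules are divided. The sketch assumes the MYCIN-style scheme for positive certainty factors (minimum over a conjunction, and cf1 + cf2(1 - cf1) when two rules support the same conclusion); the rule strength of 0.6 is an arbitrary illustrative value.

```python
def rule_cf(rule_strength, premise_cfs):
    """A conjunctive premise takes the minimum CF; the rule then scales it."""
    return rule_strength * min(premise_cfs)


def combine_cf(cf1, cf2):
    """Combine two positive certainty factors supporting the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)


# Both observations are fully certain in either formulation.
a, b = 1.0, 1.0

# Formulation A: one rule, "IF a AND b THEN h", strength 0.6.
cf_single = rule_cf(0.6, [a, b])

# Formulation B: the same knowledge written as two rules,
# "IF a THEN h" and "IF b THEN h", each given strength 0.6.
cf_split = combine_cf(rule_cf(0.6, [a]), rule_cf(0.6, [b]))

print(round(cf_single, 2), round(cf_split, 2))  # 0.6 0.84
```

The same two observations yield 0.6 in one formulation and 0.84 in the other. A knowledge engineer could retune the strengths of the split rules to compensate, but nothing in the scheme requires it, so every added or divided rule risks shifting estimates elsewhere.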
Models: types of expert system
All materials information systems model some aspects of the real world. Even simple databases represent materials and properties as being exemplars of particular groups of specimens and test measurements, and thus implicitly classify the real world. Other equally simple databases may use not just different classifications, but different classifying principles, e.g. by material composition or by usual method of manufacture.
Knowledge-based systems are perhaps best typified by the type of information they model [Har87]. The most successful systems are those where only a superficial, symbolic model is attempted but where there is a great quantity of information, intricately connected, at that superficial level. Corrosion expert systems are a good example.
A second classic type is the selection system over a closed set of options: selection of adhesives or paints from a fixed set of alternatives for example. These systems are very widely used for selection of anything, not just materials. It is the assumption of a closed world with a finite set of relevant properties that makes it possible to treat materials in the same way as, say, nuts and bolts, but it is also these systems' critical defect. They are really the same as diagnosis systems and can be programmed using a wide variety of techniques, some particularly innovative and fast [Lee89].
The third major type of materials-oriented knowledge-based system is that of process modelling. Where they have worked, these systems have been highly commercially successful. They are used in either diagnosing faults, e.g. in rolling mills and blast furnaces, or advising on the real-time control of (slow moving) industrial processes, such as cement kilns, paper making or glass making. Implicitly at least, these always contain a model of the manufacturing process including materials processing aspects. The model is usually highly superficial, because that is usually all that is required: as discussed earlier, competitive processes proceed at a carefully balanced level of ignorance. Often the expert system is used as a framework in which some modules are detailed numerical models of specific materials operations, such as cooling and viscosity calculations, and where part of the knowledge in the system is how to run such models effectively.
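A closed-world selection system reduces, at its simplest, to filtering a fixed table against requirements. The sketch below is hypothetical throughout: the candidate names, property names and values are invented placeholders rather than data for real adhesives, and it does not reproduce the method of any cited system.

```python
# Hypothetical closed set of candidates with a fixed property schema.
CANDIDATES = {
    "adhesive A": {"max_service_temp_C": 80, "gap_filling": False, "cure_minutes": 5},
    "adhesive B": {"max_service_temp_C": 150, "gap_filling": True, "cure_minutes": 60},
    "adhesive C": {"max_service_temp_C": 120, "gap_filling": True, "cure_minutes": 240},
}
KNOWN_PROPERTIES = {"max_service_temp_C", "gap_filling", "cure_minutes"}


def select(requirements, candidates=CANDIDATES):
    """Return the candidates that pass every requirement.

    requirements maps a property name to a predicate on its value.
    """
    unknown = set(requirements) - KNOWN_PROPERTIES
    if unknown:
        # The closed world has no way to reason about anything else.
        raise ValueError(f"outside the closed world: {sorted(unknown)}")
    return sorted(
        name
        for name, props in candidates.items()
        if all(test(props[prop]) for prop, test in requirements.items())
    )


shortlist = select({
    "max_service_temp_C": lambda t: t >= 100,
    "gap_filling": lambda g: g is True,
})
print(shortlist)  # ['adhesive B', 'adhesive C']
```

The defect is visible in the same few lines: a material absent from the table can never be recommended, and a requirement on a property outside the schema cannot even be posed.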
A last variety is the deep causal model where there are many intermediate levels of knowledge brought in to play between the input information and the conclusions. The depth could also be described as the extent to which the system explicitly represents its several reasoning procedures as being appropriate for different varieties of problem.
The types of knowledge-based system described above are not exclusive; a framework can include modules of all types, invisible to the user, and a process modelling system could be superficial or deep.
Assessing the practical impact of expert systems, even successful ones, is often problematic. Usually an expert system is the first application of any software technique to an organizational problem and most of the benefit ensues from the complete systems analysis that precedes the actual system development. Also, other changes are often made at the same time as the expert system is introduced. For example, the highly successful LINKman cement kiln controller was introduced at the same time as a new sensor system which for the first time enabled an estimate to be made of the kiln's peak temperature. The lesson is that materials engineering problems should be addressed with an open mind as to their solution. If the best approach is hardware or paper-based rather than software, then pushing a knowledge-based approach is bound to fail eventually.
Fragility
Simple expert systems are fragile when dealing with problems at the edges of their expertise for a fundamental reason: such systems often have the 'absent equals false' assumption built into their inference systems. Taking the step to more sophisticated systems and away from 'complete' problems requires moving from normal two-valued logic to three-valued logic (true, false and maybe), and the manipulation of 'maybes' raises the same problems discussed earlier in connection with NULL values in databases.
This fragility with respect to absent knowledge is why simple systems built using commercial expert system shells usually succeed with small, complete problem domains but can fail disastrously when attempts are made to scale them up for real problems. Most materials problems, as we have seen, are not complete in this sense but open-ended.
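The difference between 'absent equals false' and a three-valued treatment can be sketched in a few lines. The connectives below follow the usual strong three-valued (Kleene) tables, with Python's None standing for 'maybe'; the fact names and the suitability rule are invented for illustration.

```python
def not3(a):
    """Three-valued NOT: 'maybe' stays 'maybe'."""
    return None if a is None else (not a)


def and3(a, b):
    """Three-valued AND: a single False decides; otherwise None propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: a single True decides; otherwise None propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


# Hypothetical knowledge base: nothing at all is recorded about seawater.
KNOWN = {"resists_fresh_water": True}


def closed_world(fact):
    return KNOWN.get(fact, False)  # absent equals false


def open_world(fact):
    return KNOWN.get(fact)  # absent equals 'maybe' (None)


# Rule: suitable if it resists fresh water AND does NOT corrode in seawater.
suitable_closed = closed_world("resists_fresh_water") and not closed_world("corrodes_in_seawater")
suitable_open = and3(open_world("resists_fresh_water"), not3(open_world("corrodes_in_seawater")))
print(suitable_closed, suitable_open)  # True None
```

The closed-world version confidently recommends a material about which it knows nothing relevant, while the three-valued version returns 'maybe'. The price is that every downstream rule must now decide what to do with that 'maybe', just as queries must with NULL.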
Tutorial systems
The most interesting systems, and those with the longest life ahead of them, are those that encapsulate significant quantities of basic scientific knowledge since that will not become outdated. Such systems almost inevitably assume a tutorial role in educating their users, almost irrespective of the type of problem they ostensibly solve: diagnosis, control advice or data analysis.
Knowledge acquisition is now recognized as arguably the most serious bottleneck in the development of knowledge-based systems, and this is especially serious for tutorial systems since so much information has to be assimilated. The extensive corrosion system ACHILLES takes an uncommon approach which deserves wider recognition. For each corrosion topic an expert is selected who prepares a written digest in English. This is circulated to a team of assessors for editorial review and comment, and when all are agreed that it represents the required information at the right level of detail, it is encapsulated as rules and control information. The English text is also permanently available in the system for the user to refer to [Bal90].
The use of conventional textbook-type instructional material appears a promising route for expert systems attempting to cover materials information, especially where it is complex, requiring many layers of representation and detail, e.g. the corrosion system PRIME [Jov89]. Text, recast as a hypertext network and coupled to knowledge-based systems to help the user find instruction appropriate to particular problem areas, is not accorded the significance it deserves. Explicitly representing materials information relationships formally using catalogues and data dictionary systems is, as has earlier been demonstrated, time consuming and difficult. Future systems will require such representations but for many advice-giving expert systems it is not immediately necessary and can be avoided.
Often all that is required is to direct the user to the relevant information without having to represent that information explicitly in a formal structure. This is the traditional function of information retrieval from textual (bibliographic) databases. The convergence of semi-automated text indexing and semi-manual knowledge structuring techniques will produce tools appropriate for many materials information problems [Par89].
Big systems
The ALADIN aluminium alloy design assistant illustrates many important aspects of developing systems which represent deep physical reasoning about materials, physics, microstructures and properties. ALADIN, a technical success, was developed at Alcoa Technical Center (Pennsylvania) at a cost of probably over $1 million. It is currently unused.
The project never made the transition from the artificial intelligence department to the alloy design department since the latter could not afford the high maintenance costs associated with large knowledge-based systems. Such knowledge-bases require continual upgrading as new information becomes available, as new alloys are developed and as new knowledge about existing alloys is discovered. Also as any large system is used, unexpected interactions between items of information must be documented or changed. Alloy design is an important, expert task but too few people do it to support such heavy maintenance costs. It is such an expert task, and so few people do it, that a company can generally afford to hire the best people and these experts do not need as much automated support as would less-expert users. (This is not always true for expert systems: the PAL adhesives system sometimes surprises its creators with unexpected but good selections [Lee89].)
ALADIN was created with facts represented in the frame-based language CRL, rules in OPS5 and other functions in LISP because its complexity and depth required specific features not found in packages. This makes it difficult, but not impossible, to extract useful modules for use in other knowledge-based projects. Such things as a list-based representation of the periodic table, small databases of alloys, classification schemes of alloy classes and manufacturing processes etc. are embedded in ALADIN and one of the justifications of the project was that the cost of such knowledge acquisition and representation would not have to be repeated for other projects. Unfortunately since it was not really designed for such re-use and because of intense time pressure on the development programmers, the system was not built in properly distinct extractable modules. The representation schemes used were implicit in the CRL and not formally defined but even so some knowledge was re-used in the CORDIAL system [Har87].
The system probably attempted too much. Several small, separate software tools (some knowledge-based, some merely clerical) for manipulating databases of alloys, for running numerical predictions of specific alloying element interactions, or for managing default parameters based on alloy classification would probably have been adopted and used [Hay89].
A large, monolithic knowledge-based system unfortunately implies an all-or-nothing approach. This lesson of designing for integration and of using multiple software technologies is taken up again in Chapter 8, which discusses the need to integrate materials information tools with other software tools in design and manufacturing software environments.
Combined Knowledge/Databases
Many knowledge-based systems include databases, and some databases have knowledge-based user interfaces [Kai88, Par89, Bey91]. Neither approach represents a true merging of the technologies (see Figure 7.1).
Nearly all large, modern knowledge-based projects in engineering use a frame-based, object-oriented structure to hold unchanging knowledge together with a system of rules and inference/deduction engines which represent how to solve problems dealing with the frame-based knowledge. Some also use frames to store some strategic information representing how to go about applying the rule systems in particular circumstances [Hay89, Pay90].
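The division of labour between frames and rules can be sketched as follows. The Frame class, the slot names and the single rule are hypothetical and written for this illustration; they do not reproduce CRL, OPS5 or any system cited in this chapter.

```python
class Frame:
    """A named bundle of slots that inherits missing slots from its parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        """Look the slot up locally, then climb the is-a chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")


# Unchanging knowledge, held once at the most general level that applies.
alloy = Frame("alloy", state="solid")
aluminium_alloy = Frame("aluminium alloy", parent=alloy, base_element="Al")
series_2xxx = Frame(
    "2xxx series",
    parent=aluminium_alloy,
    principal_addition="Cu",
    heat_treatable=True,
)


def rule_candidate_for_age_hardening(frame):
    """Illustrative rule acting on frame knowledge.

    Note that a missing slot is treated as False here, which is exactly the
    'absent equals false' assumption discussed under Fragility.
    """
    try:
        return frame.get("heat_treatable") is True
    except KeyError:
        return False


print(series_2xxx.get("base_element"))                 # Al, inherited
print(rule_candidate_for_age_hardening(series_2xxx))   # True
print(rule_candidate_for_age_hardening(aluminium_alloy))  # False
```

The frames carry what is known; the rule carries what to do with it. Inheritance keeps each fact in one place, which is also what makes the dependencies between frames hard to store in a conventional flat database.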

Figure 7.1 Sub-databases used by knowledge systems and database intelligent interfaces
A true merging of knowledge and database technologies will occur when databases begin to cope with the rich, complex functional dependencies in frame-based knowledge stores. If these can be represented in standard ways, perhaps using an inheritance-extended version of IRDS, then they can be made independent of specific research projects (see Figure 7.2) and will also be able to be communicated between systems (assuming that appropriate concept relationship structures have previously been built). This will enable true materials information interchange [Bam89, Swi90, Sar90a-c] and only with that as a basis will deep causal models of materials processes become successful, transferable, commercial products of independent value (whether implemented using rule-based programming or numerical computation [Kel86]).
Re-usable concept hierarchies
Databases are easily developed as a common resource to be used by many different programs, but the construction of knowledge-bases (frame-bases) to be similarly used by several different types of program is currently a research issue. The best hope for re-use is that the knowledge of relationships between concepts as well as facts about concepts are explicitly represented in concept hierarchies in standard structures using such tools as CODE, CRL, a data thesaurus, or more implicitly in SPLINTER or Express. These would form the basis for independently developed information systems which nevertheless would share a common semantic heritage thus making later integration much easier.

Figure 7.2 Eventual explicit and independent information representation
Although frame-like expressions would be most appropriate for concept structures, the universal problems that always accompany data interchange would indicate that a tabular representation might be more convenient as a common medium of expression.
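One way such a tabular medium might look is a three-column table of frame, slot and value, with the parent link stored as an ordinary row. This is a sketch under stated assumptions, not a proposed interchange standard: the reserved slot name "is_a", the example frames and the restriction to string values are all choices made for the illustration.

```python
import csv
import io

# Hypothetical concept hierarchy; every value is a string so that it
# survives a round trip through text unchanged.
FRAMES = {
    "alloy": {"is_a": None, "slots": {"state": "solid"}},
    "aluminium alloy": {"is_a": "alloy", "slots": {"base_element": "Al"}},
    "2xxx series": {"is_a": "aluminium alloy", "slots": {"principal_addition": "Cu"}},
}


def to_rows(frames):
    """Flatten frames to (frame, slot, value) rows; 'is_a' is a reserved slot."""
    rows = []
    for name, frame in frames.items():
        if frame["is_a"] is not None:
            rows.append((name, "is_a", frame["is_a"]))
        for slot, value in frame["slots"].items():
            rows.append((name, slot, value))
    return rows


def from_rows(rows):
    """Rebuild the frame structure from (frame, slot, value) rows."""
    frames = {}
    for name, slot, value in rows:
        frame = frames.setdefault(name, {"is_a": None, "slots": {}})
        if slot == "is_a":
            frame["is_a"] = value
        else:
            frame["slots"][slot] = value
    return frames


# Write the table as CSV text, then read it back as another system would.
buffer = io.StringIO()
csv.writer(buffer).writerows(to_rows(FRAMES))
text = buffer.getvalue()
rebuilt = from_rows(tuple(row) for row in csv.reader(io.StringIO(text, newline="")))
print(rebuilt == FRAMES)  # True
```

The table is less expressive than the frames it came from: a frame with no parent and no slots would leave no row at all, and typed or structured values would need an agreed encoding. Those are the kinds of detail a common representation would have to settle.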
The future
Knowledge-based systems are not even close to being a mature technology. Their mixed history to date is only to be expected. While there are already many niches for small, simple rule-based/object-oriented systems, the true advantages will only appear when stable, large, semantically rich materials data systems become available. Only then will intelligent systems have something to be intelligent about.
Nearly all software products which access future information systems will have to be knowledgeable about what to expect in such systems. This is because the impossibility of any one organization representing all materials information means that multiple, diverse database systems will be the only useful source of information.
Despite their chequered past, knowledge-containing systems really are the wave of the future. Current knowledge-representation research is progressing faster than at any other time and even recent textbooks [Mey90, Pay90, Bey91] give an inadequate view of what is already available in many research laboratories.
Learning systems
Any complex computerized system which encapsulates and represents formally all important aspects of a problem is a candidate for the application of machine learning. Many successful learning algorithms exist but they depend crucially on all the relevant information being represented explicitly at appropriate levels of abstraction with the format of representations themselves available in a formal structure [Lai87, Car90].
The significance of representation as a general theme of materials information has already been covered at length. Data and index interchange rely on it and the next chapter will show its particular importance for CAD/CAM integration. All economic and technical driving forces are pushing the development of machine-interpretable formal representations of how materials information is named, associated and structured. This work will have a fascinating side-effect in that it will enable real learning systems to function for the first time with materials problems.
Footnotes