Philip Sargent
16th October 1998
This note briefly records our conclusions from the Vienna OGC meeting.
It will be provisionally edited into a draft of Topic 5, with supporting information, proper references, reconciliation with current IETF work, and the nomenclature already established by the Catalog WG.
This note says nothing about the update and maintenance of identifiers or scope-resolving objects. That was essentially dealt with in the note "Feature Identifier Registries: Incremental Publishing" put on the Feature SIG email archive on October 1st, though it will need extensive re-editing to fit in Topic 5.
This note says nothing about access (see below), only identifier resolution.
This note says nothing about the reasons why we want permanent, immutable feature identifiers or why they need to be names, not locations. See the other recent Feature SIG documents for background.
The key idea is that a Scope is a permanent object with an immutable name that allows it to be located indefinitely, and which can answer queries about identifiers "in its scope" by resolving them to more explicit identifiers.
Any scoped identifier (simply called an "identifier" in this note) is conceptually, and perhaps actually, constructed in two parts, a prefix and a suffix, and there is always a third, invisible, part: the scope in which the whole thing, prefix and suffix, should be interpreted. This is because an identifier is always "inside" a scope.
"The Handle System®" from CNRI is an example of a 1-level scope system. An identifier looks like this:
hdl://cnri.test/abcd/efg/ijk#123
Where
The Handle System includes a large number of scopes, but all of them have "hdl://" as the first part of their prefix.
This example illustrates another useful but non-essential characteristic: the invisible scope in which the identifier should be resolved is indicated informally by the initial part of the text string of the prefix. Any informed person reading "hdl://" has a pretty good idea that we are dealing with a variety of URL (a URI, to be pedantic) and that we need to find which protocol "hdl:" represents. This capability is most useful at the "top" level, when we have to decide which Well Known Scope to start with. Lower scopes could use pure binary for prefixes and suffixes.
An identifier is resolved by giving
it to the Scope object in which it is defined (the invisible
one), which (conceptually) strips off the prefix, finds the Scope
object that the prefix names, and hands the suffix to that Scope
object. When the process reaches a leaf identifier, the stack
unwinds and the leaf identifier is returned together with
whatever information is necessary for the original enquirer to
make use of it.
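The resolution process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Scope` class, its `children` and `leaves` fields, the "/" separator between prefix and suffix, and the example names are all hypothetical and are not part of any specification. The "information necessary for the original enquirer" is stood in for by the list of scope names traversed.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical in-memory scope object (illustration only)."""
    name: str
    children: dict = field(default_factory=dict)  # prefix -> Scope named by that prefix
    leaves: set = field(default_factory=set)      # leaf identifiers held directly

    def resolve(self, identifier):
        """Return (leaf identifier, names of the scopes traversed)."""
        if identifier in self.leaves:
            # Leaf reached: the recursion unwinds from here.
            return identifier, [self.name]
        # Assumption: the prefix ends at the first "/" in a textual identifier.
        prefix, sep, suffix = identifier.partition("/")
        if not sep or prefix not in self.children:
            raise KeyError(f"{identifier!r} is not resolvable in scope {self.name!r}")
        # Hand the suffix to the scope object that the prefix names.
        leaf, route = self.children[prefix].resolve(suffix)
        return leaf, [self.name] + route


roads = Scope("roads", leaves={"00012345"})
agency = Scope("agency", children={"roads": roads})
root = Scope("root", children={"agency": agency})

leaf, route = root.resolve("agency/roads/00012345")
print(leaf, route)
```

A real scope would of course be a remote, permanent service rather than an in-memory object; the recursion here only mirrors the conceptual "strip the prefix, hand on the suffix" step.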
For software specification it is best to keep orthogonal functions specified entirely separately as distinct interfaces. Any implementation may choose to support one or several interfaces. We have three quite separate specifications to think about:
We understand a feature descriptor to be something which is resolved by a Catalog Service. A feature descriptor contains "sufficiently unique metadata".
This discussion of scoped identifiers covers what we previously called a feature handle.
Anything could offer an identifier resolution service within an "understood" scope, but we need to ensure that scoped-identifiers are only issued containing scope object prefixes for scope objects with certain minimum required qualities of permanence and immutability.
Catalog services are something which we could also cover by this same scoped identifier mechanism.
The same feature can be in several Scopes. There does not seem to be any way to avoid this since anyone can set up an identifier resolution service which then defines its own new scope.
Thus the same feature (a specific software representation of a real world object) can have several identifiers. It will have the same leaf identifier, but this could appear in published form hidden inside several different scoped identifiers via different scopes.
Thus we lose equality by value for published identifiers; however we can define an equality method on identifiers if the method resolves them down to the lowest common denominator scope. Because scopes form a DAG not a tree, and because identifier length does not tell you anything about how many prefixes may be hidden inside opaque suffixes, this means resolving all the way down to the leaf identifier in each case.
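As a rough sketch of such an equality method, the fragment below resolves two published identifiers down to their leaves and compares the results. The scope graph, the "/" separator, and the function names are hypothetical, chosen only for illustration. Comparing the pair (owning scope, leaf identifier) rather than the bare leaf string is our own assumption here, made so that equal strings held in unrelated repositories do not compare equal.

```python
# Hypothetical scope graph: scope name -> {prefix: name of the scope it denotes}.
SCOPES = {
    "root": {"agency": "agency", "mirror": "mirror"},
    "agency": {"roads": "roads"},
    "mirror": {"r": "roads"},  # a second route to the same scope: a DAG, not a tree
    "roads": {},
}
# Hypothetical leaf identifiers held by each scope.
LEAVES = {"roads": {"00012345"}}


def get_leaf_id(scope, identifier):
    """Resolve all the way down; return (owning scope, leaf identifier)."""
    if identifier in LEAVES.get(scope, set()):
        return scope, identifier
    prefix, sep, suffix = identifier.partition("/")
    child = SCOPES[scope].get(prefix) if sep else None
    if child is None:
        raise KeyError(f"{identifier!r} is not resolvable in scope {scope!r}")
    return get_leaf_id(child, suffix)


def is_equal(scope, id_a, id_b):
    """Equality by resolution, not by comparing the published strings."""
    return get_leaf_id(scope, id_a) == get_leaf_id(scope, id_b)


a = "agency/roads/00012345"
b = "mirror/r/00012345"
print(a == b, is_equal("root", a, b))
```

The two published strings differ, yet they resolve to the same leaf, which is why the comparison cannot stop at any intermediate scope.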
We need to make the following restriction:
This is tricky, since data repositories can always be copied. So we need to think about how to define scopes for copies, and to consider replication possibilities to see if we can relax this condition under certain carefully-controlled circumstances.
What properties must an identifier have?
A Well Known Scope is a Scope Environment, e.g. X.500, CORBA Locator Service, DNS (machine identifiers), URL (file identifiers if the location is permanent).
Other Scope Environments might be:
A Well Known Scope is a scope of scope identifiers.
All identifiers discussed in this note are published, external identifiers. Any repository can use any internal identification mechanism it likes so long as it can support permanent, immutable external identifiers.
As an example, consider a system which uses 8-digit numbers as internal feature identifiers. Such a system might want to be able to represent foreign identifiers (e.g. to implement some feature-feature relationships), and it might do so by internally using identifiers of the form "9xxxxxxx", the initial "9" indicating an external identifier. Such a system would then have a bit of software in its export subsystem which published its own identifiers as scoped identifiers and replaced the foreign identifiers with the original foreign scoped identifier (stored in some local registry).
The interesting thing about this example is that the initial "9" can itself be considered to be a scope-prefix inside the proprietary system. This illustrates the general points that
The important property of an identifier is that it is resolvable, not that it is readable.
Some scopes may exist only to provide short-forms of identifiers, e.g. in the above example a single digit "9" represented the entire outside world. A deeply-nested identifier could accumulate a very long string of prefixes, so within an organisation or information community, a scope resolver which provided short-form aliases could be useful.
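The export step in the 8-digit example might look something like the sketch below. Everything named here is hypothetical: the registry contents, the `OWN_SCOPE_PREFIX` value, and the function name are illustrative assumptions, and the foreign identifier is simply the Handle System example quoted earlier in this note.

```python
# Hypothetical local registry: internal "9xxxxxxx" alias -> original foreign scoped identifier.
FOREIGN_REGISTRY = {
    "90000001": "hdl://cnri.test/abcd/efg/ijk#123",
}

# Hypothetical prefix chain naming this repository's own scope.
OWN_SCOPE_PREFIX = "example-wks/example-repo/"


def export_identifier(internal_id):
    """Turn an internal 8-digit identifier into a published scoped identifier."""
    if len(internal_id) != 8 or not internal_id.isdigit():
        raise ValueError(f"not an 8-digit internal identifier: {internal_id!r}")
    if internal_id.startswith("9"):
        # The leading "9" acts as a one-character scope prefix meaning
        # "the outside world": restore the original foreign identifier.
        return FOREIGN_REGISTRY[internal_id]
    # Own features: publish under the repository's own scope.
    return OWN_SCOPE_PREFIX + internal_id


print(export_identifier("00012345"))
print(export_identifier("90000001"))
```

The internal form never leaves the system; only the scoped form is published.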
Scopes can cross-reference each other; thus the opaque part of an X.500 scope may be a Handle System server.
Any identifier published, e.g. in an email or quoted by some other piece of general purpose software for any value-adding purpose, must quote scope prefixes all the way back to the most-global Well Known Scope. This is where it is particularly useful if the prefixes understood by Well Known Scopes are not opaque but are readable.
The Handle System includes a large number of scopes, but all of them have "hdl://" as the first part of their prefix.
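One simple way to picture the requirement that a published identifier quote every prefix back to the Well Known Scope is as a join over the route taken down through the scopes. This is a minimal sketch only: the function name, the "/" separator, and the example prefixes are assumptions, and how publishable identifiers are really composed is left open in this note.

```python
def compose_published_id(route_prefixes, leaf_id, sep="/"):
    """Join the prefixes on the route from the Well Known Scope with the leaf id.

    route_prefixes: prefixes in order, starting with the one understood by
    the Well Known Scope (hypothetical textual prefixes).
    """
    if not route_prefixes:
        raise ValueError("a published identifier needs at least a Well Known Scope prefix")
    for prefix in route_prefixes:
        if sep in prefix:
            # Otherwise the prefix boundaries could not be recovered on resolution.
            raise ValueError(f"prefix {prefix!r} contains the separator {sep!r}")
    return sep.join([*route_prefixes, leaf_id])


published = compose_published_id(["example-wks", "agency", "roads"], "00012345")
print(published)
```

A readable first prefix is what lets a human or a general-purpose program pick the right Well Known Scope to start from; the later prefixes could just as well be opaque.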
Conceptually, there exists a FeatureCollection of all the features whose identifiers are in the Scope, but they are just implicitly associated. The FeatureCollection may not be instantiable, even in theory, for some Scopes.
A Scope Object has these methods:
The last of these methods, ComposeId, is arguably implementable because we always know where we are in the scope sequence: we must have come "down" some route through the scopes' DAG to reach the scope we are "in" now. Being able to produce a publishable identifier is clearly a requirement; how it is done should be left to responses to an RFP.
A Scope is probably an Interface, not a Class (using Java nomenclature); i.e. the set of Scope methods could be supported by many different objects of different classes.
| Returns | Method |
| --- | --- |
| {scope, id} | Traverse(id) |
| boolean | IsEqual(id,id) |
| boolean | IsLeaf(id) |
| object | GetLeaf(id) |
| feature | GetLeafAsFeature(id) |
| id | GetLeafId(Object) |
| id | ComposeId(scope,id) |
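To illustrate the "Interface, not a Class" point, the method set above can be written as a structural interface. The sketch below uses a Python `Protocol` with snake_case renderings of the method names. The argument and return types, the `Identifier` alias, and the toy `LeafOnlyScope` class are our own assumptions for illustration; the semantics of each method are not fixed by this note.

```python
from typing import Any, Protocol, Tuple, runtime_checkable

Identifier = str  # assumption: textual identifiers; lower scopes could be binary


@runtime_checkable
class Scope(Protocol):
    """The seven Scope methods, expressed as an interface."""
    def traverse(self, ident: Identifier) -> Tuple["Scope", Identifier]: ...
    def is_equal(self, a: Identifier, b: Identifier) -> bool: ...
    def is_leaf(self, ident: Identifier) -> bool: ...
    def get_leaf(self, ident: Identifier) -> Any: ...
    def get_leaf_as_feature(self, ident: Identifier) -> Any: ...
    def get_leaf_id(self, obj: Any) -> Identifier: ...
    def compose_id(self, scope: "Scope", ident: Identifier) -> Identifier: ...


class LeafOnlyScope:
    """Toy scope with no child scopes: every identifier it knows is a leaf."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def traverse(self, ident):
        raise LookupError("this toy scope has no child scopes to traverse into")

    def is_equal(self, a, b):
        # Valid here only because all known identifiers are already leaves.
        return self.is_leaf(a) and a == b

    def is_leaf(self, ident):
        return ident in self._objects

    def get_leaf(self, ident):
        return self._objects[ident]

    def get_leaf_as_feature(self, ident):
        # Assumption: in this toy, every stored object is a feature.
        return self._objects[ident]

    def get_leaf_id(self, obj):
        for ident, candidate in self._objects.items():
            if candidate is obj:
                return ident
        raise LookupError("object is not held in this scope")

    def compose_id(self, scope, ident):
        raise NotImplementedError("composition is deliberately left open here")


feature = {"kind": "road"}
scope = LeafOnlyScope({"00012345": feature})
print(isinstance(scope, Scope), scope.get_leaf_id(scope.get_leaf("00012345")))
```

`LeafOnlyScope` does not inherit from `Scope`; it satisfies the interface simply by supplying the methods, which is the sense in which many different classes of object could act as scopes.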
In discussion, we provisionally decided that we did not want or need a method on Feature where you give it a scope and ask what its publishable identifier would be from that scope.
Added on 22 November 1998:
This list includes papers which I discovered only after I wrote the above paper. They are taken into account, with this paper, in current working drafts of OGC Abstract Specification Topic 5 (copyright © OGC, 1998 and only available on the member area of the OGC website). None of this appears in the public, released copy of the Abstract Specification yet.
draft-sun-handle-system-01.txt, draft-leach-uuids-guids-01.txt; see also http://www.ics.uci.edu/~ejw/authoring/
This work was performed at the European Commission Joint Research Centre.