Philip Sargent 4 May 1998
Coming to this area as a newcomer, I was immediately struck by some aspects that may be less obvious to the more experienced. (I could be completely wrong.)
Automated generalisation and dynamic segmentation techniques have some characteristics in common whether they are applied to object, vector or raster data.
Just about all useful generalisation and segmentation algorithms work by calculating statistics over a small area of the data ("local statistics") and then using user- or system-supplied cutoff values and parameters to make local changes to the geometric representation (generalisation) or to the attribute data (segmentation).
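A minimal sketch of this local-statistics pattern on raster data, in Python. It is an illustration only, not an algorithm from any particular GIS package: the function name majority_filter, the 3x3 window and the min_votes cutoff are all assumptions chosen for the example. The local statistic is the count of each class in the window around a cell, and the cutoff decides whether the cell is overwritten.

```python
from collections import Counter

def majority_filter(grid, min_votes=5):
    """Return a copy of grid in which each cell takes the most common class
    of its 3x3 neighbourhood, but only if that class has >= min_votes cells.
    (Hypothetical example; window size and cutoff are assumptions.)"""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]          # the input grid is left untouched
    for r in range(rows):
        for c in range(cols):
            # Local statistic: class counts in the window, clipped at the edges.
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            label, votes = Counter(window).most_common(1)[0]
            # Cutoff: only make the local change when the majority is strong.
            if votes >= min_votes:
                out[r][c] = label
    return out

# "A" = agriculture, "U" = urban: one isolated urban cell in a field.
grid = [list("AAAAA"),
        list("AAAAA"),
        list("AAUAA"),
        list("AAAAA"),
        list("AAAAA")]
generalised = majority_filter(grid)
print(generalised[2][2])   # the isolated urban cell has been absorbed: "A"
```

The per-window counts computed inside the loop are exactly the kind of statistics that are discarded once the generalised view has been produced.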
A by-product of these algorithms is therefore a great quantity of statistics on at least two levels of detail (scale). The generalised visual result is a single view of these statistics with most of the information thrown away.
With generalisation performed by any one formula, the statistics on the global dataset (the entire map) will in general be different before and after the procedure. For example, the percentages of wild vegetation, agriculture and urban areas in the generalised dataset will differ from those in the original dataset. This is because no algorithm that only changes data locally can ensure any global property whatsoever (unless it is restricted to only moving boundaries).
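The effect can be shown on a toy raster. The sketch below is an assumption-laden illustration: the grid, the class codes ("A" agriculture, "W" wild vegetation, "U" urban) and the function names proportions and aggregate_blocks are invented for the example, and block aggregation by majority stands in for whatever local formula is actually used.

```python
from collections import Counter

def proportions(grid):
    """Global statistic: the share of each class over the whole grid."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def aggregate_blocks(grid, size=3):
    """Coarsen the grid: each size x size block becomes a single cell
    carrying the most common class found in that block."""
    out = []
    for r in range(0, len(grid), size):
        out_row = []
        for c in range(0, len(grid[0]), size):
            block = [grid[rr][cc]
                     for rr in range(r, min(r + size, len(grid)))
                     for cc in range(c, min(c + size, len(grid[0])))]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out

original = [list("AAAWWW"),
            list("AUAWWW"),
            list("AAAWWA"),
            list("AAAWWW"),
            list("AUAWAW"),
            list("AAAWWW")]

before = proportions(original)                   # A 18/36, W 16/36, U 2/36
after = proportions(aggregate_blocks(original))  # A 1/2, W 1/2, no U at all
print(before)
print(after)
```

Each block decision is locally reasonable, yet the urban class vanishes from the global statistics and wild vegetation grows from 16/36 to 1/2.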
This simple fact is not mentioned anywhere in the March 1997 Technical Report from ITE (project T02072 O9), which in my opinion is a significant deficiency of an otherwise excellent piece of work. Indeed, the conclusions imply that the authors do not seem to realise this fact and thus recommend some actions which can never succeed in the general case. For CORINE data, however, the errors in the source data are so large that changes by almost any generalisation algorithm are not likely to be significant.
Which global statistics must be preserved and which are expected to change depends on the application requirements. A common cartographic need is to maintain global proportions of wild vegetation to agriculture, but to allow urban areas to grow by aggregation to enhance their visibility. For land cover maps (e.g. CORINE), all proportions must usually be maintained (though to some extent this depends on whether the GIS technology in use permits areas to be classified as "mixed" and to have as attributes the percentage of each land cover type in that area).
By hand-tuning the parameters of the generalisation formula, it is possible to maintain a handful of global statistics over a number of different datasets without human intervention, but only if the datasets have similar landscapes. In general, different landscapes require different parameter settings to preserve global statistics and some combinations of global statistics are not maintainable at all.
A generalisation algorithm could be self-tuning to either
if the fine scale is generalised iteratively:
After initial generalisation, the global statistics are calculated. If they diverge from the assigned policy, the generalisation is re-done with automatically modified parameters to attempt to converge. If the statistics cannot be made to converge after several iterations (I would guess three), then a warning is attached to the generalised dataset.
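The loop just described might be sketched as follows. Everything specific here is an assumption made for illustration: the policy is taken to be "preserve every original class proportion to within a tolerance", the inner formula is a simple 3x3 majority filter, and the automatic parameter change is simply to raise the cutoff min_votes by one so that fewer cells are altered. The names self_tuning_generalise, majority_filter and proportions are hypothetical.

```python
from collections import Counter

def proportions(grid):
    """Global statistic: the share of each class over the whole grid."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def majority_filter(grid, min_votes):
    """Inner formula: a cell takes its 3x3 majority class if that class
    has at least min_votes cells in the window."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            label, votes = Counter(window).most_common(1)[0]
            if votes >= min_votes:
                out[r][c] = label
    return out

def self_tuning_generalise(grid, tolerance, max_iterations=3, min_votes=5):
    """Re-run the inner formula with a stricter cutoff until the global
    proportions stay within tolerance of the original, or give up.
    Returns (result, cutoff used on the last pass, warning or None)."""
    target = proportions(grid)          # assumed policy: preserve everything
    for _ in range(max_iterations):
        result = majority_filter(grid, min_votes)
        after = proportions(result)
        error = max(abs(after.get(label, 0.0) - share)
                    for label, share in target.items())
        if error <= tolerance:
            return result, min_votes, None
        min_votes += 1                  # be more conservative on the next pass
    return result, min_votes - 1, "global statistics did not converge"

# A 2x2 urban patch "U" and a single speck of wild vegetation "W".
grid = [list("AAAAAAAAA"),
        list("AAAAAAAAA"),
        list("AUUAAAWAA"),
        list("AUUAAAAAA"),
        list("AAAAAAAAA"),
        list("AAAAAAAAA")]

ok_result, ok_votes, ok_warning = self_tuning_generalise(grid, tolerance=0.03)
bad_result, bad_votes, bad_warning = self_tuning_generalise(grid, tolerance=0.01)
print(ok_votes, ok_warning)
print(bad_votes, bad_warning)
```

With the looser tolerance the first pass erases the urban patch, the second pass (cutoff 6) keeps it while still removing the speck, and the loop stops there. With the tighter tolerance the loss of the speck alone exceeds the limit on every pass, so the result is returned with a warning after three iterations.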
It is easy to imagine a dataset for which this fails: it consists of two areas, one black and one white, and the algorithm has to produce a generalised dataset consisting of a single area. Whichever colour survives, the original proportions cannot be preserved.
Because of the global convergence issue, I believe that an iterative, multi-scale algorithm is required whatever formula and parameters are used in the inner loop. Indeed, it is possible that otherwise crude formulae could be so improved by iteration that it would change the whole basis whereby generalisation algorithms are compared.