Date: | 2007/06/24 |
---|---|
Author: | Tamas Szekeres |
Contact: | szekerest at gmail.com |
Last Edited: | 2007/06/24 |
Status: | Discussion Draft |
Version: | Mapserver 5.0 |
Id: | $Id: ms-rfc-22a.txt 8278 2008-12-23 21:34:31Z hobu $ |
Currently the various query operations involve multiple accesses to the data providers which may cause a significant performance impact depending on the providers. In the first phase all of the features in the given search area are retrieved and the index of the relevant shapes are stored in the result cache. In the second phase the features in the result cache are retrieved form the provider one by one. Retaining the shapes in the memory we could eliminate the need of the subsequent accesses to the providers and increase the overall performance of the query. Implementing the cache requires a transformation of the data between the data provider and the client. From this aspect it is desirable to provide a framework to implement this transformation in a higher level of abstraction.
The main purpose of this RFC is to implement a feature cache and retain the shapes in the memory for the further retrievals during a query operation. However I would also want to create a mechanism so that a layer could use another layer as a data source. This outer layer could apply transformations on the shapes coming from the base layer or even retain the shapes in the memory for the subsequent fetches. Here are some other examples where this framework would provide a solution:
- Constructing geometries based on feature attributes
- Modifying the geometries or the feature attribute values
- Geometry transformations (like GEOS operations)
- Feature cache
- Providing default layer style based on external style information, or attribute based styling
- Providing custom data filters
Setting up a proper layer hierarchy one can solve pretty complex issues without the need of creating additional (eg. mapscript) code. Later on - as a live example - I’ll show up the solution of the following scenario:
- The user will select one or more shapes in one layer (by attibute selection in this case).
- Cascaded transformations will be applied on the selected shapes (I’ll use the GEOS convexhull and buffer operations)
- In another layer the features will be selected based on the transformed shapes as the selection shapes.
In this proposal to ensure the better readability I’ll avoid embedding much of the code inline. However most of the implementation patches are available at the tracking ticket of this RFC (see later).
This proposal involves writing additional data providers. These providers will use another layer as the source of the features. To set up this hierarchy of the layers, the CONNECTION parameter of these layers will contain the name of the source layer. In some cases the source layer doesn’t participate in the renderings and should be kept internal to the original layer. Therefore we will establish the support for nesting the layers into each other. In this regard the CONNECTION parameter will contain the full path of the layer it refers to. The path contains the layer names in the hierarchy using the slash ‘/’ as the separator between the names. We can specify the pathnames relative to the actual layer or relative to the map itself.
LAYER
NAME "roads"
CONNECTIONTYPE OGR
CONNECTION "roads.tab"
...
END
LAYER
NAME "cache"
CONNECTIONTYPE CACHE
CONNECTION "/roads"
...
END
LAYER
NAME "cache"
CONNECTIONTYPE CACHE
CONNECTION "roads"
...
LAYER
NAME "roads"
CONNECTIONTYPE OGR
CONNECTION "roads.tab"
...
END
END
The main difference between these 2 options is that in the first case the referred layer is contained by the layers collection of the map, so the layer will participate in the drawings. In the second case the referred layer is internal to the outer layer. It is supported that 2 or more layers connect to the same base layer so that the features will be served from the same cache repository. The base layer can reside in any place of the layer hierarchy.
In any case, the exension layer can also be implemented as a pluggable external layer. (CONNECTIONTYPE PLUGIN)
To achieve the desired functionality the following 3 providers will be implemented:
The purpose of this provider is to retain the features from the source layer in the memory so the subsequent fetches on the feature caching provider will be served from the internal cache. With the current implementation I’m planning to retain the shapes of last extent. When the subsequent shape retrieval refers to the same extent (or within that extent) the features will be served from the cache. At the moment I will not address caching features from multiple non overlapping extents and implement more sophisticated cache management (like size limit/expiration handling) but it might be the object of an enhancement in the future. All of the provider specific data will be placed in the layerinfo structure of the provider. The shapes in the cache will be stored in a hashtable. This provider will be implemented in a separate file (mapcache.c).
This provider will be capable to populate the cache with all of the shapes in the given extent or only with the shapes in the resultcache of the source layer. The first one is the default option and the latter can be selected by the following PROCESSING directive:
PROCESSING "fetch_mode=selection"
The cache will be populated upon the WhichShapes call of the caching provider when the given rect falls outside of the previous one. We can also specify that the provider will retieve all the shapes of the current map extent regardless to the search area using the following setting:
PROCESSING "spatial_selection_mode=mapextent"
The feature caching provider will store all of the items available regardless of which items have been selected externally (by using the WhichItems call). However the iteminfo will contain the indexes only of the requested items. The GetShape and the NextShape operations will copy only the requested subset of the items to the caller. This solution will provide the compatibility with any other existing provider so there’s no need to alter the common mapserver code from this aspect.
By using the STYLEITEM “AUTO” option the provider can retrieve the classObj of every shapeObj from the data source. So as to keep on supporting this option the caching provider will be capable to cache these classObj-s as well as the shapeObj-s in a separate hashtable. When the STYLEITEM “AUTO” option is set on the feature caching provider the style cache will also be populated by calling the GetAutoStyle function to the source layer. The subsequent GetAutoStyle calls on the feature caching provider will retrieve the classObj-s from the style cache and provide a copy to the caller.
The feature caching provider will be compatible with the existing query operations (implemented in the msQueryBy[] functions in mapquery.c). For supporting the msQueryByAttributes the NextShape will use msEvalExpression before returning the shapes to the caller.
The geometry transformation provider (implemented in mapgeomtrans.c) will support transparent access to the source layer. Every vtable operations will be dispatched to corresponding vtable operation of the source layer. In addition some of the other layer properties might require to copy between the connected layers.
This provider will serve the same subset of the items as the source layer provides. Therefore the InitIteminfo will copy the items and the numitems of the external layer to the source layer, like:
int msGeomTransLayerInitItemInfo( layerObj *layer )
{
/* copying the item array an call the subsequent InitItemInfo*/
return msLayerSetItems(((GeomTransLayerInfo*)layer->layerinfo)->srclayer, layer->items, layer->numitems);
}
And upon the GetItems call these values will be copied back to the external layer, like:
int msGeomTransLayerGetItems(layerObj *layer)
{
/* copying the item array back */
int result;
GeomTransLayerInfo* layerinfo = (GeomTransLayerInfo*)layer->layerinfo;
result = msLayerGetItems(layerinfo->srclayer);
msLayerSetItems(layer, layerinfo->srclayer->items, layerinfo->srclayer->numitems);
return result;
}
The proposed implementation will support most of the GEOS transformations supported by mapserver (buffer, convexhull, boundary, union, intersection, difference, symdifference). The transformations will be applied right before retrieving a shape to the caller. The desired transformation can be selected using a PROCESSING option, like:
PROCESSING "transformation=buffer"
Some of the transformations will accept further parameters:
PROCESSING "buffer_width=0.2"
Some of these operations will use 2 shapes to create the transformed shape. The reference shape of these transformations can be set externally using the WKT representation of a processing option:
PROCESSING "ref_shape=[WKT of the shape]"
I’m also planning to support retrieving this shape from the features collection of the layer and from an external layer as well.
The layer filter provider (implemented in mapfilter.c) will provide the shape retrieved from the source layer and will not apply any transformation on that. However some of the shapes will be skipped in the NextShape operations depending on the spatial and the attribute selection options. This provider ensures transparent access to the source layer (just as the previous one) but uses a cache for storing the selection shapes. The selection shapes will be retrieved from another layer which can be specified in the following processing option:
PROCESSING "selection_layer=[path to the layer]"
When populating the selection cache I’ll support to retrieve all of the shapes or only the shapes in the result cache as for the caching provider mentioned before:
PROCESSING "fetch_mode=selection"
In this chapter I’ll describe the solution of the scenario have been mentioned earlier. I’ll start with a simple map file definition with 2 layers the counties and the citypoints of the country:
MAP
NAME "Hungary"
STATUS ON
EXTENT 421543.362603 47885.103526 933973.563202 367180.479761
SIZE 800 600
IMAGETYPE PNG
IMAGECOLOR 255 255 255
PROJECTION
"proj=somerc"
"lat_0=47.14439372222222"
"lon_0=19.04857177777778"
"x_0=650000"
"y_0=200000"
"ellps=GRS67"
"units=m"
"no_defs"
END
SYMBOL
Name 'point'
Type ELLIPSE
POINTS
1 1
END
FILLED TRUE
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties"
CONNECTIONTYPE OGR
CONNECTION "Hun_Megye.TAB"
STATUS default
TYPE POLYGON
LABELITEM "name"
CLASS
TEMPLATE "query.html"
LABEL
SIZE medium
COLOR 64 128 64
END
STYLE
COLOR 208 255 208
OUTLINECOLOR 64 64 64
END
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
STATUS default
TYPE POINT
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
END
END
This map will look like this:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample.png
Any of the layers might be cached by using the CACHE layer provider. The original provider might be nested inside the cache. The parameters related to the drawing should be specified for the outer layer, like:
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties_cache"
CONNECTIONTYPE CACHE
CONNECTION "Hun_Counties"
STATUS default
TYPE POLYGON
LABELITEM "name"
CLASS
TEMPLATE "query.html"
LABEL
SIZE medium
COLOR 64 128 64
END
STYLE
COLOR 208 255 208
OUTLINECOLOR 64 64 64
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties"
CONNECTIONTYPE OGR
CONNECTION "Hun_Megye.TAB"
TYPE POLYGON
END
END
The Hun_Counties_cache layer is queryable and all of the existing methods are available to populate the resultcache. In my example I’ll use the mapscript C# drawquery example (implemented in drawquery.cs) to execute an attribute query and use the drawquery afterwards. The following command will be used along with this sample:
drawquery sample.map "('[Name]'='Tolna')" sample.png
Which passes the (‘[Name]’=’Tolna’) query string to the queryByAttributes of the first layer.
The result of the rendering will look like:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.png
The corresponding mapfile can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.map
In the next step I’ll apply a cascaded GEOS convexhull and buffer operations on the first layer. The results will be rendered in a separate layer using the following definition:
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "/Hun_Counties_cache"
PROCESSING "transformation=convexhull"
END
END
The result of the rendering will look like:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.map
Which is not the desired result since all of the polygons were transformed. So as to transform only the selected shape an additional cache should be specified by using the “fetch_mode=selection” processing option:
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "selectioncache"
PROCESSING "transformation=convexhull"
LAYER
NAME "selectioncache"
TYPE POLYGON
CONNECTIONTYPE CACHE
CONNECTION "/Hun_Counties_cache"
PROCESSING "fetch_mode=selection"
END
END
END
And here is the result of the drawing:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.map
My final goal is to select the features of the point layer using the transformed shape as the selection shape. Therefore I’ll have to use the layer filter provider on the point layer and setting the selection_layer to the transformed layer:
LAYER
TYPE POINT
CONNECTIONTYPE LAYERFILTER
CONNECTION "Hun_CityPoints"
NAME "selectedpoints"
STATUS default
PROCESSING "selection_layer=/selectionshape"
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
TYPE POINT
END
END
Here is the result:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.map
Note1: Without altering the map configuration I can modify the selection externally and the rendered image will reflect the changes automatically. For example I can select 2 counties using the query string: (‘[Name]’=’Tolna’ or ‘[Name]’=’Baranya’)
The resulting image can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4a.png
Note2: If there’s no need to render the selection layer I can specify that configuration inline:
LAYER
TYPE POINT
CONNECTIONTYPE LAYERFILTER
CONNECTION "Hun_CityPoints"
NAME "selectedpoints"
STATUS default
PROCESSING "selection_layer=selectionshape"
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
TYPE POINT
END
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "selectioncache"
PROCESSING "transformation=convexhull"
LAYER
NAME "selectioncache"
TYPE POLYGON
CONNECTIONTYPE CACHE
CONNECTION "/Hun_Counties_cache"
PROCESSING "fetch_mode=selection"
END
END
END
END
Here is the result:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.png
And the corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.map
And recall that because I’ve used a cache on the counties layer all of these renderings require to retrieve the shapes only once from the original provider. That was the primary objective of this RFC.
To achieve the desired fuctionality the following major steps should be done in the mapserver core. The details of the proposed changes can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/common.patch
The current hashtable implementation (in maphash.c) will be generalized to be capable to store objects as well as strings. Currently the hashtable can store only string values. In addition the we will provide support for specifying the hashsize upon the construction of the hashtable. The objects will be stored in the hashtable by reference and the destroy function of the objects can also be specified externally. The following functions will be added to maphash.c
int initHashTableEx( hashTableObj *table, int hashSize );
void msClearHashItems( hashTableObj *table );
struct hashObj *msInsertHashTablePtr(hashTableObj *table, const char *key, const char *value);
struct hashObj *msFirstItemFromHashTable( hashTableObj *table);
struct hashObj *msNextItemFromHashTable( hashTableObj *table, struct hashObj *lastItem );
int msGetHashTableItemCount(hashTableObj *table);
initHashTableEx can be called to specify the hash size externally. Actually the original initHashTable will be implemented as calling initHashTableEx with MS_HASHSIZE. msClearHashItems will clear all of the elements of the hashtable but does not clear the items array. msInsertHashTablePtr provides the support for adding object by reference to the hashtable. msFirstItemFromHashTable and msNextItemFromHashTable will provide the iteration of the elements efficiently and will be called by the NextShape of the feature caching provider. msGetHashTableItemCount will retrieve the actual number of the hashtable items.
The sublayers in the layerObj structure will be stored as an array of layers.
typedef struct layer_obj {
...
#ifndef SWIG
struct layer_obj **layers;
int numlayers; /* number of sublayers in layer */
#endif /* SWIG */
...
} layerObj;
The parser will take care of loading the nested layers:
int loadLayer(layerObj *layer, mapObj *map)
{
...
case (LAYER):
if(layer->numlayers == 0)
layer->layers = (layerObj**) malloc(sizeof(layerObj*));
else
layer->layers = (layerObj**) realloc(layer->layers, sizeof(layerObj*) * (layer->numlayers));
layer->layers[layer->numlayers]=(layerObj*)malloc(sizeof(layerObj));
if (layer->layers[layer->numlayers] == NULL) {
msSetError(MS_MEMERR, "Malloc of a new layer failed.", "loadLayer()");
return MS_FAILURE;
}
if(loadLayer(layer->layers[layer->numlayers], map) == -1) return MS_FAILURE;
layer->layers[layer->numlayers]->index = layer->numlayers; /* save the index */
++layer->numlayers;
break;
...
}
A new data connection type is established for the new providers.
enum MS_CONNECTION_TYPE {MS_INLINE, MS_SHAPEFILE, MS_TILED_SHAPEFILE, MS_SDE, MS_OGR, MS_UNUSED_1, MS_POSTGIS, MS_WMS, MS_ORACLESPATIAL, MS_WFS, MS_GRATICULE, MS_MYGIS, MS_RASTER, MS_PLUGIN, MS_GEOMTRANS, MS_LAYERFILTER, MS_CACHE };
The lexer will be modified to interpret the new connection types.
The providers would keep persistent data between the various Connection/Close operations on that layer so we should establish a mechanism to destroy this provider specific data by adding a new method to the layervtable:
void (*LayerDestroy)(layerObj *layer);
int
msInitializeVirtualTable(layerObj *layer)
{
if (layer->baselayer)
msInitializeVirtualTable(layer->baselayer);
...
switch(layer->connectiontype) {
...
case(MS_GEOMTRANS):
return(msGeomTransLayerInitializeVirtualTable(layer));
break;
case(MS_LAYERFILTER):
return(msFilterLayerInitializeVirtualTable(layer));
break;
case(MS_CACHE):
return(msCacheLayerInitializeVirtualTable(layer));
break;
...
}
..
}
The following files will be affected by this RFC:
maphash.c
maphash.h
maplayer.c
maplexer.l
mapfile.c
map.h
Makefile.vc
Makefile.in
mapcache.c (new)
mapgeomtrans.c (new)
mapfilter.c (new)
These changes will retain mapfile and mapscript backward compatibility.
None