XML Metadata Extraction from SVG Databases

Byungwoo Kim
Center for Advanced Computer Studies
University of Louisiana, Lafayette, LA 70504-4330
Jong P. Yoon
Center for Advanced Computer Studies
University of Louisiana, Lafayette, LA 70504-4330


Vector graphics have a number of advantages over bitmap graphics. First, vector graphics files are usually smaller than their bitmap equivalents. Second, vector graphics are scalable, so one can zoom in or out without loss of image quality. Both advantages are important to many applications. As SVG emerges as a promising graphics format for the Internet, a large number of SVG documents can be expected on the Web in the near future. However, SVG as such is not well suited to information retrieval. It is therefore vital to develop a mechanism that extracts metadata from SVG documents for information retrieval.

1.1 Problem Description

XML query languages generally retrieve the desired data by tag matching, without relying on a schema, because the tags inside an XML document describe its contents. Even though SVG uses the XML format, there is a fundamental difference between SVG and general XML. SVG is designed to display graphical objects in client browsers, and its tags do not describe what the document depicts. The path tag, for example, says nothing about its content beyond how to draw it. Because of this nature of graphical data, tag matching alone cannot easily retrieve the desired data. This shortcoming motivates us to describe the contents of SVG documents. Note, however, that SVG has a "metadata" element for including metadata about an SVG document. Even so, this feature is not sufficient to represent user-specified requests or to support multimedia information retrieval.

Metadata about images can be used to speed up the retrieval of graphic objects. Metadata about object shapes, colors, and spatial relationships between objects has been developed in many areas such as GIS, but it is not yet sufficient to describe content-sensitive objects. For example, Figure 1 (a) and (b) show the same shapes at different angles, which calls for rotational invariance. Figure 1 (a) and (b) also show the same shape at different sizes, which calls for scale invariance. Even though the shapes in Figure 1 (a) and (b) are the same, their SVG descriptions are not. The reason is that a description in a 2-dimensional coordinate system does not have the scale, rotation, and transform invariance properties.

Figure 1. Vector Image Examples

2. Related Work

Many multimedia researchers have focused on retrieving images from indexed image collections [1]. Some image researchers have concentrated on the design of multimedia IR systems [2]. The significance of spatial relationships has been pointed out by a number of researchers in spatial and image databases. The most common types of spatial relationships are direction, distance, and topological relationships. El-Kwae and Kabuka presented a framework for retrieving images by spatial similarity in image databases [4].

3. Preliminaries

3.1 SVG (Scalable Vector Graphics)

SVG is a standard for expressing two-dimensional graphics in XML; it is an XML language described by predefined XML elements. SVG has three types of graphic objects: vector objects, images, and text. Most application programs that support SVG output generate SVG using path elements. The reason is that the path element can not only express all the other basic shape elements, such as rectangles and lines, but also supports a compact form. This paper focuses on the path element, from which metadata can be extracted.
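As a minimal sketch of where path data lives in an SVG document, the following Python fragment collects the `d` attribute of every path element using the standard library XML parser. The sample document, the `id` values, and the function name `extract_path_data` are illustrative assumptions and do not come from the paper's data set. The second path draws the same rectangle as the `rect` element, which illustrates how a path can stand in for a basic shape.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A tiny hand-written SVG document (illustrative only, not from the paper).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="50" width="80" height="40"/>
  <path id="roof" d="M 10 50 L 50 10 L 90 50 Z"/>
  <path id="outline" d="M 10 50 H 90 V 90 H 10 Z"/>
</svg>"""


def extract_path_data(svg_text):
    """Return (id, d) pairs for every path element in the document."""
    root = ET.fromstring(svg_text)
    # Elements are namespace-qualified, so the tag is "{namespace}path".
    return [(p.get("id"), p.get("d")) for p in root.iter(f"{{{SVG_NS}}}path")]


paths = extract_path_data(SAMPLE_SVG)
for path_id, d in paths:
    print(path_id, "->", d)
```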

4. Pattern Mining

4.1 Path Data Sequences

A path is a sequence of vector movements, defined in the attributes of the path element (the d attribute in SVG), that describes the outline of a shape. Path data sequences are not invariant under the transform, scale, and rotation operations. For example, the roof shapes of the houses in Figure 1 (a) and (b) represent the same shape, yet they have different path data sequences. Therefore, we have to convert path data sequences into sequences that have the properties of scale, transform, and rotational invariance. In this section, we propose two approaches to handling path data sequences: direction and angle.

4.1.1 Direction

This subsection describes the direction approach to handling path data sequences. Any number of directions could be used, but for simplicity we consider 8, 16, and 32 directions in this paper. The transformation function for 8 directions is as follows: if the pen moves east, assign the character '3'; if the pen moves south, assign the character '5'; and so on.
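The direction transformation can be sketched in Python as follows. The text fixes only two codes (east is '3', south is '5'); the sketch assumes the remaining codes follow a clockwise numbering that starts at north with '1', which is consistent with those two values but is not stated in the paper. The function names and the sample coordinates are hypothetical. SVG's y axis grows downward, so a southward movement has a positive dy.

```python
import math


def direction_code(dx, dy, n_directions=8):
    """Map a pen movement (dx, dy) in SVG coordinates to a direction number.

    Assumed numbering: 1 = north, then clockwise (for 8 directions:
    2 = NE, 3 = E, 4 = SE, 5 = S, 6 = SW, 7 = W, 8 = NW).
    """
    # Clockwise angle from north in degrees, in [0, 360).
    angle = math.degrees(math.atan2(dx, -dy)) % 360.0
    sector = 360.0 / n_directions
    # Each direction owns a sector centered on its own angle.
    return int((angle + sector / 2) // sector) % n_directions + 1


def encode_path(points, n_directions=8):
    """Encode a polyline given as (x, y) vertices into direction numbers."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # a zero-length movement carries no direction
        codes.append(direction_code(dx, dy, n_directions))
    return codes


# A hypothetical "roof" outline and a moved, uniformly scaled copy of it.
roof = [(10, 50), (50, 10), (90, 50)]
roof_moved_and_scaled = [(100 + 2 * x, 30 + 2 * y) for x, y in roof]

roof_string = "".join(str(c) for c in encode_path(roof))
copy_string = "".join(str(c) for c in encode_path(roof_moved_and_scaled))
print(roof_string, copy_string)
```

In this example both outlines produce the same string, because a direction code depends only on the heading of each movement and not on its position or length. The sketch checks only a shift and a uniform scaling; it makes no claim about rotation.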

4.2 Similarity of Path Data Sequences

We employ the notion of edit distance, which has been widely applied to quantify differences between strings. The edit distance is the smallest number of insertions, deletions, and substitutions required to change one string or tree into another. Edit distance is used to compare character strings and has also been applied in other domains, including computer vision and molecular biology. In our case, we transform path data sequences into strings that represent the shapes.
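For strings, the edit distance defined above can be computed with the standard dynamic programming recurrence. The sketch below assumes that every insertion, deletion, and substitution costs 1; the paper does not state the costs it uses, and the direction strings in the example are made up for illustration.

```python
def edit_distance(a, b):
    """Smallest number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j symbols of b.
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, symbol_b in enumerate(b, start=1):
            substitution_cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,                      # delete symbol_a
                current[j - 1] + 1,                   # insert symbol_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Hypothetical direction strings for two similar outlines.
print(edit_distance("2244", "234"))
print(edit_distance("24", "24"))
```

A small distance between two direction strings suggests similar outlines, and a distance of 0 means the strings are identical.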

4.3 Aggregation of Paths

We classify object shapes in vector images into three types: primitive shapes, unit shapes, and composite shapes. Primitive shapes are the shapes predefined by the system; examples include the rectangle, circle, ellipse, and line. A unit shape consists of one path data sequence; an example is the "roof" shape shown in Figure 1 (a) and (b). Composite shapes are derived shapes: a composite shape is aggregated from primitive shapes, unit shapes, other composite shapes, or any combination of them. For example, as Figure 1 (a) and (b) show, the composite shape "door" consists of two primitive shapes, a circle and a rectangle. The composite shape "window" consists of five primitive shapes (rectangles), while "house" consists of a primitive shape (pentagon), a unit shape (roof), and two composite shapes (door and window).

In a composite object, there may be spatial relationships among the component objects. For example, in Figure 1 (a), the composite object "house" contains "door" and "window", and the object "door" is to the west of the object "window".

4.4 Spatial Relationship of Composite Objects

In Figure 1 (a), the shapes roof, door, and window are aggregated to constitute a composite object, house. Likewise, the shape car is a composite object consisting of wheels and a body. These two composite objects, house and car, have a spatial relationship: car is to the west of house. As objects are aggregated into a composite object, the MBRs (minimum bounding rectangles) of the components are aggregated to form the MBR of the composite object. The MBRs of composite objects in turn have spatial relationships with each other. The method for identifying the relationships among composite objects is the same as the one for primitive objects.
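The aggregation of MBRs and a direction test between them can be sketched as follows. The paper does not spell out how a direction relationship is decided, so the rule used here is an assumption: object A is west of object B when A's MBR lies entirely to the left of B's MBR, and similarly for the other directions. All coordinates and names are hypothetical and are not measured from Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned bounding rectangle in SVG coordinates (y grows downward)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def aggregate(mbrs):
    """MBR of a composite object: the smallest rectangle covering all parts."""
    return MBR(
        min(m.min_x for m in mbrs),
        min(m.min_y for m in mbrs),
        max(m.max_x for m in mbrs),
        max(m.max_y for m in mbrs),
    )


def direction_of(a, b):
    """Direction of a relative to b under the assumed projection rule."""
    horizontal = "west" if a.max_x <= b.min_x else "east" if a.min_x >= b.max_x else ""
    vertical = "north" if a.max_y <= b.min_y else "south" if a.min_y >= b.max_y else ""
    return (vertical + horizontal) or "overlapping"


# Hypothetical component MBRs for a house and a car.
roof = MBR(100, 10, 200, 50)
door = MBR(120, 70, 140, 100)
window = MBR(160, 60, 185, 80)
house = aggregate([roof, door, window])

body = MBR(10, 70, 70, 90)
wheels = [MBR(15, 90, 30, 100), MBR(50, 90, 65, 100)]
car = aggregate([body] + wheels)

print(house)
print(direction_of(door, window), direction_of(car, house))
```

The same `direction_of` function is applied to component MBRs and to aggregated MBRs, mirroring the statement that composite objects are handled the same way as primitive objects.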

5. Metadata in XML

The extracted and aggregated metadata need to be stored and managed. Since the metadata are extracted from SVG data, which belongs to the XML family, they are naturally represented in XML; XML is also a promising and flexible technology for describing semi-structured data. Path sequences in SVG can be recognized if their descriptions are stored in the XML instance, as shown in Figure 2. If they are not stored, there are three approaches to identifying path sequences: 1) a user-interactive approach, in which a shape description is given by users; 2) a data mining approach, in which a shape description is identified if the shape is similar to any known image; and 3) a mixed approach, in which a data mining technique suggests an image shape but the description may be modified by users.

Figure 2. XML Instance for Image Metadata
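As a rough illustration of how such a metadata instance might be produced, the following Python sketch builds an XML document from shape descriptions and spatial relationships. Every element name, attribute name, and value here (`imageMetadata`, `shape`, `directionSequence`, `component`, `relation`, and the file name) is a hypothetical choice made for this sketch and is not taken from Figure 2.

```python
import xml.etree.ElementTree as ET


def build_metadata(image_name, shapes, relations):
    """Build an XML metadata tree; all tag and attribute names are hypothetical."""
    root = ET.Element("imageMetadata", {"source": image_name})
    for shape in shapes:
        element = ET.SubElement(
            root, "shape", {"name": shape["name"], "type": shape["type"]}
        )
        # Unit shapes carry the string derived from their path data sequence.
        if "sequence" in shape:
            ET.SubElement(element, "directionSequence").text = shape["sequence"]
        # Composite shapes list the shapes they are aggregated from.
        for part in shape.get("parts", []):
            ET.SubElement(element, "component", {"ref": part})
    for subject, relation, obj in relations:
        ET.SubElement(
            root, "relation", {"subject": subject, "type": relation, "object": obj}
        )
    return root


metadata = build_metadata(
    "example.svg",
    shapes=[
        {"name": "roof", "type": "unit", "sequence": "24"},
        {"name": "house", "type": "composite", "parts": ["roof", "door", "window"]},
    ],
    relations=[("door", "west", "window")],
)
xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```

Because the result is ordinary XML, a shape can be looked up by tag and attribute matching, for example with the path expression `shape[@name='roof']`.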

6. Conclusions

We proposed a new shape description mechanism for SVG documents. With these shape descriptors and spatial relationships, we can represent vector images as XML metadata and therefore search structurally sensitive multimedia data efficiently.


  1. Diane Greene. An implementation and performance analysis of spatial data access. In Proceedings of the ACM SIGMOD, 1989.
  2. Yong Rui, Kaushik Chakrabarti, Sharad Mehrotra, Yunxin Zhao, and Thomas S. Huang. Dynamic [...]
  3. [...] D. Franzosa. Point-set topological spatial relations. International Journal of Geographical Information Systems, 5(2):161-174, 1991.
  4. Essam A. El-Kwae and Mansur R. Kabuka. A robust framework for content-based retrieval by spatial similarity in image databases. ACM Transactions on Information Systems, 17(2):174-198, April 1999.