We propose a description logic approach to the semantic retrieval by content of graphics described in Scalable Vector Graphics (SVG), the novel XML based W3C approved standard language for describing twodimensional graphics. We provide a syntax to describe basic shapes, complex objects as compositions of basic ones, and transformations. An extensional semantics, which is compositional, is also introduced for defining retrieval, classification, and subsumption services. Using our logical approach as a formal specification, we implemented a prototype system. A set of experiments, carried out on a testbed of SVG documents to assess the retrieval capabilities of the system, is presented.
SVG, image retrieval, multimedia
Syntax. Adapting the formalism originally introduced in [2], the syntax of our language is defined through basic shapes, position of shapes, and geometric transformations. Basic shapes we consider are those proposed by SVG standard, namely circle, rectangle, ellipse, line, polyline, polygon, described through their attributes and transformations applied in the SVG language. Formally, basic shapes are denoted with the letter B, and have a contour e(B) characterizing them. We assume that e(B) is described in a space whose origin coincides with the centroid of B. The possible transformations are those defined in SVG. We globally denote a transformation as t. Transformations can be composed in sequences t_{1}°...° t_{n}, and they form a mathematical group. To make the syntax uniform, we consider also the pose of a basic shape as a transformation.
The basic building block of our syntax is a basic shape component B[c,t], which represents a basic shape B with color c and contour t(e(B)). With t(e(B)) we denote the pointwise transformation t of the whole contour of B. Composite shape descriptions are conjunctions of basic shape components denoted as
C = B_{1}[c_{1},t_{1}] ⊓¼ ⊓ B_{n}[c_{n},t_{n}]Semantics. The semantics of our language is defined as an interpretation of the syntactic elements of the language on a domain. The domain D is a set of SVG fragments and the interpretation of an object composed by basic shapes is a subset of the domain D, in which the basic shapes have the proper features. Hence, also a collection of SVG fragments is a domain of interpretation, and a complex shape C is a subset of such a domain  the fragments to be retrieved from the collection when C is viewed as a query. Formally, an interpretation is a pair (Á,D), where D is a set of SVG fragments, and Á is a mapping from shapes and components to subsets of D. We identify each fragment F with the set of shapes {s1,...,sn} it is composed by. Each shape s comes with its own edge contour e(s). A SVG fragment F ∈ ∆ belongs to the interpretation of a basic shape component B[t]^{Á} if F contains a shape whose contour matches t(e(B)). The definition can be extended to approximate recognition as follows. Recall that the characteristic function f_{S} of a set S is a function whose value is either 1 or 0; f_{S}(x) = 1 if x ∈ S, f_{S}(x) = 0 otherwise. We consider now the characteristic function of the set B[t]^{Á}.
Let F be a SVG fragment; if F belongs to B[t]^{Á}, then the characteristic function computed on F has value 1, otherwise it has value 0. To keep the number of symbols low, we use the expression B[t]^{Á} also to denote the characteristic function (with an argument (F) to distinguish it from the set).
B[t]^{Á}(F) =  ì í î 

B[t]^{Á}(F)
=max{sim(e(s),t(e(B))) }
s Î
F
Note that sim depends on transformations, since we are looking for shapes in F whose contour matches e(B), with reference to the position and size specified by t. The interpretation of basic shapes, instead, includes a transformation invariant recognition. We define the interpretation of a basic shape in the approximate recognition as the function
The maximization over all possible transformations maxt can be effectively computed by using a similarity measure simss that is invariant with reference to transformations. In this way, a basic shape B can be used as a query to retrieve all fragments from ∆ which are in B^{Á}. Composite shape descriptions are interpreted as sets of SVG fragments that contain all components of the composite shape. Components can be anywhere in the fragment description, as long as they are in the described arrangement relative to each other. Let C be a composite shape description B_{1}[t_{1}] ⊓.. ⊓ B_{n}[t_{n} ]. For approximate matching the interpretation of a composite shape is:
C^{Á}(F) =max min {{
B_{i}[(t°
t_{i})]^{Á}(F)}}
(1)
t
i=1,..,n
2. SYSTEM AND EXPERIMENTS
The test data set consists of a collection of 48 SVG documents picturing chemical structures, electronic circuits and general subject graphics not concerning a specific theme; a sample of them is shown in Figure 1. We selected from the test data set 20 documents to be used as queries. We separately asked two volunteers to classify in decreasing order, according to their judgment, the 48 documents based on their similarity to each SVG graphic of the selected query set. The volunteers had never used the system and they were only briefly instructed that rank orderings had to be based on the degree of conformance of the database documents with the queries. They were allowed to group documents when considered equivalent in content, and for each query, to discard documents that were judged wholly dissimilar from the query.
Having obtained two classifications, which were not univocal, we created the final ranking merging the previous similarity rankings according to a minimum ranking criterion. The final ranking of each document with respect to a query was determined as the minimum one among the two available.
Then we submitted the same set of 20 queries to the system, whose knowledge base was loaded only with the 48 documents of the test set. The resulting classification gave us what was called a systemprovided ranking. Again following the framework in [3], we adopted the R_{norm} as quality measure of the retrieval effectiveness. R_{norm} measure was first introduced in the LIVEProject [1] for the evaluation of textual information retrieval systems and it has been widely used int experiments on image retrieval based on spatial relationships.
R_{norm} values are in the range [0,1]; a value of 1 corresponds to a systemprovided ordering of the database images that is either identical to the one provided by the human experts or has a higher degree of resolution, lower values correspond to a proportional disagreement between the two.
Figure [3] shows results for queries shown in Figure [2] the final average R_{norm}=0.967. R_{norm} results are nearly always equal to 1, this means that the system provided results are almost always similar to userprovided ones.
Figure 2: The set of tested queries.
Figure 3: Detail of Rnorm values.
In order to provide also other typical information retrieval measures we report here results in terms of precision and recall, determined using the same users' provided rankings. Average measures are the following ones: Precision=0.95 Recall=0.87.
It is noteworthy that we also carried out a simple test using a fulltext retrieval tool. Element types were included in the indexed terms, while various other tags were considered stopwords. We indexed the same SVG documents and carried out retrieval with the same queries previously used and a cutoff at 6 documents. Average precision and recall were the following ones: Precision=0.58 Recall=0.54.