Integrating Content and Structure into a Comprehensive Framework for XML Document Similarity Represented in 3D Space
- Eric Draken
- Tamer N. Jarada
- Keivan Kianmehr
- Reda Alhajj
XML is attractive for data exchange between different platforms, and the number of XML documents is rapidly increasing. This has raised the need for techniques that measure the similarity between XML documents so that they can be classified and used in a better organized way.
The idea of similarity between documents is not new. However, XML documents are richer and more informative than classical documents because they encapsulate both structure and content, whereas classical documents are characterized only by their content. Accordingly, using both the content and the structure of XML documents to assign a similarity metric is relatively new. Most of the recent algorithms proposed in the literature assign a similarity metric between 0.0 and 1.0 when comparing two XML documents. The similarity measures between multiple XML documents may be arranged in a matrix, which can then be mined to cluster closely related documents. In this chapter we present a novel way to represent XML document similarity in 3D space.
Our approach uses the characteristics of XML documents to produce a measure for use in clustering and classification techniques, information retrieval, and searching methods for XML documents. We derive a three-dimensional vector per document: two dimensions capture the document's structure and content, while the third combines both structure and content characteristics of the document. The outcome of our research allows users to intuitively visualize document similarity.
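The abstract does not say how the three coordinates are computed, so the following is only a minimal sketch of the idea of placing documents in 3D space. It assumes hypothetical per-document structure and content scores in the range 0.0 to 1.0, and it assumes the third coordinate is their mean; both the function names and that combination rule are illustrative choices, not the chapter's method.

```python
import math


def to_vector(structure_score, content_score):
    """Map a document to a 3D point.

    ASSUMPTION: the two scores are hypothetical values in [0.0, 1.0], and the
    combined third coordinate is taken as their mean purely for illustration.
    """
    combined = (structure_score + content_score) / 2
    return (structure_score, content_score, combined)


def distance(u, v):
    """Euclidean distance between two document points (smaller = more similar)."""
    return math.dist(u, v)


# Two hypothetical documents with the same structure score but different content.
doc_a = to_vector(0.5, 0.25)
doc_b = to_vector(0.5, 0.75)
gap = distance(doc_a, doc_b)
```

Points that lie close together under this distance would be candidates for the same cluster, which is what makes a 3D plot of the vectors easy to read visually.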
- Similarity measures
- 3D space
- intuitive representation
- document similarity
- platform independence
Draken, E. et al., 2011. Integrating Content and Structure into a Comprehensive Framework for XML Document Similarity Represented in 3D Space. Studies in Computational Intelligence, pp.275–287. Available at: http://dx.doi.org/10.1007/978-3-642-22913-8_13.
Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java
- Eric Draken
- Shang Gao
- Reda Alhajj
Book title: Advanced Database Query Systems: Techniques, Applications and Technologies
Chapter 12: Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java, pp. 287–303
Relational Algebra (RA) and Structured Query Language (SQL) are supposed to have a bijective relationship by having the same expressive power. That is, each operation in SQL can be mapped to one RA equivalent and vice versa. This is an essential fact because in commercial database management systems, every SQL query is translated into an equivalent RA expression, which is optimized and executed to produce the required output.
However, RA has an explicit relational division symbol (÷), whereas SQL does not have a corresponding explicit division keyword. Division is implemented using a combination of four core operations, namely cross product, difference, selection, and projection. In fact, implementing relational division in SQL requires convoluted queries with multiple nested select statements and set operations. Explicit division in relational algebra is possible when the divisor is static; however, a dynamic divisor forces the coding of the query to follow the explicit expression using the four core operators. SQL, on the other hand, does not provide any flexibility for expressing division when the divisor is static. Thus, the work described in this chapter is intended to provide an SQL expression equivalent to explicit relational algebra division (with a static divisor). In other words, the goal is to implement an SQL query rewriter in Java which takes as input a divide grammar and rewrites it to an efficient query using current SQL keywords. The developed approach could be adapted as a front-end or wrapper to an existing SQL query system. Users will be able to express explicit division in SQL, which will be translated into an equivalent expression that involves only the standard SQL keywords and structure. This will make SQL more attractive for specifying queries involving explicit division.
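To make the expansion of division into core operators concrete, here is a small sketch using the standard textbook identity R ÷ S = π_A(R) − π_A((π_A(R) × S) − R). It is written in Python over sets of tuples rather than in Java or SQL, so it illustrates the algebra only and is not the chapter's rewriter. The relation `enrolled`, the divisor `core`, and the function name `divide` are hypothetical.

```python
def divide(r, s):
    """Relational division of r(A, B) by s(B).

    r is a set of (a, b) pairs and s is a set of b values. The result is the
    set of a values that are paired in r with every b value in s.
    """
    candidates = {a for (a, _b) in r}                    # projection of r onto A
    required = {(a, b) for a in candidates for b in s}   # cross product with the divisor
    missing = required - r                               # difference: pairs r does not contain
    disqualified = {a for (a, _b) in missing}            # projection of the missing pairs onto A
    return candidates - disqualified                     # difference: keep fully matched a values


# Hypothetical data: which students are enrolled in every core course?
enrolled = {
    ("ann", "db"), ("ann", "os"),
    ("bob", "db"),
    ("cy", "db"), ("cy", "os"), ("cy", "ai"),
}
core = {"db", "os"}  # a static divisor, given directly
result = divide(enrolled, core)  # ann and cy take both core courses; bob does not
```

Selection does not appear in this sketch because the divisor is supplied directly. In SQL the same logic is usually written with nested subqueries, which is the kind of convoluted query an explicit divide keyword would hide.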
- Relational algebra
- optimized retrieval
- divide keyword
- relational division
Draken, E., Gao, S. & Alhajj, R., Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java. Advanced Database Query Systems: Techniques, Applications and Technologies, pp.287–303. Available at: http://dx.doi.org/10.4018/978-1-60960-475-2.ch012.