Scalable Query Processing

Quick links:

Interval data

Description

The interval overlap join is among the most widely used operators in temporal databases. Essentially, temporal databases store relations of explicit attributes that conform to a schema while data objects carry a validity interval. In this context, an interval overlap join identifies object pairs (or combinations) with overlapping validity. Besides temporal databases, interval joins also find application in other domains; in multi-dimensional spaces where an object can be represented as a set of intervals in a space-filling curve or in uncertain data management where uncertain values are represented as intervals (which can be paired with confidence values).

In this context, plane sweep is investigated as a competitive approach for interval joins; in brief, we proposed an optimized version of plane sweep which has the potential of performing fewer comparisons than the actual join output size. In addition, we devised a novel orthogonal parallel paradigm to benefit from modern hardware. In our follow up works, we studied temporal aggregation based on interval joins by introducing a semi-join operation and the band-join variant of the query.

Publications

  • Panagiotis Bouros, Artur Titkov, George Christodoulou, Christian Rauch and Nikos Mamoulis:
    HINT on Steroids: Batch Query Processing for Interval Data
    Proceedings of the 29th International Conference on Extending Database Technology (EDBT'24), Paestum, Italy, March 25-29, 2024 (to appear)
  • George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
    LIT: Lightning-fast In-memory Temporal Indexing
    Proceedings of the ACM on Management of Data (PACMMOD), Vol 2 No 1, February 2024
    To be presented at the 2024 ACM International Conference on Management of Data (ACM SIGMOD'24), Santiago, Chile, June 9-15, 2024
  • George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
    HINT: A Hierarchical Interval Index for Allen Relationships
    International Journal on Very Large Data Bases (VLDBJ), Vol 33, No 1, January 2024
  • George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
    HINT: A Hierarchical Index for Intervals in Main Memory
    Proceedings of the 2022 ACM International Conference on Management of Data (ACM SIGMOD), Philadelphia, PA, USA, June 12-17, 2022
  • Panagiotis Bouros, Dimitrios Tsitsigkos, Nikos Mamoulis and Manolis Terrovitis:
    In-Memory Interval joins
    International Journal on Very Large Data Bases (VLDBJ), Vol 30, No 4, July 2021
  • Panagiotis Bouros, Konstantinos Lampropoulos, Dimitrios Tsitsigkos, Nikos Mamoulis and Manolis Terrovitis:
    Band Joins for Interval Data
    Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), Copenhagen, Denmark, March 30 - April 2, 2020
  • Panagiotis Bouros and Nikos Mamoulis:
    Interval Count Semi-Joins
    Proceeding of the 21st International Conference on Extending Database Technology (EDBT), Vienna, Austria, March 26-29, 2018
  • Panagiotis Bouros and Nikos Mamoulis:
    A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins
    Proceedings of the VLDB Endowment (PVLDB), Vol 10, No 11, July 2017

Theses

  • Christian Rauch:
    Temporal Information Retrieval (working title)
    Doctorate thesis, Johannes Gutenberg University Mainz, Germany
  • Jan Raider:
    A Graphical User Interface for Managing Interval Data
    Bachelor thesis, Johannes Gutenberg University Mainz, Germany
  • George Christodoulou:
    Managing Big and Complex Data in Main Memory
    Doctorate thesis, University of Ioannina, Greece, 2023

Source code

 

Spatial data

Description

The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. Despite the vast amount of research output on the subject, the complexity of the data and the consideration modern commodity hardware, which supports parallel processing, opens the road to a number of interesting directions for future research. We briefly survey of the background in spatial join computation and outline these directions. In an effort to cover some of modern challenges, we investigates the in-memory and parallel evaluation of spatial joins, by tuning a classic partitioning based algorithm. The study shows that, compared to a straightforward implementation of the algorithm, performance can be significantly improved by properly selecting partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. The proposed parallel implementation scales gracefully with the number of threads reducing the cost of the join to at most one second even for join inputs with tens of millions of spatial objects.

Publications

  • Dimitrios Tsitsigkos, Panagiotis Bouros, Konstantinos Lampropoulos, Nikos Mamoulis and Manolis Terrovitis:
    Two-layer Space-oriented Partitioning for Non-point Data
    IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol 36, No 3, March 2024
  • Achilleas Michalopoulos, Dimitrios Tsitsigkos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
    Efficient Nearest Neighbor Queries on Non-point Data
    Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Hamburg, Germany, November 13-16, 2023
  • Dimitrios Tsitsigkos, Konstantinos Lampropoulos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
    A Two-layer Partitioning for Non-point Spatial Data
    Proceedings of the 37th IEEE International Conference on Data Engineering (ICDE), Chania, Greece, April 19-22, 2021
  • Dimitrios Tsitsigkos, Konstantinos Lampropoulos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
    A Two-level Spatial In-Memory Index
    CoRR abs/2005.08600, May 2020
  • Dimitrios Tsitsigkos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
    Parallel In-Memory Evaluation of Spatial Joins
    Proceedings of the 27th ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Chicago, Illinois, USA, November 5-8, 2019
  • Panagiotis Bouros and Nikos Mamoulis:
    Spatial Joins: What's next?
    SIGSPATIAL Special, Vol 11, No 1, March 2019

Thesis

  • Achilleas Michalopoulos:
    Partitioning and Indexing Techniques for Scalable Spatial Query Evaluation
    Doctorate thesis, University of Ioannina, Greece
  • Dimitrios Tsitsigkos:
    Join Operators for Complex Data
    Doctorate thesis, University of Ioannina, Greece

 

Text and set data

Description

Sets are ubiquitous in computer science; they model transactions, scientific data, Web data, text etc. Contemporary data management systems allow the definition of set-valued data attributes and support operations such as containment queries. Join operations are also extended to include predicates on sets such as containment and similarity.

Regarding the former, we proposed (i) an adaptive methodology that alleviates the shortcomings of the state-of-the-art method which indexes the inputs using a prefix tree and an inverted index, and (ii) a novel join paradigm which interleaves indexing with the actual join process. We also studied the parallel computation of set containment joins. Regarding the similarity predicate, we proposed the grouping technique which builds on top of the widely-adopted prefix filtering allowing to batch process objects with identical prefix. Recently, another line of our work focuses on experimentally comparing set similarity join evaluation methods both on centralized and distributed environments.

Publications

Source code

 

Spatio-textual data

Description

We introduced the Spatio-Textual Similarity Join operation (ST-SJOIN) to identify object pairs (or combinations in case of multiple inputs) which are closely located in space and have a similar textual description. This is the first work on joining objects carrying spatial and text information. Intuitively, ST-SJOIN comes as a hybrid query operator which applies a join predicate in two dimensions of the data at the same time, and therefore, it finds application in a wide range of domains. In social network analysis, ST-SJOIN can be used for social recommendations; e.g., to recommend potential friend relationships between people with common interests (modeled as a keyword set), that live or socialize in nearby areas. As another application, ST-SJOIN can improve the effectiveness of data de-duplication, i.e., the task of retrieving near-identical objects; e.g., identifying photos that picture the same object in the photo-sharing services of Flickr or Instagram.

Publications

  • Artur Titkov and Panagiotis Bouros:
    Spatially Combined Keyword Searches
    Proceedings of the 25th International Conference on Extending Database Technology (EDBT), Edinburgh, UK, March 29 - April 1, 2022
  • Panagiotis Bouros, Shen Ge and Nikos Mamoulis:
    Spatio-Textual Similarity Joins
    Proceedings of the VLDB Endowment (PVLDB), Vol 6, No 1, November 2012

Theses

  • Yazhou Pan, Bachelor thesis
    Processing spatial keyword search queries
    Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2023
  • Maximilian Detlef Zerbe, Bachelor thesis
    Spatio-textual outlier detection
    Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2023
  • Artur Titkov:
    Spatially Combined Text Searches
    Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2020
  • Shen Ge:
    Advanced analysis and join queries in multidimensional spaces
    Doctorate thesis, University of Hong Kong, China PR, 2012

 

Ranking

Description

The amount of join results are often overwhelming; as only a subset would be further examined, a top-k join operator which retrieves the best k results comes of greater practical value. Top-k joins have been extensively studied in the past but only for relational equijoins, i.e., for primitive (numerical) join attributes and the equality predicate.

To fill this gap, we introduced a generalized ranking join operator on complex data types. For example, spatial locations as join attributes can be used to recommend to the visitors of a city the best k pairs of restaurants and hotels within short distance from each other that have the highest combined ratings. Top-k join for string join attributes finds application in tasks like data integration where the score ratings from different data sources are combined, or data cleaning and de-duplication. The evaluation of relational ranking joins is challenged by the cost of accessing the data objects; however, in case of complex data types the computational cost can easily become the bottleneck. In view of this, I proposed a novel evaluation paradigm which accesses objects from the inputs in blocks to reduce the CPU cost but without compromising at the same time, the access or I/O cost.

Publications

  • Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
    Top-k Spatial Distance Joins
    International Journal on Advances of Computer Science for Geographic Information Systems (GeoInformatica), Vol 24, No 3, July 2020
  • Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
    Top-k String Similarity Joins
    Proceedings of the 32nd International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria, July 7-9, 2020
  • Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
    Efficient top-k Joins on Complex Data Types
    TR-2015-03, Department of Computer Science, HKU, Hong Kong SAR, China, December 2015
  • Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
    Efficient Top-k Spatial Distance Joins
    Proceedings of the 13th International Symposium on Spatial and Temporal Databases (SSTD), Munich, Germany, August 21-23, 2013

Thesis