Quick links:
Interval data
Description
The interval overlap join is among the most widely used operators in temporal databases. Essentially, temporal databases store relations of explicit attributes that conform to a schema while data objects carry a validity interval. In this context, an interval overlap join identifies object pairs (or combinations) with overlapping validity. Besides temporal databases, interval joins also find application in other domains; in multi-dimensional spaces where an object can be represented as a set of intervals in a space-filling curve or in uncertain data management where uncertain values are represented as intervals (which can be paired with confidence values).
In this context, plane sweep is investigated as a competitive approach for interval joins; in brief, we proposed an optimized version of plane sweep which has the potential of performing fewer comparisons than the actual join output size. In addition, we devised a novel orthogonal parallel paradigm to benefit from modern hardware. In our follow up works, we studied temporal aggregation based on interval joins by introducing a semi-join operation and the band-join variant of the query.
Publications
- Panagiotis Bouros, Artur Titkov, George Christodoulou, Christian Rauch and Nikos Mamoulis:
HINT on Steroids: Batch Query Processing for Interval Data
Proceedings of the 27th International Conference on Extending Database Technology (EDBT), Paestum, Italy, March 25-29, 2024 - George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
LIT: Lightning-fast In-memory Temporal Indexing
Proceedings of the ACM on Management of Data (PACMMOD), Vol 2 No 1, February 2024
Presented at the 2024 ACM International Conference on Management of Data (ACM SIGMOD), Santiago, Chile, June 9-15, 2024 - George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
HINT: A Hierarchical Interval Index for Allen Relationships
International Journal on Very Large Data Bases (VLDBJ), Vol 33, No 1, January 2024 - George Christodoulou, Panagiotis Bouros and Nikos Mamoulis:
HINT: A Hierarchical Index for Intervals in Main Memory
Proceedings of the 2022 ACM International Conference on Management of Data (ACM SIGMOD), Philadelphia, PA, USA, June 12-17, 2022 - Panagiotis Bouros, Dimitrios Tsitsigkos, Nikos Mamoulis and Manolis Terrovitis:
In-Memory Interval joins
International Journal on Very Large Data Bases (VLDBJ), Vol 30, No 4, July 2021 - Panagiotis Bouros, Konstantinos Lampropoulos, Dimitrios Tsitsigkos, Nikos Mamoulis and Manolis Terrovitis:
Band Joins for Interval Data
Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), Copenhagen, Denmark, March 30 - April 2, 2020 - Panagiotis Bouros and Nikos Mamoulis:
Interval Count Semi-Joins
Proceeding of the 21st International Conference on Extending Database Technology (EDBT), Vienna, Austria, March 26-29, 2018 - Panagiotis Bouros and Nikos Mamoulis:
A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins
Proceedings of the VLDB Endowment (PVLDB), Vol 10, No 11, July 2017
Theses
- Christian Rauch:
Temporal Information Retrieval (working title)
Doctorate thesis, Johannes Gutenberg University Mainz, Germany - Giorgos Kotsinas:
Ranking Queries over Range Data
Master thesis, University of Ioannina, Greece, 2024
Primary supervisor: Prof. Nikos Mamoulis - Jan Raider:
A Graphical User Interface for Managing Interval Data
Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2024 - George Christodoulou:
Managing Big and Complex Data in Main Memory
Doctorate thesis, University of Ioannina, Greece, 2023
Source code
- EDBT'24, GitHub repository for batch HINT, https://github.com/pbour/batch_hint
- ACM PACMMOD'24, GitHub repository for LIT , https://github.com/GiorgosChristodoulou/LIT
- VLDBJ and ACM SIGMOD'22, GitHub repository for HINT, https://github.com/pbour/hint
- PVLDB'17, GitHub repository for interval join, https://github.com/pbour/ijoin
Spatial data
Description
The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. Despite the vast amount of research output on the subject, the complexity of the data and the consideration modern commodity hardware, which supports parallel processing, opens the road to a number of interesting directions for future research. We briefly survey of the background in spatial join computation and outline these directions. In an effort to cover some of modern challenges, we investigates the in-memory and parallel evaluation of spatial joins, by tuning a classic partitioning based algorithm. The study shows that, compared to a straightforward implementation of the algorithm, performance can be significantly improved by properly selecting partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. The proposed parallel implementation scales gracefully with the number of threads reducing the cost of the join to at most one second even for join inputs with tens of millions of spatial objects.
Publications
- Dimitrios Tsitsigkos, Panagiotis Bouros, Konstantinos Lampropoulos, Nikos Mamoulis and Manolis Terrovitis:
Two-layer Space-oriented Partitioning for Non-point Data
IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol 36, No 3, March 2024 - Achilleas Michalopoulos, Dimitrios Tsitsigkos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
Efficient Nearest Neighbor Queries on Non-point Data
Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Hamburg, Germany, November 13-16, 2023 - Dimitrios Tsitsigkos, Konstantinos Lampropoulos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
A Two-layer Partitioning for Non-point Spatial Data
Proceedings of the 37th IEEE International Conference on Data Engineering (ICDE), Chania, Greece, April 19-22, 2021 - Dimitrios Tsitsigkos, Konstantinos Lampropoulos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
A Two-level Spatial In-Memory Index
CoRR abs/2005.08600, May 2020 - Dimitrios Tsitsigkos, Panagiotis Bouros, Nikos Mamoulis and Manolis Terrovitis:
Parallel In-Memory Evaluation of Spatial Joins
Proceedings of the 27th ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Chicago, Illinois, USA, November 5-8, 2019 - Panagiotis Bouros and Nikos Mamoulis:
Spatial Joins: What's next?
SIGSPATIAL Special, Vol 11, No 1, March 2019
Thesis
- Achilleas Michalopoulos:
Partitioning and Indexing Techniques for Scalable Spatial Query Evaluation
Doctorate thesis, University of Ioannina, Greece - Dimitrios Tsitsigkos:
Join Operators for Complex Data
Doctorate thesis, University of Ioannina, Greece
Text and set data
Description
Sets are ubiquitous in computer science; they model transactions, scientific data, Web data, text etc. Contemporary data management systems allow the definition of set-valued data attributes and support operations such as containment queries. Join operations are also extended to include predicates on sets such as containment and similarity.
Regarding the former, we proposed (i) an adaptive methodology that alleviates the shortcomings of the state-of-the-art method which indexes the inputs using a prefix tree and an inverted index, and (ii) a novel join paradigm which interleaves indexing with the actual join process. We also studied the parallel computation of set containment joins. Regarding the similarity predicate, we proposed the grouping technique which builds on top of the widely-adopted prefix filtering allowing to batch process objects with identical prefix. Recently, another line of our work focuses on experimentally comparing set similarity join evaluation methods both on centralized and distributed environments.
Publications
- Fabian Fier, Nikolaus Augsten, Panagiotis Bouros, Ulf Leser and Johann-Christoph Freytag:
Set Similarity Joins on MapReduce: An Experimental Survey
Proceedings of the VLDB Endowment (PVLDB), Vol 11, No 10, May 2018 - Panagiotis Bouros, Nikos Mamoulis, Shen Ge and Manolis Terrovitis:
Set Containment Join Revisited
International Journal on Knowledge and Information Systems (KAIS), Vol 49, No 1, October, 2016 - Willi Mann, Nikolaus Augsten and Panagiotis Bouros:
An Empirical Evaluation of Set Similarity Join Techniques
Proceedings of the VLDB Endowment (PVLDB), Vol 9, No 9, May 2016 - Anja Kunkel, Astrid Rheinländer, Christopher Schiefer, Sven Helmer, Panagiotis Bouros and Ulf Leser:
PIEJoin: Towards Parallel Set Containment Joins using Prefix Tree Indexes
Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM), Budapest, Hungary, July 18-20, 2016 - Panagiotis Bouros, Shen Ge and Nikos Mamoulis:
Spatio-Textual Similarity Joins
Proceedings of the VLDB Endowment (PVLDB), Vol 6, No 1, November 2012 - Manolis Terrovitis, Panagiotis Bouros, Panos Vassiliadis, Timos Sellis and Nikos Mamoulis:
Efficient Answering of Set Containment Queries for Skewed Item Distributions
Proceedings of the 14th International Conference on Extending Database Technology (EDBT), Uppsala, Sweden, March 21-25, 2011
Source code
- PVLDB'16, ssjoins http://ssjoin.dbresearch.uni-salzburg.at
Spatio-textual data
Description
We introduced the Spatio-Textual Similarity Join operation (ST-SJOIN) to identify object pairs (or combinations in case of multiple inputs) which are closely located in space and have a similar textual description. This is the first work on joining objects carrying spatial and text information. Intuitively, ST-SJOIN comes as a hybrid query operator which applies a join predicate in two dimensions of the data at the same time, and therefore, it finds application in a wide range of domains. In social network analysis, ST-SJOIN can be used for social recommendations; e.g., to recommend potential friend relationships between people with common interests (modeled as a keyword set), that live or socialize in nearby areas. As another application, ST-SJOIN can improve the effectiveness of data de-duplication, i.e., the task of retrieving near-identical objects; e.g., identifying photos that picture the same object in the photo-sharing services of Flickr or Instagram.
Publications
- Artur Titkov and Panagiotis Bouros:
Spatially Combined Keyword Searches
Proceedings of the 25th International Conference on Extending Database Technology (EDBT), Edinburgh, UK, March 29 - April 1, 2022 - Panagiotis Bouros, Shen Ge and Nikos Mamoulis:
Spatio-Textual Similarity Joins
Proceedings of the VLDB Endowment (PVLDB), Vol 6, No 1, November 2012
Theses
- Yazhou Pan, Bachelor thesis
Processing spatial keyword search queries
Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2023 - Maximilian Detlef Zerbe, Bachelor thesis
Spatio-textual outlier detection
Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2023 - Artur Titkov:
Spatially Combined Text Searches
Bachelor thesis, Johannes Gutenberg University Mainz, Germany, 2020 - Shen Ge:
Advanced analysis and join queries in multidimensional spaces
Doctorate thesis, University of Hong Kong, China PR, 2012
Ranking
Description
The amount of join results are often overwhelming; as only a subset would be further examined, a top-k join operator which retrieves the best k results comes of greater practical value. Top-k joins have been extensively studied in the past but only for relational equijoins, i.e., for primitive (numerical) join attributes and the equality predicate.
To fill this gap, we introduced a generalized ranking join operator on complex data types. For example, spatial locations as join attributes can be used to recommend to the visitors of a city the best k pairs of restaurants and hotels within short distance from each other that have the highest combined ratings. Top-k join for string join attributes finds application in tasks like data integration where the score ratings from different data sources are combined, or data cleaning and de-duplication. The evaluation of relational ranking joins is challenged by the cost of accessing the data objects; however, in case of complex data types the computational cost can easily become the bottleneck. In view of this, I proposed a novel evaluation paradigm which accesses objects from the inputs in blocks to reduce the CPU cost but without compromising at the same time, the access or I/O cost.
Publications
- Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
Top-k Spatial Distance Joins
International Journal on Advances of Computer Science for Geographic Information Systems (GeoInformatica), Vol 24, No 3, July 2020 - Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
Top-k String Similarity Joins
Proceedings of the 32nd International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria, July 7-9, 2020 - Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
Efficient top-k Joins on Complex Data Types
TR-2015-03, Department of Computer Science, HKU, Hong Kong SAR, China, December 2015 - Shuyao Qi, Panagiotis Bouros and Nikos Mamoulis:
Efficient Top-k Spatial Distance Joins
Proceedings of the 13th International Symposium on Spatial and Temporal Databases (SSTD), Munich, Germany, August 21-23, 2013
Thesis
- Shuyao Qi:
Advanced Ranking Queries on Composite Data
Doctorate thesis, University of Hong Kong, China PR, 2016