Abstract:
Efficiently processing range queries over large, distributed data sets remains a persistent problem in modern data systems, particularly when queries span multiple features. Traditional indexes such as B-Trees, KD-Trees, and R-Trees often perform poorly in distributed or high-dimensional environments because of scalability and integration limitations. This paper proposes an AVL-tree-based Multi-Feature Query Engine framework for multi-feature range queries. Because AVL trees remain balanced, a range query takes time logarithmic in the collection size plus the number of matching records, which makes them an efficient indexing model, particularly when result sets are small. The framework is implemented in Python: it builds local AVL trees over partitioned data and maps them to a distributed environment using Hadoop's MapReduce implementation. Range criteria are applied in stages, one feature at a time, to filter large collections, which substantially reduces execution time compared with linear scans. The paper demonstrates the usefulness of balanced trees in distributed query systems and bridges the gap between in-memory indexes and big data systems.
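
As a rough illustration of the idea, the following minimal Python sketch builds an AVL tree keyed on one feature, answers a range query against it, and then applies further range criteria stage by stage to the surviving records. It is not the paper's implementation: the names `Node`, `insert`, `range_query`, `build`, and `staged_query`, the dictionary record layout, and the choice to rebuild a small tree over the survivors at each stage are all assumptions made for this example, and the distributed MapReduce layer is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    rows: list                      # all records that share this key
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def _h(n):
    return n.height if n else 0


def _fix(n):
    n.height = 1 + max(_h(n.left), _h(n.right))
    return n


def _rot_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix(y)
    return _fix(x)


def _rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix(x)
    return _fix(y)


def insert(n, key, row):
    """Insert a record under `key` and rebalance; returns the new subtree root."""
    if n is None:
        return Node(key, [row])
    if key == n.key:
        n.rows.append(row)
        return n
    if key < n.key:
        n.left = insert(n.left, key, row)
    else:
        n.right = insert(n.right, key, row)
    _fix(n)
    balance = _h(n.left) - _h(n.right)
    if balance > 1:
        if _h(n.left.left) < _h(n.left.right):    # left-right case
            n.left = _rot_left(n.left)
        return _rot_right(n)
    if balance < -1:
        if _h(n.right.right) < _h(n.right.left):  # right-left case
            n.right = _rot_right(n.right)
        return _rot_left(n)
    return n


def range_query(n, lo, hi, out):
    """Collect records with lo <= key <= hi, skipping subtrees outside the range."""
    if n is None:
        return out
    if lo < n.key:
        range_query(n.left, lo, hi, out)
    if lo <= n.key <= hi:
        out.extend(n.rows)
    if n.key < hi:
        range_query(n.right, lo, hi, out)
    return out


def build(rows, feature):
    root = None
    for r in rows:
        root = insert(root, r[feature], r)
    return root


def staged_query(rows, criteria):
    """Apply (feature, lo, hi) criteria one stage at a time (assumed strategy)."""
    for feature, lo, hi in criteria:
        rows = range_query(build(rows, feature), lo, hi, [])
    return rows


# Hypothetical records: ages 20..29, salaries 0..9000.
records = [{"id": i, "age": 20 + i, "salary": 1000 * i} for i in range(10)]
hits = staged_query(records, [("age", 22, 27), ("salary", 3000, 5000)])
print([r["id"] for r in hits])  # -> [3, 4, 5]
```

In a partitioned setting, each partition could run `staged_query` on its own records and a later step could concatenate the per-partition results; how that maps onto Hadoop mappers and reducers is left open here.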