Abstract: Executing join-aggregate queries on Big Data can incur enormous computational costs. Approximate Aggregate Query Processing Techniques (AQPTs) are therefore an attractive choice for such queries, because they incur limited computational costs. AQPTs use random sampling to execute a given join-aggregate query approximately. However, the effectiveness of random sampling is highly correlated with the number of qualifying tuples of the query. If the query has only a limited number of qualifying tuples, the approximate answer obtained by an AQPT can suffer from poor accuracy. It is therefore necessary to analyze whether a given join-aggregate query is feasible to execute with an AQPT. Such a feasibility framework has not received attention in the literature. This paper presents a feasibility framework, designed with the aid of a probabilistic model, for analyzing whether a given join-aggregate query can feasibly be executed by an AQPT. An empirical study of the proposed framework is presented, in which it demonstrates appreciable performance in terms of prediction accuracy and execution latency. The proposed framework is an advance in the field of Approximate Aggregate Query Processing Techniques: its use of a probabilistic model for feasibility analysis provides a reliable and efficient basis for executing join-aggregate queries, paving the way for more optimized and cost-effective Big Data processing.
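The dependence of sampling accuracy on the number of qualifying tuples can be illustrated with a minimal sketch. This is not the paper's framework or its probabilistic model; it assumes plain Bernoulli sampling over already-joined rows, and the names `approximate_sum` and `make_rows`, the synthetic data, and the 1% sampling rate are all hypothetical choices made for illustration.

```python
import random


def approximate_sum(rows, predicate, value, sample_rate, rng):
    """Estimate SUM(value(row)) over rows satisfying predicate.

    Each row is kept independently with probability sample_rate (Bernoulli
    sampling); the sampled sum is scaled up by 1 / sample_rate.
    Returns (estimate, number of qualifying rows that were sampled).
    """
    sampled_sum = 0.0
    qualifying_in_sample = 0
    for row in rows:
        # rng.random() is drawn for every row, so the same seed selects the
        # same rows regardless of the predicate.
        if rng.random() < sample_rate and predicate(row):
            sampled_sum += value(row)
            qualifying_in_sample += 1
    return sampled_sum / sample_rate, qualifying_in_sample


def make_rows(n):
    # Hypothetical joined rows of the form (key, amount); purely synthetic.
    return [(i, float(i % 10 + 1)) for i in range(n)]


if __name__ == "__main__":
    rows = make_rows(100_000)
    broad = lambda r: r[0] % 2 == 0        # about half of the rows qualify
    narrow = lambda r: r[0] % 10_000 == 0  # only 10 rows qualify
    for name, pred in (("broad", broad), ("narrow", narrow)):
        exact = sum(r[1] for r in rows if pred(r))
        est, hits = approximate_sum(
            rows, pred, lambda r: r[1], 0.01, random.Random(7)
        )
        print(name, "exact:", exact, "estimate:", est,
              "sampled qualifying rows:", hits)
```

With a broad predicate, a 1% sample still contains several hundred qualifying rows, so the scaled estimate tends to be close to the exact sum. With the narrow predicate, the sample often contains no qualifying row at all, which is the situation a feasibility check is meant to detect before the query is run.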
Abstract: Manzala Lake is one of the largest natural resources in Egypt. This research aims to assess ecosystem changes in the lake region by identifying the dynamic change patterns of, and interactions between, three main factors: urban development, the lake's ecological system, and the value of its ecosystem services. To this end, spatiotemporal assessment tools are proposed that use earth observation and the cloud computing capabilities of Google Earth Engine for historical data extraction and spatial image processing and analysis. The assessment tools showed a decline in the lake and agricultural land areas, at erosion rates of 3.8% and 2.5% respectively, as well as a decline in the water quality index, while fish farming and urban areas have steadily increased. The analysis also detects the change pattern of the lake's environmental services, namely fish productivity and fishermen's income. Lake encroachment was significantly associated with fish farm areas (74%) and agricultural land (17%). These findings demonstrate the advantages of the spatiotemporal assessment tools for monitoring and studying the dynamic changes and interactions within the lake's ecosystem, and for identifying the crucial factors in the lake region and the environmental protection actions suited to preserving the balance between urban development and ecosystem quality.
Abstract: The integrity of the healthcare system is compromised by collusive fraud, in which multiple individuals conspire to deceive health insurance providers. Despite the severity of this issue, current statistical and machine learning-based approaches struggle to identify fraudulent activities in health insurance claims, largely because fraudulent behaviors resemble legitimate medical visits and labeled data are scarce. To improve the accuracy of fraud detection, it is essential to incorporate domain expertise into the detection process. In collaboration with health insurance audit experts, we have developed Fraud Auditor, a novel three-stage visual analytics framework designed to detect collusive fraud in health insurance. The approach begins with an interactive module that enables users to construct a co-visit network, providing a comprehensive representation of the relationships between patient visits. Next, an enhanced community detection algorithm incorporates fraud likelihood scores to identify high-risk clusters of suspicious activity. Finally, a visual analytics interface enables users to examine, compare, and validate suspicious patient behavior through customizable visualizations that accommodate various time granularities. To evaluate the approach, we conducted a real-world case study in a healthcare setting, aimed at distinguishing actual fraudulent groups from false positives. The results, corroborated by expert feedback, demonstrate the effectiveness and usability of our methodology in supporting fraud detection and mitigation efforts.
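The first stage, constructing a co-visit network, might look roughly like the sketch below. The abstract does not define a co-visit, so this assumes one plausible reading: two patients are linked whenever they visit the same provider on the same day, and the edge weight counts how often that happens. The function name `build_covisit_network` and the record layout `(patient_id, provider_id, day)` are hypothetical and are not taken from Fraud Auditor.

```python
from collections import defaultdict
from itertools import combinations


def build_covisit_network(visits):
    """Build a weighted co-visit network from visit records.

    visits: iterable of (patient_id, provider_id, day) tuples.
    Assumption: a co-visit means two patients saw the same provider on the
    same day. Returns a dict mapping a sorted patient pair (a, b) to the
    number of co-visits they share.
    """
    # Group the patients seen by each provider on each day.
    patients_by_slot = defaultdict(set)
    for patient, provider, day in visits:
        patients_by_slot[(provider, day)].add(patient)

    # Every pair of patients in the same group shares one co-visit.
    weights = defaultdict(int)
    for patients in patients_by_slot.values():
        for a, b in combinations(sorted(patients), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    visits = [
        ("p1", "h1", "2024-01-01"),
        ("p2", "h1", "2024-01-01"),
        ("p1", "h1", "2024-01-02"),
        ("p2", "h1", "2024-01-02"),
        ("p3", "h2", "2024-01-02"),
    ]
    print(build_covisit_network(visits))
```

Pairs with unusually high weights are candidates for closer inspection. In the framework described above, that judgment is left to the community detection stage and to the auditors using the interface, not to a fixed threshold.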