Tracking Concept Evolution: A Graph-Based Approach to Detecting Drift in Text Data

Miss. Vaishnavi H. Jingar; Miss. Nikita R. Pa l; Miss. Kalpana K. Sable; Miss. M. A. Pa


Published in

Journal of Science Technology and Research

(Volume 6, Issue 1)

Page No: 1 - 7


Article Type: Research article

Published Date: 06/2025

Published by: Journal star

Abstract

Concept drift detection is crucial for maintaining the reliability of machine learning models, especially in dynamic environments where data distributions evolve over time. Traditional approaches to concept drift detection primarily rely on statistical methods or supervised learning techniques that often require labeled data. In this work, we propose a novel method for detecting concept drift from textual data using graph-based resolution techniques. Our approach models text streams as dynamic graphs, where nodes represent terms or concepts, and edges capture semantic relationships based on co-occurrence patterns. By monitoring changes in the graph structure, such as variations in node centrality, edge density, and community structure, we can identify shifts in underlying data distributions indicative of concept drift. We employ graph resolution metrics to quantify these structural changes and establish a drift detection mechanism without requiring labeled data. The proposed method is evaluated on multiple textual datasets, including news articles and social media streams, where evolving topics and terminology shifts frequently occur. Experimental results demonstrate that our graph-based approach effectively detects both sudden and gradual drifts, outperforming conventional statistical methods in adaptability and sensitivity to semantic evolution. Additionally, it enables interpretable drift explanations by identifying the specific terms and relationships contributing to the change. This study highlights the potential of graph resolution techniques for unsupervised concept drift detection in textual data, offering a scalable and interpretable solution for real-world applications such as sentiment analysis, topic modeling, and misinformation detection.

Keywords

Concept Drift Detection, Text Mining, Graph Resolution, Machine Learning, Natural Language Processing (NLP), Data Streams

References

[1] A. Suprem and C. Pu, "ASSED: a Framework for Identifying Physical Events Through Adaptive Social Sensor Data Filtering," in Proc. 13th ACM Int. Conf. Distributed and Event-based Systems, Darmstadt, Germany, June 24–28, 2019, pp. 115–126. doi: 10.1145/3328905.3329510.
[2] A. Bifet, R. Gavalda, G. Holmes, and B. Pfahringer, Machine Learning for Data Streams: with Practical Examples in MOA. MIT Press, 2018. doi: 10.7551/mitpress/10654.001.0001.
[3] K. Rahmani, R. Thapa, P. Tsou, S. C. Chetti, G. Barners, C. Lam, and C. F. Tso, "Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction," Int. J. Med. Info., vol. 173, May 2023, 104930. doi: 10.1016/j.ijmedinf.2022.104930.
[4] A. Bechini, A. Bondielli, P. Ducange, F. Marcelloni, and A. Renda, "Addressing Event-Driven Concept Drift in Twitter Stream: A Stance Detection Application," IEEE Access, vol. 9, pp. 77758–77770, 2021. doi: 10.1109/ACCESS.2021.3083578.
[5] A. D. Vinisky, J. P. Barddal, A. de S. Britto Jr., F. Enembreck, and H. V. A. Campos, "A case study of batch and incremental recommender systems in supermarket data under concept drifts and cold start," Expert Syst. Appl., vol. 176, February 2021, 114890. doi: 10.1016/j.eswa.2021.114890.
[6] R. Feldhans, A. Wilke, S. Heindorf, and M. H. Shaker, "Drift Detection in Text Data with Document Embeddings," in Proc. 22nd Int'l Conf. Intelligent Data Engineering and Automated Learning – IDEAL 2021, Manchester, UK, November 25–27, 2021, pp. 107–118. doi: 10.1007/978-3-030-91608-4_11.
[7] I. Frias-Blanco, J. del Campo-Avila, G. Ramos-Jimenez, R. Morales-Bueno, A. Ortiz-Diaz, and Y. Caballero-Mota, "Online and non-parametric drift detection methods based on Hoeffding's bounds," IEEE Trans. Knowl. Data Eng., vol. 27, no. 3, pp. 810–823, Mar. 2015. doi: 10.1109/TKDE.2014.2345382.
[8] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Comput. Surv., vol. 46, no. 4, Article 44, pp. 1–37, 2014. doi: 10.1145/2523813.
[9] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Proc. 17th Brazilian Symposium on AI – SBIA 2004, Sao Luis, Maranhao, Brazil, September 29–October 1, 2004. A. L. C. Bazzan and S. Labidi, Eds. Heidelberg: Springer, 2004, pp. 286–295. doi: 10.1007/978-3-540-28645-5_29.
[10] M. Baena-Garcia, J. del Campo-Avila, R. Fidalgo, A. Bifet, R. Gavalda, and R. Morales-Bueno, "Early drift detection method," in Proc. 4th Int. Workshop Knowl. Discovery from Data Streams, Berlin, Germany, September 18, 2006, pp. 77–86.


LITERATURE SURVEY
Concept drift refers to the phenomenon where the statistical properties of data change over time, affecting
the performance of machine learning models. Traditional concept drift detection methods rely on feature-based
or distributional comparisons. Recently, graph-based methods have gained attention due to their
ability to model relationships among textual elements dynamically. This survey explores studies on concept
drift detection in textual data using graph resolution techniques.
2.1. Concept Drift in Text Mining
Concept drift in text mining occurs when the meaning, usage, or distribution of words changes over time,
affecting classification and clustering models. Kifer et al. (2004) defined concept drift as changes in the
posterior distribution of classes given input features over time. In text-based applications, such drifts arise
due to language evolution, emerging trends, or shifts in topic prevalence (Gama et al., 2014).
Traditional methods for drift detection include statistical hypothesis testing (Dries & Rückert, 2009),
incremental learning models (Widmer & Kubat, 1996), and ensemble learning techniques (Losing
et al., 2018). However, these methods struggle to capture structural and relational changes in textual
data.
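To make the notion of a distributional comparison concrete, the following minimal sketch compares the term-frequency distributions of a reference window and a current window using the Jensen-Shannon divergence. This is an illustrative baseline chosen for this example, not the specific test used in any of the works cited above; the function names, the whitespace tokenization, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative term frequencies over whitespace-tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    div = 0.0
    for term in set(p) | set(q):
        pi, qi = p.get(term, 0.0), q.get(term, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            div += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * math.log2(qi / mi)
    return div


# Hypothetical windows: the vocabulary shifts from sports to health.
reference = ["the match ended in a draw", "the team won the match"]
current = ["the vaccine trial ended", "the new vaccine was approved"]

same = js_divergence(term_distribution(reference), term_distribution(reference))
shifted = js_divergence(term_distribution(reference), term_distribution(current))
```

A score of this kind reacts to changes in how often terms appear, but it is blind to changes in which terms appear together, which is the limitation the graph-based methods below aim to address.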


2.2. Graph-Based Approaches for Concept Drift Detection
Graphs offer a natural way to model relationships in textual data, where nodes represent words, phrases, or
document entities, and edges capture co-occurrence patterns. Graph resolution techniques involve
clustering, partitioning, and anomaly detection within evolving graphs to identify concept drift.
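The graph representation described above can be sketched as follows. The sketch assumes document-level co-occurrence (two terms are linked if they appear in the same document) and whitespace tokenization; both are simplifications made for illustration, and the helper names are hypothetical. Terms that never co-occur with another term do not appear as nodes.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected weighted graph: adjacency[u][v] = number of docs containing both terms."""
    adjacency = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for u, v in combinations(terms, 2):
            adjacency[u][v] += 1
            adjacency[v][u] += 1
    return adjacency


def edge_density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


def degree_centrality(adjacency):
    """Number of neighbours of each node, normalised by n - 1."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical toy corpus.
docs = ["graph drift detection", "graph drift", "topic drift"]
graph = cooccurrence_graph(docs)
density = edge_density(graph)
centrality = degree_centrality(graph)
```

In this toy corpus, "drift" co-occurs with every other term, so it receives the highest degree centrality; tracking such values across time windows is one way to monitor the structural changes discussed in this section.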
2.3. Temporal Graph Models
Dynamic graph models help track changes in textual patterns over time. Sun et al. (2007) proposed an
evolving graph-based model for text stream analysis, capturing structural changes in word associations.
Similarly, Aggarwal & Wang (2003) used graph-based clustering to detect topic drifts in document streams.
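A simple way to illustrate tracking structural change over time is to build one co-occurrence edge set per time window and compare consecutive windows. The sketch below uses the Jaccard distance between edge sets; it is a hedged illustration and not the model of Sun et al. or Aggarwal & Wang. The threshold of 0.7, the function names, and the sample windows are assumptions.

```python
from itertools import combinations


def edge_set(docs):
    """Set of unordered term pairs that co-occur in at least one document."""
    edges = set()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        edges.update(combinations(terms, 2))
    return edges


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def detect_structural_drift(windows, threshold=0.7):
    """Indices i where window i differs from window i - 1 by more than threshold."""
    if not windows:
        return []
    flagged = []
    previous = edge_set(windows[0])
    for i in range(1, len(windows)):
        current = edge_set(windows[i])
        if jaccard_distance(previous, current) > threshold:
            flagged.append(i)
        previous = current
    return flagged


# Hypothetical stream split into three windows; the topic changes in the last one.
windows = [
    ["stock market rally", "market rally"],
    ["stock market rally", "stock rally"],
    ["virus outbreak spreads", "outbreak spreads"],
]
drift_points = detect_structural_drift(windows)
```

Comparing each window only with its immediate predecessor favours sudden drift; a gradual drift would likely need a comparison against an older reference window as well.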
Community Detection for Concept Drift
Community detection in graphs helps identify shifts in topic
structures. Rossetti & Cazabet (2018) reviewed dynamic community detection algorithms, highlighting
their use in monitoring evolving text patterns. Akoglu & Faloutsos (2010) introduced anomaly detection in
dynamic graphs, identifying sudden changes in textual data representation.
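The idea of monitoring community structure can be sketched as follows. For brevity, connected components of the co-occurrence graph stand in for communities; this is a deliberately crude substitute for the dynamic community detection algorithms reviewed in the literature, and the helper names, the `min_overlap` value, and the sample documents are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(docs, min_weight=1):
    """Adjacency sets for term pairs co-occurring in at least min_weight documents."""
    weights = defaultdict(int)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for pair in combinations(terms, 2):
            weights[pair] += 1
    adjacency = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def communities(adjacency):
    """Connected components, used here as a crude proxy for communities."""
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(frozenset(component))
    return result


def unmatched_communities(old, new, min_overlap=0.5):
    """Communities in `new` whose best Jaccard match in `old` is below min_overlap."""
    changed = []
    for c in new:
        best = max((len(c & o) / len(c | o) for o in old), default=0.0)
        if best < min_overlap:
            changed.append(c)
    return changed


# Hypothetical windows: one topic persists, one is replaced.
before = ["election vote count", "football match score"]
after = ["election vote count", "vaccine trial results"]
changed = unmatched_communities(
    communities(build_graph(before)), communities(build_graph(after))
)
```

The terms inside each unmatched community give a direct, human-readable account of what changed, which is the kind of interpretable drift explanation the abstract refers to.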
2.4. Graph Embedding Methods
Graph embedding techniques, such as node2vec (Grover & Leskovec, 2016) and DeepWalk (Perozzi et al.,
2014), allow representation learning for concept drift detection. Chen et al. (2019) proposed using graph
embeddings to detect evolving word semantics, improving drift detection accuracy.
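DeepWalk learns node representations from uniform random walks over the graph, which are then treated like sentences for a skip-gram model. The sketch below shows only the walk-generation step under stated assumptions (a small hand-written adjacency map, a fixed seed, hypothetical parameter names); the skip-gram training and any drift comparison of the resulting embeddings are not shown. node2vec differs in that it biases the walks rather than sampling neighbours uniformly.

```python
import random


def random_walks(adjacency, num_walks=2, walk_length=5, seed=0):
    """Generate uniform truncated random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adjacency[walk[-1]])
                if not neighbors:
                    break  # dead end: keep the shorter walk
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical term graph given as adjacency sets.
adjacency = {
    "virus": {"outbreak", "vaccine"},
    "outbreak": {"virus"},
    "vaccine": {"virus", "trial"},
    "trial": {"vaccine"},
}
walks = random_walks(adjacency)
```

Generating walks separately for each time window and comparing the embeddings learned from them is one possible way to apply this to drift detection, though aligning embedding spaces across windows is a separate problem not addressed here.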
2.5. Comparative Analysis and Future Directions