You can download the program in PDF.
Day 1: Monday, Dec 9, 2024 |
Sessions/Forums |
Room |
8:00 – 8:30 |
Welcome coffee break |
|
8:30 – 14:00 |
Causal Representation Learning (CRL) |
CS 3 |
8:30 – 16:00 |
Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (SENTIRE) |
CS 4 |
8:30 – 11:00 |
Responsible AI to Increase Clinical Decision Trust: Explainability & Reliability of Machine Learning Models (TRUST) |
CS 5 |
8:30 – 16:00 |
International Workshop on Data Mining for Service (DMS2024) |
CS 6 |
8:30 – 11:00 |
Workshop on AI for Financial Crime Fight (AI4FCF) |
CS 7 |
8:30 – 11:00 |
International Workshop on Spatial and Spatio-Temporal Data Mining (SSTDM) |
CS 8 |
8:30 – 11:00 |
The 2024 Workshop on Optimization Based Techniques for Emerging Data Mining Problems (OEDM) |
CS 9 |
11:00 – 11:30 |
Coffee Break |
|
11:30 – 16:00 |
Incremental Classification and Clustering, Concept Drift, Novelty Detection, Active Learning in Big/Fast data Context (IncrLearn) |
CS 5 |
11:30 – 16:00 |
International Workshop on Data Mining in Finance (DMF) |
CS 7 |
11:30 – 13:00 |
Workshop on Information Seeking with Big Models (BigIS) |
CS 8 |
11:30 – 13:00 |
Deep Learning and Clustering (DLC) |
CS 9 |
13:00 – 14:00 |
Lunch |
|
14:00 – 16:00 |
Workshop on Emerging Trends in Deep Learning for Healthcare (ETDLH) |
CS 3 |
14:00 – 16:00 |
The 11th ICDM Workshop on High Dimensional Data Mining (HDM) |
CS 8 |
14:00 – 16:00 |
Data Mining in Biomedical Informatics and Healthcare (DMBIH) |
CS 9 |
16:00 – 16:30 |
Coffee Break |
|
16:30 – 19:00 |
Demo Track |
CS 5 |
16:30 – 19:00 |
International Workshop on Multimodal Content Analysis for Social Good (MM4SG) |
CS 3 |
16:30 – 19:00 |
International Workshop on Data-Centric AI (DCAI) |
CS 4 |
16:30 – 19:00 |
2nd International Workshop on Adaptable, Reliable, and Responsible Learning (ARRL) |
CS 6 |
16:30 – 18:00 |
Advances in AI-Driven Data Mining for Autonomous Systems (AIDM-AS) |
CS 7 |
16:30 – 18:00 |
Machine Learning for Cybersecurity (MLC) |
CS 8 |
16:30 – 18:00 |
International Workshop on AI for Nudging and Personalization (WAIN) |
CS 9 |
18:00 – 19:00 |
The 2nd International Workshop on User Understanding from Big Data (DMU2) |
CS 7 |
18:00 – 19:00 |
Neverending Machine Learning (NML) |
CS 8 |
18:00 – 19:00 |
Evolutionary Data Mining and Machine Learning Workshop (EDMML) |
CS 9 |
18:00 – 20:00 |
Steering Committee Meeting with ICDM 2024 & 2025 Main Organizers (By invitation only) |
TBA |
Conference Agenda
Day 2: Tuesday, Dec 10, 2024 |
08:00 – 08:45 |
Welcome coffee break |
|||||
08:45 – 09:00 |
Opening and Welcome (Eric Xing, Conference Chairs, PC Chairs, Local Chairs) |
||||||
09:00 – 10:00 |
Keynote 1: Preslav Nakov Towards Safe, Truly Open, and Factual Large Language Models Room CHB |
||||||
10:00 – 17:30 |
Tutorial 1 (5 hours in total) Causality and Large Models
Room CHB |
10:00 – 10:30 |
Coffee break |
||||
10:30 – 12:00 |
Session A1-1 |
Session A2-1 |
Session A3-1 |
Session A5-1 |
|||
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
||||
12:00 – 13:30 |
Lunch |
||||||
13:30 – 15:00 |
Session A4-1 |
Session A5-2 |
Session A6-1 |
Session A3-2 |
|||
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
||||
15:00 – 15:30 |
Coffee break |
||||||
15:30 – 17:00 |
Session A1-2 |
Session A2-2 |
Session A3-3 |
Session A6-2 |
|||
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
||||
18:00 – 20:00 |
Welcome Reception (Location: ADNEC) |
Day 3: Wednesday, Dec 11, 2024 |
08:00 – 09:00 |
Welcome coffee break |
||||
09:00 – 10:00 |
Session A1-3 |
Session A3-4 |
Session A5-3 |
Session A6-3 |
||
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
|||
10:00 – 10:30 |
Coffee break |
|||||
10:30 – 12:00 |
Women Forum Room Hive |
10:30 – 12:00 |
Session A5-4 |
Session A6-4 |
Session A1-4 |
Session A2-3 |
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room CHB |
|||
12:00 – 13:30 |
Lunch |
|||||
13:30 – 14:30 |
Keynote 2: Bernhard Schölkopf Towards causal world models and digital twins
Room CHB |
|||||
14:30 – 18:00 |
Organised desert trip |
|||||
18:30 – 20:30 |
Banquet and Award Ceremony (Location: Desert) |
Day 4: Thursday, Dec 12, 2024 |
08:00 – 09:00 |
Welcome coffee break |
||||
09:00 – 10:00 |
Keynote 3: Claudia Plant Clustering: Balancing Abstraction and Representation
Room CHB |
|||||
10:00 – 12:00 |
Tutorial 2 Hypergraph Neural Networks: An In-Depth and Step-by-Step Guide
Room CHB |
10:00 – 10:30 |
Coffee break |
|||
10:30 – 12:00 |
Session A3-5 |
Session A5-5 |
Session A6-5 |
Session A1-5 |
||
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
|||
12:00 – 13:30 |
Lunch |
|||||
13:30 – 15:30 |
Tutorial 3 Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI
Room CHB |
13:30 – 15:00 |
Session A1-6 |
Session A2-5 |
Session A3-6 |
Session A2-4 |
Room CS 5 |
Room CS 7 |
Room CS 9 |
Room Hive |
|||
15:00 – 15:30 |
Coffee break |
|||||
15:30 – 17:30 |
Panel discussion: TBA Room CHB |
|||||
17:30 |
Conference concluding remarks |
Conference paper presentations
Keynote Lecture: 60 minutes (about 45 minutes for talk and 15 minutes for Q and A)
Main conference regular paper (R): 20 minutes (about 15 minutes for talk and 5 minutes for Q and A)
Main conference short paper (S): 15 minutes (about 10 minutes for talk and 5 minutes for Q and A)
Day 2: December 10, 2024
Session A1-1 Foundations, algorithms, models, and theory of data mining
Room CS 5, 10:30-11:40
Session Chair: Xingquan (Hill) Zhu, Florida Atlantic University, xzhu3@fau.edu
10:30 |
DM306 |
Efficient Network Embedding by Approximate Equitable Partitions |
R |
Giuseppe Squillace, Mirco Tribastone, Max Tschaikowski, and Andrea Vandin |
|||
10:50 |
DM319 |
ADOD: Adaptive Density Outlier Detection |
R |
Li Qian, Jing Qian, Xin Sun, Wengang Guo, and Christian Böhm |
|||
11:10 |
DM227 |
Matrix Profile for Anomaly Detection on Multidimensional Time Series |
S |
Chin-Chia Michael Yeh, Audrey Der, Uday Singh Saini, Vivian Lai, Yan Zheng, Junpeng Wang, Xin Dai, Zhongfang Zhuang, Yujie Fan, Huiyuan Chen, Prince Aboagye, Liang Wang, Wei Zhang, and Eamonn Keogh |
|||
11:25 |
DM271 |
CL4CO: A Curriculum Training Framework for Graph-based Neural Combinatorial Optimization |
S |
Yang Liu, Chuan Zhou, Peng Zhang, Zhao Li, Shuai Zhang, Xixun Lin, and Xindong Wu |
Session A2-1 Deep learning and statistical methods for data mining
Room CS 7, 10:30-11:45
Session Chair: Flavio Giobergia, Politecnico di Torino, flavio.giobergia@polito.it
10:30 |
DM211 |
Generating Realistic Tabular Data with Large Language Model |
R |
Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, and Svetha Venkatesh
|
|||
10:50 |
DM245 |
HyperTime: A Dynamic Hypergraph Approach for Time Series Classification
|
R |
Raneen Younis and Zahra Ahmadi |
|||
11:10 |
DM301 |
Improving Time Series Encoding with Noise-Aware Self-Supervised Learning and an Efficient Encoder |
R |
Duy Nguyen Anh, Trang Tran, Hieu Pham Huy, Le Nguyen Phi, and Lam Nguyen Minh |
|||
11:30 |
DM295 |
QUCE: The Minimisation and Quantification of Path-Based Uncertainty for Generative Counterfactual Explanations |
S |
Jamie Duell, Monika Seisenberger, Hsuan Fu, and Xiuyi Fan |
Session A3-1 Mining from heterogeneous data sources
Room CS 9, 10:30-11:40
Session Chair: Yue He, Tsinghua University, heyuethu@mail.tsinghua.edu.cn
10:30 |
DM216
|
Graph Community Augmentation with GMM-based Modeling in Latent Space |
R |
Shintaro Fukushima and Kenji Yamanishi
|
|||
10:50 |
DM233
|
Solving Combinatorial Optimization Problem over Graph through QUBO Transformation and Deep Reinforcement Learning
|
R |
Tianle Pu, Chao Chen, Li Zeng, Shixuan Liu, Rui Sun, and Changjun Fan |
|||
11:10 |
DM384 |
Exploratory Combinatorial Optimization Problem Solving via Gauge Transformation |
S |
Tianle Pu, Changjun Fan, Mutian Shen, Yizhou Lu, Li Zeng, Zohar Nussinov, Chao Chen, and Zhong Liu |
|||
11:25 |
DM259
|
2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables |
S |
Yajuan Zhang, Jiahai Jiang, Yule Yan, Liang Yang, and Ping Zhang |
Session A5-1 Data mining for modelling, visualization, personalization, and recommendation
Room Hive, 10:30-11:40
Session Chair: Di Wu, Sun Yat-Sen University, China, wudi27@mail.sysu.edu.cn
10:30 |
DM410
|
Contrastive Learning for Adapting Language Model to Sequential Recommendation
|
R |
Fei-Yao Liang, Wu-Dong Xi, Xing-Xing Xing, Wei Wan, Chang-Dong Wang, Min Chen, and Mohsen Guizani
|
|||
10:50 |
DM419
|
Cross-Store Next-Basket Recommendation |
R |
Liangchen Ma, Ya Li, Zifeng Mai, Feiyao Liang, Chang-Dong Wang, Min Chen, and Mohsen Guizani |
|||
11:10 |
DM517
|
DifFaiRec: Generative Fair Recommender with Conditional Diffusion Model |
S |
Zhenhao Jiang and Jicong Fan |
|||
11:25 |
DM663
|
A Parameter Update Balancing Algorithm for Multi-task Ranking Models in Recommendation Systems |
S |
Jun Yuan, Guohao Cai, and Zhenhua Dong |
Session A4-1 Data mining systems and platforms
Room CS 5, 13:30-15:00
Session Chair: Juan Garcia, Universidad de Guayaquil, juan.garciap1@ug.edu.ec
13:30 |
DM366
|
Designing an attack-defense game: how to increase the robustness of financial transaction models via a competition |
R |
Alexey Zaytsev, Alex Natekin, Evgeni Vorsin, Valerii Smirnov, Georgii Smirnov, Oleg Sidorshin, Alexander Senin, Alexander Dudin, Maria Kovaleva, and Dmitry Berestnev |
|||
13:50 |
DM409
|
Scaling Disk Failure Prediction via Multi-Source Stream Mining |
R |
Shujie Han, Zirui Ou, Qun Huang, and Patrick P. C. Lee |
|||
14:10 |
DM455
|
APOLLO: Differential Private Online Multi-Sensor Data Prediction with Certified Performance |
R |
Honghui Xu, Wei Li, Shaoen Wu, Liang Zhao, and Zhipeng Cai |
|||
14:30 |
DM559
|
FGLBA: Enabling Highly-Effective and Stealthy Backdoor Attack on Federated Graph Learning |
S |
Qing Lu, Miao Hu, Di Wu, Yipeng Zhou, Mohsen Guizani, and Quan Z. Sheng |
|||
14:45 |
DM583 |
Enhancing Entity Alignment on Probabilistic Knowledge Graphs |
S |
Yunfei Li, Lu Chen, Chengfei Liu, Rui Zhou, and Jianxin Li |
Session A5-2 Data mining for modelling, visualization, personalization, and recommendation
Room CS 7, 13:30-15:00
Session Chair: Yejing Wang, City University of Hong Kong, yejing.wang@my.cityu.edu.hk
13:30 |
DM277
|
Transitivity-Encoded Graph Attention Networks for Complementary Item Recommendations |
R |
Jin Shang, Yang Jiao, Chenghuan Guo, Minghao Sun, Yan Gao, Jia Liu, Michinari Momma, Itetsu Taru, and Yi Sun |
|||
13:50 |
DM288
|
SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On |
R |
Ruida Wang, Raymond Chi-Wing Wong, and Weile Tan |
|||
14:10 |
DM331
|
Enhancing Embeddings Quality with Stacked Gate for Click-Through Rate Prediction |
R |
Caihong Mu, Yunfei Fang, Jialiang Zhou, and Yi Liu |
|||
14:30 |
DM241
|
Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search |
S |
Yanjing Wu, Yinfu Feng, Jian Wang, Wenji Zhou, Yunan Ye, Rong Xiao, and Jun Xiao |
|||
14:45 |
DM343
|
Exploitation or Exploration Next? User Behavior Decoupling and Emerging Intent Modeling for Next-Item Recommendation |
S |
Nengjun Zhu, Lingdan Sun, Xiangfeng Luo, Jian Cao, Qi Zhang, and Xinjiang Lu |
Session A6-1 Applications of data mining
Room CS 9, 13:30-15:00
Session Chair: Meikang Qiu, Augusta University, qiumeikang@ieee.org
13:30 |
DM254
|
Towards Efficient Ridesharing via Order-Vehicle Pre-Matching Using Attention Mechanism |
R |
Zhidan Liu, Jinye Lin, Zhiyu Xia, Chao Chen, and Kaishun Wu |
|||
13:50 |
DM270
|
DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning |
R |
Kangyang Luo, Shuai Wang, Yexuan Fu, Renrong Shao, Xiang Li, Yunshi Lan, Ming Gao, and Jinlong Shu |
|||
14:10 |
DM359
|
Debunking Fake News in Online Social Networks without Text Analysis |
R |
Xing Su, Jian Yang, Jia Wu, and Zitai Qiu |
|||
14:30 |
DM266
|
Goal-guided Generative Prompt Injection Attack on Large Language Models |
S |
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, and Xiaobo Jin |
|||
14:45 |
DM385
|
SplitSEE: A Splittable Self-supervised Framework for Single-channel EEG Representation Learning |
S |
Rikuto Kotoge, Zheng Chen, Tasuku Kimura, Yasuko Matsubara, Takufumi Yanagisawa, Haruhiko Kishima, and Yasushi Sakurai |
Session A3-2 Mining from heterogeneous data sources
Room Hive, 13:30-15:00
Session Chair: Djellel Difallah, NYU Abu Dhabi, djellel@nyu.edu
13:30 |
DM388
|
ELiCiT: Effective and Lightweight Lossy Compression of Tensors |
R |
Jihoon Ko, Taehyung Kwon, Jinhong Jung, and Kijung Shin
|
|||
13:50 |
DM393
|
LISA: Learning-Integrated Space Partitioning Framework for Traffic Accident Forecasting on Heterogeneous Spatiotemporal Data |
R |
Bang An, Xun Zhou, Amin Khezerlou, Nick Street, Jinping Guan, and Jun Luo |
|||
14:10 |
DM430
|
Emotional Synchronization for Audio-Driven Talking-Head Generation |
R |
Zhao Zhang, Yan Luo, Zhichao Zuo, Richang Hong, Yi Yang, and Meng Wang |
|||
14:30 |
DM605
|
SemiFDA: Domain Adaptation in Semi-Supervised Federated Learning |
S |
Michele Craighero, Giorgio Rossi, Beatrice Rossi, Diego Carrera, Diego Stucchi, Pasqualina Fragneto, and Giacomo Boracchi |
|||
14:45 |
DM649
|
Controllable Visit Trajectory Generation with Spatiotemporal Constraints |
S |
Haowen Lin, John Krumm, Cyrus Shahabi, and Li Xiong |
Session A1-2 Foundations, algorithms, models, and theory of data mining
Room CS 5, 15:30-17:00
Session Chair: Yuewen Sun, MBZUAI, yuewen.sun@mbzuai.ac.ae
15:30 |
DM378
|
Probabilistic Matrix Factorization-based Three-stage Label Completion for Crowdsourcing |
R |
Boyi Yang, Liangxiao Jiang, and Wenjun Zhang |
|||
15:50 |
DM413 |
HomoMGC: Homophily-enhanced Adaptive Graph Refinement for Multi-view Graph Clustering |
R |
Man-Sheng Chen, Xiao-Sha Cai, Chang-Dong Wang, Dong Huang, Min Chen, and Mohsen Guizani |
|||
16:10 |
DM442 |
GADIN: Generative Adversarial Denoise Imputation Network for Incomplete Data |
R |
Dong Li, Zhicong Liu, Mingfeng Hu, Baoyan Song, and Xiaohuan Shan |
|||
16:30 |
DM325
|
Generalized Sparse Additive Model with Unknown Link Function |
S |
Peipei Yuan, Xinge You, Hong Chen, Xuelin Zhang, and Qinmu Peng
|
|||
16:45 |
DM462
|
Towards Expressive Graph Representations for Graph Neural Networks |
S |
Chengsheng Mao, Liang Yao, and Yuan Luo |
Session A2-2 Deep learning and statistical methods for data mining
Room CS 7, 15:30-17:00
Session Chair: Evgenii Tsymbalov, Amazon, etsymbalov@gmail.com
15:30 |
DM323
|
Graph Contrastive Learning with Adversarial Structure Refinement (GCL-ASR) |
R |
Jiangwen Chen, Kou Guang, Qiyang Li, and Tan Hao
|
|||
15:50 |
DM412
|
GQ*: Towards Generalizable Deep Q-Learning for Steiner Tree in Graphs |
R |
Wei Huang, Hanchen Wang, Dong Wen, Xuefeng Chen, Wenjie Zhang, and Ying Zhang |
|||
16:10 |
DM315 |
Hierarchical Explanations for Text Classification Models: Fast and Effective |
R |
Zhenyu Nie, Zheng Xiao, Huizhang Luo, Xuan Liu, and Anthony Theodore Chronopoulos |
|||
16:30 |
DM449
|
Channel-Attentive Graph Neural Networks |
S |
Tuğrul Hasan Karabulut and İnci M. Baytaş
|
|||
16:45 |
DM580
|
Cascading Multimodal Feature Enhanced Contrast Learning for Music Recommendation |
S |
Qimeng Yang, Shijia Wang, Da Guo, Dongjin Yu, Qiang Xiao, Dongjing Wang, and Chuanjiang Luo |
Session A3-3 Mining from heterogeneous data sources
Room CS 9, 15:30-17:00
Session Chair: Guangyi Chen, MBZUAI, Guangyi.Chen@mbzuai.ac.ae
15:30 |
DM327
|
Adaptive Loss-aware Modulation for Multimedia Retrieval |
R |
Jian Zhu, Yu Cui, Zeyi Sun, Yuyang Dai, Xi Wang, Lei Liu, Cheng Luo, and Li-Rong Dai
|
|||
15:50 |
DM337
|
Towards Cross-domain Few-shot Graph Anomaly Detection |
R |
Jiazhen Chen, Sichao Fu, Zhibin Zhang, Zheng Ma, Mingbin Feng, Tony Wirjanto, and Qinmu Peng |
|||
16:10 |
DM383
|
Informative Subgraphs Aware Masked Auto-Encoder in Dynamic Graphs |
R |
Pengfei Jiao, Xinxun Zhang, Mengzhou Gao, and Tianpeng Li |
|||
16:30 |
DM320
|
A Momentum Contrastive Learning Framework for Query-POI Matching |
S |
Yuting Qiang, Jianbin Zheng, Lixia Wu, Haomin Wen, Junhong Lou, and Minhui Deng
|
|||
16:45 |
DM371
|
Multi-modal Sarcasm Detection via Dual Synergetic Perception Graph Convolutional Networks |
S |
Xingjie Zhuang and Zhixin Li |
Session A6-2 Applications of data mining
Room Hive, 15:30-17:00
Session Chair: Xun Zhou, Harbin Institute of Technology, Shenzhen, zhouxun2023@hit.edu.cn
15:30 |
DM690
|
Dual Cross-Stage Partial Learning for Enhanced Object Detection in Dehazed Images |
R |
Jinbiao Zhao, Zhao Zhang, Jiahuan Ren, Haijun Zhang, Zhongqiu Zhao, and Meng Wang |
|||
15:50 |
DM697
|
Resource2Box: Learning To Rank Resources in Distributed Search Using Box Embedding |
R |
Ulugbek Ergashev, Geon Lee, Kijung Shin, Eduard Dragut, and Weiyi Meng
|
|||
16:10 |
DM709
|
ChronoCTI: Mining Knowledge Graph of Temporal Relations among Cyberattack Actions |
R |
Md Rayhanur Rahman, Brandon Wroblewski, Quinn Matthews, Brantley Morgan, Timothy Menzies, and Laurie Williams |
|||
16:30 |
DM749
|
Addressing Delayed Feedback in Conversion Rate Prediction: A Domain Adaptation Approach |
S |
Leisheng Yu, Yanxiao Cai, Lucas Chen, Minxing Zhang, Wei-Yen Day, Li Li, Rui Chen, Soo-Hyun Choi, and Xia Hu |
|||
16:45 |
DM753
|
Hypergraph-Enhanced Contrastively Regularized Transformer for Multi-Behavior E-commerce Product Recommendation |
S |
Shuiying Liao and P. Y. Mok |
Day 3: December 11, 2024
Session A1-3 Foundations, algorithms, models, and theory of data mining
Room CS 5, 09:00-10:00
Session Chair: Mengyue Yang, Bristol University, mengyue.yang.20@ucl.ac.uk
09:00 |
DM363 |
Scalable Order-Preserving Pattern Mining |
R |
Ling Li, Wiktor Zuba, Grigorios Loukides, Solon Pissis, and Maria Matsangidou |
|||
09:20 |
DM546
|
Efficiently Manipulating Structural Graph Clustering Under Jaccard Similarity |
R |
Chuanyu Zong, Rui Fang, Meng-xiang Wang, Tao Qiu, and Anzhen Zhang |
|||
09:40 |
DM617
|
IIFE: Interaction Information Based Automated Feature Engineering |
S |
Tom Overman, Diego Klabjan, and Jean Utke |
Session A3-4 Mining from heterogeneous data sources
Room CS 7, 09:00-10:00
Session Chair: Haoxuan Li, Peking University, hxli@stu.pku.edu.cn
09:00 |
DM322
|
Adaptive Graph Neural Networks for Cold-start Multimedia Recommendation |
R |
Zhen Li, Jibin Wang, Zhuo Chen, Kun Wu, Yuanzhen Wei, and Hai Huang |
|||
09:20 |
DM482
|
EEiF: Efficient Isolated Forest with e Branches for Anomaly Detection |
R |
Yifan Zhang, Haolong Xiang, Xuyun Zhang, Xiaolong Xu, Wei Fan, Qin Zhang, and Lianyong Qi |
|||
09:40 |
DM223
|
MetaSTC: A Meta Spatio-Temporal Learning Paradigm for Traffic Flow Prediction |
S |
Kexin Xu, Zhemeng Yu, Yucen Gao, Songjian Zhang, Jun Fang, Xiaofeng Gao, and Guihai Chen |
Session A5-3 Data mining for modelling, visualization, personalization, and recommendation
Room CS 9, 09:00-10:00
Session Chair: Przemyslaw Kazienko, Wroclaw Tech, kazienko@pwr.edu.pl
09:00 |
DM438
|
Early Fire Detection based on Local Morphological Knowledge Matching |
R |
Xinzhi Wang, Mengyue Li, Nengjun Zhu, Jiayan Qian, and Zhanyi Zheng
|
|||
09:20 |
DM402
|
RecCoder: Reformulating Sequential Recommendation as Large Language Model-Based Code Completion |
R |
Kai-Huang Lai, Wudong Xi, Xingxing Xing, Wei Wan, Chang-Dong Wang, Min Chen, and Mohsen Guizani |
|||
09:40 |
DM726
|
ExoTST: Exogenous-Aware Temporal Sequence Transformer for Time Series Prediction |
S |
Kshitij Tayal, Arvind Renganathan, Xiaowei Jia, Vipin Kumar, and Dan Lu
|
Session A6-3 Applications of data mining
Room Hive, 09:00-10:00
Session Chair: Maurizio Atzori, University of Cagliari, atzori@unica.it
09:00 |
DM655
|
Financial Risk Assessment via Long-term Payment Behavior Sequence Folding |
R |
Yiran Qiao, Yateng Tang, Xiang Ao, Qi Yuan, Ziming Liu, Chen Shen, and Xuehao Zheng |
|||
09:20 |
DM743
|
Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations |
R |
Runlong Yu, Chonghao Qiu, Robert Ladwig, Paul Hanson, Yiqun Xie, Yanhua Li, and Xiaowei Jia |
|||
09:40 |
DM334
|
Interdependency Matters: Graph Alignment for Multivariate Time Series Anomaly Detection |
S |
Yuanyi Wang, Haifeng Sun, Chengsen Wang, Mengde Zhu, Wei Tang, Jingyu Wang, Qi Qi, Zirui Zhuang, and Jianxin Liao |
Session A5-4 Data mining for modelling, visualization, personalization, and recommendation
Room CS 5, 10:30-11:35
Session Chair: Parham Moradi, RMIT University, parham.moradi@rmit.edu.au
10:30 |
DM367
|
Continuous Exact Explanations of Neural Networks
|
R |
Alice Dethise and Marco Canini
|
|||
10:50 |
DM414
|
Periodic Prompt on Dynamic Heterogeneous Graph for Next Basket Recommendation |
S |
Ru-Bin Li, Man-Sheng Chen, Xin-Yu Ding, Chang-Dong Wang, Sihong Xie, Shuangyin Liu, Min Chen, and Mohsen Guizani |
|||
11:05 |
DM454
|
A Condensed Transition Graph Framework for Zero-shot Link Prediction with Large Language Models |
S |
Mingchen Li, Chen Ling, Rui Zhang, and Liang Zhao |
|||
11:20 |
DM495
|
Influence-aware Group Recommendation for Social Media Propagation |
S |
Chengkun He, Xiangmin Zhou, Chen Wang, Longbing Cao, Jie Shao, and Zahir Tari |
Session A6-4 Applications of data mining
Room CS 7, 10:30-11:40
Session Chair: Flavio Giobergia, Politecnico di Torino, flavio.giobergia@polito.it
10:30 |
DM373
|
Utilitarian Online Learning from Open-World Soft Sensing
|
R |
Heng Lian, Yu Huang, Xingquan Zhu, and Yi He |
|||
10:50 |
DM641
|
CounterFair: Group Counterfactuals for Bias Detection, Mitigation and Subgroup Identification |
R |
Alejandro Kuratomi, Zed Lee, Panayiotis Tsaparas, Guilherme Dinis Junior, Evaggelia Pitoura, Tony Lindgren, and Panagiotis Papapetrou |
|||
11:10 |
DM573
|
D-Cube: Exploiting Hyper-Features of Diffusion Model for Robust Medical Classification |
S |
Minhee Jang, Juheon Son, Thanaporn Viriyasaranon, Junho Kim, and Jang-hwan Choi |
|||
11:25 |
DM648
|
Survival Analysis with Multiple Noisy Labels |
S |
Donna Tjandra and Jenna Wiens |
Session A1-4 Foundations, algorithms, models, and theory of data mining
Room CS 9, 10:30-11:40
Session Chair: Mubarak Gwaza Abdu-Aguye, MBZUAI, Mubarak.Abdu-Aguye@mbzuai.ac.ae
10:30 |
DM488
|
Margin-bounded Confidence Scores for Out-of-Distribution Detection
|
R |
Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, and Sunil Aryal |
|||
10:50 |
DM515
|
Fast and Accurate Triangle Counting in Graph Streams Using Predictions |
R |
Cristian Boldrin and Fabio Vandin
|
|||
11:10 |
DM390
|
Accurate and Fast Estimation of Temporal Motifs using Path Sampling |
S |
Yunjie Pan, Omkar Bhalerao, C. Seshadhri, and Nishil Talati |
|||
11:25 |
DM326 |
SHADE: Deep Density-based Clustering |
S |
Anna Beer, Pascal Weber, Lukas Miklautz, Collin Leiber, Walid Durani, Christian Böhm, and Claudia Plant |
Session A2-3 Deep learning and statistical methods for data mining
Room CHB, 10:30-11:45
Session Chair: Evgenii Tsymbalov, Amazon, etsymbalov@gmail.com
10:30 |
DM461
|
Combining Self-Supervision and Privileged Information for Representation Learning from Tabular Data |
R |
Haoyu Yang, Gyorgy Simon, Michael Steinbach, Genevieve Melton, and Vipin Kumar
|
|||
10:50 |
DM510
|
Towards Dynamic University Course Timetabling Problem: An Automated Approach Augmented via Reinforcement Learning |
R |
Yanan Xiao, XiangLin Li, Lu Jiang, Pengfei Wang, Kaidi Wang, and Na Luo |
|||
11:10 |
DM591
|
HFGNN: Efficient Graph Neural Networks using Hub-Fringe Structures |
R |
Pak Lon Ip, Sheng Hui Zhang, Xue Kai Wei, Tsz Nam Chan, and Leong Hou U
|
|||
11:30 |
DM628
|
Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment |
S |
Indrajeet Ghosh, Garvit Chugh, Abu Zaher Md Faridee, and Nirmalya Roy |
Women Forum
Day 3: December 11, 2024 Women Forum 10:30 – 12:00 Room: Hive Co-chairs: Prof. Xiaochun Yang & Prof. Xiaofeng Gao |
|
10:30 |
Forum Opening |
Prof. Kun Zhang, Program Committee Co-Chair |
|
10:35 |
Warm-Up Speech (with Personal Experience Sharing) |
Prof. Xiaochun Yang & Prof. Xiaofeng Gao, Women Forum Co-Chairs |
|
10:50 |
Intelligent Knowledge Discovery—Explorations in Talent Analytics |
Dr. Ying Sun The Hong Kong University of Science and Technology, Guangzhou, China |
|
11:10 |
Breaking Barriers in Time Series Analysis |
Dr. Zahra Ahmadi Hannover Medical School, Germany |
|
11:30 |
Exploring Data Science: A Personal Journey |
Ms. Li Qian Ludwig-Maximilians-Universität München, Germany |
|
11:45 |
The Research on Machine Learning for Data Management |
Ms. Chaohong Ma Renmin University of China |
|
12:00 |
Closing Speech |
Prof. Elena Baralis, Program Committee Co-Chair |
Day 4: December 12, 2024
Session A3-5 Mining from heterogeneous data sources
Room CS 5, 10:30-11:40
Session Chair: Djellel Difallah, NYU Abu Dhabi, djellel@nyu.edu
10:30 |
DM436
|
High-Fidelity Diffusion Editor for Zero-Shot Text-Guided Video Editing |
R |
Yan Luo, Zhichao Zuo, Zhao Zhang, Zhongqiu Zhao, Haijun Zhang, and Richang Hong
|
|||
10:50 |
DM475
|
Align Along Time and Space: A Graph Latent Diffusion Model for Traffic Dynamics Prediction |
R |
Yuhang Liu, Yingxue Zhang, Xin Zhang, Yu Yang, Yiqun Xie, Sahar Ghanipoor Machiani, Yanhua Li, and Jun Luo |
|||
11:10 |
DM483
|
Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network |
S |
Zhizhong Tan, Min Hu, Bin Liu, and Guosheng Yin
|
|||
11:25 |
DM497
|
Multi-Hyperbolic Space-based Heterogeneous Graph Attention Network |
S |
Jongmin Park, Seunghoon Han, Jong-Ryul Lee, and Sungsu Lim
|
Session A5-5 Data mining for modelling, visualization, personalization, and recommendation
Room CS 7, 10:30-11:25
Session Chair: Shirui Pan, Griffith University, s.pan@griffith.edu.au
10:30 |
DM747
|
DISCO: A Hierarchical Disentangled Cognitive Diagnosis Framework for Interpretable Job Recommendation
|
R |
Xiaoshan Yu, Chuan Qin, Qi Zhang, Chen Zhu, Haiping Ma, Xingyi Zhang, and Hengshu Zhu |
|||
10:50 |
DM778
|
Bi-level User Modeling for Deep Recommender Systems |
R |
Yejing Wang, Dong Xu, Xiangyu Zhao, Zhiren Mao, Peng Xiang, Ling Yan, Yao Hu, Zijian Zhang, Xuetao Wei, and Qidong Liu
|
|||
11:10 |
DM708
|
An Explainable Recommender System by Integrating Graph Neural Networks and User Reviews |
S |
Sahar Batmani, Parham Moradi, Narges Haidari, and Mahdi Jalili |
Session A6-5 Applications of data mining
Room CS 9, 10:30-11:25
Session Chair: Ling Chen, University of Technology Sydney, ling.chen@uts.edu.au
10:30 |
DM806
|
A Learned Approach to Index Algorithm Selection |
R |
Chaohong Ma, Xiaohui Yu, Yifan Li, Aishan Maoliniyazi, and Xiaofeng Meng
|
|||
10:50 |
DM772 |
TAN: A Tripartite Alignment Network Enhancing Composed Image Retrieval with Momentum Distillation |
R |
Yongquan Wan, Erhe Yang, Cairong Yan, Guobing Zou, and Bofeng Zhang |
|||
11:10 |
DM604
|
AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models |
S |
Shuo Liu, Yao Di, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, Xiaowen Ji, and Jingping Bi |
Session A1-5 Foundations, algorithms, models, and theory of data mining
Room Hive, 10:30-11:40
Session Chair: Shaoan Xie, Carnegie Mellon University, shaoan@cmu.edu
10:30 |
DM667
|
Scalable Graph Classification via Random Walk Fingerprints |
R |
Peiyan Li, Honglian Wang, and Christian Böhm |
|||
10:50 |
DM717
|
Warm-Starting Contextual Bandits under Latent Reward Scaling |
R |
Bastian Oetomo, R. Malinga Perera, Renata Borovica-Gajic, and Benjamin I. P. Rubinstein |
|||
11:10 |
DM446
|
Constructing $\epsilon$-Constrained Sparsified $\beta^s$-Complexes using Space Partitioning Trees |
S |
Rohit Singh and Philip Wilsey |
|||
11:25 |
DM394
|
DynoGraph: Dynamic Graph Construction for Nonlinear Dimensionality Reduction |
S |
Li Qian, Claudia Plant, Yalan Qin, Jing Qian, and Christian Böhm |
Session A1-6 Foundations, algorithms, models, and theory of data mining
Room CS 5, 13:30-14:55
Session Chair: Jiuyong Li, University of South Australia, jiuyong.li@unisa.edu.au
13:30 |
DM776
|
PROMIPL: A Probabilistic Generative Model for Multi-Instance Partial-Label Learning |
R |
Yin-Fang Yang, Wei Tang, and Min-Ling Zhang |
|||
13:50 |
DM783
|
A Novel Shadow Variable Catcher for Addressing Selection Bias in Recommendation Systems |
R |
Qingfeng Chen, Boquan Wei, Debo Cheng, Jiuyong Li, Lin Liu, and Shichao Zhang
|
|||
14:10 |
DM672
|
Reducing Unfairness in Distributed Community Detection
|
S |
Hao Zhang, Malith Jayaweera, Bin Ren, Yanzhi Wang, and Sucheta Soundarajan |
|||
14:25 |
DM780
|
An Efficient Graph Autoencoder with Lightweight Desmoothing Decoder and Long-Range Modeling |
S |
Jinyong Wen, Tao Zhang, Chunxia Zhang, Shiming Xiang, and Chunhong Pan |
|||
14:40 |
DM798
|
MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model |
S |
Alexander Koebler, Ingo Thon, and Florian Buettner |
Session A2-5 Deep learning and statistical methods for data mining
Room CS 7, 13:30-15:00
Session Chair: Chuan Zhou, Peking University, zhouchuancn@pku.edu.cn
13:30 |
DM634
|
Counterfactual Brain Graph Augmentation Guided Bi-Level Contrastive Learning for Disorder Analysis |
R |
Guangwei Dong, Xuexiong Luo, Jing Du, Jia Wu, Shan Xue, Jian Yang, and Amin Beheshti |
|||
13:50 |
DM734
|
Feature Map Purification for Enhancing Adversarial Robustness of Deep Timeseries Classifiers |
R |
Mubarak Abdu-Aguye, Zaigham Zaheer, and Karthik Nandakumar
|
|||
14:10 |
DM790 |
EMIT - Event Based Masked Auto Encoding for Irregular Time Series |
R |
Hrishikesh Patel, Ruihong Qiu, Adam Irwin, Shazia Sadiq, and Sen Wang |
|||
14:30 |
DM809
|
PC3: Enhancing Concurrency in High-Conflict Transactions with Prior Cascading Control |
S |
Zhibin Wang, Jiangtao Cui, Xiyue Gao, Hui Zhang, Guiqi Ren, Yixiao Liu, Hui Li, and Kankan Zhao
|
|||
14:45 |
DM795
|
Handling Non-IID Data in Federated Learning Using Metaheuristic Optimization Techniques |
S |
Amin Birashk, Sadaf MD Halim, and Latifur Khan |
Session A3-6 Mining from heterogeneous data sources
Room CS 9, 13:30-15:00
Session Chair: Kijung Shin, KAIST, kijungs@kaist.ac.kr
13:30 |
DM713
|
Traffic Pattern Sharing for Federated Traffic Flow Prediction with Personalization |
R |
Hang Zhou, Wentao Yu, Sheng Wan, Yongxin Tong, Tianlong Gu, and Chen Gong |
|||
13:50 |
DM745
|
TROPICAL: Transformer-based Hypergraph Learning for Camouflaged Fraudsters Detection |
R |
Venus Haghighi, Behnaz Soltani, Nasrin Shabani, Jia Wu, Yang Zhang, Lina Yao, Quan Z. Sheng, and Jian Yang |
|||
14:10 |
DM760
|
MOStream: A Modular and Self-Optimizing Data Stream Clustering Algorithm |
R |
Zhengru Wang, Xin Wang, and Shuhao Zhang |
|||
14:30 |
DM467
|
Weakly-Supervised Graph Classification with Even a Single Key Subgraph Per Class |
S |
Lu Zhang, Chenbo Zhang, Jihong Guan, and Shuigeng Zhou |
|||
14:45 |
DM681
|
Graph Rhythm Network: Beyond Energy Modeling for Deep Graph Neural Networks |
S |
Yufei Jin and Xingquan Zhu |
Session A2-4 Deep learning and statistical methods for data mining
Room Hive, 13:30-15:00
Session Chair: Omkar Bhalerao, University of California, Santa Cruz, obhalera@ucsc.edu
13:30 |
DM610
|
A Bayesian Hierarchical Model for Orthogonal Tucker Decomposition with Oblivious Tensor Compression |
R |
Matthew Pietrosanu, Bei Jiang, and Linglong Kong
|
|||
13:50 |
DM611
|
Normalizing self-supervised learning for provably reliable Change Point Detection |
R |
Alexandra Bazarova, Evgenia Romanenkova, and Alexey Zaytsev |
|||
14:10 |
DM729
|
Enhancing Distribution and Label Consistency for Graph Out-of-Distribution Generalization |
S |
Song Wang, Xiaodong Yang, Rashidul Islam, Huiyuan Chen, Minghua Xu, Jundong Li, and Yiwei Cai
|
|||
14:25 |
DM741
|
CAKD: A Correlation-Aware Knowledge Distillation Framework Based on Decoupling Kullback-Leibler Divergence |
S |
Zao Zhang, Huaming Chen, Pei Ning, Nan Yang, and Dong Yuan |
|||
14:40 |
DM812
|
Rank Supervised Contrastive Learning for Time Series Classification |
S |
Qianying Ren, Dongsheng Luo, and Dongjin Song |
Tutorials
Tutorial 1: Causality and Large Models
Presenters: Haoxuan Li, Chuan Zhou, Mengyue Yang, Mingming Gong, Jun Wang, Xiao-Hua Zhou
Abstract: Our tutorial aims to explore the synergies between causality and large models, also known as “foundation models,” which have demonstrated remarkable capabilities in supporting data mining across healthcare, finance, and education. However, there are increasing concerns about the trustworthiness and interpretability of these complex “black-box” LLMs behind the promising performance in data mining domains. A growing community of researchers is turning towards a more principled framework to address these concerns, better understand the behavior of large models, and improve their reliability and interpretability. Specifically, this tutorial will focus on three directions: causal agents for decision-making, LLMs for causality, and benefiting LLMs with causality. In addition, we introduce some open challenges and potential future directions for this area. We hope this tutorial will stimulate more ideas on this topic and facilitate the development of causality-aware large models.
Duration: One whole Day
Tutorial 2: Hypergraph Neural Networks: An In-Depth and Step-by-Step Guide
Presenters: Sunwoo Kim, Soo Yong Lee, Yue Gao, Alessia Antelmi, Mirko Polato, Kijung Shin
Abstract: Higher-order interactions (HOIs) are ubiquitous in real-world networks. Investigation of deep learning for networks of HOIs, expressed as hypergraphs, has become an important agenda for the data mining and machine learning communities. Thus, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given the emerging trend, we provide a timely tutorial dedicated to HNNs. We cover the (1) inputs, (2) message passing schemes, (3) training strategies, (4) applications (e.g., recommender systems and time series analysis), and (5) open problems of HNNs. This tutorial is intended for researchers and practitioners who are interested in (hyper)graph representation learning and its applications.
Duration: Half-a-day
Tutorial 3: Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI
Abstract: As generative AI systems become more prevalent in creative fields, concerns about intellectual property rights have grown, particularly regarding the production of content that closely resembles human-created work. Recent controversies, where AI models have generated near-replicas of copyrighted material, underscore the urgency of reviewing the current copyright framework and developing methods to mitigate infringement risks. To this end, this tutorial offers a comprehensive analysis of these copyright challenges, examining them throughout the AI development life cycle and providing developers with actionable strategies. It begins by discussing the foundational goals and considerations for copyright in generative AI, followed by methods for detecting and assessing potential violations in AI outputs. Next, it introduces techniques to safeguard creative works and datasets from unauthorized replication. The tutorial also covers training methods aimed at minimising the risk of AI models reproducing protected content. Finally, it reviews the state of AI copyright regulation and suggests future research pathways to address existing gaps.
Duration: Half-a-day
Keynotes
Keynote 1: Preslav Nakov
Title: Towards Safe, Truly Open, and Factual Large Language Models
Abstract: We will discuss several initiatives towards safe, truly open, and factual large language models (LLMs). First, we will present Do-Not-Answer, a dataset for evaluating the guardrails of LLMs, which is at the core of the safety mechanisms incorporated in Jais, the world's leading open Arabic-centric foundation and instruction-tuned large language model, and Nanda, our recently released open Hindi LLM. Next, we will discuss the LLM360 initiative of MBZUAI's Institute on Foundation Models, aiming at developing fully transparent open-source LLMs. We will then examine the factuality challenges associated with large language models, and we will present some recent relevant tools for addressing these challenges developed at MBZUAI: (i) OpenFactCheck, a framework for fact-checking LLM output, for building customized fact-checking systems, and for benchmarking LLMs for factuality, (ii) LM-Polygraph, a tool for predicting an LLM's uncertainty in its output using cheap and fast uncertainty quantification techniques, and (iii) LLM-DetectAIve, a tool for machine-generated text detection.
Bio: Preslav Nakov is Professor and Department Chair for NLP at the Mohamed bin Zayed University of Artificial Intelligence. He is part of the core team at MBZUAI's Institute for Foundation Models that developed Jais, the world's best open-source Arabic-centric LLM, Nanda, the world's best Hindi model, and LLM360, the first truly open LLM. Previously, he was Principal Scientist at the Qatar Computing Research Institute, HBKU, where he led the Tanbih mega-project, developed in collaboration with MIT, which aims to limit the impact of "fake news", propaganda and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. He received his PhD degree in Computer Science from the University of California at Berkeley, supported by a Fulbright grant. He is Chair-Elect of the European Chapter of the Association for Computational Linguistics (EACL), Secretary of ACL SIGSLAV, and Secretary of the Truth and Trust Online board of trustees. Formerly, he was PC chair of ACL 2022, and President of ACL SIGLEX. He is also a member of the editorial boards of several journals including Computational Linguistics, TACL, ACM TOIS, IEEE TASL, IEEE TAC, CS&L, NLE, AI Communications, and Frontiers in AI. He authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and 250+ research papers. He received a Best Paper Award at ACM WebSci'2022, a Best Long Paper Award at CIKM'2020, a Best Resource Paper Award at EACL'2024, a Best Demo Paper Award (Honorable Mention) at ACL'2020, a Best Task Paper Award (Honorable Mention) at SemEval'2020, a Best Poster Award at SocInfo'2019, and the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer.
His research has been featured by over 100 news outlets, including Reuters, Forbes, Financial Times, CNN, Boston Globe, Aljazeera, DefenseOne, Business Insider, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.
Photo: https://mbzuai.ac.ae/study/faculty/preslav-nakov/
Keynote 2: Bernhard Schölkopf
Title: Towards causal world models and digital twins
Abstract: Research on understanding and building artificially intelligent systems has moved from symbolic approaches to statistical learning, and is now beginning to study interventional models relying on concepts of causality. Some of the hard open problems of machine learning and AI are intrinsically related to causality, and progress may require advances in our understanding of how to model and infer causality from data, as well as conceptual progress on what constitutes a causal representation and a causal world model. I will present basic concepts and thoughts, and some applications to astronomy.
Bio: Bernhard Schölkopf's scientific interests are in machine learning and causal inference. He has applied his methods to a number of different fields, ranging from biomedical problems to computational photography and astronomy. Bernhard studied physics and mathematics and earned his Ph.D. in computer science in 1997, becoming a Max Planck director in 2001. He has (co-)received the Berlin-Brandenburg Academy Prize, the Royal Society Milner Award, the Leibniz Award, the BBVA Foundation Frontiers of Knowledge Award, and the ACM/AAAI Allen Newell Award. He is a Fellow of the CIFAR Program "Learning in Machines and Brains", and a Professor at ETH Zurich. He helped start the MLSS series of Machine Learning Summer Schools. In 2023, he founded the ELLIS Institute Tuebingen, and acts as its scientific director.
Keynote 3: Claudia Plant
Title: Clustering: Balancing Abstraction and Representation
Abstract: How can we find a natural grouping of a large real-world data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous details of individual objects. But we also need a rich representation that emphasizes the key features shared by groups of objects that distinguish them from other groups of objects.
Each clustering algorithm implements a different trade-off between abstraction and representation. Classical K-means implements a high level of abstraction - details are simply averaged out - combined with a very simple representation - all clusters are Gaussians in the original data space. We will see how approaches to subspace and deep clustering support high-dimensional and complex data by allowing richer representations. However, with increasing representational expressiveness comes the need to explicitly enforce abstraction in the objective function to ensure that the resulting method performs clustering and not just representation learning. We will see how current deep clustering methods define and enforce abstraction through centroid-based and density-based clustering losses. Balancing the conflicting goals of abstraction and representation is challenging. Ideas from subspace clustering help by learning one latent space for the information that is relevant to clustering and another latent space to capture all other information in the data.
The talk ends with an outlook on future research in clustering. In my view, future methods will more adaptively balance abstraction and representation to improve performance, energy efficiency and interpretability. By automatically finding the sweet spot between abstraction and representation, the human brain is very good at clustering and other related tasks such as single-shot learning. So, there is still much to be explored.
Bio: Claudia Plant is a full professor and leader of the Data Mining and Machine Learning research group at the Faculty of Computer Science, University of Vienna, Austria. Her group focuses on new methods for exploratory data mining, e.g., clustering, anomaly detection, graph mining and matrix factorization. Many approaches relate unsupervised learning to data compression, i.e., the better the discovered patterns compress the data, the more information we have learned. Other methods rely on finding statistically independent patterns or multiple non-redundant solutions, relying on deep learning or nature-inspired concepts such as synchronization. Indexing techniques and methods for parallel hardware support exploring massive data. Claudia Plant has co-authored over 150 peer-reviewed publications, among them more than 30 contributions to KDD and ICDM and 4 Best Paper Awards. Papers on scalability aspects have appeared at SIGMOD and ICDE, and the results of interdisciplinary projects have appeared in leading application-related journals such as Bioinformatics, Cerebral Cortex and Water Research.