Call for Papers: Special Issue on Multi-Modal Large Language Representation Learning: Theory, Algorithms, and Applications

IEEE Transactions on Big Data seeks submissions for this upcoming special issue.
Submissions Due: 31 December 2025

Important Dates

  • Submission Deadline: 31 December 2025
  • First-Round Review Notification: 31 March 2026
  • Final Decision Notification: 30 June 2026
  • Tentative Publication: Mid-2026

In the era of big data, the ubiquity of multi-modal data presents significant opportunities across various domains, from multimedia applications to intelligent systems. Multi-modal learning allows models to understand and process information from multiple data sources and modalities, leading to richer, more nuanced representations of the real world. For example, virtual assistants that integrate both text and visual information can provide more precise and context-aware responses. In autonomous driving, the fusion of data from cameras, lidars, and GPS enhances the vehicle’s perception, enabling more accurate decision-making. As multi-modal data continues to proliferate, its potential to revolutionize fields such as healthcare, autonomous systems, robotics, and personalized recommendations has become increasingly apparent.

However, despite the remarkable progress in multi-modal learning, significant gaps remain in fully leveraging multi-modal data. Large language models (LLMs) such as GPT have demonstrated exceptional performance on natural language processing (NLP) tasks, and vision-language models such as LLaVA have begun to extend these capabilities beyond text, yet the ability of such models to handle and integrate multi-modal data remains limited. Extending them to encompass text, images, video, and audio requires overcoming a range of theoretical and practical challenges. The inherent complexity of multi-modal data, characterized by diverse formats, feature distributions, and semantic meanings, poses a significant barrier to developing unified models that can process it effectively.

Several key challenges still hinder the successful integration of multi-modal learning in real-world applications. One primary challenge is the effective fusion of heterogeneous data sources. Each modality often presents unique characteristics and challenges, such as differing formats or semantic structures, making seamless integration difficult. Another significant challenge is the online learning of dynamic, continuously generated multi-modal data. Unlike static datasets, real-world multi-modal data requires models that can adapt incrementally, learning from new data without the need for complete retraining. Moreover, ensuring the safety, security, and interpretability of multi-modal models is critical, especially in high-stakes domains such as healthcare, autonomous driving, and finance, where decisions based on these models can have profound impacts.
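To make the fusion challenge concrete, the sketch below combines pre-extracted text and image embeddings into a single joint representation. It is a minimal illustration only, not a method endorsed by this issue: the embeddings, dimensions, and fixed weights are hypothetical stand-ins for learned encoders and gating mechanisms, and NumPy is used purely for brevity.

```python
# Minimal late-fusion sketch: combine pre-extracted text and image
# embeddings into one joint representation. All names and values here
# are hypothetical; real systems would use learned encoders and
# learned fusion layers rather than fixed weights.
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a vector to unit length so modalities with very
    different magnitudes become comparable."""
    return x / (np.linalg.norm(x) + eps)

def fuse(text_emb: np.ndarray, image_emb: np.ndarray,
         w_text: float = 0.5, w_image: float = 0.5) -> np.ndarray:
    """Weighted concatenation of two normalized modality embeddings.

    Concatenation sidesteps the dimensionality mismatch between
    modalities; the scalar weights stand in for a learned gate.
    """
    t = w_text * l2_normalize(text_emb)
    v = w_image * l2_normalize(image_emb)
    return np.concatenate([t, v])

# Toy usage: a 4-dim "text" and a 3-dim "image" embedding with
# different scales, illustrating the heterogeneity problem.
text_emb = np.array([0.2, -1.3, 0.7, 0.1])
image_emb = np.array([12.0, 3.5, -8.1])
joint = fuse(text_emb, image_emb)
print(joint.shape)  # (7,)
```

Late fusion of normalized embeddings is only one of many strategies; early fusion, cross-attention, and co-training each trade off differently between modality alignment and computational cost, which is exactly the design space this issue invites work on.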

These challenges underscore the urgent need for advancements in multi-modal learning, particularly in the context of big data. While deep learning models, including neural networks and transformers, have shown promise in handling multi-modal data, their scalability, robustness, and real-world applicability require further exploration. There is also a pressing need to extend multi-modal learning techniques to broader application areas—ranging from healthcare and security to smart cities and industrial IoT. To address these issues, more advanced and efficient learning algorithms are required, capable of processing large, complex datasets in a reliable and scalable manner.

This special issue seeks to explore the latest advancements in multi-modal large language representation learning and address the critical challenges faced in this domain. It aims to bring together researchers and practitioners from academia and industry to share novel research findings, advanced algorithms, and successful application cases. By focusing on both theoretical foundations and practical solutions, the issue will contribute to bridging the gap between academia and industry. Furthermore, it will highlight the potential of multi-modal learning to drive innovation in diverse industries, including healthcare, e-commerce, manufacturing, and beyond. This special issue is designed to serve as a platform for advancing the state-of-the-art in multi-modal learning and promoting its adoption in real-world applications.

We invite submissions exploring advanced multi-modal large language representation learning. This special issue aims to showcase the latest research in the field, covering a broad spectrum of topics, including but not limited to:

  • Multi-modal large language representation learning models and paradigms
  • Safety and robustness in multi-modal large language representation learning: adversarial attacks, threats, and defenses
  • Distributed training techniques for multi-modal knowledge mining
  • Advanced approaches for complex multi-modal tasks, such as cross-modal retrieval and fusion
  • Multi-modal anomaly detection and its applications in various domains
  • Multi-modal learning in knowledge graph-based, social network, and IoT contexts
  • Multi-modal learning in NLP (e.g., text mining, knowledge graphs, etc.)
  • Multi-modal learning in CV (e.g., object detection, super-resolution, video-text retrieval, satellite-related applications, video tracking, etc.)
  • Advanced large language models tailored for multi-modal learning
  • Multi-modal large model theory and technology
  • Novel architectures for multi-modal large language representation learning (e.g., novel transformers, Mamba, graph neural networks, etc.)
  • Explainable AI in multi-modal graph mining and knowledge discovery
  • Multi-modal neuro-symbolic learning
  • Real-world industrial applications of multi-modal learning (e.g., social networking, e-commerce, electricity, manufacturing, etc.)
  • Comprehensive surveys and analyses of large language models and multi-modal learning

Submission Information

For author information and guidelines on submission criteria, please visit the Author Information Page. Submit papers through the IEEE Author Portal, and be sure to select the special-issue name. Manuscripts must not have been previously published or be under review elsewhere. Please submit only full papers intended for review to the portal, not abstracts; abstracts should be emailed directly to the guest editors.

In addition to submitting your paper to IEEE Transactions on Big Data, you are encouraged to upload the data related to your paper to IEEE DataPort. IEEE DataPort is IEEE’s data platform, which supports the storage and publication of datasets and provides access to thousands of research datasets. Uploading your dataset to IEEE DataPort will strengthen your paper and support research reproducibility. Your paper and dataset can be linked, providing a good opportunity to increase the citations your work receives. Data can be uploaded to IEEE DataPort before or at the time of paper submission.

Questions?

Contact the guest editors:

  • Hao Peng, Professor, Beihang University, Beijing, China, penghao@buaa.edu.cn
  • Lifang He, Associate Professor, Lehigh University, Bethlehem, PA, USA, lih319@lehigh.edu
  • Philip S. Yu, Distinguished Professor, University of Illinois at Chicago, Chicago, IL, USA, psyu@uic.edu