Special Issue of IEEE Transactions on Multimedia

“Weakly Supervised Learning for Image and Video Understanding”


Weakly supervised learning (WSL) aims to address fine-level image and video understanding tasks by learning from coarse-level human annotations. WSL is of particular importance in the big data era. It can dramatically reduce the human labor needed to annotate structured visual and multimedia data. This lets machines learn from much larger-scale data at the same annotation cost as conventional fully supervised learning methods. More importantly, in real-world application scenarios such as medical imaging, remote sensing, and audio-visual data, fine-level manual annotations are scarce and difficult to obtain. Under these circumstances, WSL-based learning frameworks can bring great benefits, particularly WSL-based multi-modality and multi-task learning frameworks.

Unfortunately, designing effective WSL systems is challenging because of two issues: “semantic unspecificity” and “instance ambiguity”. The former refers to the setting where the provided semantic label is given at the image level rather than at the level of specific instances. The latter refers to the ambiguity in determining whether a sample corresponds to a whole instance, a part of an instance, or a cluster of instances. Principled solutions to these problems remain understudied. Advanced machine learning techniques have developed rapidly, including Graph Convolutional Networks, Capsule Networks, Transformers, Generative Adversarial Networks, and Deep Reinforcement Learning models. They open new opportunities for solving the problems in WSL and for applying WSL to a wider range of vision and multimedia tasks.

This special issue aims to promote cutting-edge research in this direction and to offer a timely collection of works that benefits researchers and practitioners. We welcome high-quality original submissions that address novel theoretical and practical aspects of WSL, as well as real-world applications based on WSL approaches.


Topics of interest include, but are not limited to:

  • Multi-modality weakly supervised learning theory and framework;
  • Multi-task weakly supervised learning theory and framework;
  • Robust learning theory and framework;
  • Audio-visual learning under weak supervision;
  • Weakly supervised spatial/temporal feature learning;
  • Self-supervised learning frameworks and applications;
  • Weakly supervised learning frameworks based on Graph Convolutional Networks/Graph Neural Networks;
  • Deep Reinforcement Learning for weakly supervised learning;
  • Emerging vision and multimedia tasks with limited supervision.


Manuscript submission:

15th August 2021 (originally 15th January 2021)

Preliminary results:

15th November 2021 (originally 15th April 2021)

Revisions due:

1st January 2022 (originally 1st June 2021)


15th February 2022 (originally 15th July 2021)

Final manuscripts due:

15th March 2022 (originally 15th August 2021)

Anticipated publication:

Midyear 2022 (originally 4th quarter of 2021)


Guest Editors:

Dingwen Zhang, Xidian University

Chuang Gan, MIT and MIT-IBM Watson AI Lab

Enrico Magli, Politecnico di Torino

David Crandall, Indiana University

Junwei Han, Northwestern Polytechnical University

Fatih Porikli, Australian National University