UAVM 2024

ACM Multimedia

ACM MM 2024

Workshop on

UAVs in Multimedia: Capturing the World from a New Perspective (UAVM 2024)

Accepted papers will be published in the ACM Multimedia workshop proceedings (top 50%) and go through the same peer-review process as regular papers. Several authors will be invited to give an oral presentation.

[Accepted Workshop Proposal] [Submission Site]


  • 25/7/2024 - Challenge code is open-sourced.
  • 23/4/2024 - Challenge Platform is now available.
  • 23/4/2024 - Paper submission site is now available.
  • 22/4/2024 - CFP is released.
  • 22/4/2024 - Workshop homepage is now available.

Workshop Schedule

11:30~11:45am (GMT-4) Break

11:45am~12:00pm (GMT-4) Challenge 1st-place Winner

12:00~12:15pm (GMT-4) Challenge 2nd-place Winner

12:15~12:30pm (GMT-4) Challenge 3rd-place Winner

Important Dates

Submission of papers:

  • Workshop Papers Submission: 5 July 2024
  • Workshop Papers Notification: 30 July 2024
  • Student Travel Grants Application Deadline: 5 August 2024
  • Camera-ready Submission: 6 August 2024
  • Conference Dates: 28 October 2024 – 1 November 2024

Please note: the submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.
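AoE is the fixed offset UTC-12, so converting a date-only deadline to another time zone takes only a little arithmetic. The Python sketch below uses just the standard library and is a convenience illustration, not an official tool. The helper name `deadline_in` is hypothetical, and the sketch assumes dates are written in the "5 July 2024" style used in the list above.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is the fixed offset UTC-12: a deadline has passed
# only once 11:59 p.m. of that date has passed in every time zone.
AOE = timezone(timedelta(hours=-12), "AoE")
GMT_MINUS_4 = timezone(timedelta(hours=-4), "GMT-4")  # zone used in the workshop schedule


def deadline_in(date_str: str, tz: timezone) -> datetime:
    """Return 11:59 p.m. AoE on `date_str` (e.g. '5 July 2024') expressed in `tz`.

    Hypothetical helper for illustration; assumes an English month name.
    """
    day = datetime.strptime(date_str, "%d %B %Y")
    aoe_deadline = day.replace(hour=23, minute=59, tzinfo=AOE)
    return aoe_deadline.astimezone(tz)


if __name__ == "__main__":
    for label, date_str in [("Paper submission", "5 July 2024"),
                            ("Camera-ready", "6 August 2024")]:
        utc = deadline_in(date_str, timezone.utc)
        local = deadline_in(date_str, GMT_MINUS_4)
        print(f"{label}: {utc:%Y-%m-%d %H:%M} UTC / {local:%Y-%m-%d %H:%M} GMT-4")
```

For the 5 July 2024 paper deadline, this gives 6 July 2024 at 11:59 UTC, which is 07:59 in GMT-4.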


Unmanned Aerial Vehicles (UAVs), also known as drones, have become increasingly popular in recent years due to their ability to capture high-quality multimedia data from the sky. With the rise of multimedia applications, such as aerial photography, cinematography, and mapping, UAVs have emerged as a powerful tool for gathering rich and diverse multimedia content. This workshop aims to bring together researchers, practitioners, and enthusiasts interested in UAV multimedia to explore the latest advancements, challenges, and opportunities in this exciting field. The workshop will cover various topics related to UAV multimedia, including aerial image and video processing, machine learning for UAV data analysis, UAV swarm technology, and UAV-based multimedia applications. In the context of the ACM Multimedia conference, this workshop is highly relevant as multimedia data from UAVs is becoming an increasingly important source of content for many multimedia applications. The workshop will provide a platform for researchers to share their work and discuss potential collaborations, as well as an opportunity for practitioners to learn about the latest developments in UAV multimedia technology. Overall, this workshop will provide a unique opportunity to explore the exciting and rapidly evolving field of UAV multimedia and its potential impact on the wider multimedia community.

The list of possible topics includes, but is not limited to:

  • Video-based UAV Navigation
    • Satellite-guided & Ground-guided Navigation
    • Path Planning and Obstacle Avoidance
    • Visual SLAM (Simultaneous Localization and Mapping)
    • Sensor Fusion and Reinforcement Learning for Navigation
  • UAV Swarm Coordination
    • Multiple Platform Collaboration
    • Multi-agent Cooperation and Communication
    • Decentralized Control and Optimization
    • Distributed Perception and Mapping
  • UAV-based Object Detection and Tracking
    • Aerial-view Object Detection, Tracking and Re-identification
    • Aerial-view Action Recognition
  • UAV-based Sensing and Mapping
    • 3D Mapping and Reconstruction
    • Remote Sensing and Image Analysis
    • Disaster Response and Relief
  • UAV-based Delivery and Transportation
    • Package Delivery and Logistics
    • Safety and Regulations for UAV-based Transportation

Submission Types

Papers can be submitted on [OpenReview].

The submission template can be found on the ACM website, or you may directly use the Overleaf template.

In this workshop, we welcome three types of submissions, all of which should relate to the topics listed above:

  • (1). Position or perspective papers (up to 4 pages in length, plus 2 pages for references): original ideas, perspectives, research visions, and open challenges in the area of UAV multimedia;
  • (2). Challenge papers (up to 4 pages in length, plus 2 pages for references): original solutions to the challenge data, University160k, evaluated in terms of both effectiveness and efficiency;
  • (3). Demonstration papers (up to 2 pages in length, plus 2 pages for references): original or already published prototypes and operational evaluation approaches in the area of UAV multimedia.

Page limits include diagrams and appendices. Submissions should be single-blind, written in English, and formatted according to the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).


  • For privacy protection, please blur faces in all published materials (e.g., paper, video, poster).
  • For social good, please avoid misleading words such as “surveillance” and “secret”.


Challenge Platform is at .

We also provide a multi-weather cross-view geo-localization dataset, called University160k-WX, and welcome your participation in the competition. The motivation is to simulate real-world geo-localization scenarios. In particular, University160k extends the existing University-1652 dataset with an extra 167,486 satellite-view gallery distractors, and University160k-WX further introduces weather variants on University160k, including fog, rain, snow, and compositions of multiple weather conditions. We will release University160k-WX on our website and host a public leaderboard. The distractor satellite-view images have a size of $1024 \times 1024$ and are obtained by cropping orthophoto images of real urban and surrounding areas. Weather conditions are randomly sampled to increase the difficulty of representation learning. In our preliminary evaluation, the distractors prove challenging: they lower the Recall@1 of the competitive baseline model, LPN, from $75.93\%$ to $64.85\%$ and its AP from $79.14\%$ to $67.69\%$ on the Drone $\rightarrow$ Satellite task. If we further introduce extreme weather, Recall@1 drops further, from $64.85\%$ to $7.94\%$. We hope more participants will take on this challenge and consider robustness against extreme weather.
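Recall@1 is the fraction of drone-view queries whose top-ranked satellite image is a true match. AP also accounts for where lower-ranked true matches appear. The Python sketch below uses the common non-interpolated AP definition. It is an illustration only: the function names are hypothetical, and the official evaluation script may compute AP slightly differently.

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], positives: Set[str], k: int = 1) -> float:
    """1.0 if any true match appears among the top-k retrieved names, else 0.0."""
    return 1.0 if any(name in positives for name in ranked[:k]) else 0.0


def average_precision(ranked: Sequence[str], positives: Set[str]) -> float:
    """Non-interpolated AP: mean precision at the rank of each true match.

    True matches that never appear in `ranked` contribute zero.
    """
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)


def evaluate(rankings: Dict[str, List[str]],
             ground_truth: Dict[str, Set[str]], k: int = 1) -> Tuple[float, float]:
    """Mean Recall@k and mean AP (as percentages) over all queries."""
    queries = list(rankings)
    recall = sum(recall_at_k(rankings[q], ground_truth[q], k) for q in queries)
    ap = sum(average_precision(rankings[q], ground_truth[q]) for q in queries)
    return 100.0 * recall / len(queries), 100.0 * ap / len(queries)


if __name__ == "__main__":
    # Toy example: two queries, each with one true satellite match.
    rankings = {"q1": ["s3", "s1", "s2"], "q2": ["s2", "s3", "s1"]}
    ground_truth = {"q1": {"s1"}, "q2": {"s2"}}
    print(evaluate(rankings, ground_truth))  # (50.0, 75.0)
```

In the toy example, q1's true match sits at rank 2. That costs it Recall@1 entirely but only halves its AP. Adding distractors lowers both metrics in the same way, by giving more gallery images the chance to outrank the true match.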

Check challenge details at Section 5 in

The challenge dataset contains two parts.

  1. The basic dataset (training set) can be downloaded on request. I usually reply with the download link within 5 minutes.

  2. The name-masked test-160k-WX dataset (query & gallery+distractor) can be downloaded from OneDrive. Since only drones are expected to encounter weather conditions, we simulate weather only on the drone-view queries.
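The text does not say how the weather variants were synthesized. If you want to produce a similar effect on your own drone-view images, one widely used model for fog is atmospheric scattering: I = J * t + A * (1 - t), with transmission t = exp(-beta * d). The pure-Python sketch below applies this model to a grayscale image stored as nested lists. It is a swapped-in illustration, not necessarily how University160k-WX was generated. The function name and the values of beta, airlight A, and depth d are assumptions.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale intensities in [0, 1], indexed [row][col]


def add_fog(image: Image, depth: Image, beta: float = 1.0, airlight: float = 0.9) -> Image:
    """Apply the atmospheric scattering model I = J * t + A * (1 - t).

    `depth` gives a (relative) scene depth per pixel; transmission is
    t = exp(-beta * depth), so larger `beta` or depth means denser fog.
    Hypothetical helper; beta and airlight defaults are assumptions.
    """
    if len(image) != len(depth) or any(len(r) != len(d) for r, d in zip(image, depth)):
        raise ValueError("image and depth must have the same shape")
    fogged = []
    for img_row, depth_row in zip(image, depth):
        row = []
        for j, d in zip(img_row, depth_row):
            t = math.exp(-beta * d)  # fraction of scene radiance that survives
            row.append(j * t + airlight * (1.0 - t))
        fogged.append(row)
    return fogged


if __name__ == "__main__":
    clear = [[0.0, 0.5], [1.0, 0.2]]
    uniform_depth = [[1.0, 1.0], [1.0, 1.0]]
    for beta in (0.0, 0.5, 2.0):
        print(beta, [[round(p, 3) for p in row] for row in add_fog(clear, uniform_depth, beta)])
```

With beta = 0 the image is unchanged. As beta grows, every pixel moves toward the airlight value, so contrast between regions shrinks.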

The submission example can be found at Baseline Submission. Please zip it as to submit the result.

Please return the top-10 satellite names for each query. For example, the first query is Q3JI2tUwDkhcfip.jpeg, so the first line of answer.txt should have the following format:

e6kXgz36E8nOY2n       ioqKwvSIYYhiW2v       y4VmQPUYOMD8AH4       kpZ2QJlNBHMnbRA       xffJQs2n9DP17fg       IejrFHLQYBfce2y       cH79t5WJMEMZ3VA       W9u0j4N1nlFbI97       zDurtAW4FTJfNJ3       MuvIMNVdofmaRqG

Please return the results in the same query order as in Query TXT. The file should contain 37,855 lines.
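Below is a minimal Python sketch for producing the submission file. It makes some assumptions that are not stated above. The ten names on each line are taken to be tab-separated and written without the file extension, as the example line suggests. Query TXT is taken to list one query file name per line. The zip file name is a hypothetical placeholder. Please compare your output against Baseline Submission before uploading.

```python
import zipfile
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def read_query_order(path: Path) -> List[str]:
    """Read query file names, one per line, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def write_submission(query_order: Sequence[str], rankings: Dict[str, List[str]],
                     out_dir: Path, zip_name: str = "submission.zip",  # hypothetical name
                     sep: str = "\t",  # assumed separator; check Baseline Submission
                     expected_lines: Optional[int] = 37855) -> Path:
    """Write answer.txt (top-10 satellite names per query, in query order) and zip it."""
    if expected_lines is not None and len(query_order) != expected_lines:
        raise ValueError(f"expected {expected_lines} queries, got {len(query_order)}")
    lines = []
    for query in query_order:
        # Keep the 10 best-ranked names and drop any extension such as '.jpeg'.
        top10 = [Path(name).stem for name in rankings[query][:10]]
        if len(top10) != 10:
            raise ValueError(f"{query}: need 10 gallery names, got {len(top10)}")
        lines.append(sep.join(top10))
    answer = out_dir / "answer.txt"
    answer.write_text("\n".join(lines) + "\n")
    archive = out_dir / zip_name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(answer, arcname="answer.txt")
    return archive
```

Here `rankings` maps each query name (e.g. Q3JI2tUwDkhcfip.jpeg) to its gallery names sorted by descending similarity. The line-count check catches a truncated query list before submission.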

  • Wang, T., Zheng, Z., Sun, Y., Yan, C., Yang, Y., & Chua, T. S. (2024). Multiple-environment Self-adaptive Network for Aerial-view Geo-localization. Pattern Recognition, 152, 110363.
  • Zheng, Z., Wei, Y., & Yang, Y. (2020, October). University-1652: A multi-view multi-source benchmark for drone-based geo-localization. In Proceedings of the 28th ACM International Conference on Multimedia (pp. 1395-1403).
  • Wang, C., Zheng, Z., Quan, R., Sun, Y., & Yang, Y. (2023). Context-aware pretraining for efficient blind image decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18186-18195).

Organizing Team

Zhedong Zheng, National University of Singapore, Singapore
Yujiao Shi, ShanghaiTech University, China
Tingyu Wang, Hangzhou Dianzi University, China
Chen Chen, University of Central Florida, USA
Pengfei Zhu, Tianjin University, China
Richard Hartley, Australian National University, Australia

Conference and Journal Papers

All papers presented at ACM MM 2024 will be included in the ACM proceedings. All papers submitted to this workshop will go through the same review process as regular papers submitted to the main conference, to ensure that the contributions are of high quality.

Student Travel Funding

Please check

Workshop Citation

  title={The 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
  author={Zheng, Zhedong and Shi, Yujiao and Wang, Tingyu and Liu, Jun and Fang, Jianwu and Wei, Yunchao and Chua, Tat-seng},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia Workshop},