UAVM 2023

ACM Multimedia

ACM MM 2023 (

Workshop on

UAVs in Multimedia: Capturing the World from a New Perspective (UAVM 2023)

Accepted papers will be published in the ACM Multimedia Workshop proceedings (top 50%) and go through the same peer-review process as regular papers. Several authors will be invited to give an oral presentation.

[Accepted Workshop Proposal] [Submission Site]

Join our Google Group for important updates.


  • The 2nd workshop 2024 is at .
  • 28/10/2023 - All proceedings papers can be found on the ACM website.
  • 15/7/2023 - Challenge Open-source Code.
  • 23/4/2023 - Challenge Platform is now available.
  • 7/4/2023 - Paper submission site is now available.
  • 6/4/2023 - CFP is released.
  • 6/4/2023 - Workshop homepage is now available.

Workshop Schedule

2 Nov 9:30~10:30am (GMT-4) Nathan Jacobs (Washington University in St. Louis) (the last 10 minutes will be Q&A)

The talk slides can be found at [link].

2 Nov 10:30~11:30am (GMT-4) Rakesh Kumar (SRI International) (the last 10 minutes will be Q&A)

The talk slides can be found at [link].

2 Nov 11:30~11:45am break

2 Nov 11:45am~12:00pm (GMT-4) Challenge 1st-place Winner

2 Nov 12:00~12:15pm (GMT-4) Challenge 2nd-place Winner

2 Nov 12:15~12:30pm (GMT-4) Challenge 3rd-place Winner

Invited Speakers

Nathan Jacobs, Washington University in St. Louis
Rakesh Kumar, SRI International

Talk: Learning to Map Anything, Anywhere, Anytime (Nathan Jacobs)

Abstract: What might it sound like here? How would you describe this place? Would it be unusual to see a large mammal if I took an early morning walk? These are all questions that are inherently spatial in nature and difficult to answer precisely. This talk explores a new approach to multi-modal remote sensing that shows how we might build a system that supports answering such questions at a global scale, enabling us to understand the Earth with a level of semantic, spatial, and temporal resolution that was previously impossible.

Bio: Nathan Jacobs earned a Ph.D. in Computer Science at Washington University in St. Louis (2010). After many years at the University of Kentucky, he is currently a Professor in the Computer Science & Engineering department at Washington University in St. Louis. Dr. Jacobs’ research area is computer vision; his specialty is developing learning-based algorithms and systems for processing large-scale image collections. His current focus is on developing techniques for mining information about people and the natural world from geotagged imagery, including images from social networks, publicly available outdoor webcams, and satellites. His research has been funded by NSF, NIH, DARPA, IARPA, NGA, ARL, AFRL, and Google.

Talk: Semantically Guided Collaborative Navigation, 3D Mapping, Planning and Control for Unmanned Platforms (Rakesh Kumar)

Abstract: Classical metric navigation systems use GPS and/or prior 2D and 3D maps for localization of unmanned platforms. However, these systems do not operate in GPS-challenged or GPS-denied environments. Metric map-based navigation systems are also not robust to dynamic scene changes. In this talk, we will describe various methods we have developed to incorporate AI-derived semantic information into metric navigation for both ground and aerial robots using data from multiple sensors. Compared to low-level metric features, semantic information is more robust to scene changes over time and can be matched across time, space, and platforms. We will also discuss how 3D reference semantic maps can be built. Sharing semantic information also reduces the bandwidth required for collaboration. Moreover, it enables natural language interaction between humans and mobile platforms. Finally, we will discuss how the semantic information can be used by robots to learn to navigate in unmapped areas, much like humans are able to visit and navigate in new, never-before-visited locales. Our new approach, SayNav, leverages common-sense knowledge from Large Language Models (LLMs) for efficient generalization to complicated navigation tasks in unknown large-scale environments.

Bio: Rakesh “Teddy” Kumar, Ph.D., is Vice President, Information and Computing Sciences and Director of the Center for Vision Technologies at SRI International. In this role, he is responsible for leading research and development of innovative end-to-end vision solutions, from image capture to situational understanding, that translate into real-world applications such as robotics, intelligence extraction, and human-computer interaction. He has received the Outstanding Achievement in Technology Development award from his alma mater, University of Massachusetts Amherst, the Sarnoff President's Award, and Sarnoff Technical Achievement awards for his work in registration of multi-sensor, multi-dimensional medical images and alignment of video to three-dimensional scene models. The paper “Stable Vision-Aided Navigation for Large-Area Augmented Reality”, which he co-authored, received the best paper award at the IEEE Virtual Reality 2011 conference. The paper “Augmented Reality Binoculars”, which he co-authored, received the best paper award at the IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2013 conference. Kumar has served on NSF review and DARPA ISAT panels. He has also been an associate editor for IEEE Transactions on Pattern Analysis and Machine Intelligence. He has co-authored more than 60 research publications and received more than 50 patents. A number of spin-off companies have been created based on the research done at the Center for Vision Technologies. Kumar received his Ph.D. in Computer Science from the University of Massachusetts at Amherst in 1992. His M.S. in Electrical and Computer Engineering is from State University of New York at Buffalo in 1985, and his B.Tech in Electrical Engineering is from Indian Institute of Technology, Kanpur, India in 1983.


  1. Fabian Deuser¹, Konrad Habel¹, Martin Werner², Norbert Oswald¹ (¹University of the Bundeswehr Munich, ²Technische Universität München)

  2. Zhifeng Lin¹˒², Ranran Huang¹, Jiancheng Cai¹, Xinmin Liu¹, Changxing Ding², Zhenhua Chai¹ (¹Meituan, ²South China University of Technology)

  3. Haoran Li, Quan Chen, Zhiwen Yang, Jiong Yin (Hangzhou Dianzi University)

Challenge Open-source Code

Please check

Important Dates

Submission of papers:

  • Workshop Papers Submission: 13 July 2023 (extended from 5 July 2023)
  • Workshop Papers Notification: 30 July 2023
  • Student Travel Grants Application Deadline: 5 August 2023
  • Camera-ready Submission: 6 August 2023
  • Conference Dates: 28 October 2023 – 3 November 2023

Please note: The submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.
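Anywhere on Earth (AoE) corresponds to the UTC-12 offset, so an 11:59 p.m. AoE deadline falls on the following calendar day in UTC. The conversion can be sketched in Python as follows, using the 13 July 2023 paper submission date listed above; the helper name `aoe_deadline_to_utc` is illustrative only and not part of any workshop tooling.

```python
from datetime import datetime, timedelta, timezone

# AoE ("Anywhere on Earth") is the fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 p.m. AoE on the given date, expressed in UTC."""
    deadline_aoe = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return deadline_aoe.astimezone(timezone.utc)


# Paper submission deadline: 13 July 2023, 11:59 p.m. AoE.
paper_deadline_utc = aoe_deadline_to_utc(2023, 7, 13)
print(paper_deadline_utc.isoformat())
```

To see the deadline in your own time zone, call `.astimezone()` with no arguments on the returned value.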


Unmanned Aerial Vehicles (UAVs), also known as drones, have become increasingly popular in recent years due to their ability to capture high-quality multimedia data from the sky. With the rise of multimedia applications, such as aerial photography, cinematography, and mapping, UAVs have emerged as a powerful tool for gathering rich and diverse multimedia content. This workshop aims to bring together researchers, practitioners, and enthusiasts interested in UAV multimedia to explore the latest advancements, challenges, and opportunities in this exciting field. The workshop will cover various topics related to UAV multimedia, including aerial image and video processing, machine learning for UAV data analysis, UAV swarm technology, and UAV-based multimedia applications. In the context of the ACM Multimedia conference, this workshop is highly relevant as multimedia data from UAVs is becoming an increasingly important source of content for many multimedia applications. The workshop will provide a platform for researchers to share their work and discuss potential collaborations, as well as an opportunity for practitioners to learn about the latest developments in UAV multimedia technology. Overall, this workshop will provide a unique opportunity to explore the exciting and rapidly evolving field of UAV multimedia and its potential impact on the wider multimedia community.

The list of possible topics includes, but is not limited to:

  • Video-based UAV Navigation
    • Satellite-guided & Ground-guided Navigation
    • Path Planning and Obstacle Avoidance
    • Visual SLAM (Simultaneous Localization and Mapping)
    • Sensor Fusion and Reinforcement Learning for Navigation
  • UAV Swarm Coordination
    • Multiple Platform Collaboration
    • Multi-agent Cooperation and Communication
    • Decentralized Control and Optimization
    • Distributed Perception and Mapping
  • UAV-based Object Detection and Tracking
    • Aerial-view Object Detection, Tracking and Re-identification
    • Aerial-view Action Recognition
  • UAV-based Sensing and Mapping
    • 3D Mapping and Reconstruction
    • Remote Sensing and Image Analysis
    • Disaster Response and Relief
  • UAV-based Delivery and Transportation
    • Package Delivery and Logistics
    • Safety and Regulations for UAV-based Transportation

Submission Types

Papers can be submitted via [OpenReview].

The submission template can be found on the ACM website, or you may directly use the Overleaf template.

In this workshop, we welcome four types of submissions, all of which should relate to the topics and themes as listed in Section 3:

  • (1). Position or perspective papers (up to 4 pages in length, plus unlimited pages for references): original ideas, perspectives, research vision, and open challenges in the area of UAV multimedia;
  • (2). Challenge papers (up to 4 pages in length, plus unlimited pages for references): original solutions to the challenge data, University160k, in terms of effectiveness and efficiency;
  • (3). Featured papers (title and abstract of the paper, plus the original paper): already published papers, or papers summarizing existing publications in leading conferences and high-impact journals, that are relevant to the topic of the workshop;
  • (4). Demonstration papers (up to 2 pages in length, plus unlimited pages for references): original or already published prototypes and operational evaluation approaches in the area of UAV multimedia.

Page limits include diagrams and appendices. Submissions should be single-blind, written in English, and formatted according to the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).


  • For privacy protection, please blur faces in published materials (such as papers, videos, and posters).
  • For social good, please do not use misleading words such as “surveillance” and “secret”.


Challenge Platform is at .

We also provide a challenging cross-view geo-localization dataset, called University160k, and invite the workshop audience to participate in the competition. The motivation is to simulate the real-world geo-localization scenario, in which we usually face an extremely large satellite-view pool. In particular, University160k extends the current University-1652 dataset with an extra 167,486 satellite-view gallery distractors. We will release University160k on our website and maintain a public leaderboard. These distractor satellite-view images have a size of 1024 × 1024 and are obtained by cutting orthophoto images of real urban and surrounding areas. The larger image size ensures higher image clarity, while the wider framing range allows the images to contain more diverse scenes, such as buildings, city roads, trees, and fields (see Figure 3). In our preliminary evaluation, the distractors prove challenging: they reduce the Recall@1 accuracy of the competitive baseline model, LPN, from 75.93% to 64.85% and its AP from 79.14% to 67.69% on the Drone → Satellite task (see Table 2). We hope more participants will take on this challenge and also consider the efficiency of retrieval against a large candidate pool.
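For readers unfamiliar with the two metrics quoted above, the following Python sketch shows the textbook definitions of Recall@K and average precision (AP) for a single query. It is an illustration only: the function names and the toy image names are hypothetical, and the official challenge evaluation script may differ in details such as how AP is interpolated.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], positives: Set[str], k: int) -> float:
    """Return 1.0 if any true match appears in the top-k results, else 0.0."""
    return 1.0 if any(name in positives for name in ranked[:k]) else 0.0


def average_precision(ranked: Sequence[str], positives: Set[str]) -> float:
    """Mean of the precision values measured at the rank of each true match."""
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)


# Toy example (hypothetical names): one drone query with one true satellite match.
true_match = {"sat_0001"}
without_distractors = ["sat_0001", "sat_0002"]
with_distractors = ["distractor_a", "sat_0001", "sat_0002"]

print(recall_at_k(without_distractors, true_match, k=1),
      average_precision(without_distractors, true_match))
print(recall_at_k(with_distractors, true_match, k=1),
      average_precision(with_distractors, true_match))
```

In the toy example, a single distractor ranked above the true match turns a Recall@1 hit into a miss and halves the AP, which is the effect a large distractor pool has on a retrieval model. Dataset-level scores are the mean of these per-query values over all queries.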

Check challenge details at Section 5 in

The challenge dataset contains two parts.

  1. The basic dataset (training set) can be downloaded on request. I will usually reply with the download link within 5 minutes.

  2. The name-masked test-160k dataset (query & gallery+distractor) can be downloaded from OneDrive.

(In the future, to quickly report numbers in your paper, you can also download the name-unmasked distractor dataset from OneDrive, Google Drive, or Baidu Disk (code: 78xf). Please add the distractors to the satellite gallery.)

The submission example can be found at Baseline Submission. Please zip it as ``’’ to submit the result.

Please return the top-10 satellite names. For example, the first query is “Y2HVQvCQIwVmwzq.jpeg”. Therefore, the first line of the returned result in “answer.txt” should have the following format:

LJMJGM5vTQM3iRy	ValP4k9neTZffLz	Co1CEWkBhHdTAM2	w2Nk6LrN5p2cF54	FuMp6XdwlRqScG2	4WVhVPBkr8TJTNJ	y7XiwY8lWpMZNar	AQZgRYUIyvpUnz8	bziEPp56rwI7e7E	qI9WAxrCnbaqjIq

Please return the results in the same order as the queries listed in Query TXT. The result file will contain 37855 lines.
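The result file described above can be assembled with a few lines of Python. This is a minimal sketch under stated assumptions: names on a line are separated by tabs and written without a file extension (both inferred from the example line above), and the function names, the `.jpg` gallery extension, and the 11th gallery name are hypothetical. The Baseline Submission remains the authoritative reference for the format.

```python
import os
from typing import Dict, List, Sequence


def build_answer_lines(query_order: Sequence[str],
                       rankings: Dict[str, Sequence[str]],
                       top_k: int = 10) -> List[str]:
    """Build one tab-separated line of top-k satellite names per query.

    query_order: query file names in the order given by the query list.
    rankings: maps each query name to satellite names, best match first.
    """
    lines = []
    for query in query_order:
        ranked = rankings[query]
        if len(ranked) < top_k:
            raise ValueError(f"query {query!r} has fewer than {top_k} results")
        # Assumption: names are reported without their file extension.
        names = [os.path.splitext(name)[0] for name in ranked[:top_k]]
        lines.append("\t".join(names))
    return lines


def write_answer(path: str, lines: Sequence[str]) -> None:
    """Write the result lines to a text file, one query per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# The ten names from the example line, plus a hypothetical 11th result
# that must be dropped because only the top 10 are reported.
example_top10 = [
    "LJMJGM5vTQM3iRy", "ValP4k9neTZffLz", "Co1CEWkBhHdTAM2", "w2Nk6LrN5p2cF54",
    "FuMp6XdwlRqScG2", "4WVhVPBkr8TJTNJ", "y7XiwY8lWpMZNar", "AQZgRYUIyvpUnz8",
    "bziEPp56rwI7e7E", "qI9WAxrCnbaqjIq",
]
rankings = {
    "Y2HVQvCQIwVmwzq.jpeg": [n + ".jpg" for n in example_top10] + ["hypothetical_11th.jpg"],
}
answer_lines = build_answer_lines(["Y2HVQvCQIwVmwzq.jpeg"], rankings)
print(answer_lines[0])
```

For a full submission, pass all queries in the listed order, check that `len(answer_lines)` equals 37855, and then call `write_answer("answer.txt", answer_lines)`.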

Organizing Team

Zhedong Zheng, National University of Singapore, Singapore
Yujiao Shi, Australian National University, Australia
Tingyu Wang, Hangzhou Dianzi University, China
Jun Liu, Singapore University of Technology and Design, Singapore
Jianwu Fang, Chang’an University, China
Yunchao Wei, Beijing Jiaotong University, China
Tat-Seng Chua, National University of Singapore, Singapore

Conference and Journal Papers

All papers presented at ACM MM 2023 will be included in the ACM proceedings. All papers submitted to this workshop will go through the same review process as regular papers submitted to the main conference, to ensure that the contributions are of high quality.

Student Travel Funding

Please check

Application Deadline: August 5, 2023

Workshop Citation

  title={UAVM '23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
  author={Zheng, Zhedong and Shi, Yujiao and Wang, Tingyu and Liu, Jun and Fang, Jianwu and Wei, Yunchao and Chua, Tat-seng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia Workshop},