
ICMR 2018 CONFERENCE GUIDE

ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL

http://www.icmr2018.org/

June 11-14, 2018
Yokohama Media and Communication Center

Yokohama, Japan

CONTENTS

MESSAGE FROM THE GENERAL CHAIRS
MESSAGE FROM THE TECHNICAL PROGRAM CHAIRS
ICMR 2018 CONFERENCE ORGANIZATION
KEYNOTE
PANEL / INDUSTRIAL TALK
PROGRAM AT A GLANCE
TUTORIAL
WORKSHOP
MAIN PROGRAM (DAY 1 / DAY 2 / DAY 3 / DAY 4)
FLOOR PLAN
POSTER AREA PLAN
DIRECTION TO RECEPTION VENUE / DIRECTION TO BANQUET VENUE
ICMR 2018 SUPPORTERS


MESSAGE FROM THE GENERAL CHAIRS

We are delighted to welcome you to the eighth ACM International Conference on Multimedia Retrieval, ACM ICMR 2018, held from June 11th to 14th, 2018 in Yokohama, Japan. Welcome to Yokohama, Japan’s second most populous city and the symbolic city of its opening to the world. ACM ICMR is the premier conference and a worldwide event bringing together experts and practitioners in the fields related to multimedia retrieval across academia and industry. The core feature of the conference, which continues this year as in every year, is the outstanding Technical Program chosen through a highly selective review process. In addition to the Technical Program, this year’s conference features a diverse range of activities including Keynote talks, Demonstrations, Special Sessions, a Panel, a Doctoral Symposium, Industrial Talks, and Tutorials. Additionally, Workshops bring focus on new topics for investigation. In an effort to continuously improve ACM ICMR and ensure its vibrant role in the multimedia community, we have made a number of enhancements for this year’s conference. Among them, it should be noted that the Technical Program Committee decided to remove the distinction between long and short papers and unified them into a single “regular paper” track.

We greatly acknowledge those who have contributed to the success of ACM ICMR 2018. We thank the many paper authors and proposal contributors for the various technical and program components. We thank the large number of volunteers, including the Organizing Committee members and the Technical Program Committee members, who worked very hard to create this year’s outstanding conference. Every aspect of the conference was aided by the Local Committee members, to whom we are very grateful. We also thank ACM and Sheridan Printing for their constant support. Finally, we thank our official sponsors, ACM and SIGMM, for their sustained help. We also thank our many supporters from Japan who generously sponsored ACM ICMR 2018: NEC, Hitachi, NVIDIA, CyberAgent, LIFULL, Mercari, DeNA, and YAHOO! JAPAN. Other generous support was kindly provided by the Kayamori Foundation, The Telecommunications Advancement Foundation, SCAT (Support Center for Advanced Telecommunications Technology Research, Foundation), and the Yokohama Convention & Visitors Bureau.

Enjoy ACM ICMR 2018!

ACM ICMR 2018 General Chairs
Kiyoharu Aizawa, The University of Tokyo
Shin’ichi Satoh, National Institute of Informatics
Michael Lew, Leiden University


MESSAGE FROM THE TECHNICAL PROGRAM CHAIRS

We are pleased to introduce you to the Technical Program for the eighth ACM International Conference on Multimedia Retrieval, ACM ICMR 2018, which will take place from June 11 to June 14, 2018 in Yokohama, Japan. ACM ICMR is the premier conference in the area of multimedia retrieval, offering great opportunities for exchanging leading-edge multimedia retrieval ideas among researchers and practitioners across academia and industry. Multimedia computing, indexing, and retrieval continue to be among the most exciting and fastest-growing research areas in the field of multimedia technology. The Technical Program of ACM ICMR has gathered outstanding works advancing the field of multimedia retrieval with state-of-the-art technologies, selected through a highly selective review process.

From this year, following the success initiated at ACM Multimedia 2017, the conference invited research papers of varying lengths from six to eight pages, plus one additional page for references. There is no longer a distinction between long and short papers; authors may themselves decide on the appropriate length of the paper. The long and short paper tracks were unified into a single “regular paper” track, and all papers went through the same review process and review period. This year, ACM ICMR received 179 submissions to the main conference. After a rigorous review process by the program committee members, 68 papers were accepted for presentation, including 44 regular papers, 11 special session papers, 8 demo papers, and 5 doctoral symposium papers. Detailed statistics are as follows:

ACM ICMR 2018 Technical Program Chairs
Benoit Huet, Eurecom
Qi Tian, The University of Texas at San Antonio
Keiji Yanai, The University of Electro-Communications, Tokyo

Regarding regular papers, we aimed to have four reviews for each submission. We recruited 111 program committee members to contribute their professional reviews to the submissions in a double-blind manner. In the end, we received 518 reviews, which means 3.8 reviews for each regular submission. We, the TPC co-chairs, selected 21 oral papers and 23 poster papers based on the review scores and comments. As a result, we were able to prepare a very strong technical program, although unfortunately we had to reject many good papers.

We greatly acknowledge those who have contributed to the success of ACM ICMR 2018. We thank the many paper authors and Technical Program Committee members who worked very hard to create this year’s outstanding program.

TRACKS               # SUBMISSIONS   # ACCEPTED                    ACCEPTANCE RATE
REGULAR              136             44 (ORAL: 21, POSTER: 23)     32.4% (ORAL: 15.4%)
SPECIAL SESSION      23              11 (ORAL: 5, POSTER: 6)       47.8% (ORAL: 21.7%)
DEMONSTRATIONS       13              8                             61.5%
DOCTORAL SYMPOSIUM   7               5                             71.4%
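The rates quoted in the message and table follow directly from the submission and acceptance counts; a small illustrative check in Python (all numbers are taken from the table above, nothing else is assumed):

```python
# Recompute the acceptance rates and the reviews-per-submission figure quoted above.
tracks = {
    "REGULAR": (136, 44),
    "SPECIAL SESSION": (23, 11),
    "DEMONSTRATIONS": (13, 8),
    "DOCTORAL SYMPOSIUM": (7, 5),
}
for name, (submitted, accepted) in tracks.items():
    print(f"{name:<18} {100 * accepted / submitted:.1f}%")

print(f"Reviews per regular submission: {518 / 136:.1f}")
```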


ICMR 2018 CONFERENCE ORGANIZATION

Kiyoharu Aizawa (The Univ. of Tokyo, Japan)

Michael Lew (Leiden Univ., Netherlands)

Shin’ichi Satoh (National Inst. of Informatics, Japan)

Noboru Babaguchi (Osaka Univ., Japan)

Yong Rui (Lenovo, China)

Benoit Huet (Eurecom, France)

Qi Tian (The Univ. of Texas at San Antonio, United States)

Keiji Yanai (The Univ. of Electro-Comm., Japan)

Ichiro Ide (Nagoya Univ., Japan)

Toshihiko Yamasaki (The Univ. of Tokyo, Japan)

Go Irie (Nippon Telegraph & Telephone, Japan)

Tao Mei (JD.com, China)

Martha Larson (Delft Univ. of Tech., Netherlands)

Takahiro Ogawa (Hokkaido Univ., Japan)

Shuqiang Jiang (Chinese Academy of Sci., China)

Takahiro Mochizuki (NHK STRL, Japan)

Winston Hsu (National Taiwan Univ., Taiwan)

Naoko Nitta (Osaka Univ., Japan)

Chong-Wah Ngo (City Univ. of Hong Kong, China)

Vincent Oria (New Jersey Inst. of Tech., United States)

Richang Hong (Hefei Univ. of Tech., China)

Balakrishnan Prabhakaran (The Univ. of Texas at Dallas, United States)

Koichi Shinoda (Tokyo Inst. of Tech., Japan)

Zhipeng Wu (Apple Japan, Japan)

Alexander Hauptmann (Carnegie Mellon Univ., United States)

Chil-Woo Lee (Chonnam National Univ., Korea)

Rainer Lienhart (Univ. of Augsburg, Germany)

Changsheng Xu (Chinese Academy of Sci., China)

Yusuke Matsui (National Inst. of Informatics, Japan)

Yoshitaka Ushiku (The Univ. of Tokyo, Japan)

Yusuke Uchida (DeNA, Japan)

Conference Co-Chairs

Honorary Co-Chairs

Technical Program Co-Chairs

Organizing Co-Chairs

Practitioners Co-Chairs

Doctoral Symposium Co-Chairs

Special Session Co-Chairs

Workshop Co-Chairs

Panel Co-Chairs

Tutorial Co-Chairs

Demo Co-Chairs

Publicity Co-Chairs

Publication Co-Chairs

Finance Chair


KEYNOTE


ABSTRACT

The media environment of program production, content delivery, and viewing has been changing because of progress in broadcasting and communication technologies and other technologies like IoT, cloud computing, and artificial intelligence (AI). In December 2018, 8K and 4K UHDTV satellite broadcasting will start in Japan, which means that viewers will soon be able to enjoy 8K and 4K programs featuring a wide color gamut and high-dynamic-range characteristics together with 22.2 multi-channel audio at home. Meanwhile, distribution services for sending content to PCs and smartphones through the Internet have rapidly been spreading, and the introduction of the next generation of mobile networks (5G) will accelerate their spread. The coming of such advanced broadcast and broadband technologies and the consequent changes in lifestyle will provide broadcasters with a great opportunity for a new stage of development. At NHK Science & Technology Research Laboratories (NHK STRL), we are pursuing a wide range of research with the aim of creating new broadcast services that can provide viewing experiences never before imagined and user experiences more attuned to daily life. To enhance the convenience of television and the value of TV programming, we are developing technology for connecting the TV experience with various activities in everyday life. Extensions to “Hybridcast Connect” will drive applications that link TVs, smartphones, and IoT. They will enable spontaneous consumption of content during everyday activities through various devices around the user. Establishing a new program production workflow with AI, which we call “Smart Production”, is one of our most important research topics. We are developing speech and face recognition technologies for making closed captions and metadata efficiently, as well as technologies for automatically converting content into computer-generated sign language, audio descriptions, and simplified Japanese. This presentation introduces these research achievements targeting 2020 and beyond, as well as other broadcasting technology trends including 4K/8K UHDTV broadcasting in Japan, 3D imaging, and VR/AR.

BIO

Kohji Mitani joined NHK in 1987. He has contributed to the development of 8K-SHV, a future television system, since 1995 at the Science & Technology Research Laboratories, and in particular was in charge of the development of 8K camera systems. He moved to the headquarters of NHK in 2010 and belonged to the Engineering Administration Department, working on the development of a practical 8K production and broadcasting system. In 2016, he was appointed to his current position at NHK. He received a Ph.D. degree from Kyoto University in 1999 and was awarded the fellow grade of membership of SMPTE in 2010. He is the vice president of the Institute of Image Information and Television Engineers for the 2017-2018 term.

Kohji MITANI
Deputy Director of Science & Technology Research Laboratories, NHK (Japan Broadcasting Corporation)

THE ONGOING EVOLUTION OF BROADCAST TECHNOLOGY


ABSTRACT

As an industrial designer, I have worked in collaboration with various researchers and scientists since the beginning of this century. I have made many prototypes showing the possibilities of their leading-edge technologies and have exhibited them over the years. As the archives of academic documents and papers have become open, and the Internet has given public access to recordings of various experiments being conducted throughout the world, technologies in laboratories are now constantly exposed to the public. In this context, prototypes are becoming more important as the medium that bridges advanced technology and society. Now, a prototype is not merely an experimental machine. It is a device created to present the user experience in advance, to share the benefits of the technology with many others. The role of a prototype is not limited to sharing values within the development team, but goes beyond that: it is a medium used to voice the significance of research and development to society, an inspiration to stimulate future markets, and also a tool to secure development budgets. A prototype is the physical embodiment of a speculative story that connects people to technology that has yet to be brought to society. I would like to introduce some of the prototypes we developed and share the future vision they evoke.

BIO

As a design engineer, Yamanaka Shunji has designed industrial products ranging from wristwatches to railway carriages, while also developing the technology behind robots and telecommunication systems. He graduated with a BA in Engineering from The University of Tokyo in 1982 and spent five years at the Nissan Motors Design Centre before becoming a freelance industrial designer in 1987. In 1994, he founded his industrial design practice, Leading Edge Design, where he serves as president. From 2008 to 2012 he was a Professor at Keio University, and he became a Professor at The University of Tokyo in 2013. His recent research focuses on re-examining the relationships between humans and man-made objects through projects such as beautiful prosthetics and lifelike robots. Yamanaka Shunji has been awarded numerous honours, including the 2004 Mainichi Design Award (sponsored by the major Japanese newspaper, the Mainichi Shimbun), the iF Design Award, and multiple Good Design Awards (backed by Japan’s Ministry of Economy, Trade and Industry). His 2010 work, Tagtype Garage Kit, is part of the New York Museum of Modern Art’s permanent collection.

Shunji Yamanaka
Design Engineer / Professor, The University of Tokyo (Interfaculty Initiative in Information Studies / Institute of Industrial Science)

PROTOTYPING FOR ENVISIONING THE FUTURE


PANEL / INDUSTRIAL TALK


TOP-5 PROBLEMS IN MULTIMEDIA RETRIEVAL

PANELISTS

Tat-Seng Chua (National University of Singapore)
Michael Houle (National Institute of Informatics)
Ramesh Jain (University of California, Irvine)
Nicu Sebe (University of Trento)
Rainer Lienhart (University of Augsburg)

FACILITATORS

Chong-Wah Ngo (City University of Hong Kong)
Vincent Oria (New Jersey Institute of Technology)

ABSTRACT

Multimedia retrieval is hard, but what exactly are the core problems that are fundamentally important for all or most sub-fields of retrieval? The semantic gap has been widely regarded as a core problem since it was first named in the year 2000. But after almost 20 years, how much of the problem have we tackled? In addition to the semantic gap, we have the user gap. Is the user gap a core problem? If so, why are relatively more papers in this field about the semantic gap than the user gap? Fusion should be a fundamental problem because we deal with different media, multiple modalities, and multi-sensory data. But most of the published papers use average or max fusion (pooling), or simply let a neural network learn an “embedding space” to fuse different forms of data. Is fusing data in this ad-hoc manner sufficient and scientific in the context of retrieval? This community is particularly good at dealing with big data. But since the advancement of deep learning, more research is about using big training data but small testing data for retrieval and annotation. Is scalability no longer an issue? Retrieval is supposed to be real-time and requires indexing. Is indexing multimedia data a core problem, or is it sufficient to just index individual media using off-the-shelf techniques and then combine the results for retrieval? Another issue that is seldom discussed in this community is the effect of the so-called “curse of dimensionality” on multimedia information retrieval. Multimedia data are represented as vectors in high-dimensional spaces. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where methods such as search and clustering that depend on them lose their effectiveness.
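To make the last point concrete, the minimal sketch below (NumPy assumed; an illustration added here, not part of the panel material) shows distance concentration: as dimensionality grows, the gap between the nearest and farthest neighbour of a query shrinks relative to the average distance, which is exactly what erodes similarity search and clustering.

```python
import numpy as np

# Distance concentration: with points drawn uniformly in the unit hypercube,
# the spread of distances to a query shrinks relative to their magnitude as
# the dimensionality d grows, so nearest-neighbour contrast fades.
rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((2000, d))
    query = rng.random(d)
    dist = np.linalg.norm(points - query, axis=1)
    contrast = (dist.max() - dist.min()) / dist.mean()
    print(f"d={d:4d}  relative contrast={contrast:.3f}")
```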

Are these problems (semantic gap, user gap, fusion, scalability, indexing) fundamentally important to retrieval? How close are we to solutions that the community can endorse? How do the current solutions impact the future era of retrieval and the applicability of multimedia search in different applications? This panel aims to solicit expert views from senior researchers who have actively contributed to the field for more than 20 years, and possibly pave the way for new research directions in multimedia retrieval.

PANEL


ABSTRACT

Recent advancements in image recognition technologies have enabled image recognition-based systems to be widely used in real-world applications. In this talk, I will introduce NEC’s image-based object recognition technologies targeted at recognizing various manufactured goods and retail products from a camera, and talk about the industrial applications which we have developed and commercialized. These image-based object recognition technologies enable highly efficient and cost-effective management of goods and products throughout their life-cycle (manufacturing, distribution, retail, and consumption), which otherwise cannot be achieved by human labor or by use of ID tags. Firstly, I will talk about a technology to recognize multiple objects in a single image using feature matching of compact local descriptors, combined with more recent Deep Learning-based recognition. It enables a large number of objects to be recognized at once, which greatly reduces the human labor and time needed for various product inspection and checking tasks. Using this technology, we have developed and commercialized a product inspection system for warehouses, a planogram recognition system for retail shop shelves, and a self-service POS system for easy-to-use and fast checkout in retail stores. Secondly, I will talk about the “Fingerprint of Things” technology. It enables individual identification of tiny manufactured parts (e.g. bolts and nuts) by identifying images of their unique surface patterns, just like human fingerprints. We have built a prototype of a mass-produced parts traceability system, which enables users to easily track down individual parts using a mobile device. In the talk, I will explain the key issues in realizing these industrial applications of image-based object recognition technologies.
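As a rough illustration of the local-descriptor matching step mentioned above (not NEC’s proprietary pipeline; OpenCV’s ORB features are used as a stand-in for compact descriptors, and the image file names are placeholders):

```python
import cv2

# Detect compact binary local descriptors (ORB as a stand-in) in a product
# template and in a shelf photo, then match them by Hamming distance.
orb = cv2.ORB_create(nfeatures=500)
template = cv2.imread("product_template.png", cv2.IMREAD_GRAYSCALE)  # placeholder file
scene = cv2.imread("shelf_photo.png", cv2.IMREAD_GRAYSCALE)          # placeholder file
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Ratio-test filtering; many surviving matches clustered in one image region
# suggest that this particular product appears there.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
pairs = matcher.knnMatch(des_t, des_s, k=2)
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
print(f"{len(good)} putative matches for this product")
```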

BIO

Kota Iwamoto is a Principal Researcher of Data Science Research Laboratories, NEC Corporation. He received the B.E. degree and the M.E. degree in Electronics, Information and Communication Engineering from Waseda University in 2001 and 2003, respectively. He joined NEC Corporation in 2003 and has since been engaged in the research and commercialization of image processing, image/video indexing and retrieval, and image recognition technologies. Since 2007, he has also been involved in the ISO/IEC 15938 (known as MPEG-7) standardization and has contributed to the establishment of many international standards as an ISO project editor. He received the Young Scientists’ Prize of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Japan) in 2012, the Advanced Technology Award “The Prize of Minister of Economy, Trade and Industry” in 2013, the JSPE Takagi Award in 2014, and the IEICE SUEMATSU-Yasuharu Award in 2017.

Kota Iwamoto
NEC Corporation

NEC’S OBJECT RECOGNITION TECHNOLOGIES AND THEIR INDUSTRIAL APPLICATIONS


ABSTRACT

Social Networking Services (SNS) depend on user-generated content (UGC). A fraction of UGC is considered spam, such as adult, scam, and abusive content. In order to maintain service reliability and avoid criminal activity, content moderation is employed to eliminate spam from SNS. Content moderation consists of manual content-monitoring operations and/or automatic spam filtering. Detecting a small portion of spam among a large amount of UGC mostly relies on manual operation; thus it requires a large number of human operators and sometimes suffers from human error. In contrast, automatic spam filtering can be performed at a smaller cost; however, it is difficult to follow the continuously changing trends of spam, and it may degrade the service experience due to false positives. This presentation introduces an integrated content moderation platform called “Orion”, which aims to minimize the manual process and maximize the detection of spam in UGC data. Orion preserves post history by users and services, which enables calculating the risk level of each user and deciding whether monitoring is required. Also, Orion has a scalable API that can perform a number of machine-learning-based filtering processes, such as DNN (Deep Neural Network) and SVM filters for text and images posted on many SNS systems. We show that Orion improves the efficiency of content moderation compared to a fully manual operation.
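As a purely hypothetical sketch of the risk-level idea (plain Python; the field names, scoring formula, and thresholds below are invented for illustration and are not taken from Orion):

```python
from dataclasses import dataclass

@dataclass
class UserHistory:
    posts: int      # posts previously submitted by this user
    removed: int    # of those, how many were removed as spam

    @property
    def risk(self) -> float:
        # smoothed removal rate as a crude per-user risk level
        return (self.removed + 1) / (self.posts + 2)

def moderate(spam_score: float, history: UserHistory,
             reject_at: float = 0.9, review_at: float = 0.5) -> str:
    """Combine an ML spam score with the user's risk level and route the post."""
    combined = spam_score * (0.5 + history.risk)   # illustrative combination only
    if combined >= reject_at:
        return "reject automatically"
    if combined >= review_at:
        return "queue for human moderation"
    return "publish"

print(moderate(0.8, UserHistory(posts=20, removed=6)))
```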

Yusuke Fujisaka
CyberAgent, Inc.

ORION: AN INTEGRATED MULTIMEDIA CONTENT MODERATION SYSTEM FOR WEB SERVICES

BIO

Yusuke Fujisaka has been a machine learning research-and-development engineer at CyberAgent, Inc. since 2012. He studied computer science and telecommunications, and received B.S. and M.S. degrees in Engineering from the Tokyo Institute of Technology in 2010 and 2012, respectively. He currently builds middleware and integrated platform systems for services provided by the company, using multimedia processing technologies including image, audio, and video recognition based on deep learning and other machine learning techniques.


ABSTRACT

The LIFULL HOME’S Data Set, which has been provided for academic use since November 2015, is being used for research in a variety of fields such as economics, architecture, and urban science. In particular, since it contains 83 million property images and 5.1 million floor plan images, its use in the computer vision and multimedia fields is thriving, and papers using the dataset have also been accepted at ICCV 2017, a top conference in the image processing field. This presentation summarizes the results that have been obtained through the provision of the dataset, and presents plans to promote open innovation in the field of real estate technology.

BIO

Yoji Kiyota is a principal investigator at LIFULL Co., Ltd., which operates LIFULL HOME’S, the largest real estate property listing website in Japan. His research interests include recommender systems, information navigation systems, and applications of natural language processing. From 2004 to 2012, he was engaged in research on information navigation systems for libraries as an Assistant Professor and Special Lecturer at the Information Technology Center, The University of Tokyo. In 2007, he was involved in the foundation of Littel Corporation, a start-up from The University of Tokyo, and worked on developing large-scale data processing technologies. He joined LIFULL in 2011 through LIFULL’s acquisition of Littel Corporation. In addition to research and development work such as recommender systems for real estate information, he is engaged in joint research with universities and the promotion of open innovation through data provision. He is an editorial board member of the Japanese Society for Artificial Intelligence (JSAI), and a secretary of the SIG on database systems of the Information Processing Society of Japan. He holds PhD and MS degrees in informatics and a BS degree in engineering from Kyoto University. Corporate web site: https://lifull.com/en/

Yoji Kiyota
Ph.D., Principal Investigator, LIFULL Lab, LIFULL Co., Ltd.

PROMOTING OPEN INNOVATIONS IN REAL ESTATE TECH: PROVISION OF THE LIFULL HOME’S DATA SET AND COLLABORATIVE STUDIES


ABSTRACT

Hitachi has a wide variety of technologies ranging from infrastructure systems to IT platforms, such as railway management systems, water supply operation systems, manufacturing management systems for factories, surveillance cameras and monitoring systems, rolling stock, power plants, servers, storage, data centers, and various IT systems for governments and companies. The research and development group of Hitachi is developing video analytics and other media processing techniques and applying them to various products and solutions with business divisions, in areas such as public safety, productivity improvement of factories, and other IT applications. In this talk, I would like to introduce some of the products, solutions, and research topics in Hitachi to which video analytics and image retrieval techniques are applied. These include an image search system for retrieving publicly registered design graphics, a person detection and tracking function for video surveillance systems, and our activities and results in TRECVID 2017. In each case, we integrated our original high-speed image search database and deep learning-based image recognition techniques. Through these use cases, I would like to present how image recognition and retrieval technologies are practically applied to industrial products and solutions and contribute to the improvement of social welfare.

Tomokazu Murakami
Ph.D., Senior Researcher, Hitachi, Ltd., Research & Development Group, Center for Technology Innovation

INDUSTRIAL APPLICATIONS OF IMAGE RECOGNITION AND RETRIEVAL TECHNOLOGIES FOR PUBLIC SAFETY AND IT SERVICES

BIO

Tomokazu Murakami is a Senior Researcher at Hitachi, R&D Group, Center for Technology Innovation, currently leading a team of researchers developing video analytics technologies. He has been engaged in research and development of image processing, image coding, and image recognition techniques for 20 years, and his team is now developing new technologies and solutions for video surveillance and content management systems. He received a Master’s degree in Information and Communication Engineering and a Ph.D. degree in Information Science and Technology from The University of Tokyo. Dr. Murakami is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), the Institute of Image Information and Television Engineers (ITE), and the Virtual Reality Society of Japan (VRSJ).


PROGRAM AT A GLANCE


JUNE 11 (DAY 1): TUTORIAL & WORKSHOP
JUNE 12 (DAY 2): MAIN CONFERENCE
JUNE 13 (DAY 3): MAIN CONFERENCE
JUNE 14 (DAY 4): INDUSTRIAL DAY & ACM MM TPC WORKSHOP

Registration opens at 9:00 each day. Day 1 hosts Tutorials 1-3 and Workshops 1-3; Days 2 and 3 host the keynotes, the best paper session, the oral, poster, demo, and special sessions, and the doctoral symposium; Day 4 hosts the panel, the industrial talks, and the ACM MM TPC workshops. Detailed session timings are given in the MAIN PROGRAM section.


TUTORIAL


ABSTRACT

For decades, we have been interested in detecting objects and classifying them into a fixed vocabulary. With the maturity of these low-level vision solutions, we hunger for a higher-level representation of the visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making and even manipulate the visual data at the pixel level. In this tutorial, we will introduce a variety of machine learning techniques for modeling visual relationships (e.g., subject-predicate-object triplet detection) and contextual generative models (e.g., generating photo-realistic images using conditional generative adversarial networks). In particular, we plan to start from fundamental theories on object detection, relationship detection, and generative adversarial networks, and move to more advanced topics such as referring expression visual grounding, pose-guided person image generation, and context-based image inpainting.
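For readers unfamiliar with the triplet formulation, the sketch below (plain Python; the types and example labels are illustrative only, not material from the tutorial) shows the kind of output a relationship detector produces: every ordered pair of detected objects is a candidate (subject, object) for a predicate classifier.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple

BBox = Tuple[int, int, int, int]          # (x, y, width, height)

@dataclass
class Detection:
    label: str
    box: BBox

@dataclass
class Relationship:
    """One subject-predicate-object triplet grounded in an image."""
    subject: Detection
    predicate: str
    object: Detection

def candidate_pairs(detections: List[Detection]):
    # every ordered pair is a candidate for the downstream predicate classifier
    return list(permutations(detections, 2))

dets = [Detection("person", (40, 30, 80, 160)), Detection("horse", (20, 90, 180, 140))]
print(candidate_pairs(dets))
print(Relationship(dets[0], "riding", dets[1]))   # e.g. "person riding horse"
```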

BIO

Dr. Hanwang Zhang is an Assistant Professor at Nanyang Technological University, Singapore. He was a research scientist at the Department of Computer Science, Columbia University, USA, and a senior research fellow at the School of Computing, National University of Singapore. He received the B.Eng (Hons.) degree in computer science from Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in computer science from the National University of Singapore in 2014. His research interests include computer vision, multimedia, and social media. Dr. Zhang is the recipient of the Best Demo Runner-Up Award at ACM MM 2012, the Best Student Paper Award at ACM MM 2013, and the Best Paper Honorable Mention at ACM SIGIR 2016. He is also the winner of the Best Ph.D. Thesis Award of the School of Computing, National University of Singapore, 2014.

Dr. Qianru Sun is a postdoctoral researcher in the Department of Computer Vision and Multimodal Computing, Max-Planck Institute for Informatics, Germany. She received a PhD degree from the School of Electronics Engineering and Computer Science, Peking University, in January 2016. Her research interests include computer vision and pattern recognition. She has specific experience in human action recognition, anomaly event detection in videos, social relation recognition and head image inpainting in social media photos, and person image generation for both low-resolution re-identification images and high-resolution fashion photos.

Hanwang Zhang
Nanyang Technological University, Singapore

Qianru Sun
Max-Planck Institute for Informatics, Germany

TUTORIAL 1: OBJECTS, RELATIONSHIPS, AND CONTEXT IN VISUAL DATA


ABSTRACT

Recommendation systems play a vital role in online information systems and have become a major monetization tool for user-oriented platforms. In recent years, there has been increasing research interest in recommendation technologies in the information retrieval and data mining community, and significant progress has been made owing to the fast development of deep learning. However, in the multimedia community, there has been relatively less attention paid to the development of multimedia recommendation technologies. In this tutorial, we summarize existing research efforts on multimedia recommendation. We first provide an overview of fundamental techniques and recent advances in personalized recommendation for general items. We then summarize existing developments in recommendation technologies for multimedia content. Lastly, we present insight into the challenges and future directions in this emerging and promising area.
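As a toy illustration of the fundamental techniques such an overview typically starts from (not the tutorial's own material), a minimal matrix-factorisation recommender trained with stochastic gradient descent; NumPy is assumed and the ratings are random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 8
P = 0.1 * rng.standard_normal((n_users, k))   # latent user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # latent item factors

# observed (user, item, rating) triples -- random toy data here
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 6))
           for _ in range(500)]

lr, reg = 0.05, 0.02
for _ in range(20):                            # SGD over the observed ratings
    for u, i, r in ratings:
        pu, qi = P[u].copy(), Q[i].copy()
        err = r - pu @ qi
        P[u] += lr * (err * qi - reg * pu)
        Q[i] += lr * (err * pu - reg * qi)

# recommend: items with the highest predicted score for user 0
print(np.argsort(-(P[0] @ Q.T))[:5])
```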

Xiangnan He
National University of Singapore, Singapore

Hanwang Zhang
Nanyang Technological University, Singapore

Tat-Seng Chua
National University of Singapore, Singapore

TUTORIAL 2: RECOMMENDATION TECHNOLOGIES FOR MULTIMEDIA CONTENT

BIO

Dr. Xiangnan He is a senior research fellow with the School of Computing, National University of Singapore (NUS). He received his Ph.D. in Computer Science from NUS. His research interests span recommender systems, information retrieval, and multimedia processing. He has over 30 publications in top conferences such as SIGIR, WWW, MM, CIKM, and IJCAI, and in journals including TKDE, TOIS, and TMM. His work on recommender systems received the Best Paper Award Honorable Mention of ACM SIGIR 2016. Moreover, he has served as a PC member for prestigious conferences including SIGIR, WWW, MM, KDD, WSDM, CIKM, IJCAI, AAAI, and ACL, and as a regular reviewer for prestigious journals including TKDE, TOIS, TKDD, and TMM.

Dr. Hanwang Zhang is an Assistant Professor at Nanyang Technological University, Singapore. He was a research scientist at the Department of Computer Science, Columbia University, USA, and a senior research fellow at the School of Computing, National University of Singapore. He received the B.Eng (Hons.) degree in computer science from Zhejiang University, Hangzhou, China, in 2009, and the Ph.D. degree in computer science from the National University of Singapore in 2014. His research interests include computer vision, multimedia, and social media. Dr. Zhang is the recipient of the Best Demo Runner-Up Award at ACM MM 2012, the Best Student Paper Award at ACM MM 2013, and the Best Paper Honorable Mention at ACM SIGIR 2016. He is also the winner of the Best Ph.D. Thesis Award of the School of Computing, National University of Singapore, 2014.

Dr. Tat-Seng Chua is the KITHCT Chair Professor at the School of Computing, National University of Singapore. He holds a PhD from the University of Leeds, UK. He was the Acting and Founding Dean of the School from 1998 to 2000. Dr Chua’s main research interest is in multimedia information retrieval and social media analytics. In particular, his research focuses on the extraction, retrieval, and question-answering (QA) of text and rich media arising from the Web and multiple social networks. He is the co-Director of NExT, a joint Center between NUS and Tsinghua University to develop technologies for live social media search. Dr Chua is the 2015 winner of the prestigious ACM SIGMM award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications. He is the Chair of the steering committee of the ACM International Conference on Multimedia Retrieval (ICMR) and the Multimedia Modeling (MMM) conference series. Dr Chua was also the General Co-Chair of ACM Multimedia 2005, ACM CIVR (now ACM ICMR) 2005, ACM SIGIR 2008, and ACM Web Science 2015. He serves on the editorial boards of four international journals. Dr. Chua is the co-Founder of two technology startup companies in Singapore.


Guo-Jun Qi
University of Central Florida

TUTORIAL 3: MULTIMEDIA CONTENT UNDERSTANDING BY LEARNING FROM VERY FEW EXAMPLES: RECENT PROGRESS ON UNSUPERVISED, SEMI-SUPERVISED AND SUPERVISED DEEP LEARNING APPROACHES

ABSTRACT

In this tutorial, the speaker will present several parallel efforts on building deep learning models with very little supervision information, with or without unsupervised data available. In particular, we will discuss in detail:

(1) Generative Adversarial Nets (GANs) and their applications to unsupervised feature extraction and to semi-supervised learning with few labeled examples and a large amount of unlabeled data. We will discuss the state-of-the-art results that have been achieved by semi-supervised GANs.

(2) Low-Shot Learning algorithms to train and test models on disjoint sets of tasks. We will discuss the ideas of how to efficiently adapt models to tasks with very few examples. In particular, we will discuss several paradigms of learning-to-learn approaches.

(3) We will also discuss how to transfer models across modalities by leveraging abundant labels from one modality to train a model for other modalities with few labels. We will discuss in detail the cross-modal label transfer approach.
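As one concrete instance of item (1), a minimal sketch of the discriminator objective used by many semi-supervised GANs, where the probability of "real vs. generated" is derived from the K class logits (a common formulation; this is an added illustration assuming PyTorch, not the tutorial's own code):

```python
import torch
import torch.nn.functional as F

def ssl_gan_d_loss(logits_lab, labels, logits_unl, logits_fake):
    """Discriminator loss of a semi-supervised GAN with K class logits.
    p(real|x) = Z(x) / (Z(x) + 1), where Z(x) = sum_k exp(logit_k(x))."""
    loss_sup = F.cross_entropy(logits_lab, labels)        # few labeled examples
    lz_unl = torch.logsumexp(logits_unl, dim=1)           # log Z for unlabeled real data
    lz_fake = torch.logsumexp(logits_fake, dim=1)         # log Z for generated data
    loss_real = -(lz_unl - F.softplus(lz_unl)).mean()     # -log p(real | x_unl)
    loss_fake = F.softplus(lz_fake).mean()                # -log (1 - p(real | G(z)))
    return loss_sup + loss_real + loss_fake

# toy check with random logits for K = 10 classes
K = 10
loss = ssl_gan_d_loss(torch.randn(4, K), torch.tensor([0, 3, 7, 1]),
                      torch.randn(32, K), torch.randn(32, K))
print(loss.item())
```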

BIO

Dr. Qi is a faculty member in the Department of Computer Science at the University of Central Florida. His research interests include knowledge discovery, and the analysis and aggregation of big data deluging from a variety of modalities and sources, in order to build smart and reliable information and decision-making systems. He aspires to apply his research to solving practical problems through high-quality data processing and analysis in healthcare, sensor and social networks, financial systems, and so forth. He was the recipient of a Microsoft Fellowship and twice an IBM Fellowship. His research has been sponsored by grants and projects from government agencies and industry collaborators, including NSF, IARPA, Microsoft, IBM, and Adobe. Dr. Qi has published more than 100 papers in a broad range of venues, such as Proceedings of the IEEE, IEEE T PAMI, IEEE T KDE, IEEE T Image Processing, ACM SIGKDD, WWW, ICML, ACM MM, CVPR, ICDM, SDM and ICDE. Among them are the best student paper of ICDM 2014, “the best ICDE 2013 paper” by IEEE Transactions on Knowledge and Data Engineering, as well as the best paper (finalist) of ACM Multimedia 2007 (2015). He has served or will serve as a technical program co-chair for MMM 2016 and ACM Multimedia 2020, and an area chair (a senior program committee member) for ICCV, ICPR, ACM SIGKDD, ACM CIKM, as well as ACM Multimedia. He is also serving or has served on the program committees of several academic conferences, including CVPR, ICCV, KDD, WSDM, CIKM, IJCAI, ICMR, ACM Multimedia, ACM/IEEE ASONAM, ICDM, ICIP, and ACL. He is an associate editor for IEEE Transactions on Circuits and Systems for Video Technology (CSVT), as well as a guest/lead editor for the special issues on “Big Media Data: Understanding, Search, and Mining” in IEEE Transactions on Big Data, “Deep Learning for Multimedia Computing” in IEEE Transactions on Multimedia, and “Social Media Mining and Knowledge Discovery” in Multimedia Systems, Springer. He was also a panelist for the NSF and the United States Department of Energy.


WORKSHOP


WORKSHOP “LIFELOG SEARCH CHALLENGE”
MESSAGE FROM WORKSHOP GENERAL CHAIRS

It is our great pleasure to welcome you to the first Lifelog Search Challenge – LSC 2018. Lifelogging is the process of capturing multiple aspects of one’s life in digital form and is becoming an increasingly important research topic. However, lifelog organisation and retrieval continue to pose significant challenges for the research community, and there has been almost no progress on the evaluation of interactive lifelog retrieval systems to date. Consequently, the mission of the LSC is to support the development and comparative evaluation of interactive lifelog retrieval systems by releasing test collections and defining research challenges to be solved by the community in an open and collaborative manner. We hope that LSC 2018 will be the first annual workshop in a series. LSC 2018 promises to be a highly interactive and entertaining workshop modelled on the successful Video Browser Showdown (VBS), the annual competition at the MMM conference series. LSC is a participation workshop, which means that the participants write and present a paper describing their retrieval system, as well as taking part in the live interactive search challenge. Consequently, the workshop is highly interactive, with seven short oral presentations and a lively panel discussion, followed by the actual search challenge, integrating both expert and novice user experimental evaluations. The call for papers attracted submissions from Asia and Europe, and ultimately seven papers were selected for inclusion in the program, from Ireland, Austria, the Czech Republic, Spain, the Netherlands, and Vietnam. Six of these papers have translated into working interactive search engines for evaluation, including refinements of existing successful retrieval engines from the VBS, as well as four new systems that consider novel access and retrieval mechanisms, including the use of Virtual Reality for user interaction.

Putting together LSC 2018 was a team effort. We first thank the authors and developers for providing the content of the program and the systems for the challenge. We are grateful to the organisers and program committee for reviewing papers and providing feedback to authors. We thank the team at Dublin City University and Klagenfurt University for preparing the dataset, the novel time-specific information needs, and the real-time evaluation engine. Finally, we thank the chairs of ACM ICMR’18, the workshop chairs who provided wonderful support, and the ACM SIGs. We hope that you will find this program interesting and thought-provoking and that the workshop will provide you with a valuable opportunity to share ideas with other researchers and teams from around the world.

Cathal Gurrin, Organising co-Chair, Dublin City University, Ireland
Duc-Tien Dang-Nguyen, Organising co-Chair, Dublin City University, Ireland
Klaus Schoeffmann, Organising co-Chair, Klagenfurt University, Austria
Michael Riegler, Organising co-Chair, Oslo Metropolitan University, Norway
Hideo Joho, Organising co-Chair, University of Tsukuba, Japan
Luca Piras, Organising co-Chair, University of Cagliari, Italy


WORKSHOP ORGANIZATION

Organizing Chairs:
Cathal Gurrin (Dublin City University, Ireland)
Klaus Schoeffmann (Klagenfurt University, Austria)
Hideo Joho (University of Tsukuba, Japan)
Duc-Tien Dang-Nguyen (Dublin City University, Ireland)
Michael Riegler (Oslo Metropolitan University, Norway)
Luca Piras (University of Cagliari, Italy)

Additional Programme Committee:
Liting Zhou (Dublin City University, Ireland)
Jürgen Primus (Klagenfurt University, Austria)
Aaron Duane (Dublin City University, Ireland)
Bernd Münzer (Klagenfurt University, Austria)
Rashmi Gupta (Dublin City University, Ireland)
Andreas Leibetseder (Klagenfurt University, Austria)
Sabrina Kletz (Klagenfurt University, Austria)

Data Organization Committee:
Liting Zhou (Dublin City University, Ireland)
Duc-Tien Dang-Nguyen (Dublin City University, Ireland)
Rashmi Gupta (Dublin City University, Ireland)
Bernd Münzer (Klagenfurt University, Austria)
Klaus Schoeffmann (Klagenfurt University, Austria)
Cathal Gurrin (Dublin City University, Ireland)

WORKSHOP “MMART&ACM 2018”
MESSAGE FROM WORKSHOP GENERAL CHAIRS

It is our great pleasure to welcome you to the 2018 International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia – MMArt&ACM’18. This is the first time that we have joined the two workshops, i.e., the International Workshop on Multimedia Artworks Analysis (MMArt) and the International Workshop on Attractiveness Computing in Multimedia (ACM), in order to enlarge the scope of the discussed issues and inspire more work in related fields. This is also the first time we have hosted these two workshops at the ACM International Conference on Multimedia Retrieval (ICMR). Previously, we already held two successful and highly interactive editions of MMArt and ACM at different venues. In this year’s joint workshop, we will have one keynote talk and six inspiring technical papers. The keynote talk will be given by Dr. Takeshi Yamada from teamLab Inc. The six technical papers are from Japan and Taiwan. We appreciate the contributions of our respected keynote speaker and the contributors of technical papers. We wish MMArt&ACM 2018 to be a small but highly interactive and enjoyable workshop. This joint workshop could not have been put together without the contributions of our program committee. We would like to thank all committee members for their great efforts in paper reviewing and suggestions. We also thank ICMR for being the home venue hosting this joint workshop.

MMArt&ACM 2018 Organizers:
Wei-Ta Chu (National Chung Cheng University, Taiwan)
Norimichi Tsumura (Chiba University, Japan)
Toshihiko Yamasaki (The University of Tokyo, Japan)
Takaaki Shiratori (Oculus Research, USA)
Hideto Motomura (Panasonic Corporation, Japan)


WORKSHOP “MULTIMEDIA FOR RETECH’18”
MESSAGE FROM WORKSHOP GENERAL CHAIRS

It is our great pleasure to organize the first international workshop on multimedia for real estate tech (RETech), in conjunction with ACM ICMR 2018 held in Yokohama, Japan. Real estate industries, which are essential for our daily life, are one of the frontiers of digitization. Although most of the real estate business is currently operated by human labor, innovations based on various information technologies are drastically changing the real estate industries. In particular, multimedia processing and retrieval technologies, including image processing, 3D scanning, geospatial analysis, IoT, virtual reality, augmented reality, and telecommunication systems, are spreading rapidly. However, there are still many research challenges in multimedia technologies to be tackled, e.g., quality, costs, sensitivity, diversity, and attractiveness. The purpose of this workshop is to share, discuss, and encourage research challenges on multimedia technologies in real estate industries.

The call for papers attracted seven submissions from Austria, Japan, and Russia. The program committee, with the contribution of additional reviewers, performed a blind peer-review process and accepted five papers. The topics include computer vision, value estimation, recommender systems, and virtual reality.

We first thank all the authors for being interested in the purpose of this workshop and submitting papers. We are also grateful to the program committee and additional reviewers, who worked very hard in reviewing papers and providing feedback to authors. Finally, we thank the organizing committee of ACM ICMR 2018. We hope that you will find this program interesting and thought-provoking and that the workshop will provide you with a valuable opportunity to share ideas with other researchers and practitioners from institutions around the world.

Workshop Organization

General Chairs:
Yoji Kiyota (LIFULL Co., Ltd., Japan)
Toshihiko Yamasaki (The University of Tokyo, Japan)

Program Committee:
Chihiro Shimizu (Nihon University, Japan)
Hirohiko Suwa (Nara Institute of Science and Technology, Japan)
Yutaka Arakawa (Nara Institute of Science and Technology, Japan)
Ryoma Kitagaki (The University of Tokyo, Japan)
Shimpei Nomura (Recruit Sumai Company Ltd., Japan)

Yoji Kiyota, General Co-chair, LIFULL Co., Ltd., Japan

Toshihiko Yamasaki, General Co-chair, The University of Tokyo, Japan


MAIN PROGRAM


DAY 1: TUTORIAL & WORKSHOP, JUNE 11, 2018

Registration (9:00-)

Tutorial 1: Objects, Relationships, and Context in Visual Data (Hall)
by Hanwang Zhang (Nanyang Technological University) and Qianru Sun (Max-Planck Institute for Informatics)

Tutorial (10:00-11:30)

Morning Break (11:30-12:00)

Tutorial (12:00-13:00)

Tutorial 2: Recommendation Technologies for Multimedia Content (Hall)
by Xiangnan He (National University of Singapore), Hanwang Zhang (Nanyang Technological University), and Tat-Seng Chua (National University of Singapore)

Tutorial (14:30-15:30)

Afternoon Break (15:30-16:00)

Tutorial (16:00-17:30)


Tutorial 3: Multimedia Content Understanding by Learning from Very Few Examples: Recent Progress on Unsupervised, Semi-Supervised and Supervised Deep Learning Approaches (Room A)
by Guo-Jun Qi (University of Central Florida)

Tutorial (16:00-17:30)

Workshop 1: Workshop on Lifelog Search Challenge (Room A)

Organizing Chairs’ Welcome (10:00-10:15)

Session 1: Oral Paper Presentations (10:15-11:30, Chair: Klaus Schoeffmann)
[WS1-1] Andreas Leibetseder, Bernd Muenzer, Sabrina Kletz, Manfred Jürgen Primus, Klaus Schöffmann. LiveXplore at the Lifelog Search Challenge 2018
[WS1-2] Liting Zhou, Zaher Hinbarji, Duc-Tien Dang-Nguyen, Cathal Gurrin. LIFER - An Interactive Lifelog Retrieval System
[WS1-3] Jakub Lokoc, Tomáš Souček, Gregor Kovalcik. Using an Interactive Video Retrieval Tool for LifeLog Data
[WS1-4] Aaron Duane, Wolfgang Hürst, Cathal Gurrin. Virtual Reality Lifelog Explorer for the Lifelog Search Challenge at ACM ICMR 2018
[WS1-5] Thanh-Dat Truong, Minh-Triet Tran, Vinh-Tiep Nguyen. Lifelogging Retrieval Based on Semantic Concepts Fusion
[WS1-6] Adria Alsina, Xavier Giró, Cathal Gurrin. An Interactive Lifelog Search Engine for the Lifelog Search Challenge at ACM ICMR 2018
[WS1-7] Wolfgang Hürst, Kevin Ouwehand, Marijn Mengerink, Aaron Duane, Cathal Gurrin. Geospatial Access to Lifelogging Photos in Virtual Reality

Morning Break (11:30-12:00)

Session 2: Panel Discussion (12:00-12:30)
Challenges of Lifelog Search and Access
Chair: Cathal Gurrin
Panel: Duc-Tien Dang-Nguyen, Klaus Schoeffmann, Wolfgang Hürst

Session 3: Lifelog Search Challenge (Expert Users) (12:30-13:00)

Lunch Break (13:00-14:00)

Session 4: Lifelog Search Challenge (Novice Users) (14:00-15:30)


Workshop 2: Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (Room B)

Workshop Organizers’ Welcome (10:00-10:05)

Keynote Speech (10:05-11:00, Chair: Toshihiko Yamasaki)
Digital Art by teamLab
by Takeshi Yamada (teamLab inc.)

Session 1: Multimedia Artworks Analysis (11:00-11:45, Chair: Wei-Ta Chu)
[WS2-1] Chien-Wen Chen, Wen-Cheng Chen, and Min-Chun Hu: Doodle Master: A Doodle Beautification System Based on Auto-encoding Generative Adversarial Networks
[WS2-2] Sijie Shen, Toshihiko Yamasaki, Michi Sato, and Kenji Kajiwara: Photo Selection for Family Album using Deep Neural Networks
[WS2-3] Hiya Roy, Toshihiko Yamasaki, and Tatsuaki Hashimoto: Predicting Image Aesthetics Using Objects in the Scene

Lunch Break (11:45-12:10)

Session 2: Attractiveness Computing in Multimedia (12:10-12:55, Chair: Norimichi Tsumura)
[WS2-4] Mayuko Iriguchi, Hiroki Koda, and Nobuo Masataka: Colour Perception Characteristics of Women in Menopause
[WS2-5] Hirokazu Doi, Norimichi Tsumura, and Kazuyuki Shinohara: Temporal Course of Neural Processing during Skin Color Perception
[WS2-6] Kensuke Tobitani, Tatsuya Matsumoto, Yusuke Tani, and Noriko Nagata: Modeling the Relation between Skin Attractiveness and Physical Characteristics


Workshop 3: Workshop on Multimedia for RETech’18 (Room B)

General Chairs’ Welcome (14:30-14:40)

Session 1: Floor Plan Analyses (14:40-15:30, Chair: Yoji Kiyota)
[WS3-1] Toshihiko Yamasaki, Jin Zhang, Yuki Takada: Apartment Structure Estimation Using Fully Convolutional Networks and Graph Model
[WS3-2] Naoki Kato, Toshihiko Yamasaki, Kiyoharu Aizawa, Takemi Ohama: Users’ Preference Prediction of Real Estates Featuring Floor Plan Analysis using FloorNet

Afternoon Break (15:30-16:00)

Session 2: Multimedia Applications for Real Estate Industries (16:00-17:15, Chair: Toshihiko Yamasaki)
[WS3-3] David Koch, Miroslav Despotovic, Muntaha Sakeena, Mario Doeller, Matthias Zeppelzauer: Visual Estimation of Building Condition with Patch-level ConvNets
[WS3-4] Ilya Makarov, Alisa Korinevskaya, Vladimir Aliev: Fast Semi-dense Depth Map Estimation
[WS3-5] Naoki Murata, Satoshi Suga, Eichi Takaya, Satoshi Ueno, Yoji Kiyota, Satoshi Kurihara: Proposition of VR-MR Hybrid System for Sharing Living-in-Room

Discussion (17:15-17:30)

Reception (19:00-, Fisherman’s Market)
Located inside Yokohama Red Brick Warehouse.


DAY 2: MAIN CONFERENCE, JUNE 12, 2018

Registration (9:00-)

Welcome (9:45-10:00, Hall)

Keynote 1 (10:00-11:00, Hall, Chair: Kiyoharu Aizawa)
The Ongoing Evolution of Broadcast Technology
by Kohji Mitani (Science & Technology Research Laboratories, NHK)

Morning Break (11:00-11:30)


Best Paper Session (11:30-13:00, Hall, Chair: Benoit Huet)
[BS-1] Goncalo Marcelino, Ricardo Pinto and Joao Magalhaes: Ranking News-Quality Multimedia
[BS-2] Niluthpol Mithun, Juncheng Li, Florian Metze and Amit Roy-Chowdhury: Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
[BS-3] Shizhe Chen, Jia Chen, Qin Jin and Alex Hauptmann: Class-aware Self-Attention for Audio Event Recognition
[BS-4] Andrea Ceroni, Ma Chenyang and Ralph Ewerth: Mining Exoticism from Visual Content with Fusion-based Deep Neural Networks

Lunch Break (13:00-14:00)

Special Session 1: Predicting User Perceptions of Multimedia Content (Chair: Claire-Hélène Demarty)

Oral (14:00-14:45, Hall)
[SS1-1] Dmitry Kuzovkin, Tania Pouli, Remi Cozot, Olivier Le Meur, Jonathan Kervec and Kadi Bouatouch: Image Selection in Photo Albums
[SS1-2] Yasemin Timar, Nihan Karslioglu, Heysem Kaya and Albert Ali Salah: Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips
[SS1-3] Sarath Sivaprasad, Tanmayee Joshi, Rishabh Agrawal and Niranjan Pedanekar: Multimodal Continuous Prediction of Emotions in Movies using Long Short-Term Memory Networks

Spotlight (14:45-14:55, Hall)
[SS1-4] Yang Liu, Zhonglei Gu, Tobey H. Ko and Kien A. Hua: Learning Perceptual Embeddings with Two Related Tasks for Joint Predictions of Media Interestingness and Emotions
[SS1-5] Jayneel Parekh, Harshvardhan Tibrewal and Sanjeel Parekh: Deep Pairwise Classification and Ranking for Predicting Media Interestingness
[SS1-6] Ivan Gonzalez Diaz, Jenny Benois-Pineau, Jean-Philippe Domenger and Aymar de Rugy: Perceptually-guided Understanding of Egocentric Video Content: Recognition of Objects to Grasp
[SS1-7] Wenlu Yang, Maria Rifqi, Christophe Marsala and Andrea Pinna: Towards Better Understanding of Player’s Game Experience


Demo (14:00-16:30, Room A, Chairs: Koichi Shinoda, Zhipeng Wu)
[DE-1] Longhui Wei, Xiaobin Liu, Jianing Li and Shiliang Zhang: VP-ReID: Vehicle and Person Re-Identification System
[DE-2] Maguell Sandifort, Jianquan Liu, Shoji Nishimura and Wolfgang Hürst: VisLoiter+: An Entropy Model-Based Loiterer Retrieval System with User-friendly Interfaces
[DE-3] Wenjie Duan, Kengo Makino, Rui Ishiyama, Toru Takahashi, Yuta Kudo and Pieter Jonker: Automated Scanning and Individual Identification System for Parts without Marking or Tagging
[DE-4] Nico Hezel and Kai Uwe Barthel: Dynamic construction and manipulation of hierarchical quartic image graphs
[DE-5] Jonas Krause, Gavin Sugita, Kyungim Baek and Lipyeow Lim: WTPlant (What’s That Plant?): a Deep Learning System for Identifying Plants in Natural Images
[DE-6] Matthew Cooper, Jian Zhao, Chidansh Bhatt and David Shamma: MOOCex: Exploring Educational Video via Recommendation
[DE-7] Yangbangyan Jiang, Qianqian Xu, Xiaochun Cao and Qingming Huang: Who to Ask: An Intelligent Fashion Consultant
[DE-8] Chou Po-Wen, Lin Fu-Neng, Chang Keh-Ning and Chen Herng-Yow: A Simple Score Following System for Music Ensembles Using Chroma and Dynamic Time Warping

Special Session 2: Social-Media Visual Summarization / Large-Scale 3D Multimedia Analysis and Applications (Chair: Joao Magalhaes, Rongrong Ji)

Oral (14:55-15:25, Hall)
[SS2-1] Po-Yao Huang, Junwei Liang, Jean-Baptiste Lamare and Alexander Hauptmann: Multimodal Filtering of Social Media for Temporal Monitoring and Event Analysis
[SS2-2] Xiangyu Yue, Bichen Wu, Sanjit Seshia, Kurt Keutzer and Alberto Sangiovanni-Vincentelli: A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving

Spotlight (15:25-15:30, Hall)
[SS2-3] Guoyu Lu and Jingkuan Song: 3D Image-based Indoor Localization Joint With WiFi Positioning
[SS2-4] Zhiwei Li and Lei Yu: Compare Stereo Patches Using Atrous Convolutional Neural Networks

Special Session Posters (15:30-16:30, Foyer)Posters of all the Special Session Papers will be presented.


Industrial Exhibition (14:00-16:30, Foyer)
[IE-1] NEC Corporation
[IE-2] NVIDIA
[IE-3] CyberAgent, Inc.
[IE-4] LIFULL Co., Ltd.
[IE-5] Mercari

Oral Session 1: Multimedia Retrieval (16:30-18:30, Hall, Chair: Chong-Wah Ngo)
[OS1-1] Xing Xu, Jingkuan Song, Huimin Lu, Yang Yang, Fumin Shen and Zi Huang: Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
[OS1-2] Kevin Joslyn, Kai Li and Kien Hua: Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing
[OS1-3] Ge Song and Xiaoyang Tan: Learning multilevel semantic similarity for large-scale multi-label image retrieval
[OS1-4] Limeng Cui, Zhensong Chen, Jiawei Zhang, Philip S. Yu, Yong Shi and Lifang He: Multi-view Collective Tensor Decomposition for Cross-modal Hashing
[OS1-5] Lei Zhou, Xiao Bai, Xianglong Liu and Jun Zhou: Binary Coding by Matrix Classifier for Effective Subspace Retrieval
[OS1-6] Zhongyan Zhang, Lei Wang, Yang Wang, Luping Zhou, Jianjia Zhang and Fang Chen: Instance Image Retrieval by Aggregating Sample-based Discriminative Characteristics


DAY 3 MAIN CONFERENCE, JUNE 13, 2018


Registration (9:00-)

Oral Session 2: Multimedia Content Analysis (9:30-11:00, Hall, Chair: Wei-Ta Chu)
[OS2-1] Wenjie Zhang, Junchi Yan, Xiangfeng Wang and Hongyuan Zha: Deep eXtreme Multi-label Learning
[OS2-2] Feiran Huang, Xiaoming Zhang, Chaozhuo Li, Zhonghua Zhao, Yueying He and Zhoujun Li: Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder
[OS2-3] Devanshu Arya and Marcel Worring: Exploiting Relational Information in Social Networks using Geometric Deep Learning on Hypergraphs
[OS2-4] Matthias Zeppelzauer, Miroslav Despotovic, Muntaha Sakeena, David Koch and Mario Doller: Automatic Prediction of Building Age from Photographs
[OS2-5] Kejun Zhang, Hui Zhang, Simeng Li, Changyuan Yang and Lingyun Sun: The PMEmo Dataset for Music Emotion Recognition


Oral Session 3: Multimedia Applications (11:30-12:30, Hall, Chair: Wolfgang Hürst)
[OS3-1] Zunlei Feng, Zhenyun Yu, Yezhou Yang, Yongcheng Jing, Junxiao Jiang and Mingli Song: Interpretable Partitioned Embedding for Customized Multi-item Fashion Outfit Composition
[OS3-2] Peirui Cheng and Weiqiang Wang: A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation
[OS3-3] Lan Wang, Yang Wang, Susu Shan and Feng Su: Scene Text Detection and Tracking in Video with Background Cues

Poster Spotlight Session (12:30-13:00, Hall, Chair: Keiji Yanai)
[PS-1] Hanjiang Lai: Transductive Zero-Shot Hashing via Coarse-to-Fine Similarity Mining
[PS-2] Xin Luo, Peng-Fei Zhang, Ye Wu, Zhen-Duo Chen, Hua-Junjie Huang and Xin-Shun Xu: Asymmetric Discrete Cross-Modal Hashing
[PS-3] Xiang Zhang, Guohua Dong, Yimo Du, Chengkun Wu, Zhigang Luo and Canqun Yang: Collaborative Subspace Graph Hashing for Cross-modal Retrieval
[PS-4] Ye Wu, Xin Luo, Xin-Shun Xu, Shanqing Guo and Yuliang Shi: Dictionary Learning based Supervised Discrete Hashing for Cross-Media Retrieval
[PS-5] Bingqing Ke, Jie Shao, Zi Huang and Heng Tao Shen: Feature Reconstruction by Laplacian Eigenmaps for Efficient Instance Search
[PS-6] Zachary Seymour and Zhongfei Zhang: Image Annotation Retrieval with Text-Domain Label Denoising
[PS-7] Zachary Seymour and Zhongfei Zhang: Multi-label Triplet Embeddings for Image Annotation from User-Generated Tags
[PS-8] Chandramani Chaudhary, Poonam Goyal, Joel R A Moniz, Navneet Goyal and Yi-Ping Phoebe Chen: Linguistic Patterns and Cross Modality-based Image Retrieval for Complex Queries


[PS-9] Minh-Son Dao, Quang-Nhat-Minh Pham, Asem Kasem and Mohamed Saleem Haja Nazmudeen: A Context-Aware Late-Fusion Approach for Disaster Image Retrieval from Social Media
[PS-10] Yugo Sato, Tsukasa Fukusato and Shigeo Morishima: Face Retrieval Framework Relying on User’s Visual Memory
[PS-11] Xueping Wang, Weixin Li, Guodong Mu, Di Huang and Yunhong Wang: Facial Expression Synthesis by U-Net Conditional Generative Adversarial Networks
[PS-12] Hongzhi Li, Joseph Ellis, Lei Zhang and Shih-Fu Chang: PatternNet: Visual Pattern Mining with Deep Neural Network
[PS-13] Mingjie Zheng, Sheng-Hua Zhong, Songtao Wu and Jianmin Jiang: Steganographer Detection based on Multiclass Dilated Residual Networks
[PS-14] Maguell L.T.L. Sandifort, Jianquan Liu, Shoji Nishimura and Wolfgang Hürst: An Entropy Model for Loiterer Retrieval across Multiple Surveillance Cameras
[PS-15] Philipp Harzig, Christian Eggert and Rainer Lienhart: Visual Question Answering With a Hybrid Convolution Recurrent Model
[PS-16] Shuai Liao, Efstratios Gavves and Cees Snoek: Searching and Matching Texture-free 3D Shapes in Images
[PS-17] Duc Tien Dang Nguyen, Michael Riegler, Liting Zhou and Cathal Gurrin: Challenges and Opportunities within Personal Life Archives
[PS-18] Xu Sun, Yuantian Wang, Tongwei Ren, Zhi Liu, Zheng-Jun Zha and Gangshan Wu: Object Trajectory Proposal via Hierarchical Volume Grouping
[PS-19] Sungeun Hong, Woobin Im and Hyun Seung Yang: CBVMR: Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint
[PS-20] Yi Tang, Zhi Jin, Wenbin Zou and Xia Li: Multi-Scale Spatiotemporal Conv-LSTM Network for Video Saliency Detection
[PS-21] Jianfei Xue and Koji Eguchi: Supervised Nonparametric Multimodal Topic Modeling Methods for Multi-class Video Classification
[PS-22] Baohan Xu, Hao Ye, Yingbin Zheng, Heng Wang, Tianyu Luwang and Yu-Gang Jiang: Dense Dilated Network for Few Shot Action Recognition
[PS-23] Haonan Qiu, Yingbin Zheng, Hao Ye, Yao Lu, Feng Wang and Liang He: Precise Temporal Action Localization by Evolving Temporal Proposals

Lunch Break (13:00-14:00)


Demo (14:00-16:30, Room A, Chair: Koichi Shinoda, Zhipeng Wu)
See the list on Day 2.

Industrial Exhibition (14:00-16:30, Foyer)
See the list on Day 2.

Poster Session (14:00-16:00, Foyer, Chair: Keiji Yanai)
Posters of all the Best Session/Oral Session/Poster Papers will be presented.
(Core time: 14:00-15:00 for odd number IDs, 15:00-16:00 for even number IDs)

Doctoral Symposium (14:00-16:00, Hall, Chair: Martha Larson, Takahiro Ogawa)
[DS-1] Wan-Lun Tsai: Personal Basketball Coach: Tactic Training through Wireless Virtual Reality
[DS-2] Andreas Leibetseder and Klaus Schoeffmann: Extracting and Using Medical Expert Knowledge to Advance in Video Processing for Gynecologic Endoscopy
[DS-3] Noa Garcia: Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval
[DS-4] Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, and Miki Haseyama: Tourism Category Classification on Image Sharing Services Through Estimation of Existence of Reliable Results
[DS-5] Rashmi Gupta and Cathal Gurrin: Considering Documents in Lifelog Information Retrieval?

Keynote 2 (16:30-17:30, Hall, Chair: Shin’ichi Satoh)
Prototyping for Envisioning the Future
by Shunji Yamanaka (The Univ. of Tokyo)

Oral Session 4: Video Analysis (17:30-18:30, Hall, Chair: Koichi Shinoda)
[OS4-1] Yang Mi, Kang Zheng and Song Wang: Recognizing Actions in Wearable-Camera Videos by Training Classifiers on Fixed-Camera Videos
[OS4-2] Romain Cohendet, Karthik Yadati, Ngoc Q. K. Duong and Claire-Helene Demarty: Annotating, understanding, and predicting long-term video memorability
[OS4-3] Daniel Rotman, Dror Porat, Gal Ashour and Udi Barzelay: Optimally Grouped Deep Features Using Normalized Cost for Video Scene Detection

Banquet (19:00-, Hotel New Grand)


DAY 4 INDUSTRIAL DAY & ACM MM TPC WORKSHOP, JUNE 14, 2018


Registration (9:00-)

Panel (9:30-10:30, Hall)
Title: Top-5 problems in multimedia retrieval
Panelists: Tat-Seng Chua, Michael Houle, Ramesh Jain, Nicu Sebe, Rainer Lienhart
Facilitators: Chong-Wah Ngo, Vincent Oria

Morning Break (10:30-11:00)

Industrial Talks (11:00-13:00, Hall, Chair: Go Irie, Tao Mei)
[IT-1] NEC Corporation: NEC’s Object recognition technologies and their industrial applications, by Kota Iwamoto
[IT-2] CyberAgent, Inc.: Orion: An Integrated Multimedia Content Moderation System for Web Services, by Yusuke Fujisaka
[IT-3] LIFULL Co., Ltd.: Promoting Open Innovations in Real Estate Tech: Provision of the LIFULL HOME’S Data Set and Collaborative Studies, by Yoji Kiyota
[IT-4] Hitachi, Ltd.: Industrial applications of image recognition and retrieval technologies for public safety and IT services, by Tomokazu Murakami

Lunch Break (13:00-14:30)

ACMMM TPC Workshop (14:30-16:30, Hall, Chair: Nicu Sebe)
[MT-1] Yu-Gang Jiang, Brain-inspired Deep Models for Visual Recognition
[MT-2] Masataka Goto, Frontiers of Music Technologies
[MT-3] Jia Jia, Mental Health Computing via Harvesting Social Media Data
[MT-4] Qi Tian, Person Re-Identification: Recent Advances and Challenges
[MT-5] Qin Jin, Multi-level Multi-aspect Multimedia Analysis
[MT-6] Rainer Lienhart, Mining Automatically Estimated Poses from Video Recordings of Top Athletes

Afternoon Break (16:30-17:00)

ACMMM TPC Workshop (17:00-19:00, Hall, Chair: Nicu Sebe)
[MT-7] Benoit Huet, Affective Multimodal Analysis for the Media Industry
[MT-8] Xin Yang, Deep Neural Networks for Automated Prostate Cancer Detection and Diagnosis in Multi-parametric MRI
[MT-9] Heng Tao Shen, Cross-Media Retrieval: State of the Art
[MT-10] Rongrong Ji, Towards Compact Visual Analysis Systems
[MT-11] Max Mühlhäuser, Multimedia Research: There’s life in the old dog yet


FLOOR PLAN: 6F (Hall, Foyer)
Hall and Foyer (Industrial exhibition and Demo areas; coffee, tea, and refreshments served only during breaks), Registration.

FLOOR PLAN: 7F (Room A, Room B)
Room A, Room B, Meeting rooms 1-3, Restrooms.

Local Guides on Google Maps


DAY 2 POSTER AREA PLAN: 6F (Foyer)
Poster core time: 15:30-16:30 (Special Session posters).
Boards: BS-1 to BS-4, PS-1 to PS-23, SS1-1 to SS1-7, SS2-1 to SS2-4.

DAY 3 POSTER AREA PLAN: 6F (Foyer)
Poster core time: 14:00-15:00 (odd number IDs), 15:00-16:00 (even number IDs).
Boards: BS-1 to BS-4, PS-1 to PS-23, OS1-1 to OS1-6, OS2-1 to OS2-5, OS3-1 to OS3-3, OS4-1 to OS4-3.

Best Papers and regular Poster Papers can be put on the wall on both Day 2 and Day 3. Note that their core time is on Day 3.


DIRECTION TO RECEPTION VENUE (map)
Reception venue: FISHERMAN’S MARKET (inside Yokohama Red Brick Warehouse). Landmarks shown: Yokohama Media and Communication Center, Yamashita Park.

DIRECTION TO BANQUET VENUE (map)
Banquet venue: HOTEL NEW GRAND. Landmarks shown: Yokohama Media and Communication Center, Yamashita Park.


ICMR 2018 SUPPORTERS

CORPORATE SUPPORTERS (GOLD)

CORPORATE SUPPORTERS (SILVER)

CORPORATE SUPPORTERS (BRONZE)

SUPPORTERS
