Machine Learning Faculty Publications

Latent Memory-augmented Graph Transformer for Visual Storytelling

Mengshi Qi, Beijing University of Posts and Telecommunications
Jie Qin, Nanjing University of Aeronautics and Astronautics
Di Huang, Beihang University
Zhiqiang Shen, Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence
Yi Yang, University of Technology Sydney
Jiebo Luo, University of Rochester

Document Type

Conference Proceeding

Publication Title

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Abstract

Visual storytelling aims to automatically generate a human-like short story given an image stream. Most existing works utilize either scene-level or object-level representations, neglecting the interaction among objects in each image and the sequential dependency between consecutive images. In this paper, we present a novel Latent Memory-augmented Graph Transformer (LMGT), a Transformer-based framework for visual story generation. LMGT directly inherits the merits of the Transformer, which is further enhanced with two carefully designed components, i.e., a graph encoding module and a latent memory unit. Specifically, the graph encoding module exploits the semantic relationships among image regions and attentively aggregates critical visual features based on the parsed scene graphs. Furthermore, to better preserve inter-sentence coherence and topic consistency, we introduce an augmented latent memory unit that learns and records highly summarized latent information as the story line from the image stream and the sentence history. Experimental results on three widely used datasets demonstrate the superior performance of LMGT over the state-of-the-art methods.

First Page

4892

Last Page

4901

DOI

10.1145/3474085.3475236

Publication Date

10-17-2021

Keywords

memory network, scene graph, transformer, visual storytelling

Comments

IR conditions: non-described

Recommended Citation

M. Qi, J. Qin, D. Huang, Z. Shen, Y. Yang, and J. Luo, "Latent Memory-augmented Graph Transformer for Visual Storytelling", In Proceedings of the 29th ACM Intl. Conf. on Multimedia (MM '21), ACM, pp. 4892–4901, Oct 2021. doi:10.1145/3474085.3475236

Additional Links

DOI link: https://dl.acm.org/doi/10.1145/3474085.3475236

