Transitive Hashing Network for Heterogeneous Multimedia Retrieval

Zhangjie Cao¹, Mingsheng Long¹, Jianmin Wang¹, and Qiang Yang²

¹School of Software, Tsinghua University
²Department of Computer Science, Hong Kong University of Science and Technology

The Thirty-First AAAI Conference on Artificial Intelligence, 2017



• Motivation: Large-Scale Multimedia Retrieval

  Cross-modal retrieval: nearest-neighbor (NN) similarity retrieval across modalities.

  Database: $X^{img} = \{x_1^{img}, \ldots, x_N^{img}\}$; query: $q^{txt}$.
  Cross-modal NN: $NN(q^{txt}) = \min_{x^{img} \in X^{img}} d(x^{img}, q^{txt})$.

  Figure: Cross-modal retrieval, similarity retrieval across media modalities. (a) I → T: an image query retrieves the top 16 tags from a text database (e.g. ['lake']), precision 0.625. (b) T → I: a tag query ['sky sun'] retrieves the top 16 images from an image database, precision 0.625.

• Motivation: Hashing Methods

  Pipeline: generate image descriptors (SIFT, DeCAF) and text descriptors (Word2Vec, TF-IDF), convert them into binary hash codes in {-1, 1}, then perform approximate nearest-neighbor retrieval.

  Superiorities:
  - Memory: a 128-d float descriptor (512 bytes) becomes a 128-bit code (16 bytes); 1 billion items shrink from 512 GB to 16 GB.
  - Time: computation is 10-100x faster; transmission (disk/web) is about 30x faster.

  Applications:
  - Approximate nearest-neighbor search.
  - Compact representation and feature compression for large datasets.
  - Distributing and transmitting data online.
  - Constructing indexes for large-scale databases.
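The memory and time claims above can be illustrated with a minimal NumPy sketch (not the paper's code): a 128-bit hash code fits in 16 bytes instead of 512 bytes for a 128-d float32 descriptor, and Hamming distance reduces to XOR plus popcount.

```python
import numpy as np

# Toy database of packed binary hash codes; sizes chosen for illustration only.
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(1000, 16), dtype=np.uint8)   # 1000 packed 128-bit codes
query = rng.integers(0, 256, size=16, dtype=np.uint8)        # one packed query code

# Hamming distance to every database item: XOR the bytes, then count set bits.
dists = np.unpackbits(np.bitwise_xor(db, query), axis=1).sum(axis=1)
nearest = int(np.argmin(dists))                              # approximate nearest neighbor

print(db.nbytes)   # 16000 bytes for 1000 items, vs. 512000 for float32 descriptors
```

Because the codes are packed bytes, a billion-item index at 128 bits per item occupies 16 GB, matching the slide's figure.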

• Motivation: Hashing Methods

  Traditional vs. Transitive

  Figure: Traditional cross-modal hashing requires a heterogeneous relationship directly between the image modality (query/database) and the text modality (database/query). Transitive cross-modal hashing instead learns the heterogeneous relationship on an auxiliary dataset of tagged images (e.g. 'tigers, zoos', 'sunset, lake', 'athletes, celebrity') and transfers it to the query and database sets, which share no direct relationship.

• Method: Model

  Network Architecture

  Figure: The Transitive Hashing Network comprises an image network and a text network, each ending in a hash layer that outputs codes in {-1, 1}. On the auxiliary dataset, the two networks are trained jointly with a pairwise cross-entropy loss and pairwise quantization losses; MMD terms align the query/database inputs with the auxiliary dataset within each modality.

• Method: Loss

  Heterogeneous Relationship Learning

  Given the heterogeneous relationship $S = \{s_{ij}\}$, take the logarithm of the maximum a posteriori (MAP) estimate:

  $\log p(H^x, H^y \mid S) \propto \log p(S \mid H^x, H^y)\,p(H^x)\,p(H^y) = \sum_{s_{ij} \in S} \log p(s_{ij} \mid h_i^x, h_j^y)\,p(h_i^x)\,p(h_j^y),$  (1)

  where $p(S \mid H^x, H^y)$ is the likelihood function, and $p(H^x)$ and $p(H^y)$ are the prior distributions.

• Method: Loss

  Heterogeneous Relationship Learning

  Likelihood function: for each pair of points $x_i$ and $y_j$, $p(s_{ij} \mid h_i^x, h_j^y)$ is the conditional probability of their relationship $s_{ij}$ given their hash codes $h_i^x$ and $h_j^y$, defined using the pairwise logistic function:

  $p(s_{ij} \mid h_i^x, h_j^y) = \begin{cases} \sigma(\langle h_i^x, h_j^y \rangle), & s_{ij} = 1 \\ 1 - \sigma(\langle h_i^x, h_j^y \rangle), & s_{ij} = 0 \end{cases} = \sigma(\langle h_i^x, h_j^y \rangle)^{s_{ij}} \left(1 - \sigma(\langle h_i^x, h_j^y \rangle)\right)^{1 - s_{ij}},$  (2)

  where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function, $h_i^x = \mathrm{sgn}(z_i^x)$ and $h_i^y = \mathrm{sgn}(z_i^y)$.
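A small numeric sketch of the pairwise logistic likelihood in Eq. (2), on toy hash codes rather than the paper's trained activations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_likelihood(hx, hy, s_ij):
    """p(s_ij | hx, hy) = sigmoid(<hx, hy>)^s_ij * (1 - sigmoid(<hx, hy>))^(1 - s_ij)."""
    p = sigmoid(np.dot(hx, hy))
    return p if s_ij == 1 else 1.0 - p

hx = np.sign(np.array([0.9, -0.3, 0.7, -0.8]))   # h = sgn(z), codes in {-1, +1}
hy = np.sign(np.array([0.6, -0.1, 0.5, 0.2]))

p_sim = pairwise_likelihood(hx, hy, 1)   # codes mostly agree, so s_ij = 1 is likely
p_dis = pairwise_likelihood(hx, hy, 0)   # complement of the above
```

The two branches sum to one for any pair, so maximizing this likelihood pushes similar pairs toward large inner products (small Hamming distance) and dissimilar pairs toward small ones.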

• Method: Loss

  Heterogeneous Relationship Learning

  Prior: for ease of optimization, the continuous relaxation $h_i^x = z_i^x$ and $h_i^y = z_i^y$ is applied to the binary constraints. Then, to control the quantization error and close the gap between the Hamming distance and its surrogate for learning accurate hash codes, a new cross-entropy prior is proposed over the continuous activations $\{z_i^*\}$:

  $p(z_i^*) \propto \exp\left(-\lambda\, H\!\left(\tfrac{1}{b}\mathbf{1}, \tfrac{|z_i^*|}{b}\right)\right),$  (3)

  where $* \in \{x, y\}$, $b$ is the code length, and $\lambda$ is the parameter of the exponential distribution.
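A quick numeric check (assuming $H$ denotes the cross-entropy between the uniform vector $\mathbf{1}/b$ and the normalized magnitudes $|z|/b$, which is what Eq. (6) below suggests): up to an additive constant, the prior's exponent reduces to the $-\log|z_k|$ quantization penalty.

```python
import numpy as np

b = 8                                       # code length (toy value)
z = np.tanh(np.linspace(-2, 2, b))          # toy continuous activations in (-1, 1)

# Cross-entropy between the uniform vector 1/b and |z|/b:
H = -np.sum((1.0 / b) * np.log(np.abs(z) / b))

# Per-point quantization penalty as in Eq. (6):
Q_i = -np.sum(np.log(np.abs(z)))

# H = Q_i / b + log(b), so minimizing H is equivalent to minimizing Q_i:
print(np.isclose(H, Q_i / b + np.log(b)))
```

This is why driving the prior toward its mode pushes every activation coordinate toward magnitude 1, shrinking the gap between $z$ and $\mathrm{sgn}(z)$.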

• Method: Loss

  Heterogeneous Relationship Learning

  Optimization problem for heterogeneous relationship learning:

  $\min_{\Theta} \; J = L + \lambda Q,$  (4)

  where $\lambda$ is the trade-off parameter between the pairwise cross-entropy loss $L$ and the pairwise quantization loss $Q$, and $\Theta$ denotes the network parameters. Specifically,

  $L = \sum_{s_{ij} \in S} \log\left(1 + \exp\left(\langle z_i^x, z_j^y \rangle\right)\right) - s_{ij} \langle z_i^x, z_j^y \rangle.$  (5)

  $Q = \sum_{s_{ij} \in S} \sum_{k=1}^{b} \left(-\log(|z_{ik}^x|) - \log(|z_{jk}^y|)\right).$  (6)
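A toy sketch of the losses in Eqs. (5) and (6); in the paper the activations $z$ come from the image and text networks, whereas here they are random stand-ins (and the quantization term is summed once per point for simplicity, while Eq. (6) sums over the pairs in $S$):

```python
import numpy as np

def cross_entropy_loss(zx, zy, s):
    """Eq. (5): sum over pairs of log(1 + exp(<zx_i, zy_j>)) - s_ij * <zx_i, zy_j>."""
    inner = zx @ zy.T                      # all pairwise inner products
    return float(np.sum(np.log1p(np.exp(inner)) - s * inner))

def quantization_loss(zx, zy):
    """-log|z| pushes each activation coordinate toward +-1, as in Eq. (6)."""
    eps = 1e-8                             # numerical guard, not in the paper
    return float(-np.sum(np.log(np.abs(zx) + eps)) - np.sum(np.log(np.abs(zy) + eps)))

rng = np.random.default_rng(0)
zx = np.tanh(rng.normal(size=(4, 8)))      # 4 image activations, 8-bit codes
zy = np.tanh(rng.normal(size=(5, 8)))      # 5 text activations
s = rng.integers(0, 2, size=(4, 5))        # heterogeneous relationship s_ij

lam = 0.5                                  # trade-off lambda (arbitrary here)
total = cross_entropy_loss(zx, zy, s) + lam * quantization_loss(zx, zy)   # J = L + lambda*Q
```

Both terms are nonnegative: Eq. (5) is the negative log-likelihood of the pairwise logistic model, and Eq. (6) is zero only when every activation is exactly ±1.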

• Method: Loss

  Homogeneous Distribution Alignment

  Figure: The same transfer diagram as before; since the heterogeneous relationship is available only on the auxiliary dataset, the query/database distributions must be aligned with the auxiliary distribution within each modality.

  Candidate distribution distances: integral probability metrics (Wasserstein, MMD, total variation) and f-divergences (Hellinger, KL, Pearson χ²); total variation belongs to both families.

• Method: Loss

  Homogeneous Distribution Alignment: Maximum Mean Discrepancy (MMD) [JMLR '12]

  MMD is a nonparametric distance measure for comparing different distributions $P_q$ and $P_x$ in a reproducing kernel Hilbert space $\mathcal{H}$ (RKHS) endowed with feature map $\phi$ and kernel $k$, formally defined as $D_q \triangleq \left\| \mathbb{E}_{h^q \sim P_q}[\phi(h^q)] - \mathbb{E}_{h^x \sim P_x}[\phi(h^x)] \right\|_{\mathcal{H}}^2$, where $P_q$ and $P_x$ are the distributions of the query set $X^q$ and the auxiliary set $\bar{X}$.

  Empirical MMD between the auxiliary set $\bar{X}$ and the query set $X^q$:

  $D_q = \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\hat{n}} \frac{k(z_i^q, z_j^q)}{\hat{n}^2} + \sum_{i=1}^{\bar{n}} \sum_{j=1}^{\bar{n}} \frac{k(z_i^x, z_j^x)}{\bar{n}^2} - 2 \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\bar{n}} \frac{k(z_i^q, z_j^x)}{\hat{n}\bar{n}},$  (7)

  where $k(z_i, z_j) = \exp(-\gamma \|z_i - z_j\|^2)$ is the Gaussian kernel.
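The empirical estimator in Eq. (7) can be sketched in a few lines of NumPy (toy activations, arbitrary $\gamma$; not the paper's implementation):

```python
import numpy as np

def gaussian_kernel(a, b, gamma=0.5):
    """k(z_i, z_j) = exp(-gamma * ||z_i - z_j||^2) for all pairs of rows."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * sq)

def mmd(zq, zx, gamma=0.5):
    """Biased empirical MMD^2 between query activations zq and auxiliary zx, Eq. (7)."""
    return (gaussian_kernel(zq, zq, gamma).mean()
            + gaussian_kernel(zx, zx, gamma).mean()
            - 2 * gaussian_kernel(zq, zx, gamma).mean())

rng = np.random.default_rng(0)
same = mmd(rng.normal(size=(50, 8)), rng.normal(size=(50, 8)))           # matched dists
shifted = mmd(rng.normal(size=(50, 8)), rng.normal(3.0, 1.0, size=(50, 8)))  # mean shift
# Samples from the same distribution give near-zero MMD; shifted ones give a larger value.
```

Minimizing this quantity over the network parameters is what pulls the query/database activations toward the auxiliary distribution in each modality.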

• Method: Loss

  Transitive Hashing Network: unified optimization problem

  $\min_{\Theta} \; C = J + \mu \left(D_q + D_d\right),$  (8)

  where $\mu$ is a trade-off parameter between the MAP loss $J$ and the MMD penalty $(D_q + D_d)$, and $D_d$ is the analogous MMD between the auxiliary set and the database set.

  Extensions: when the auxiliary set is small, we assume it is related to the database and query sets. When the auxiliary set is large, this requirement can be relaxed, since a large auxiliary set commonly shares relationships with many other sets; thus a large set can serve as the auxiliary set for any task. We can even take a relationship model pre-trained on large-scale datasets and fine-tune it with our homogeneous alignment method. This widely used pre-training and fine-tuning strategy makes the method easier to deploy.
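Putting the pieces together, the unified objective $C = J + \mu(D_q + D_d)$ of Eq. (8) can be sketched end to end with toy stand-ins for the network activations (random arrays, arbitrary hyperparameters; not the paper's code):

```python
import numpy as np

def sigmoid_ce(zx, zy, s):                            # pairwise cross-entropy, Eq. (5)
    inner = zx @ zy.T
    return np.sum(np.log1p(np.exp(inner)) - s * inner)

def quant(z):                                         # quantization penalty, Eq. (6)
    return -np.sum(np.log(np.abs(z) + 1e-8))

def mmd(a, b, gamma=0.5):                             # biased MMD^2, Eq. (7)
    def k(u, v):
        sq = np.sum(u**2, axis=1)[:, None] + np.sum(v**2, axis=1)[None, :] - 2 * u @ v.T
        return np.exp(-gamma * sq)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

rng = np.random.default_rng(0)
zx_aux = np.tanh(rng.normal(size=(32, 8)))            # auxiliary image activations
zy_aux = np.tanh(rng.normal(size=(32, 8)))            # auxiliary text activations
zq = np.tanh(rng.normal(size=(16, 8)))                # query-set activations
zd = np.tanh(rng.normal(size=(16, 8)))                # database-set activations
s = rng.integers(0, 2, size=(32, 32))                 # heterogeneous relationship

lam, mu = 0.5, 1.0                                    # trade-offs (arbitrary values)
J = sigmoid_ce(zx_aux, zy_aux, s) + lam * (quant(zx_aux) + quant(zy_aux))   # Eq. (4)
C = J + mu * (mmd(zq, zx_aux) + mmd(zd, zy_aux))      # Eq. (8)
```

In training, $C$ would be minimized over the network parameters by back-propagation; here it is only evaluated once to show how the terms compose.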

• Experiments: Setup

  Datasets: NUS-WIDE serves as the auxiliary dataset; ImageNet (image modality) and YahooQA (text modality) serve as the query/database sets.

  Protocols: MAP, precision-recall curves.

  Parameter selection: cross-validation.

  Methods to compare with: two unsupervised methods, Cross-View Hashing (CVH) and Inter-Media Hashing (IMH); two supervised methods, Quantized Correlation Hashing (QCH) and Heterogeneous Translated Hashing (HTH); and one deep hashing method, Deep Cross-Modal Hashing (DCMH).

• Experiments: Results

  Results and Discussion

  Table: MAP comparison of cross-modal retrieval tasks on NUS-WIDE and ImageNet-YahooQA.

                   NUS-WIDE                             ImageNet-YahooQA
  Task   Method   8 bits  16 bits  24 bits  32 bits    8 bits  16 bits  24 bits  32 bits
  I → T  IMH      0.5821  0.5794   0.5804   0.5776     0.0855  0.0686   0.0999   0.0889
         CVH      0.5681  0.5606   0.5451   0.5558     0.1229  0.1180   0.0941   0.0865
         QCH      0.6463  0.6921   0.7019   0.7127     0.2563  0.2494   0.2581   0.2590
         HTH      0.5232  0.5548   0.5684   0.5325     0.2931  0.2694   0.2847   0.2663
         DCMH     0.7887  0.7397   0.7210   0.7460     0.5133  0.5109   0.5321   0.5087
         THN      0.8252  0.8423   0.8495   0.8572     0.5451  0.5507   0.5803   0.5901
  T → I  IMH      0.5579  0.5593   0.5528   0.5457     0.1105  0.1044   0.1183   0.0909
         CVH      0.5261  0.5193   0.5097   0.5045     0.0711  0.0728   0.1116   0.1008
         QCH      0.6235  0.6609   0.6685   0.6773     0.2761  0.2847   0.2795   0.2665
         HTH      0.5603  0.5910   0.5798   0.5812     0.2172  0.1702   0.3122   0.2873
         DCMH     0.7882  0.7912   0.7921   0.7718     0.5163  0.5510   0.5581   0.5444
         THN      0.7905  0.8137   0.8245   0.8268     0.6032  0.6097   0.6232   0.6102

• Experiments: Results

  Results and Discussion

  Figure: Precision-recall curves of Hamming ranking @ 24-bit codes on NUS-WIDE, (a) I → T and (b) T → I, and on ImageNet-YahooQA, (c) I → T and (d) T → I, comparing CVH, IMH, HTH, QCH, DCMH, and THN.

  Key observations:
  - THN sets a new state of the art even on the more conventional cross-modal retrieval problem, where the relationship between query and database is available for training, as in the NUS-WIDE dataset.
  - The homogeneous distribution alignment module of THN effectively closes the distribution shift by matching the corresponding data distributions with the maximum mean discrepancy.

• Experiments: Results

  Empirical Analysis

  - THN-ip: variant using the pairwise inner-product loss instead of the pairwise cross-entropy loss.
  - THN-D: variant that does not use the unsupervised training data.
  - THN-Q: variant that does not use the quantization loss.

  Table: MAP of THN variants on ImageNet-YahooQA.

           I → T                                 T → I
  Method   8 bits  16 bits  24 bits  32 bits    8 bits  16 bits  24 bits  32 bits
  THN-ip   0.2976  0.3171   0.3302   0.3554     0.3443  0.3605   0.3852   0.4286
  THN-D    0.5192  0.5123   0.5312   0.5411     0.5423  0.5512   0.5602   0.5489
  THN-Q    0.4821  0.5213   0.5352   0.4947     0.5731  0.5592   0.5849   0.5612
  THN      0.5451  0.5507   0.5803   0.5901     0.6032  0.6097   0.6232   0.6102

• Experiments: Results

  Empirical Analysis

  Key observations:
  - THN outperforms THN-ip by a very large margin, confirming the importance of a well-defined loss function for heterogeneous relationship learning.
  - THN outperforms THN-D, showing that THN can further exploit the unsupervised training data to bridge the Hamming spaces of the auxiliary dataset (NUS-WIDE) and the query/database sets (ImageNet-YahooQA), so that the auxiliary dataset serves as a bridge for transferring knowledge between query and database.
  - THN outperforms THN-Q, indicating that the pairwise quantization loss reduces the quantization error incurred when binarizing continuous representations into hash codes.

• Summary

  - We define a new transitive deep hashing problem for heterogeneous multimedia retrieval, in which no direct relationship exists between the query set and the database set.
  - We propose a pairwise cross-entropy loss and a pairwise quantization loss for heterogeneous relationship learning.
  - We achieve homogeneous distribution alignment by minimizing the MMD loss between the query/database sets and the auxiliary set.
  - In the future, we plan to extend the method to social media problems.
