TRANSCRIPT
2019/03/16 IPSJ 81st National Convention, Interface-Knowledge Session
A Detection Method of Customer's Products
Selection Behavior Based on Top-view
Images in Retail Environments
Jiahao Wen 1, Muhammad Alfian Amrizal 2, Toru Abe 1,3, Takuo Suganuma 1,3
1 Graduate School of Information Sciences, Tohoku University
2 Research Institute of Electrical Communication, Tohoku University
3 Cyberscience Center, Tohoku University
1. Introduction
1.1 Background (1/2)
• Less human labor + ubiquitous cameras in shops → Smart Retail
• "Smart" means effective data collection: valuable data can be collected effectively in a smart retail store.
Images' URL: https://www.youtube.com/watch?v=jYjRr7GQVyw
• Retail marketing (in offline stores) is driven by valuable data.
• What is valuable data?
  - The most/least consumed products
  - Customers' preferences
1.1 Background (2/2)
• Customer behavior is a kind of valuable data.
• Traditional thinking: popular/unpopular products are known.
• Our thinking: products picked up by customers but not bought (returned) are unknown.
• Problem: get more data from the shopping process.
• Solution: recognize the shopping process (the purpose of this research).
1.2 Research Outline
• Problem statement: Existing methods fail to recognize some valuable behaviors, such as when a customer considers a product but ends up returning it to the shelf.
• Main goal: Improve recognition of such product-returning behavior in retail stores to provide more valuable information for marketing.
• Proposal: Extract objects from top-view images with a CNN model; the CNN's output sequence is then learned by an RNN model to classify behaviors.
  1. Recognition using context data.
  2. Reduced recognition complexity compared to existing methods.
  3. Lower implementation cost, since only a normal camera is used.
2. Related Work
2.1 An existing method for behavior recognition
• Based on Recurrent Neural Networks (RNN): RNN-based methods use context information for recognition.
• Long-term Recurrent Convolutional Networks (LRCN) [1] use two models to recognize whether there is interaction with an object, such as in tennis or jogging.
• Using two models increases complexity.
• A solution to noise in visual features is not mentioned.
Fig. 1 Architecture of LRCN
[1] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description", arXiv:1411.4389, 2016.
2.2 Comparison with the existing method
• Proposed method (CNN+RNN):
  - Compared to CNN-only methods [2]-[4]: needs only a normal camera and utilizes context information → more accurate, lower cost.
  - Compared to LRCN [1]: uses only one RNN model instead of two, and noise in visual features can be handled in sequence learning → reduced complexity, robust to noise.
[2] G. Chéron, I. Laptev, C. Schmid, "P-CNN: pose-based CNN features for action recognition", ICCV 2015, pp. 3218-3226, 2015.
[3] J. Yamamoto, K. Inoue, M. Yoshioka, "Investigation of customer behavior analysis based on top-view depth camera", WACVW 2017, pp. 67-74, 2017.
[4] D. Liciotti, M. Contigiani, E. Frontoni, A. Mancini, P. Zingaretti, V. Placidi, "Shopper Analytics: A Customer Activity Recognition System Using a Distributed RGB-D Camera Network", VAAM 2014, vol. 8811, pp. 146-157, 2014.
3. Proposal
3.1 System Overview (1/2)
• Main process:
  - Human recognition: recognize the human with a trained CNN model → the human's region.
  - Detection part (loop over frames):
    1. Detect hands in the recognized human's region.
    2. Detect products around the hands.
    3. Record the detected hands and products in a state sequence S_n.
  - Classification part:
    1. Regularize the original S_n. (work in progress)
    2. Classify behavior from the regularized S_n'.
• State sequence: S_n = {S_0, S_1, …, S_i, …, S_L}, where S_i is the state of frame i.
  Example frame i (shelf area, book with id = 1, hand OUT): S_i = (O, 1).
  Example for the nth customer: 1. pick up product 1; 2. return product 2; …
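The main process (human recognition → per-frame detection loop → state sequence) can be sketched as follows. This is a hypothetical skeleton only: detect_human, detect_hands, and detect_products are stand-ins for the trained CNN detectors, which the slides do not specify.

```python
# Hypothetical skeleton of the main process; the detector functions
# below are illustrative stubs, NOT the authors' code.
def detect_human(frame):
    """Return the human's region in the frame, or None (stub)."""
    return frame.get("human")

def detect_hands(region):
    """Return the hands' state: 'I' (in shelf area) or 'O' (stub)."""
    return region["hand"]

def detect_products(region):
    """Return the product id near the hands, or None for no product (stub)."""
    return region.get("product")

def build_state_sequence(frames):
    # Detection part: record one state (H_i, P_i) per frame into S_n.
    S_n = []
    for frame in frames:
        region = detect_human(frame)
        if region is None:
            continue  # no customer in this frame
        S_n.append((detect_hands(region), detect_products(region)))
    return S_n

frames = [{"human": {"hand": "O"}},
          {"human": {"hand": "I", "product": 1}}]
print(build_state_sequence(frames))  # [('O', None), ('I', 1)]
```

The resulting sequence S_n is the input to the classification part.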
3.1 System Overview (2/2)
• Camera installation: a camera mounted above the shelf area takes top-view images; only a normal camera is used.
• Assumptions:
  1. Each person who walks into the frame is a different person.
  2. Each person is alone and does not interact with other customers.
• Input: an image from the top-view camera (a camera that takes images from above the customers).
• Output:
  1. Behavior types:
     - Primitive: pick up, return
     - High level: change, pick again
  2. A shopping process consisting of the above behavior types and their timestamps.
• Notation: S_i is the state of frame i, H_i the hands' state, and P_i the products' id; for the 2nd frame, S_1 = (H_1, P_1).
3.2 Detection Part (1/2)
• Record visual results in a state sequence S_n.
• For the nth customer: S_n = {S_0, S_1, S_2, …, S_i, …, S_L}
• For the ith frame: S_i = (H_i, P_i)
  - H_i ∈ {I, O}: "I" and "O" mean the hand is inside or outside the shelf area.
  - P_i ∈ {∅, id}: "∅" means no product in the hand; "id" is the held product's id.
• Example: when the hand is out of the shelf area, H_1 = O; when there is no product in the hand, P_1 = ∅; written as S_1 = (O, ∅).
(Figure: a top-view frame with the shelf area and two detected hands; HAND: OUT, PRODUCT ID: ∅)
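The per-frame state S_i = (H_i, P_i) maps naturally onto a small tuple encoding. The helper below is an illustrative sketch (not the authors' code), with None standing in for ∅:

```python
# Sketch of the per-frame state encoding S_i = (H_i, P_i).
# H_i is "I" (hand inside the shelf area) or "O" (outside);
# P_i is a product id, or None for the empty-hand symbol ∅.
def make_state(hand, product=None):
    assert hand in ("I", "O"), "H_i must be 'I' or 'O'"
    return (hand, product)

# Hand out of the shelf area, no product: S_1 = (O, ∅)
print(make_state("O"))     # ('O', None)
# Hand in the shelf area holding product id 1: (I, 1)
print(make_state("I", 1))  # ('I', 1)
```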
3. Proposal
3.2 Detection Part (2/2)
• The state of each frame is saved in the state sequence S_n.
• Example of a state sequence S_1, constantly updated by the customer's action sequence:
  (1) Walk in: S_1 = {(O, ∅)}
  (2) Reach product 1: S_1 = {…, (I, ∅)}
  (3) Pick up product 1 (id = 1): S_1 = {…, (I, 1)}
  (4) Return product 1: S_1 = {…, (I, ∅)}
  (5) Reach product 2: S_1 = {…, (O, ∅), (I, ∅)}
  (6) Pick up product 2 (id = 2): S_1 = {…, (I, 2)}
  (7) Take away product 2: S_1 = {…, (O, 2)}
  (8) Leave: S_1 does not change.
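The example action sequence can be replayed as a concrete list of states. This is a sketch of the example only, with None standing in for ∅:

```python
# Sketch: the state sequence S_1 produced by the example's actions.
# "O"/"I" = hand outside/inside the shelf area; None stands for ∅.
S1 = [
    ("O", None),  # (1) walk in
    ("I", None),  # (2) reach product 1
    ("I", 1),     # (3) pick up product 1
    ("I", None),  # (4) return product 1
    ("O", None),  #     hand leaves the shelf area
    ("I", None),  # (5) reach product 2
    ("I", 2),     # (6) pick up product 2
    ("O", 2),     # (7) take away product 2
]                 # (8) leave: S1 does not change
print(S1)
```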
3.3 Classification Part (1/2)
• In practice, the original S_n must be regularized according to the classification rules before the classification step.
• An RNN-based regularize model is trained to read the original sequence and output a regularized sequence; this gives noise robustness, and only one RNN model is used.
  Original sequence (with repeated states and failed target recognitions):
  S_1 = {…, (I, 1), (I, 1), (I, 1), (I, ∅), (I, 1), (I, 1), (I, ∅), (I, ∅), (I, 1), (I, 1), (O, 1), (O, 1), (O, 1), (O, ∅), (O, 1), …}
  ↓ RNN-based regularize model
  Regularized sequence:
  S_1' = {…, (I, 1), (O, 1), …}
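The slides train an RNN for this step. As a rough, rule-based stand-in that only illustrates what the regularization achieves on the example above, one can group the sequence into runs of identical states and discard runs too short to be reliable (the min_run threshold is an assumption of this sketch, not a parameter from the slides):

```python
# Rule-based stand-in for the RNN regularizer (sketch only, not the
# authors' model): short runs of a state are treated as recognition
# noise; one state is kept per surviving run.
def regularize(seq, min_run=3):
    # Group consecutive identical states into (state, length) runs.
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    # Keep one state per run that is long enough, dropping duplicates.
    out = []
    for state, length in runs:
        if length >= min_run and (not out or out[-1] != state):
            out.append(state)
    return out

# The noisy example from the slide ((I,1) frames with ∅ dropouts).
noisy = [("I", 1), ("I", 1), ("I", 1), ("I", None), ("I", 1), ("I", 1),
         ("I", None), ("I", None), ("I", 1), ("I", 1),
         ("O", 1), ("O", 1), ("O", 1), ("O", None), ("O", 1)]
print(regularize(noisy))  # [('I', 1), ('O', 1)]
```

A learned RNN generalizes beyond such a fixed threshold, which is why the proposal trains a model instead of hand-coding rules like these.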
3.3 Classification Part (2/2)
• Classify behavior from the regularized S_n'. In this part, we:
  1. Regularize the original S_n.
  2. Classify behavior by pattern matching according to pre-determined rules (utilizing context information).
• Rules:
  Rule 1: (I, id), (O, id) = PICK(id)
  Rule 2: (I, id), (I, ∅) = RETURN(id)
• Example:
  S_1' = {(O, ∅), (I, ∅), (I, 1), (I, ∅), (O, ∅), (I, ∅), (I, 2), (O, 2)}
  Apply Rule 1: S_1' = {(O, ∅), (I, ∅), (I, 1), (I, ∅), (O, ∅), (I, ∅), PICK(2)}
  Apply Rule 2: S_1' = {(O, ∅), (I, ∅), RETURN(1), (O, ∅), (I, ∅), PICK(2)}
  Remove repeated states → final output (primitive): S_1' = {RETURN(1), PICK(2)}
    1. Return product 1
    2. Pick up product 2
  Optional (high level): S_1' = {CHANGE(1→2)}
    1. Change product 1 into product 2
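The two rules and the optional high-level step can be sketched as straightforward pattern matching over adjacent state pairs. Function names are illustrative, and None again stands for ∅:

```python
# Sketch of the pre-determined classification rules:
#   Rule 1: (I, id), (O, id) -> PICK(id)
#   Rule 2: (I, id), (I, ∅)  -> RETURN(id)
def classify(seq):
    events, i = [], 0
    while i < len(seq) - 1:
        (h1, p1), (h2, p2) = seq[i], seq[i + 1]
        if h1 == "I" and p1 is not None and h2 == "O" and p2 == p1:
            events.append(("PICK", p1))       # Rule 1
            i += 2
        elif h1 == "I" and p1 is not None and h2 == "I" and p2 is None:
            events.append(("RETURN", p1))     # Rule 2
            i += 2
        else:
            i += 1                            # non-action state, skip
    return events

def to_high_level(events):
    # Optional: RETURN(a) followed by PICK(b) becomes CHANGE(a -> b).
    out, i = [], 0
    while i < len(events):
        if (i + 1 < len(events) and events[i][0] == "RETURN"
                and events[i + 1][0] == "PICK"):
            out.append(("CHANGE", events[i][1], events[i + 1][1]))
            i += 2
        else:
            out.append(events[i])
            i += 1
    return out

# Regularized sequence S_1' from the example above.
S1r = [("O", None), ("I", None), ("I", 1), ("I", None),
       ("O", None), ("I", None), ("I", 2), ("O", 2)]
print(classify(S1r))                 # [('RETURN', 1), ('PICK', 2)]
print(to_high_level(classify(S1r)))  # [('CHANGE', 1, 2)]
```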
4. Conclusion
• Problem:
  1. Existing methods fail to recognize product-returning behaviors.
  2. Using sensors and advanced cameras costs a lot.
• Goal: Recognize customers' behavior during product selection and output their shopping process using only a normal camera.
• Proposal:
  1. Extract visual features with a CNN model.
  2. Regularize the visual features with an RNN model.
  3. Classify behaviors in the regularized sequence with pre-determined rules.
• Advantages: uses context information; requires only a normal camera; corrects incomplete visual features; uses only one RNN model.