TRANSCRIPT
2019/03/16 IPSJ 81st National Convention, Interface-Knowledge Session
A Detection Method of Customer's Products
Selection Behavior Based on Top-view
Images in Retail Environments
Jiahao Wen 1, Muhammad Alfian Amrizal 2, Toru Abe 1,3, Takuo Suganuma 1,3
1 Graduate School of Information Sciences, Tohoku University
2 Research Institute of Electrical Communication, Tohoku University
3 Cyberscience Center, Tohoku University
1. Introduction
1.1 Background (1/2)
• Less human labor + ubiquitous cameras in shops → Smart Retail
• "Smart" means effective data collection: valuable data can be collected effectively in a smart retail store.
Images' URL: https://www.youtube.com/watch?v=jYjRr7GQVyw
• Retail marketing (in offline stores) is driven by valuable data.
• What is valuable data?
  - The most/least consumed products
  - Customers' preferences
1.1 Background (2/2)
• Customer behavior is a kind of valuable data.
• Traditional thinking: popular/unpopular products are known.
• Our thinking: products picked up by customers but not bought (returned) are unknown.
• Problem: get more data from the shopping process.
• Solution: recognize the shopping process (the purpose of this research).
1.2 Research Outline
• Problem statement: Existing methods fail to recognize some valuable behaviors, such as when a customer considers a product but ends up returning it to the shelf.
• Main goal: Improve recognition of such product-returning behavior in retail stores to provide more valuable information for marketing.
• Proposal: Extract objects from top-view images with a CNN model; the CNN's output sequence is then learned by an RNN model to classify behaviors.
  1. Recognition using context data.
  2. Reduced recognition complexity compared to existing methods.
  3. Lower implementation cost, since only a normal camera is used.
2. Related Work
2.1 An existing method for behavior recognition
• Based on Recurrent Neural Networks (RNN): RNN-based methods use context information for recognition.
• Long-term Recurrent Convolutional Networks (LRCN) [1] use two models to recognize whether there is interaction with an object, such as in tennis or jogging.
• Using two models increases complexity.
• A solution to noise in visual features is not mentioned.
Fig. 1 Architecture of LRCN
[1] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description", arXiv:1411.4389, 2016.
2.2 Comparison with the existing method
• Proposed method (CNN+RNN):
  - Compared to CNN-only methods [2]-[4]: needs only a normal camera and utilizes context information → more accurate, lower cost.
  - Compared to LRCN [1]: uses only one RNN model instead of two, and noise in visual features can be handled in sequence learning → reduced complexity, robust to noise.
[2] G. Chéron, I. Laptev, C. Schmid, "P-CNN: pose-based CNN features for action recognition", ICCV 2015, pp. 3218-3226, 2015.
[3] J. Yamamoto, K. Inoue, M. Yoshioka, "Investigation of customer behavior analysis based on top-view depth camera", WACVW 2017, pp. 67-74, 2017.
[4] D. Liciotti, M. Contigiani, E. Frontoni, A. Mancini, P. Zingaretti, V. Placidi, "Shopper Analytics: A Customer Activity Recognition System Using a Distributed RGB-D Camera Network", VAAM 2014, vol. 8811, pp. 146-157, 2014.
3. Proposal
3.1 System Overview (1/2)
• Main process:
  - Human recognition: recognize the human with a trained CNN model → the human's region.
  - Detection part (loop over frames):
    1. Detect hands in the recognized human's region.
    2. Detect products around the hands.
    3. Record the detected hands and products in a state sequence S_n.
  - Classification part:
    1. Regularize the original S_n. (work in progress)
    2. Classify behavior from the regularized S_n'.
• State sequence: S_n = {S_0, S_1, …, S_i, …, S_L}, where S_i is the state of frame i.
  Example frame i (shelf area, book with id = 1, hand OUT): S_i = (O, 1).
  Example for the nth customer: 1. pick up product 1; 2. return product 2; …
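The main process (human recognition → per-frame detection loop → state sequence) can be sketched as follows. This is a hypothetical skeleton only: detect_human, detect_hands, and detect_products are stand-ins for the trained CNN detectors, which the slides do not specify.

```python
# Hypothetical skeleton of the main process; the detector functions
# below are illustrative stubs, NOT the authors' code.
def detect_human(frame):
    """Return the human's region in the frame, or None (stub)."""
    return frame.get("human")

def detect_hands(region):
    """Return the hands' state: 'I' (in shelf area) or 'O' (stub)."""
    return region["hand"]

def detect_products(region):
    """Return the product id near the hands, or None for no product (stub)."""
    return region.get("product")

def build_state_sequence(frames):
    # Detection part: record one state (H_i, P_i) per frame into S_n.
    S_n = []
    for frame in frames:
        region = detect_human(frame)
        if region is None:
            continue  # no customer in this frame
        S_n.append((detect_hands(region), detect_products(region)))
    return S_n

frames = [{"human": {"hand": "O"}},
          {"human": {"hand": "I", "product": 1}}]
print(build_state_sequence(frames))  # [('O', None), ('I', 1)]
```

The resulting sequence S_n is the input to the classification part.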
3.1 System Overview (2/2)
• Camera installation: a camera mounted above the shelf area takes top-view images; only a normal camera is used.
• Assumptions:
  1. Each person who walks into the frame is a different person.
  2. Each person is alone and does not interact with other customers.
• Input: an image from the top-view camera (a camera that takes images from above the customers).
• Output:
  1. Behavior types:
     - Primitive: pick up, return
     - High level: change, pick again
  2. A shopping process consisting of the above behavior types and their timestamps.
• Notation: S_i is the state of frame i, H_i the hands' state, and P_i the products' id; for the 2nd frame, S_1 = (H_1, P_1).
3.2 Detection Part (1/2)
• Record visual results in a state sequence S_n.
• For the nth customer: S_n = {S_0, S_1, S_2, …, S_i, …, S_L}
• For the ith frame: S_i = (H_i, P_i)
  - H_i ∈ {I, O}: "I" and "O" mean the hand is inside or outside the shelf area.
  - P_i ∈ {∅, id}: "∅" means no product in the hand; "id" is the held product's id.
• Example: when the hand is out of the shelf area, H_1 = O; when there is no product in the hand, P_1 = ∅; written as S_1 = (O, ∅).
(Figure: a top-view frame with the shelf area and two detected hands; HAND: OUT, PRODUCT ID: ∅)
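The per-frame state S_i = (H_i, P_i) maps naturally onto a small tuple encoding. The helper below is an illustrative sketch (not the authors' code), with None standing in for ∅:

```python
# Sketch of the per-frame state encoding S_i = (H_i, P_i).
# H_i is "I" (hand inside the shelf area) or "O" (outside);
# P_i is a product id, or None for the empty-hand symbol ∅.
def make_state(hand, product=None):
    assert hand in ("I", "O"), "H_i must be 'I' or 'O'"
    return (hand, product)

# Hand out of the shelf area, no product: S_1 = (O, ∅)
print(make_state("O"))     # ('O', None)
# Hand in the shelf area holding product id 1: (I, 1)
print(make_state("I", 1))  # ('I', 1)
```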
3. Proposal
3.2 Detection Part (2/2)
• The state of each frame is saved in the state sequence S_n.
• Example of a state sequence S_1, constantly updated by the customer's action sequence:
  (1) Walk in: S_1 = {(O, ∅)}
  (2) Reach product 1: S_1 = {…, (I, ∅)}
  (3) Pick up product 1 (id = 1): S_1 = {…, (I, 1)}
  (4) Return product 1: S_1 = {…, (I, ∅)}
  (5) Reach product 2: S_1 = {…, (O, ∅), (I, ∅)}
  (6) Pick up product 2 (id = 2): S_1 = {…, (I, 2)}
  (7) Take away product 2: S_1 = {…, (O, 2)}
  (8) Leave: S_1 does not change.
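The example action sequence can be replayed as a concrete list of states. This is a sketch of the example only, with None standing in for ∅:

```python
# Sketch: the state sequence S_1 produced by the example's actions.
# "O"/"I" = hand outside/inside the shelf area; None stands for ∅.
S1 = [
    ("O", None),  # (1) walk in
    ("I", None),  # (2) reach product 1
    ("I", 1),     # (3) pick up product 1
    ("I", None),  # (4) return product 1
    ("O", None),  #     hand leaves the shelf area
    ("I", None),  # (5) reach product 2
    ("I", 2),     # (6) pick up product 2
    ("O", 2),     # (7) take away product 2
]                 # (8) leave: S1 does not change
print(S1)
```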
3.3 Classification Part (1/2)
• In practice, the original S_n must be regularized according to the classification rules before the classification step.
• An RNN-based regularize model is trained to read the original sequence and output a regularized sequence; this gives noise robustness, and only one RNN model is used.
  Original sequence (with repeated states and failed target recognitions):
  S_1 = {…, (I, 1), (I, 1), (I, 1), (I, ∅), (I, 1), (I, 1), (I, ∅), (I, ∅), (I, 1), (I, 1), (O, 1), (O, 1), (O, 1), (O, ∅), (O, 1), …}
  ↓ RNN-based regularize model
  Regularized sequence:
  S_1' = {…, (I, 1), (O, 1), …}
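The slides train an RNN for this step. As a rough, rule-based stand-in that only illustrates what the regularization achieves on the example above, one can group the sequence into runs of identical states and discard runs too short to be reliable (the min_run threshold is an assumption of this sketch, not a parameter from the slides):

```python
# Rule-based stand-in for the RNN regularizer (sketch only, not the
# authors' model): short runs of a state are treated as recognition
# noise; one state is kept per surviving run.
def regularize(seq, min_run=3):
    # Group consecutive identical states into (state, length) runs.
    runs = []
    for s in seq:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    # Keep one state per run that is long enough, dropping duplicates.
    out = []
    for state, length in runs:
        if length >= min_run and (not out or out[-1] != state):
            out.append(state)
    return out

# The noisy example from the slide ((I,1) frames with ∅ dropouts).
noisy = [("I", 1), ("I", 1), ("I", 1), ("I", None), ("I", 1), ("I", 1),
         ("I", None), ("I", None), ("I", 1), ("I", 1),
         ("O", 1), ("O", 1), ("O", 1), ("O", None), ("O", 1)]
print(regularize(noisy))  # [('I', 1), ('O', 1)]
```

A learned RNN generalizes beyond such a fixed threshold, which is why the proposal trains a model instead of hand-coding rules like these.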
3.3 Classification Part (2/2)
• Classify behavior from the regularized S_n'. In this part, we:
  1. Regularize the original S_n.
  2. Classify behavior by pattern matching according to pre-determined rules (utilizing context information).
• Rules:
  Rule 1: (I, id), (O, id) = PICK(id)
  Rule 2: (I, id), (I, ∅) = RETURN(id)
• Example:
  S_1' = {(O, ∅), (I, ∅), (I, 1), (I, ∅), (O, ∅), (I, ∅), (I, 2), (O, 2)}
  Apply Rule 1: S_1' = {(O, ∅), (I, ∅), (I, 1), (I, ∅), (O, ∅), (I, ∅), PICK(2)}
  Apply Rule 2: S_1' = {(O, ∅), (I, ∅), RETURN(1), (O, ∅), (I, ∅), PICK(2)}
  Remove repeated states → final output (primitive): S_1' = {RETURN(1), PICK(2)}
    1. Return product 1
    2. Pick up product 2
  Optional (high level): S_1' = {CHANGE(1→2)}
    1. Change product 1 into product 2
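The two rules and the optional high-level step can be sketched as straightforward pattern matching over adjacent state pairs. Function names are illustrative, and None again stands for ∅:

```python
# Sketch of the pre-determined classification rules:
#   Rule 1: (I, id), (O, id) -> PICK(id)
#   Rule 2: (I, id), (I, ∅)  -> RETURN(id)
def classify(seq):
    events, i = [], 0
    while i < len(seq) - 1:
        (h1, p1), (h2, p2) = seq[i], seq[i + 1]
        if h1 == "I" and p1 is not None and h2 == "O" and p2 == p1:
            events.append(("PICK", p1))       # Rule 1
            i += 2
        elif h1 == "I" and p1 is not None and h2 == "I" and p2 is None:
            events.append(("RETURN", p1))     # Rule 2
            i += 2
        else:
            i += 1                            # non-action state, skip
    return events

def to_high_level(events):
    # Optional: RETURN(a) followed by PICK(b) becomes CHANGE(a -> b).
    out, i = [], 0
    while i < len(events):
        if (i + 1 < len(events) and events[i][0] == "RETURN"
                and events[i + 1][0] == "PICK"):
            out.append(("CHANGE", events[i][1], events[i + 1][1]))
            i += 2
        else:
            out.append(events[i])
            i += 1
    return out

# Regularized sequence S_1' from the example above.
S1r = [("O", None), ("I", None), ("I", 1), ("I", None),
       ("O", None), ("I", None), ("I", 2), ("O", 2)]
print(classify(S1r))                 # [('RETURN', 1), ('PICK', 2)]
print(to_high_level(classify(S1r)))  # [('CHANGE', 1, 2)]
```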
4. Conclusion
• Problem:
  1. Existing methods fail to recognize product-returning behaviors.
  2. Using sensors and advanced cameras costs a lot.
• Goal: Recognize customers' behavior during product selection and output their shopping process using only a normal camera.
• Proposal:
  1. Extract visual features with a CNN model.
  2. Regularize the visual features with an RNN model.
  3. Classify behaviors in the regularized sequence with pre-determined rules.
• Advantages: uses context information; requires only a normal camera; corrects incomplete visual features; uses only one RNN model.