Machine Learning in Magento 2

Customer retention

• Recommendations while the customer is choosing products
• Cross-selling during the purchase process
• Realtime personal discounts
• Personalized search results

Returning customers

• Predict the customer's next purchases and make offers
• Personalized discounts and hot-sale e-mails
• Follow-up on abandoned baskets
• After-sale support by e-mail or phone
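Following up on abandoned baskets first requires finding them. The sketch below is a minimal illustration, assuming a basket counts as abandoned when it is non-empty, has not been ordered, and has been idle for 24 hours. The `Basket` record, `find_abandoned` and the threshold are hypothetical and are not Magento's quote API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Basket:
    # Hypothetical basket record; not Magento's quote model.
    customer_email: str
    items: list
    updated_at: datetime
    ordered: bool  # True once the basket was converted into an order


def find_abandoned(baskets, now, idle=timedelta(hours=24)):
    """Return non-empty, unordered baskets that have been idle for at least `idle`."""
    return [
        b for b in baskets
        if b.items and not b.ordered and now - b.updated_at >= idle
    ]


now = datetime(2016, 5, 2, 12, 0)
baskets = [
    Basket("ann@example.com", ["tent"], now - timedelta(hours=30), False),
    Basket("bob@example.com", ["lamp"], now - timedelta(hours=2), False),  # still active
    Basket("eve@example.com", ["stove"], now - timedelta(days=3), True),   # already ordered
    Basket("joe@example.com", [], now - timedelta(days=5), False),         # empty basket
]
for basket in find_abandoned(baskets, now):
    print(basket.customer_email)  # candidate for a reminder e-mail: ann@example.com
```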

Customer behavior analytics

• Customer segmentation
• Automatic clustering
• Search for hidden behavior patterns
• Alerts on unusual customer activity
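The slides do not name a clustering algorithm. As one common choice, here is a minimal k-means sketch for automatic customer segmentation; the two features (orders per month, average order value) and the sample values are made up for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means; returns (centroids, labels) where labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical features per customer: (orders per month, average order value).
customers = [(1, 20), (2, 25), (1.5, 22), (10, 200), (11, 210), (9, 190)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

In practice the features would usually be scaled first, since k-means uses raw Euclidean distances and the larger-valued feature would otherwise dominate.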

Customer behavior prediction

• Predict customer preferences
• Predict unknown data about a customer
• Predict future purchases
• Predict lost customers (churn)
• Predict anything for which you can find correlations

The main goal: personalization

Convert visitors to HAPPY buyers

1. Analyze the visitor
2. Predict the visitor's needs
3. Determine the visitor's behavior pattern
4. Inject into the sales flow the most effective additional points of influence, personalized for the visitor
5. Suggest exactly what the visitor wants
6. Turn the visitor into a happy buyer
7. Thank the customer for the purchase and suggest more

Data Flow

[Architecture diagram: the Magento 2.0 instance communicates with visitors and generates events from customer activities and internal data changes. The events pass through “ES” and are persisted to the datastore by event consumers (1, 2, 3, …), which also create sub-events and call the Magento API to apply ML decisions. Consumers ask the standalone machine learning service for predictions over SOAP and receive the results; the service reads the data required for its prediction models from the data sources, and Magento makes realtime calls to the ML service API to obtain predicted data. Hadoop with Spark ML handles batch and realtime long-term history analysis and heavy reporting.]

Data sources

1. Product catalog, inventory
2. Page visit logs
3. Purchases, abandoned baskets
4. Ratings, reviews
5. External data sources such as Twitter, Amazon, public datasets, etc.
6. Time series with:
   • history of product price changes
   • customer activity log

Events flow

1. A common event bus, using RabbitMQ for small customers and Apache Kafka for large ones

2. It is a highly scalable, horizontally scalable solution

3. All data inside events should reach the persistent datastore according to the consumers' rules

4. After that, consumers may trigger sub-events for the ML algorithms that depend on the changed data

5. If an ML algorithm needs to call a Magento API method (for example, to add a customer to a new segment), it publishes an event for the appropriate consumer

6. At each step we can integrate any external system into our process flow through the event bus
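To make the flow above concrete, here is a minimal in-process sketch that stands in for RabbitMQ or Kafka with a plain Python queue. The topic names, the `EventBus` class, and the “5 or more orders makes a customer loyal” rule (a placeholder for a real ML decision) are all hypothetical; comments mark which step of the list each handler corresponds to.

```python
from collections import defaultdict, deque


class EventBus:
    """In-process stand-in for the RabbitMQ/Kafka bus: topic -> list of consumer callbacks."""

    def __init__(self):
        self.consumers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, topic, handler):
        self.consumers[topic].append(handler)

    def publish(self, topic, payload):
        self.queue.append((topic, payload))

    def run(self):
        # Deliver queued events in order; handlers may publish sub-events, delivered in turn.
        while self.queue:
            topic, payload = self.queue.popleft()
            for handler in self.consumers[topic]:
                handler(payload)


datastore = []  # stand-in for the persistent datastore
segments = {}   # stand-in for Magento customer segments
bus = EventBus()


def persist(event):
    datastore.append(event)                     # step 3: persist the event data
    bus.publish("ml.customer_changed", event)   # step 4: sub-event for the ML algorithms


def ml_segmenter(event):
    # Hypothetical rule standing in for an ML decision.
    if event["orders"] >= 5:
        # step 5: ask the Magento API consumer to act instead of calling Magento directly
        bus.publish("magento.add_to_segment",
                    {"customer_id": event["customer_id"], "segment": "loyal"})


def magento_api_consumer(command):
    # A real consumer would call the Magento API; here it only records the segment.
    segments[command["customer_id"]] = command["segment"]


bus.subscribe("customer.activity", persist)
bus.subscribe("ml.customer_changed", ml_segmenter)
bus.subscribe("magento.add_to_segment", magento_api_consumer)

bus.publish("customer.activity", {"customer_id": 42, "orders": 7})
bus.publish("customer.activity", {"customer_id": 7, "orders": 1})
bus.run()
print(segments)  # {42: 'loyal'}
```

A real bus would deliver events asynchronously and durably; the in-process queue only shows the ordering of persist, sub-event and Magento API command.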

Persistent datastore

1. The datastore should have three levels:
   I. In-memory datastore to cache operational data for realtime queries
   II. Operational datastore to persist all relevant data for the machine learning algorithms
   III. Analytical datastore for all historical data, used for heavy reporting and deep ML analysis

2. Because ML algorithms are probabilistic by nature, the datastore architecture can sacrifice Consistency (in the sense of the CAP theorem) to guarantee Availability and Partition tolerance

3. As a starting point for discussion, I propose Redis (or VoltDB, Aerospike, Tarantool), Elasticsearch (or Solr, MySQL, HBase) and Hadoop
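The three levels can be pictured as a read-through cache in front of an operational store, with every write also appended to a history log. This is a single-process sketch with plain Python containers standing in for Redis, Elasticsearch and Hadoop; the `TieredStore` class and its LRU eviction policy are assumptions, not part of the proposal.

```python
from collections import OrderedDict


class TieredStore:
    """Sketch of the three levels: in-memory cache, operational store, analytical history."""

    def __init__(self, cache_size=1000):
        self.cache = OrderedDict()  # level I: LRU cache for realtime queries (Redis stand-in)
        self.cache_size = cache_size
        self.operational = {}       # level II: current state read by the ML algorithms
        self.history = []           # level III: append-only history for reporting and deep analysis

    def write(self, key, value):
        self.operational[key] = value
        self.history.append((key, value))
        self.cache.pop(key, None)   # invalidate; the next read repopulates the cache

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        value = self.operational.get(key)
        if value is not None:
            self.cache[key] = value
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return value


store = TieredStore(cache_size=2)
store.write("price:sku-1", 10.0)
print(store.read("price:sku-1"))  # 10.0
```

Because each node's cache is only invalidated locally, a multi-node version of this idea would accept briefly stale reads, which is the kind of consistency trade-off described in point 2.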

Machine learning service

1. Will be implemented as a standalone service
2. Binary/SOAP/REST protocols over an HTTP/TCP transport layer
3. Direct read-only access to all data sources
4. ACL checks should be implemented on the clients
5. Horizontally scalable, shared-nothing architecture
6. Calculated models will be synced using a binary protocol, without a master node (Zookeeper)
7. Each node has its own memory pool to store internal datasets for calculations
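Points 5 to 7 describe nodes that share nothing and exchange calculated models directly. A minimal sketch of that idea, assuming a simple hypothetical binary format (version, weight count, float64 weights) and a “highest version wins” rule; the `Node` class is illustrative, and the role of Zookeeper mentioned in the slide is not modeled.

```python
import struct


class Node:
    """One shared-nothing ML node holding its own copy of a (hypothetical) linear model."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.weights = []  # lives in this node's own memory pool
        self.peers = []

    def encode(self):
        # Hypothetical wire format: uint32 version, uint32 count, then float64 weights (little-endian).
        fmt = f"<II{len(self.weights)}d"
        return struct.pack(fmt, self.version, len(self.weights), *self.weights)

    def receive(self, blob):
        version, count = struct.unpack_from("<II", blob)
        if version > self.version:  # no master decides: the newer model simply wins
            self.version = version
            self.weights = list(struct.unpack_from(f"<{count}d", blob, 8))

    def publish_model(self, weights):
        self.version += 1
        self.weights = list(weights)
        blob = self.encode()
        for peer in self.peers:
            peer.receive(blob)


a, b, c = Node("a"), Node("b"), Node("c")
a.peers = [b, c]
a.publish_model([0.5, -1.25])
old_blob = a.encode()
a.publish_model([0.75, -1.0])
b.receive(old_blob)  # a stale model arriving late is ignored
print(b.version, b.weights)  # 2 [0.75, -1.0]
```

Two nodes publishing the same version number at once would conflict; a real protocol would need a tie-breaker such as a node id.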

Hadoop + Spark

• Should be implemented only for extremely large stores
• Hadoop is a very slow datastore
• But Hadoop and Spark together allow us to run machine learning algorithms in a near-realtime, distributed manner
• Using the event bus, we can write all the data to Hadoop and run ML tasks on unlimited volumes of data:
   • Reporting
   • Batch clustering
   • Searching for patterns and outliers
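Spark would run such jobs over the whole history stored in Hadoop; the pure-Python sketch below only illustrates one possible outlier test on a single machine. It uses the modified z-score based on the median absolute deviation (Iglewicz and Hoaglin), swapped in as an example rather than taken from the slides; the daily order totals are invented.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Indices whose modified z-score, 0.6745 * (x - median) / MAD, exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread: the score is undefined, so flag nothing
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical daily order totals for one customer.
daily_totals = [120, 95, 130, 110, 105, 2400, 115]
print(mad_outliers(daily_totals))  # [5]
```

The constant 0.6745 and the 3.5 cut-off are the conventional values for this method; the median-based score is less distorted by the outliers themselves than a mean-based z-score.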

Use cases

Realtime Recommendations

Using all historical data about the user's activity and internal data sources, we can predict customer needs

User activities:
• Page view log with the duration of each page view
• Visitor returns
• Registrations
• Ratings and reviews
• Purchases
• Abandoned shopping carts

Internal data sources:
• Product prices with their change history
• Discounts
• Customer segments
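One simple way to turn purchase history into realtime recommendations is item-to-item co-occurrence (“customers who bought X also bought Y”). The slides do not specify an algorithm, so this is an assumed technique, and the sketch uses only orders, ignoring the other signals listed above such as page-view durations or price changes; function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """Count how often each pair of products appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, viewed, k=3):
    """Score products by co-occurrence with what the visitor viewed, excluding viewed items."""
    scores = Counter()
    for item in viewed:
        scores.update(co.get(item, {}))
    for item in viewed:
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(k)]


orders = [
    ["tent", "stove", "lamp"],
    ["tent", "lamp"],
    ["tent", "sleeping bag"],
    ["stove", "gas"],
]
co = build_cooccurrence(orders)
print(recommend(co, ["tent"]))
```

The co-occurrence table can be rebuilt in batch, while `recommend` is cheap enough to answer in realtime.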

Personal discounts

Build behavior patterns and detect cases when the merchant should give a customer a personal discount on a particular product

• Discounts will be shown to the customer in realtime during catalog browsing
• If the customer did not purchase the discounted product, the algorithm should take this into account in further work with this customer
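A sketch of the feedback loop in the last point: each customer and product pair keeps pseudo-counts of accepted and ignored discounts, and a discount is offered only while the estimated acceptance rate stays above a threshold. The class name, the prior of one accepted and one ignored offer, and the 0.3 threshold are assumptions; detecting when a discount is warranted in the first place is not modeled.

```python
from collections import defaultdict


class DiscountPolicy:
    """Per (customer, product) estimate of how likely a personal discount is to convert."""

    def __init__(self, threshold=0.3, prior_accepted=1, prior_ignored=1):
        self.threshold = threshold
        # Beta-style pseudo-counts, starting from the prior for unseen pairs.
        self.accepted = defaultdict(lambda: prior_accepted)
        self.ignored = defaultdict(lambda: prior_ignored)

    def acceptance_rate(self, customer, product):
        key = (customer, product)
        return self.accepted[key] / (self.accepted[key] + self.ignored[key])

    def should_offer(self, customer, product):
        return self.acceptance_rate(customer, product) >= self.threshold

    def record(self, customer, product, purchased):
        # Feed the outcome back so that ignored discounts make future offers less likely.
        key = (customer, product)
        if purchased:
            self.accepted[key] += 1
        else:
            self.ignored[key] += 1


policy = DiscountPolicy()
print(policy.should_offer("c1", "sku-1"))  # True: prior rate 1/2
policy.record("c1", "sku-1", purchased=False)
policy.record("c1", "sku-1", purchased=False)
print(policy.should_offer("c1", "sku-1"))  # False: rate 1/4 is below 0.3
```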

Personalized product catalog

Besides product recommendations, ML algorithms can determine the customer's preferences and generate the product catalog page accordingly

• The product list may naturally include starred products from the predicted list
• Another way is to sort the list by “the best choice”
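A sketch of both options: starred products from the predicted list come first, and the rest are sorted by a “best choice” score. The score used here (predicted category affinity times rating), the field names and the data are hypothetical placeholders for whatever the ML service would return.

```python
def personalize(products, category_affinity, starred):
    """Order a catalog page: starred SKUs first, then by descending 'best choice' score."""

    def best_choice(product):
        # Hypothetical score: predicted affinity for the product's category, weighted by rating.
        return category_affinity.get(product["category"], 0.0) * product["rating"]

    # False sorts before True, so starred products come first.
    return sorted(products, key=lambda p: (p["sku"] not in starred, -best_choice(p)))


products = [
    {"sku": "A", "category": "tents", "rating": 4.0},
    {"sku": "B", "category": "stoves", "rating": 5.0},
    {"sku": "C", "category": "tents", "rating": 3.0},
    {"sku": "D", "category": "lamps", "rating": 4.5},
]
affinity = {"tents": 0.9, "stoves": 0.2}  # hypothetical output of the ML service
print([p["sku"] for p in personalize(products, affinity, starred={"D"})])  # ['D', 'A', 'C', 'B']
```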
