Machine Learning in Magento 2
WHAT
Customer retention
• Recommendations during product selection
• Cross-selling during the purchasing process
• Real-time personal discounts
• Personalized search results
Customer returns
• Predict upcoming sales and offer them to the customer
• Personalized discounts and hot-sale e-mails
• Work with abandoned baskets
• After-sale support by e-mail or phone
Customer behavior analytics
• Customer segmentation
• Automatic clustering
• Searching for hidden behavior patterns
• Signaling unusual customer activity
Customer behavior prediction
• Predict customer preferences
• Predict unknown data about the customer
• Predict future purchases
• Predict lost customers
• Predict anything for which you can find correlations
The main goal: personalization
Convert visitors to HAPPY buyers
1. Analyze the visitor
2. Predict the visitor's needs
3. Determine the visitor's behavior pattern
4. Inject into the sales flow the most effective additional points of influence, personalized for the visitor
5. Suggest exactly what the visitor wants
6. Make the visitor a happy buyer
7. Thank them for the purchase and suggest more
HOW
Data Flow
[Architecture diagram. Labels: Magento 2.0 instance; Data sources; Events flow; Generate events; ES; Persist events to datastore; Event consumers 1, 2, 3, …; Machine learning standalone service; Data required for prediction models; Consumer asks by SOAP for prediction results; Create sub-events; Calls to API in order to apply ML decisions; Communicates with visitors; Real-time calls to ML service API to obtain predicted data; Hadoop / Spark ML: batch and real-time long-term history analysis, heavy reporting; Customer activities, internal data changes.]
Data sources
1. Product catalog, inventory
2. Page visit logs
3. Purchases, abandoned baskets
4. Ratings, reviews
5. External data sources such as Twitter, Amazon, public datasets, etc.
6. Time series with
• history of product price changes
• customer activity log
Events flow
1. Common event bus using RabbitMQ for small customers and Apache Kafka for large ones
2. It is a highly scalable, horizontally scaling solution
3. All data inside events should reach the persistent datastore according to the consumers' rules
4. After that, consumers may trigger sub-events for the ML algorithms that depend on the changed data
5. If an ML algorithm needs to call an API method in Magento (for example, to add a customer to a new segment), it publishes an event for the appropriate consumer
6. At each step we have the opportunity to integrate external systems into our process flow through the event bus
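The flow above can be sketched with a tiny in-process bus. This is only an illustration under stated assumptions: the `EventBus` class, the topic names, the `big_spender` segment and the spending rule are all hypothetical stand-ins, and a real deployment would use RabbitMQ or Kafka instead of direct function calls.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for RabbitMQ/Kafka: topic -> list of consumers."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def subscribe(self, topic, consumer):
        self._consumers[topic].append(consumer)

    def publish(self, topic, payload):
        # Deliver the event synchronously to every consumer of the topic.
        for consumer in list(self._consumers[topic]):
            consumer(payload)


bus = EventBus()
datastore = []   # stand-in for the persistent datastore
segments = {}    # stand-in for Magento customer segments


def persist_purchase(event):
    # Step 3: persist the event; step 4: trigger a sub-event for ML.
    datastore.append(event)
    bus.publish("ml.purchase_changed", {"customer_id": event["customer_id"]})


def ml_on_purchase(event):
    # Hypothetical rule standing in for a real model.
    cid = event["customer_id"]
    total = sum(e["amount"] for e in datastore if e["customer_id"] == cid)
    if total >= 100:
        # Step 5: ask the Magento-facing consumer to change the segment.
        bus.publish("magento.add_to_segment",
                    {"customer_id": cid, "segment": "big_spender"})


def apply_segment(event):
    # A real consumer would call the Magento API here.
    segments[event["customer_id"]] = event["segment"]


bus.subscribe("purchase", persist_purchase)
bus.subscribe("ml.purchase_changed", ml_on_purchase)
bus.subscribe("magento.add_to_segment", apply_segment)

bus.publish("purchase", {"customer_id": 7, "amount": 60})
bus.publish("purchase", {"customer_id": 7, "amount": 50})
```

The ML consumer never calls Magento directly; it only publishes another event, which is what lets external systems be attached at any step.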
Persistent datastore
1. Datastore should have three levels
I. In-memory datastore to cache operational data for real-time queries
II. Operational datastore to persist all data relevant to the machine learning algorithms
III. Analytical datastore for all historical data, which will be used for heavy reporting and deep ML analysis
2. Due to the probabilistic nature of ML algorithms, in the datastore architecture we can sacrifice the Consistency of the CAP theorem and guarantee Availability and Partition tolerance
3. As a first step for discussion, I propose using Redis (VoltDB, Aerospike, Tarantool), Elasticsearch (Solr, MySQL, HBase) and Hadoop
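A minimal sketch of the three levels, assuming plain Python containers in place of Redis, Elasticsearch and Hadoop. The `TieredStore` class and its method names are hypothetical. The cache is deliberately not invalidated on write, to show the kind of stale read that is tolerated when Consistency is traded for Availability.

```python
class TieredStore:
    """Three storage levels modelled with in-process containers."""

    def __init__(self):
        self.cache = {}        # level I stand-in (in-memory, e.g. Redis)
        self.operational = {}  # level II stand-in (e.g. Elasticsearch)
        self.analytical = []   # level III stand-in (append-only history)

    def write(self, key, value):
        # Every write goes to the history and the operational level.
        self.analytical.append((key, value))
        self.operational[key] = value

    def read(self, key):
        # Serve from the cache when possible; the value may be stale.
        if key in self.cache:
            return self.cache[key]
        value = self.operational.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def refresh(self, key):
        # Later synchronisation of the cache with the operational level.
        if key in self.operational:
            self.cache[key] = self.operational[key]


store = TieredStore()
store.write("customer:7:views", 3)
first = store.read("customer:7:views")   # read from level II, now cached
store.write("customer:7:views", 4)
stale = store.read("customer:7:views")   # cache has not been refreshed yet
store.refresh("customer:7:views")
fresh = store.read("customer:7:views")
```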
Machine learning service
1. Will be implemented as a standalone service
2. Binary/SOAP/REST protocols over HTTP/TCP transport
3. Direct read-only access to all data sources
4. ACL checks should be implemented on clients
5. Horizontally scalable shared-nothing architecture
6. Calculated models will be synced using a binary protocol without a master node (ZooKeeper)
7. Each node has its own memory pool to store internal datasets for calculations
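As a rough illustration of a stateless, read-only prediction call, the handler below takes a JSON request body and returns a JSON response. The request shape, the `handle_predict` name and the linear model with its weights are assumptions made for this sketch and are not part of the proposal; transport (HTTP, SOAP, binary) is left out.

```python
import json

# Hypothetical model held in this node's own memory pool.
MODEL = {"weights": {"page_views": 0.1, "past_purchases": 0.5}, "bias": -1.0}


def handle_predict(request_body, model=MODEL):
    """Handle one prediction request without touching shared state."""
    features = json.loads(request_body)["features"]
    score = model["bias"] + sum(
        model["weights"].get(name, 0.0) * value
        for name, value in features.items()
    )
    return json.dumps({"will_purchase": score > 0, "score": round(score, 3)})


response = json.loads(
    handle_predict('{"features": {"page_views": 8, "past_purchases": 1}}')
)
```

Because the handler only reads the model and its input, any node holding a synced copy of the model can answer the same request.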
Hadoop + Spark
• Should be implemented only for extremely large stores
• Hadoop on its own is a very slow datastore
• But Hadoop and Spark together allow us to run machine learning algorithms in a near-real-time and distributed manner
• Using the event bus we can write all the data to Hadoop and run ML tasks on arbitrarily large volumes of data:
  • Reporting
  • Batch clustering
  • Searching for patterns and outliers
USE CASES
Realtime Recommendations
Using all historical data about the user's activity and internal data sources, we can predict customer needs
User activities:
• Page view log with the duration of each page view
• Visitor returns
• Registrations
• Ratings and reviews
• Purchases
• Abandoned shopping carts
Internal data sources:
• Product prices with their changes
• Discounts
• Customer segments
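One simple way to turn purchase history into recommendations is item co-occurrence counting. The sketch below is an assumption, not the method proposed in the slides: it uses only the purchases signal, and the function names and sample baskets are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each pair of products was bought together."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, viewed, top_n=2):
    """Rank products by how often they were bought with the viewed ones."""
    scores = Counter()
    for item in viewed:
        scores.update(co.get(item, {}))
    for item in viewed:
        scores.pop(item, None)  # never recommend what is already viewed
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "case", "charger", "headphones"],
]
co = build_cooccurrence(baskets)
recs = recommend(co, ["phone"])
```

Page views, ratings or price changes could be folded in as additional weighted signals in the same scoring step.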
Personal discounts
• Create behavior patterns and detect cases when the merchant should give a customer a personal discount on a particular product
• Discounts will be shown to the customer in real time during catalog browsing
• If the customer did not purchase the discounted product, the algorithm should take this into account in further work with this customer
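The feedback loop in the last point can be sketched as follows. Everything here is hypothetical: the slides do not say how an ignored discount should change later offers, so this example assumes a rule-based stand-in for the model that raises the offer once and then stops offering.

```python
from dataclasses import dataclass


@dataclass
class CustomerState:
    views_of_product: int = 0
    ignored_discounts: int = 0  # offers shown without a following purchase


def personal_discount(state, base_percent=10, min_views=3, max_ignored=2):
    """Return the discount percent to show now, or 0 for no offer."""
    if state.views_of_product < min_views:
        return 0  # not enough interest in the product yet
    if state.ignored_discounts >= max_ignored:
        return 0  # earlier offers were ignored; stop discounting
    return base_percent + 5 * state.ignored_discounts


def record_outcome(state, purchased):
    """Feed the result of a shown discount back into the customer state."""
    if purchased:
        state.ignored_discounts = 0
    else:
        state.ignored_discounts += 1


state = CustomerState(views_of_product=4)
offers = []
for purchased in [False, False]:
    offers.append(personal_discount(state))
    record_outcome(state, purchased)
final_offer = personal_discount(state)
```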
Personalized product catalog
• Besides product recommendations, ML algorithms can determine the customer's preferences and generate the product catalog page according to them
• The product list may naturally include starred products from the predicted list
• Another way is sorting the list by “the best choice”
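Both variants, starring predicted products and sorting by “the best choice”, can be combined in one small sketch. The per-product scores are assumed to come from the ML service; the function name, the threshold and the sample data are hypothetical.

```python
def personalized_catalog(products, predicted_scores, star_threshold=0.7):
    """Sort products by predicted score and star the strongest matches."""
    ranked = sorted(
        products,
        key=lambda p: (-predicted_scores.get(p, 0.0), p),
    )
    return [
        (p, predicted_scores.get(p, 0.0) >= star_threshold)
        for p in ranked
    ]


products = ["bag", "boots", "coat", "scarf"]
scores = {"coat": 0.9, "boots": 0.75, "scarf": 0.2}  # assumed ML output
page = personalized_catalog(products, scores)
```

Products without a predicted score fall to the end of the page in alphabetical order, so the catalog stays complete even when the model knows little about the customer.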