TRANSCRIPT
Santa Clara, California | April 23rd – 25th, 2018
MongoDB for a High Volume Logistics
Application
about me ...
Eric Potvin
Software Engineer in the performance team
at Shipwire, an Ingram Micro company, in Sunnyvale, California
… A little background
who are we?
We offer a cloud-based fulfillment software platform
This platform enables thousands of brands and online retailers to manage their order fulfillment operations
We support 20+ warehouses in multiple countries, including the USA, Canada, Australia, Italy, Germany, and China
Some warehouses are unable to easily adapt their systems to new technologies
Warehouses are using old infrastructure, such as legacy servers (AS/400), or rely on service providers
Warehouses understand files
… and FTP
warehouses are … old-fashioned
what we have to deal with
Millions of files received monthly
Gigabytes of various document file types (XML, TXT/CSV, PDF)
Limitations on received files (raw files vs. zip archives)
Limitations of FTP connections
lots of data to maintain
8 processing servers
Ingesting millions of files per month
Thousands of log files
100+ GB of monthly logs / 250+ GB of data files
server resources & limitations
Manipulating so many files causes high server resource consumption
- Lots of processes with constant high CPU usage
- Each process has high RAM usage
- And high network usage: GBs of data transferred hourly
searching for information can be tedious
Often, we need to look for data in case of errors or a common “we didn’t receive these files”
Data and logs are not available for users
Finding information requires an engineer to connect to each server
what about... NFS?
This would eliminate the lookup across servers, but still has some issues:
- Still a large number of files
- Network overhead for large files
- And … -bash: /bin/ls: Argument list too long
what about... MySQL?
- Changing the data structure requires maintenance
… so why did we choose MongoDB?
get all data at no cost?
Analytics software is great and allows any user to see data
But it can be costly and limited
MongoDB gives us the flexibility to save what we need
With no monthly or setup fee
better integrations
All data is now visible to all users
Can be integrated with our in-house applications
Self-service tools allow users to take action immediately in case of issues
Accurate real-time tracking of documents
Real-time monitoring of documents and server resources
no more frequent reads/writes
No more slow CRUD operations on an XML file on disk
Avoid millions of disk and memory operations
It also makes our code healthier …
From:
Document doc = db.parse(<my_file>);
Element elem = doc.getDocumentElement();
NodeList nl = elem.getElementsByTagName(<child>);
for (int i = 0; i < nl.getLength(); i++) {
    NodeList nodes = ((Element) nl.item(i)).getElementsByTagName(<tag>);
    for (int j = 0; j < nodes.getLength(); j++) {
        // fetch data for what I need
        // and update later
    }
}
To:
mongoClient.getDatabase(myDatabase)
           .getCollection(myCollection)
           .find(search)
           .projection(whatINeed);
// and update later
collection.updateOne(search, dataToUpdate);
simplified code
available for everyone, and instantly
Now all our apps can access MongoDB
Microservices can access the same data without delay
Data is available instantly, even after multiple manipulations
another ALTER? seriously? ...
No more “system under maintenance” because we need to alter a big table
No need to worry about schema updates when a warehouse changes its file format
And no need to store the entire content in a blob and try to search within it
where is my data?
Data can be accessed through a “single point of access” (which secondary actually serves the read may vary)
Faster data access with multiple secondaries
No more “file locked” … and waiting for the unlock ...
server goes down? no big deal
The election process is fantastic!
No more downtime due to single points of failure
Easy to expand and/or upgrade
How did we reduce server resource usage?
example of manipulating a single order
One order from Chicago, USA to Québec City, Canada, using an international carrier, with one product ordered.
This requires at least 7 XML files and 3 PDF files to be created
These files contain multiple nodes with shipping details:
- Tracking numbers
- Number of boxes shipped
- Carrier details
- etc.
File size can be up to a few megabytes
shipping confirmation example
nested loops of … O(n*r)?
Looping through a file of a few megabytes is slow
- Each loop calls an API and updates database records
What if the process crashes? Where do we restart from?
- Manual recovery
Constant monitoring of server resources
iterations (what we used to have)
Open the entire file in memory
Loop through each record
For each record, loop through each box shipped
For each box shipped, loop through each product (quantity shipped, reason if not shipped)
Enough!
let’s keep this simple: O(1)
no more loops ...
Save only the data we care about
- Our own standard format, using kilobytes of data
Higher efficiency when searching documents
- One simple document, one single query
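To illustrate the idea (the class, field names, and values below are hypothetical, not our actual schema): each order becomes one flat document keyed by order ID, so reading a shipment's status is a single O(1) lookup instead of nested loops over a multi-megabyte file. A minimal in-memory sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class FlatOrderLookup {
    // Index keyed by order ID: one O(1) lookup replaces the nested
    // record -> box -> product loops over the XML file.
    private static final Map<String, Map<String, Object>> ordersById = new HashMap<>();

    public static void save(Map<String, Object> order) {
        ordersById.put((String) order.get("orderId"), order);
    }

    public static Map<String, Object> find(String orderId) {
        return ordersById.get(orderId);
    }

    public static void main(String[] args) {
        // One flat document per order: only the fields we care about
        // (field names and values are illustrative).
        Map<String, Object> order = new HashMap<>();
        order.put("orderId", "ORD-1234");
        order.put("trackingNumber", "1Z999AA10123456784");
        order.put("boxesShipped", 3);
        order.put("carrier", "UPS");
        save(order);

        System.out.println(find("ORD-1234").get("boxesShipped")); // prints 3
    }
}
```

In MongoDB, the same shape maps directly to one document and one query; the in-memory map is only a stand-in for the collection.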
“stateful” resource
Keep track of data changes inside the document
No more intensive memory and disk usage due to multiple file manipulations
Real-time manual changes from a UI by any user
fault tolerant
MongoDB gives us persistent data (server reboot, segmentation fault, etc.)
Eliminates memory issues from reading multiple large text files into memory
Frees up resources for other applications running on the same server
server resources
This results in processes with very low resource usage
CPU percentage and load went down drastically
Network usage dropped considerably
disk utilization
No more -bash: /bin/ls: Argument list too long
Lots of freed space reused for something else
No more frequent “cleanup” or disk maintenance
No more file archiving/maintenance to a backup server
No more “disk at 95% utilization” alerts
Let’s see a simple example
Application logs
application logs (what we used to have)
Each application logs its data to its own specific files
Each log line uses a different log level based on what is executed
CRIT (0), ERR (1), WARN (2), INFO (3), DEBUG (4)
Logs are saved in the following format in /var/log/my_application/my_app.log
2017-11-12T03:50:02-08:00 [ INFO / 3 ] (PID: 12345): My message
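When moving these flat-file logs into structured storage, each line has to be split into fields first. A minimal sketch, assuming exactly the format shown above (the regex and class name are ours, not part of any library):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Matches: 2017-11-12T03:50:02-08:00 [ INFO / 3 ] (PID: 12345): My message
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\[ (\\w+) / (\\d+) \\] \\(PID: (\\d+)\\): (.*)$");

    /** Returns { datetime, level, code, pid, message }. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized log line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String[] f = parse("2017-11-12T03:50:02-08:00 [ INFO / 3 ] (PID: 12345): My message");
        System.out.println(f[1] + " " + f[3]); // prints: INFO 12345
    }
}
```

Each parsed field then maps one-to-one onto a field of the MongoDB log document shown later.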
application log (search)
To search, we simply need to run:
for x in $(seq 1 8); do
  ssh "p$x.myserver" 'grep -r "my search" /logs/app/*'
done
… wait … and … wait
no more !
let’s fix this
logging in MongoDB
Each application logs its data to its own specific namespace
Database used: <application_name>
Collection used: <application_specific>
Example: warehouse.sending_files
logging in MongoDB (example)
{
  "datetime": ISODate(),
  "level": "INFO",
  "code": 3,
  "pid": 12345,
  "message": "file orders_1234.zip sent to /inbound/"
}
MongoDB log (search)
use logs;
db.my_app.find();
db.my_app.find({level: "INFO"});
db.my_app.find({message: /some specific data/});
archiving logs
Archiving data can be done using a TTL index
● Warning: the TTL background task runs every 60 seconds to find and remove expired records, which can slow down data access.
Another way is to create a daemon that generates yearly or monthly collections.
Then, use mongodump to archive the records.
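As a sketch of the TTL approach, an index on the datetime field of the log document shown earlier could look like this (the collection name and the 90-day retention period are illustrative, not our actual configuration):

```javascript
// Automatically remove log documents 90 days after their "datetime" value.
db.sending_files.createIndex(
  { datetime: 1 },
  { expireAfterSeconds: 60 * 60 * 24 * 90 }
)
```

With the daemon approach instead, no TTL scan runs; whole monthly collections are dumped with mongodump and then dropped.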
So …
What can MongoDB do for you?
Q+A ?
Thank You!