TRANSCRIPT
A data science and machine learning
consulting and development agency.
Data science projects are complex and require a range of technical
skills to ensure successful outcomes. Blue Orange offers full-service
development to accelerate the delivery of cutting-edge data insights.
Think of it as data science as a service.
Predictive Analytics
Make Data Work for You
Predictive analytics uses techniques from data mining, statistics, modeling, and machine learning to analyze current data and make predictions about the future.
Data Warehousing
A unified data warehouse is a federated repository that stores all data types. This unification simplifies access and expands analytics capabilities beyond transaction-based data storage.

Data Visualization
Data visualization provides a quick, clear understanding of the information. Advanced analytics outputs require accurate and transparent representation to drive adoption and support consistent decision making.
Our Capabilities
AWS Registered Partner
What we do:
Provide full-stack data science support to scale the capabilities of internal data teams. From cloud infrastructure engineering and custom ML development to integrated dashboards and data strategy, Blue Orange helps companies make better decisions with their data.
● Dynamic Pricing
● Lead Segmentation
● Customer Churn
● Recommenders
● Trend Analysis
● Customer Lifetime Value
● Natural Language Processing
● Talent Analytics
Experience
Our Approach
Case Study
Since we had a production predictive model, our focus was to identify and improve future results based on existing results. We began by aggregating historical data to estimate the value of each keyword as a benchmark. The bank had stored historical bid data from previous campaigns, from which we determined correlations for keyword attribution.
1. Estimate Keyword Value
2. Model Selection
3. Solutions
Sector: Banking
Vertical: Marketing Optimization
The Problem
Technologies
The bank used a third-party 'black box' ad bidding system and wanted to verify its efficacy while improving price-per-click (PPC) bids on search terms in the Google Ads marketplace. The company lacked an accurate way to estimate its marketing attribution per keyword and relied on qualitative tracking to assess the third-party bidding tool.
We identified this as a Multi-Armed Bandit problem (or a contextual bandit, given the keyword estimation data), in which a fixed set of resources must be allocated among alternative options. In this case, the Estimated Keyword Values were applied against competing campaigns.

This approach shifted preference toward campaigns performing well within target estimates, while deranking variations likely to underperform.
Model: Upper Confidence Bound / Epsilon-Greedy
A few approaches were applied to optimize different campaigns. Other models were tested, but these produced the most immediate improvement:

Upper Confidence Bound: This strategy is based on the Optimism in the Face of Uncertainty principle and assumes that the unknown mean payoff of each arm is as high as the observable data plausibly allows.

Epsilon-Greedy: A randomly chosen campaign was selected a fraction ε of the time; the rest of the time, the arm with the highest known payout was pulled. Outcomes were compared and the estimates reinforced accordingly.
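A minimal sketch of the two selection strategies in Python; the campaign payoffs simulated below are hypothetical, not the bank's production system:

```python
import math
import random

class Bandit:
    """Tracks pull counts and mean payoff per campaign (arm)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running-mean update
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def select_ucb(self, c=2.0):
        # Pull each arm once before applying the UCB formula
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        # Optimism in the Face of Uncertainty: mean + exploration bonus
        return max(range(len(self.means)),
                   key=lambda a: self.means[a]
                   + math.sqrt(c * math.log(total) / self.counts[a]))

    def select_epsilon_greedy(self, epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(len(self.means))  # explore
        return max(range(len(self.means)), key=lambda a: self.means[a])  # exploit

# Usage: pick an arm, observe a (simulated) reward, update the estimates
bandit = Bandit(n_arms=3)
for _ in range(100):
    arm = bandit.select_ucb()
    reward = random.gauss([0.2, 0.5, 0.3][arm], 0.1)  # hypothetical payoffs
    bandit.update(arm, reward)
print(bandit.means)
```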
Extermax was looking to optimize its customer acquisition process and wanted a data-driven methodology. The company aimed to predict the highest-value customer engagement touchpoints and to model Customer Lifetime Value (CLTV). The focus was on sales and marketing optimization, but other departments ultimately used CLTV to calculate benefits.

They aimed to measure and determine optimal solutions for the following:
● How much should I spend to acquire a customer?
● What types of customers should sales reps spend the most time trying to acquire?
● What is the most effective marketing touchpoint, and at what frequency?
We developed a user acquisition tracking platform for a mobile gaming client. We built a predictive tracking tool to customize each marketing touchpoint for potential customers, and we calculated the Lifetime Value of each individual user across several cohorts segmented by network.
● Designed and implemented customer-based prediction models (linear regression, NBD/Pareto) to calculate Lifetime Value per user (see the sketch after this list).
● Applied user segmentation for enhanced user acquisition.
● Determined the specific efficiency (profit/loss) of individual ad networks.
● Performed several user segmentation techniques: K-NN + PCA and customized RFM analysis.
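A minimal sketch of that Lifetime Value step using the open-source lifetimes library's Pareto/NBD implementation; the transaction log and 90-day horizon are illustrative assumptions, not the client's actual data:

```python
import pandas as pd
from lifetimes import ParetoNBDFitter
from lifetimes.utils import summary_data_from_transaction_data

# Hypothetical transaction log: one row per user purchase
transactions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "date": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-08",
        "2023-01-20", "2023-03-01", "2023-02-15",
    ]),
})

# Collapse transactions into frequency/recency/T per user
summary = summary_data_from_transaction_data(
    transactions, "user_id", "date", observation_period_end="2023-03-31"
)

# Fit the Pareto/NBD model and project purchases over the next 90 days
model = ParetoNBDFitter(penalizer_coef=0.01)
model.fit(summary["frequency"], summary["recency"], summary["T"])
summary["predicted_90d"] = model.conditional_expected_number_of_purchases_up_to_time(
    90, summary["frequency"], summary["recency"], summary["T"]
)
print(summary)
```

The per-user projection can then be combined with an average margin to yield a dollar-denominated Lifetime Value per cohort.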
The Problem
Case Study
Sector: Gaming
Vertical: CLV / Market Segmentation
Technologies
Model: K-NN/PCA
Model: NBD/Pareto
We developed a custom data platform to track Exelon's corporate reputation. The project included ingesting, correlating, and aggregating all publicly available media sources on the open web related to the company or its corresponding reputation drivers.

All data was scraped, ingested, and processed using a series of NLP techniques, including sentiment analysis, topic modeling, classification, and grouping. For advanced analysis, data was persisted in a graph database for in-pipeline analytics. The underlying data volume was very large, requiring in-memory processing for ongoing analysis.
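As a minimal sketch of the topic modeling stage of such a pipeline, using scikit-learn; the sample documents are placeholders, not Exelon's actual sources:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder documents standing in for scraped media articles
docs = [
    "utility announces new renewable energy investment",
    "regulators review energy pricing and grid reliability",
    "community praises company sustainability program",
]

# Bag-of-words representation, dropping English stop words
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a small LDA model and print the top terms per topic
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```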
We delivered a series of custom dashboards focused on topic modeling, time series, anomaly detection, and aggregation summaries in an interactive application. Due to the data volume, we used both large precomputation jobs and near-real-time aggregation indexing to keep the dashboards interactive.

Exelon had identified a range of brand reputation drivers for the company, which now inform all communications and marketing activities. Exelon required a natural language analysis engine to measure, correlate, and visualize those drivers across online and earned media and their potential effect on the company.
The Problem
Case Study
Sector: Energy
Vertical: Marketing Optimization
Technologies
Model: Natural Language Processing
In an effort to systematically improve data standardization and quantify the hiring pipeline, we applied numerous data science techniques to two foundational aspects of that pipeline.

Unstructured to Structured Data Processing
● We used pLSA/LDA for resume topic modeling, applied to extract structured attributes from unstructured associated text.
● We applied SVM, random forest, and other models to classify and clean the extracted content based on weighted factors provided by the SMEs.
Candidate Scoring
● We first implemented a weighted heuristic model to establish a benchmark.
● For improved and standardized candidate ranking, we then used a logistic regression model trained on a heavily engineered feature set (see the sketch after this list).
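A minimal sketch of the two scoring stages; the feature names, weights, and labels are hypothetical stand-ins for the SME-provided factors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical structured candidate features extracted upstream:
# [years_experience, skill_match, education_score]
X = np.array([[5, 0.8, 0.9], [1, 0.3, 0.5], [8, 0.9, 0.7], [2, 0.4, 0.6]])
y = np.array([1, 0, 1, 0])  # historical hire / no-hire labels

# Stage 1: weighted heuristic benchmark (illustrative SME weights)
weights = np.array([0.4, 0.4, 0.2])
heuristic_score = X @ weights
print("heuristic ranking:", np.argsort(-heuristic_score))

# Stage 2: logistic regression trained on the engineered features
model = LogisticRegression().fit(X, y)
print("model ranking:", np.argsort(-model.predict_proba(X)[:, 1]))
```

Comparing the two rankings against the heuristic benchmark is what allows the learned model's lift to be measured before rollout.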
Point72 Asset Management was looking to quantify beneficial hiring characteristics and to develop predictive hiring indicators to filter candidate applications. They had 10 years of unstructured free text across resumes, third-party data, and interview notes, containing large amounts of unstructured data (free text, scans, emails). They were looking to standardize this data for improved analysis and to reveal non-standard correlative success factors.
The Problem
Case Study
Sector: Finance
Vertical: Talent Analytics
Model: SVM/Random Forest
Technologies
To save time and money, we opted to verify our model on a derived dirty dataset with characteristics similar to our target data. The benefit of generated data is an accurately labeled dataset that isolates model accuracy from data accuracy. Since we would otherwise have required manually generated training data, this allowed us to test and train multiple models before investing effort in manual data curation.
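A minimal sketch of deriving such a labeled dirty dataset from clean records; the corruption operations are illustrative assumptions about the kinds of noise involved:

```python
import random

random.seed(0)

def corrupt(record, p=0.3):
    """Randomly inject typos and missing values, keeping the clean label."""
    dirty = {}
    for field, value in record.items():
        if random.random() < p:
            if random.random() < 0.5 and value:
                i = random.randrange(len(value))
                # typo: swap one character
                dirty[field] = value[:i] + random.choice("abcdefg") + value[i + 1:]
            else:
                dirty[field] = None  # missing value
        else:
            dirty[field] = value
    return dirty

clean = {"first": "jane", "last": "doe", "email": "jane@example.com"}
# Each (dirty, clean) pair is an accurately labeled training example
pairs = [(corrupt(clean), clean) for _ in range(3)]
for dirty, truth in pairs:
    print(dirty, "->", truth)
```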
Point72 was looking to enrich highly variable resume data with many additional data sources. Each data source was riddled with inconsistencies, misspellings, and missing data. This made record linkage highly inconsistent and prevented accurate merging, deduplication, and association, impacting nearly 60%-70% of the applicable candidate data.
The Problem
Case Study
Sector: Finance
Vertical: Talent Analytics
Using field names, data types, and distributional characteristics, we predicted related column values for linkage. This automated schema mapping, speeding up ingestion of new data. For highly regular fields (like names, dates of birth, and emails), data matching accuracy was 96%, even with field name variation (e.g., first <> name).
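A minimal sketch of that idea, scoring column matches by name similarity plus value overlap; the fields, data, and 50/50 weighting are illustrative assumptions:

```python
from difflib import SequenceMatcher

def column_similarity(name_a, values_a, name_b, values_b):
    """Blend field-name similarity with value overlap as a linkage score."""
    name_score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    union = set(values_a) | set(values_b)
    overlap = len(set(values_a) & set(values_b)) / max(len(union), 1)
    return 0.5 * name_score + 0.5 * overlap

# Two hypothetical source schemas with differently named columns
source = {"first": ["jane", "john"], "dob": ["1990-01-01", "1985-05-05"]}
target = {"name": ["jane", "alice"], "date_of_birth": ["1990-01-01", "1970-07-07"]}

for s_col, s_vals in source.items():
    best = max(target, key=lambda t: column_similarity(s_col, s_vals, t, target[t]))
    print(f"{s_col} -> {best}")
```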
1. Model Verification
2. Inferring Semantic Schema Relations
3. LSTM using Semantic Representation
We found the best accuracy using a recurrent neural network that applied a semantic representation of each entity to determine potential linkages. We used language-level transfer learning, leveraging FastText to identify the semantic meaning of potentially related node values. Initial implementations of our model achieved up to 93% accuracy, ahead of industry-standard linkage approaches.
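A minimal sketch of the semantic-similarity idea using gensim's FastText, trained on a toy corpus purely for illustration; the production system used pretrained language-level embeddings feeding the LSTM:

```python
from gensim.models import FastText

# Toy corpus standing in for field values; production would use
# pretrained language-level FastText vectors instead.
sentences = [
    ["first", "name", "given", "forename"],
    ["email", "address", "contact"],
    ["birth", "date", "dob", "born"],
]
model = FastText(sentences, vector_size=32, min_count=1, epochs=50, seed=0)

# Subword embeddings yield vectors even for unseen variants like "firstname"
print(model.wv.similarity("first", "forename"))
print(model.wv.similarity("firstname", "name"))
```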
Technologies
Model: Mirrored LSTM
After iterating through industry-standard heuristic and statistical methods, we opted for a deep learning solution to meet the complexity of the dirty, inconsistent data set. We based our solution on MassMutual's industry-leading approach and were able to achieve higher accuracy.
Case Study
Sector: CRM/ERP
Vertical: Sales Optimization
The Problem
Three: Calculate a CPC that Promotes Your Goals
Stage three combines the estimated dollar value determined from stage one's artificial-intelligence-powered PPC calculations with the cost ecosystem analyzed in stage two. The decision engine applies the advertiser's targets, bidding strategies, and goals, then selects the best bids to maximize performance given the data, calculations, and goals. Often, a portfolio approach is used to bid against a target while maintaining an efficiency metric. This is the modern approach to PPC bid optimization that most bid management tools utilize, if they're designed for medium to large SEM programs. QuanticMind differs in some ways from legacy tools, discussed further in the Guide.
Four: Calculate Bid Adjustments
Stage four repeats nearly the same process completed in the first three steps, but on a different set of data and with a different purpose: calculating and automatically applying bid adjustments. QuanticMind’s model shines at this point, using machine learning to optimize bid adjustments at scale. Device Bid Modifiers, Geo Location Bid Modifiers, and Audience Bid Modifiers can all be automatically calculated and applied, based on their relative successes in the SEM program. The data science algorithms used here are another advantage when attempting to calculate optimized bids at scale.
Five: Anomaly Detection
Stage five moves into the often understated, but highly important, anomaly detection. This is one of several areas where the infrastructure discussed at the top can "flex" its strength. When designed for effective capturing, cleaning, and piping of data from any source, the system provides better data for better execution. The opposite has negative effects: when data is missing, or differs from what forecasts would suggest is reasonable, performance can take a hit. Fully optimized bidding platforms prevent these problems by using multiple anomaly detection and issue-prevention steps, ensuring bids aren't pushed based on bad data.
Technologies
A middle-market PE firm needed help integrating four acquired CRM/ERP companies. The firm introduced Blue Orange to the CEO of the merged company to provide architectural guidance on its data infrastructure to support unified data and sales optimization. Due to disparate data sets, the company had no insight into the efficacy of its upper-funnel engagement or attribution across its sales cycle.
Stage: Due Diligence Audit

Business Challenge
1. Siloed data systems hinder coordination, planning, and tracking.
2. Low conversion on sales efforts.
3. Lack of visibility into sales processes.
4. The development team has no resources for an internally focused, standalone project.

Blue Orange Design Considerations
1. Scalable architecture creates confidence that data-driven operations will not be outpaced by growth.
2. Increase top-of-funnel conversion using ML prediction to improve lead segmentation.
3. Improve sales modeling and oversight with real-time, full-funnel dashboards.
4. Getting quick results was crucial: solve the problem quickly, then add complexity later.
Blue Orange helped build the first production prototype of PingThings' PredictiveGrid. The PredictiveGrid is an Advanced Sensor Analytics Platform (ASAP) architected to ingest, store, access, visualize, and analyze sensor data measuring the grid at nanosecond temporal resolution, and to train machine learning and deep learning algorithms on that data.
Initial predictive problems addressed:
● Rapid post-event analysis and reporting
● Sensor data cleaning and management
● Fault detection, prediction, and localization
● Anomaly identification, classification, and prediction
● Failure signature identification
Case Study
Sector: Energy
Vertical: Analytics
The Problem
Technologies
PingThings was a startup looking to build a real-time platform that leverages machine learning for physical systems on the electric utility grid and for high-value industrial assets such as GSU transformers and step-down transformers. They wanted an analytics platform to track sensor data, focused on storing and manipulating time-series data and modeling complex relationships between synchrophasors' high-resolution signals.
Model: LightGBM/XGBoost
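A minimal sketch of the fault detection idea, training gradient boosting (here via xgboost's scikit-learn API) on windowed sensor statistics; the signals and features are synthetic stand-ins, not PingThings' actual data:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

def window_features(windows):
    """Summarize each sensor window with simple statistics."""
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        np.ptp(windows, axis=1),  # peak-to-peak range
    ])

# Synthetic synchrophasor-like windows: mostly nominal, ~10% faulty
n_windows, size = 200, 50
labels = (rng.random(n_windows) < 0.1).astype(int)
windows = rng.normal(60.0, 0.01, (n_windows, size))
windows[labels == 1] += 0.5  # fault windows drift off nominal

X = window_features(windows)
model = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, labels)
print("training accuracy:", (model.predict(X) == labels).mean())
```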
Data Science: Adopt AWS SageMaker and complementary services
DATA SCIENCE INFRASTRUCTURE
• Use AWS SageMaker to get access to the standard Python Data Science stack
  • Jupyter Notebooks
  • NumPy / Pandas / SciPy / etc.
  • Scikit-learn for initial ML efforts
• Benefits:
  • Serverless, on-demand infrastructure
  • Huge ecosystem of libraries
  • Defined workflow for deployment and continuous improvement
MACHINE LEARNING INFRASTRUCTURE
• Use AWS SageMaker to instantly get access to all cutting-edge ML stacks
  • Jupyter Notebooks
  • TensorFlow / PyTorch / XGBoost / etc.
• Use built-in AWS SageMaker features for labeling, training, and deploying models to live endpoints
• Benefits:
  • Serverless, on-demand infrastructure
  • Huge ecosystem of libraries
  • Defined workflow for deployment and continuous improvement
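As a minimal sketch of that training-to-endpoint workflow with the SageMaker Python SDK; the role ARN and S3 paths are placeholders, and the instance types and container version are illustrative:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Built-in XGBoost container; the version shown is illustrative
image_uri = sagemaker.image_uris.retrieve(
    "xgboost", session.boto_region_name, version="1.5-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",          # on-demand training hardware
    output_path="s3://my-bucket/models/",  # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

# Launch training on data staged in S3, then deploy a live endpoint
estimator.fit({"train": TrainingInput("s3://my-bucket/train.csv",
                                      content_type="text/csv")})
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.m5.large")
```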
• Use the new Data Science / ML infrastructure to improve automation
• Keyword Tagging: apply modern topic modeling and clustering techniques (see the sketch after this list)
• OCR: a custom solution trained on the available data corpus to achieve higher accuracy and recall
• Observation Extraction: apply deep-learning-based information extraction
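A minimal sketch of keyword tagging via clustering, using TF-IDF and k-means in scikit-learn; the documents and cluster count are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents standing in for the corpus to be tagged
docs = [
    "invoice payment overdue account",
    "server outage network latency",
    "payment refund account billing",
    "network router firmware outage",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Cluster documents, then tag each cluster with its top TF-IDF terms
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[-3:][::-1]
    print(f"cluster {c} keywords:", [terms[i] for i in top])
```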
REPORTING & METRICS
• Establish Metrics / KPIs with key stakeholders
• Expose them to stakeholders via a BI reporting tool (e.g., PowerBI, AWS QuickSight, Tableau)
• Benefits:
  • Serverless, on-demand infrastructure
  • Data democratization: give stakeholders a view into data science efforts
THE ADVANTAGES OF WORKING WITH BLUE ORANGE
Industry-Leading Data Architecture
Data science starts with data cleaning and preparation. We implement modern, scalable data architecture from the start to expedite ongoing analytics.

Clear Project Insight
We take the mystery out of data science implementation with clear project insight and non-technical project communication.
Access to Specialized Talent
Bring on a Ph.D. data scientist for two weeks to work on a predictive customer segmentation problem, without the hiring challenges or ongoing commitment.
Work-for-Hire
Leverage our decades of experience while building something lasting in-house. We work with existing technical teams or as a standalone resource.
Uiba offers Machine Learning for
Organizational Management to medium and
large-sized organizations. This platform
enables organizations to hire, allocate, and
develop their workforce in a manner designed
to maximize productivity, minimize cost, and
achieve optimal efficiency. Blue Orange
developed and designed the first version of
their platform.
Blue Orange was instrumental in helping our company achieve the early breakthroughs necessary to get Uiba where we are today. Josh and his team provided a great deal of insight beyond the blocking and tackling of development work, which helped us avoid unnecessary and costly mistakes as we responded to customer needs. Their work was always top notch and delivered on time.
- Jason Cowell, CEO of Uiba
Blue Orange Digital designed a cutting-edge hiring and recruiting platform using machine learning to optimize sourcing. We also used the data analysis tools to identify high-value applicants and optimize the candidate funnel.
Over the course of my career, I've worked with at least a dozen technology teams, and it is without question that the Blue Orange team stands above them all. It's not just that they believe in using frontier technologies or that the expectation is constant learning and improvement; their sense of product and the insights they provide enable a product to be truly usable and sticky. As a manager, perhaps most valuable is the level of transparency the team provides about progress and deadlines. With other tech teams, it can be excruciating to extract plans, in-depth updates, or explanations for issues as they arise. The Blue Orange team is a partner, collaborator, and leader.
- Lauren B., Executive at Point72 Asset Management
Trusted by Fortune 500s and Innovative AI Companies
Our Leadership
Josh Miramant, Chief Executive Officer
Colin Van Dyke, Chief Technology Officer
Dr. Uri Schonfeld, Lead Data Scientist
Our Employees Come From:
Trusted By Leading Brands:
A Data Science Agency
79 Madison Ave, New York, NY 10017
(530) 454-5830
blueorange.digital