
Volume 6: Data Analytics September 2014

Artificial intelligence, machine learning, and natural language processing have moved from experimental concepts to business disruptors, driven by faster Internet speeds, cloud scale, and mature distributed computing frameworks, and now deliver insights that aid real-time decision-making. Although the Big Data age we envision is still on the way, new business models built around the so-called data economy are emerging rapidly.

Hengtian began exploring Big Data technologies about three years ago, after McKinsey published its well-known research report identifying Big Data as the next frontier for innovation, competition and productivity. On the one hand, Hengtian has strong capabilities across the whole Big Data technology stack, including data collection, Big Data processing and storage, data mining, and data visualization/reporting, and has developed a series of related products and solutions with its own intellectual property. On the other hand, Hengtian is gearing towards vertical solutions that will bring true value from Big Data to specific sectors, e.g. finance, the public sector, and the media, as well as exploring innovative business models in the coming Big Data age. Going forward, we will continue to provide Big Data-enabled solutions to our clients.

Since Big Data is a strategic trend for enterprises, we are devoted to not

just offering IT outsourcing services, but helping our clients to build their

future data roadmaps and to derive the most value from both internal

and external data.

Xinling Dong

Director of Technology

FOREWORD

Newsletter Hengtian


INNOVATION AT HENGTIAN

Hengtian’s Data Analytics Solution Overview—HT Analytics

Data Service Bus, or DSB (see Figure 1: Data Service Bus Architecture), is a universal data analytics solution. The concept of DSB is derived from the enterprise service bus (ESB), a common service-oriented architecture that integrates different applications by placing a communication bus between them and enabling each application to talk to the bus. Similarly, DSB integrates different data sources rather than applications, and is smart enough to know where to fetch data from among massive data sources in order to deliver the best results to customers. DSB is highly versatile and can be used for any analytics purpose.

Figure 1: Data Service Bus Architecture
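The bus idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hengtian's implementation: the `DataServiceBus` class, its `register` and `query` methods, the topic names, and the sample sources are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: each data source registers a fetch function under the
# topics it can serve; the bus routes a request to every matching source.
Record = Dict[str, object]
Fetcher = Callable[[], List[Record]]


class DataServiceBus:
    def __init__(self) -> None:
        self._routes: Dict[str, List[Fetcher]] = {}

    def register(self, topic: str, fetcher: Fetcher) -> None:
        """Attach a data source to the bus under a topic name."""
        self._routes.setdefault(topic, []).append(fetcher)

    def query(self, topic: str) -> List[Record]:
        """Collect records from every source registered for the topic."""
        results: List[Record] = []
        for fetcher in self._routes.get(topic, []):
            results.extend(fetcher())
        return results


# Made-up sources standing in for real systems.
bus = DataServiceBus()
bus.register("customers", lambda: [{"id": 1, "source": "crm"}])
bus.register("customers", lambda: [{"id": 1, "source": "social"}])
bus.register("orders", lambda: [{"order": 42, "source": "erp"}])

customer_records = bus.query("customers")
```

The caller asks the bus for a topic and never addresses a source directly, which is the property the ESB analogy relies on.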

* Flexible self-configuration for data synchronization

DAO adaptors allow customers to configure the data sources from which they want to extract information. The data sources can be any kind of database, internal or external, as well as social media. The use of Thrift in DAO adaptors makes it easier to synchronize structured data from ODS databases / data warehouses, semi-structured data such as that found in spreadsheet tables, and unstructured data such as that in email, Word documents, and NoSQL databases.
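One way to picture configurable adaptors is a shared read interface with one implementation per source type. The sketch below is an assumption-laden illustration: it does not use Thrift, and the names `DaoAdaptor`, `SqlAdaptor`, `CsvAdaptor`, and `synchronize` are hypothetical. An in-memory SQLite table stands in for a structured source and a CSV string for a spreadsheet export.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List, Protocol


class DaoAdaptor(Protocol):
    """Hypothetical common interface every adaptor implements."""

    def read(self) -> Iterable[Dict[str, str]]: ...


class SqlAdaptor:
    """Structured source: rows from a relational table."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn, self.table = conn, table

    def read(self) -> Iterable[Dict[str, str]]:
        # The table name comes from trusted configuration in this sketch.
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [c[0] for c in cur.description]
        for row in cur:
            yield {k: str(v) for k, v in zip(cols, row)}


class CsvAdaptor:
    """Semi-structured source: a spreadsheet exported as CSV text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> Iterable[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self.text))


def synchronize(adaptors: List[DaoAdaptor]) -> List[Dict[str, str]]:
    """Pull every configured source into one list of records."""
    return [record for adaptor in adaptors for record in adaptor.read()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, region TEXT)")
conn.execute("INSERT INTO clients VALUES ('Acme', 'US')")

records = synchronize([
    SqlAdaptor(conn, "clients"),
    CsvAdaptor("name,region\nBeta,CN\n"),
])
```

Adding a new kind of source then means writing one more class with a `read` method and listing it in the configuration.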

* Big data technologies make on-demand analytics possible

The solution described here is built on the widely accepted distributed computing framework Hadoop, which leverages the MapReduce programming model to achieve highly scalable, distributed processing capacity. Big Data systems extract the raw data from operational systems into a NoSQL database, HBase, avoiding repetitive ETL processes when regulatory rules change. HBase is built on HDFS (Hadoop Distributed File System), which is able to store very large files with streaming data access and runs on clusters of commodity hardware. These evolving technologies help to lower costs while providing a more flexible alternative to the central data repository model.
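The MapReduce programming model mentioned above can be illustrated without a cluster. The following single-process Python sketch only mimics the map, shuffle, and reduce steps; it does not call Hadoop, and the record layout and function names are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> Iterable[Tuple[str, int]]:
    """Map: emit (account_type, 1) for one raw transaction line."""
    _, account_type, _ = record.split(",")
    yield account_type, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


# Made-up raw records in the form "id,account_type,amount".
raw_records = ["t1,savings,100", "t2,checking,40", "t3,savings,25"]

pairs = [pair for record in raw_records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because the raw records are kept as they are, a new question only requires a new pair of map and reduce functions rather than a new ETL job, which is the point the paragraph makes about changing rules.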


* The benefits of semantic web technology

DSB is able to synchronize data from any ODS database to any NoSQL database without an ETL process. In a traditional data warehouse, by contrast, the same data may appear with different identifiers and in different formats. With DSB, semantic web technology is leveraged to establish logical links among related data. It does so through a domain ontology, which is used to maintain data relationships. What’s more, the raw data is stored intact in the NoSQL database. A client query is broken down into a SPARQL query and a MapReduce program, which search the RDF graph and the database respectively.

What distinguishes the DSB method from other methods is how it maps all of the data from the database, together with the domain ontology, onto a consolidated virtual RDF graph, which can be explored through a standard RDF query language such as SPARQL.
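The two-step lookup described above, first the graph and then the raw store, can be sketched as follows. This is a toy stand-in under clear assumptions: a list of triples with wildcard matching plays the role of the RDF graph and SPARQL, a dictionary plays the role of the NoSQL database, and every identifier and predicate name is invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy graph: two source systems use different identifiers for the same
# customer, and a sameAs link records that they are related.
graph: List[Triple] = [
    ("crm:C001", "rdf:type", "ex:Customer"),
    ("crm:C001", "owl:sameAs", "billing:9001"),
    ("crm:C001", "ex:storedAt", "row-C001"),
    ("billing:9001", "ex:storedAt", "row-9001"),
]

# Raw records kept intact, keyed by row id (stand-in for a NoSQL table).
raw_store: Dict[str, Dict[str, object]] = {
    "row-C001": {"name": "Acme Ltd."},
    "row-9001": {"balance": 1250},
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def linked_ids(entity: str) -> List[str]:
    """The entity plus everything one sameAs hop away."""
    return [entity] + [o for _, _, o in match(s=entity, p="owl:sameAs")]


def lookup(entity: str) -> Dict[str, object]:
    """Step 1: search the graph for row ids. Step 2: fetch raw records."""
    merged: Dict[str, object] = {}
    for ident in linked_ids(entity):
        for _, _, row in match(s=ident, p="ex:storedAt"):
            merged.update(raw_store[row])
    return merged


profile = lookup("crm:C001")
```

The raw rows are never rewritten into a common format; only the links in the graph say that they describe the same customer.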

* Machine learning (AI) boosts analytics

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Today machine learning is widely used in many areas, such as biomedicine, credit scoring, and weather forecasting. DSB uses machine learning to make itself smarter. It makes use of customer information, as well as information that is already in the database, to develop relationships among the data, which are linked via the semantic web. In this way, DSB is able to shed light on various aspects of business:

● Generate customer insights
    Target the most profitable product
    Track customer preferences
    Track customer loyalty
    Guide efforts to enhance satisfaction

● Accelerate product innovation
    Identify customer needs
    Find new market opportunities
    Refine cross-sell strategies

● Understand financial performance
    Find potential investments
    Predict potential risks
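To make "learning without being explicitly programmed" concrete, here is a minimal sketch of a program that derives a rule from examples instead of having it hard-coded. It fits a straight line by ordinary least squares. The technique is a generic one chosen for illustration, not something the newsletter attributes to DSB, and the visit and spend figures are made-up toy data.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: monthly store visits and spend for four customers.
visits = [1.0, 2.0, 3.0, 4.0]
spend = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(visits, spend)

# The learned relationship is then applied to an unseen customer.
predicted_spend = slope * 5.0 + intercept
```

Feeding the function different examples yields a different rule with no change to the code, which is the sense in which the program adapts when exposed to new data.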

* Algorithm engine makes development easier

The algorithm engine is plug-and-play, which lets users develop their own algorithms to run in DSB. The engine can run algorithms written in any language, allowing developers with different skill sets to easily create new algorithms.
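A plug-and-play engine of this kind is often built around a registry that maps names to callables. The sketch below shows that pattern only for in-process Python functions; running algorithms written in other languages would need an additional process or service boundary that is not shown. The `register` and `run` names are hypothetical and not taken from HT Analytics.

```python
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], float]

# Hypothetical registry: algorithm name -> callable.
_registry: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that plugs an algorithm into the engine under a name."""
    def decorator(func: Algorithm) -> Algorithm:
        _registry[name] = func
        return func
    return decorator


def run(name: str, data: List[float]) -> float:
    """Look up a plugged-in algorithm by name and execute it."""
    if name not in _registry:
        raise KeyError(f"no algorithm registered as {name!r}")
    return _registry[name](data)


@register("mean")
def mean(data: List[float]) -> float:
    return sum(data) / len(data)


@register("range")
def value_range(data: List[float]) -> float:
    return max(data) - min(data)


result = run("mean", [2.0, 4.0, 6.0])
```

New algorithms are added by decorating a function; the engine code itself does not change.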


INDUSTRY SPOTLIGHT

Data Analytics Industry’s Prospect and Development

According to InformationWeek’s 2014 Salary Survey (BI & Analytics), compensation for scarce data scientists is bucking salary trends, while the salaries of traditional BI / analytics professionals remain stagnant. The potential salary of a data scientist (over USD350K) points to a paradigm shift in how BI / analytics is used in day-to-day business. The survey does not suggest that enterprises are paying less attention to traditional BI / analytics solutions. Rather, it reflects the practice of many companies of “optimizing” their talent via reorganization in order to achieve better results in resource allocation.

Retailers, financial services and telecommunications companies are typically heavy users of advanced analytics and prefer broadly dedicated teams to accomplish what they need. The healthcare field is the newest adopter, with a 54 percent adoption rate. What sets it apart from other sectors is its adoption of Big Data technology, which is no longer limited to data hosted internally and now has almost unlimited data collection potential. In essence, the skills mastered by a traditional data analyst are not necessarily comparable to the skills of a successful data scientist.

Today, data scientists are tasked with finding the correlations among huge data sets. Thanks to the age of the Internet, there are no predefined boundaries when it comes to the process of data collection. A broad knowledge of the different domains of business data and an intuitive understanding of how they can relate to one another are the most important skills of a data scientist. Today’s enterprises are not satisfied with complex, numbers-only reporting. Simply analyzing historical data is no longer adequate. There is now a demand for analysts who are able to apply data visualization techniques, big data modeling, and NoSQL data mining techniques. The key lies in analyzing both private and public data to make predictions about consumer trends and product usage patterns. This paves the way for tremendous innovation in the creation and marketing of products to the public.

CHINESE BUSINESS SPOTLIGHT

Big Opportunities in China to Mine Big Data

Big data is a collection of data sets so large and complicated that it is difficult to process using common database-managing tools or traditional data-processing applications. China, with more than 1.35 billion people, is made for big data analysis of everything from health to infrastructure to tastes in entertainment. Super-computing centers are quickly being established in this country of over 618 million Internet users, one of the fastest-growing mobile markets.

According to government projections, China’s information market value will exceed 3.2 trillion yuan (US$515 billion) in 2015. The value of big data itself is yet to be determined, and there is controversy over whether analysis of big data truly reflects real-life situations. Nonetheless, the term pops up in business proposals and negotiations across industries, as do related terms such as data warehouse, scalable data management and data digging.

Only last month, a conference was held in Beijing to promote business opportunities regarding big data in Guizhou, a southwestern province generally considered less developed. Big cities such as Shanghai, Beijing and Guangzhou also started their own big data 3- or 5-year plans in early 2013.

Last May, China’s National Development and Reform Commission (NDRC), a key decision-maker in the economy, launched a model big data service platform. It has approved many big data-related projects, such as specialized industrial zones and super-computing centers across the country.

Chinese companies, especially

Internet service providers, have

been busy mining gold from their

enormous data sets. E-commerce

giant Alibaba has also released a

few reports based on data sets

from its large user bases.

The potential business opportunities in enormous amounts of data have attracted not just the sharks: many small Internet service companies have also tried to grab a share early on.

http://www.shanghaidaily.com/feature/news-feature/Big-opportunities-in-China-to-mine-big-data/shdaily.shtml

HENGTIAN NEWS

Hengtian Products Garner Two Innovation Awards at INT’L SOFT CHINA 2014

Organized by the Ministry of Industry and Information Technology and various other ministries, INT'L SOFT CHINA 2014 took place in Beijing on May 29, 2014. As a National Strategic Key Software Enterprise, Hengtian was invited to attend the expo, where it claimed Software Product Innovation Awards for both its Data Analysis Platform and Enterprise Infrastructure Cloud Platform (HT Cloud). Hengtian also attended the Key Sectors’ Demand and Supply Dialogue, which aimed to address the software needs of governmental and central financial enterprises, as well as public companies. Also addressed were new government policies, measures and procurement demands regarding the domestic software industry.

“Software Drives Information Consumption, Software Spurs Economic Transformation and Upgrades” was the theme of the expo, which exhibited the most recent software developments that strive to promote information consumption, improve the quality of life, and advance informatization. The interactive and visually interesting nature of this event made it an unforgettable experience, with its focus on “consumption” a major highlight.

Co-Sponsored Innovative Technologies Seminar a Great Success

The first Innovative Technologies Seminar, co-sponsored by the Software College of Zhejiang University and Hengtian and organized by the Software Engineering Lab of the Software College of Zhejiang University, successfully took place on the Zhejiang University campus in June. Nearly 100 academic and business experts attended the event and contributed to the dialogue on eight cutting-edge subjects in cloud computing and big data.

Research scholars in the fields of software and computer science from Zhejiang University and the Rochester Institute of Technology, as well as senior technical experts from Hengtian, were invited to give speeches on various topics including "Lightweight Cloud Foundry Based on Docker", "Technical Dialogue on E-Payment Real-Time Risk Monitoring", "The CEP Streaming Data Processing Engine", "Big Data Analysis Based on Spark" and "Smart Home System Based on IoT and the Mobile Internet". During the Q&A session, guest speakers and attendees engaged in thought-provoking, in-depth discussions.

The seminar provided an opportunity for external parties to learn about the R&D work carried out by Zhejiang University and Hengtian.


If you would like to know more about Hengtian or the services we offer, please contact us:

U.S. office in Boston: [email protected] Steve Toussaint Tel: +1 857-239-9658

China HQ in Hangzhou: [email protected] Tel: +86 571-8827-0208

HENGTIAN SERVICES SPOTLIGHT

Recently, a large US institution chose Hengtian as an analytics partner to help its risk department acquire more advanced analytics capabilities. Hengtian provided its advanced product, 'HT Analytics' (HTA), as a platform. Hengtian also provided data scientists, data engineers, and visualization engineers to help the client build solutions in a cost-effective way. For more information, please refer to the graph on the next page.
