Volume 6: Data Analytics September 2014
Artificial intelligence, machine learning, and natural language processing
have moved from experimental concepts to business disruptors, driven
by faster Internet speeds, cloud scale, and mature distributed
computing frameworks, and they now deliver insights that aid real-time
decision-making. Although the Big Data age we envision is still on the way,
new business models built around the so-called data economy are emerging
rapidly.
Hengtian began exploring Big Data technologies about three years
ago, after McKinsey published its well-known research report identifying
Big Data as the next frontier for innovation, competition and productivity.
On the one hand, Hengtian has strong capabilities across the whole Big Data
technology stack, including data collection, Big Data processing and
storage, data mining, and data visualization/reporting, and has developed
a series of related products and solutions with its own intellectual
property. On the other hand, Hengtian is moving towards vertical solutions
that bring true value from Big Data to specific sectors, e.g. finance, the
public sector, and the media, while exploring innovative business
models for the coming Big Data age. Going forward, we will continue to
provide Big Data-enabled solutions to our clients.
Since Big Data is a strategic trend for enterprises, we are committed not
just to offering IT outsourcing services, but also to helping our clients
build their future data roadmaps and derive the most value from both
internal and external data.
Xinling Dong
Director of Technology
FOREWORD
Newsletter Hengtian
INNOVATION AT HENGTIAN
Hengtian’s Data Analytics Solution Overview—HT Analytics
Data Service Bus, or DSB (see Figure 1: Data Service Bus Architecture), is a universal data analytics solution. The
concept of DSB is derived from the enterprise service bus (ESB), a common service-oriented architecture that integrates
different applications by placing a communication bus between them and enabling each application to talk
to the bus. Similarly, DSB integrates different data sources rather than applications, and it determines where
to retrieve data from among massive data sources in order to deliver the best results to customers. DSB is highly
versatile and is designed to serve a wide range of analytics purposes.
Figure 1: Data Service Bus Architecture
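The bus idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name DataServiceBus, the register and query methods, and the sample data are hypothetical and are not taken from HT Analytics.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from HT Analytics.
Record = Dict[str, object]
Fetcher = Callable[[str], List[Record]]


class DataServiceBus:
    """Routes a query for a topic to every data source registered for it."""

    def __init__(self) -> None:
        self._sources: Dict[str, List[Fetcher]] = {}

    def register(self, topic: str, fetcher: Fetcher) -> None:
        # Each data source plugs into the bus instead of into other sources.
        self._sources.setdefault(topic, []).append(fetcher)

    def query(self, topic: str, key: str) -> List[Record]:
        # Collect results from all sources that serve this topic.
        results: List[Record] = []
        for fetch in self._sources.get(topic, []):
            results.extend(fetch(key))
        return results


# Two stand-in data sources (assumed sample data).
crm = {"c1": [{"source": "crm", "name": "Alice"}]}
orders = {"c1": [{"source": "orders", "total": 120}]}

bus = DataServiceBus()
bus.register("customer", lambda key: crm.get(key, []))
bus.register("customer", lambda key: orders.get(key, []))
merged = bus.query("customer", "c1")
```

The caller asks the bus about a customer and never needs to know which sources hold the answer, which is the property the ESB analogy is meant to convey.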
* Flexible self-configuration for data synchronization
DAO adaptors allow customers to configure the
data sources from which they want to extract
information. The data sources can be any kind of
database, internal or external, as well as social media.
The use of Thrift in DAO adaptors makes it easier
to synchronize structured data from ODS
databases / data warehouses, semi-structured data
such as that found in spreadsheet tables, and
unstructured data such as that in email, Word
documents, and NoSQL databases.
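A minimal sketch of the adaptor idea follows, assuming that each adaptor exposes a single read() method returning uniform records. The class names SqlAdaptor and CsvAdaptor and the synchronize function are hypothetical, and the sketch uses Python's standard sqlite3 and csv modules rather than Thrift, whose role in the real product is not detailed here.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List

Record = Dict[str, str]


class SqlAdaptor:
    """Reads rows from a structured source (here an SQLite table)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # assumed to come from trusted configuration

    def read(self) -> Iterator[Record]:
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            yield {c: str(v) for c, v in zip(columns, row)}


class CsvAdaptor:
    """Reads rows from a semi-structured source (spreadsheet-style text)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> Iterator[Record]:
        for row in csv.DictReader(io.StringIO(self.text)):
            yield dict(row)


def synchronize(adaptors) -> List[Record]:
    # Every adaptor yields the same record shape, so sources can be mixed.
    return [record for adaptor in adaptors for record in adaptor.read()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'Alice')")

records = synchronize([SqlAdaptor(conn, "accounts"),
                       CsvAdaptor("id,owner\n2,Bob\n")])
```

Configuring a new data source then amounts to adding another adaptor to the list passed to synchronize.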
* Big data technologies make on-demand analytics possible
The solution described here is built on the
widely adopted distributed computing
framework Hadoop, which leverages the MapReduce
programming model to achieve highly
scalable, distributed processing capacity. Big
Data systems extract the raw data from operational
systems into a NoSQL database, HBase,
avoiding repetitive ETL processes when regulatory
rules change. HBase is built on HDFS
(Hadoop Distributed File System), which is able to
store very large files with streaming data access
and runs on clusters of commodity hardware.
These evolving technologies help to lower costs
while providing a more flexible alternative to the
central data repository model.
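The MapReduce programming model mentioned above can be illustrated with a single-process Python sketch. It is not Hadoop code; the function names and the "account,amount" record format are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Emit one (key, value) pair per raw "account,amount" record.
    account, amount = record.split(",")
    yield account, int(amount)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = run_job(["a,10", "b,5", "a,7"])
```

In a real cluster the map and reduce calls run in parallel on different machines; the sketch keeps only the data flow.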
* The benefits of semantic web technology
DSB is able to synchronize data from any ODS database to any NoSQL database without an ETL
process, unlike a traditional data warehouse, wherein the same data may appear with different identifiers
and in different formats. With DSB, semantic web technology is leveraged to establish logical links
among related data through a domain ontology, which maintains the data relationships. What’s more, the
raw data is stored intact in the NoSQL database. A client query is broken down into a SPARQL query and a
MapReduce program, which search the RDF graph and the database respectively.
What distinguishes the DSB method from other methods is how it maps all of the data from the
database, together with the domain ontology, to a consolidated virtual RDF graph, which can be explored
through a standard RDF query language such as SPARQL.
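To make the idea of linked data concrete, the sketch below stores a few triples and matches patterns against them, in the spirit of a SPARQL basic graph pattern. It is a plain-Python stand-in, not a SPARQL engine, and the identifiers (crm:42, ledger:A9, the ex: predicates) are invented for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(triple: Triple, pattern: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend a binding so the pattern equals the triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join the patterns one by one, as in a basic graph pattern."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(triple, pattern, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# The ontology link (ex:sameAs) ties records with different identifiers.
graph = {
    ("crm:42", "ex:name", "Alice"),
    ("crm:42", "ex:sameAs", "ledger:A9"),
    ("ledger:A9", "ex:balance", "300"),
}

rows = query(graph, [
    ("?c", "ex:name", "Alice"),
    ("?c", "ex:sameAs", "?acct"),
    ("?acct", "ex:balance", "?bal"),
])
```

The query finds Alice's balance even though the CRM and the ledger use different identifiers for her, which is the kind of logical link the ontology is described as maintaining.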
* Machine learning (AI) boosts analytics
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn
without being explicitly programmed.
Machine learning focuses on the development of computer programs that can teach themselves to
grow and change when exposed to new data.
Today machine learning is widely used in many areas, such as biomedicine, credit scoring, and
weather forecasting. DSB uses machine learning to make itself smarter. It makes use of customer
information as well as information that’s already in the database to develop relationships among the data,
which are linked via semantic web technology. In this way, DSB is able to shed light on various aspects of
business:
● Generate customer insights
  Target the most profitable product
  Track customer preferences
  Track customer loyalty
  Guide efforts to enhance satisfaction
● Accelerate production innovation
  Identify customer needs
  Find new market opportunities
  Refine cross-sell strategies
● Understand financial performance
  Find potential investments
  Predict potential risks
* Algorithm engine makes development easier
The algorithm engine is plug-and-play, which allows the users to develop their own algorithms to run in
DSB. The engine can run algorithms written in any language, allowing developers with different skill
sets to easily create new algorithms.
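A plug-and-play engine of this kind is often built around a registry. The sketch below assumes a decorator-based registry; the names register and run are hypothetical, and it covers only Python callables. Running algorithms written in other languages would need an additional mechanism (for example, invoking external processes) that is not shown here.

```python
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], float]

# Hypothetical registry: maps an algorithm name to its implementation.
_REGISTRY: Dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that plugs a user-written algorithm into the engine."""
    def decorator(func: Algorithm) -> Algorithm:
        if name in _REGISTRY:
            raise ValueError(f"algorithm {name!r} is already registered")
        _REGISTRY[name] = func
        return func
    return decorator


def run(name: str, data: List[float]) -> float:
    """Look up an algorithm by name and apply it to the data."""
    try:
        algorithm = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"no algorithm registered as {name!r}") from None
    return algorithm(data)


@register("mean")
def mean(data: List[float]) -> float:
    return sum(data) / len(data)


@register("range")
def value_range(data: List[float]) -> float:
    return max(data) - min(data)


result = run("mean", [2.0, 4.0, 6.0])
```

Adding a new algorithm requires no change to the engine itself: the developer writes a function and registers it under a name.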
INDUSTRY SPOTLIGHT
Data Analytics Industry’s Prospect and Development
According to InformationWeek’s 2014 Salary Survey (BI & Analytics),
compensation for the rare data scientist is bucking salary trends,
while the salaries of traditional BI / analytics professionals remain
stagnant. The potential salary of a data scientist (over USD350K) points
to a paradigm shift in how BI / analytics is used in day-to-day
business. The survey does not suggest that enterprises are paying less
attention to traditional BI / analytics solutions. Rather, it reflects
the practice of many companies to “optimize” their talent via
re-organization in order to achieve better results in resource allocation.
Retailers, financial services and telecommunications companies are
typically heavy users of advanced analytics and prefer broadly
dedicated teams to accomplish what they need. The healthcare field is
the newest adopter, with a 54 percent adoption rate. What sets it
apart from other sectors is its adoption of Big Data technology,
which is no longer limited to internally hosted data and now has
virtually unlimited data collection potential. In essence, the skills
mastered by a traditional data analyst are not necessarily comparable
to the skills of a successful data scientist.
Today, data scientists are tasked with finding correlations among
huge data sets. Thanks to the Internet age, there are no predefined
boundaries when it comes to the process of data collection. A
broad knowledge of the different domains of business data and an
intuitive understanding of how they can relate to one another are the
most important skills of a data scientist. Today’s enterprises are not
satisfied with complex, numbers-only reporting. Simply analyzing
historical data is no longer adequate. There is now a demand for
analysts who are able to apply data visualization techniques, big data
modeling, and NoSQL data mining techniques. The key lies in
analyzing both private and public data to make predictions about
consumer trends and product usage patterns. This paves the way for
tremendous innovation in the creation and marketing of products
to the public.
CHINESE BUSINESS SPOTLIGHT
Big Opportunities in China to Mine Big Data
Big data is a collection of data sets so large and complicated that it is
difficult to process using common database-managing tools or traditional
data-processing applications. China, with more than 1.35 billion people,
is made for big data analysis of everything from health to infrastructure
to tastes in entertainment. Super-computing centers are quickly being
established in this country of over 618 million Internet users, one of
the fastest-growing mobile markets.
According to government projections, China’s information market value
will exceed 3.2 trillion yuan (US$515 billion) in 2015. The value of big
data is yet to be determined, and there is controversy over whether
analysis of big data truly reflects real-life situations. Nonetheless,
the term pops up in business proposals and negotiations across
industries, as do related terms such as data warehouse, scalable data
management and data digging.
Only last month, a conference was held in Beijing to promote business
opportunities regarding big data in Guizhou, a southwestern province
generally considered less developed. Big cities such as Shanghai,
Beijing and Guangzhou also started their own big data 3- or 5-year
plans in early 2013.
Last May, China’s National Development and Reform Commission
(NDRC), a key decision-maker in the economy, launched a model big
data service platform. It has approved many big data-related
projects, such as specialized industrial zones and super-computing
centers across the country.
Chinese companies, especially
Internet service providers, have
been busy mining gold from their
enormous data sets. E-commerce
giant Alibaba has also released a
few reports based on data sets
from its large user bases.
The potential business opportunities in enormous amounts of data
have attracted not just the sharks: many small Internet service
companies have also tried to grab a share early on.
http://www.shanghaidaily.com/feature/news-feature/Big-opportunities-in-China-to-mine-big-data/shdaily.shtml
HENGTIAN NEWS
Hengtian Products Garner Two Innovation Awards at INT’L SOFT CHINA 2014
Organized by the Ministry of Industry and Information Technology and
various other ministries, INT'L SOFT CHINA 2014 took place in Beijing
on May 29, 2014. As a National Strategic Key Software Enterprise,
Hengtian was invited to attend the expo, where it claimed Software
Product Innovation Awards for both its Data Analysis Platform
and Enterprise Infrastructure Cloud Platform (HT Cloud). Hengtian
also attended the Key Sectors’ Demand and Supply Dialogue, which
aimed to address the software needs of governmental and central
financial enterprises, as well as public companies. Also addressed
were new government policies, measures and procurement demands
regarding the domestic software industry.
“Software Drives Information Consumption, Software Spurs Economic
Transformation and Upgrades” was the theme of the expo, which
exhibited the most recent software developments that strive to promote
information consumption, improve the quality of life, and advance
informationization. The interactive and visually engaging nature of the
event made it an unforgettable experience, and its focus on
“consumption” was a major highlight.
Co-Sponsored Innovative Technologies Seminar a Great Success
The first Innovative Technologies Seminar, co-sponsored by the
Software College of Zhejiang University and Hengtian and organized by
the Software Engineering Lab of the Software College of Zhejiang
University, successfully took place on the Zhejiang University campus in
June. Nearly 100 academic and business experts attended the event
and contributed to the dialogue on eight cutting-edge subjects in
cloud computing and big data.
Research scholars in the fields of software and computer science from
Zhejiang University and the Rochester Institute of Technology, as well
as senior technical experts from Hengtian, were invited to give
speeches on various topics including "Lightweight Cloud Foundry
Based on Docker", "Technical Dialogue on E-Payment Real-Time Risk
Monitoring", "The CEP Streaming Data Processing Engine", "Big Data
Analysis Based on Spark" and "Smart Home System Based on IoT
and the Mobile Internet". During the Q&A session, guest speakers and
attendees engaged in thought-provoking, in-depth discussions.
The seminar provided an opportunity for external parties to learn about
the R&D work carried out by Zhejiang University and Hengtian.
If you would like to know more about Hengtian or the services we offer, please contact us:
U.S. office in Boston: [email protected] Steve Toussaint Tel: +1 857-239-9658
China HQ in Hangzhou: [email protected] Tel: +86 571-8827-0208
HENGTIAN SERVICES SPOTLIGHT
Recently, a large US institution chose Hengtian as an analytics partner to help its risk
department build more advanced analytics capabilities. Hengtian provided its advanced product, 'HT
Analytics' (HTA), as a platform. Moreover, Hengtian also provided data scientists, data engineers,
and visualization engineers to help the client build solutions in a cost-effective way. For more
information, please refer to the graph on the next page.