
Gartner - https://www.gartner.com/doc/reprints?id=1-3ZUFCTU&ct=170509&st=sb   LICENSED FOR DISTRIBUTION

Organizing Your Teams for Modern Data and Analytics Deployment

Published: 20 March 2017   ID: G00280461   Analyst(s): Mark A. Beyer

Summary

Advanced analytics, modern business analytics and the related data management supporting the digital business demand that data and analytics leaders recognize a new symbiosis across infrastructure, organization and approach.

Overview

Impacts

The emergence of data science and self-service or "citizen" users is driving data and analytics leaders to seek more diverse solutions for infrastructure design and implementation.

A diverse spectrum of analytics use cases often results in conflicts between differing infrastructure options, which forces earlier consideration of broad service-level agreements (SLAs).

A new "symbiotic" relationship between advanced users and enterprise solution architects is steering data and analytics leaders toward a fresh organizational structure.

Recommendations

For data and analytics leaders, including chief data officers (CDOs):

Facilitate user access to data in its near-native format (pretransformation) based on users' widely varied levels of data management and analytics skills.


Identify a very small group of data engineers/miners ("master" level) to classify end-user expectations, which will help guide infrastructure and deployment.

Formalize the organizational structure for collaboration between the emerging analytics center of excellence (ACE), information governance, the data science team and the enterprise architecture team to prioritize infrastructure delivery with processes, deliverables and tracking.

Develop formal processes for seeking guidance from your solution architects and system designers for balancing delivery priority, budgets and personnel during delivery. (When addressing smaller organizations, these same principles apply.)

Analysis

Three supporting trends all point in the direction of distributed data utilization and management:

Decentralized tools and even design to support citizen development and self-service (people)

Logical data warehouse (LDW) information infrastructures (technology) that require a new mix of technology experts with different types of users

A new attitude of "design by using" to associate technologists with activity in the analyst community (methodology)

With the emergence of "citizen" approaches to use-case-driven activities such as data preparation, data discovery and participation in programmatic and strategic roles in data management, data and analytics leaders have new options to grant greater access to information and data to more analyst users (see "How to Implement a Modern Business Intelligence and Analytics Platform" and "Technology Insight for Modern Business Intelligence and Analytics Platforms"). Simultaneously, Gartner's LDW encourages early use of data by the organization. The LDW practice is becoming pervasive and readily apparent (see Note 1).

"Design by using" is the idea that some users are more experienced in the areas of business process, system architecture, data architecture and advanced analytics, and that these users should be given more direct access to data in a state as close to the originating source structure, quality and format as possible. These users experiment with the data and actually develop their own standards through use. The lessons learned are sometimes excellent candidates for designing the more broadly used, repeatable solutions popular with less-skilled users.

The combination of enabling infrastructure, "design by using" practices and user experience with broader tools makes it clear that the only questions regarding data access and availability are "How to …?" and "How fast …?" or, more specifically, "Can I get to Mode 2?" (see Note 2).

Figure 1. Top Impacts and Recommendations for Data and Analytics Leaders


ACE = analytics center of excellence; BICC = business intelligence competency center; SLA = service-level agreement Source: Gartner (March 2017)

Impacts and Recommendations


The emergence of data science and self-service or "citizen" users is driving data and analytics leaders to seek more diverse solutions for infrastructure design and implementation

Over the past three years, new roles have emerged in relation to analytics — for example, data scientist, data engineer, data analyst, citizen analyst and citizen integrator. Casual "apprentices" prefer using repositories, but analyst "journeymen" like building additional functionality in a contender approach that they "add" to existing repository models, possibly using virtualization or views. These journeymen like both the compromise SLA as a basis and the contender SLA for somewhat "safe" discovery approaches. In general, while the roles have new names and, at times, more specifically defined tasks, they still fall into four main skill areas (a rough sizing sketch follows this list):

"Casual" users (apprentices) — Casual users/apprentices use canned queries and reports, or utilize prebuilt drill-downs and filters. They generally utilize established key performance indicators (KPIs) and the metrics that support those KPIs. They export data but generally use it as is, within a more list-based approach. Casual users are the least demanding of the new roles, often numbering about 1,000 for every data scientist, or for every 90 business analysts.

Business analysts (journeymen) — Analysts will utilize existing reports and canned queries as little more than starting positions in their work. They will most definitely utilize extracts based on existing queries. Their usage pattern includes frequently creating cross-references of data and even inference values to derive missing data points. Generally, they understand business processes very well, but will often attempt analysis that exceeds their understanding of the data, and even of the statistics they attempt to utilize. These users are the most challenging group for data and analytics leaders. For every five data engineers, or every one data scientist, there are approximately 90 business analysts.

Engineers or miners ("master" level) — Data engineers understand sourcing; that is, the systems and the business processes that populate those systems with data. They tend to keep in mind mental models of various data assets and business process precedent/dependency relationships whenever working through their analysis. Engineers frequently develop highly accurate inference and derived reference values. They know the limitations of the data they use and their own skills, and rarely overstep either. Engineers are actually rare (many business analysts believe they are engineers), usually about five for every 1,000 casual users.

Data scientists — Data scientists, and those in data-science-related roles, understand how abstract data is and quickly embrace creating their own abstractions to represent analytic outcomes. They embrace mathematics and create highly interpretive models that imply new context and new semantics for the data. There is usually only one data scientist for every 1,100 users in the other categories.
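These ratios can be turned into a simple sizing estimate. The sketch below is a minimal illustration, assuming the approximate 1,000 : 90 : 5 : 1 proportions stated in this research; the dictionary keys, the function name and the example total of 5,500 users are illustrative assumptions, not figures from the note itself.

```python
# Minimal sizing sketch using the approximate ratios described above.
# Keys, function name and example total are illustrative assumptions.

RATIO = {                                   # approximate users per one data scientist
    "casual (apprentice)": 1000,
    "business analyst (journeyman)": 90,
    "data engineer/miner (master)": 5,
    "data scientist": 1,
}

def estimate_population(total_users: int) -> dict:
    """Split a total analytics user count across the four categories by ratio."""
    ratio_sum = sum(RATIO.values())         # 1,096 "ratio units" in total
    return {role: round(total_users * share / ratio_sum) for role, share in RATIO.items()}

# Example: an organization with roughly 5,500 analytics users
for role, count in estimate_population(5500).items():
    print(f"{role}: ~{count}")
```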


It is important to note, as previously stated, that these categories of user are tied more directly to how the users utilize data and their knowledge of data (especially its origination in business processes) relative to their understanding — see "Making Big Data Normal Begins With Self-Classifying and Self-Disciplined Users." There may be specific roles involved (and frequently there are multiple roles in each category of user). What data and analytics leaders must account for is users' expectations from their information infrastructure — a clear mix of Mode 1 and Mode 2 in a bimodal approach. (For a table representation and comparison of these expectations, see Note 3.)

Users will also form "clusters" around principal or lead analysts. In some cases, the analyst lead will develop standards that all other users are expected to follow (as with a "compliance" lead). Other groups will feed independent and even sometimes competing analyses into a central concept (e.g., relationship management). A third organizational model is one in which many types of analyses are done around a central concept with relationships to near neighbors, but without specific centralized control or even any type of deliberate rationalization (as with assets).

Understanding how different user communities interact within their unit is important when it comes to "learning" their behavior and then aligning the architecture and infrastructure to their needs (see Figure 2).

Figure 2. Examples of User Community Development


Source: Gartner (March 2017)

Recommendations:

For data and analytics leaders:


Ensure your CDO direct report develops a category model of, or at least an understanding of, the types of users as "populations" in the user community, with the intent of creating different data access and infrastructure specifications for each community.

Form "cells" of users that have a shared horizontal function, delivery channel or operations area. Identify at least one analyst-level user to represent the cell in any form of requirements or use-case discussions.

Encourage solution architects to leverage a second level of community interaction between analyst "cell" representatives and at least one data miner (master level) to coordinate and rationalize data access and use-case requirements relative to infrastructure design and deployment.

A diverse spectrum of analytics use cases often results in conflicts between differing infrastructure options, which forces earlier consideration of broad SLAs

Possibly the most important change in deploying data management for analytics is realizing that SLAs and organizational structure are not tied to technology solutions. Effectively, data and analytics leaders should take the time to realign the organization. (See "Avoid a Big Data Warehouse Mistake by Evolving to the Logical Data Warehouse Now" for specific SLAs that align to data use cases, and see Note 4 below for an explanation of the difference between the two.)

IT infrastructure must meet the demands of the digital era, which has moved beyond conventional design techniques and into more of a social engagement process (see "Modernize Your Information Infrastructure to Support Bimodal IT"). This means there is no longer a single "waterfall" of steps; instead, multiple paths, even loops, combine to inform the final (production) design. Do not think of one fixed process, become anxious at its disruption and resist the change. Instead, treat this as validation that current practices have found their premier service level. New designs and tools are required to augment existing practices and increase data utilization alongside the established infrastructure components.

Gartner identifies eight major architectural requirements within a logical data warehouse (LDW) (see "Decision Point for Logical Data Warehouse Implementation Styles"). An organization is considered "committed" to the LDW approach when more than 50% of those architectural components have been deployed. As of December 2015, LDW adoption had reached approximately 11% of the then currently deployed data management solutions for analytics (new penetration and market data could be available as early as the beginning of March 2017). The LDW represents one of the highest maturity levels regarding platforms and infrastructure for analytics (see "ITScore for BI and Analytics"). The issue is that analysts are not necessarily more skilled or more experienced in parallel with their new demands — hence the demand for a parallel organizational change.


As noted in Figure 3, specific technology solutions are preferred within an LDW (see "Avoid a Big Data Warehouse Mistake by Evolving to the Logical Data Warehouse Now" for more information on repositories, data virtualization and distributed processes). It is reasonable that specific infrastructure deployment options align with differing SLAs. Instead of considering the technology basis for the infrastructure, it is best to consider the service-level requirements and their associated SLAs. In Figure 3, we introduce the idea that different user expectations create distinct service-level expectations.

Figure 3. Aligning User Skills With Infrastructure Functional Role

Source: Gartner (March 2017)

Your organization should consider aligning technology components with user expectations to create three SLAs (a rough mapping is sketched after this list):


Compromise — Using repositories to deliver high performance, predictable, prequalified and (possibly) summarized data to the least data-savvy analytics users will remain a significant demand.

Contender — These are contending models of the data, contending use cases, and often vary from one community of users to another or even within a single community. As such, they are not able to "compromise."

Candidate — Users with advanced knowledge of data sourcing, overall data architecture and business processes often seek data as near to native format and structure as possible. Data engineers and scientists actually develop candidate structures and semantics for even reading the data at hand.
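One rough way to record this alignment is a simple lookup from user category to the SLA that category most often works under and the infrastructure style typically associated with it, drawing on the three SLAs above, Note 5 and the Step Two discussion later in this note. This is only a sketch; the dictionary keys, field names and the sla_for helper are assumptions for illustration, not a Gartner artifact.

```python
# Illustrative alignment of user categories to SLAs and typical infrastructure.
# Keys, field names and the helper function are assumptions, not Gartner's model.

SLA_ALIGNMENT = {
    "casual (apprentice)": {
        "primary_sla": "compromise",
        "typical_infrastructure": "repository of prequalified, possibly summarized data",
    },
    "business analyst (journeyman)": {
        "primary_sla": "contender",
        "typical_infrastructure": "repository plus virtual semantic tier (views, cached data)",
    },
    "data engineer/miner (master)": {
        "primary_sla": "candidate",
        "typical_infrastructure": "virtual semantic tier and distributed processing over near-native data",
    },
    "data scientist": {
        "primary_sla": "candidate",
        "typical_infrastructure": "distributed processing over raw or near-native data",
    },
}

def sla_for(user_category: str) -> str:
    """Return the SLA a given user category most often works under."""
    return SLA_ALIGNMENT[user_category]["primary_sla"]

print(sla_for("business analyst (journeyman)"))   # contender
```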

For more details on these SLAs, see Note 5.

After considering infrastructure and use-case alignment as depicted above, the more important point is that architects and deployment managers can decide when to move a given analytic model and data access strategy from one part of the infrastructure to another. If a semantic access tier is used, the metadata within that part of the infrastructure can be analyzed to support such decisions. Metadata collected in semantic tiers includes the frequency of accessing data objects (including row/column returns in structured data and prevalence of use in unfamiliar "unstructured" assets) as well as end-use-case definitions; some tiers even capture processing routines and/or views/queries. Analyzing these types of metadata can serve as a useful indicator of when a pilot or prototype should remain deployed as is, or be moved to a different infrastructure component. Reorganizing communities of users to provide these inputs, and making this part of their role, is critical (see "Create a Centralized and Decentralized Organizational Model for Business Intelligence").

A useful phrase here is "go slow to go fast." Set up the tactics and environment for using Mode 2 and more agile techniques so they can succeed. Casual users need support from analysts to help identify useful data discoveries that can be added to existing reports and analytics. Analysts need the technical and problem-solving skills of miners/masters to promote infrastructure and architecture changes with IT management. Each community of users will have a very different interaction level with the architecture and infrastructure teams.
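The kind of decision described above can be sketched as a small filter over the access metadata a semantic tier records. This is a minimal sketch under assumed field names and thresholds; the monthly_accesses and distinct_users fields, the 500/20 cutoffs and the sample objects are all illustrative, not values from this research.

```python
# Minimal sketch: flag semantic-tier objects whose usage suggests promoting the
# use case from a "candidate" (Mode 2) deployment toward "contender"/"compromise".
# Field names, thresholds and sample data are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class AccessMetadata:
    object_name: str        # data object exposed through the semantic tier
    monthly_accesses: int   # how often the object is read
    distinct_users: int     # breadth of the audience
    captured_queries: int   # saved views/queries built over the object

def promotion_candidates(usage, min_accesses=500, min_users=20):
    """Return objects accessed broadly and frequently enough to justify a more managed SLA."""
    return [m.object_name for m in usage
            if m.monthly_accesses >= min_accesses and m.distinct_users >= min_users]

usage = [
    AccessMetadata("churn_scoring_view", monthly_accesses=1200, distinct_users=45, captured_queries=18),
    AccessMetadata("adhoc_supplier_join", monthly_accesses=40, distinct_users=3, captured_queries=1),
]
print(promotion_candidates(usage))   # ['churn_scoring_view']
```

Recommendations: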

Identify a very small group of data engineers/miners (i.e., masters) and learn how to classify end-user expectations to help guide infrastructure and deployment.

Enable users to access near-native or transformed data based on their widely varied levels of data management and integration skills.

Make sure solution architects review the user categorization and the influence of both the current information infrastructure and the planned strategic roadmap (e.g., all cloud, half cloud, some cloud) to determine when broadly used platforms are best and when individually preferred tools are.


A new "symbiotic" relationship between advanced users and enterprise solution architects is steering data and analytics leaders toward a fresh organizational structure Once the SLAs are well understood, there comes the messy business of figuring out how to deliver against them, which is the responsibility of the solution architecture team. Bimodal business intelligence (BI) and advanced analytics delivery requires the capability to move data, analysis and users between Modes 1 and 2. This is nothing new to analytics developers, who have been wrestling with changing source systems for four decades now; it has simply not been adequately formalized in the organization and has instead been delivered as different types of projects. Often, organizations fail to create a link between architecture, business use cases and expert user discovery and exploration of both data and analytics. While seeking agile approaches, they still use more of a "waterfall" approach, with requirements rolling downhill, then back up after deployment for testing and quality assurance. But it doesn't have to be that way. Fortunately, modification of what is probably an existing organization structure is all that is needed. Step One: Replace the BICC With an ACE The traditional BI competency center (BICC) is modified to become the analytics center of excellence (ACE) (see Note 6 and "The Business Intelligence and Analytics Leader's First 100 Days" ). This component relies on the maturity level of the overall organization. In the current era, the ACE is emerging less as central and more as distributed, with a focus less on specific competencies and more on multiple skill sets. The guidance here is how to evolve from current practices to the new distributed and multiskilled approach. Here, the ACE consists of mostly the three user classes of casual user, business analyst and data miner/engineer. This step is still heavily Mode 1. Step Two: Align User Service-Level Expectations to Infrastructure Capability Casual users like repositories because they are fast and consistent. Analysts may begin with repositories, but eventually want to have the flexibility of adding more data or remodeling existing data — hence they use repositories and virtual semantic tiers. Miners/engineers like near-native data, so prefer virtual semantic tiers and distributed processes, but want to eventually offload their discoveries for others to use — into repositories. Scientists prefer raw data, so almost always prefer distributed processing. Step Three: Evolve the Data Science Laboratory At the same time, your infrastructure team must begin consideration of "sandboxes" and distributed processing platforms. It is likely you will not have a laboratory, as yet. Note that, for this step, we do not indicate a "data scientist," rather "data science." Emerging data scientists might be miners/engineers who want to advance their skills,

Page 11: VersaCons€¦ · Web viewDevelop formal processes for seeking guidance from your solution architects and system designers for balancing delivery priority, budgets and personnel during

or new hires with expertise in advanced statistics and analysis, or any other combination of data and analysis skills. This is definitively Mode 2 analysis development, but it needs both modes of infrastructure. Step Four: Establish the "Pivot" Responsibility for Data Engineers/Miners The data engineer or miner has a key role. These individuals are intimately familiar with sourcing, operational/transactional systems, analytic platforms, data integration, data quality and more — hence why they are so rare. As a result, they are closely attuned to what is "possible" with both the data and within business processes. They have an important multirole function of vetting suggestions from the data science team and translating them into business terms for the office of the CDO, translating business needs into data science terms, and translating data science solutions into production requirements for the enterprise architecture team (see Figure 4). Effectively, the data engineer assists with changing modes. This pivot is depicted in Figure 4. It shows that the infrastructure (LDW) and the user evolution (the ACE) must now be resolved through a new team of engineers, scientists, the office of the CDO and solution architects to determine opportunities and priorities to change from Mode 2 ("candidate") to Mode 1 ("compromise") using the "contender" SLA region to prove out the changes in infrastructure. Figure 4. Data Engineers Become a Pivotal Role Guiding Data and Analytics Leaders


CDO = chief data officer


Source: Gartner (March 2017)

Step Five: Deploy Parallel Delivery Teams and Approaches

The Mode 1 team builds the repositories based on the primary requirements of casual users and business analysts. The Mode 2 team focuses on an enabling infrastructure for the data engineer/miner and data scientist. Each role prefers a different balance between the data integration layer and the consumption layer (summarized in the sketch after this list):

Casual users will prefer a repository with "thick" data integration to prepare the data and a thin, fast BI layer (often a dashboard or KPIs with drill-down). Figure 5 shows this as thinner and thicker ends of the various shapes.

Analysts prefer less manipulation in the data integration layer and more use of the semantic tools of their BI platform.

Miners/engineers prefer the absolute thinnest integration, and will sometimes embed analytics into other applications using that data. They may even use multiple analytics tools.

Data scientists will prefer near-native or even native-captured data formats and often need only bare-metal servers to build their own platforms. Less infrastructure-oriented data scientists will want established clusters and storage.
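One compact way to summarize these preferences is a small profile table keyed by role. The field names and Mode labels below are assumptions used only to restate the prose above, not a Gartner deliverable.

```python
# Illustrative delivery profiles per role, restating the preferences above.
# Field names and values are assumptions summarizing the prose.

DELIVERY_PROFILE = {
    "casual (apprentice)": {
        "data_integration": "thick: data fully prepared in advance",
        "consumption": "thin, fast BI layer (dashboards, KPIs with drill-down)",
        "mode": 1,
    },
    "business analyst (journeyman)": {
        "data_integration": "lighter manipulation",
        "consumption": "semantic tools of the BI platform",
        "mode": 1,
    },
    "data engineer/miner (master)": {
        "data_integration": "thinnest possible",
        "consumption": "multiple analytics tools; analytics embedded in other applications",
        "mode": 2,
    },
    "data scientist": {
        "data_integration": "native or near-native capture",
        "consumption": "self-built platforms on bare metal, or established clusters and storage",
        "mode": 2,
    },
}

for role, profile in DELIVERY_PROFILE.items():
    print(f"{role}: Mode {profile['mode']}, integration = {profile['data_integration']}")
```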

Analytics leaders will then get feedback from the ACE and data science team to begin the project with a solution architect following all of these delivery principles. They will estimate and obtain staffing from a project management office, working in the development environment that often mimics the discovery sandbox.

When the SLA and use cases change, the delivery team must determine the priority for changing the architecture and the resulting design principles. Eventually, solutions are vetted for data governance, quality and performance optimization concerns.

Figure 5. The Team Drives the Scope and Implementation of the Infrastructure


CDO = chief data officer Source: Gartner (March 2017)


Finally, remember that once a solution is deployed, it can migrate to another SLA and another part of the infrastructure. SLAs are not permanent, and any use case can migrate up or down (or, more appropriately, across) the SLAs as a continuum; some can even be delivered in multiple infrastructure approaches.

Recommendations:

Establish an organizational structure with a process by which named users indicate when SLA requirements are no longer served by the deployed infrastructure. The CDO should leverage this team to prioritize infrastructure delivery.

Make sure that solution architects and designers always focus on the roles and process required and, when addressing smaller teams or organizations, that the principles for delivery are still pursued.

Evidence

Reference survey data collected for the February 2016 "Magic Quadrant for Data Management Solutions for Analytics." Reference survey respondents were obtained from vendor-provided contacts and asked a series of questions regarding their overall data management solutions for analytics delivery. Respondents were then screened with a qualifying battery of questions to determine the extent of their deployments, the types of users and the methods for delivering data, including how they combined traditional delivery with distributed processing approaches (e.g., Hadoop, NoSQL and so on); n = 298 respondents. Magic Quadrant reference survey data for data warehouse and data management solutions for analytics shows that 114 of the 298 respondents providing a response are pursuing a logical data warehouse that requires at least two of the three required SLAs and associated technology.

Additional reference survey years were also used from the surveys conducted to support our Magic Quadrant for Data Warehouse Database Management Systems and for Data Warehouse and Data Management Solutions for Analytics published in 2013 (n = 269), 2014 (n = 265) and 2015 (n = 265).

Gartner client inquiries regarding utilization of logical data warehouse concepts (n = 726 of 3,017 on generic "data warehouse") from October 2014 to October 2016.

Gartner event attendee 1-to-1 sessions with multiple analysts at Gartner's BIIM and BIAA Summits throughout 2014, 2015 and 2016, as well as Gartner Symposium events throughout that same period.

Reference survey customers are vetted through questions regarding the complexity and frequency of queries, the current life span of their data warehouse, intent to expand the warehouse, the presence of current (most modern) data platform software used for the warehouse, and the frequency of different types of queries to determine their maturity level. This analysis demonstrates that suppliers/vendors provide primarily their most advanced users for the survey (fully 85% of survey respondents would be categorized as significantly mature). As a result, these organizations often overreport the presence of data science and data miners (engineers) and skew toward lower numbers of casual and business analyst users. Reference survey respondents also are diligent in reporting casual users as "portal" or dashboard/metrics users.

Inquiry clients are vetted for maturity based on the questions asked, types of issues discussed, overall representation they provide regarding their user community, and other factors. Inquiry clients are generally less mature (although some provide significant innovation inputs, these are among Gartner's most advanced clients). As a result, these organizations often report the absence of data science and very low numbers for data miners (engineers) and skew toward casual and business analyst users. Inquiry customers tend to forget to report casual users that access data via portals or dashboard/metrics tools.

A rationalization process that balances the skewed reports from inquiry and reference survey data resolves to the ratio you see in Figures 3 and 4 of this research note: 1,000 casual users to 90 business analysts to 5 data miners/engineers to 1 data scientist. This same process reveals that, in smaller organizations, data science users are absent and data miner/engineer staff count is also reduced or absent.

o Gartner end-user reference surveys are collected from different sources. Those provided by supplier/vendor organizations are shown to represent specifically selected reference customers, almost universally of two types: recently implemented customers who have had an excellent experience with their first implementation for well-defined and immediate requirements; or long-standing customers who regularly update/upgrade their solutions, taking advantage of new product/offering features and functionality.

o Gartner inquiry data is a much broader mix of end-user experience that more accurately reflects the distribution of experiences in the market. This is partially determined by either the organization or Gartner analysts applying the Gartner maturity models for data integration, data management solutions for analytics and data quality. The inquiry customer distribution tends to have a far larger percentage of less mature delivery customers overall.

o Gartner case studies represent a much smaller body, but tend to show midlevel or higher levels of maturity for differing areas of delivery in managing data for analytics.

o The reference survey data is considered to be an over-representation of "satisfied" customers, while inquiry customers are considered to be more representative of the broader market experience. As a result, we have created a model that uses the reference data to represent approximately 20% of overall influence on the market experience of organizations and the inquiry data to represent most of the balance. Due to a gap in the midlevel maturity market, any case study data is used to determine the general market experience as customers transition from lower levels of experience to more advanced.
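The weighting described here can be read as a simple blend. The sketch below assumes a 20/80 split and uses made-up sample values purely to show the arithmetic; the function name and figures are illustrative, not survey results.

```python
# Minimal sketch of the weighting model: reference survey data ~20% influence,
# inquiry data the balance. The sample figures are invented for illustration only.

def blended_estimate(reference_value, inquiry_value, reference_weight=0.20):
    """Blend a metric observed in reference surveys with the same metric from inquiries."""
    return reference_weight * reference_value + (1 - reference_weight) * inquiry_value

# e.g., hypothetical shares of organizations reporting a data science practice
print(round(blended_estimate(reference_value=0.60, inquiry_value=0.25), 2))   # 0.32
```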

Note 1 Logical Data Warehouses


LDWs utilize a combination of centralized data collections in more traditional repositories combined with a robust semantic tier to access data in more distributed platforms. At times, the LDW utilizes virtual data tiers, views and other semantic concepts that merely reference external data (as in a data lake). Other times, the LDW will actually manage distributed processes, initiating them, collecting their output and then rendering the results through various technology approaches to end users in their tools or additional data platforms. For more information, see "The Data Warehouse and DMSA Market: Current and Future States, 2016."

Note 2 Bimodal IT

Bimodal approaches to IT architecture and implementation hold that predictable processes and systems are a good shorthand description of Mode 1. Mode 2 is best described as having irregular scope and timing, with previously unanticipated changes in delivery or requirements.

Note 3 Table View of User Classifications

Table 1. Data Infrastructure Expectations by User Category

Science (data scientist)
Source/lineage metadata requirements: Full lineage, all available taxonomy/ontology, originating business process.
Data structure/format: Access APIs, direct access, named user rights. Common access point for retrieving multiple data assets. Full descriptors. Data profiles for frequency, distribution, sparsity/density, etc. of data values.
Data access expectation based on the frequency and complexity of processing: Infrequent when highly complex but with many starts/aborts. Frequent when low complexity, but primarily to simply do fast retrieves to support "next steps." Ability to access all data in native or near-native form. Prefers shared and supported infrastructure without significant governance, but specifically wants unqualified personnel "locked out."
Performance and efficiency demand: Complex processes need not be optimized. Fast retrievals for list-style data to support their initial work to develop theories. Expectation that the support team will address performance and access issues.

Engineer/miner (master level)
Source/lineage metadata requirements: Full lineage, all available taxonomy/ontology, originating business process. Any alternative systems that may provide conflicting data. Information regarding quality alterations to the data assets.
Data structure/format: Direct access when possible. Near-native or native formats preferred. Unconcerned if single or multiple access points are required. Profiling information. Often extracts data to a "personal" sandbox. Can use and often desires NLP capability.
Data access expectation based on the frequency and complexity of processing: Infrequent when highly complex but with many starts/aborts. Frequent when low complexity, but primarily to simply do fast retrieves to support "next steps." Access all data in native or near-native form.
Performance and efficiency demand: Optimization tools should always be available. User rights to manage performance tuning.

Business analyst (journeyman)
Source/lineage metadata requirements: Endpoint metadata with business glossary attached. Lineage expressed as verbiage or text. All metadata is searchable.
Data structure/format: Prebuilt structure or semantic layer that provides apparent structure for easy access. If NLP is used, it is primarily applied to filling a "structure" with data.
Data access expectation based on the frequency and complexity of processing: Frequent, often confounded by the dissonance between their analysis design (report, query or even extract) and the processing requirements demanded by their own design. Access unchanged data values in some type of common form, possibly even with some view/logic layers prebuilt.
Performance and efficiency demand: Optimization expected even when their design is flawed. Actually prefer to retrieve data without integrity for further exploration, even when fallacious at the outset.

Casual (apprentice)
Source/lineage metadata requirements: Endpoint metadata only — descriptions of prebuilt objects with abbreviated representations of where the data comes from and its overall purpose.
Data structure/format: Regardless of what they assert, they always seek to deliver results with a structure or format.
Data access expectation based on the frequency and complexity of processing: Frequent, not complex. Almost always prebuilt. Access to predefined, cleaned and prepared data.
Performance and efficiency demand: High-performance optimization — best possible at all times.

NLP = natural-language processing
Source: Gartner (March 2017)

Note 4 What Is the Difference Between Service Levels and Use Cases?

Use cases are circumstances, environments or context that are amplified, reported on or enhanced through the application of a new model or process. Service levels are expectations of performance, the inclusion of content or the management of access privileges to meet specific minimum thresholds for one or all of the defined components.

Note 5 Understanding Different Data Management Service-Level Requirements for Analytic Use Cases


Compromise: Using repositories to deliver high performance, predictable, prequalified and possibly summarized data to the least data-savvy analytics users will remain a significant demand. These repositories are compromise data models — they compromise on sourcing, compromise on data governance, quality and even more. The repositories do not necessarily have to be a singular data store, but access across many stores should be facilitated with some type of standardized access tier, language or shared infrastructure. Remember that casual/apprentice users can include your consumer and customer base doing simple reports on your portal. All analysts (even data scientists) will use this part of the solution and should have access and provide input to its design. This is easily deployed using a variety of Mode 1 approaches.

Contender: Virtualized data access tiers that present views or cached data are popular with business analysts who understand business process but are not quite comfortable with data architecture and infrastructure issues. These are contending models of the data, contending use cases, and often vary from one community of users to another or even within a single community. As such, they are not able to "compromise." Most casual users will not venture into this SLA. Virtualization is not the only solution available (see "Market Guide for Data Virtualization").

Candidate: Users with advanced knowledge of data sourcing, overall data architecture and business processes often seek data as near to native format and structure as possible. Data engineers and scientists actually develop candidate structures and semantics for even reading the data at hand — even before they build their analysis models and sometimes changing the structure of the data as a result of those analytic models. Data engineers and data scientists utilize all three SLAs equally, but here they often prefer to use their own platforms and languages, and your infrastructure and organizational structure must support that. As a result, distributed processing solutions (e.g., Search, Hadoop) are popular because these solutions allow the users to restructure data with regard to form, but also integrity (or not) and even temporal/event relationships. Candidate SLAs are clearly Mode 2 approaches.

Note 6 How Does the Traditional BICC Evolve Toward the ACE?

In business intelligence and advanced analytics organizations that have reached Level 3 maturity or higher (see "ITScore for BI and Analytics"), there is some form of a business intelligence competency center (BICC). The BICC usually consists of differing levels of users, some representation of a business analyst user manager, supporting database administrators and tool experts from IT, and a representative from the architecture team. It is not necessarily even a permanent body but more of a concept — often increasing activity when new iterations or phases of the BI solution are being delivered and then decreasing activity in between major projects. The BICC effectively works through how to build and deploy a data warehouse and how to source and integrate data, and creates standards for reporting and guided analysis deployment. Unfortunately, this has often led to "Mode 1 only" deployment (stable, repeatable, predictable in level and frequency of updates/change). While not strictly a "waterfall" deployment environment (in which all requirements are developed and confirmed prior to any development and then scope is strictly enforced, neither shrinking nor growing), it does create many of the same issues that emerge within waterfall.

Since data warehouses and BI are anticipated to always expand and grow, they more appropriately follow an "iterative" approach — effectively a hybrid between waterfall and "agile" delivery methods. This creates great consternation in both agile developers (who accuse iterative of being waterfall) and waterfall developers (who accuse iterative of being agile). The truth lies in between: a modern analytics environment recognizes that some development techniques must be more flexible and reactive (Mode 2) while others must exhibit predictability and stability (Mode 1), which introduces a demand for a modified organizational structure to support this bimodal delivery. The Analytics Center of Excellence (ACE) thus emerges with an understanding that Mode 1 still has value when the service-level expectations are met with such solutions. But the ACE now has the responsibility of determining Mode 1 and Mode 2 policies and solution choices; meanwhile, the BICC, which was more traditionally focused, gradually disperses.

© 2017 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Gartner provides information technology research and advisory services to a wide range of technology consumers, manufacturers and sellers, and may have client relationships with, and derive revenues from, companies discussed herein. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."