Data Modeling in Looker

Quick iteration of metric calculations for powerful data exploration


Data Modeling in Looker White Paper

At Looker, we want to make it easier for data analysts to serve the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough is spent building, experimenting with, and enriching a robust model of the business. Worse yet, business users are starving for data but are forced to make important decisions without access to data that could guide them in the right direction.

Looker addresses both of these problems with a YAML-based modeling language called LookML. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model. That means you can unleash them on your business users to manipulate, iterate, and transform in any way they see fit.

The Reusability Paradigm of LookML

E-Commerce Example — Starting with Total Cost of Order

A key difference of LookML is that, unlike older approaches, it combines modeling, transformations, and derivations at the same layer (late-binding modeling). This allows vast amounts of data to be captured in relatively inexpensive databases (mirrored or copied), with derivations and transformations occurring much closer to, or at, query time. The traditional approach is to transform the data as it’s loaded (ETL), whereas LookML allows for transformation and derivation on demand (ELT). The result is a very agile data environment where user questions can change and the data environment can better keep up.

Let’s take a look at a simple e-commerce example. We will create one dimension, the Total Cost of Order, which can then be reused and built on throughout a single LookML model.

First, a quick primer on a typical e-commerce data model, which will help answer questions about the buying and selling of items online. In this example, we’ll work with a subset of tables: Orders, Order Items, and Inventory Items. As a business that tracks Orders, it’s probably important to determine the distribution of customer orders based on cost. In our current Orders table, we don’t have a field that tells us the cost of an order, because each order contains multiple items of varying costs. So we need to calculate the cost of an order by summing over the sale prices of the items in the order.

Orders

id   created_at   user_id
1    2014-04-01   5656
2    2014-04-01   7263

Order Items

id   created_at   order_id   inventory_item_id   sale_price
1    2014-04-01   5656       3                   $12
2    2014-04-03   7263       5                   $45

Inventory Items

id   created_at   cost     sold_at      product_id
1    2014-04-01   $8.50    2014-04-05   5
2    2014-04-02   $24.00   2014-04-04   7

Measures and Dimensions in Looker

Looker divides data exploration into dimensions and measures. A dimension is something you can group by, and a measure is an aggregated dimension (for example, a sum, an average, or a count).
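The distinction can be sketched in plain SQL, run here through Python's built-in sqlite3 module. This is only an illustration: the table layout and sample rows are hypothetical, and the SQL that Looker generates for a real model may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sale_price REAL);
    -- Hypothetical sample rows, not taken from the white paper.
    INSERT INTO order_items VALUES (1, 1, 12.0), (2, 1, 45.0), (3, 2, 20.0);
""")

# order_id plays the role of a dimension (something we group by);
# COUNT(*) and SUM(sale_price) play the role of measures (aggregates).
rows = conn.execute("""
    SELECT order_id,
           COUNT(*)        AS item_count,
           SUM(sale_price) AS total_sale_price
    FROM order_items
    GROUP BY order_id
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

Each output row is one value of the dimension together with the measures aggregated over that group.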


Correlated Subqueries

In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query. The subquery is evaluated once for each row processed by the outer query.

Suppose we want to calculate a new dimension for Orders that will determine the Total Cost of Order, meaning the amount the customer pays (the LookML below names it total_amount_of_order_usd, and later sections call it the Total Amount of Order). In this case, the field is not stored in our database, but can be calculated from the sale price of order items in the order. We’ll use a simple technique called a correlated subquery. (For databases that don’t support correlated subqueries, or when performance becomes a problem, Looker supports more complex and powerful mechanisms via derived tables.)

For any given order, the SQL to calculate the Total Cost of Order is:

SELECT SUM(order_items.sale_price) FROM order_items WHERE order_items.order_id = orders.id

We sum the sale price of each item in a given order, where the order_items.order_id field matches the primary key of the orders table. In Looker, we’d want to create this dimension in the Orders view, since it’s an attribute of an order.

- view: orders
  fields:
  - dimension: total_amount_of_order_usd
    type: number
    decimals: 2
    sql: |
      (SELECT SUM(order_items.sale_price)
       FROM order_items
       WHERE order_items.order_id = orders.id)
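To see the correlated subquery at work outside of Looker, here is a minimal sketch using Python's sqlite3 module. The rows are hypothetical sample data (the ids are chosen so that order items actually match orders), and the column alias simply borrows the dimension name for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, user_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sale_price REAL);
    -- Hypothetical sample rows.
    INSERT INTO orders VALUES (1, '2014-04-01', 5656), (2, '2014-04-01', 7263);
    INSERT INTO order_items VALUES (1, 1, 12.0), (2, 1, 45.0), (3, 2, 20.0);
""")

# The subquery is evaluated once per row of the outer query on orders,
# using that row's orders.id.
totals = conn.execute("""
    SELECT orders.id,
           (SELECT SUM(order_items.sale_price)
            FROM order_items
            WHERE order_items.order_id = orders.id) AS total_amount_of_order_usd
    FROM orders
    ORDER BY orders.id
""").fetchall()

for order_id, total in totals:
    print(order_id, total)
```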



Tiering Total Cost of Order

We now have a wide range of order amounts, so it probably makes sense to bucket these values across set intervals. Normally, if we were writing SQL, we’d have to make a CASE WHEN statement for each discrete bucket. Conveniently, LookML has a tier function, so we can use that.

Now let’s see this dimension in action.

- dimension: total_amount_of_order_usd_tier
  type: tier
  sql: ${total_amount_of_order_usd}
  tiers: [0,10,50,150,500,1000]

Notice that we can reference our existing Total Amount of Order dimension in the ‘sql:’ parameter of the new dimension. Now when we use the tier, orders are bucketed into their respective tiers.
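For comparison, the hand-written CASE WHEN approach that a tier replaces can be sketched as follows. The helper tier_case_sql, the bucket labels, and the order_totals table are all hypothetical. Looker's own generated SQL and tier labels may differ.

```python
import sqlite3

def tier_case_sql(expr, tiers):
    """Build a CASE expression that buckets `expr` by ascending tier boundaries."""
    branches = [
        f"WHEN {expr} IS NULL THEN 'Undefined'",
        f"WHEN {expr} < {tiers[0]} THEN 'Below {tiers[0]}'",
    ]
    # Branches are checked in order, so each one only needs an upper bound.
    for low, high in zip(tiers, tiers[1:]):
        branches.append(f"WHEN {expr} < {high} THEN '{low} to {high}'")
    return "CASE " + " ".join(branches) + f" ELSE '{tiers[-1]} or above' END"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_totals (order_id INTEGER PRIMARY KEY, total_amount_of_order_usd REAL);
    -- Hypothetical sample rows.
    INSERT INTO order_totals VALUES (1, 57.0), (2, 20.0), (3, 7.5), (4, 1200.0), (5, 25.0);
""")

case_sql = tier_case_sql("total_amount_of_order_usd", [0, 10, 50, 150, 500, 1000])
tier_counts = conn.execute(f"""
    SELECT {case_sql} AS tier, COUNT(*) AS order_count
    FROM order_totals
    GROUP BY tier
    ORDER BY MIN(total_amount_of_order_usd)
""").fetchall()

for tier, order_count in tier_counts:
    print(tier, order_count)
```

Adding a boundary to the list changes every branch of the generated CASE expression at once, which is the maintenance burden the tiers parameter removes.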


Determining Order Profit

What if we wanted to know more about each order, maybe the profit? To determine the profit of an order, we will need a Total Cost of Order dimension.

- dimension: total_cost_of_order
  type: number
  decimals: 2
  sql: |
    (SELECT SUM(inventory_items.cost)
     FROM order_items
     LEFT JOIN inventory_items
       ON order_items.inventory_item_id = inventory_items.id
     WHERE order_items.order_id = orders.id)

In this case, our SQL sums over the cost of inventory items for a specific order.

Now, to determine the Order Profit dimension, we must subtract the Total Cost of Order dimension from the Total Amount of Order dimension. Normally, we’d have to subtract the SQL for the Total Cost of Order from the SQL for Total Amount of Order. But with LookML, we can just reference our already existing dimensions.

- dimension: order_profit
  type: number
  decimals: 2
  sql: ${total_amount_of_order_usd} - ${total_cost_of_order}

When using this Order Profit, Looker will substitute the existing business logic for both the Total Amount of Order and Total Cost of Order. Let’s run a new query using the new Order Profit dimension.
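The substitution idea can be illustrated with a small sketch. The expand function below is a hypothetical, simplified stand-in for reference expansion, not Looker's actual implementation, and the schema and rows are invented sample data.

```python
import re
import sqlite3

# Field definitions keyed by name; ${name} refers to another field.
FIELDS = {
    "total_amount_of_order_usd": (
        "(SELECT SUM(order_items.sale_price) FROM order_items "
        "WHERE order_items.order_id = orders.id)"
    ),
    "total_cost_of_order": (
        "(SELECT SUM(inventory_items.cost) FROM order_items "
        "LEFT JOIN inventory_items "
        "ON order_items.inventory_item_id = inventory_items.id "
        "WHERE order_items.order_id = orders.id)"
    ),
    "order_profit": "${total_amount_of_order_usd} - ${total_cost_of_order}",
}

def expand(name, fields):
    """Recursively replace ${field} references with the referenced field's SQL."""
    return re.sub(r"\$\{(\w+)\}", lambda m: expand(m.group(1), fields), fields[name])

order_profit_sql = expand("order_profit", FIELDS)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE inventory_items (id INTEGER PRIMARY KEY, cost REAL);
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY, order_id INTEGER,
        inventory_item_id INTEGER, sale_price REAL);
    -- Hypothetical sample rows.
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO inventory_items VALUES (1, 8.5), (2, 24.0), (3, 11.0);
    INSERT INTO order_items VALUES (1, 1, 1, 12.0), (2, 1, 2, 45.0), (3, 2, 3, 20.0);
""")

profits = conn.execute(
    f"SELECT orders.id, {order_profit_sql} AS order_profit FROM orders ORDER BY orders.id"
).fetchall()
print(profits)
```

The profit definition stays one line long. If the definition of either underlying field changes, the expanded SQL picks up the change automatically.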


Calculating Profit Per User

Another valuable metric for an e-commerce business may be Profit Per User. In Looker, we can reference dimensions or measures from other views. In this case, to determine the Profit Per User, we’ll reference our Count measure from the Users view as the denominator of a measure in the Orders view, where the numerator is built from our Order Profit dimension. Because the Count measure lives in the Users view, we scope the reference with the ‘users.’ prefix.

- measure: profit_per_user
  type: number
  decimals: 2
  sql: 1.0 * SUM(${order_profit}) / NULLIF(${users.count}, 0)

Now we can see how our Profit Per User varies by every order dimension, for example by order date.
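A minimal sketch of the per-user division in plain SQL follows. It assumes that the Users count is a count of distinct users and that profit is summed over the orders in each group. The order_profit column is precomputed here only to keep the example short, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, created_at TEXT,
        user_id INTEGER, order_profit REAL);
    -- Hypothetical sample rows.
    INSERT INTO users VALUES (1), (2);
    INSERT INTO orders VALUES
        (1, '2014-04-01', 1, 24.5),
        (2, '2014-04-01', 2, 9.0),
        (3, '2014-04-02', 1, 10.0),
        (4, '2014-04-03', 99, 5.0);  -- user 99 has no row in users
""")

# NULLIF(x, 0) turns a zero user count into NULL, so the division yields
# NULL for that group instead of dividing by zero.
profit_per_user_by_date = conn.execute("""
    SELECT orders.created_at,
           SUM(orders.order_profit) / NULLIF(COUNT(DISTINCT users.id), 0) AS profit_per_user
    FROM orders
    LEFT JOIN users ON orders.user_id = users.id
    GROUP BY orders.created_at
    ORDER BY orders.created_at
""").fetchall()

for created_at, profit_per_user in profit_per_user_by_date:
    print(created_at, profit_per_user)
```

The last date has an order but no matching user, so its result is NULL (None in Python) rather than an error or a misleading number.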


Creating an Average Total Amount of Order Measure

What if we wanted a measure that computes the Average Total Amount of Order whenever we group by a dimension in Looker? For instance, we might look at the Average Total Amount of Order by Month, by the State of the customers who placed the orders, or by a customer’s Lifetime Number of Orders. When we create a measure in Looker, we can reuse it in many different contexts.

Let’s first build our Average Total Amount of Order measure.

- measure: average_total_amount_of_order_usd
  type: average
  sql: ${total_amount_of_order_usd}
  decimals: 2

Again, we can reference our already existing Total Amount of Order dimension and set the measure type to average. Now when we use this measure, it will aggregate over all total order amounts within each group, calculating the average.

Here we see how the Average Total Amount of Order varies by the Lifetime Number of Orders of customers and by the Week the order was created.
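The reuse of one measure across different groupings can be sketched like this. The order_totals table, its columns, and the average_by helper are hypothetical; the point is only that the aggregate expression is written once and the grouping column is swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_totals (
        order_id INTEGER PRIMARY KEY, created_month TEXT,
        state TEXT, total_amount_of_order_usd REAL);
    -- Hypothetical sample rows.
    INSERT INTO order_totals VALUES
        (1, '2014-04', 'CA', 57.0),
        (2, '2014-04', 'NY', 20.0),
        (3, '2014-05', 'CA', 33.0);
""")

# The measure is defined once...
AVERAGE_MEASURE = "ROUND(AVG(total_amount_of_order_usd), 2)"
ALLOWED_DIMENSIONS = {"created_month", "state"}

def average_by(dimension):
    """Return (dimension value, average order amount) pairs for one grouping."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # ...and reused with whichever dimension the user picks.
    return conn.execute(
        f"SELECT {dimension}, {AVERAGE_MEASURE} FROM order_totals "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()

print(average_by("created_month"))
print(average_by("state"))
```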


Creating Conditional Measures: First Purchase and Return

We can also create measures that calculate Total Amount of Order based on conditions of the order, such as whether it was a customer’s first purchase or if a return customer made the purchase. This way, we can determine how much revenue was generated from new or returning customers. It’s likely we have discrete teams focused on new user acquisition and on current user retention, so it may be important we break these revenues apart.

- measure: total_first_purchase_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
    is_first_purchase: yes

- measure: total_returning_shopper_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
    is_first_purchase: no

Again, both of these measures—Total First Purchase Revenue and Total Returning Shopper Revenue—take advantage of our existing Total Amount of Order. We can now directly compare both types of revenue.
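One common way to express such filtered sums in plain SQL is a SUM over a CASE expression, sketched below. Whether Looker generates exactly this form is an assumption. The table, the 0/1 encoding of is_first_purchase, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_totals (
        order_id INTEGER PRIMARY KEY,
        is_first_purchase INTEGER,          -- 1 = first purchase, 0 = returning
        total_amount_of_order_usd REAL);
    -- Hypothetical sample rows.
    INSERT INTO order_totals VALUES (1, 1, 57.0), (2, 0, 20.0), (3, 0, 33.0), (4, 1, 7.5);
""")

# Each CASE returns the amount only for rows that pass the filter and NULL
# otherwise; SUM ignores NULLs, so both totals come from a single pass.
first_revenue, returning_revenue = conn.execute("""
    SELECT SUM(CASE WHEN is_first_purchase = 1 THEN total_amount_of_order_usd END),
           SUM(CASE WHEN is_first_purchase = 0 THEN total_amount_of_order_usd END)
    FROM order_totals
""").fetchone()

print(first_revenue, returning_revenue)
```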


Putting It All Together

Given the dimensions and measures we’ve just created, let’s build a report that shows us Total Returning Shopper Revenue, Total First Purchase Revenue, Average Total Amount of Order, and Average Order Profit, broken out by the tiered Total Amount of Order and the Week in which the order was created. To generate such a result set, we’d have to write nearly 200 lines of SQL.

Maybe this makes sense to write one time, but what if we want to look at this by a customer’s State instead of by order Week?

Or maybe we want to see Lifetime Number of Purchases by a customer, tiered?

As you can see, all these reports can be generated, altered, and updated without the need to rewrite any SQL. In LookML, we abstract the essential business logic once, then reference it within other dimensions and measures, allowing rapid iteration of data exploration while also ensuring the accuracy of the SQL that’s generated. If a business user wants a new tier, just add it to the dimension. If they want to determine revenue from users with more than 10 purchases, just create a new measure that sums total order amount and filters on customers with more than 10 purchases. Small updates are quick and can be made immediately available to end users. That frees you up to define the new metrics that will take your business to the next level.

Ready to Love Your Analytics?

Come see a live demo and schedule your free trial. Call 888-960-2331 or go to:

looker.com/demo

About Looker

Looker is an inventive software company that’s pioneering the next generation of business intelligence (BI). We believe in bringing better insights and data-driven decision-making to businesses of all sizes. The company has fast become the catalyst that is creating data-driven cultures at hundreds of industry-leading companies such as Yahoo!, Gilt, Warby Parker and Sony.

Looker is based in Santa Cruz, CA. Follow @lookerdata or visit www.looker.com

© 2015 Looker. All rights reserved. Looker and the Looker logo are trademarks of Looker Data Sciences, registered in the United States. Other trademarks are trademarks of their respective companies. All services are subject to change or discontinuance without notice.