Inspire 2015 - Alteryx: Data Blending: Best Practices

Posted on 30-Jul-2015

#inspire15

Data Blending: Best Practices

Tuesday, May 19, 2015

Ben Gomez, Senior Product Manager, Alteryx
Dr. Poornima Farrar, Product Manager, Alteryx

Agenda

• Develop Workflows Effectively
  • Evaluate the Data
  • Sample the Data
• Develop Clear Workflows
  • Rename Fields
  • Simplify the Process
• Develop Efficient Workflows
  • Sort Data Sparingly
  • Organize Data Sources
  • Process Near the Data

Effective Workflow Development

Effective Workflows

Evaluate the Data

• Data problems can slow down your workflow development or give you invalid results:
  • Duplicate records
  • Missing values
  • Unexpected characters
  • Invalid values or ranges
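The Field Summary demo profiles fields inside Alteryx. Outside Alteryx, the same four checks can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the records, field names, the 0 to 120 age range and the `profile` helper are all hypothetical, and "unexpected characters" is narrowed to ASCII control characters.

```python
import re
from collections import Counter

# Hypothetical records with one example of each problem type.
records = [
    {"id": 1, "email": "ann@example.com", "age": 34},
    {"id": 2, "email": "bob@example.com", "age": None},
    {"id": 2, "email": "bob@example.com", "age": None},  # duplicate record
    {"id": 3, "email": "carl\x00@example.com", "age": 212},  # NUL byte, bad age
]

CONTROL_CHARS = re.compile(r"[\x00-\x1f]")

def profile(rows, valid_age=(0, 120)):
    """Count the four kinds of data problems listed on the slide."""
    # Duplicates: records identical in every field.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())
    # Missing values: None, counted per field.
    missing = Counter(k for r in rows for k, v in r.items() if v is None)
    # Unexpected characters: ASCII control characters in text fields.
    bad_chars = [r["id"] for r in rows
                 if any(isinstance(v, str) and CONTROL_CHARS.search(v)
                        for v in r.values())]
    # Invalid ranges: ages outside the assumed valid range.
    lo, hi = valid_age
    out_of_range = [r["id"] for r in rows
                    if r["age"] is not None and not lo <= r["age"] <= hi]
    return {"duplicates": duplicates, "missing": dict(missing),
            "unexpected_chars": bad_chars, "out_of_range": out_of_range}

report = profile(records)
print(report)
```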

Demo – Field Summary

Effective Workflows

Sample the Data

Sample limits the data stream to a number, percentage or random set of records.

Random % Sample returns a random number or percentage of the records passing through the data stream.

Oversample Field samples incoming data to ensure equal representation of data values.
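The three sampling ideas can also be written as plain code. This is a minimal Python sketch: the helper names (`first_n`, `random_percent`, `balance_by_field`) and the `churned` field are hypothetical, and `balance_by_field` simply downsamples every value to the size of the rarest one. That is one way to get equal representation, not necessarily what the Oversample Field tool does internally.

```python
import random
from collections import defaultdict

def first_n(rows, n):
    """Keep only the first n records."""
    return rows[:n]

def random_percent(rows, pct, seed=None):
    """Keep a random pct% of the records, without replacement."""
    rng = random.Random(seed)
    k = round(len(rows) * pct / 100)
    return rng.sample(rows, k)

def balance_by_field(rows, field, seed=None):
    """Downsample each value of `field` to the size of the rarest value,
    so every value is equally represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    size = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, size))
    return out

# Hypothetical data: 20 churned customers out of 100.
rows = [{"id": i, "churned": i % 5 == 0} for i in range(100)]
balanced = balance_by_field(rows, "churned", seed=1)
print(len(first_n(rows, 10)), len(random_percent(rows, 25, seed=1)))
print(sum(r["churned"] for r in balanced), len(balanced))
```

Passing a fixed `seed` makes a random sample repeatable, which helps when iterating on a workflow against the same subset.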

Clear Workflows

Clear Workflows

Rename Fields
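Renaming is mostly a matter of keeping one mapping from cryptic source names to descriptive ones. A minimal Python sketch, assuming hypothetical field names and a hypothetical `rename_fields` helper:

```python
# Hypothetical mapping from cryptic source names to descriptive names.
RENAMES = {"cust_no": "CustomerID", "dt": "OrderDate", "amt": "OrderAmount"}

def rename_fields(record, mapping):
    """Return a copy of record with keys renamed; unmapped keys pass through."""
    return {mapping.get(k, k): v for k, v in record.items()}

row = {"cust_no": 1042, "dt": "2015-05-19", "amt": 19.99, "region": "West"}
print(rename_fields(row, RENAMES))
```

Keeping the mapping in one place makes it easy to review, and fields that are not in it pass through unchanged.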

Clear Workflows

Simplify the Process

How would one parse an email address?

name@domain.com

([^@]*)(@)([^\.]*)(.*)
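To see what each capture group in that expression picks up, here is a small Python sketch using the standard `re` module, next to a simpler split-based alternative in the spirit of simplifying the process. The function names are hypothetical.

```python
import re

# The expression from the slide: user name, "@", domain name, rest of the host.
EMAIL_RE = re.compile(r"([^@]*)(@)([^\.]*)(.*)")

def parse_with_regex(address):
    m = EMAIL_RE.match(address)
    if m is None:  # no "@" in the address
        raise ValueError(f"not an email address: {address!r}")
    user, _at, domain, suffix = m.groups()
    return user, domain, suffix

def parse_with_split(address):
    # Simpler alternative: split once on "@", then once on the first ".".
    user, _at, host = address.partition("@")
    domain, dot, rest = host.partition(".")
    return user, domain, dot + rest

print(parse_with_regex("name@domain.com"))
print(parse_with_split("name@domain.com"))
```

Both return `('name', 'domain', '.com')` for `name@domain.com`. Because `([^\.]*)` stops at the first dot, an address such as `a.b@mail.example.org` yields the domain `mail` and the suffix `.example.org` from either version.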

Demo - Parsing

Demo - Data Macros

Efficient Workflows

Efficient Workflows

Sort Data Sparingly

• Sorting is an expensive operation.
• Sorting is necessary for several operations.
• When sorting, the more data in each record, the longer the sort will take.
• Alteryx holds onto a sort if possible.
  • Formula resets the sort.
  • Sorting by a new field resets the sort.
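The idea that some operations need sorted input, and that one sort should be reused rather than repeated, can be illustrated with Python's `itertools.groupby`, which only groups adjacent records with equal keys. This is an analogy under assumptions (hypothetical `region`/`amount` data), not a description of how Alteryx tracks sort order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted records.
rows = [
    {"region": "West", "amount": 10},
    {"region": "East", "amount": 5},
    {"region": "West", "amount": 7},
    {"region": "East", "amount": 3},
]
by_region = itemgetter("region")

# groupby merges only adjacent equal keys, so unsorted input splits groups.
unsorted_keys = [k for k, _ in groupby(rows, key=by_region)]

# Sort once...
ordered = sorted(rows, key=by_region)

# ...then reuse that order for every step that needs grouped input.
totals = {k: sum(r["amount"] for r in g) for k, g in groupby(ordered, key=by_region)}
counts = {k: sum(1 for _ in g) for k, g in groupby(ordered, key=by_region)}

print(unsorted_keys)
print(totals, counts)
```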

Demo - Sorting

Efficient Workflows

Gathering Data Sources

http://www.alteryx.com/technical-specifications#data-sources

Efficient Workflows

Configuring Data Sources

Format Selection

Bulk Load
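Bulk loading generally means sending many records to the database in one batch instead of one statement per record. As a generic illustration only (not Alteryx's bulk loader, whose options depend on the target database), here is the batching idea with Python's built-in `sqlite3` module; the `customers` table is hypothetical.

```python
import sqlite3

# Hypothetical records to load.
rows = [(i, f"customer {i}") for i in range(10_000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Batch load: executemany runs one INSERT statement for every row in a
# single call, and the `with con` block commits them as one transaction.
with con:
    con.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)

loaded = con.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(loaded)
```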

Efficient Workflows

Configuring Data Sources

Efficient Workflows

Processing Near the Data

• Private/Public Server
• Amazon Redshift and S3
• Marketo and Salesforce
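Processing near the data means pushing work such as filtering and aggregation to the system that stores the data, so only results travel to the workflow. A minimal Python sketch, with an in-memory SQLite database standing in for a remote source; the `sales` table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("West", 10.0), ("East", 5.0), ("West", 7.0), ("East", 3.0)])

# Far from the data: pull every row to the client, then aggregate locally.
local = {}
for region, amount in con.execute("SELECT region, amount FROM sales"):
    local[region] = local.get(region, 0.0) + amount

# Near the data: the database aggregates and returns one row per region.
pushed = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

print(local == pushed)
```

Both give the same totals; the difference is that the second query returns two rows instead of every sale, which matters once the table lives on a remote server.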

Summary

• Evaluate and clean your data (Field Summary Tool)
• Simplify your process when possible
• Rename your fields
• Control your sorts
  • Set data aside and rejoin it later
    • Best: Add a Record ID field early that can be used to rejoin records later
    • More Advanced: Keep track of records and join by record position
• Create Input Macros
• Keep your processing close to your data sources
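The two rejoin strategies in the summary can be sketched in Python. This is a minimal sketch: the records and field names are hypothetical, and `RecordID` mirrors the idea of a record ID field rather than any particular tool's output.

```python
# Hypothetical wide records; "notes" stands in for bulky fields.
records = [
    {"name": "Cara", "score": 71, "notes": "long text A"},
    {"name": "Dev", "score": 93, "notes": "long text B"},
    {"name": "Eli", "score": 85, "notes": "long text C"},
]

# Best: add a record ID field early.
with_id = [dict(r, RecordID=i) for i, r in enumerate(records, start=1)]

# Set the bulky fields aside, keyed by RecordID.
aside = {r["RecordID"]: {"name": r["name"], "notes": r["notes"]} for r in with_id}

# Work on a narrow stream: only the ID and the field being processed.
narrow = sorted(
    ({"RecordID": r["RecordID"], "score": r["score"]} for r in with_id),
    key=lambda r: r["score"],
    reverse=True,
)

# Rejoin the set-aside fields by RecordID.
rejoined = [{**aside[r["RecordID"]], **r} for r in narrow]

# More advanced: join by record position. Only valid while both lists are
# in the same order, i.e. nothing was sorted or filtered in between.
passed = [r["score"] >= 80 for r in records]
by_position = [dict(r, passed=p) for r, p in zip(records, passed)]

print([r["name"] for r in rejoined])
print([r["passed"] for r in by_position])
```

Joining by position skips the key lookup, but it silently pairs the wrong records if either stream is sorted or filtered in between, which is why the record ID is the safer default.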

THANK YOU!
