Magical Methods for Batch Data Processing
Potent Potions for Batch Data Processing
250,000 CAD files & rasters on mobile devices
Tip: Know Your Potions and Choose Wisely
Today’s Potions
1. Wildcards
2. Batch Deploy
3. Parent/child Workspaces
4. Parent/child Server Workspaces
Potion 1: One Wild<card>
Dataset
Multi-Dataset Picker
Shapefile MapInfo
Most rasters
DWG DGN SQLite
Dataset Wildcards
Extended glob syntax:
Symbol      Matches
?           Any single character
*           Any sequence of zero or more characters
[chars]     Any single character in chars
[a-d]       Any character between a and d, inclusive
{a,b,...}   Any of the sub-patterns a, b, ...
/**/        Zero or more subdirectories
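To get a feel for how the symbols in the table combine, here is a minimal Python sketch that translates a pattern into a regular expression and tests paths against it. It is an illustration only, not FME's matcher: the helper names (expand_braces, glob_to_regex, matches) are made up for this example, and it assumes that * and ? do not cross a directory separator.

```python
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of brace-free patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def glob_to_regex(pattern):
    """Translate one brace-free pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            out.append("/(?:[^/]+/)*")      # zero or more subdirectories
            i += 4
        elif pattern[i] == "*":
            out.append("[^/]*")             # assumption: * stays inside one folder
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")              # exactly one character
            i += 1
        elif pattern[i] == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            out.append(pattern[i:end + 1])  # [chars] and [a-d] are valid regex as-is
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern, path):
    """Return True if path matches any brace expansion of pattern."""
    return any(glob_to_regex(p).match(path) for p in expand_braces(pattern))
```

For example, matches("/data/**/*.{dwg,dgn}", "/data/a/b/plan.dwg") picks up CAD files at any depth under /data, while matches("*.shp", "sub/roads.shp") is false under the single-folder assumption above.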
Time to brew Potion 1
Potion 1: Enticements
Wildcard Bulk Data Processing Enticements
✓ Simple to set up
✓ Can transform across file boundaries
- Needs memory & time
Potion 1: Pitfalls
Wildcard Bulk Data Processing Pitfalls
x Recovery from data errors difficult
x Feature Type vs File vs Format Issues
x No granular log
x No ability to parallelize
Potion 2: Batch Deploy
Batch Deploy Script Writer
Time to brew Potion 2
Potion 2: Enticements
Batch Deploy Enticements
✓ Simple to set up
✓ Runs quickly
✓ Can script via command line
✓ Run on demand
Potion 2: Pitfalls
Batch Deploy Pitfalls
x Recovery from data errors difficult
x No granular log
x Destination dataset naming can be tricky
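As a rough illustration of scripting a batch run from the command line, the Python sketch below builds one fme command per source file and derives a destination name that includes the parent folder, which is one way to sidestep naming collisions. The published parameter names SourceDataset and DestDataset are hypothetical (real names depend on the workspace), and the sketch only builds the argument lists; it does not run them.

```python
from pathlib import PurePosixPath


def build_batch_commands(workspace, sources, dest_root, dest_ext):
    """Return one fme argument list per source file, with unique destinations."""
    commands = []
    seen = set()
    for src in sources:
        src_path = PurePosixPath(src)
        # Prefix the parent folder so same-named files in different
        # folders do not overwrite each other at the destination.
        dest_name = f"{src_path.parent.name}_{src_path.stem}{dest_ext}"
        if dest_name in seen:
            raise ValueError(f"destination name collision: {dest_name}")
        seen.add(dest_name)
        dest = str(PurePosixPath(dest_root) / dest_name)
        commands.append([
            "fme", workspace,
            "--SourceDataset", src,   # hypothetical published parameter
            "--DestDataset", dest,    # hypothetical published parameter
        ])
    return commands
```

Each argument list can be handed to subprocess.run, or joined into lines of a batch file to run on demand.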
Potion 3: Parent/Child Workspaces
Parent/Child Workspace Ingredients
• Parent Workspace:
– PathReader
– WorkspaceRunner
• Child Workspace:
– FeatureWriter
Parent Ingredients
Child Ingredients
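The parent/child split can be pictured outside Workbench as well. The Python sketch below is an analogy, not the WorkspaceRunner itself: a parent loop hands each path to a child callable, limits how many run at once, and records one audit row per file. The function names and the SourceDataset parameter are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_children(paths, run_child, max_concurrent=4):
    """Run run_child(path) per path; return audit rows (path, status, detail)."""
    def audited(path):
        try:
            run_child(path)
            return (path, "ok", "")
        except Exception as exc:  # record the failure and keep going
            return (path, "failed", str(exc))

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in input order, so the audit log
        # lines up with the list of paths.
        return list(pool.map(audited, paths))


def fme_child(workspace):
    """Build a child callable that runs one workspace per file."""
    def run(path):
        # --SourceDataset is a hypothetical published parameter name.
        subprocess.run(["fme", workspace, "--SourceDataset", path], check=True)
    return run
```

A call such as run_children(paths, fme_child("child.fmw"), max_concurrent=2) keeps the transformation in the child and the workflow in the parent; setting max_concurrent to 1 gives the slower but simpler one-at-a-time run.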
Time to brew Potion 3
Potion 3: Enticements
Parent/Child Workspace Enticements
✓ Separate transformation from workflow
✓ Generate audit logs
✓ All authored within Workbench
Potion 3: Pitfalls
Parent/Child Workspace Pitfalls
x Not all writers can be used concurrently
x Slow to run each child workspace separately
x Recovery from data errors not easy if concurrent runs used
Potion 4: Parent/Child Server Workspaces
Parent/Child Server Workspace Ingredients
• Parent Workspace:
– PathReader
– FMEServerJobSubmitter
– FMEServerJobWaiter
• Child Workspace:
– FeatureWriter
Parent Ingredients
Child Ingredients
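For readers who want to see what submitting a child job looks like outside Workbench, here is a hedged Python sketch that builds (but does not send) an HTTP request for one job. The endpoint path, JSON body shape, and token header follow my understanding of the FME Server REST API v3 and should be checked against the documentation for your server version; the function name and the parameter names in the usage note are hypothetical.

```python
import json
from urllib.request import Request


def build_submit_request(server_url, repository, workspace, token, parameters):
    """Build a POST request that would queue one workspace run on the server."""
    # Assumed endpoint: /fmerest/v3/transformations/submit/<repository>/<workspace>
    url = (
        f"{server_url.rstrip('/')}/fmerest/v3/transformations/submit/"
        f"{repository}/{workspace}"
    )
    body = {
        "publishedParameters": [
            {"name": name, "value": value} for name, value in parameters.items()
        ]
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"fmetoken token={token}",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would submit the job. Building one request per path, for example with parameters={"SourceDataset": path}, mirrors what the parent workspace does with its job submitter.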
Time to brew Potion 4
Potion 4: Enticements
Parent/Child Server Workspace Enticements
✓ Separate transformation from workflow
✓ Generate audit logs
✓ All authored within Workbench
✓ Make full use of parallelism = FAST
Potion 4: Pitfalls
Parent/Child Server Workspace Pitfalls
x Not all writers can be used concurrently
x Data needs to be accessible to Server Engines - Consider using Server Data Resources
x Craft your reload/audit plan
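One simple shape a reload plan can take is sketched below in Python: compare the full input list against the audit log and rerun everything that has no successful entry. The audit row layout (path first, status second, with "ok" marking success) is an assumption made for this example.

```python
def plan_reload(all_inputs, audit_rows):
    """Return the inputs that still need to run, in their original order."""
    # Rows may carry extra fields after path and status; they are ignored.
    succeeded = {path for path, status, *_ in audit_rows if status == "ok"}
    return [path for path in all_inputs if path not in succeeded]
```

Files that failed and files that never ran both end up on the reload list, so a job lost in a crash is not silently skipped.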
Summary
● Many ways to handle bulk data moves
● Choose your potion wisely - each has pluses and minuses
● FME Server is the most robust automation choice
Questions?
Batch processing tutorial: fme.ly/b59