Building Tools for the Hadoop Developer
Matt Winkler (@mwinkle)
Making Hadoop easy to provision, secure, manage and use
Windows Azure
Parallel Data Warehouse Appliance
Windows Server
Single Node Developer Environment
Authoring Jobs
App Integration
Building Developer Tools
Core Hadoop
Consistent REST APIs
Breadth of Clients (Java, JS, .NET, etc.)
Authoring frameworks and languages
End User Tooling (IDEs, Analyst tools, Command lines)
Connectivity
Programmability
Security
Loosely coupled
Lightweight
Low cost to extend
Scenario oriented
Innovation flows upward
New compute models
Perf enhancements
Extend breadth & depth
Enable new scenarios
Integrate with current tool chains
Existing Ecosystem
Actively contributing to: Core, Pig, Hive, HCatalog
Branching to other projects
Simple one-box developer install on Windows
.NET
Map/Reduce
LINQ to Hive
Client APIs: WebHCat, Ambari, WebHDFS, Azure
Visual Studio Tooling
Local debugging support
JavaScript
MRjs – Map/Reduce in JavaScript
Node.js client APIs: WebHCat, WebHDFS, Ambari, Azure
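The Map/Reduce authoring frameworks listed for .NET and JavaScript wrap the same programming model: a map step that emits key/value pairs and a reduce step that combines the values for each key. The following is a minimal in-memory sketch of that model in Python, under stated assumptions. It is not the Microsoft.Hadoop.MapReduce or MRjs API, and the names map_words, reduce_counts and run_job are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, group by key (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in sorted(groups.items()))


result = run_job(["big data small data", "all data"])
print(result)
```

On a real cluster the grouping step is performed by Hadoop between the map and reduce phases; the sketch only imitates it with a dictionary.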
Management
UI Tooling: Cluster usage, Job authoring, Result consumption in common tools
PowerShell & cross-platform scripting
API Surface:
RDFE – Azure provisioning
Ambari – Cluster monitoring
WebHCatalog – Metadata and job submission
WebHDFS, Blob Storage – Storage
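As an illustration of the storage part of this API surface, WebHDFS exposes file system operations as REST calls under the /webhdfs/v1/ path, with the operation selected by an op query parameter. The sketch below only builds such URLs and sends no requests. The host name, port 50070 and the paths are placeholder assumptions, and the helper name webhdfs_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL of the form http://host:port/webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact, so the HDFS path structure is preserved.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Placeholder host, port and paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", 50070, "/user/demo", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", 50070, "/user/demo/data.txt", "OPEN", offset=0)
print(list_url)
print(open_url)
```

Keeping URL construction in one small function is one way a thin client in any language can stay consistent with the REST API; an HTTP GET on the first URL would list a directory and on the second would read a file.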
Management Portal
Scripting (Windows, Linux and Mac)
REST API
Sources: http://hadoopsdk.codeplex.com, http://www.github.com/windowsazure
NuGet packages: Microsoft.Hadoop.MapReduce, Microsoft.Hadoop.Hive, Microsoft.Hadoop.WebHDFS => WebClient
NPM packages: Azure, Azure-cli; Hadoop REST clients pending…
open
Big Data. Small Data. All Data.
www.microsoft.com/bigdata
Learn More: February 28
9:05am Microsoft Keynote by Dave Campbell
9:45am Microsoft Keynote by Kate Crawford
10:40am Microsoft Sponsored Session in Partnership with Ascribe
11:30am Dave Campbell’s Office Hours