getting data science with r and arcgis - esri€¦ · the data scientist’s toolbox spatial...

Post on 05-Jun-2020

27 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

GettingDataSciencewithRandArcGISShaunWalbridge

MarjeanPobuda

DataScience

DataScienceAmuch-hypedphrase,buteffectivelyisabouttheapplicationofstatisticsandmachinelearningtoreal-worlddata,anddevelopingformalizedtoolsinsteadofone-offanalyses.Combinesdiversefieldstosolveproblems.

DataScienceWhat’sadatascientist?

“Adatascientistissomeonewhoisbetteratstatisticsthananysoftwareengineerandbetteratsoftwareengineeringthananystatistician.”—JoshWills

DataScienceUsgeographicfolksalsorelyonknowledgefrommultipledomains.Weknowthatspatialismorethanjustanxandycolumninatable,

andhowtogetvalueoutofthisdata.

DataScienceLanguagesPython(SciPystack,Jupyter,scikit-learn,…)

C++(Tensorflow,Shark,MLC++)Java(SparkMLlib,Weka)R( )

Manyworkflowsrequirecombiningcomponentsfrommultipleenvironments.

MLtaskview

DataScienceJobs

R4Stats,DSjobreport

Industrystandardforpackagemanagementinthedatasciencecontext,builtby

StartedwithPython,butasshownintheRsegmentoftheplenary,itcanbeusedtosupportR,andhybridworkflowswhichconnectmultiplelanguages.

TechnologypartnerofEsri,haveatalktomorrow:

Thurs10:30AM,MesquiteG-H

ContinuumAnaltyics

ExploringContinuumAnalytics’Open-SourceOfferings

R

Esriand ?IntegrationviaArcGIS–RbridgeJoined andMoretocome—GIShashistoricallybeenmorecoupledwithPython

RConsortium RFoundation

Why ?Powerfulcoredatastructuresandoperations

Dataframes,functionalprogrammingUnparalleledbreadthofstatisticalroutines

ThedefactolanguageofStatisticiansCRAN:6400packagesforsolvingproblemsVersatileandpowerfulplotting

WeassumebasicproficiencyprogrammingSeeresourcesforadeeperdiveintoR

RDataTypesyou’reusedtoseeing…

Numeric-Integer-Character-Logical-timestamp

…butothersyouprobablyaren’t:

vector-matrix-data.frame-factor

Datatypes

DataFramesTreatstabular(andmulti-dimensional)dataasalabeled,indexedseriesofobservations.Soundssimple,butisagamechangerovertypicalsoftwarewhichisjustdoing2Dlayout(e.g.Excel)

DataTypes#Createadataframeoutofanexistingsourcedf.from.csv<-read.csv("data/growth.csv",header=TRUE)

DataTypes#Createadataframefromscratchquarter<-c(2,3,1)person<-c("Goodchild","Tobler","Krige")

met.quota<-c(TRUE,FALSE,TRUE)df<-data.frame(person,met.quota,quarter)

DataTypesR>dfpersonmet.quotaquarter1GoodchildTRUE22ToblerFALSE33KrigeTRUE1

0D:SpatialPoints1D:SpatialLines2D:SpatialPolygons3D:Solid4D:Space-time

spTypesEntity+Attributemodel

R—ArcGISBridge

R—ArcGISBridge

R—ArcGISBridge

ArcGISdeveloperscancreatetoolsandtoolboxesthatintegrateArcGISandRArcGISuserscanaccessRcodethroughgeoprocessingscriptsRuserscanaccessorganizationsGIS’data,managedintraditionalGISways

https://r-arcgis.github.io

R–ArcGISBridgeStoreyourdatainArcGIS,accessitquicklyinR,returnRobjectsback

toArcGISnativedatatypes(e.g.geodatabasefeatureclasses).

Knowshowtoconvertspatialdatatospobjects.

PackageDocumentation

Demo:GettingStarted

ArcGISvsRDataTypesArcGIS R ExampleValue

AddressLocator

Character AddressLocators\\MGRS

Any Character

Boolean Logical

CoordinateSystem

Character "PROJCS[\"WGS_1984_UTM_Zone_19N\"...

Dataset Character "C:\\workspace\\projects\\results.shp"

Date Character "5/6/20152:21:12AM"

Double Numeric 22.87918

ArcGISvsRDataTypesArcGIS R ExampleValue

Extent Vector(xmin,ymin,xmax,ymax)

c(0,-591.561,1000,992)

Field Character

Folder Character fullpath,usewithe.g.file.info()

Long Long 19827398L

String Character

TextFile Character fullpath

Workspace Character fullpath

AccessArcGISfromRStartbyloadingthelibrary,andinitializingconnectiontoArcGIS:

#loadtheArcGIS-Rbridgelibrarylibrary(arcgisbinding)#initializetheconnectiontoArcGIS.OnlyneededwhenrunningdirectlyfromR.arc.check_product()

AccessArcGISfromRFirst,selectadatasource(canbeafeatureclass,alayer,oratable):

Then,filterthedatatothesetyouwanttoworkwith(createsin-memorydataframe):

ThiscreatesanArcGISdataframe–lookslikeadataframe,butretainsreferencesbacktothegeometrydata.

input.fc<-arc.open('data.gdb/features')

filtered.df<-arc.select(input.fc,fields=c('fid','mean'),where_clause="mean<100")

AccessArcGISfromRNow,ifwewanttodoanalysisinRwiththisspatialdata,weneedittoberepresentedasspobjects.arc.data2spdoestheconversion

forus:

arc.sp2datainvertsthisprocess,takingspobjectsandgeneratingArcGIScompatibledataframes.

df.as.sp<-arc.data2sp(filtered.df)

AccessArcGISfromRFinishedwithourworkinR,wanttogetthedatabacktoArcGIS.Writeourresultsbacktoanewfeatureclass,witharc.write:

arc.write('data.gdb/new_features',results.df)

AccessArcGISfromRWKTtoproj.4conversion:

Interactingdirectlywithgeometries:

Geoprocessingsessionspecific:

arc.fromP4ToWkt,arc.fromWktToP4

arc.shapeinfo,arc.shape2sp

arc.progress_pos,arc.progress_label,arc.env(readonly)

DataSciencewithR

HadleyStackDeveloperatRStudio,ProfessoratRiceUniversityggplot2,scales,dplyr,devtools,manyothersNew,incollaborationwithWesMcKinney:

HadleyWickham

feather

StatisticalFormulas

DomainspecificlanguageforstatisticsSimilarpropertiesinotherpartsofthelanguage

formodelspecificationconsistency

fit.results<-lm(pollution~elevation+rainfall+ppm.nox+urban.density)

caret

LiterateProgramming

packages:RMarkdown,Roxygen2Jupyternotebooks

Ibelievethatthetimeisripeforsignificantlybetterdocumentationofprograms,andthatwecanbestachievethisbyconsideringprogramstobeworksofliterature.—DonaldKnuth,“LiterateProgramming”

DevelopmentEnvironments

néeIPython

Bestofclasstoolsforinteractingwithdata.

RToolsforVisualStudio

dplyrPackageBatting%.%group_by(playerID)%.%summarise(total=sum(G))%.%arrange(desc(total))%.%head(5)

Introducingdplyr

RChallengesPerformanceissuesNotageneralpurposelanguageLackspurelyUImodeofinteraction(e.g.plotsmustbemanuallyspecified)Programmeronly.Thereisshiny,butRisfirstandforemostalanguagethatexpectsfluencyfromitsusers

R–ArcGISBridgeDeepDive

BuildingRScriptTools

Demo:R-ArcGISbridge

HowToInstallInstallwiththeRbridgeinstallDetailedinstallationinstructions

WhereCanIRunThis?

WhereCanIRunThis?Now:

First, 3.1orlaterArcGISPro(64-bit)1.1orlaterArcMap10.3.1orlater:

32-bitRbydefault64-bitRavailableviaBackgroundGeoprocessing

ArcGISServer10.3.1+/ArcGISEnterprise

installR

What’sNext?CondaformanagingRenvironments

StartingatPro2.0,canbeinstalledasanyotherpackageRastersupport

Resources

RLookingforapackagetosolveaproblem?Usethe .

TonsofgoodbooksandresourcesonRavailable,checkouttheenginetofindresourcesforthelanguagewhichcanbe

difficulttolocatebecauseofthename.

CRANTaskViews

RSeek

RPackagesbyHadleyWickham

SpatialR/DataScience Afreeand

accessibleversionoftheclassicinthefield,ElementsofStatisticalLearning.

AnIntroductiontoStaisticalLearning(PDF) website

GettingStartedinDataScience

ArcGIS+RDemoof

CamPlouffe(EsriCA)ranan ,coversmaterialsinmoredepth.

UCPlenaryDemo:StatisticalIntegrationwithRSSN:spatialmodelingonstreamnetworks

RArcGISWorkshop

MaterialsCourses:

Books:

KonstantinKrivoruchko(GAcreator)

Toobigtoprint.Tonsofusefulstuff,coversbothRandArcGISextensively.

HighPerformanceScientificComputingTheDataScientist’sToolbox

SpatialStatisticalDataAnalysisforGISUsers

PackagesClusteringdemocoversmclustandsp.

Tree-basedmodels,e.g.Timeseriesdata,e.g.

CARTLittleBookofR

RArcGISExtensions

CombinesPython,R,andMATLABtosolveawidevarietyofproblems

AnRflavoredlanguageforspatialanalysis

RArcGISBridgeMarineGeospatialEcologyTools(MGET)

GeospatialModelingEnvironment

ConferencesuseR2016isbeingheldJuly5-7inBrussels,Belgium

Manyhappeningaroundworld,someupcomingones:ODSCEastMay3-5inBostonODSCWestNov2-4inSanFrancisco

useR!Conference

OpenDataScienceConference(ODSC)

Closing

OutreachResourcesandoutreach–connectthedots,wantthistobeoutreachsowecanbuildupmoreR+ArcGISpeoplewhoaren’tascommonasourcorelanguagefolks.Futureoftheproject,questions

CommunityOpensourceproject,differentethosContributionsarethecurrency

Thatsaid,majoruptakeinthecommercialspace:MicrosoftR(boughtRevolutionAnalytics);RStudio

ThanksRteam:DmitryPavlushko,SteveKopp,MarkJanikas;today’sspeakers

GeoprocessingTeamContactUs

RateThisSessioniOS,Android:Feedbackfromwithintheapp

WindowsPhone,ornosmartphone?Cuneiformtabletsaccepted.

top related