Page 1: Best practices for sensor networks and sensor data managementwiki.esipfed.org/images/a/ad/Sensor_best_practices_1Dec2014.pdf · This document on best practices for sensor networks

Best practices for sensor networks

and sensor data management

Citation

ESIP EnviroSensing Cluster (2014). Community Wiki Document, “Best practices for sensor

networks and sensor data management”, Federation of Earth Science Information Partners.

http://wiki.esipfed.org/index.php/EnviroSensing_Cluster. (wiki document accessed 12-1-2014).

Contributors (as of December 2014)

Each chapter has a lead editor who is responsible for periodically compiling comments and

contributions into stable versions of this document which will be archived as PDF versions and

can be found here. If you contribute to this document by editing or adding text, images or

comments you agree to the use of that material in the regularly published PDF versions of this

document. Please add your name to the list of contributors if you feel you made a significant

contribution.

Corinna Gries, University of Wisconsin-Madison, North Temperate Lakes LTER

Don Henshaw, USFS Pacific Northwest Research Station, Andrews Forest LTER

Scotty Strachan, University of Nevada, Reno

Renee F. Brown, University of New Mexico, Sevilleta LTER & UNM Sevilleta Field Station

Christopher Jones, National Center for Ecological Analysis and Synthesis

Christine Laney, University of Texas at El Paso, Jornada Basin LTER

Branko Zdravkovic, University of Saskatchewan

Richard Cary, University of Georgia, Coweeta LTER

Jason Downing, University of Alaska Fairbanks, Bonanza Creek LTER

Adam Kennedy, Oregon State University, Andrews Forest LTER

Mary Martin, University of New Hampshire, Hubbard Brook LTER

Jennifer Morse, University of Colorado Boulder, Niwot Ridge LTER

Fox Peterson, Oregon State University, Andrews Forest LTER

John Porter, University of Virginia, Virginia Coast Reserve LTER

Jordan Read, US Geological Survey

Andrew Rettig, University of Cincinnati

Wade Sheldon, University of Georgia, Georgia Coastal Ecosystems LTER


Scope

This document on best practices for sensor networks and sensor data management provides

information for establishing and managing a fixed environmental sensor network for on or near

surface point measurements with the purpose of long-term or “permanent” environmental data

acquisition. It does not cover remotely sensed data (satellite imagery, aerial photography, etc.),

although a few marginal cases where this distinction is not entirely clear are discussed, e.g.,

phenology and animal behavior webcams. The best practices covered in this document may not

all apply to temporary or transitory sensing efforts such as distributed “citizen science”

initiatives, which do not focus on building infrastructure. Furthermore, it is assumed that the

scientific goals for establishing a sensor network are thought out and discussed with all

members of the team responsible for establishing and maintaining the sensor network; i.e., the

appropriateness of certain sensors or installations to answer specific questions is not discussed.

Information is provided here for various stages of establishing and maintaining an environmental

sensor network: planning a completely new system, upgrading an existing system, improving

streaming data management, and archiving data.

Below are chapters of a living document to which contributions can be made by anybody

interested in this subject. Please post questions, answers, experiences with particular

software/hardware/setup, comments, additions, edits, resources, and publications. Please use

common online etiquette. If conflicting views arise they should be discussed in the

EnviroSensing e-mail list.


Contents
1 Planning Process
2 Implementation Feasibility
3 Assembling the Team
4 Overview of Chapters

Planning Process

Throughout this document it is emphasized that the initial planning is extremely important, as well as the inclusion of expertise in many different areas, scientific and technical, in the early discussion and planning phase before a proposal is written. If all fields of expertise are not consulted/incorporated prior to making location, budget, deployment, and timeline decisions, critical interdependencies are likely to be overlooked (e.g. power requirements, topographic constraints, construction tools required, etc.).

Although the discussion here is geared toward maintaining sensor networks over an extended period of time, planning is equally important for short term installations. Experience has shown that many short term installations have become long term even if that was not intended initially, and many small installations have been expanded to cover more area or measure more parameters.

Clearly, sensor network deployments are driven by ambitious science questions. However, good planning can help anticipate limitations and prevent time issues from becoming the driving force. Focusing on the overarching imperatives of good design, proper placement, organized data flow, and a well trained and motivated team will result in successful implementation and continued maintenance. Compromised installations diminish the impacts of the original study, can drain operating budgets unnecessarily, and inhibit leveraging of the science for future work and funding.

Implementation Feasibility

During the experiment or project design phase, defining the primary measurement objective is the first step to planning an observation site and platform. Answering these general questions is helpful before addressing specific technical issues:

Where is the geographic area of interest?
What are the measurements of interest?
What is the desired accuracy and frequency of measurement?
How critical are the sensor measurements? Can data gaps be tolerated? Is sensor redundancy necessary?
What type of experimental manipulation is desired (if any)?
What types of localized topography are likely to yield “representative” measurements at the time frequency of interest?
What is the total funding amount for personnel, travel, tools/equipment, fees, and science instruments?
What is the expected scope/lifetime of the deployment? Will it be expanded in the future? Consider scaling possibility (more sites, more sensors) even if it is not the immediate goal.
Evaluate commercial turnkey installations vs. systems developed from commercial or open source components. Considerations: cost, skills, maintenance, longevity of the company providing the whole system or each component, functionality, interoperability, access to continued support.


Assembling the Team

Several very different areas of expertise are required to successfully plan, install, and maintain sensing systems. Some of these roles/skill sets can certainly be provided by a single individual or individuals.

Roles within a team establishing and maintaining an environmental sensor network:

Scientist - determines the type of data and sampling frequency needed to answer the scientific questions within budget limitations.

Sensor system expert - knows the types of sensors and platforms, their installation and programming needed to answer scientific questions. Is familiar with specific climate and terrain issues and QA/QC approaches in the field.

Field logistics expert (for major site construction) - familiarity with transport, construction, weather, tools, and supplies for construction

Field construction and fabrication expert - understands concrete, metal structure, tower design, fencing, underwater anchors, floating devices, load estimates

Field workers/assistants - many people are needed for remote construction tasks, sensor wiring, initial site setup, cable management

Field technician - familiar with maintenance tasks including minor repairs, maintaining a calibration schedule, other regular sensor maintenance tasks. Field technicians need to have a good understanding of the science application and the end user, they need to be comfortable with technology, and applying knowledge from one area to another, have creative problem solving and critical thinking skills and pay attention to detail. They should have basic electrical and mechanical knowledge (e.g., multimeter use, basic equipment installation, repair and programming). Depending on site conditions they also need to be certified in tower/rock/tree climbing, boat handling, SCUBA diving, respective safety training, and enjoy skiing, hiking, off-road driving etc. plus need to be skilled in GPS orienteering, navigation, and basic map making.

Communications/data transport expert/Licensed Commercial radio operator (ideal, but not required) - needs to be familiar with moving digital data over wired or wireless networks from remote points to project servers and should have basic knowledge of radio communication (e.g., technician-level amateur radio license, basic antenna theory, IP networking)

Network administrator/System administrator - is responsible for network architecture, redundancy of systems from data center to field sites, backup, data security

Software developer - skills in preferred programming language

Data manager - needs to be familiar with means of documenting procedures for maintaining communication between all roles involved, specifically, means for documenting field events and their ramification for the data quality. Needs to know approaches/software for managing high frequency streaming data, standard QA/QC routines for such data, approaches to documenting data provenance and data archiving (space requirements, backup, storage of different Q/C levels) and have database/software package programming/configuration expertise

Data technician - needs to be thorough and reliable during tasks like ‘eye on’ quality control, manual data entry etc.

Overview of Chapters


The following chapters contained in this guide are structured to provide a general overview of the specific subject, an introduction to methods used, and a list of best practice recommendations based on the previous discussions. Case studies provide specific examples of implementations at certain sites.

Sensor Site and Platform Selection considers environmental issues, site accessibility, system specifications, site layout, and common points of failure.
Data Acquisition and Transmission is concerned with the acquisition of sensor data from the field and remote control of the system, while ensuring the integrity of those data.
Sensor Management Tracking and Documentation outlines the importance of communication between field and data management personnel, as field events may alter the data streams and need to be documented.
Streaming Data Management Middleware discusses software features for managing streaming sensor data.
Sensor Data Quality discusses different ways sensor data may be compromised and how to automatically control for it in the data stream.
Sensor Data Archiving introduces different approaches and repositories for archiving and publishing data sets of sensor data.


Sensor Site and Platform Selection

Considers environmental issues, site accessibility, system

specifications, site layout, and common points of failure.


Fig. 1 Typical environmental sensor deployment with science, support, and communication systems. Photo 2013 Scotty Strachan, NevCAN Sheep Range Blackbrush station

Contents
1 Contacts
2 Reviewers
3 Overview
4 Introduction
5 Methods
5.1 Environmental concerns
5.2 Site accessibility
5.3 Science platform selection
5.4 Support system specification
5.5 Site layout
6 Best Practices
6.1 Common Points of Failure
7 Case Studies

Contacts

The lead editors for this page may be contacted for questions, comments, or help with content additions.

Scotty Strachan - Department of Geography, University of Nevada, Reno - scotty at dayhike.net
Adam Kennedy - Andrews Forest LTER - adam.kennedy at oregonstate.edu

Reviewers

This page was reviewed by:

Jason Downing, Bonanza Creek LTER Information Manager, on 5/2/2014
Richard Jasoni, Associate Research Ecologist, Desert Research Institute, on 6/30/2014

Overview


Selection of exactly where and how to acquire data via in-situ sensing efforts is a crucial point in the science process where environmental research is concerned. Decisions made when choosing sites, sensor packages, and support infrastructure in turn place boundaries on what the final science deliverables can be. Data types, quantity, and quality are more or less set in stone during this process. Initial costs, timeframes, and sustainability are also determined by these choices. Selections need to be made based on the desired science products, but also in consideration of a wide array of variables including land ownership, access, equipment budget, long-term maintenance capability, previous research, and construction/deconstruction logistics.

Setting up terrestrial sensing systems is a major infrastructure/personnel commitment with budgetary and environmental concerns, and every effort towards maintaining a robust, low-impact, and long-term data stream should be made. Because each region possesses unique geography, there is no “one size fits all” solution. Instead, a series of decisions needs to be made, with the goals and capabilities of the research team defined in the context of clearly-articulated science questions and objectives.

Introduction

Fig. 2 Progression of work in selecting a site and designing a science deployment.

Identifying both the deployment strategy (site, process) as well as the physical hardware (sensor platforms and support infrastructure) for environmental sensing is usually a daunting task. A key objective of the research team should be to keep the science context in view during this process, as logistic realities will often clash with "ideal" scientific conditions. Very often the decision tree for choosing exact locations and deployment schemes is dependent on interacting factors (such as permitting/geography/access; Fig. 2). There is also a vast array of possible sensor/hardware packages available for a multitude of science applications.

It is critical that Principal Investigators (PI’s), logistical techs, and sensor specialists work together to develop specific deployment plans and alternatives, ideally in the pre-proposal stage. Planning topics must include science objectives, operating budgets, proposed locations, seasonal weather patterns, power sources, communications options, land ownership, distance from managing institutions, available personnel/expertise, and potential expansion/future-proofing. All of these categories are equally critical for discussion as proposed instrumentation projects move towards implementation.

Methods

Site visits, permit/agreement negotiations, equipment specifications, and deployment timelines need to be initiated concurrently because all phases of deployment are interdependent (Fig. 2). The P.I., together with the technical personnel, should identify sites for sensor and equipment deployment based on science needs, local topography, permit/agreement availability, logistical access, and availability of services (such as power and communications). Portions of the plan (such as some purchasing decisions) should remain flexible until the precise sites, permits/agreements, and data flow plan have been positively determined.

Environmental concerns

Environmental conditions have considerable bearing on science application, platform design, construction logistics, access restrictions, equipment reliability, and maintenance cost/longevity. Conditions for in situ sensing can vary tremendously from region to region; therefore, site and equipment selection must be considered on a case-by-case basis.

Local topographic variables include: northern versus southern exposure, which can affect hours of direct sunlight and snow persistence; and valley/sink versus ridgeline settings, which can affect daily temperature cycle and wind characteristics. The differences in airflow, wind exposure, cold sinks, snow drifts, sky exposure for solar panels, and possible radio/communications pathways are all important variables when selecting a site and what type of equipment will be deployed.

Dominant vegetation conditions and potential long-term growth can alter sensor readings via shading effects, affecting temperature, radiation, and snow-related measurements. Radio communications are also affected by vegetation, with most microwave frequencies used by high-speed data radios being strongly attenuated by trees and brush. Vegetation can also be a long-term hazard in the forms of fire fuels and deadfalls.

Visibility and the visual impact of deployments should be considered for both security and aesthetic reasons. Sometimes reduction of visual impact is required by landowners, but in general it is simply good practice. Metal structures can be camouflaged with paint to reduce visibility, structure heights may be reduced to blend with vegetation, and ground disturbance can be kept to a minimum to avoid erosion and biasing certain types of measurements.

Dominant weather conditions determine what levels of seasonal access are available, what structural designs should be used, and what sort of equipment should be purchased. Extreme temperatures, tropical storms, lightning, snow depth, riming/ice, UV exposure, high humidity, wind speeds, saltwater exposure, flooding, and stream depth variation are all examples of conditions which will influence design and deployment plans.

Wildlife can pose hazards to, or be affected by, proposed deployments. Bird perching and flight paths, cattle, soil invertebrates, rodents, and large mammals can all disturb or be affected by sensors and equipment installed in the field. Landowners will have regulations or preferences concerning these factors, and proactive steps are necessary on the part of the science team to minimize these hazards.

Sensitivity to local political and social issues needs to be considered, as objective science data should constructively serve the local populations as well as the scientists and funding agencies.

Site security is a primary concern when planning to deploy sensors and equipment into the field. Human theft/vandalism is a potential cause of sensor disturbance or failure. While remote deployments are nearly impossible to secure physically, measures such as camouflaging, informative signs, fencing, and lock boxes may be employed to mitigate hostile or irresponsible passers-by.

Hazards to sensors include natural disturbance/disasters such as wildfire, flooding, extreme winds, and mass wasting. Planners should be aware of all these possibilities and at least examine the likelihood of these events at sites which have been evaluated from the scientific point of view.

Site accessibility


Fig. 3 Seasonal access may vary highly depending on location, limiting the types of maintenance possible at any given time.

Locations for in-situ sensing must be accessed for data collection, survey, construction, and maintenance over the life of the project. Seasonal conditions, roads, and topography determine what types of access may be used during different times of the year. Categorical considerations include:

Vehicular access. Commercial vehicle/equipment, 2WD auto, 4x4 truck, ATV, snow machine, boat, helicopter.
Non-motorized access. Hiking, skiing, pack animals, snowshoeing.
Access improvements. Road building, trail building, trail demarcation, safety rails, harness anchor points.
Seasonal access. Define access by spring/summer/fall/winter seasons. This is directly related to local weather/topographical conditions.
Construction access. Heavy equipment, special equipment, heavy loads, and heavy foot traffic are all likely possibilities depending on monitoring design.
Minimal impact considerations. Can traffic/access be directed in a way to minimize environmental impact (e.g. erosion, vegetation)? Solutions include boardwalks, bridges, raised steps, delineated pathways.

Science platform selection

Fig. 4 Science instrumentation specification must be driven by science questions and environmental/logistical constraints.

Once the science questions have been established and site conditions are known, an itemized list of sensor and support system platforms/hardware may be assembled that best fits the application and budget. Primary considerations include reliability, comparability with other similar field systems, technological (e.g. programming) requirements, budget, and system flexibility (e.g. upgrades, expansion, telemetry options). Accuracy, precision, and expected period of use prior to calibration or replacement may also be a consideration. In some fields of study, there are only one or two alternatives to choose from in terms of scientific instrumentation, whereas in others there can be many choices. Options can be narrowed by researching what equipment/standards are used by existing installations to which comparability is desired. Once a data acquisition platform and sensor array is chosen, remaining support systems are then designed around this core equipment.

Support system specification

The subsystems of infrastructure, electrical power supply, and data communications should all be designed to best support the science platforms in all seasons over the long term. While some vendors offer “all-in-one” packages supplied with standard instrumentation, it is best for the research team to assess whether these solutions are adequate for their chosen site and objective. Quite often several science questions are being addressed in larger deployments, and multiple hardware solutions from several vendors must be combined into one deployment. The support systems should be specified and scaled appropriately.

Physical infrastructures – these are the building-blocks of any remote data acquisition site, including tripods, towers, poles, buoys, solar panel racks, storage boxes, fencing, concrete pads, and the like. Quite often a single tripod or tower does not have adequate space or structural integrity to support all of the sensors, antennas, solar panels, batteries, and other items, so a typical site design incorporates multiple structural components.

Power generation and storage – for sustaining long-term reliable data streams, power independence is critical. Stations should be capable of generating and storing their own power locally, as well as taking advantage of any grid or other available power that is within budget and design criteria. Because the majority of related electronics are ultimately powered by DC voltage, having a power generation system and DC battery bank for every site (and sometimes discrete subsystems) is recommended to minimize the loss of power and the resulting data gaps. Independent generation sources are most commonly PV arrays (solar), wind, or water turbines. For reasons of cost, reliability, and maintenance issues, PV (solar) is recommended as the primary on-site generation source if environmental conditions allow. Incorporating simplicity, redundancy, and excess capacity is important for long-term reliability.

Data communications – Use of real-time communication (in addition to local storage capacity) is desirable in order to transmit data, monitor system health and performance, troubleshoot problems, and minimize data gaps. This is usually performed using radio communications (whether through a vendor-specific system or a general-purpose field IP network). Communications systems need to be robust and secure, and should have low power requirements (refer to “Data Acquisition and Transmission” Best Practices for further detail).

Construction details – Many details need to be considered when generating specifications and purchasing hardware for the sensor and support systems. Wires should be protected in conduit and storage enclosures to avoid exposure to damage and seasonal degradation. Wire lengths, enclosure sizes, and mounting locations should be planned for accordingly. Anchoring for support structure should be designed to withstand worst-case weather/environmental conditions. Use of corrosion-resistant metals such as galvanized steel and aluminum for structure and hardware will greatly reduce failure or ongoing maintenance problems.
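
The power recommendations above can be turned into a rough sizing calculation. The following Python sketch is illustrative only: the function name, the 5 W load, the 5 days of autonomy, the 3 peak sun hours, the 50% depth-of-discharge limit, and the 0.7 derating factor are all assumed values chosen for the example, not figures from this document. Actual sizing should be based on measured loads and local solar data.

```python
def size_power_system(load_watts, autonomy_days, peak_sun_hours,
                      system_voltage=12.0, max_depth_of_discharge=0.5,
                      derating=0.7):
    """Rough sizing of a DC battery bank and PV array for a continuous load.

    All default values are assumptions made for illustration.
    Returns (battery_amp_hours, pv_watts).
    """
    # Energy the station consumes in one day (watt-hours).
    daily_wh = load_watts * 24.0
    # Battery must carry the load through sunless days without
    # discharging below the allowed depth of discharge.
    battery_wh = daily_wh * autonomy_days / max_depth_of_discharge
    battery_ah = battery_wh / system_voltage
    # PV array must replace one day of consumption within the
    # available peak sun hours, after assumed system losses.
    pv_watts = daily_wh / (peak_sun_hours * derating)
    return battery_ah, pv_watts


# Hypothetical station: 5 W continuous load, 5 days of autonomy,
# 3 peak sun hours per day in the worst month.
battery_ah, pv_watts = size_power_system(load_watts=5.0,
                                         autonomy_days=5,
                                         peak_sun_hours=3.0)
print(f"Battery bank: {battery_ah:.0f} Ah, PV array: {pv_watts:.0f} W")
```

Under these assumptions the station would need roughly a 100 Ah battery bank at 12 V and about 57 W of PV. Rounding both figures up is one way to build in the excess capacity recommended above.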

Site layout


Fig. 5 Carefully planning a site layout in advance can prevent surprises and setbacks during installation.

Site layout at first might seem trivial, but is very important when considering interactions of the various subsystems that can influence sensor/equipment reliability and data quality. Science questions/objectives should drive the placement/separation of sensors to optimize measurement quality, followed by placement of support systems and additional structure. Solar arrays need to be angled for sun exposure, minimal shading, and snow shedding. The impacts of site structure on measurements such as wind eddies, incoming/outgoing radiation, camera viewsheds, or precipitation catch zones need to be considered as well as aesthetic impacts if located in a region that is frequently visited by the public. Power and data cable runs should be protected and kept as short as possible; voltage drop over long runs can be a consideration in layout and design. Stipulations in site permits may be drivers of site layout and construction. Once the site layout is designed and mapped, specification of construction materials, sensor cables, and other supplies may be optimized.
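
Voltage drop on a DC cable run can be estimated from Ohm's law and the conductor resistance. The sketch below is a simplified illustration in Python. The copper resistivity is an approximate room-temperature value, and the 2 A load, 30 m run, and 2.08 mm² conductor (roughly 14 AWG) are assumed example inputs, not recommendations from this document.

```python
# Approximate resistivity of copper at about 20 degrees C, in ohm-metres.
COPPER_RESISTIVITY_OHM_M = 1.68e-8


def voltage_drop(current_amps, one_way_length_m, conductor_area_mm2):
    """Estimate the DC voltage drop over a two-conductor copper cable run."""
    area_m2 = conductor_area_mm2 * 1e-6
    # Current flows out and back, so the circuit length is twice the run.
    resistance_ohms = COPPER_RESISTIVITY_OHM_M * (2 * one_way_length_m) / area_m2
    return current_amps * resistance_ohms


# Hypothetical run: 2 A load, 30 m from battery to load, 2.08 mm^2 wire.
drop = voltage_drop(current_amps=2.0, one_way_length_m=30.0,
                    conductor_area_mm2=2.08)
percent_of_supply = 100.0 * drop / 12.0  # assumed 12 V supply
print(f"Drop: {drop:.2f} V ({percent_of_supply:.1f}% of a 12 V supply)")
```

With these assumed inputs the drop is a little under 1 V, or about 8% of a 12 V supply, which illustrates why shorter runs or heavier conductors may be needed.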

Best Practices

Fig. 6 The approach to deployment should be as durable, reliable, and flexible as possible to accommodate unforeseen conditions and changing science questions or technology improvements over the long term.

Selection of deployment sites, sensor packages, and support systems are interacting processes which can require some iteration before arriving at the final determinations. Unless the science questions are extremely narrow or exceptional in nature, it is unlikely that any one of these decisions can be made in a vacuum without considering the others. With this in mind, the following overarching recommendations should be emphasized:

P.I. consultation with system/hardware/construction specialists while in the proposal phase will minimize budget surprises or platform compromise later in the process.


Data quality and longevity should be the ultimate goals when designing the deployment. Making choices for more robust and widely-used core systems and sensors will ensure that data comparability is maximized and hardware problems corrupting data or creating gaps are minimized. Purchase of reliable and known equipment is not as expensive as repairing/replacing equipment halfway through the study or losing valuable data.

When data quality and continuity are paramount, use of replicate sensors or stations may be required.

Planning for real-time connectivity is crucial for reducing field maintenance time and data gaps.

Optimal site selection to answer science questions can often be impeded by permit requirements and landowner preferences. Starting the conversation with landowners early in the process may improve the chance of getting the locations/deployment types that are desired.

Standardizing sensor and support hardware, software/programming, and structural designs across multiple sites minimizes maintenance issues as well as construction costs and design time.

Assessing access capabilities to the sites will allow for planning of emergency maintenance access, procedures, and costs.

Overbuilding structure, power capacity, and site infrastructure (e.g. cabling, networking) will prevent problems in the case of unforeseen events or site expansion.

Common Points of Failure

Power problems are one of the most frequent causes of total system failure. Battery fatigue, loose connections, and electrical shorts need to be anticipated and prevented where possible. Power systems need to be protected, over-engineered, and replicated wherever possible.

Temperature extremes of heat or cold can cause electronic or mechanical failure of individual sensors and systems. Insulating enclosures, ventilating enclosures (active or passive), and placement of equipment in sheltered zones can help alleviate these problems.

Humidity and condensation can be a serious issue for electronics longevity and circuit performance (including accuracy). In zones of high average humidity, sealing enclosures and providing some means of reducing humidity (e.g. desiccant packets) is desirable.

Sensors can be disrupted by wildlife. Hardening of sensor systems (e.g., armoring cables, fences) can help with some problems. Near-real-time data feeds allow rapid detection of the problems that do occur.

Lightning strikes or near-misses are a common problem at exposed or mountainous sites. Extensive grounding (e.g. exposed copper wire network) and use of surge protection throughout the power system and at ends of long power and data cable runs will compartmentalize the site electrically and protect as many components as possible.

Lack of data storage replication can cause loss of data. This problem can be mitigated by incorporating high capacity storage on-site (datalogger) as well as off-site (database).

Personnel turnover coupled with lack of process and hardware documentation can lead to data discontinuity or equipment failure (see Sensor Management and Tracking for additional details).
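
As a small illustration of how a near-real-time data feed can reveal problems quickly, the following Python sketch scans record timestamps for gaps longer than the expected logging interval. The function name, the 10-minute interval, and the sample timestamps are hypothetical and serve only as a demonstration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (last_seen, next_seen) pairs where records are missing.

    `timestamps` must be sorted in ascending order.
    """
    gaps = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        # A step longer than the logging interval means at least one
        # expected record never arrived.
        if later - earlier > expected_interval + tolerance:
            gaps.append((earlier, later))
    return gaps


# Hypothetical 10-minute feed with records missing between 00:20 and 01:00.
records = [
    datetime(2014, 12, 1, 0, 0),
    datetime(2014, 12, 1, 0, 10),
    datetime(2014, 12, 1, 0, 20),
    datetime(2014, 12, 1, 1, 0),
    datetime(2014, 12, 1, 1, 10),
]
gaps = find_gaps(records, expected_interval=timedelta(minutes=10))
for start, end in gaps:
    print(f"No data between {start} and {end}")
```

A check of this kind could run each time new data arrive, so that a failed sensor, power loss, or broken link is noticed before the next scheduled site visit.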

Case Studies


NevCAN Transects or Walker Basin (Scotty) - To be completed, will include a station design and systems, maintenance/access plan, data flow, and some photos/diagrams.
Andrews Research Sites (Adam) - To be completed
Sevilleta - Renee to complete with multiple case study examples


Data Acquisition and Transmission

Concerns the acquisition of sensor data from the field and

remote control of the system, while ensuring the integrity

of those data.


Contents
1 Overview
2 Introduction
2.1 Considerations
2.1.1 Collection Frequency
2.1.2 Bandwidth
2.1.3 Protocols
2.1.3.1 Hardware
2.1.3.2 Network
2.1.4 Line-of-sight
2.1.5 Power
2.1.6 Security
2.1.6.1 Physical Security
2.1.6.2 Network Security
2.1.7 Reliability and Redundancy
2.1.8 Expertise
2.1.9 Budget
3 Methods
3.1 Manual
3.2 Unidirectional
3.2.1 Geostationary Operational Environmental Satellite (GOES)
3.2.2 Meteor Burst Radio
3.2.3 Iridium Satellite service
3.3 Bidirectional
3.3.1 ISM band radio network
3.3.2 Cellular
3.3.3 Vendor-specific radio network
3.3.4 Satellite internet
3.3.5 Licensed radio
3.3.6 Mesh Networks
3.3.7 Wired
4 Best Practices
5 Case Studies
6 Resources
6.1 GOES
7 References

Overview

Traditionally, environmental sensor data from remote field sites were manually retrieved during infrequent site visits. However, with today's technology, these data can now be acquired in real-time. Indeed, there are several methods of automating data acquisition from remote sites, but there is insufficient knowledge among the environmental sensor community about their availability and functionality. Moreover, there are several factors that should be taken into consideration when choosing a remote data acquisition method, including desired data collection frequency, bandwidth requirements, hardware and network protocols, line-of-sight, power consumption, security, reliability and redundancy, expertise, and budget. Here, we provide an overview of these methods and recommend best practices for their implementation.

Introduction

The classic method of acquiring environmental sensor data from remote field sites involves routine technician site visits, in which s/he connects a laptop to a datalogger, an electronic device that records sensor data over time, and manually downloads data recorded since the last site visit. Once the technician returns to the lab, s/he is then responsible for manually uploading these data to a server for later processing and archival.

While manual acquisition methods are generally effective, there are many reasons to automate environmental sensor data acquisition. First, if the site is not visited frequently enough, the datalogger memory can become full and, depending on how the datalogger is programmed, sensor data will either overwrite itself or stop recording entirely. This situation often occurs at remote sites that become periodically inaccessible due to environmental conditions, such as heavy winter snowpack. Second, the burden of responsibility for not only the successful retrieval of the sensor data, but also the subsequent upload to a server for safekeeping, lies solely on the technician. Moreover, with any instrumented site, there is the inherent potential for sensor or power failure. Automated data acquisition systems allow technicians to learn of such issues prior to visiting the field site, reducing the potential for data loss. Finally, automated data acquisition methods save hundreds of person hours and vehicle miles that would have otherwise been spent manually acquiring data or troubleshooting unanticipated problems, thus improving the overall quality of the data.

Bidirectional communication methods have the additional advantages of allowing technicians to remotely change system settings, test configurations, and troubleshoot problems. These methods also open the field to a wide variety of devices that may be deployed at a remote field site, such as controllable cameras, on-site wireless hotspots, and IP-enabled control or automation equipment.


Considerations

The decision of which sensor data acquisition method to use at a given site requires the careful consideration of many factors, for which we provide an overview here.

Collection Frequency

What is the desired collection frequency? How important is real-time accessibility? For instance, the data could be retrieved in near real-time (every few minutes to every few hours) or just once or twice per day. High frequency datasets or images should be collected more frequently.

Bandwidth

Bandwidth can be an important consideration, particularly when high frequency data are being collected. Will cameras be utilized at the site? Where is the broadband point of presence (POP) located? Does the equipment work with the required bandwidth? More frequent collection intervals require less bandwidth per transmission and are recommended for high frequency datasets or for images.
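The trade-off between collection interval and transmission size can be illustrated with a short calculation. The following is a minimal sketch only; the record size, sampling interval, transfer window, and function names are hypothetical values chosen for illustration and do not describe any particular datalogger.

```python
def payload_bytes(sample_interval_s: float, bytes_per_record: int,
                  collection_interval_s: float) -> int:
    """Bytes that accumulate on the logger between two collections."""
    records = int(collection_interval_s // sample_interval_s)
    return records * bytes_per_record


def min_link_rate_bps(payload: int, transfer_window_s: float) -> float:
    """Average link rate (bits/s) needed to move the payload within the window."""
    return payload * 8 / transfer_window_s


# Hypothetical site: one 60-byte record every 10 seconds.
hourly = payload_bytes(10, 60, 3600)    # data collected once per hour
daily = payload_bytes(10, 60, 86400)    # data collected once per day
print(hourly, daily)
# Link rate needed to send a full day of data in a 60-second window.
print(min_link_rate_bps(daily, 60))
```

Under these assumed numbers, collecting hourly moves far less data per transmission than collecting daily, which is the reason shorter collection intervals suit high frequency datasets.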

Protocols

Hardware

Many dataloggers only have serial (RS232) ports, therefore requiring a serial-to-ethernet converter to interface with automated acquisition instrumentation. USB.

Network

Public IP networks are advantageous over private IP networks in many cases because they can be managed from anywhere there is a connection to the Internet. Remote access to private IP networks requires advanced network expertise to provision port forwarding in firewalls and/or VPN.

Line-of-sight

Fig. An example of a near-Line-of-Sight (nLoS) condition, where intervening terrain and/or vegetation can interfere with the radio signal. In this case, the antenna heights at both ends are actually at the 8 m level, mitigating the effect somewhat. The link is operational, albeit with a reduced Received Signal Strength Indication (RSSI) due to the presence of an obstruction in the link's Fresnel zones.

Line-of-sight requires an evaluation of the environment, topography, and vegetation. It can be initially determined using LOS calculators, which use digital elevation models (DEMs), but must be ground truthed. A link often requires a repeater infrastructure, and the distance to the repeater is a factor. Automated sensor data acquisition methods require many of the same site selection considerations discussed in Sensor Site and Platform Selection, particularly when selecting repeater sites.
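To give a sense of how much clearance a radio path needs beyond a straight sight line, the radius of a Fresnel zone can be estimated with the standard formula r_n = sqrt(n * wavelength * d1 * d2 / (d1 + d2)), where d1 and d2 are the distances from the point of interest to each end of the link. The sketch below is illustrative only; the link length and frequencies are hypothetical, and the function name is not taken from any LOS calculator.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def fresnel_radius_m(freq_hz: float, d1_m: float, d2_m: float, zone: int = 1) -> float:
    """Radius of the n-th Fresnel zone at a point d1 from one end and d2 from the other."""
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    return math.sqrt(zone * wavelength_m * d1_m * d2_m / (d1_m + d2_m))


# Hypothetical 10 km link, evaluated at the midpoint where the zone is widest.
for freq_hz in (2.4e9, 5.8e9):
    radius = fresnel_radius_m(freq_hz, 5_000, 5_000)
    print(f"{freq_hz / 1e9:.1f} GHz: first Fresnel zone radius {radius:.1f} m")
```

A commonly quoted rule of thumb is to keep roughly 60% of the first Fresnel zone free of obstructions; any such threshold should still be confirmed by ground truthing the link.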

Power

How important is real-time accessibility (e.g., what is the desired collection frequency)? What are the power requirements of the transmission type, and what is the on-site buffer size? Redundancy is preferred, especially at very remote sites. If power is disrupted, will the system resume operations?
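These questions can be made concrete with a simple energy budget. The sketch below is a rough illustration under stated assumptions: the load wattages, duty cycles, battery capacity, and usable fraction are all hypothetical and would need to be replaced with measured or manufacturer-specified values.

```python
def daily_load_wh(loads):
    """Sum the daily energy use of (power in W, hours per day) pairs."""
    return sum(power_w * hours for power_w, hours in loads)


def autonomy_days(battery_ah: float, battery_v: float,
                  usable_fraction: float, load_wh_per_day: float) -> float:
    """Days the site can run from the battery alone, with no charging."""
    usable_wh = battery_ah * battery_v * usable_fraction
    return usable_wh / load_wh_per_day


# Hypothetical site: a 0.5 W datalogger that is always on,
# plus a 6 W radio that transmits for 2 hours per day.
loads = [(0.5, 24), (6.0, 2)]
load = daily_load_wh(loads)
print(load, "Wh per day")
# Hypothetical 100 Ah, 12 V battery, of which half the capacity is used.
print(autonomy_days(100, 12, 0.5, load), "days without charging")
```

Shortening the radio's daily on-time in this sketch shows directly how transmission scheduling affects autonomy.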

Security

Physical Security


For physical security considerations, refer to Sensor Site and Platform Selection.

Network Security

It is recommended that encryption, such as WPA2, be configured to prevent unauthorized access to data acquisition equipment or sensor data. A private IP network can further help to prevent unwanted access, but it also prevents easy remote management by network administrators unless a VPN is installed.

Reliability and Redundancy

Consider the reliability and redundancy of the transmission mode and of the equipment, as well as of the network infrastructure.

Expertise

Some acquisition methods are plug-n-play with substantial vendor and/or community support, while others require a fair amount of hardware and network expertise to configure and maintain. All acquisition methods require fundamental knowledge of IP networking along with basic electronics, radio, and antenna theory.

Budget

The cost of implementing a data acquisition and transmission method depends on existing infrastructure; initial setup costs, including personnel; ongoing personnel costs, specifically technician maintenance; and recurring costs, such as monthly service fees for cellular transmission.

Methods

There are three general categories of remote sensor data acquisition methods: manual, unidirectional telemetry, and bidirectional telemetry. Each has advantages and disadvantages in terms of infrastructure, cost, reliability, required expertise, and power consumption.

Manual

This method involves scheduled visits to the site by a field technician, who uses a serial-to-computer connection and/or flash memory to transfer environmental sensor data to their laptop or similar device. Upon returning from the field, the technician is responsible for manually uploading these data to a server. This acquisition method is simple and may be the only option when site instrumentation generates large data files. However, this method provides no real-time data access and, therefore, no knowledge of instrumentation failures. Moreover, the reliability of this method is completely dependent on the technician.

Unidirectional

Unidirectional sensor data acquisition methods involve regularly scheduled wireless data transmission from a remote site to a server, with no offsite ability to control or change sensor settings. These include...

Geostationary Operational Environmental Satellite (GOES)

Fig. A typical circular-polarized GOES antenna for one-way burst transmission of limited data

This method is preferred in very remote and potentially rugged areas where other automated transmission methods would not work. While it does not require line-of-sight to a repeater like most other transmission methods, it does require a view to the southern sky. Additionally, the GOES method has a low power requirement. However, GOES has several disadvantages, including a high initial investment (<$5K) and requirements for training and licensing. Moreover, fewer than 100 values can be transferred per hour, making it disadvantageous for sites that sample at high frequencies.


Data transfer speed for GOES systems is typically limited to 1200 bits per second, with 10-second transfer assignments occurring once every hour. During each 10-second period, one can transfer up to 1500 bytes of data (12,000 bits / 8), including the 53-byte GOES header string. In other words, a maximum of 1447 bytes of time stamps and measured values can be transferred to the satellite during one transmission interval. Most often, GOES messages are organized in a time-ordered format similar to the following example:

0105E59013190131824G30+1NN196WXW00517
0 13:00:00 23.7,43,5,245,-55.1,5,245,23.7,23.7,12.8
1 12:30:00 23.7,43,-55.1,204,1011.09,0.000,0.0,24.7,0.270,-0.456,-0.997,-0.416,-2.687,23.5,0.00,214.81,0.00,5,245
1 12:45:00 23.7,43,-55.1,204,1011.11,0.000,0.0,24.7,0.249,-0.468,-0.994,-0.436,-2.650,23.5,0.00,214.82,0.00,5,245

Here, the first line represents the GOES header string, which includes the address, the date and UTC time of the transfer (13:18:24), signal information, satellite information, message length, and some other characters. In the example above, the lines that follow carry the time stamp and value information from sensor sets 0 and 1. As each character in the sensor set strings is 1 byte long, we can see that our GOES message uses approximately 280 of the 1447 bytes that are the theoretical maximum for the transfer. However, in order to accommodate possible differences between the station sending time, decoders, and scheduled reception time, we never want to reach this value.
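The byte budget described above can be checked with a few lines of code. This is a minimal sketch: the baud rate, window length, and header size come from the text, while the 80% safety margin and the function names are assumptions made for illustration, not a NESDIS requirement.

```python
BAUD_BPS = 1200      # transfer speed from the text
WINDOW_S = 10        # length of one transfer assignment, in seconds
HEADER_BYTES = 53    # GOES header size as stated in the text

# Sensor set lines from the example message (header line excluded).
EXAMPLE_LINES = [
    "0 13:00:00 23.7,43,5,245,-55.1,5,245,23.7,23.7,12.8",
    "1 12:30:00 23.7,43,-55.1,204,1011.09,0.000,0.0,24.7,0.270,-0.456,"
    "-0.997,-0.416,-2.687,23.5,0.00,214.81,0.00,5,245",
    "1 12:45:00 23.7,43,-55.1,204,1011.11,0.000,0.0,24.7,0.249,-0.468,"
    "-0.994,-0.436,-2.650,23.5,0.00,214.82,0.00,5,245",
]


def goes_payload_capacity(baud_bps=BAUD_BPS, window_s=WINDOW_S,
                          header_bytes=HEADER_BYTES) -> int:
    """Bytes left for time stamps and values after the header is subtracted."""
    return baud_bps * window_s // 8 - header_bytes


def message_fits(sensor_lines, margin=0.8):
    """Return (bytes used, True if within the assumed safety margin)."""
    used = sum(len(line.encode("ascii")) for line in sensor_lines)
    return used, used <= margin * goes_payload_capacity()


print(goes_payload_capacity())
print(message_fits(EXAMPLE_LINES))
```

Running the check before deploying a new datalogger program is one way to confirm that added sensors will not push a message past the transmission window.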

Prospective users of the GOES system must fill out the System Use Agreement (SUA) form and, upon approval, receive and sign the Memorandum of Agreement (MOA) from NOAA's Satellite and Information Service (NESDIS). After the MOA is approved, NESDIS will issue a channel assignment and an ID address code to the applying organization. Non-U.S. government and research organizations must be sponsored by a U.S. government agency in order to apply for this permission. Upon approval, all users must purchase equipment that has been certified to be compatible with the GOES Data Collection System. As of May 2013, GOES transmitters must conform to the certification standards version 2 (also known as CS2). This change was implemented to double the number of GOES channels on the same bandwidth. As a result, old GOES transmitters that are only compatible with the CS1 standard cannot be used for new NESDIS assignments. For assignments obtained prior to May 2012, CS1 transmitters will be supported until 2023. If you consider buying used equipment for GOES transmission, make sure the transmitters are compliant with the CS2 standard.

Meteor Burst Radio

Like GOES, this method does not require line-of-sight and has a low power requirement. However, it requires a large antenna and arrangement of service, and it has a very slow transmission rate. It works by reflecting VHF radio signals at a steep angle off the band of ionized meteor trails that exist 50 to 75 miles above the Earth. See the SNOTEL and ITU Case Studies for more information.

Iridium Satellite service

Iridium provides the only complete global satellite coverage. The new Iridium Pilot is available until 2016. The next generation of Iridium is expected to be implemented around that time frame. The Pilot is very easy to install and maintain, with a waterproof body and USB interface. With this simple interface, a laptop can be connected and surfing the web within minutes. Recently, the cost has become more affordable with a per-data-usage cost structure. Since Iridium operates in the L band, it is nearly impervious to weather. Iridium is used primarily for marine communication.

Bidirectional

This method involves bidirectional (and typically wireless) transmission of data from a remote site to a server, with the ability to modify datalogger programs and/or sensor settings remotely. These methods generally require line-of-sight and security considerations (both network and physical). Sometimes service can be purchased from an Internet Service Provider (ISP) if there is commercial coverage in the area; otherwise, equipment can be installed manually in remote areas. Often, connectivity can be extended to computers on site. A combination of several methods may be required in certain situations.

ISM band radio network

Fig. Three different antenna types used for bi-directional microwave band communication: a) 5.x GHz 24" dual-polarity dish, 30 dBi gain; b) 2.4 GHz single-polarity grid, 24 dBi gain; and c) 5.x GHz dual-polarity panel, 23 dBi gain. The higher the gain value, the narrower the antenna directivity, increasing signal strength in the desired direction and rejecting adjacent interference. Each of these designs has pros and cons, depending on the application.

(Unlicensed; 900 MHz, 2.4 GHz, 5.x GHz.) ISM band radios are commonly referred to as "WiFi" radios (even though these are generally used as backhauls and not wide-area access points) and come in a variety of frequencies. This method has many advantages in that it is nonproprietary, has no recurring costs, uses public radio frequencies, allows transmission of large datasets, utilizes inexpensive hardware, is not restricted to a single vendor or device type, and has increasing compatibility with many devices. However, it requires line-of-sight (LoS) or near-line-of-sight (nLoS), a network interface on loggers and devices, and basic to advanced network administration skills. These radios are also subject to interference, particularly in more populated areas, and can have higher power requirements than other transmission methods.
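When planning an ISM band link, a first-pass link budget helps decide which antenna gain is needed for a given distance. The sketch below uses the standard free-space path loss formula, FSPL(dB) = 20 log10(d in km) + 20 log10(f in MHz) + 32.44. The transmit power, antenna gains, distance, and miscellaneous loss are hypothetical, and real links will see additional losses from terrain, vegetation, and interference.

```python
import math


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a clear line-of-sight path."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def received_power_dbm(tx_dbm: float, tx_gain_dbi: float, rx_gain_dbi: float,
                       distance_km: float, freq_mhz: float,
                       misc_loss_db: float = 0.0) -> float:
    """Estimated received signal level for a point-to-point link."""
    return (tx_dbm + tx_gain_dbi + rx_gain_dbi
            - fspl_db(distance_km, freq_mhz) - misc_loss_db)


# Hypothetical 10 km link at 5800 MHz with 23 dBi panels at both ends,
# a 20 dBm transmitter, and 2 dB of cable and connector loss.
print(round(fspl_db(10, 5800), 2), "dB path loss")
print(round(received_power_dbm(20, 23, 23, 10, 5800, misc_loss_db=2), 2), "dBm received")
```

The estimate can then be compared against the receiver sensitivity listed by the radio manufacturer to judge how much fade margin remains.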

Cellular

This method has widespread coverage and minimal ongoing maintenance. However, it requires that a reliable cellular network be present and comes with monthly recurring costs. Occasionally a contract may be required, unless service can be negotiated through a university or organization.

Vendor-specific radio network

Vendor-specific radio networks use proprietary protocols and are typically more expensive than some other acquisition methods, but have the advantage of being relatively easy to set up and maintain. One example is Freewave.

Satellite internet

This method can get limited 2-way connectivity into a remote site, albeit at high monetary costs and significant power consumption. It has slow uplink speeds and high latency, requires a subscription, and requires on-site vendor setup.

Licensed radio

This method is expensive and requires the purchase of a licensed frequency.

Mesh Networks

A mesh network is a network topology in which each node relays data throughout the network. A mesh network whose nodes are all connected to each other is a fully connected network. Due to the inherent redundancy in mesh network design, mesh networks are typically quite reliable, as there is often more than one path between a source and a destination in the network. Mesh networks are typically wireless, but can be wired. Fully connected mesh networks are not very common, especially at large spatial scales, since every device must be connected to every other device. The initial investment to build such a network is considerably higher than for other acquisition methods. Mesh networks, either partially or fully connected, are most commonly used in distributed sensor networks.
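The cost of full connectivity grows quickly with network size, because n nodes need n(n - 1)/2 links. The short sketch below illustrates that growth; the node counts and the function name are arbitrary examples.

```python
def full_mesh_links(n_nodes: int) -> int:
    """Number of point-to-point links in a fully connected mesh of n nodes."""
    return n_nodes * (n_nodes - 1) // 2


for n in (5, 10, 20):
    print(n, "nodes need", full_mesh_links(n), "links")
```

A partially connected mesh keeps some of the redundancy while needing far fewer links, which is one reason it is the more practical choice at large spatial scales.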

Wired

While all methods discussed above utilize wireless transmission protocols, wired bidirectional transmission may be possible via in-ground or aerial copper or fiber optics.

Best Practices

- Think about data acquisition as part of site design. It is more expensive to add telemetry to a preexisting site than to integrate it with initial site construction.
- Make sure to include acquisition method power consumption in the site power budget, or a separate power system will be required.
- Use software tools with a radio or a handheld spectrum analyzer to survey RF conditions on-site. For instance, urban areas are typically noisier with respect to RF interference, and for Wi-Fi transmission methods, 5 GHz frequencies are preferred.
- Use a bidirectional transmission method to provide more control and flexibility.
- Over-engineer the power system, especially when powering repeaters and other sites in hard-to-reach areas.
- Use equipment that can conserve power (sleep mode).
- Provide adequate local storage for disrupted transmissions. Adequate “off logger” local storage is recommended to avoid losing data when/if the logger is reset.
- Provide redundancy, such that when one link goes down, the site is still remotely accessible. This is related to network architecture planning: multiple geographic/hardware paths along backhaul routes to field hubs are highly desirable. Examples include parallel backhauls, multiple internet points of access, and "failover" paths. Having a "backdoor" into the network, even over reduced speed links, can allow a tech to remotely troubleshoot problems on the main links.
- Standardize the transmission protocol across all sites to provide easier network management.
- Match radio band, power, antenna, and bandwidth to the application. For instance, when a site generates high frequency sensor data, high bandwidth and high data collection frequency are recommended.
- Use a narrow bandwidth for your RF devices and coordinate frequencies between radio systems.
- Thoroughly document all site coordinates, IP addresses, maps, radio azimuth, and zenith.
- When using an IP-based acquisition method, use public IP addresses for easier remote management of devices.

Case Studies


- NevCAN: Nevada Climate-ecohydrological Assessment Network - University of Nevada, Reno (UNR); Desert Research Institute (DRI), University of Nevada, Las Vegas (UNLV)
- Sevilleta Wireless Network - Sevilleta Long Term Ecological Research (LTER) Program and Sevilleta Field Station; Department of Biology; University of New Mexico (UNM), Albuquerque, New Mexico, USA
- Virginia Coast Reserve LTER Wireless Network - Virginia Coast Reserve Long Term Ecological Research (LTER) Program
- SNOTEL
- ITU

Resources

GOES

- New NESDIS Assignments
- CS2 Standard Compliance

References


Sensor Management Tracking and Documentation

Outlines the importance of communication between field and data management personnel as field events may alter the data streams and need to be documented.


Fig. Documentation of sensor installation, maintenance, and related systems is critical to long-term data usability.

Contents
1 Overview
2 Introduction
3 Methods
    3.1 What should be tracked
        3.1.1 Documentation at setup time
        3.1.2 Infrastructure events to track
        3.1.3 Site level events to track
        3.1.4 Sub-component events to track
    3.2 How to track the information
4 Best Practices
    4.1 Document specific information during normal operations
    4.2 Maintaining the records and linking to affected data streams
    4.3 Managing sensor configurations
5 Case Studies

Overview

Automated observation systems need to be managed for optimal performance. Maintenance of the overall sensor system includes anything from repairs, replacements, and changes to the general infrastructure to deployment and operation of individual sensors and seasonal or event-driven site cleanup activities. Any of these activities in the field may affect the data being collected. Therefore, consistent and uniform records of maintenance, service, and changes to field instrumentation and supporting infrastructure serve as metadata for long-term quality control and evaluation of the sensor data.

In this chapter, we describe the types of management records that should be kept and the various methods for collecting, maintaining, communicating, and connecting this information to the data. It is important to create tracking and documentation protocols early on because these protocols will support and guide communications and work between field and data management personnel.

Real-time monitoring of system health and alerting systems are discussed in the middleware, quality control, and transmission sections of this document. Although some of these parameters do not affect the actual data quality, tracking of these system performance diagnostic data may be helpful to detect patterns and prevent future data loss, intervene remotely, and schedule site visits more effectively. Calibration procedures and schedules, maintenance activities, and replacement schedules are hardware specific and will not be covered here in detail.

Introduction

Data are collected to detect changes in the environment, effects of treatments, disturbances, etc., and in all data collection great care is taken not to mask the signature of events of interest with impacts from unavoidable, sampling-related disturbances. Field notes are usually associated with the raw data to be able to discern a natural event of interest from a management event. Data collection approaches using automated sensing networks are becoming more complex, with many people involved in the data gathering, management, and interpretation activities, and communication among all involved parties is becoming more important and more challenging. Field notes can be a useful vehicle for this communication. Everyone using older long-term data knows the value of field notebooks to help understand and interpret a dataset. Field notes are equally valuable to future users of a sensor data stream, particularly if the notes are interpreted such that information is integrated with the data via data qualifying flags and method description codes.

Currently there are no standards for flag code sets or for defining which events should be flagged and how to efficiently communicate with data users. Here we attempt to present a list of events that are useful to track and that have been helpful in the past to guide data users in the interpretation and evaluation of the data. To manage this information, the concepts of a ‘logical sensor’, a ‘physical sensor’, a ‘method’, and ‘event codes’ have proven useful.

A ‘logical sensor’ or a sensor data stream can be defined by a location, height/depth, and measurement parameter, regardless of what exact physical sensor or hardware is used to log measurements. An example would be ‘air temperature at 3 m above the ground at site A’. However, over time the ‘physical sensor’ will have to be calibrated and eventually replaced, and a new type of sensor may be chosen to provide more accurate measurements. If hardware is swapped out for technical reasons, the data stream still represents the site location for that measurement, and the notion of a ‘logical sensor’ allows identification of a consistent data stream over time.

Changes in the type of sensor or ‘method’ might be tracked with a method code associated with the logical sensor. Of course, should a replacement sensor be significantly different such that the past and new data streams are not comparable, then a new logical sensor stream should be initiated. Events such as routine calibration might be flagged with an ‘event code’ rather than a change in ‘method’, even if this event has lasting effects on the data, i.e., more accurate data. An event code may serve as a means to link to individual field notes for the event. ‘Physical sensors’ should also be individually identifiable by location and tracked through a calibration or replacement schedule.
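The relationship between logical sensors, physical sensors, and method codes can be sketched as a small data structure. This is an illustrative model only; the class names, field names, serial numbers, and method codes are hypothetical and do not represent any existing schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogicalSensor:
    """A data stream defined by where and what is measured."""
    site: str
    parameter: str
    height_m: float


@dataclass(frozen=True)
class PhysicalSensor:
    """A specific piece of hardware, identified by its serial number."""
    serial_number: str
    model: str


@dataclass(frozen=True)
class Deployment:
    """Links a physical sensor to a logical sensor for a period of time."""
    logical: LogicalSensor
    physical: PhysicalSensor
    method_code: str
    start: datetime
    end: Optional[datetime] = None  # None means still deployed


def deployment_at(deployments, logical, when):
    """Return the deployment serving a logical sensor at a given time, if any."""
    for d in deployments:
        if d.logical == logical and d.start <= when and (d.end is None or when < d.end):
            return d
    return None


air_temp = LogicalSensor(site="A", parameter="air temperature", height_m=3.0)
deployments = [
    Deployment(air_temp, PhysicalSensor("SN-001", "ModelA"), "M1",
               datetime(2012, 1, 1), datetime(2013, 6, 1)),
    Deployment(air_temp, PhysicalSensor("SN-002", "ModelB"), "M2",
               datetime(2013, 6, 1)),
]
current = deployment_at(deployments, air_temp, datetime(2014, 1, 1))
print(current.physical.serial_number, current.method_code)
```

In this sketch the logical sensor stays the same while the hardware and method code change, so a single data stream can be traced across sensor replacements.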

Methods

What should be tracked

Basic information on the site and hardware configuration needs to be recorded at installation time. During normal operations, event tracking can be done at several levels of granularity with respect to a research program. For example, it may be done at the level of the entire infrastructure, at a site, or at a sub-component of a site. The information about each event needs to be propagated or connected to all relevant data streams. Following are examples of what should be tracked at each of the above levels, in terms of impact on the recorded data:

Documentation at setup time


- Location: lat, long, elevation (and/or depth), direction (e.g., camera facing north)
- Location from a certain reference point (e.g., tower base)
- Site description
- Site photos with metadata, photos of procedures (how do you change...), photo of sensor (so others can easily recognize it)
- Manufacturer's specs and ID of instruments (make, model, serial number, range, precision, detection limit, calibration coefficient)
- Instrumentation (e.g., datalogger, multiplexer, sensor) wiring diagrams (this should be part of the logger program comments, a header section with the wiring description channel by channel)
- Power wiring diagrams (e.g., how many solar panels, are they hooked up in series or parallel, etc.)
- Network topology and IP addresses
- Software used for calculating measurements (other than datalogger)
- Instrumentation deployment date (the “go live” date)

Infrastructure events to track

- Changes to dataloggers, multiplexers, or datalogger programs (datalogger programs may be archived)
- Power problems, including battery voltage
- Enclosure temperature and humidity
- Platform maintenance (e.g., tower inspection, tram line leveling, etc.)
- Sampling protocol changes (e.g., timing, routine changing or upgrading of sensor parts, instrument change or replacement)
- RF/network performance degradation (prevents some/all data from being transmitted; track health/status of IP network devices using SNMP streams to Nagios, etc.)

Site level events to track

- Site disturbance (e.g., animal, human, weather caused)
- Site visits (presence of people may change measurements)
- Site maintenance (e.g., cutting brush, cutting trees, etc.)
- Changes to sensor network design, including additions or deletions of sensors

Sub-component events to track

Here, we include components like individual telemetry, power systems, instruments, sensor components, etc. While each component doesn’t affect the whole system, it still may influence the interpretation of the measurements. To track individual components, a system of IDs may be developed for all components and supported by barcodes, geo-location tags, and microchip-encoded sensors.

- Sensor failures
- Sensor calibrations
- Sensor removal
- Sub-sensor addition, removal, or change (pluggable sub-sensor positions within the main sensor need to be noted and kept consistent)
- Sensor installation (replacement)
- Sensor maintenance (cleaning, change of parts)
- Sensor firmware upgrades
- Enclosure temperature and humidity
- Repositioning of sensor (e.g., move up during winter to be above snow line)
- Normal (non-extreme) disturbances as they are noted and removed (e.g., sticks in weirs)
- Methodology changes (e.g., temperature radiation shield change)

How to track the information


At a minimum, site events or problems might be documented or logged in a table structure such as:

SiteID | DataloggerID | SensorID | datetime begin | datetime end | category              | notes | person
       |              |          |                |              | controlled vocabulary |       |

However, usually a lot more is recorded at each site visit (see use cases). A controlled vocabulary is very important to categorize the event for later interpretation and flagging in the dataset, and it should be established as early as possible with project-specific terms. Several database structures to maintain this information and connect it to the actual data are currently being proposed and are discussed below in the use cases.
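One possible way to enforce a controlled vocabulary is to store the allowed categories in their own table and reference them from the event log. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and category terms are hypothetical examples, and a real project would define its own terms.

```python
import sqlite3

# Hypothetical controlled vocabulary; each project should define its own terms.
CATEGORIES = ("calibration", "maintenance", "disturbance", "sensor_failure")


def create_event_log(conn: sqlite3.Connection) -> None:
    """Create an event log whose category column only accepts known terms."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE category (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO category VALUES (?)", [(c,) for c in CATEGORIES])
    conn.execute("""
        CREATE TABLE site_event (
            site_id        TEXT NOT NULL,
            datalogger_id  TEXT,
            sensor_id      TEXT,
            datetime_begin TEXT NOT NULL,
            datetime_end   TEXT,
            category       TEXT NOT NULL REFERENCES category(name),
            notes          TEXT,
            person         TEXT NOT NULL
        )""")


conn = sqlite3.connect(":memory:")
create_event_log(conn)
conn.execute(
    "INSERT INTO site_event VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("A", "DL-1", "SN-001", "2014-07-01T12:10", "2014-07-01T12:20",
     "calibration", "two-point calibration", "field tech"),
)
print(conn.execute("SELECT COUNT(*) FROM site_event").fetchone()[0])
```

With this layout, an insert that uses a category outside the vocabulary is rejected by the database rather than silently stored.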

Best Practices

Establish and document procedures and protocols for site visits, installation of new sensors, maintenance activities, calibrations, and communication between field and data personnel. Such protocols may include pre-designed field sheets or software applications on field data entry devices, both of which should be synchronized with a central storage system to which all parties have access. Observations in the field may also be made and recorded by researchers and field personnel not directly involved in the sensor system maintenance, and provisions should be made to capture that information and communicate it to responsible staff members.

In addition to capturing the field events mentioned above, it is good practice for the data management staff to regularly monitor the data and confer with the field crew when anomalies are noticed. This frequently will bring up additional information that needs to be recorded in the field. It is also good practice to have the data management staff visit the site and periodically assist with field maintenance activities to better understand and interpret field notes and generally interact with the field staff.

All physical sensors should be uniquely identifiable. This may be achieved by recording a serial number, attaching a barcode, or using intelligent sensors which are capable of storing their own metadata and which can be accessed upon connection. This is particularly important for sensors that are moved around or are pulled for mass calibration and redeployed. Sensor location and calibration schedules should be tracked for each sensor by its ID.

Document specific information during normal operations

- Either a pre-designed field sheet or a data entry app on a field device (tablet, laptop, etc.) helps remember every detail to record. It is also helpful to define a list of terms to describe the most common problems in a consistent way for later analysis.
- Document site ID, date, time, person(s), site conditions, and tasks performed every time a site is visited.
- When updating datalogger programs, use a new program name for every change. It is advisable to save old datalogger programs.
- Use a change log section in a datalogger program comment header to note date, author, and description of differences from the last datalogger program (i.e., versioning/revision control).
- For sensor-specific events, note the sensor ID (barcodes, geo-location tags, microchip-encoded sensors (NEON 'Grape'), or intelligent sensors that store and provide their own metadata upon connection).

Maintaining the records and linking to affected data streams

As mentioned earlier, this record keeping is an effort in communication between field and data personnel as well as communicating events to future data users. Hence a good practice is to permanently link this information to the dataset. This may be achieved on different levels: a description in a metadata document, an indicator of a method for a data series or each data value, or a flag indicating a one-time event at a certain data value. As a minimum, affected data should be flagged in a different column within the data table.

Following the concept of a logical sensor, certain events should trigger the start of a new ‘method’ description when the data stream is affected more than regular corrections can accommodate (e.g., a new sensor using a different method of measurement). In this case it is good practice to run the old and the new sensor side by side for a while to compare. No hard and fast guidelines are available for deciding when a method change occurs and when a whole new logical sensor stream (i.e., a different dataset or data table) should be started. These concepts are well implemented in the CUAHSI ODM; please see those documents for further discussion.

Most events, however, can be handled by well-documented flags (sensor calibration, site maintenance activities, disturbances, etc.). For documentation, flags in the data file should link to a database with more extensive explanations of the events.

Managing sensor configurations
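Flagging can be automated by comparing each measurement time stamp with the logged event windows. The sketch below is a minimal illustration; the row layout, the event dictionary keys, and the single-letter flag code are assumptions, since the text notes that no standard flag code set exists.

```python
from datetime import datetime


def flag_rows(rows, events):
    """Add a flag column to (timestamp, value) rows that fall inside event windows.

    Each event is a dict with 'begin', 'end', and 'flag' keys (hypothetical layout).
    Rows outside every event window receive an empty flag.
    """
    flagged = []
    for timestamp, value in rows:
        codes = [e["flag"] for e in events if e["begin"] <= timestamp <= e["end"]]
        flagged.append((timestamp, value, ",".join(codes)))
    return flagged


rows = [
    (datetime(2014, 7, 1, 12, 0), 21.4),
    (datetime(2014, 7, 1, 12, 15), 35.0),   # recorded during a site visit
    (datetime(2014, 7, 1, 12, 30), 21.6),
]
events = [
    {"begin": datetime(2014, 7, 1, 12, 10),
     "end": datetime(2014, 7, 1, 12, 20),
     "flag": "M"},                           # hypothetical code for maintenance
]
for row in flag_rows(rows, events):
    print(row)
```

Keeping the flag in its own column leaves the measured value untouched, so users can decide for themselves how to treat flagged data.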

A number of sensors provide core measurements, but also provide the ability to expand the sensor via one or more pluggable ports. When a sub-sensor is connected, the data from the sub-sensor are usually added to the main data stream as a voltage measurement that gets converted to the measurement parameter units post-transmission. Track both the number of sub-sensors and their port positions, since a change to either may cause problems in processing the data stream in middleware applications. For instance, a water sampler like a CTD may provide ports to connect sub-sensors for dissolved oxygen or turbidity measurements. Note that the DO sub-sensor should always be connected to, say, voltage port 1, the turbidity sensor should always be connected to voltage port 2, and voltage port 3 should remain empty.
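A documented port configuration can also be used directly when converting voltages to measurement units. The sketch below assumes a simple linear calibration for each sub-sensor; the slopes, offsets, parameter names, and port assignments are hypothetical and only mirror the example in the text.

```python
# Hypothetical port map for a CTD with three voltage ports.
PORT_CONFIG = {
    1: {"parameter": "dissolved_oxygen_mg_l", "slope": 4.0, "offset": 0.0},
    2: {"parameter": "turbidity_ntu", "slope": 100.0, "offset": -5.0},
    3: None,  # port 3 is documented as empty
}


def convert_voltages(voltages, port_config=PORT_CONFIG):
    """Convert {port: volts} readings to {parameter: value} using the port map.

    Raises ValueError when a voltage arrives on a port with no documented
    sub-sensor, which usually means the configuration has changed.
    """
    converted = {}
    for port, volts in voltages.items():
        config = port_config.get(port)
        if config is None:
            raise ValueError(f"voltage reported on unconfigured port {port}")
        converted[config["parameter"]] = config["slope"] * volts + config["offset"]
    return converted


print(convert_voltages({1: 2.0, 2: 0.5}))
```

Failing loudly on an unexpected port is a deliberate choice in this sketch, since a silently misassigned sub-sensor would otherwise produce plausible but wrong values.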

See also middleware capabilities and QA/QC procedure documentation in those respective sections.

Case Studies

Case study: Data model for tracking sensors and sensor maintenance at the Utah Water Research Laboratory (J. Horsburgh, September 2013)

The database design diagram depicts the data model as it is used at the Utah Water Research Laboratory, Utah State University. It was developed by J. Horsburgh and his research team. Currently efforts are underway to extend the CUAHSI ODM to store this kind of metadata based on the experience with this data model.

Case study: Two example field sheets from the HJ Andrews Experimental Forest in Oregon.

- HJ Andrews stream gage check sheet
- HJ Andrews watershed check sheet


Streaming Data Management Middleware

Discusses software features for managing streaming sensor data.


Fig. 1 The position of middleware in a generic sensor data management system.

Contents
1 Overview
2 Introduction
    2.1 Research Agenda
    2.2 Technological Requirements
    2.3 Personnel Skills
3 Middleware Classifications
    3.1 Classification by functionality
    3.2 Classification by propriety and type
4 Best Practices
5 Case Studies
    5.1 Marmot Creek Research Site, Rocky Mountains, Canada
    5.2 University of Texas at El Paso's System Ecology Lab, Jornada research site, NM
6 Middleware Package Descriptions
7 References

Overview

Middleware comprises software packages and procedures that reside virtually between data collectors, such as automated sensors, and data ‘consumers’, such as data repositories, websites, or other software applications. Middleware can be used to perform tasks such as streaming data from dataloggers to servers, archiving data, analyzing data, or generating visualizations.

Many middleware packages are available for developing a comprehensive, reliable, and cost-effective environmental information management system. Each middleware option can have a unique set of requirements or capabilities, and costs can vary widely. A single middleware package may be used if it meets all of the user requirements, or multiple middleware packages may be bundled into a data management system if they are compatible or interoperable with each other and the rest of the data collection and management system.

This section describes multiple middleware packages that are currently available, and provides examples of how different software and procedures are being used to collect, analyze, visualize, and disseminate sensor-supplied environmental data.


Introduction

There are multiple factors that may affect the choice, use, and performance of middleware. These factors may be classified according to a group’s research agenda, technological requirements, and personnel skill sets.

Research Agenda

The research agenda of a group is a major determinant of the type of middleware system needed. A group focused on only one or a few narrowly focused research questions may need fewer types of sensors and, consequently, fewer software modules may be adequate to streamline data processing from collection to the end goal. A team that investigates multiple questions spanning multiple research domains is likely to use more diverse and/or larger sets of sensors. There may not be a single middleware package that can meet all of the needs of a research group. In this case, multiple packages will need to be linked into a workflow.

Technological Requirements

The technological requirements of a research program may vary from simple to complex. If the research can be done with sensors from a single, well-managed company, the proprietary software packaged with the purchased sensor network may be adequate for at least a major portion of the information management system. For example, for Campbell Scientific dataloggers, the “LoggerNet” software integrates communication, data download, display, and graphics functions. However, some dataloggers and sensors (particularly innovative, custom-built ones) may need custom-written software. It is important to plan time and budgets for required software upgrades, licensing, additional packages, support, and maintenance. Systems that cost less at the outset may not always be cheaper over the long run. It is also important to consider how best to meet infrastructure and bandwidth requirements while deploying middleware on a variety of servers or laptop computers in the field or lab setting. Depending on the data and hardware infrastructure characteristics, each middleware option can introduce benefits or drawbacks to the overall system functionality.

Personnel Skills

Another key factor to consider is the skill set of the personnel. A complex data management system may require multiple people, each with a unique skill set such as database design, system architecture, web programming, etc. It is important to correctly identify each person’s skill set and role in data management tasks. It may also be necessary to plan for additional hires or job training to address various scenarios and solutions, to identify appropriate salaries, and to budget enough time for software development and system administration. More details about the personnel roles and skills can be found in the “Roles and required skill sets” section.

Middleware Classifications

Classification by functionality

Middleware can be classified with respect to the functionality they provide, such as:

Controlling instrumentation and data collection: Modules may be used to control sampling intervals, manage event-triggered (burst) or continuous sampling regimes, and communicate and transfer data between the instrumentation and other system components.

Data monitoring, processing, and analysis: Modules may provide alarm management, perform automated QA/QC on data streams, or run derivative calculations including averages, aggregation and accumulation, data shifting and transformation, and filtering of time series records with respect to dates, value range, location, station/variable type, or other criteria.

Export and publishing of data: Modules may provide functionality to export sensor data to different formats (e.g., ASCII, binary, or XML) or different archives, make data discoverable through geospatial catalogues, or publish the data through web services.

Data visualization: Modules may provide visualization (e.g., tables, graphs, sonograms) of geospatial and/or time series data from sensor arrays or workflow structures.

Documentation: Modules may be used to document field events through paperless collection of field data, integrate sensor data and documentation (see sensor tracking & documentation section), or handle sensor calibration records.

Other supported functionality: Modules may be used to provide access to external data (e.g., ODBC, JDBC, OLE DB), to connect or chain other middleware components, or to implement mobile applications.
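The "data monitoring, processing, and analysis" functions listed above can be illustrated with a small sketch. The following Python example is not taken from any of the packages discussed here; the function names, the sample readings, and the plausibility limits are all assumptions chosen for illustration. It filters a time series by date window and value range, then aggregates the remaining records into daily means.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def filter_records(records, start, end, lo, hi):
    """Keep (timestamp, value) records inside a date window and a value range."""
    return [(t, v) for t, v in records if start <= t <= end and lo <= v <= hi]


def daily_means(records):
    """Aggregate (timestamp, value) records into one mean value per calendar day."""
    by_day = defaultdict(list)
    for t, v in records:
        by_day[t.date()].append(v)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


# Hypothetical air temperature readings in degrees C.
readings = [
    (datetime(2014, 7, 1, 0, 0), 11.0),
    (datetime(2014, 7, 1, 12, 0), 19.0),
    (datetime(2014, 7, 2, 0, 0), 12.0),
    (datetime(2014, 7, 2, 12, 0), 999.0),  # implausible spike, outside the range
]

clean = filter_records(
    readings, datetime(2014, 7, 1), datetime(2014, 7, 3), lo=-40.0, hi=60.0
)
summary = daily_means(clean)
```

In a production system these steps would more likely run inside the middleware itself or as scheduled database queries; the sketch only shows the kind of operation meant by filtering and aggregation.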

Classification by propriety and type

Middleware can also be classified by software proprietary rights and whether they are considered applications or platforms. Accordingly, we can identify different groups of middleware:

Proprietary data management applications and platforms
Proprietary research applications
Limited open source applications (free packages that can be used with proprietary solutions)
Open source data management applications and platforms
Open source research applications and programming languages

Some of the applications and platforms listed above are often identified as the software of choice by many different organizations. More details about each of these components are provided in the next section of this document.

Best Practices

Choosing the middleware components that will best fit the tasks and work environment can be challenging. In addition to the personnel roles and skills, budget, and infrastructure considerations already discussed in the Introduction and other chapters of this best practices guide, it is important to be aware of the whole sensor management process in order to identify suitable middleware components. In some cases, proprietary middleware software will be required as part of the information management system if the instrumentation only outputs data in a proprietary format. In other cases, multiple open source software packages may be suitable for chaining into a comprehensive system that manages data from collection to final archiving and sharing.

Some steps in selecting middleware are:


Fig. 2. Sensor management workflow. Simple sensor management configuration is presented in blue; optional system components are shown in grey.

1. Identify your objectives. What do you want the middleware to do?
2. Assemble a list of candidate software.
3. Rate the candidates based on capabilities, cost (keeping in mind that a simple-to-use but expensive package may cut costs in the long term), stability, and ease of use with respect to the personnel skills available on your team.
4. If no single software product can meet all the objectives, test to see how well different candidate software integrate with one another to perform the needed functions.
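One simple way to carry out step 3 is a weighted scoring matrix. The sketch below is only an illustration: the criteria weights, the 1 to 5 scores, and the package names are hypothetical and would need to be set by each research group.

```python
def rank_candidates(scores, weights):
    """Return (name, weighted total) pairs sorted from best to worst."""
    totals = {
        name: sum(weights[criterion] * s[criterion] for criterion in weights)
        for name, s in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weights (summing to 1) for the criteria named in step 3.
weights = {"capabilities": 0.4, "cost": 0.2, "stability": 0.2, "ease_of_use": 0.2}

# Hypothetical scores on a 1 (poor) to 5 (excellent) scale.
scores = {
    "Package A": {"capabilities": 5, "cost": 2, "stability": 4, "ease_of_use": 3},
    "Package B": {"capabilities": 3, "cost": 5, "stability": 3, "ease_of_use": 4},
}

ranking = rank_candidates(scores, weights)
for name, total in ranking:
    print(f"{name}: {total:.2f}")
```

A ranking of this kind is a starting point for discussion rather than a decision rule; integration testing (step 4) can still change the outcome.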

During this planning stage, consider the following recommendations:

Identify workflow components and describe their functional requirements from the instrumentation to the archive level of organization (see Figure 2). Some components can be optional or part of the more complex solutions.
Plan for robust execution and choose software and hardware components that can handle the loss of connectivity, power, or other failures related to harsh environmental or operational conditions.
Choose reusable/sharable components.
Keep field deployment of middleware as simple as possible (keep out of field if possible).
Use as few middleware components as possible based on research group requirements.
Document and diagram the entire workflow and update as needed.

Case Studies

We present several real-world case studies that vary widely in the types of ecosystems in which sensors are deployed and in the complexity of the information management system. Some case studies include proprietary software only, some include free or open-source software, and some include both.

Marmot Creek Research Site, Rocky Mountains, Canada


Fig. 3. Marmot Creek research site's PakBus network with mixed dataloggers and Raven to RF401 base.

Introduction

The Marmot Creek research site is located on the eastern slopes of the Rocky Mountains in Alberta, Canada. The site is dominated by needleleaf vegetation and poorly developed mountain soils. Precipitation, snow depth, soil moisture, soil temperature, shortwave and longwave radiation, air temperature, humidity, wind speed, and turbulent fluxes of heat and water vapour datasets are collected and used for the hydrological modelling of the Marmot Creek Basin. Time series records are obtained at the Hay Meadow, Upper Clearing, Vista View, Fisera Ridge, and Centennial Ridge hydro-meteorological stations, which are equipped with different sensor configurations and Campbell Scientific dataloggers.

Communication equipment and methods

The telemetry network consists of one Raven CDMA cellular modem and an RF401 spread spectrum radio modem located at the Upper Clearing base station, four additional RF401 modems located at the meteorological stations serviced by telemetry, and a desktop computer located at the University of Saskatchewan. The radios connected to the dataloggers at each of the meteorological stations talk to the base station on an ongoing basis. All of the dataloggers and RF401 radios have PakBus addresses and operate as PakBus nodes. The dataloggers are also set to operate as routers, enabling routing inside this network through various paths. The telemetry network configuration is presented in Figure 3.

Data collection and processing

Data are collected from the meteorological stations at the intervals prescribed within the LoggerNet application running on a desktop computer. The Raven CDMA modem transfers data using a dynamic IP address and its static alias, associated through the AirLink IP Manager software. A unique PakBus address is assigned to each of the dataloggers in this telemetry network. In most cases, logger data files at the off-site location are appended on a daily, four-hourly, and hourly basis. In addition to the scheduled intervals, field data can be downloaded on demand through the LoggerNet application.

The LoggerNet “Task Master” utility is used to execute custom programs after each successful collection of the field data. The utility can also be used to start scheduled executions of different programs and operations. For Marmot Creek records, Task Master is used to rename the collected datalogger files and upload them to the FTP server.
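A custom program of the kind launched after each collection could look roughly like the following. This is a sketch under stated assumptions, not the script used at Marmot Creek: the file naming pattern, the example file name, and the function names are hypothetical, and the FTP host and credentials would be supplied by the site. It uses only Python's standard `ftplib` and `pathlib` modules.

```python
from datetime import datetime
from ftplib import FTP
from pathlib import Path


def stamped_name(path, collected_at):
    """Build a new file name that carries the collection time (assumed pattern)."""
    p = Path(path)
    return f"{p.stem}_{collected_at:%Y%m%d_%H%M}{p.suffix}"


def rename_and_upload(path, collected_at, host, user, password):
    """Rename a collected datalogger file and upload it to an FTP server."""
    src = Path(path)
    dst = src.with_name(stamped_name(src, collected_at))
    src.rename(dst)
    with FTP(host) as ftp:  # connection is closed when the block exits
        ftp.login(user, password)
        with dst.open("rb") as fh:
            ftp.storbinary(f"STOR {dst.name}", fh)
    return dst


# Hypothetical file name; only the naming step is exercised here.
example = stamped_name("UpperClearing_15min.dat", datetime(2014, 12, 1, 6, 0))
print(example)
```

Renaming before upload keeps each collection as a distinct file on the server, which makes it easier to detect missed transfers.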

Data publishing

Field data downloaded to the off-site computer are accessed by the RTMC Pro LoggerNet utility. Last measured values are mapped to the specified locations on a web page. The web server hosts different RTMC files for daily summary information, station data tables, alarms, and other records. The main interface contains individual windows for the main screen web page as well as the screens for individual stations, weekly data graphs, and site information. RTMC files interface with the web page via the RTMC Web Server desktop utility.

Reference

Centre for Hydrology, University of Saskatchewan. University of Saskatchewan Hydrology Field Data Retrieval and Management Manual. 2009. PDF file.


University of Texas at El Paso's Systems Ecology Lab, Jornada research site, NM

Middleware used: Hobolink, MySQL, R, ArcGIS, HTML5/JavaScript website

Introduction

The Systems Ecology Lab (SEL) at the University of Texas at El Paso studies patterns and controls of land-atmosphere water, energy, and carbon fluxes in both arctic and desert biomes. At the USDA-ARS Jornada Experimental Range in southern New Mexico, SEL’s research site collects data using more than 100 automated sensors (made by Campbell, Onset, Decagon, PP Systems, and others) and manual field observations. Sensors are mounted on an eddy covariance tower, eight connected mini-towers (which together form a wireless sensor network), and a cart mounted on a 110 m long tramline. More than 4 GB of data are collected per week from micromet, hyperspectral, and gas flux sensors, as well as cameras (which detect changes in phenology). This research site is also used to help develop and test new cyberinfrastructure and information management concepts and tools. For this case study, we focus solely on measurements made by the 8-node wireless sensor network.

Fig. 4. SEL-Jornada research information system framework in terms of data flow (symbolized by arrows). Web services are used by web-based applications to query data from databases.

Data Collection and Processing

The 8-node wireless sensor network is composed exclusively of Onset’s Hobo dataloggers (8) and sensors (62). Each datalogger is powered by its own solar panel. Sensors measure precipitation, leaf wetness, PAR, solar radiation, and soil moisture. Data are relayed to the Jornada Headquarters and sent to Onset. The data are available for visualization and downloading via Hobolink (http://www.hobolink.com). The online service allows users to set up alerts for system malfunctions and automated reporting of data. Our team developed a database schema (using MySQL for implementation) that uses a core concept common to all datasets - a measurement on a focal entity by an observer at a specific location and time - to organize data. Around this core, there are other tables to store metadata such as project information, maintenance records, and related files. The wireless sensor network data are imported into the database via custom SQL scripts. Within the database, scheduled queries can do basic data quality checking and flagging, and generate tables of summarized data (e.g., daily means) that can be accessed shortly after the raw data are imported.
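The core idea of this schema, together with the flagging and summary queries, can be sketched as follows. This is not the SEL schema: the table and column names, the flag code, and the plausibility limits are assumptions, and SQLite (via Python's standard `sqlite3` module) stands in for MySQL so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per measurement on an entity, by an observer, at a location and time.
conn.executescript("""
CREATE TABLE measurement (
    id          INTEGER PRIMARY KEY,
    entity      TEXT NOT NULL,
    observer    TEXT NOT NULL,
    location    TEXT NOT NULL,
    measured_at TEXT NOT NULL,
    variable    TEXT NOT NULL,
    value       REAL,
    flag        TEXT DEFAULT ''
);
""")

# Hypothetical soil moisture readings (volumetric fraction).
rows = [
    ("soil", "logger-1", "node-3", "2014-07-01 00:00", "soil_moisture", 0.12),
    ("soil", "logger-1", "node-3", "2014-07-01 12:00", "soil_moisture", 0.16),
    ("soil", "logger-1", "node-3", "2014-07-02 00:00", "soil_moisture", 1.70),
]
conn.executemany(
    "INSERT INTO measurement (entity, observer, location, measured_at, variable, value) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Basic quality check: flag values outside an assumed plausible range with 'R'.
conn.execute(
    "UPDATE measurement SET flag = 'R' "
    "WHERE variable = 'soil_moisture' AND (value < 0.0 OR value > 1.0)"
)

# Summary query: daily means computed from unflagged values only.
daily = conn.execute("""
    SELECT location, variable, date(measured_at) AS day,
           AVG(value) AS mean_value, COUNT(*) AS n
    FROM measurement
    WHERE flag = ''
    GROUP BY location, variable, day
    ORDER BY day
""").fetchall()
```

In a deployed system the two queries would be run on a schedule by the database server or an external job scheduler, and the summary would typically be written to its own table.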

Data Publishing

Our team wrote web services in Python to query the summarized data and deliver the results (in JSON format) to a JavaScript/HTML5 website in which AmCharts’ JavaScript Charts library (http://www.amcharts.com/; free for non-commercial use) is used to plot the data and generate reports including the data and pertinent metadata. This website also renders a map of the site and sensors via an ESRI ArcSDE geodatabase and ArcGIS Server.
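The JSON delivery step might be sketched as below. The payload layout, field names, and sample values are assumptions made for illustration; the source does not describe the actual response format of the SEL web services.

```python
import json


def summary_to_json(site, variable, units, daily_means):
    """Serialize daily means (keyed by ISO date string) with minimal metadata."""
    payload = {
        "site": site,
        "variable": variable,
        "units": units,
        "series": [
            {"date": day, "mean": value}
            for day, value in sorted(daily_means.items())
        ],
    }
    return json.dumps(payload)


# Hypothetical summarized data as it might come back from a database query.
body = summary_to_json(
    "node-3", "soil_moisture", "m3/m3",
    {"2014-07-02": 0.15, "2014-07-01": 0.14},
)
print(body)
```

Returning the series sorted by date and bundled with units lets a charting library on the client plot the data without further queries.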

Middleware Package Descriptions

In this section, several middleware packages are described as a basic introduction to the packages. No specific endorsement or criticism is implied, and it is important to keep in mind that software packages are often revised.

Aquatic Informatics Aquarius: AQUARIUS is software for water time series data management that provides functionality to correct and quality control time series data, build rating curves, transform and visualize hydrological data, and publish field data in real time. To accomplish these tasks, AQUARIUS uses three main components. The Data Acquisition System enables access to real-time data from the field instrumentation, either as an extension to EnviroSCADA or through hot folders. Aquarius Server is a web-controlled data management platform that enables centralized access to the datasets stored in the database. The server is also used to publish the data either through a public web portal or as a Representational State Transfer (REST) web service that supports the WaterML representation of the data. Finally, AQUARIUS Workstation provides a set of data processing tools to import and process the data, create rating curves, and apply QA/QC procedures to the time series records.

Aquarius system components
Aquarius modeling approaches

Campbell Scientific LoggerNet: LoggerNet is the main Campbell Scientific software application used to set up, operate, and manage a sensor network that uses Campbell Scientific equipment. LoggerNet uses serial ports, telephony drivers, and Ethernet hardware to communicate with dataloggers via cellular and phone modems, RF devices, and other peripherals. LoggerNet also includes a suite of tools such as text editors for creating Campbell Scientific datalogger programs, and methods for real-time monitoring, automated data retrieval, data post-processing, data visualization and monitoring of retrieved information, and data publishing options. More advanced features, such as export to MySQL or SQL Server databases, are also offered through additional LoggerNet applications not included in the standard version (LoggerNet Database, LoggerNet Admin, LoggerNet Remote, etc.).

LoggerNet 4.1 Instruction Manual

CUAHSI HIS: The Consortium of Universities for the Advancement of Hydrologic Science, Inc.'s Hydrologic Information System (CUAHSI HIS) is an advanced web service based system created to share hydrologic data. The system is comprised of hydrologic databases running the CUAHSI Observations Data Model (ODM) and servers hosted by different organizations that are connected through web services. Centralized HIS modules are used for data publication, access, and discovery, while local (and central) modules provide tools for data analysis and visualization. Overall, CUAHSI HIS is used to store observation data in a relational data model (ODM), access the data through internet-based Water Data Services that publish the observations and metadata using a consistent Water Markup Language (WaterML), index the data through a National Water Metadata Catalog, and provide discovery of data through a map and keyword search system.

CUAHSI HIS components
CUAHSI HIS list of publications
Development of a Community Hydrologic Information System

DataTurbine Initiative: DataTurbine is a real-time streaming data engine that acts as a black box to which data providers (sources) send data and from which consumers (sinks) receive data. DataTurbine is implemented as a multi-tier Java application with servers accepting and serving up the data, sources loading the data onto the servers, and sinks pulling the data for visualization and analysis purposes. Each of these components can be located on the same machine or on different computers, and they can communicate with each other over the internet. Data are heterogeneous, and the sinks can access any type of data seamlessly. As new data are loaded onto the server(s), old data are erased in order to free the receiving buffers.
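The buffering behaviour described above, where new data displace the oldest data, is that of a ring buffer. The following Python sketch illustrates the concept only; it does not use or reproduce the DataTurbine API, and the class and method names are hypothetical.

```python
from collections import deque


class ChannelBuffer:
    """Fixed-capacity buffer: sources append frames, sinks read what remains."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Called by a source to load one frame."""
        self._frames.append((timestamp, value))

    def fetch(self, since=None):
        """Called by a sink to pull frames, optionally from a start time onward."""
        return [f for f in self._frames if since is None or f[0] >= since]


buf = ChannelBuffer(capacity=3)
for t in range(5):          # a source sends five frames into a three-frame buffer
    buf.put(t, t * 10)

latest = buf.fetch()        # only the three most recent frames remain
recent = buf.fetch(since=3)
```

One consequence of this design is that a sink that falls too far behind loses data, so anything that must be kept permanently needs to be archived by a sink before the buffer wraps around.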

DataTurbine – Sensor Networks Workshop
Understanding DataTurbine

GCE Data Toolbox: The Georgia Coastal Ecosystems (GCE) Data Toolbox is a software library for metadata-based processing, quality control, and analysis of environmental data. It is designed and maintained by Wade M. Sheldon, Jr. of the Georgia Coastal Ecosystems LTER and is available free of charge, but does require a MATLAB license. The Toolbox can be used for a wide variety of environmental data management tasks such as: importing raw data from environmental sensors for post-processing and analysis; performing quality control analysis using rule-based and interactive flagging tools; gap-filling and correcting data using gated interpolation, drift correction, and custom algorithms/models; visualizing data using frequency histograms, line/scatter plots, and map plots; summarizing and re-sampling datasets using aggregation, binning, and date/time scaling tools; synthesizing data by combining multiple datasets using join and merge tools; mining near-real-time or historic data from the USGS NWIS, NOAA NCDC, NOAA HADS, or LTER ClimDB servers; harvesting and integrating channel data from DataTurbine servers. This software is highly modular and can be used as a complete, lightweight solution for environmental data and metadata management, or in conjunction with other cyberinfrastructure. For example, newly acquired data can be retrieved from DataTurbine or Campbell LoggerNet Database servers for quality control and processing, then transformed to CUAHSI Observations Data Model format and uploaded to a HydroServer for distribution through the CUAHSI Hydrologic Information System.

GCE Toolbox overview (Georgia Coastal Ecosystems LTER)

Kisters WISKI: The WISKI software package is a tool for hydrological data management. WISKI is a Windows-based client/server system hosted through MS SQL or Oracle databases. The software combines data management features with tools to collect, store, analyze, visualize, and publish observation data. Typical data input sources are remote data collected from field dataloggers, data imported from third parties via input files in different formats, records obtained from digitization of graphical charts, or manual inputs. The main WISKI module incorporates the data management functionality as well as the discharge and rating curve tools that work closely with other Kisters software components including KiWQM (water quality), KiWIS (data publishing through web services), SODA (telemetry hardware module for remote data collection), KiDSM (task scheduler), modeling apps (Link-and-Node and statistical forecast), ArcGIS extensions, and WebPublic and WebPro (web server publishing applications).

WISKI system overview
WISKI modules

NexSens iChart: NexSens iChart is a Windows-based data acquisition package designed for environmental monitoring applications. iChart supports interfacing both locally (direct connect) and remotely (through telemetry) with many popular environmental products such as YSI, OTT, and ISCO sensors. Additionally, it can interface with NexSens iSIC and submersible dataloggers. The software simplifies and automates many of the tasks associated with acquiring, processing, and publishing environmental data.

NexSens WQData and iChart software overview (http://nexsens.com/pdf/nexsens_wqdata_spec.pdf)
iChart software product spotlight (Lake Scientist)
NexSens data website (Bucknell University)
iChart quick start guide

Onset Hobolink and Hoboware: Onset has two main software applications to support its Hobo dataloggers and sensors. Hobolink is an online service that provides 5-minute data from its dataloggers, multiple graphs of data streams, a customizable interface, settings for automated alerts for sensor malfunction, and customizable data reporting features. Hoboware is a downloadable package that provides more functionality, such as line charts for more than one data stream, charting types that are unavailable in Hobolink, etc.

HOBOware Pro vs. HOBOware Lite List of features
HOBOware® User’s Guide (Data visualization and analysis)
HOBOlink® User’s Guide (Data access and control of HOBO devices)

Vista Data Vision: VDV is a data management system with tools to store and organize data collected from a variety of datalogger “dat” files. The software offers different visualization, alarming, reporting, and web publishing features. Datalogger files are parsed, imported, and stored in a MySQL relational database, from which the data can be custom queried and exported or published on a web server. Numerous access control options are available so VDV users can have customized access to specific station or sensor data.

Vista Data Vision brochures and manuals
Vista Data Vision version comparison
Vista Data Vision Review (LTER)

YSI EcoNet: EcoNet software works with YSI monitoring instrumentation. The software offers delivery of data from the field directly to the YSI web server. No desktop applications are used and all data are stored on the remote YSI computer. System users can access visualization, reports, alarms, and email notification tools directly on the YSI server.

EcoNet system overview
Embedding EcoNet data

The following tables describe features of middleware packages known to the authors of the wiki. These tables do not imply endorsement or criticism of any given product, and may reflect older versions of products than currently exist.

Basic: The software has built-in but basic features compared to the overall market.
Standard: The software has built-in features that are standard in comparison to the overall market.
Advanced: The software has built-in advanced features compared to the overall market.
Custom: The software doesn't have built-in features, but a programmer can develop them.
None: The software doesn't have the feature, and it cannot be custom-developed.
Has: The software has built-in features, but the level compared with the overall market is currently unknown.
Unknown: The capacity of the software is unknown.

Table 1. Middleware basic features: licensing, cost, input and export data formats, and required level of programming expertise.

| Program | Licensing | Cost | Input data format | Export data format | Needed programming expertise |
|---|---|---|---|---|---|
| Antelope Orb | Proprietary | Pay | ASCII, Binary | ASCII, Binary | Advanced |
| Aquarius | Proprietary | Pay | | | Advanced |
| ArcGIS | Proprietary | Pay | ASCII, shapefiles | ASCII, shapefiles | Advanced |
| B3 | Open source | Free | ASCII | ASCII | None to Basic |
| BigSense and LtSense | Open source | Free | Binary | CSV, JSON, TXT, XML | Advanced |
| Cosm | | | | | |
| CUAHSI HIS | Open source | Free | ASCII | XML, WaterML | Standard |
| DataTurbine | Open source | Free | ASCII, Binary | ASCII, Binary | Advanced |
| EddyPro | Proprietary | Pay | Binary | ASCII, Binary | Standard |
| GCE Toolbox | MATLAB is proprietary, Toolbox is open source | MATLAB is pay, Toolbox is free | ASCII, Binary, database | ASCII, Binary, .mat, database | Toolbox is Standard, MATLAB is Advanced |
| Hobolink (Onset) | Proprietary | Free | Proprietary | ASCII, Proprietary | None |
| Hoboware (Onset) | Proprietary | Pay | Proprietary | ASCII, Proprietary | None |
| Kepler | Open source | Free | ASCII, Binary | ASCII, Binary | Basic to Advanced |
| LakeAnalyzer | Proprietary/Open source | Free | ASCII | ASCII | Basic |
| LoggerNet (Campbell) | Proprietary | Pay | Proprietary | ASCII, database | Standard |
| Nexsen's Technology | Proprietary | Pay | Unknown | Unknown | Unknown |
| Pandas | Python is Free, Pandas is Free and Open Source | | Binary, encoded, np.array, database, markup | Binary, encoded, np.array, database, markup | Advanced and Custom |
| Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown |
| R | Open source | Free | ASCII, Binary, database | ASCII, Binary, database | Standard to Advanced |
| SAS | Proprietary | Pay | ASCII, Binary, database | ASCII, Binary, database | Standard to Advanced |
| Taverna | Open source | Free | Unknown | Unknown | Standard to Advanced |
| Vista Data Vision | Proprietary | Pay | ASCII | ASCII | Unknown |
| VizTrails | Open source | Free | ASCII | ASCII | Basic to Advanced |
| WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown |
| WISKI | Proprietary | Pay | ASCII | ASCII | Advanced |
| YSI EcoNet | Proprietary | Pay | Unknown | Unknown | Unknown |

Table 2. Middleware data handling features: hardware communication, ability to do quality assurance and control (QA/QC), ability to stream data to archives, data transformation and analysis, data visualization, and ability to generate custom SQL queries or other scripting.

| Program | Hardware communication | QA/QC capacity | Capacity to stream to archive | Data transformation and analysis | Data visualization | Custom SQL queries/Scripting |
|---|---|---|---|---|---|---|
| Antelope Orb | Custom | Custom | Custom | Custom | Custom | Custom |
| Aquarius | Has | Advanced | Advanced | Advanced | Advanced | Advanced |
| ArcGIS | Unknown | Advanced | Unknown | Advanced | Advanced | Standard to Advanced |
| B3 | None | Advanced | None | Has | Has | None |
| BigSense and LtSense | Custom | Custom | Has | Has | Unknown | Unknown |
| Cosm | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| CUAHSI HIS | Custom | Advanced | Advanced (ODM) | Advanced (HydroDesktop, ODM Tools, TSA) | Advanced (HydroServer TSA, HydroDesktop, external programs) | Advanced (HydroDesktop, ODM Tools) |
| DataTurbine | Custom | Custom | Custom | Basic (NEES RDV) | Basic (NEES RDV) | Has |
| DataFrames.jl | Through C and Python libs | Has, stats.jl, numpy | Has through code.native | Has through Gadfly, Matplotlib, D3, or Winston | Has | Has |
| EddyPro | Unknown | Has | Unknown | Has | Has | Unknown |
| GCE Matlab | Custom | Advanced | Standard to Advanced | Advanced (with Matlab) | Advanced | Advanced |
| Hobolink (Onset) | Basic | None | Basic | Standard | None | None |
| Hoboware (Onset) | Advanced | Has | Unknown | Advanced | Standard | None |
| Kepler | Custom | Custom | Custom | Custom | Custom | Custom |
| LakeAnalyzer | None | Basic | None | Has | Has | None |
| LoggerNet (Campbell) | Advanced | Basic | Basic | Basic | None | None |
| Nexsen's Technology | Has | None | Basic | Basic | None | None |
| Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| R | Custom with open-source tech | Custom | Custom | Advanced/Custom | Advanced/Custom | Custom |
| SAS | None | Custom | Custom | Advanced | Advanced | Custom |
| Taverna | Unknown | Custom | Custom | Custom | Custom | Custom |
| Vista Data Vision | None | Basic | Standard | Standard | Standard | Basic |
| VizTrails | Unknown | Custom | Unknown | Custom | Custom | Custom |
| WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
| WISKI | Has | Advanced | Advanced | Advanced | Advanced | Advanced |
| YSI EcoNet | Has | None | Basic | Basic | None | None |

Table 3. Middleware other features: task automation, capacity for multi-tier architecture, website publishing, streaming through web services, support for modeling.

| Program | Task automation | Multi-tier architecture | Website publishing | Streaming through web service | Support for modeling |
|---|---|---|---|---|---|
| Antelope Orb | Has | Standard | Custom | Custom | Unknown |
| Aquarius | None | Advanced | Advanced | Unknown | Unknown |
| ArcGIS | Advanced | Unknown | Advanced | Advanced | Advanced |
| B3 | Unknown | None | None | None | Has |
| BigSense and LtSense | Has | Has | Has (via RESTful services) | Has | Unknown |
| Cosm | Unknown | Unknown | Unknown | Unknown | Unknown |
| CUAHSI HIS | Advanced (ODM SDL) | Advanced | Advanced (HydroServer, Website, HydroSeek) | Advanced | Has (through external programs) |
| DataTurbine | Has | Standard | Basic | Standard | None |
| EddyPro | Has | Unknown | Unknown | Unknown | Unknown |
| GCE Matlab | Advanced | None | Advanced | None | Advanced (with Matlab) |
| Hobolink (Onset) | Basic | None | Advanced | Has | None |
| Hoboware (Onset) | Basic | None | None | None | None |
| Kepler | Custom | Unknown | Custom | Custom | Advanced/Custom |
| LakeAnalyzer | Custom (Matlab) | None | None | None | Basic (link to GLM model) |
| LoggerNet (Campbell) | Standard | Basic | Basic | None | None |
| Nexsen's Technology | Basic | Basic | Basic | None | None |
| Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown |
| R | Custom | None | Custom (shiny package) | Custom | Advanced/Custom |
| SAS | Custom | None | Custom | Custom | Advanced/Custom |
| Taverna | Custom | None | Unknown | Custom | Advanced/Custom |
| Vista Data Vision | None | Standard | Advanced | None | None |
| VizTrails | Custom | None | Custom | Custom | Advanced/Custom |
| WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown |
| WISKI | Advanced | Advanced | Advanced | Advanced | Has |
| YSI EcoNet | Basic | None | Basic | None | None |

References

“OGC WaterML Standard Recommended for Adoption as Joint WMO/ISO Standard.” Open Geospatial Consortium, 10 Dec. 2012. Web. 14 May 2013.


Sensor Data Quality

Discusses different ways sensor data may be compromised and how to control for it in the data stream.


Contents
1 Contacts
2 Overview
3 Introduction
4 Methods
    4.1 Sensor Quality Assurance (QA)
    4.2 Quality Control (QC) on data streams
        4.2.1 Data qualifiers (data flags)
        4.2.2 Data quality level
        4.2.3 Data collection interval
    4.3 Data Management
5 Best Practices
6 Case Studies
7 References
8 Resources

Contacts

The primary editors for this page may be contacted for questions, comments, or help with content additions.

Don Henshaw – U.S. Forest Service Research, Pacific Northwest Research Station – don.henshaw at oregonstate.edu
Mary Martin – Hubbard Brook LTER, University of New Hampshire – mary.martin at unh.edu

Overview

A new generation of environmental sensors and recent major technological advancements in the acquisition and real-time transmission of continuously monitored environmental data present a major challenge in providing quality assurance (QA) and quality control (QC) for high-throughput data streams. Deployments of sensor networks are becoming increasingly common at environmental research locations, and there is a growing need to access these large volumes of data in near real-time. However, the direct release of streaming sensor data raises the likelihood that incorrect or misleading data will be made available. Additionally, as research applications begin to rely on real-time data streams, the continual and consistent delivery of this information will be essential. This increasing access to and use of environmental sensor data demands the development of strategies to assure data quality, the immediate application of quality control methods, and a description of any QA/QC procedures applied to the data.

Traditional QC systems tend to operate on file-based collections of environmental data from field sheets, field recorders or computers, or downloaded datalogger files. Manually applied tools and techniques such as graphical comparisons are used to provide data validation. Documentation is typically not well-organized and not directly associated with data values. The application of these systems must balance the need for release without months or years of delay versus the delivery of well-documented, high quality data. However, with increasing deployment of sensor networks, these older systems fail to scale or keep pace with user needs associated with high volumes of streaming data. Comprehensive and responsive QC systems are needed that are designed to reduce potential problems and can more quickly produce high quality data and metadata. Methods described here for building a QC system will include identification of:

preventative measures to be taken in the field
quality checks that can be performed in near real-time
necessary data management practices


Introduction

A team approach is necessary to build a QC system, and multiple skills and personnel are needed. The QC system will begin with system design and preventative measures taken in the field and continue through data quality checking and data publishing. A lead scientist will propose research questions and describe the types of data and necessary quality. Expertise in field logistics, sensor systems, and wireless communications will play a role in site design and construction. A sensor system expert will provide knowledge of specific sensors and programming skills to establish quality control checking. Field technicians with strong knowledge of the overall scientific goals and communication skills can help to articulate issues and discover solutions. A data manager will be needed to guide delivery and archival of documented data products. Communication among all parties is necessary for the most timely delivery of well-documented and high quality data.

All team members will be needed to define a QC workflow that is useful in describing procedures and personnel responsibilities as the data flows from field sensors to published data streams. A QC system must allow for an iterative, quality management cycle to accommodate feedback to policies, procedures, and system design as data collections continue over time. A system will depend on communication among team members to assure that noted sensor data collection and transport issues and problems are addressed quickly and documented in the data stream. An active, well-documented QC system will help to establish user confidence in data products.

Automatedorsemi-automatedQCsystemsareneededthatcanadequatelyreviewandscreensourcedataandstillprovideforitstimelyrelease.Automatedqualitycontrolprocessessuchasrangecheckingcanbeperformedinnearreal-timeandasystemcanassigndataqualifiercodes,orflags,foranysensorvaluewhenproblemsoruncertaintyoccursinthedatastream.However,theseprocessescanoftenonlyindicatepotentialproblemsinthedatastreamthatstillrequiremanualreview.AcomprehensiveQCsystemisonlyachievableasahybridsystemdemandingbothautomatedQCchecksandmanualinterventiontoassurehighestdataquality.

Forthischapterwewilldefinequalityassurance(QA)asthosepreventativeprocessesorstepstakentoreduceproblemsandinaccuraciesinthestreamingdata.Thesewillincludesensornetworkdesign,protocoldevelopmentforroutinemaintenanceandsensorcalibration,andbestpracticeproceduresforfieldactivitiesanddatamanagement.Qualitycontrol(QC)primarilyreferstothetestsprovidedtocheckdataqualityandtheassignmentofdataflagsandothernotationstoqualifyissuesanddescribeproblems.QCsystemreferstothiscompletesetofQA/QCpreventativeandproduct-orientedprocesses.

Methods

Sensor Quality Assurance (QA)

Quality assurance (QA) refers to preventative measures and activities used to minimize inaccuracies in the data. For example, scheduling regular site visits and maintenance procedures, or continuously monitoring and evaluating site sensor behavior can prevent sensor failures or lead to early detection of problems. Designing networks with redundant sensor measurements provides an additional means to quality check sensor data and assure continuity of measurement. Of course, the time and expense to conduct high-level maintenance procedures or implement efficient and redundant designs may be limited by project budgets, but may be warranted by the importance of the data. Here we describe QA measures categorized by design, maintenance, and practices:

1. Design

a. Design for replicate sensors. Co-located sensors independent of the data logger and included in the data flow can be useful checks. For example, check temperature measurements might be made alongside a Campbell thermistor with a HOBO pendant, SDI-12 temperature sensor, or analog thermocouple. Ideally, three replicate sensors are used so that sensor drift can be detected (with two sensors it may not be obvious which sensor is drifting).

b. Assure an adequate power supply. Power considerations might include adding a low voltage disconnect (LVD) to prevent logger “brown-out”, or adding power accessories with switched power supply (e.g. CSI logger, IP relay) to programmatically control optional devices (radios, power-cycle loggers).

c. Protect all instrumentation and wiring from UV light, animals, human disturbance, etc., for example with flex conduit or enclosures.

d. Implement an automated alert system to warn about potential sensor network issues or certain events, e.g., extreme storms. For example, automated alerts might signal low battery power, indicate sensor calibration is needed, or indicate high winds or precipitation.

e. Add on-site cameras or webcams. Webcams can be used to record weather or site conditions, animal disturbance or human access.

2. Maintenance

a. Schedule routine sensor maintenance. Routine site visits following standard protocols can assure proper maintenance activities.

b. Standardize field notebooks, check sheets or field computer applications to lead field technicians through a standard set of procedures and assure that all necessary tasks are conducted. These notebooks or applications can serve as an entry point for technical observations regarding potential problems or sensor failures.

c. Schedule routine calibration of instruments and sensors based on manufacturer specifications. Maintaining additional calibrated sensors of the same make/model can allow immediate replacement of sensors removed for calibration to avoid data loss. Otherwise, sensor calibrations can be scheduled at non-critical times or staggered such that a nearby sensor can be used as a proxy to fill gaps.

d. Anticipate common repairs and maintain an inventory of replacement parts. Sensors can be replaced before failure where sensor lifetimes are known or can be estimated.

e. Assure proper installation of sensors (correct orientation, clean wiring, solid connections and mounting, etc.). Protocols for installing new sensors will also assure that key information is logged regarding a sensor’s establishment (see Management section).

3. Practices

a. Maintain an appropriate level of human inspection. Develop the capability to easily view real-time data and examine it regularly (daily/weekly). Regular inspection can help identify sensor problems quickly and might allow for fewer site visitations. Certain problems such as visible extreme spikes, intermittent values, or repetitive values can be easily viewed in raw data plots.

b. Spot-check measurements with a reference sensor can be routinely used for some measurements, e.g., temperature, snow depth, etc., to verify the performance of in situ sensors.

c. A portable instrument package that can be rotated among sensor sites can be useful in identifying problems. The portable package might run alongside installed sensors over a fixed period (daily or longer cycle) to inspect for drifting or failing sensors. This type of co-location might be done to audit sensor performance on an annual or periodic basis.

d. Record the date and time of known events that may impact measurements (see Management section). Ideally, these notes can be entered or captured for automated access. For example, sensors are known to demonstrate alternative behavior during site visits or maintenance activities, and light or trip sensors might be used in recording sensor access.

e. Routinely synchronize the time clock on data loggers with a public Network Time Protocol (NTP) server (http://www.ntp.org/).

f. Provide a reference time zone and avoid changing data logger timestamps for daylight savings time. Many would argue the best practice is to output data in Coordinated Universal Time (UTC), which is particularly useful when data spans multiple time zones. However, most local users of the data prefer seeing output in local standard time because it corresponds to local ecological conditions, e.g., ocean tides or solar noon, and may ease troubleshooting or field-based checking. Another strategy is to provide the local offset from UTC within the data stream to allow simple conversion to UTC, or allow users to query the data and choose whatever time zone they would like to receive the data in. ISO 8601 (http://www.iso.org/iso/home/standards/iso8601.htm) is an international standard covering the exchange of date and time-related data and provides time zone support. For example, 2013-09-17T07:56:32-0500 carries the offset from UTC for the EST time zone; however, lack of support in many instruments and software packages is a drawback to its use. Recently, REST services are constructed to allow the return of datetime values with an implicit time zone offset enabling convenient sharing of data with timestamp flexibility.

g. Ensure that files stored on the logger are transmitted error-free to the data center for import (use error-corrected protocols like FTP, Ymodem and HTTP). Schedule manual file download and post-import checks if non-error-corrected protocols are used as an interim measure.
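The UTC offset strategy described in item f can be sketched with Python's standard library. This is a minimal illustration, not a prescribed implementation: the fixed UTC-5 offset is chosen only to match the example timestamp in item f, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for local standard time (no daylight saving shifts).
# UTC-5 is an assumption that mirrors the example timestamp in item f.
LOCAL_STANDARD = timezone(timedelta(hours=-5))

def to_iso8601(naive_logger_time: datetime) -> str:
    """Attach the fixed local offset and render an ISO 8601 string."""
    return naive_logger_time.replace(tzinfo=LOCAL_STANDARD).isoformat()

def to_utc(iso_text: str) -> datetime:
    """Parse an ISO 8601 string that carries an offset and convert to UTC."""
    return datetime.fromisoformat(iso_text).astimezone(timezone.utc)

stamp = to_iso8601(datetime(2013, 9, 17, 7, 56, 32))
utc_time = to_utc(stamp)
```

Because the offset travels with every value, a user can recover either local standard time or UTC without consulting separate site documentation.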

Quality Control (QC) on data streams

Quality Control of data streams involves automated or semi-automated processes whereby values and associated timestamps are cross-checked against predetermined standards and separate concurrently-collected data streams. QC takes place post-collection during the streaming process or after data is assimilated into a central database. Some processes can be performed in “near real-time”, or at the time the data streams are brought into the database, and data can be released as “provisional” after this initial inspection to satisfy immediate user needs. Other processes may require some delay such as trend analysis for sensor drift detection. Results of these tests are typically accounted for in a data qualifier flag for each value. Manual inspection and resolution of suspect or problem data is also a necessary step before data is released with “provisional” tags removed. Revised or corrected data versions can be published at a later date, and it is important to provide documentation on the types of quality checks conducted with each release of these data.

Three categories of automated or semi-automated QC processes can be described:

1. independent evaluation, whereby a single data point is checked against predetermined standards (such as range checks)

2. point-to-point evaluation, whereby a single data point is compared to other concurrently-observed data points (such as replicate sensors)

3. many-point, or trend analysis, where some time frame of observations is examined statistically or against other data trends.

The first two are essentially near real-time checks, whereas the third can involve time frames several orders of magnitude longer than the measurement interval.
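A point-to-point evaluation with three replicate sensors might look like the following sketch. Comparing each reading to the median of the group is one possible approach, not one taken from the source, and the sensor names, readings and tolerance are hypothetical.

```python
from statistics import median

def flag_replicate_outliers(readings, tolerance):
    """Return the names of sensors that differ from the median of all
    co-located readings by more than `tolerance`."""
    center = median(readings.values())
    return sorted(name for name, value in readings.items()
                  if abs(value - center) > tolerance)

# Hypothetical air temperature readings (degrees C) taken at the same time.
readings = {"thermistor": 12.1, "sdi12": 12.3, "thermocouple": 14.0}
suspects = flag_replicate_outliers(readings, tolerance=0.5)
```

With three sensors the median stays close to the two that agree, so the third stands out; with only two sensors the same comparison cannot tell which one is drifting.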

Near real-time processing involves automated checking of each data point and its associated date and time. Data qualifier codes, or data flags, will be assigned based on these checks. These automated checks and flag assignments are essential in processing the mass volumes of data streaming from sensor networks, but are not sufficient. Human inspection of data is critical and particularly might focus on data points that are flagged by an automated system. The following terminology corresponds with quality control tests listed in Campbell et al. 2013.

The most common and simplest checks to implement

1. Timestamp integrity checks - ensure that each date-time pair is sequential. With fixed interval data it is possible to cross-check the recorded and expected timestamp.

2. Range checks - ensure that all values fall within established upper and lower bounds. Bounds can be established based on the specific sensor limitations, or can be based on historical seasonal or finer time-scale ranges determined for that location. Separate flags might be assigned to qualify impossible values (based on sensor characteristics) versus extreme values that are outside of the historic norms but within the sensor operating range.
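The two checks above can be sketched as follows. The flag letters "I" (impossible) and "X" (extreme), the limits, and the function names are assumptions made for illustration; they are not part of any flag vocabulary cited in this chapter.

```python
from datetime import datetime, timedelta

def check_timestamps(stamps, interval):
    """Yield (timestamp, ok) pairs; ok is False when a timestamp does not
    follow the previous one by exactly the expected interval."""
    previous = None
    for stamp in stamps:
        ok = previous is None or stamp - previous == interval
        yield stamp, ok
        previous = stamp

def range_flag(value, sensor_limits, historic_limits):
    """Return 'I' (outside sensor limits), 'X' (outside historic limits)
    or '' (passes both checks)."""
    low, high = sensor_limits
    if not low <= value <= high:
        return "I"
    low, high = historic_limits
    if not low <= value <= high:
        return "X"
    return ""

start = datetime(2014, 1, 1, 0, 0)
stamps = [start, start + timedelta(minutes=10), start + timedelta(minutes=30)]
time_ok = [ok for _, ok in check_timestamps(stamps, timedelta(minutes=10))]
flags = [range_flag(v, (-40.0, 60.0), (-25.0, 38.0)) for v in (12.5, 41.0, 75.0)]
```

Keeping the sensor limits and the historic limits as separate arguments lets the same function assign the two distinct flags the text recommends.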

Other checks can be employed for near real-time or in post-streaming QC

1. Persistence - checks for repeated, unchanging values in measures where constant change is expected.

2. Spike detection - checks for sharp increases or decreases from the expected value in a short time interval such as a spike or step function. These tests often employ statistical measures such as the standard deviation of the preceding values in detecting outliers or spikes that exceed 2-3 sigma (standard deviations) from what is expected. An alternative algorithm is to check to see that the median value of points t, t+1 and t-1 is not more than a fixed magnitude from point t.

3. Internal consistency - plausibility checks for consistency between related measurements such as that the maximum value is greater than the minimum value, or that snow depth is greater than its snow water equivalent. These checks may also examine values that are not possible under known conditions such as incoming solar radiation recorded during nighttime.

4. Spatial consistency - checks for sensor drift or failure based on intersite comparisons of nearby identical sensors. The integration of several data streams may be possible in post-processing and drifting may be detected based on known correlations or prior conditioning with redundant or nearby sensors.
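The persistence check and the median-based spike check from the list above can be sketched in a few lines. The thresholds (`max_repeats`, `max_step`) and the sample series are hypothetical and would need tuning per sensor and site.

```python
from statistics import median

def persistence_flags(values, max_repeats):
    """Flag each value that extends a run of identical values beyond
    `max_repeats` consecutive occurrences."""
    flags, run = [], 0
    for i, value in enumerate(values):
        run = run + 1 if i > 0 and value == values[i - 1] else 1
        flags.append(run > max_repeats)
    return flags

def spike_flags(values, max_step):
    """Flag point t when it differs from the median of points t-1, t and
    t+1 by more than `max_step`. End points are left unflagged."""
    flags = [False] * len(values)
    for t in range(1, len(values) - 1):
        window_median = median(values[t - 1:t + 2])
        flags[t] = abs(values[t] - window_median) > max_step
    return flags

series = [5.0, 5.1, 5.1, 5.1, 5.1, 9.8, 5.2, 5.3]
stuck = persistence_flags(series, max_repeats=3)
spikes = spike_flags(series, max_step=2.0)
```

The median window flags an isolated spike but not a sustained step change, so a step function would need a separate test such as the standard deviation approach mentioned in item 2.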

Data qualifiers (data flags)

The QC system must be able to assign one or more codes to each data point based on the result of QC tests or other available information. Data flags may be assigned during the initial QC tests that are intended to guide local review in identifying erroneous or problematic data (e.g., invalid values out of range or below detection level), or might be flags that indicate site-specific events (e.g., low battery voltage, an icing or other event or site condition, or notification of a due date for sensor calibration). These internal flags may use a richer vocabulary of fine-grained flags than what is necessary to share publicly. Reviewing internal flags is necessary to resolve issues that may be evident in the data before these data are made available in final published versions. Some systems might employ a “rejected” flag as a means of preserving an original value but allow capability to withhold that value from public use.

External flags provided in published data will likely be a more general, simpler suite of flags better suited for public consumption. Multiple internal flags would be mapped into this more general flag set. While many vocabularies are in use, an example suite of external flags follows:

A: Accepted
E: Estimated
M: Missing
Q: Questionable
Specification of uncertainty

The “Accepted” flag should be assigned to values where no apparent problems are discovered, but the QC tests that were applied should be described. The “Accepted” flag is likely less commonly used than simply leaving the flag blank. If the blank flag is used it should be included in the list of flags and defined, e.g., “no QC tests were applied” or “no recognizable problems” or “provisional data”. A blank flag can be included in an enumerated listing of valid flags but may not be the best practice within some metadata standards. A “Provisional” flag is not listed here but may be appropriate. Alternatively, “provisional” data might be indicated within a “quality level” attribute on the record level or file level rather than associated with an individual measurement (see Data Quality Level section below).
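Mapping a richer internal vocabulary onto the external flags A, E, M and Q could be done with a simple lookup table, as in this sketch. All internal flag names and the severity ordering are invented for illustration; each site would define its own.

```python
# Hypothetical internal flags (keys) mapped to the external flags A, E, M, Q.
INTERNAL_TO_EXTERNAL = {
    "RANGE_EXTREME": "Q",
    "LOW_BATTERY": "Q",
    "ICING": "Q",
    "RANGE_IMPOSSIBLE": "M",
    "REJECTED": "M",
    "GAP_FILLED": "E",
}

# Assumed priority, most severe first, used when a value carries several
# internal flags but only one external flag is published.
SEVERITY = ["M", "E", "Q", "A"]

def external_flag(internal_flags):
    """Collapse a collection of internal flags into one external flag.
    A value with no internal flags is published as 'A' (Accepted)."""
    mapped = {INTERNAL_TO_EXTERNAL[f] for f in internal_flags} or {"A"}
    return min(mapped, key=SEVERITY.index)

published = external_flag(["LOW_BATTERY", "RANGE_IMPOSSIBLE"])
```

Keeping the mapping in one table makes it easy to document alongside the data, which supports the requirement that every flag be defined.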

Examples of Quality Flag Sets (listed codes may only represent a subset of each flag set)

Andrews LTER | WISKI (Univ. of Saskatchewan) | HFR LTER | VCR LTER | SeaDataNet
A - Accepted | 10 - Rejected | M - Missing | Blank - OK | 0 - no QC
E - Estimated | 15 - Disregard | E - Estimated | Q - Questionable | 1 - Good value
M - Missing | 20 - Manually edited | Q - Questionable | M - Missing | 2 - Probably good
Q - Questionable | 25 - Simulated | R - Range Error | | 4 - Bad value
Measurement specific, e.g., B - Below detection | 30 - Filled | S - Data Spike | | 6 - Below Detection

The evaluation of extreme values may benefit from “expert inspection” that can be built into the QC system. Historical ranges can be developed for sites with long-term sensor measurements at annual, seasonal or finer time scales. For remote sites that are data sparse these ranges may be a primary tool for ascertaining data quality, and, for example, a QC system may flag values that fall outside of two standard deviations of long-term means. Where other nearby in situ measurements are available or where national surface station networks are available, quality checks may be improved through comparison of values. Access to multiple climate elements may provide the ability to create relationships among stations and allow specification of uncertainty for all values. Evaluation of a QC system’s performance in determining uncertainty or in estimating values will be important in making system improvements and potentially allowing a retrospective re-application of quality control (Daly et al. 2005).
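The two standard deviation test mentioned above can be sketched as follows, assuming a list of historical values for one site and period is available. The sample history and function names are hypothetical.

```python
from statistics import mean, stdev

def historical_bounds(history, n_sigma=2.0):
    """Return (low, high) bounds n_sigma sample standard deviations around
    the mean of historical values for one site and period."""
    center, spread = mean(history), stdev(history)
    return center - n_sigma * spread, center + n_sigma * spread

def outside_bounds(value, bounds):
    """True when the value falls outside the historical bounds."""
    low, high = bounds
    return not low <= value <= high

# Hypothetical long-term record of January daily mean temperature (deg C).
january_history = [-4.0, -2.0, 0.0, 2.0, 4.0]
bounds = historical_bounds(january_history)
```

Values outside these bounds are candidates for a "Questionable" flag and expert inspection rather than automatic rejection, since genuine extremes also fall there.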

Where specifications of uncertainty cannot be determined, values may be deemed “Questionable” by an automated system. Ultimately, manual evaluation may be required and a decision made as to whether a data point can be released as “Accepted” versus removing from the data stream and listing as “Missing” versus leaving the value flagged as “Questionable”. As Daly et al. 2005 points out, “in the end, the fundamental dilemma with nearly all quality control is a tension between the relative merits and costs of accidentally rejecting good data, or accidentally accepting bad data, and a tradeoff is usually involved”.

Where data are missing, an option might be to fill gaps with “Estimated” data. From Campbell et al. 2013, “filling these gaps may enhance the data’s fitness for use but can possibly lead to misinterpretation or inappropriate use, and can be a complex endeavor. The decision about whether to fill gaps and the selection of the method with which to do so are subjective and depend on factors such as the length of the gap, the level of confidence in the estimated value, and how the data are being used”.

Data quality level

The level of QC testing applied to a set of data should be well-described and transparent to the data user. Publishing of data is independent of data quality, and users need to be able to quickly identify its quality level, for example, to discern whether the data is unchecked, raw data vs. thoroughly inspected and reviewed. Groups such as NEON and CUAHSI have assigned a quality level to data products including original raw data, initially inspected and flagged raw data, published raw data, and estimated, gap-filled or other synthetic products involving model-based or scientific interpretation (see references in data_quality_level.pdf). While these groups do not necessarily agree on the actual level assignment, there are some general concepts of quality level that can be agreed upon and are represented here:

Level 0 (raw) - Unfiltered, raw data, with no QC tests applied and no data qualifiers (flags) applied - Typically, these are original data streams that are not published but that should be preserved. Data quality flags are not assigned. Conversion of raw measurement values to more meaningful units may be acceptable, e.g., thermocouple table conversions of millivolts to degrees C.

Level 1 (provisional) - Provisional data released in near real-time with initial QC testing applied - Preliminary QC tests or data calibration are applied, potentially in near real-time through automated scripts. Data qualifiers are assigned and may be for internal use intended to guide further review of the data (see Data qualifiers subsection). All data qualifiers should be well-defined. Range and date-time checking are commonly applied to this provisional level. The QC tests applied should be well-described.

Level 1 (published) - Published data with a delayed release after automated and manual review - QC testing is complete and suspect data has been inspected and flagged appropriately. Each value is assigned a data qualifier and the set of flags may be a simpler set devised for public use of the data. Impossible or missing values would be assigned an appropriate missing value code and a data flag of “Missing”. Data would no longer be considered provisional and would be unlikely to change.

Level 2 (gap-filled) - Gap-filled or estimated data involving interpretation - This is quality enhanced data where careful attention has been applied to estimate or fill gaps in data or to otherwise build derived data to accommodate data user needs, for example estimating gaps in a sensor stream using a nearby sensor. As gap-filling typically involves interpretation and may employ multiple models or algorithms, other versions of level 2 data may be used in practice. Methods employed in gap-filling or deriving data should be well-described.

Aggregating data from one time-step to another, e.g., creating daily summary data from 10 minute data, does not involve any interpretation when simple means, maximums, and minimums are determined, and so would not necessarily alter the quality level. That is, mean daily temperature determined from level 1 (published) data would still retain a quality level 1. However, interpretation may be involved when determining an appropriate qualifier flag for the daily mean. For example, if some of the 10 minute observations are missing, at what point does the daily mean also become missing (e.g., more than 20% are missing) or become questionable (e.g., more than 5% are missing)? This type of processing may yield daily mean values that are best described as Level 2 as interpretation is involved.
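The flagging rule described above for daily means might be sketched as follows. The 5% and 20% thresholds come from the example in the text; the function name and the use of `None` for missing observations are assumptions.

```python
def daily_mean_with_flag(values, questionable_over=0.05, missing_over=0.20):
    """Aggregate one day of sub-daily values (None = missing).
    Returns (mean, flag) using the flag letters A, Q and M."""
    present = [v for v in values if v is not None]
    missing_fraction = 1 - len(present) / len(values) if values else 1.0
    if not present or missing_fraction > missing_over:
        return None, "M"
    flag = "Q" if missing_fraction > questionable_over else "A"
    return sum(present) / len(present), flag

# 144 ten-minute observations per day; 10 missing is about 6.9 percent.
day = [10.0] * 134 + [None] * 10
result = daily_mean_with_flag(day)
```

Exposing the thresholds as parameters keeps the interpretive choice visible, so it can be recorded with the documentation of the aggregation method.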

Data collection interval

Data loggers offer the capability to easily output mean data values at multiple time steps, e.g., 10 minutes, hourly, daily. Saving values at multiple time steps may present an extra complication in the QC process as separate tables are usually stored for each time step. When a single sensor measurement is reported at separate time steps, conflicting QC results may occur if both streams are QC’d independently. One strategy to simplify this problem is to output most or all data in the shortest common time step and use post-processing to statistically aggregate the data at longer time steps. For example, a system might QC and output the 10 minute data and then aggregate hourly and daily values from this finer resolution 10 minute data stream. Data loggers might typically calculate and output daily (24-hour) data streams, but accurate QC may be impossible as the exact values used in this aggregation are unknown, and the aggregation may be only representing a subset of values, e.g., if there was a power discontinuity to the logger. However, there may be cases where the output of daily values by the logger is important. For example, an instantaneous maximum or minimum value based on a single logger sample would not be captured through this aggregation, and a daily minimum or maximum based on a 10 minute or hourly mean output may differ significantly from the instantaneous value.

Data Management

Timing of QC system processes

Automated QC system procedures provide the most timely and efficient processing of streaming data. The use of system procedures provides consistent assignment of data flags and removes much of the subjectivity inherent in manual assignment. Ideally, the QC system will be employed every time data is acquired, e.g., every 10 minutes, and secondarily operate on hourly or daily time periods. More comprehensive visual or programmatic checks or the assignment of uncertainty using nearby or other related sites might occur at a later time. The frequency and timing of manual or visual review processes will depend on the data flow at the site, software stack, and data processing capabilities. The necessary time frame for data delivery of provisional versus fully processed data should be considered.

Documentation of the QC processes

The documentation of QC processes should identify the near real-time streaming QC methods including assumptions and thresholds, and additional algorithms or visual methods applied. If no QC is applied that should be made apparent. Descriptions of data processing and QC workflows are also useful in describing data provenance and all workflow versions should be retained (see example workflow). Data measurement attributes and qualifier flags should be defined.

Fig. 1 Example of a general monitoring data flow model from Nevada Research Data Center, Scotty Strachan, University of Nevada, 2014

The application of the QC tests employed or any algorithms applied to aggregate, estimate or gap-fill data should be described for all data levels, and data levels can potentially be defined in conjunction with a data release policy. Ideally, data at each level should be locally archived. Level 0 raw data should be retained locally in its original, unmanipulated state. Level 1 (published) or level 2 data may be the best candidates for more formal archiving. Data sets should be transparently tagged with a data quality level as data are released.

Sensor data documentation

Develop and use a common vocabulary and syntax for sensor measurement attribute names and file naming conventions. Research organizations with multiple sensor sites measuring common sets of parameters can greatly improve efficiency and more easily employ automated methods when a common vocabulary is employed. These naming conventions should be planned from the outset into data logger programs and other software employed within the data flow.

Data qualifier flags provide documentation for each measured value and should be placed alongside the value as data files are produced for archival storage. An additional attribute or method code may also be added to note shifts in method or instrumentation or other key changes in collection procedures. Inclusion of a method code directly within the data file places key documentation close to the data value and is more visible to the data user. In long-term data streams where the quality level may change over time, e.g., periods of time where gap-filling is employed, a data quality attribute might be used to assign data quality at the record or measurement level.
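A data file that places a flag and a method code alongside each value might be written as in this sketch. The column names, the method code `AIR001` and the sample rows are hypothetical; they only illustrate one consistent naming pattern.

```python
import csv
import io

# Hypothetical naming convention: <attribute>, <attribute>_flag, <attribute>_method
FIELDS = ["datetime", "airtemp_c", "airtemp_c_flag", "airtemp_c_method"]

rows = [
    {"datetime": "2014-06-01T00:00:00-08:00", "airtemp_c": 11.2,
     "airtemp_c_flag": "A", "airtemp_c_method": "AIR001"},
    {"datetime": "2014-06-01T00:10:00-08:00", "airtemp_c": "",
     "airtemp_c_flag": "M", "airtemp_c_method": "AIR001"},
]

def write_records(handle, records):
    """Write the records as CSV with the flag and method code columns
    placed directly after the measured value."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

buffer = io.StringIO()
write_records(buffer, rows)
text = buffer.getvalue()
```

Deriving the flag and method column names from the attribute name means the same pattern can be applied automatically to every measurement in a logger table.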

Best Practices

Reorganized from: Campbell et al. 2013.

Sensor Quality Assurance (QA)

Maintain an appropriate level of human inspection
Replicate sensors, n=3 is optimal
Schedule maintenance and repairs to minimize data loss
Have ready access to replacement parts
Record the date, time, and time zone of known events that may impact measurements
Implement an automated alert system to warn about potential sensor network issues

Quality Control (QC) on data streams

Ensure that data are collected sequentially
Perform range checks on numerical data
Perform domain checks on categorical data
Perform slope and persistence checks on continuous data
Compare data with data from related sensors
Use flags to convey information about the data
Estimate uncertainty in the value, if feasible
Correct data or fill gaps if it is prudent

Data management

Automate QA/QC procedures
Retain the original unmanipulated data
Indicate data quality level with each release of the data
Provide complete metadata
Document all QA/QC procedures that were applied and indicate data quality level
Document all data processing (e.g., correction for sensor drift)
Retain all versions of workflows and metadata (data provenance).

Case Studies

We are looking for case studies that will describe some complete QC systems, QC processing and general setup (e.g., number and type of sensors, data loggers, telemetry, etc.)
Examples using GCE Toolbox, Vista Data Vision, R, etc. would be useful
General workflow example from Nevada Research Data Center

References

Campbell, JL, Rustad, LE, Porter, JH, Taylor, JR, Dereszynski, EW, Shanley, JB, Gries, C, Henshaw, DL, Martin, ME, Sheldon, WM, Boose, ER. 2013. Quantity is nothing without quality: Automated QA/QC for streaming sensor networks. BioScience. 63(7): 574-585. http://www.treesearch.fs.fed.us/pubs/43678

Taylor, JR and Loescher, HL. 2013. Automated quality control methods for sensor data: a novel observatory approach. Biogeosciences, 10, 4957-4971.

Daly, C, Redmond, K, Gibson, W, Doggett, M, Smith, J, Taylor, G, Pasteris, P, Johnson, G. 15th AMS Conf. on Applied Climatology, American Meteorological Soc. Savannah, GA, June 20-23, 2005.

Resources

QC Resources

Campbell et al. 2013 Bioscience http://www.treesearch.fs.fed.us/pubs/43678
Taylor and Loescher 2013. Biogeosciences http://www.biogeosciences.net/10/4957/2013/bg-10-4957-2013.pdf
Daly et al. 2005. 15th AMS Conf. on Applied Climatology, Amer. Meteorological Soc. http://ams.confex.com/ams/pdfpapers/94199.pdf
CUAHSI http://wdc.cuahsi.org/wdc/Docs/ODM1.1DesignSpecifications.pdf
NOAA Satellite and Information Service (National Climate Data Center) http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/
Carbon dioxide information analysis center http://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.html
SeaDataNet http://www.seadatanet.org/Standards-Software/Data-Quality-Control
Data Quality Assessment: Statistical Methods for Practitioners http://www.epa.gov/quality/qs-docs/g9s-final.pdf

Flag set examples

NOAA National Climatic Data Center http://www.ncdc.noaa.gov/oa/hofn/coop/coop-flags.html
Campbell et al. 2013 Bioscience (see p. 580) http://www.treesearch.fs.fed.us/pubs/43678

Data quality level

NEON http://www.neoninc.org/documents/513
CUAHSI http://his.cuahsi.org/documents/ODM1.1DesignSpecifications.pdf, pp. 19-20, 57-58
Ameriflux http://public.ornl.gov/ameriflux/available.shtml
Earth Science Reference Handbook http://eospso.gsfc.nasa.gov/sites/default/files/publications/2006ReferenceHandbook.pdf (p. 31)
ILRS Data products: (CODMAC - Committee on Data Management, Archiving and Computing) http://ilrs.gsfc.nasa.gov/about/reports/9809_attach7b.html

Sensor Data Archiving

Introduces different approaches and repositories for archiving and publishing data sets of sensor data.

Contents

1 Overview
2 Introduction
3 Methods
3.1 Publishing of Snapshots
3.2 Persistent Identifiers
3.3 Versioning
3.4 Data Storage Formats
3.5 Data Storage Strategies
4 Best Practices
5 Case Studies
5.1 1. LTER NIS
5.2 2. KNB
5.3 3. GIWS, University of Saskatchewan WISKI data archive
6 Resources
7 References

Overview

Archiving data snapshots and using appropriate metadata and packaging standards can increase the longevity and discovery of data immensely. However, these local curation techniques are still susceptible to threats to the projects or institutions that maintain the local archive. People in critical technology positions that maintain archives may change careers or retire, projects can lose funding, and institutions that seem solid can dissolve due to changes in political climate. For these reasons, partnering across institutions to provide archival services of data can greatly increase the probability that data will remain accessible for decades or into the next century.

In this chapter, we discuss techniques and issues involved with archiving data on a multi-decadal scale. For sensor data, we promote the use of periodic data snapshots, persistent identifiers, versioning of data and metadata, and data storage formats and strategies that can increase the likelihood that data will not only be accessible into the future, but will also be understandable to future researchers.

Introduction

A data archive is a location that has a reasonable assurance that data and the contextual information needed to interpret the data can be recovered and accessed after significant events, and ultimately after decades. Data archives should be maintained through backup strategies such as redundancy and offsite backup, in multiple locations and through institutional partnerships. Archiving activities should have institutional commitment, and ideally cross-institutional commitment. Archives may be locally maintained, may be part of a national or network-wide archive initiative, or both. For raw data, an archive can be a local or regional facility, whereas quality controlled, ‘published’ data should be archived in a community-supported network archive and available online.

Environmental research scientists are in need of accessing streaming data from sensor networks both provisionally in near real-time, after QA/QC processing, and in final form for long-term studies. Without appropriate archiving strategies, data are at great risk of total loss over time due to institutional memory loss, institutional funding loss, natural disasters, and other accidents. These typically include near-term accidents and long-term data entropy due to career and life changes for the original investigator(s) [Michener 1997]. Data, and the methods used to generate and process them, are often insufficiently documented, which may result in misinterpretation of the data or may render the data unusable in later research. Likewise, lack of version control or use of persistent identifiers for all files causes downstream confusion, and hinders reproducible science.

Data managers are increasingly asked to both preserve raw data streams and to additionally provide automated, near real-time quality control and access to provisional data from these sensors. Typically, these provisional data streams undergo further visual and other quality checking and final data sets are published. Commonly, further interpretation occurs where some missing data are gap-filled through imputation procedures, or faulty data are removed. There is a strong need to archive these data streams and provide continued access, which ultimately safeguards the investment of both time and money dedicated to collect the data in the first place. There are a number of organizational, storage, formatting, and delivery issues to consider. However, four main archiving strategies should be used: creating well documented data snapshots, assigning unique, persistent identifiers, maintaining data and metadata versioning, and storing data in text-based formats. These practices, described below, will increase the longevity and interoperability of the data, and will promote their usefulness to current and future researchers.

Methods

PublishingofSnapshots

Generatingperiodicsnapshotsofnearreal-timesensorstreamsallowsthedatatobestoredanddescribedinadeterministicmanner.Theratethatsnapshotsareproduceddependsontheneedsofthecommunityusingthedata,buttypicallysnapshotfilesareorganizedusinghourly,daily,weekly,monthly,orannualdatasets.Italsodependsonthesamplerateandsamplesize.Producingthousandsoftinydatafiles,oronefilewithgigabytesofdata,woulddecreasetheusefulnessofthedatafromatransferandhandlingperspective.Makeiteasyontheresearchersusingthedata,andsizethesnapshotsappropriately.

Without detailed documentation of the contextual information needed to interpret individual measurements, even well-archived data will be rendered unusable. Develop metadata files to accompany the data using a machine-readable metadata standard appropriate to the community using the data. Common standards include the ISO 19115 Geographic Information Metadata [ISO/TC 211, 2003], the Content Standard for Digital Geospatial Metadata (CSDGM) [FGDC, 1998], the Biological Profile of the CSDGM [FGDC, 1999], and the Ecological Metadata Language [Fegraus et al., 2005]. Also consider documenting detailed sensor deployment settings and processes with SensorML [OGC, 2010].

Likewise, snapshots of data that represent a time series should be documented and packaged appropriately such that the relationships among files are clear. Many of the above metadata standards have their own means of linking data with metadata; however, they are all implemented differently. Federated archiving efforts such as DataONE have adopted ‘resource maps’ [Lagoze, 2008] to describe relationships between metadata and data files in a language-agnostic manner (see DataONE Packaging in the Resources section, and the Open Archives Initiative ORE primer). Consider publishing resource maps of your data and metadata relationships to improve interoperability across archive repositories.

Once data collections are sufficiently described, delivery can also be a challenge. While providing resolvable links directly to the metadata and data files is a good practice, scientists often would like to be able to download full collections. Providing a service that packages files into a downloadable zip file is commonplace, but relationships between data and metadata can be lost. Consider using the BagIt specification (see BagIt in the Resources section) [Boyko, 2009], which provides simple additions to zip files such as a manifest file that maintains the machine-readable relationships between the items in the collection, while still allowing researchers to download data packages directly to their desktop.
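The manifest idea behind BagIt can be sketched with the Python standard library alone. This is a simplified illustration, not a complete BagIt implementation: it assumes the payload files already sit in a `data/` subdirectory, it uses SHA-256 checksums, and it writes the version line from the later BagIt 1.0 specification rather than the V0.96 draft cited above. The function and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def write_bag_manifest(bag_dir):
    """Write bagit.txt and manifest-sha256.txt for files under bag_dir/data."""
    bag = Path(bag_dir)
    payload = bag / "data"
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return lines


# Usage with hypothetical file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "ctd01_2013-06-22.csv").write_text("time,temp_c\n", encoding="utf-8")
    (data_dir / "ctd01_2013-06-22_eml.xml").write_text("<eml/>\n", encoding="utf-8")
    for line in write_bag_manifest(tmp):
        print(line)
```

The resulting directory can then be zipped for download; a recipient can recompute the checksums to confirm that the data and metadata files arrived together and unaltered.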

Persistent Identifiers

The above snapshot archiving strategies hinge on the ability to uniquely identify each file or component of a package in an unambiguous manner. File names can often collide, particularly across unrelated projects. So, assigning unique, persistent identifiers to each file, and the originating sensor stream, is paramount to successful archiving. A persistent identifier is usually a text-based string that represents an unchanging set of bytes stored on a computer. Persistent identifiers should be assigned to science data objects, science metadata objects, and other files that associate the data and metadata together, such as resource maps. Opaque identifiers (like UUIDs) tend to be best for persistence and uniqueness, but can be less memorable. Systems such as the Digital Object Identifier service (DOI) and EZID can help in maintaining unique, resolvable identifiers (see UUIDs, DOIs, and EZID in the Resources section). Each version of a file or products derived from files (see Versioning below) should also have a persistent identifier. If snapshots of data are being extended with new data, a new version of the dataset needs to be published. Shorter identifiers are best; avoid using spaces and other special characters in identifiers to increase compatibility in file systems and URLs. Ultimately, the use of persistent identifiers allows associated metadata to track the provenance of cleaned, quality assured data or other derived products, and promotes reproducible science and citable data.
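A minimal sketch of assigning opaque identifiers, using Python's standard `uuid` module. The `urn:uuid:` prefix and the dictionary keys are illustrative assumptions; resolvable identifiers such as DOIs would be obtained from a registration service rather than generated locally.

```python
import uuid


def new_identifier():
    """Return an opaque identifier based on a random (version 4) UUID."""
    return f"urn:uuid:{uuid.uuid4()}"


# One identifier each for the data file, its metadata, and the resource map
# that ties the two together.
package = {
    "data": new_identifier(),
    "metadata": new_identifier(),
    "resource_map": new_identifier(),
}
for role, identifier in package.items():
    print(role, identifier)
```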

Versioning

Data from sensor streams are usually considered ‘provisional’ until they have been processed for quality control, and multiple versions of the data may be generated. However, provisional data are often used in publications and are cited as such. Therefore, in order to support reproducible science using sensor data, each version should be maintained with its own citable identifier. Overwriting files or database records with new values or with annotated flags will ultimately change the underlying bytes, and effectively break the ‘persistence’ of the identifier pointing to the data. This applies to metadata or packaging versions as well, and so care must be taken to plan for versioning within your storage system. Your versioning strategies for raw data will depend on your snapshot strategies (e.g., appending to hourly files, then snapshotting and updating metadata files, or alternatively producing daily, weekly, monthly, or annual packages that include data files and metadata files for the time period covered). By making citable versions, researchers will be able to access the exact bytes that were used in a journal publication, and peer review of studies involving sensor data streams will be more robust and deterministic.
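The principle that a new version never overwrites an old one can be sketched as an append-only registry. The class below is a hypothetical in-memory illustration, not the design of any particular repository: each published version receives its own identifier, a checksum of its bytes, and an optional pointer to the version it supersedes.

```python
import hashlib
import uuid


class VersionRegistry:
    """Append-only record of data versions; existing entries are never changed."""

    def __init__(self):
        self._versions = {}  # identifier -> entry

    def publish(self, content: bytes, obsoletes=None):
        """Store content under a new identifier, optionally superseding another."""
        if obsoletes is not None and obsoletes not in self._versions:
            raise KeyError(f"unknown identifier: {obsoletes}")
        identifier = f"urn:uuid:{uuid.uuid4()}"
        self._versions[identifier] = {
            "sha256": hashlib.sha256(content).hexdigest(),
            "content": content,
            "obsoletes": obsoletes,
        }
        return identifier

    def get(self, identifier):
        """Return the exact bytes published under this identifier."""
        return self._versions[identifier]["content"]

    def verify(self, identifier):
        """Check that the stored bytes still match the recorded checksum."""
        entry = self._versions[identifier]
        return hashlib.sha256(entry["content"]).hexdigest() == entry["sha256"]


registry = VersionRegistry()
provisional = registry.publish(b"time,temp_c\n2013-06-22T00:00:00Z,18.1\n")
qc = registry.publish(
    b"time,temp_c,flag\n2013-06-22T00:00:00Z,18.1,ok\n", obsoletes=provisional
)
print(registry.get(provisional) != registry.get(qc))
```

A publication that cited the provisional identifier can still retrieve exactly those bytes after the quality controlled version has been published.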

Data Storage Formats

Sensor data may be stored in different structures, each with its own advantages and disadvantages. A suite of variables from one station and collected at the same temporal resolution may be stored within one wide table with a column for each variable, each time being one record of several variables. Alternatives might be a table for each variable or one table of the format [time, location, variable, value]. This latter system may be value centric, with metadata attached to each value, or series centric, with metadata attached to a certain time interval for one variable (e.g., a time series of air temperature between calibrations).

No matter how you organize your data, long-term, archival storage file formats need to be considered. In the digital age, thousands of file formats exist that are readable by current software applications. However, some formats will be more readable into the future than others. As an example, Microsoft Excel 1.0 files (circa 1985) are not readable by current versions of Microsoft Excel, since the binary format has changed over time in a backward-incompatible manner. Therefore, unless these files are continually updated year after year, they will be rendered unusable. The same is true for database system files (.dbf) that hold the relational table structures in commonly used databases such as Microsoft SQL Server, Oracle, PostgreSQL, and MySQL. Database files must be upgraded with every new database version so they do not become obsolete. A good rule of thumb is to archive data in formats that are ubiquitous and are not tied to a given company’s software. Archive data in ASCII (or UTF-8) text files preferably, since this format is universally readable across operating systems and software applications. If ASCII encoding would cause major increases in file sizes for massive datasets, consider using a binary format that is community-supported such as NetCDF. NetCDF is an open binary specification developed at the University Corporation for Atmospheric Research (UCAR), and provides open programming interfaces in multiple languages (C, Java, Python, etc.) that are supported by many scientific analysis packages (Matlab, IDL, R, etc.). By choosing an archive storage format that isn’t tied to a specific vendor, data files will be readable in decades to come even when institutional support for maintaining more complex database systems falls short.
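The difference between the wide layout and the [time, location, variable, value] layout described in this section can be illustrated with a short conversion. This is a sketch under assumptions: the column name `time`, the variable names, and the location label are hypothetical, and values are kept as text exactly as they appear in the CSV.

```python
import csv
import io


def wide_to_long(wide_csv_text, location):
    """Convert a wide table (one column per variable) into
    (time, location, variable, value) rows."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    rows = []
    for record in reader:
        time = record["time"]
        for variable, value in record.items():
            if variable != "time":
                rows.append((time, location, variable, value))
    return rows


wide = (
    "time,air_temp_c,rel_humidity_pct\n"
    "2013-06-22T00:00:00Z,18.1,64\n"
    "2013-06-22T00:15:00Z,17.9,66\n"
)
long_rows = wide_to_long(wide, "LakeOneida")
for row in long_rows:
    print(",".join(row))
```

Both layouts remain plain text, so either can serve as an archival format; the long form makes it easier to attach flags or metadata to individual values.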

Data Storage Strategies

For local archives, the most common storage strategy is to store files directly in a hierarchical manner on a file system. Many data managers use a mix of location, instrument, and time-based hierarchy to store files into folders (e.g., /Data/LakeOneida/CTD01/2013/06/22/file.txt). This is a very simple, reliable strategy, and may be employed by groups with few resources for installing and managing database software. However, file system-based archives may be difficult to manage when volumes are large, or when the number of instruments or variables is growing and doesn’t fit a straight hierarchical model.
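The location/instrument/date hierarchy in the example path can be generated consistently with a small helper, sketched here with Python's `pathlib`. The function name and argument order are assumptions made for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(root, location, instrument, day, filename):
    """Build a location/instrument/year/month/day path for one data file."""
    return PurePosixPath(
        root, location, instrument,
        f"{day:%Y}", f"{day:%m}", f"{day:%d}",
        filename,
    )


path = archive_path("/Data", "LakeOneida", "CTD01", date(2013, 6, 22), "file.txt")
print(path)  # /Data/LakeOneida/CTD01/2013/06/22/file.txt
```

Zero-padded month and day components keep directory listings in chronological order when sorted alphabetically.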

Many groups use relational databases to manage sensor data as they stream into a site’s acquisition system. Well-known vendor-based solutions like Oracle Database and Microsoft SQL Server are often used, as well as open source solutions such as MySQL and PostgreSQL. These systems provide a means of data organization that can promote good quality control and fast searching and subsetting based on many factors, beyond the typical location/instrument/date hierarchy described above. Databases also provide standardized programming interfaces for accessing the stored data in standard ways across multiple programming languages. From an archive perspective, the use of databases may help in managing data locally, but should be seen as one component of a workflow to get data into archival formats.

Local databases used for managing data should be backed up regularly, ideally to an offsite location, and the underlying binary database file formats should regularly be upgraded to the newest supported versions of the database software. Ultimately, data stored in databases should be periodically snapshotted and stored in archival file formats, described above, with complete metadata descriptions to enhance their longevity.
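Snapshotting a database table into a text-based archival format can be sketched as follows. SQLite from the Python standard library stands in for whichever database is actually used, and the table name, column names, and sample rows are hypothetical. Timestamps are assumed to be stored as ISO 8601 text, so that string comparison matches chronological order.

```python
import csv
import io
import sqlite3


def snapshot_table_to_csv(connection, table, start, end):
    """Export rows with start <= time < end as CSV text (header included)."""
    # Table names cannot be bound as parameters; `table` is assumed trusted here.
    cursor = connection.execute(
        f"SELECT time, variable, value FROM {table} "
        "WHERE time >= ? AND time < ? ORDER BY time, variable",
        (start, end),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (time TEXT, variable TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2013-06-22T00:00:00Z", "air_temp_c", 18.1),
        ("2013-06-22T00:15:00Z", "air_temp_c", 17.9),
        ("2013-06-23T00:00:00Z", "air_temp_c", 17.5),
    ],
)
csv_text = snapshot_table_to_csv(conn, "readings", "2013-06-22", "2013-06-23")
print(csv_text)
```

The half-open time range means consecutive daily snapshots neither overlap nor leave gaps.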

Although local file systems and databases are the most practical means of managing data, they are often at risk of being destroyed or unmaintained over decadal scales. Natural disasters, computer failure, staff turnover, lack of continued program funding, and other risks should be addressed when deciding how to archive data for the long term. One of the best strategies for data protection is cross-institution collaboration that provides storage services for its participants. These sorts of arrangements can guard against institutional or program dissolution, lack of funding, etc. Consider partnering with community supported archives such as the LTER NIS, or federated archive initiatives such as DataONE, to archive snapshots of streaming sensor data (see both in the Resources section).

Best Practices

The following list of best practices is taken from the above recommendations, and includes additional considerations when archiving sensor-derived data.

Develop and maintain an archival data management plan such that personnel changes don’t compromise access to or interpretation of data archives (potentially through University Library programs).

Employ a sound data backup plan. Archived data should be backed up to at least two spatially different locations, far enough apart that they won’t be affected by the same destructive events (natural disasters, power or infrastructure issues). Perform daily incremental backups and weekly complete backups that may be replaced periodically, and annual backups that won’t change. (CrashPlan, Acronis)

Generate periodic snapshots of near real-time sensor streams. (Acronis)

Develop metadata files to accompany the data using a machine-readable metadata standard.

Assign persistent identifiers to science data objects, science metadata objects, and other files that associate the data and metadata together.

Maintain versioned files with their own citable identifier.

Preferably archive data in ASCII (or UTF-8) for text files, or community supported formats like NetCDF for binary format.

Archive all raw data; not all raw data necessarily need to be available online. However, assign a persistent identifier to each raw data file to be able to document provenance of the published, quality controlled data.

Partner across institutions to provide archival services to mitigate programmatic losses.

Preferably make publicly available data that have had appropriate QA/QC procedures applied.

Assign a different persistent identifier for published datasets of different QC levels in an archive. In the methods metadata, document the provenance and quality control procedures applied.

Document contextual information for each data point, i.e., in addition to assigning a quality flag, assign a methods flag which documents field events like calibrations, small changes, sensor maintenance, sensor changes, etc. Include notes that handle unusual field events (e.g., animal disturbance). Encode metadata for sensor-derived data using community and/or nationally accepted standards.

Ensure the time zone for all time stamps is captured. Data loggers are set manually to a certain time. Consider daylight saving time.

Establish meaningful naming conventions for your variables, taking into account the type of observation that is archived, adjectives describing the location, instrument type, and other necessary variable determinants.

Determine the precision for your observation values in advance.

Preferably follow a naming convention or controlled vocabulary for variables (see the Resources section).

Avoid using databases for archival storage, but use them for management and quality control. However, if databases are used for managing sensor data, then periodic snapshots into ASCII or open binary data formats are recommended.

Track changes to data files within metadata files to maintain an audit trail.
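The time zone recommendation in the list above can be illustrated with a short conversion. This sketch assumes a data logger that was set manually to a fixed local standard time with a known offset from UTC and that does not follow daylight saving time; the function name, timestamp format, and offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def logger_time_to_utc(naive_text, utc_offset_hours):
    """Attach a fixed UTC offset to a naive logger timestamp and convert to UTC."""
    logger_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=logger_zone
    )
    return local.astimezone(timezone.utc).isoformat()


# A logger kept on UTC-8 all year records 16:30 local standard time.
print(logger_time_to_utc("2013-06-22 16:30:00", -8))  # 2013-06-23T00:30:00+00:00
```

Storing the explicit offset (or UTC) with every time stamp removes the ambiguity that arises when a logger clock and the local civil time differ by an hour for part of the year.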

Case Studies

1. LTER NIS

The NSF Long-Term Ecological Research Network Information System (LTER NIS) is the central data archive for all data generated by LTER research and related projects. All data, including sensor data, are submitted with metadata in the Ecological Metadata Language (EML). Data are publicly available through this portal and through DataONE, of which the LTER NIS is a member. Specific approaches to archiving streaming sensor data follow the best practice recommendations given in this document: datasets are submitted as snapshots in time, and it is up to the site information managers to decide the length of time in each snapshot, i.e., how frequently a new dataset is submitted. Minimally quality controlled datasets are submitted, while the raw data are archived at each site. As of this writing, no standards for quality control levels or data flagging have been adopted by the LTER community.

Features of the LTER NIS include:

Public availability of data and metadata

Congruence check: quality check of how well the metadata describe the structure of the data

Use of persistent identifiers

Strong versioning of metadata and data files in the system

Member Node of DataONE

Support of LTER and related projects data storage using access control rules

Replication of data and metadata across geographically dispersed servers

2. KNB

The Knowledge Network for Biocomplexity (KNB) is an international network that facilitates ecological and environmental research on biocomplexity. For scientists, the KNB is an efficient way to discover, access, interpret, integrate and analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers. The KNB repository has been storing and serving data for over a decade, and stores over 25,000 datasets.

Features of the KNB include:

Public availability

Metacat, an open source data management system

Morpho, an open source, desktop metadata editor

Support for any XML-based metadata language, but optimized for the Ecological Metadata Language

Use of persistent identifiers

Strong versioning of all files in the system

Support for cross-metadata packaging using resource maps

Cross-institutional partnering with the LTER and DataONE

Support for both public and private data storage using access control rules

Replication of data and metadata across geographically dispersed servers

International participation, and support for multi-language metadata descriptions

Recent developments of the KNB include support for the DataONE programming interface (API) in both the Metacat and Morpho software products. This API promotes interoperability of archival repositories, and enables federated access to environmental data. Since the KNB products support this open API, anyone can create their own web or desktop applications that are optimized for their research community.

3. GIWS, University of Saskatchewan WISKI data archive

The Global Institute for Water Security (GIWS) at the University of Saskatchewan is directly involved in the collection of field data from different research areas including the Rocky Mountains, Boreal Forest, Prairie, and others.

In addition to the “in-house” managed data, GIWS uses external datasets from organizations such as Environment Canada and Alberta Environment. The data management platform on which GIWS currently operates is the Water Information System Kisters (WISKI). This system is used together with Campbell Scientific LoggerNet software and custom .NET modules in automated tasks that handle data collection, centralized data processing, storing, and reporting. After processing, the environmental datasets are published and made available to specific groups of users through the Kisters WISKI Web Pro web interface and KiWIS web service. Both applications can query the centralized database and return data in formats that are used for visualization or further processing and dissemination purposes.

Features of the GIWS system include public availability, use of persistent identifiers, support for cross-institutional partnering, data access control for different groups of users, and support for the OGC WaterML 2 data format (see GIWS in the Resources section).

Resources

BagIt Zip file format: https://wiki.ucop.edu/display/Curation/BagIt

DataONE: http://www.dataone.org/what-dataone and http://www.dataone.org/participate

DataONE Packaging: http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html

Digital Object Identifier (DOI) System: http://doi.org

Ecological Metadata Language: http://knb.ecoinformatics.org/software/eml/

FGDC Metadata Standards: http://www.fgdc.gov/metadata/geospatial-metadata-standards

GIWS: http://giws.usask.ca/documentation/system/GIWS_WISKI.pdf

ISO 19115 Metadata Standard - Geographic Information: http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798

EZID Identifier Service: http://n2t.net/ezid

LTER Network Information System: http://nis.lternet.edu

Open Archives Initiative Object Reuse and Exchange (Resource Maps) Primer: http://www.openarchives.org/ore/1.0/primer.html

Universally Unique Identifiers (UUID): http://en.wikipedia.org/wiki/Universally_unique_identifier

CF Metadata: http://cf-convention.github.io/

References

Biological Data Working Group, Federal Geographic Data Committee and USGS Biological Resources Division. 1999. CONTENT STANDARD FOR DIGITAL GEOSPATIAL METADATA, PART 1: BIOLOGICAL DATA PROFILE. FGDC-STD-001.1-1999. https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/biodatap.pdf

Boyko, A., Kunze, J., Littman, J., Madden, L., Vargas, B. (2009). The BagIt File Packaging Format (V0.96). Retrieved May 8, 2013, from http://www.ietf.org/id/draft-kunze-bagit-09.txt

Fegraus, E. H., Andelman, S., Jones, M. B., & Schildhauer, M. (2005). Maximizing the value of ecological data with structured metadata: an introduction to Ecological Metadata Language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America, 86(3), 158-168. http://dx.doi.org/10.1890/0012-9623(2005)86%5B158:MTVOED%5D2.0.CO;2

Lagoze, Carl; Van de Sompel, Herbert; Nelson, Michael L.; Warner, Simeon; Sanderson, Robert; Johnston, Pete (2008-04-14), "Object Re-Use & Exchange: A Resource-Centric Approach", arXiv:0804.2273v1 [cs.DL], arXiv:0804.2273

Metadata Ad Hoc Working Group, Federal Geographic Data Committee. 1998. CONTENT STANDARD FOR DIGITAL GEOSPATIAL METADATA. FGDC-STD-001-1998. http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf

Michener, William K., James W. Brunt, John J. Helly, Thomas B. Kirchner, and Susan G. Stafford. 1997. NONGEOSPATIAL METADATA FOR THE ECOLOGICAL SCIENCES. Ecological Applications 7:330–342. http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2

Open Geospatial Consortium, Inc. (OGC). URL: http://www.opengeospatial.org (2010).