CrowdHydrology: Crowdsourcing Hydrologic Data and Engaging Citizen Scientists

Download CrowdHydrology: Crowdsourcing Hydrologic Data and Engaging Citizen Scientists

Post on 26-Sep-2016




4 download

Embed Size (px)


<ul><li><p>Methods Note/</p><p>CrowdHydrology: Crowdsourcing Hydrologic Dataand Engaging Citizen Scientistsby Christopher S. Lowry1 and Michael N. Fienen2</p><p>AbstractSpatially and temporally distributed measurements of processes, such as baseflow at the watershed scale,</p><p>come at substantial equipment and personnel cost. Research presented here focuses on building a crowdsourceddatabase of inexpensive distributed stream stage measurements. Signs on staff gauges encourage citizen scientists tovoluntarily send hydrologic measurements (e.g., stream stage) via text message to a server that stores and displaysthe data on the web. Based on the crowdsourced stream stage, we evaluate the accuracy of citizen scientistmeasurements and measurement approach. The results show that crowdsourced data collection is a supplementalmethod for collecting hydrologic data and a promising method of public engagement.</p><p>IntroductionAdvancements in technology and instrumentation in</p><p>hydrogeology drive a perceived need for high-resolutionspatially and temporally distributed measurements toquantify complex earth system processes. This increasein instrument resolution often comes at an increased costin materials and personnel to install and manage these net-works. To collect high-resolution data sets, many agenciesand organizations have started to incorporate the use ofcitizen scientists (Graham et al. 2011). While there aresome drawbacks relating to the accuracy of citizen sci-entist data (Engel and Voshell 2002), the potential forbroader impact by engaging citizens and expanding mon-itoring networks is attractive. The expansion of mobilephone technology provides many opportunities to engagecitizen scientists. As a result of these new developments,we developed and tested a method to monitor streamstage using crowdsource-based text messages of water</p><p>1Corresponding author: Department of Geology, University atBuffalo, 411 Cooke Hall, Buffalo, NY 14260; (716) 654-4266; fax:(716) 654-3999;</p><p>2U.S. Geological Survey, Wisconsin Water Science Center,Middleton, Wisconsin.</p><p>Received January 2012, accepted May 2012. 2012, The Author(s)GroundWater2012, National GroundWater Association.doi: 10.1111/j.1745-6584.2012.00956.x</p><p>levels. This network relies on untrained observers sendingwater level measurements at designated gauging staffs viamobile phone text messages to a server maintained by theresearchers. On the basis of these data collected over onefield season, we are able to investigate the accuracy ofthese data and density of field measurements.</p><p>Previous citizen scientist projects have involved clas-sifying organisms/objects with fewer projects focusing onfield measurements. A prime example is the ChristmasBird Count, which has been organized annually since1900 by the Audubon Society (Wersma 2010; AudubonSociety 2011). Others include identifying galaxies/stars(Galaxy Zoo; Land et al. 2008; Citizen Sky); data entryand measurements of fossils (OpenDinosaur), identifyingbirds (eBird; Wood et al. 2011); measuring precipitation(CoCoRaHs; Cifelli et al. 2005; RainLog; Weather Spot-ter Network); collecting images and qualitative streamdata (Creek Watch); and documenting wildlife corridorsby monitoring road kill (Wildlife Crossing). In the hydro-logic sciences, the U.S. Environmental Protection Agency(EPA) runs a volunteer monitoring program for waterquality.</p><p>In projects focusing on field measurements, suchas the EPA monitoring program, maintaining qualitycontrol is accomplished with formal training of citizenscientists. Training may limit the number of observersinvolved and, as a result, reduce the temporal orspatial resolution. Training is necessary for projects</p><p> GROUND WATER 1</p></li><li><p>that require multivariable measurements using calibratedinstruments. In less involved projects, such as distributedmeasurements of precipitation, training may consist of asimple online slide show or video (CoCoRaHs; Cifelliet al. 2005). These single variable projects also requirelimited equipment and time commitment. Quality con-trol issues are present in single variable projects butoutliers may be identified using time-series analysis orother validation.</p><p>Predicting how the larger hydrologic scientific com-munity will respond to the publishing of these datais uncharted territory. In astronomy, crowdsourced dataanalysis has become acceptable with more than 20papers published using the Galaxy Zoo database (; Raddick et al. 2010). In the case of clas-sifying organisms or objects, an expert can later verifythe classification. For crowdsourced hydrologic field mea-surements, experts usually cannot go back and verifya precipitation or water level measurement, which addsuncertainty to hydrologic measurements.</p><p>Crowdsourcing is a viable tool for collecting dis-tributed measurements of stream stage based on boththe ease with which stream stage can be measured andthe ubiquity of mobile phones. Many rural and remotewatersheds in the United States lack hydrologic data.The U.S. Geological Survey (USGS) has an extensivenationwide network of stream gauging stations ( However, budget pressuresimpact the number of gauges, and the network resolutionis typically limited to a single gauging station in a givenwatershed. As a result, assessing the impact of localizedmanagement practices and/or forecast future changes onwater resources at a watershed or sub-watershed scale canbe difficult.</p><p>The research presented here focuses on buildingInternet-based interface to harness the power of citizensto collect hydrologic data. Using mobile phones, citizenscientists can send hydrologic measurements (e.g., streamstage) via text message to a server that stores and displaysthe data on the web. A natural extension to using textmessages is a smartphone application such as that used forCreekWatch ( Text messages havethe advantage of both simplicity and ubiquity. Virtuallyall mobile phones have text message capability. Thedevelopment of a smartphone application adds a layer ofdesign and implementation complexity. These data canbe used by resource managers, researchers, teachers, andoutdoor enthusiasts to monitor trends in stream stage.In some cases, recreation choices can be guided bynearly real-time observations in ungauged basins. Theobjectives of this research are (1) to build an inexpensivecrowdsource-based hydrologic network to monitor streamstage and (2) to investigate the potential benefits andpitfalls for expanding such networks for use within thelarger scientific community.</p><p>MethodsThis section provides a broad overview of the project,</p><p>starting with the staff gauges and progressing to datatransmission and handling. The innovation of this projectis the transmission of these data and distributing mea-surements to the larger community; not the method ofmeasuring stream stage. To test this methodology, a net-work of stream gauging staffs (staff gauges) was installedin Summer 2011 at nine locations within the Buffalo,Cattaraugus, and Wiscoy Creek watersheds in WesternNew York, USA. These locations were chosen on thebasis of popular fishing access points and hiking trails.Each staff gauge included a sign displaying a uniquestation number and a phone number for observers tosend a text message of water level measurements. Textmessages from local observers were used to generate adatabase of stream stages and populate an interactive web-based map ( The web interfaceallowed users to freely view and download both currentand historic data. The modular system allowed additionalobservations to be added to the hydrologic network byinstalling additional staff gauges in a watershed. This sys-tem was limited to stream stage, but the database could beeasily adapted to include temperature, rainfall, and snowdepth.</p><p>SignageThe system relied on untrained observers to send in</p><p>measurements of stream stage. Instructional signage ateach of the gauging locations was the only interactionbetween the scientists and the observers making measure-ments. Each gauging station had two signs: one located atthe top of the gauging staff and a second sign posted onthe shore. The sign on the top of the gauging staff gavethe phone number and station number. For this study, thestation number denoted by the two-letter state identifica-tion code along with a unique four character numeric code(e.g., NY1000). The on-shore sign detailed how to makea measurement and gave a short description of the project(Figure 1).Data Retrieval and Automation</p><p>This section highlights the main aspects of Social.Water, the underlying software for the CrowdHydrol-ogy project. The Social.Water code is available at Social.Water relied solely on open-source programming lan-guages (Python and JavaScript). Text messages werereceived at a GoogleVoice ( phonenumber and forwarded to a Gmail ( Social.Water read and parsed the messages,updated a simple database, and generated hydrographsand comma-separated variable (CSV) text files that wereposted on the website</p><p>To interact with the Gmail account, we used IMAP(Internet Message Access Protocol) through a modulebuilt into Python. The code logged into the Gmail account,checked for new messages, downloaded the body ofselected messages, and logged out of the account. IMAP</p><p>2 C.S. Lowry and M.N. Fienen GROUND WATER</p></li><li><p>CrowdHydrology</p><p>CrowdHydrology is an experiment currentlyconducted by the University at Buffalo Departmentof Geology. Our goal is to develop innovativemethods to collect spatially distributed hydrologicdata throughout Western New York. If you have anyquestions or would like to join our network pleasecontact us at</p><p>Project Information</p><p>4. Researchers, Students, Outdoors people, Resource managers, etc. can then use these data.</p><p>1. Read the water level off the ruler (gaging staff)2. Text the station number and the water level to 716-218-02823. Stream stage is then added to our database at</p><p>How it works</p><p>Station Number: Site Partner:</p><p>Figure 1. Streamside informational sign.</p><p>left the messages on the server, so a backup of thedata was effectively maintained. Using Google Voice, allmessages forwarded to the Gmail account included thetext SMS from and a phone number in the subject line.Messages that did not contain data were avoided by onlyreading messages identified as SMS from.</p><p>New messages were parsed to find the data point andto align the data with the proper gauge. The time of themeasurement was taken from the time stamp of the e-mailmessage. To maintain simplicity of signage, little guidanceregarding format of the text message was provided to thecitizen scientists. In order to enable this level of flexibility,a fuzzy matching algorithm was required to properly alignmessage content with gauges.</p><p>Social.Water used the Python package fuzzywuzzy(Cohen, 2011) to rank the level of agreement betweentwo strings to match identifying text in the messages tothe list of allowable gauge numbers (NY1000, NY1001,. . . , NY1008). The message was then assigned to theclosest match. Heuristic evaluation of matching messagesto the intended stations was conducted. The algorithm wasadjusted until all messages that could be interpreted byscientists were correctly classified by the code.</p><p>The final step was to read the actual measurement.This was done using regular expressions that scannedthe message for floating point values. The result wasa code that interpreted messages similar to the waya human would do. Table 1 shows examples of therange of messages that were successfully interpreted bySocial.Water. Some messages were vague, such as waterlevel 2.4 without indicating the gauge. Neither code nora scientist could interpret such messages.</p><p>Table 1Examples of Actual Messages That WereSuccessfully Interpreted by Social.Water</p><p>Date Station Number Text Message</p><p>October 9, 2011 NY1000 Station NY1000 waterlevel 1.40</p><p>October 11, 2011 NY1000 Station#NY 1000 1.38October 23, 2011 NY1007 1.80 Station NY 1007November 25, 2011 NY1007 By 1007 2.15November 25, 2011 NY1002 By 1002 ny 2.64</p><p>After parsing, a text file was updated and stored onthe server containing time and value. This served as thedatabase. Finally, a small JavaScript-enabled HTML pagewas written using the open-source Dygraph package. Thisgenerated an interactive hydrograph linked to a map onthe main CrowdHydrology web site. The charts werealso viewable on most smartphones. One goal of thisimplementation was to enable citizen scientists to see theresults of measurements that they contributed to in nearreal time on the web. The Social.Water code was run ona web server inside a cron script every 5 min to facilitatethis goal.</p><p>ResultsNine stream gauges were installed in two phases</p><p>in Summer 2011. The initial test site was installed inMay 2011 as a proof of concept. The result was over</p><p> C.S. Lowry and M.N. Fienen GROUND WATER 3</p></li><li><p>Table 2Distribution of Measurements at CrowdHydrology Sites</p><p>Station Number Start Date End Date Number of Measurements</p><p>NY1000 April 19, 2011 December 15, 2011 133NY1001 August 10, 2011 December 15, 2011 6NY1002 August 10, 2011 December 15, 2011 6NY1003 August 10, 2011 December 15, 2011 2NY1004 August 10, 2011 December 15, 2011 3NY1005 August 10, 2011 December 15, 2011 2NY1006 August 10, 2011 December 15, 2011 1NY1007 August 10, 2011 December 15, 2011 5NY1008 August 10, 2011 December 15, 2011 5</p><p>one hundred recorded measurements between May andNovember 2011 (Table 2). Eight additional sites wereinstalled in August 2011 and resulted in a combined totalof 30 measurements (Table 2). An additional 17 messagescollected were missing either the station number or thewater level measurement. A total of 68 unique observers(phone numbers) contributed measurements. Eight-fivepercent of the citizen scientists sent in a single textmessage. The most prolific single contributor sent 60measurements for a single station. At three of the ninestations, two or fewer measurements were recorded; notext messages were collected by observers outside ourresearch group.</p><p>To validate measurement accuracy, a pressure trans-ducer was installed at the NY1000 station from October19 to November 25, 2011 at a frequency of 30 min(Figure 2). This gauging station was chosen because itwas the most popular of the stations. The root mean squareerror (RMSE) between the pressure transducer waterlevels and crowdsourced water level measurements was0.016 feet. These results were based on 19 water levelsmeasurements from 9 observers over the 37-day period.The mean of the errors between the crowdsourced waterlevels and transducer measurements was not significantlydifferent than zero using a Students t-test (p = 0.90),indicating that the crowdsourced data were not biased highor low relative to the transducer data.</p><p>Discussion</p><p>Station PopularityLocation was the mos...</p></li></ul>