Characteristics of Reprocessed Hydrometeorological Automated Data System (HADS) Hourly Precipitation Data
DONGSOO KIM AND BRIAN NELSON
NOAA/NESDIS/NCDC, Asheville, North Carolina
DONG-JUN SEO
NOAA/NWS/Office of Hydrologic Development, Silver Spring, Maryland, and University Corporation for
Atmospheric Research, Boulder, Colorado
(Manuscript received 28 October 2008, in final form 29 April 2009)
ABSTRACT
The Hydrometeorological Automated Data System (HADS) is a real-time data acquisition, processing,
and distribution system operated by the Office of Hydrologic Development (OHD) of NOAA’s National
Weather Service (NWS). The initial reprocessing of HADS data from its original format since its inception in
July 1996 has been completed at NOAA’s National Climatic Data Center (NCDC). The quality of the
reprocessed HADS hourly precipitation data from rain gauges is assessed by two objective metrics: the average
fraction of missing values and the percentage of top-of-the-hour observations for a 3-yr period (2003–05).
Pairwise comparisons between the reprocessed product and the real-time product are made using repre-
sentative samples (about 13%) from the 48 contiguous United States. The monthly average of missing values
varies from 0.5% to 2% in the reprocessed product and from 1.7% to 10.1% in the real-time product. Except
for January 2003, the reprocessed product consistently reduced missing values, by as much as 9.4% in October
2004. The availability of top-of-the-hour observations is about 85% in the reprocessed product, while the real-
time product has top-of-the-hour observations only about 50% of the time. This paper discusses real-time
product quality issues, additional quality assurance algorithms used in the reprocessing environment, and the
design of system-wide performance comparisons. Thus, the benefits to users of reprocessing the HADS data
are the correction of 4-h observation time errors during 1 July–11 August 2005 and the demonstration of
diurnal patterns of precipitation frequencies in regional domains. A Web-based interactive quality assessment
tool for reprocessed HADS hourly precipitation data and access to the data are also presented.
1. Introduction
The Hydrometeorological Automated Data System
(HADS) provides a collection of hydrometeorological
observations from diverse networks that use Geosta-
tionary Operational Environmental Satellite (GOES)
data collection platforms (DCPs) for real-time data trans-
mission. The diverse networks that compose HADS in-
clude the U.S. Geological Survey (USGS), the U.S. Army
Corps of Engineers (USACE) districts, and participants
in the Remote Automated Weather Stations (RAWS)
program hosted by the U.S. Department of Agricul-
ture’s (USDA’s) Forest Service. Data are transmitted
to the HADS program office at the National Weather
Service’s Office of Hydrologic Development (NWS/OHD)
for processing and archiving. In this paper we focus on
one particular class of observations (hourly precipita-
tion) from the HADS dataset and undertake an effort to
enhance and improve it, both spatially and temporally.
This reprocessing effort is driven by the fact that hourly
rain gauge data are needed in order to describe precipi-
tation for finer-scale events, such as diurnal variations of
convective storms, heavy rains that trigger debris flow,
and verifications of model forecasts, to name a few. For
any scientific study, high quality data are necessary. Of-
ten, however, missing values render the record incom-
plete, and therefore the users have to estimate the missing
values. The reprocessing effort allows for the recovery of
certain missing data points and for the rigorous quality
Corresponding author address: Dongsoo Kim, NOAA/NESDIS/
National Climatic Data Center, 151 Patton Ave., Asheville, NC
28801-5001.
E-mail: dongsoo.kim@noaa.gov
OCTOBER 2009 K I M E T A L . 1287
DOI: 10.1175/2009WAF2222227.1
Unauthenticated | Downloaded 11/27/21 12:24 PM UTC
control of the raw data to provide an improved dataset for
use in research and climatic applications.
The purposes of reprocessing the HADS data are
threefold: 1) to enlarge the hourly hydroclimate data-
base for use in various applications such as multisensor
precipitation reanalysis (Nelson et al. 2008); 2) to pro-
vide real-time data to users, such as forecasters at NWS
Weather Forecast Offices (WFOs) and River Forecast
Centers (RFCs), with data quality information for spe-
cific gauge stations; and 3) to provide improved-quality
hourly precipitation data to the user community. Rain
gauge data often come with some measure of ambiguity.
Missing values are a source of much of this ambiguity in
rain gauge datasets. Precipitation data are encoded as
missing when 1) the gauge was not functioning at the
time of a scheduled measurement, 2) there was a dis-
ruption of data transfer at the time of transmission, and
3) there was a temporary failure in the data storage or
product generation processes. In addition, the produc-
tion system may encode the value as missing when the
data failure was assumed by a quality threshold, for ex-
ample, a negative hourly precipitation amount. The is-
sues of missing values in real-time precipitation data
used by the NWS WFOs and RFCs were revisited and
corrective measures applied by reprocessing the original-
format precipitation data and comparing the results with
hourly precipitation products generated in real time.
Near-real-time HADS data are available online for
1 week at the NWS/Office of Hydrologic Development
(OHD) Web site (http://www.nws.noaa.gov/oh/hads/).
Currently, the original-format precipitation data are trans-
ferred to the National Climatic Data Center (NCDC) at
the end of the day. Most of the historical data, collected
since June 1996, are then stored and available for use at
NCDC. Because of the diverse ownership of the net-
works included in HADS, it is difficult to expect uni-
form quality in precipitation measurements and sensor
maintenance. In addition, the locations of the surface
stations are determined by the network owner’s mis-
sion requirements. As a result, the spatial density of the
gauges is highly inhomogeneous and the number of
stations changes over time. On average, about 6200 rain
gauges were available in 2007, while only about 2800
were available in 1996.
The HADS program produces hourly precipitation
data in real time to support operational hydrologic
forecasting at the NWS. For example, the HADS precip-
itation data are used in quantitative precipitation esti-
mation (QPE) such as multisensor precipitation analysis
(Seo and Breidenbach 2002). At least 70% of the hourly
precipitation data used by RFC forecasters are com-
posed of HADS precipitation data. As such, improve-
ments in quality, including reduction of missing values,
contribute directly to the overall improvement of the
QPE product at each RFC.
There are two precipitation-related variables in the
HADS data: cumulative and incremental precipitation
amounts. More than 95% of the gauges have been
reporting cumulative precipitation amounts since the
reset of the value (coded as PC). Less than 5% of the
gauges are reporting incremental precipitation at pre-
specified time intervals (coded as PP). It is simple to
convert PC to PP by subtracting the previous PC value
from the current PC value. When the increment is
60 min, the output measures hourly precipitation and
is usually measured at the top of the hour. If the gauge
reports subhourly PP, the running total of subhourly PP
for 1 h also measures hourly precipitation.
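The PC-to-PP conversion described above can be sketched as follows. This is a minimal illustration, not the HADS production code: the function names `pc_to_pp` and `hourly_from_subhourly` are hypothetical, missing readings are represented by `None`, and increments are rounded to hundredths on the assumption that gauges report in 0.01-in. steps. The sample values are the 15-min PC readings for LLDN7 listed in Table 1.

```python
from typing import List, Optional


def pc_to_pp(pc: List[Optional[float]]) -> List[Optional[float]]:
    """Difference consecutive cumulative (PC) readings into increments (PP).

    None marks a missing reading; an increment is missing whenever either
    of the two readings it depends on is missing.
    """
    pp: List[Optional[float]] = []
    for prev, curr in zip(pc, pc[1:]):
        if prev is None or curr is None:
            pp.append(None)
        else:
            pp.append(round(curr - prev, 2))
    return pp


def hourly_from_subhourly(pp: List[Optional[float]],
                          per_hour: int) -> List[Optional[float]]:
    """Sum consecutive blocks of `per_hour` subhourly increments into hourly totals."""
    hourly: List[Optional[float]] = []
    for i in range(0, len(pp) - per_hour + 1, per_hour):
        block = pp[i:i + per_hour]
        hourly.append(None if None in block else round(sum(block), 2))
    return hourly


# 15-min PC readings (in.) for LLDN7, 0500-0700 UTC 1 Jul 2003 (Table 1).
pc_15min = [17.92, 17.92, 17.92, 17.99, 18.13, 18.22, 19.85, 20.53, 20.88]
pp_15min = pc_to_pp(pc_15min)
pp_hourly = hourly_from_subhourly(pp_15min, per_hour=4)
print(pp_hourly)  # hourly PP ending at 0600 and 0700 UTC
```

The two hourly totals reproduce the 0.21- and 2.75-in. values that Table 1 lists for 0600 and 0700 UTC.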
The HADS program produces hourly precipitation
data and makes them available to users. This product is
defined as ‘‘real-time PP,’’ as it is produced in real time.
In the retrospective environment, data are recovered
that would have been dropped in the real-time envi-
ronment. This reprocessed PP output is defined as ‘‘re-
pro PP.’’ In the remainder of the paper, we present the
HADS precipitation data flow to help understand the
staging places of the data and quality control practices.
We discuss the reprocessing steps at NCDC and the
analysis approaches with metrics of the fraction of missing
values and the percentage of top-of-the-hour observa-
tions. We demonstrate the importance of the repro-
cessing by analyzing the diurnal cycle of the precipitation
frequency in a regional domain. Finally, we conclude
with recommendations for future study.
2. Data flow, quality assurance, and control practices
a. Data flow
Figure 1 shows a schematic of the HADS precipita-
tion data and product flow in real time and from the
archive. The HADS program office at OHD collects
data from the DCP owners, produces PP values, and
disseminates the data. In the real-time environment
(solid lines in Fig. 1), both PC and PP are delivered to
users at RFCs, WFOs, and the National Centers for
Environmental Prediction (NCEP). NCEP collects PP
values from both HADS and non-HADS data [e.g.,
Automated Surface Observation System (ASOS) hourly
precipitation] for assimilation and verification purposes
(Lin and Mitchell 2005). Here, a ‘‘real-time PP’’ value is
defined as the product generated in an hourly cycle even
if the station reports subhourly measurements. A his-
torical archive of these values is available from the Na-
tional Center for Atmospheric Research’s (NCAR)
Earth Observing Laboratory (EOL) Web site (http://
data.eol.ucar.edu/codiac/dss/id521.004). In the archival
1288 W E A T H E R A N D F O R E C A S T I N G VOLUME 24
environment, most NWS operational products (texts,
grids, and graphics) are archived at NCDC through the
Service Records Retention System (SRRS). Manually
edited precipitation data created by RFCs and WFOs
are embedded into this data flow. However, not all
RFCs and WFOs report manually edited precipitation
data; hence, the knowledge of the QC process in oper-
ational QPE is not preserved and the product is not
reproducible (e.g., Kursinski and Mullen 2008). Another
archival flow that began in May 2005 is for original-
format PC values to be sent to NCDC. This archival flow
is a part of the reprocessing of HADS precipitation data.
b. Quality control and assurance practices
The quality control and assurance (QC/QA) of HADS
precipitation data were originally designed for real-time
use in order to meet the operational mission of the NWS.
The HADS program staff monitors incoming HADS
data, updates metadata, and isolates obviously prob-
lematic stations. However, the QC/QA of the observed
values is left to the end users.
The operational QC process for the hourly precipi-
tation data at the RFCs follows four levels of QC pro-
cedures as described in Kondragunta and Shrestha
(2006). The first level of the QC process deals with gross
errors caused by instrument malfunction and transmis-
sion and coding–decoding errors due to format and
configuration changes. The second level of the QC pro-
cedure checks for outliers outside of threshold values for
each season and location. The third level uses neigh-
boring gauge data and independent observations for
spatial consistency checks, temporal consistency checks,
and multisensor checks. The last level is left to the ex-
pert judgment of the forecaster. Screening of prob-
lematic data is the most important and time-consuming
duty of the forecasters at RFC (J. Bradberry 2005, per-
sonal communication).
Gauge precipitation data are often used for the veri-
fication of quantitative precipitation forecasts (QPFs)
from numerical weather prediction (NWP) models.
Tollerud et al. (2005) developed a QC system for HADS
precipitation data to verify model-based precipitation
forecasts. In their work, the QC system was used to
screen out questionable gauge stations. Questionable
gauge measurements that violate internal threshold val-
ues in the QC system are considered to be gross errors.
If the gross errors continue to be present, the gauge is
labeled a ‘‘repeat offender.’’ These repeat offenders
are entered into the list of rejected stations. Data from
rejected stations were not used in the rest of the QC
process. Improved verification scores resulted from the
use of the quality-controlled data. The above system was
developed based on real-time PP data served by NCEP,
half of which are not from the top of the hour.
3. Reprocessing
The reprocessing of HADS hourly precipitation data
begins with the decoding of original-format HADS data
at full resolution as soon as OHD pushes them to NCDC
at the close of the day. The decoded cumulative pre-
cipitation data are checked for temporal inconsistencies
to recover missing values. Then, the detection and cor-
rection of spikes and noise in the hourly data complete
the reprocessing step. At the beginning of a new month,
we repeat the procedure by double-checking the data
inventory and the metadata of the previous month.
a. Data preparation
Each month’s HADS data were parsed for two
precipitation-related variables, PC and PP, using the
NWS’s Standardized Hydrometeorological Exchange For-
mat (SHEF) decoding package (NWS 2002). In this pro-
cess, illegal characters embedded in the SHEF-encoded
HADS data were removed. Occasionally, a misplaced
digit in the SHEF text caused a decoding failure. In these
cases, the misplaced location of the digit was manually
corrected and the decoding step rerun. All decoded PC
values were saved at reported intervals along with sim-
plified metadata that include the following fields: station
name, network owner, latitude, longitude, and measure-
ment interval. In this way, a metadata list was created for
FIG. 1. A schematic diagram of HADS real-time and archival
product flows. A real-time precipitation product begins at NWS/
OHD and is delivered to end users at an RFC or WFO. This prod-
uct is also stored at NCEP and NCAR for other applications. The
HADS program pushes original-format HADS data to NCDC once
a day, where it is then reprocessed. Some RFCs report manually
edited precipitation data, and they are also archived at NCDC
through the SRRS.
each month that excludes stations that do not report
precipitation. The inhomogeneity of the network pro-
viders means that not all measurements are reported at
the same temporal interval. For example, some networks
report measurements at 5-, 15-, and 30-min, as well as
hourly, intervals. The subhourly intervals provide an
easy way to report hourly measurements at the top of the
hour. However, some stations report only at hourly
intervals that fall off the top of the hour (e.g., 15 min
past the hour, 30 min past the hour), which misrepresents
the observations in hourly precipitation data. We urge caution when
using these off-the-top-of-the-hour measurements, and
in this paper we separate these off-the-top-of-the-hour
measurements from the top-of-the-hour measurements in
all analyses. A resulting indicator file shows if the hourly
PP is from the top of the hour or off the top of the hour.
The real-time PP process is set up to provide the latest
real-time data to its users. This process, however, does
not ensure that the hourly PP is the top-of-the-hour
measurement. The issue of off-the-top-of-the-hour PP
data arises in retrospective hourly analyses and can be
detrimental to specific applications such as hydrologic
forecasting or multisensor quantitative precipitation
estimation. We have found that some Remote Automated
Weather Stations (RAWS) gauges were measuring off
the top of the hour even though a majority of the RAWS
gauges were measuring on the top of the hour. This is not
a comprehensive picture, as other gauges from other
networks report off the top of the hour too.
b. Restoration of missing values
The most frequently observed quality problem was
that of missing values during nonprecipitating events.
Nonprecipitating events are easily recognized as con-
stant PC values before and after a period of missing
values. During the conversion of PC to PP, strings of
missing values were checked. If both PC values before
and after the missing period were identical, the missing
values were replaced with the same PC value, which
resulted in a zero PP value. Gaps longer than 24 h were not
filled in this way, because a constant reading over a longer
period may indicate a stuck gauge. If the PC values differed,
precipitation was assumed and the values were left as missing,
even if the difference was as small
as 0.25 mm (0.01 in.). Then, observation times are clas-
sified into 15-min bins to assure that the derived PP is on
the top of the hour. The output of this step is defined as
‘‘baseline PP’’ to distinguish it from real-time PP.
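The gap-filling rule of this subsection can be sketched as below. This is a simplified illustration under stated assumptions rather than the NCDC implementation: the function name `restore_dry_gaps` and its parameters are hypothetical, the series is assumed to be evenly spaced, missing readings are `None`, and the bracketing PC values are compared for exact equality because they are reported in fixed hundredths.

```python
from typing import List, Optional


def restore_dry_gaps(pc: List[Optional[float]],
                     steps_per_hour: int = 1,
                     max_gap_hours: int = 24) -> List[Optional[float]]:
    """Fill runs of missing PC values that are bracketed by identical readings.

    A run is filled only if it is no longer than `max_gap_hours`; longer
    runs are left missing because a constant value may mean a stuck gauge.
    """
    filled = list(pc)
    max_gap = max_gap_hours * steps_per_hour
    i, n = 0, len(pc)
    while i < n:
        if pc[i] is not None:
            i += 1
            continue
        start = i
        while i < n and pc[i] is None:
            i += 1
        # The missing run is pc[start:i]; it needs a valid reading on both sides.
        has_bounds = start > 0 and i < n
        if has_bounds and (i - start) <= max_gap and pc[start - 1] == pc[i]:
            for j in range(start, i):
                filled[j] = pc[start - 1]
    return filled


# Illustrative hourly PC series (in.): the first gap is dry, the second is not.
pc = [5.00, None, None, 5.00, None, 5.01, None]
print(restore_dry_gaps(pc))
```

In the example, the first gap is restored to the constant value (yielding zero PP), the second stays missing because the bracketing values differ by 0.01 in., and the trailing gap stays missing because it has no closing reading.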
c. Spikes and noise control
Spikes and noise are nonphysical events. They are
caused by many situations, but the two most common
are a lack of system maintenance and exposure to a se-
vere environment. The DCP system includes gauge in-
struments as well as a datalogger and a transmitter. A
malfunction of any or all of these components can cause
errors of this kind. The HADS metadata do not include
gauge type and system information and, therefore,
controlling spikes and noise requires detection of such
errors in the time series of baseline PP. Such problems
were detected by analyzing baseline PP values for reg-
ular patterns of negative and positive values of equal
size at certain observation times. Then, nonnegativity
constraints were imposed on the PP time series. The
application of the spikes and noise control algorithm
outputs reprocessed hourly precipitation (repro PP).
Figure 2 exemplifies noise in PC values during 20–26
May 2006 at the gauge station in Hungry Horse, Mon-
tana. No rain during the period from 0000 UTC 21 May
through 0500 UTC 25 May 2006 should display a flat line
in its PC values, but there are wiggles in the PC values.
The straightforward derivation of PP values results in a
sequence of many −0.01 and +0.01 values. Such noise
has existed since the beginning of our archival record
and covers the period October 1997–March 2008.
Clusters of stations of noisy PC values were found in the
northwestern and northeastern United States.
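The text does not give the spike and noise control algorithm in detail, so the following is only one possible reading of it: adjacent increments of equal size and opposite sign are treated as noise and cancelled, and any remaining negative increment is then set to zero to enforce nonnegativity. The function name `suppress_noise` and the tolerance `tol` are assumptions made for this sketch.

```python
from typing import List, Optional


def suppress_noise(pp: List[Optional[float]],
                   tol: float = 1e-6) -> List[Optional[float]]:
    """Cancel adjacent +/- pairs of equal size, then clamp negatives to zero."""
    out = list(pp)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if a is not None and b is not None and a != 0 and abs(a + b) <= tol:
            # Equal-size, opposite-sign neighbors: treat both as noise.
            out[i] = 0.0
            out[i + 1] = 0.0
            i += 2
        else:
            i += 1
    # Nonnegativity constraint on whatever remains.
    return [None if v is None else max(v, 0.0) for v in out]


# Hourly increments (in.) implied by the ROKN7 PC values in Table 2,
# 0400-0900 UTC 7 Jun 2004: the PC value drops to 0.00 and returns to 0.71.
rokn7_pp = [-0.71, 0.71, -0.71, 0.71, -0.71, 0.71]
print(suppress_noise(rokn7_pp))
```

Applied to the ROKN7 increments, the sketch returns zeros throughout, consistent with the repro PP column of Table 2. A rule this simple would also cancel a genuine 0.01-in. rain hour that happens to follow a −0.01 glitch, so an operational version would need additional checks.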
In summary, daily reprocessing steps involve the
following:
• decoding of the SHEF-format PC variable in full frequency,
  and the creation of metadata;
• generation of the top-of-the-hour baseline PP with
  recovery of some missing values; and
• generation of the repro PP by controlling some spikes
  and noise in the baseline PP.
On the first day of each month, the previous month’s HADS
data are reprocessed to update the monthly metadata
and compute each station’s monthly quality flag.
FIG. 2. The time series of accumulated precipitation at the
Hungry Horse, MT (HGHM8), gauge station during a 7-day period
from 0000 UTC 20 May through 2100 UTC 26 May 2006. Apparent
small perturbations make true rain events difficult to detect. Such
noise has existed since the beginning of the archive (October 1997).
4. Assessment of reprocessed HADS
a. Comparison with real-time PP values
Real-time PP data generated by the HADS program
were retrieved from the NCAR EOL site. The two
metrics used for the comparison were the percentage of
missing values and the percentage of top-of-the-hour
measurements during each month of 2003–05. To man-
age the high volume of data, every seventh station from
an alphabetical list of all stations in each of the 48 con-
tiguous United States was subsampled for this assess-
ment. Additionally, stations with more than 7 days of
missing values in either repro PP or real-time PP were
removed. The HADS program was unable to deliver
SHEF-encoded historical HADS data to NCDC for the
months of November 2003 and January 2004. December
2003 contained too many missing values in the real-time
PP to allow for a fair comparison. Figure 3 shows the
distribution of subsampled HADS stations during Sep-
tember 2005. The spatial inhomogeneity is not caused by
the subsampling process, but by the network design.
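The station subsampling can be sketched as follows. The station identifiers are invented for illustration, the function names are hypothetical, and starting the selection with the first station of each state's alphabetical list is an assumption (it guarantees at least one station per state, as noted in the Fig. 3 caption). The `missing_hours` mapping is assumed to hold, for each station, the larger of the missing-hour counts in repro PP and real-time PP.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def subsample_stations(stations: List[Tuple[str, str]],
                       step: int = 7) -> List[str]:
    """Pick every `step`-th station from each state's alphabetical list.

    `stations` holds (station_id, state) pairs.
    """
    by_state: Dict[str, List[str]] = defaultdict(list)
    for station_id, state in stations:
        by_state[state].append(station_id)
    picked: List[str] = []
    for state in sorted(by_state):
        ids = sorted(by_state[state])
        picked.extend(ids[::step])  # index 0, 7, 14, ...: at least one per state
    return picked


def drop_sparse(picked: List[str],
                missing_hours: Dict[str, int],
                max_missing: int = 168) -> List[str]:
    """Remove stations with more than `max_missing` hours (7 days) missing."""
    return [s for s in picked if missing_hours.get(s, 0) <= max_missing]


# Invented identifiers: ten stations in one state, three in another.
stations = [(f"NC{i:02d}", "NC") for i in range(10)]
stations += [(f"SC{i:02d}", "SC") for i in range(3)]
picked = subsample_stations(stations)
kept = drop_sparse(picked, missing_hours={"NC07": 200})
print(picked, kept)
```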
b. Comparison with COOP daily precipitation
For a detailed comparison of repro PP and real-time
PP, a regional domain (North and South Carolina), dur-
ing the warm season (April–September), was selected.
In this domain, both repro PP and real-time PP were
compared with Cooperative Observer Network (COOP)
daily precipitation data (NCDC 2003). Figure 4 shows
spatial distributions of HADS and COOP stations, and
the average nearest distance between HADS and COOP
stations is about 11 km. Each time series of HADS hourly
precipitation was summed up according to the COOP’s
reported observation time (at the top-of-the-hour) for
24 h. From this process, quality metrics were computed
for two daily time series, HADS (repro PP and real-time
PP) and COOP, for every month. Any HADS–COOP
time series pairs were removed if the ratio of the two was
greater than 3 or less than 1/3 for fear of gross error in the
COOP and/or HADS data. Out of 2408 pairs, 344 were
removed from this gross error check. If a missing value
was present in the daily COOP data, then the next-
nearest COOP station data (within 50 km) were used.
The differences in the monthly totals between repro PP
and real-time PP were defined as the gain, and the
HADS-to-COOP ratio of the monthly totals was referred
to as the bias ratio, which is a commonly used measure in
QPE. As these statistics are based on the monthly totals,
we excluded the missing values from the calculation of
the monthly accumulation. The gain, bias ratio, and per-
centage of missing values are the three quality metrics
used in the detailed comparison.
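The quality metrics of this subsection can be sketched as below. All function names are hypothetical and the numbers are invented. The sketch assumes that the gross error check is applied to the ratio of monthly totals, which the text does not state explicitly.

```python
from typing import List, Optional


def monthly_total(values: List[Optional[float]]) -> float:
    """Sum a series while skipping missing values (None)."""
    return sum(v for v in values if v is not None)


def bias_ratio(hads_total: float, coop_total: float) -> Optional[float]:
    """HADS-to-COOP ratio of monthly totals; undefined for a dry COOP month."""
    if coop_total <= 0:
        return None
    return hads_total / coop_total


def passes_gross_check(ratio: Optional[float], upper: float = 3.0) -> bool:
    """Keep a pair only if 1/3 <= ratio <= 3."""
    return ratio is not None and (1.0 / upper) <= ratio <= upper


def gain(repro_total: float, realtime_total: float) -> float:
    """Monthly total recovered by reprocessing (repro PP minus real-time PP)."""
    return repro_total - realtime_total


# Invented daily totals (mm) for one station and month.
repro = [0.0, 12.5, None, 3.0]
realtime = [0.0, None, None, 3.0]
coop_total = 16.0

repro_ratio = bias_ratio(monthly_total(repro), coop_total)
realtime_ratio = bias_ratio(monthly_total(realtime), coop_total)
print(repro_ratio, passes_gross_check(repro_ratio))
print(realtime_ratio, passes_gross_check(realtime_ratio))
print(gain(monthly_total(repro), monthly_total(realtime)))
```

In this invented case the real-time series misses the one wet day, so its ratio falls below 1/3 and the pair would be rejected, while the reprocessed series stays close to unity.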
c. Patterns of missing values and their implications
In general, rain gauge or electronics malfunctions
at the time of measurement and/or during data transmission
cause data to be unavailable at the specified
observation time. On the other hand, the data provider
deletes observed values that fail quality criteria at the
processing level. The two causes must be differentiated
so that the users are in control of correcting suspected
data. We illustrate two examples: HADS station LLDN7
in July 2003 and MCKN7 in August 2003. The original
data were reported at 15-min frequencies, so that hourly
data on the top of the hour are available. Table 1 shows
15-min decoded PC values, real-time PP values, and
reprocessed PP values at LLDN7 on 1 July 2003.
During the 5-h period, obvious measurement errors occurred.
The reprocessed HADS data restored them
rather than encoding them as missing values. Oftentimes,
such gross errors help in the diagnosis of the duration
of a disturbance. Table 2 is an example from station
ROKN7, where suspect data were incorrectly set to default
zero values instead of being encoded as missing values.
FIG. 3. Distribution of subsampled HADS stations during Sep-
tember 2005. The subsampling was made from every seventh sta-
tion selected from an alphabetical list of all stations in the CONUS.
At least one station must be present in each state and stations with
more than seven days (168 h) of missing values are deleted.
FIG. 4. Distribution of all HADS stations available in NC and SC
(solid dots) and COOP daily rain gauge stations (open circles).
Even though the repro PP identified the pattern of spikes
and corrected them, false zero PC values had appeared
as early as October 2001. A history of station quality
should be helpful to users in determining observation
validity, and in the refinement of the QC algorithm.
The occurrence of missing values is hard to charac-
terize when a gauge instrument malfunctions, but we
have observed that a higher frequency of missing values
in real-time PP may be attributed to the latency of the
data ingestion process to the processing environment
at the HADS program office. The recovery of missing
values is possible by reprocessing data from the original
SHEF-formatted archive. We have analyzed the diurnal
cycle of the precipitation frequency (e.g., Dai 1999), one
of the hydroclimate variables, for three warm seasons
in North and South Carolina. A full-blown analysis of
the hydroclimate variables is beyond the scope of this
paper.
5. Results
Direct comparisons between repro PP and real-time
PP are shown in Fig. 5. The monthly average of the
fractional missing values varies from 0.5% to 2% in re-
pro PP, and from 1.7% to 10.1% in real-time PP. Except
for January 2003, repro PP consistently reduced the
missing values, by as much as 9.4% in October 2004.
Overall, the average missing value in repro PP is about
1.0%, which is equivalent to seven missed observations
in 1 month. The improvement in the fractional missing
values from real-time PP to repro PP is possible only
through reprocessing. The fractional percentage of miss-
ing values in repro PP reflects the rate of unrecoverable
missing values due to malfunctions by the gauge and in
data transmission. The top-of-the-hour observations are
also important when comparing QPE data from other
platforms such as radar. On average, the top-of-the-hour
observations are available about 85% of the time
in repro PP, while in real-time PP they are available
about 50% of the time. The higher rate of
off-the-top-of-the-hour observations in real-time
PP arises because the HADS program processes the latest
available observations to support real-time hydrologic
forecasting. The real-time focus means that the HADS
data processing produces the hourly estimates as they
become available. Thus, many non-top-of-the-hour data
in real-time PP are transmitted to the users. The RFC, as
a user, applies a narrow time window around the data:
±2 min on PP values and ±10 min on PC values from the
top of the hour (J. Bradberry 2008, personal communi-
cation). Practically, half of the real-time PP data will be
discarded in the retrospective production of MPE. An
advantage of reanalysis is that many more top-of-the-
hour values are available (Nelson et al. 2008).
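The two system-wide metrics can be sketched as follows. The function names are hypothetical, the observations are invented, and the ±2-min default window is borrowed from the RFC practice quoted above; the exact tolerance used for the top-of-the-hour classification in the assessment is an assumption here. The last lines restate the percentage-to-hours arithmetic for a 30-day month.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (observation time, hourly PP or None if missing)
Obs = Tuple[datetime, Optional[float]]


def fraction_missing(obs: List[Obs]) -> float:
    """Fraction of scheduled hourly values that are missing."""
    return sum(1 for _, value in obs if value is None) / len(obs)


def percent_top_of_hour(obs: List[Obs], window_min: float = 2.0) -> float:
    """Percentage of observations within `window_min` minutes of the hour."""
    def near_top(t: datetime) -> bool:
        offset = t.minute + t.second / 60.0
        return min(offset, 60.0 - offset) <= window_min

    return 100.0 * sum(1 for t, _ in obs if near_top(t)) / len(obs)


# Invented observations for illustration.
sample = [
    (datetime(2005, 9, 1, 0, 0), 0.0),
    (datetime(2005, 9, 1, 1, 0), None),
    (datetime(2005, 9, 1, 2, 15), 0.2),
    (datetime(2005, 9, 1, 3, 59), 0.0),
]
print(fraction_missing(sample))
print(percent_top_of_hour(sample))

hours_in_month = 24 * 30
missed_at_1pct = round(0.01 * hours_in_month, 1)           # about 7 h per month
recovered_5_to_1 = round((0.05 - 0.01) * hours_in_month)   # about 29 h per month
print(missed_at_1pct, recovered_5_to_1)
```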
TABLE 1. Decoded HADS data, real-time PP, and repro PP for
station LLDN7 on 1 Jul 2003. The real-time PP withheld values of
11.92 and 21.60 for having failed the QC check. We denoted these
values as NA.
Observation time   Decoded PC   Real-time PP   Repro PP
(UTC, 1 Jul)       (in.)        (in.)          (in.)
0500               17.92        0.00           0.00
0515               17.92
0530               17.92
0545               17.99
0600               18.13        0.21           0.21
0615               18.22
0630               19.85
0645               20.53
0700               20.88        2.75           2.75
0715               20.92
0730               23.06
0745               23.67
0800               32.80        NA             11.92
0815               36.99
0830               41.02
0845               46.92
0900               54.40        NA             21.60
0915               55.42
0930               56.56
0945               58.65
1000               58.66        4.26           4.26
TABLE 2. Decoded HADS data, real-time PP, and repro PP for
station ROKN7 on 7 Jun 2004. The real-time PP withheld values of
−0.71, but the 0.71 values survived as legitimate.
Observation time   Decoded PC   Real-time PP   Repro PP
(UTC, 7 Jun)       (in.)        (in.)          (in.)
0345               0.71
0400               0.00         NA             0.00
0415               0.71
0430               0.71
0445               0.71
0500               0.71         0.71           0.00
0515               0.71
0530               0.71
0545               0.71
0600               0.00         NA             0.00
0615               0.71
0630               0.71
0645               0.71
0700               0.71         0.71           0.00
0715               0.71
0730               0.71
0745               0.71
0800               0.00         NA             0.00
0815               0.71
0830               0.71
0845               0.71
0900               0.71         0.71           0.00
Figure 6 shows the bias ratio results for both repro PP
and real-time PP for the warm seasons (April–September)
of 2003–05 in the Carolinas. A bias ratio close to unity
indicates close agreement with the COOP data in the
monthly total. The median values of repro PP are closer
to unity than those of the real-time PP. Figure 7 is the
empirical probability function of the gain (repro PP −
real-time PP) for all three warm seasons. The function
is trimmed between −10 and +11 mm, with a 0.5-mm
interval. The distribution shows a positive skewness;
namely, repro PP recovered observation values that real-
time PP missed. The mean value of the highest probability
bin (0.0–0.5) was 0.254 mm. The dashed line shows the fitted
probability density function, whose peak is at 0.18 mm.
Figure 8a shows the frequencies of the missing values
in the daily cycle. The missing values in repro PP show a
uniform distribution throughout the day over the 3-yr
period, but those of the real-time PP display certain times
of increased missing values. A disturbing feature of the
real-time PP is the sharp increase in missing values dur-
ing 1800–2300 UTC (1300–1800 local time) during 2004
when warm-season convective rains were active. Figure 8b
shows a sharp drop in rain events in real-time PP against
repro PP at 2100 UTC during 2004. Note that the in-
creased number of missing values during 1800–2300
UTC causes a misinterpretation of the diurnal precipi-
tation pattern. The secondary maximum rain events
during 1200–1500 UTC during 2004 are attributed to the
remnants of Hurricanes Charley, Florence, Ivan, and
Jeanne, which passed through the region in the month
of September. The pattern of the shift during 2005 was a
result of the time reference error in real-time PP. The
4-h shift in real-time PP lasted from 1 July through
11 August 2005.
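The diurnal tallies behind this comparison can be sketched as below. The function name `diurnal_counts`, the zero rain threshold, and the sample values are assumptions for illustration. The sketch counts, for each UTC hour, how many values are missing and how many are rain events, which is one plausible reading of the quantities plotted in Figs. 8a and 8b.

```python
from typing import Iterable, List, Optional, Tuple


def diurnal_counts(obs: Iterable[Tuple[int, Optional[float]]],
                   rain_threshold: float = 0.0
                   ) -> Tuple[List[int], List[int]]:
    """Tally missing values and rain events by UTC hour (0-23).

    `obs` holds (utc_hour, hourly PP) pairs; None marks a missing value.
    A rain event is any hourly PP above `rain_threshold`.
    """
    missing = [0] * 24
    rain = [0] * 24
    for hour, pp in obs:
        if pp is None:
            missing[hour] += 1
        elif pp > rain_threshold:
            rain[hour] += 1
    return missing, rain


# Invented sample: afternoon hours lose values that may have been rain.
obs = [(21, None), (21, 0.3), (21, None), (13, 0.0), (13, 1.2)]
missing_by_hour, rain_by_hour = diurnal_counts(obs)
print(missing_by_hour[21], rain_by_hour[21], rain_by_hour[13])
```

Because a missing hour can never be counted as a rain event, a cluster of missing values in the afternoon lowers the apparent rain frequency at exactly those hours, which is the distortion described above for real-time PP in 2004.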
The results in this section have potentially large im-
plications for the various applications and analyses. For
example, the recovery of the missing values will provide a
better dataset for studies of finescale climate signals such
as the diurnal pattern of precipitation. Figure 8 shows
that the recovery of the missing values can provide a
dataset that shows a more representative diurnal pattern
of precipitation. In addition, the recovery of the no-rain
events from missing values has implications for direct
comparisons of the hourly rain gauge measurements to
other rainfall measurements such as those from radar
and satellite. Finally, the identification of both the top-of-the-hour
and off-the-top-of-the-hour values in the hourly
precipitation data can have a significant impact in specific
applications such as multisensor precipitation estimation
and the modeling of hydrologic processes at fine scales.
6. Conclusions and future research recommendations
The retrospective reprocessing of HADS hourly pre-
cipitation data has reduced the average number of frac-
tional missing values from 5% in the real-time product
down to 1% during the assessment period 2003–05 in the
FIG. 5. Two quality metrics comparing repro PP (dark bars) and
real-time PP (gray bars) during 2003–05 for the CONUS. (a)
Fractional missing values (the smaller the better). (b) Percentage
of top-of-the-hour observations (the larger the fraction is, the
better the time representation).
FIG. 6. Box plots of the bias ratio (monthly total precipitation
comparing HADS to COOP) for the warm seasons for 2003–05.
Median values of repro PP (in the dark color box) are closer to
unity than those of real-time PP.
conterminous U.S. (CONUS) domain. This is equivalent
to a recovery of 29 h of missing values per month. The
missing values in the reprocessed product are uniformly
distributed across all hours of the day while the real-
time product displayed a diurnal pattern. In addition,
the reprocessed product improved the availability of the
top-of-the-hour observations from 50% in the real-time
product to 85%. The improved availability of the top-of-
the-hour observations significantly increases the value
of the hourly precipitation data in finescale applications,
for example, data fusion with other high-frequency QPE
methods from radars and satellites. The reprocessed
HADS data are expected to be used as an input source
to the Climate Prediction Center’s extended-period
gridded observations for the detection and diagnostics of
precipitation variations and long-term changes (Higgins
et al. 1996). Currently, reprocessed HADS hourly data
are available from NCDC in a 1-day-delayed mode (see
the appendix).
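The two quality metrics and the 29-h figure can be illustrated with a short sketch. It assumes that a missing hour is represented by None, that a top-of-the-hour observation is one stamped at minute 00, and that a month has 30 days (720 h). The function names and the toy data are hypothetical and are not part of the HADS processing software.

```python
from datetime import datetime


def fraction_missing(values):
    """Share of hourly slots with no value (None marks a missing hour)."""
    return sum(v is None for v in values) / len(values)


def percent_top_of_hour(obs_times):
    """Percentage of observation times stamped exactly on the hour."""
    on_hour = sum(t.minute == 0 and t.second == 0 for t in obs_times)
    return 100.0 * on_hour / len(obs_times)


def hours_recovered(frac_realtime, frac_repro, hours_in_month=720):
    """Hours per month no longer missing after reprocessing.

    ASSUMPTION: a 30-day month (720 h) unless stated otherwise.
    """
    return (frac_realtime - frac_repro) * hours_in_month


# Toy data (hypothetical): four hourly slots, one of them missing.
print(fraction_missing([0.0, None, 0.2, 0.0]))       # 0.25

# Toy observation times: three on the hour, one at 15 min past.
times = [datetime(2004, 10, 1, h, m)
         for h, m in [(0, 0), (1, 0), (2, 15), (3, 0)]]
print(percent_top_of_hour(times))                    # 75.0

# 5% missing in real time versus 1% after reprocessing.
print(round(hours_recovered(0.05, 0.01), 1))         # 28.8, about 29 h
```

With a 31-day month (744 h) the same difference of four percentage points corresponds to roughly 30 h, so the quoted 29 h should be read as an approximate monthly value.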
For future research, we offer the following recommendations:
• Preservation of original data is absolutely required in order to diagnose quality problems. Original SHEF-formatted HADS data made it possible not only to improve the quality of the data, but also to determine the origins of quality problems in the hourly precipitation product.
• A single repository of gauge quality information is necessary in order to improve the quality of the precipitation data. Many RFCs save manual gauge QC results for their service area, but do not share them with other communities, and some network owners apply extra QC measures unknown to other users. The gauge quality Web page can serve as a common tool for both end users and network operators.
• Gauge metadata must be completed in order to assess quality issues. The metadata must include not only geospatial information, but also instrument type and maintenance records, in order to understand the history of the quality problems.
• Reprocessing must utilize product and algorithm version control to allow well-documented transitions to newer techniques.
FIG. 7. Empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm with 0.5-mm class intervals. The distribution shows a skewness toward positive values; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0 to 0.5) was 0.254 mm. The dashed line shows the fitted probability density function with a peak value of 0.18 mm.
FIG. 8. (a) Diurnal patterns of frequencies in missing values
during warm seasons in the NC–SC domain. Solid circles connected
with dashed lines are taken from real-time PP; open circles with
solid lines are taken from repro PP. Real-time PP shows peaks of
missing values at certain hours of the day, while repro PP reflects
more of a uniform distribution in time. The peaks of missing values
in 2004 are from May 2004. (b) As in (a) but for precipitation
frequencies. Positive PP values are counted as rain events.
Acknowledgments. The authors thank Lawrence
Cedrone and the entire NWS/OHD HADS Program
staff who have always been responsive and corrected
problematic HADS gauge reports. The authors ac-
knowledge Arthur Fotos for programming support of
the reprocessed HADS Data Web site. The authors
thank Anne Markel, Tom Peterson, Ed Kearns, and
Xuangang Yin of NCDC for their careful review and
three anonymous reviewers for many suggestions.
APPENDIX
Reprocessed HADS Hourly Precipitation Web Site
For the first time since the inception of the HADS
program, the hourly precipitation data in HADS have
been reprocessed. Reprocessing HADS data has im-
proved the data quality by recovering many missing
values and by choosing top-of-the-hour observations
when subhourly data were available. Currently, version
1.0 HADS-reprocessed PP products are available for
further applications. There were extended periods of
missing values when the retrieval of original-format
HADS data from OHD’s storage system failed, for ex-
ample, December 1996, January 1997, August 1997,
January 1998, June 1998, May 1999, January–April
2000, July–September 2000, December 2000, January–
September 2001, November 2003, and January 2004.
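The idea of choosing a top-of-the-hour observation when subhourly reports are available can be sketched as follows. This is only an illustration under stated assumptions, not the reprocessing algorithm itself: the function name is hypothetical, and the fallback used when no report is stamped on the hour (keep the earliest report in that clock hour and flag it) is an assumption.

```python
from datetime import datetime


def pick_hourly_reports(reports):
    """Keep one (timestamp, value) report per clock hour.

    A report stamped exactly on the hour is preferred. ASSUMPTION: if no
    such report exists, the earliest report in that clock hour is kept.
    Returns {hour: (timestamp, value, is_top_of_hour)}.
    """
    best = {}
    for stamp, value in reports:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        # The on-the-hour stamp is the smallest possible one in its hour,
        # so keeping the earliest stamp automatically prefers it.
        if hour not in best or stamp < best[hour][0]:
            best[hour] = (stamp, value)
    return {hour: (stamp, value, stamp == hour)
            for hour, (stamp, value) in sorted(best.items())}


# Toy subhourly reports (hypothetical values).
reports = [
    (datetime(2005, 7, 1, 12, 15), 0.1),
    (datetime(2005, 7, 1, 12, 0), 0.0),
    (datetime(2005, 7, 1, 13, 30), 0.2),
]
selected = pick_hourly_reports(reports)
for hour, (stamp, value, on_hour) in selected.items():
    print(hour, stamp, value, on_hour)
```

The returned flag makes it possible to count how many hourly values are true top-of-the-hour observations, which is the quantity used as a quality metric in this paper.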
As of January 2008, the initial version of the repro PP data has been populated on the Web so that users can assess the quality and download the data (http://www.ncdc.noaa.gov/hads/). The first Web site page guides the user to enter the month/year and click on the desired U.S. state. On the next page, the user can choose the desired HADS station from a map or enter the five-letter station name, which leads to a time series page.
a. Time series page
The lower two panels on the Web page display relative locations of neighboring HADS stations (lower-left panel) and the relative locations of neighboring daily COOP stations within a 1° × 1° box from the target HADS station. The user can view the neighboring station’s time series by clicking on the HADS location, where data can be viewed and/or downloaded.
Monthly statistics of HADS–COOP pair data are
viewable by clicking ‘‘View Data’’ below the panel of
neighboring COOP stations. The header displays the
HADS station name, year, month, latitude, longitude,
and number of collocated COOP stations. The 14 col-
umns of each pair are described in Table A1.
b. Mass analysis page
An extensive user interface page can be found by
clicking on the ‘‘Mass Analysis’’ link on the time series
page. This page overlays accumulated precipitation with
neighboring HADS stations using different colors for up
to four stations. The effects of missing values (marked
with black dots), variability of rain events as a function
of distance and direction, and gross errors can be easily
understood.
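The effect of missing values on an accumulated precipitation curve can be sketched as follows, assuming that a missing hour is represented by None and contributes nothing to the running total. The function name and the sample values are hypothetical; the Web page itself may handle gaps differently.

```python
def accumulate(hourly):
    """Running precipitation total for one station.

    Missing hours (None) add nothing to the total and are returned
    separately so that they can be marked on a plot.
    """
    total = 0.0
    totals = []
    missing_hours = []
    for i, value in enumerate(hourly):
        if value is None:
            missing_hours.append(i)
        else:
            total += value
        totals.append(total)
    return totals, missing_hours


# Toy hourly series (hypothetical, in mm): the second hour is missing.
totals, missing_hours = accumulate([1.0, None, 2.5, 0.0])
print(totals)          # [1.0, 1.0, 3.5, 3.5]
print(missing_hours)   # [1]
```

Because the curve stays flat across a missing hour, a station with gaps during rain falls behind its neighbors in the overlay, which is what makes the gaps visible.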
c. Storm period page
Users can examine storm periods by clicking the ‘‘Storm Period’’ link on the time series page, and selecting the desired storm period by entering the start and end times. This page displays time series of target stations as well as storm totals for all available neighboring HADS stations within a 1° × 1° box.
TABLE A1. Description of columns used in the monthly statistics of HADS and COOP.
Column  Description
1       COOP station ID
2       Conversion factor from UTC to local standard time (LST) (add factor to UTC to convert to LST)
3       COOP observation time in LST
4       No. of missing values in daily COOP
5       Monthly sum of daily COOP precipitation data
6       No. of missing values in hourly HADS (−9 when COOP has a missing day)
7       Monthly sum of hourly HADS after shifting from UTC to COOP LST (−99 when COOP has a missing day)
8       No. of cases that were entered into the statistical computation (namely, days either COOP or aggregated HADS reported rain >0.01 in.)
9       Mean differences with degrees of freedom in column 8 (in.)
10      Root-mean-squared differences (in.)
11      Ratio of two monthly sums (columns 7 and 5), also called bias ratio
12      Correlation coefficient between daily COOP and aggregated HADS values with the degree of freedom in column 8 (both no-rain cases are not entered here)
13      Distance to COOP from HADS (°)
14      Relative angular direction to COOP from HADS (°)
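The statistics in columns 8–12 can be illustrated with a minimal sketch. Several details are assumptions rather than facts from the table: the difference is taken as HADS minus COOP, the bias ratio as column 7 divided by column 5, the sums in the mean and root-mean-squared differences are divided by the number of cases, and both input series are assumed complete (no missing days). The function name and sample values are hypothetical.

```python
import math


def pair_statistics(coop_daily, hads_daily, threshold=0.01):
    """Monthly statistics for one HADS-COOP pair (values in inches).

    Days on which neither gauge reports more than `threshold` are left
    out of the case count, following the description of column 8.
    """
    pairs = [(c, h) for c, h in zip(coop_daily, hads_daily)
             if c > threshold or h > threshold]
    n = len(pairs)
    if n == 0:
        return None

    diffs = [h - c for c, h in pairs]            # ASSUMPTION: HADS - COOP
    mean_diff = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)

    coop_sum, hads_sum = sum(coop_daily), sum(hads_daily)
    # ASSUMPTION: bias ratio = HADS monthly sum / COOP monthly sum.
    bias_ratio = hads_sum / coop_sum if coop_sum > 0 else math.nan

    # Pearson correlation over the selected cases.
    mean_c = sum(c for c, _ in pairs) / n
    mean_h = sum(h for _, h in pairs) / n
    cov = sum((c - mean_c) * (h - mean_h) for c, h in pairs)
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    var_h = sum((h - mean_h) ** 2 for _, h in pairs)
    denom = math.sqrt(var_c * var_h)
    corr = cov / denom if denom > 0 else math.nan

    return {"cases": n, "mean_diff": mean_diff, "rmsd": rmsd,
            "bias_ratio": bias_ratio, "corr": corr}


# Toy month of four days (hypothetical values, in inches).
stats = pair_statistics([0.0, 0.5, 1.0, 0.0], [0.0, 0.4, 1.2, 0.0])
print(stats)
```

In this toy case two of the four days enter the computation, and the bias ratio is slightly above unity because the aggregated HADS total exceeds the COOP total.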
The Web page is considered experimental until the station quality history and the rescue of missing values are completed. Once that process has been completed, initial versions of the reprocessed HADS hourly precipitation data will be available (and at higher quality than the real-time data).
REFERENCES
Dai, A., 1999: Recent changes in the diurnal cycle of precipitation over the United States. Geophys. Res. Lett., 26, 341–344.
Higgins, R. W., J. E. Janowiak, and Y.-P. Yao, 1996: A gridded hourly precipitation data base for the United States (1963–1993). NCEP/Climate Prediction Center ATLAS 1, 47 pp.
Kondragunta, C., and K. Shrestha, 2006: Automated real-time operational rain gauge quality controls in NWS hydrologic operations. Preprints, 20th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., P2.4. [Available online at http://ams.confex.com/ams/pdfpapers/102834.pdf.]
Kursinski, A. L., and S. L. Mullen, 2008: Spatiotemporal variability of hourly precipitation over the eastern contiguous United States from stage IV multisensor analyses. J. Hydrometeor., 9, 3–21.
Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at http://ams.confex.com/ams/pdfpapers/83847.pdf.]
NCDC, cited 2003: Data documentation for Data Set 3200 (DSI-3200). [Available online at http://www.ncdc.noaa.gov/oa/documentlibrary/.]
Nelson, B., D. J. Seo, and D. Kim, 2008: Multi-sensor precipitation reanalysis. Preprints, Int. Symp. on Weather Radar and Hydrology, Grenoble, France, Laboratoire d’etude des Transferts en Hydrologie et Environnement (LTHE), 02-004, 150 pp. [Available online at http://www.wrah-2008.com/PDF/O2-004.pdf.]
NWS, 2002: Standard hydrometeorological exchange format (SHEF) manual. National Weather Service Manual 10-944. [Available online at http://www.nws.noaa.gov/directives/.]
Seo, D.-J., and J. Breidenbach, 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using gauge measurements. J. Hydrometeor., 3, 93–111.
Tollerud, E., R. Collander, Y. Lin, and A. Loughe, 2005: On the performance, impact, and liabilities of automated precipitation gage screening algorithms. Preprints, 21st Conf. on Weather Analysis and Forecasting, Washington, DC, Amer. Meteor. Soc., P1.42. [Available online at http://ams.confex.com/ams/pdfpapers/95173.pdf.]