Sea Surface Temperature Estimation from Satellite Observations and In-Situ Measurements Using Multi-Fidelity Gaussian Process Regression

P. Prempraneerach Mechanical Engineering Department,

Rajamangala University of Technology Thanyaburi Thanyaburi, Pathumthani, Thailand 12110


P. Perdikaris, G.E. Karniadakis, C. Chryssostomidis MIT Sea Grant College Program

Massachusetts Institute of Technology Cambridge, MA, USA 02138


Abstract—Monitoring ocean acidification (OA) is essential for healthy fisheries and marine ecosystems, especially in coastal areas such as Boston Harbor and Massachusetts Bay. The problem is particularly acute in this region because of the so-called reduced "buffering capacity" of New England coastal waters. Sea Surface Temperature (SST) is one of the significant OA indicators; however, accurate in-situ data collection over a vast area is very costly, while daily satellite observations provide only coarse information along the coast. To address this, best estimates of SST fields, together with their uncertainty bounds, can be obtained by data fusion using Gaussian Process Regression (GPR). Using single- and multi-fidelity GPR, confidence limits for coastal data can be established, and a comprehensive multi-fidelity monitoring strategy that integrates satellite observations and in-situ measurements can be designed.

Keywords—Sea Surface Temperature; Gaussian Process Regression; Boston Harbor; Uncertainty Quantification

I. INTRODUCTION

Ocean acidification is generally defined as changes in ocean chemistry caused by oceanic uptake of chemical inputs from the atmosphere, particularly anthropogenic atmospheric CO2, although nitrogen and sulfur compounds also strongly affect some coastal regions. Driven by accelerated industrial growth and deforestation since the industrial revolution, the rapid increase in atmospheric CO2 directly decreases ocean pH through air-sea gas exchange. Over the roughly 250 years since the industrial revolution, surface-ocean pH has decreased by 0.1 units, and a further drop of 0.3-0.4 units is predicted by the end of the twenty-first century [2, 3]. Because pH is logarithmic, this corresponds to roughly a doubling to 2.5-fold increase in hydrogen-ion concentration (acidity). This increased uptake of CO2 reduces the oceanic carbonate ion concentration as well as the calcium carbonate saturation state, both of which directly affect marine calcifiers. As a result, many marine calcifying organisms, including various coral-reef-forming algae and shellfish, will have difficulty forming their biogenic shells, which require calcium carbonate (CaCO3), and will be endangered if they cannot adapt quickly to this rapidly changing oceanic environment. Coastal regions vulnerable to OA are mostly located at cold high latitudes and in deeper waters, which naturally have low carbonate ion concentrations and high CO2 solubility, as well as in coastal upwelling areas and areas that receive freshwater discharges. Boston Harbor (BH) and the Massachusetts (MA) and Rhode Island (RI) Bay areas, which are among the many regions directly influenced by OA, are therefore the specific areas of interest in the research presented here.
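Because pH is the negative base-10 logarithm of hydrogen-ion activity, the change in acidity implied by a given pH drop is a one-line calculation. The short Python sketch below, written only for this illustration, evaluates the fold change 10^ΔpH for the drops quoted above.

```python
def hydrogen_ion_fold_change(delta_ph):
    """pH = -log10(a_H+), so a pH drop of delta_ph multiplies the
    hydrogen-ion activity by 10**delta_ph."""
    return 10.0 ** delta_ph

for drop in (0.1, 0.3, 0.4, 1.0):
    factor = hydrogen_ion_fold_change(drop)
    print(f"pH drop {drop:.1f}: x{factor:.2f} ({100.0 * (factor - 1.0):.0f}% increase)")
```

A drop of 0.3-0.4 units therefore corresponds to roughly a 2- to 2.5-fold rise in hydrogen-ion concentration. A 10-fold rise would require a drop of a full pH unit.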

The level of dissolved CO2 in seawater can be measured and monitored mainly through the following parameters: Total Alkalinity (TA), total Dissolved Inorganic Carbon (DIC), the partial pressure of CO2 in surface seawater (pCO2), and pH at a given temperature, salinity, and pressure [3]. According to Sabia et al. [1], these parameters can be estimated by properly merging different satellite observations, including Ocean Colour (OC), Sea Surface Temperature (SST), Chlorophyll-a (Chl-a), and Sea Surface Salinity (SSS). For example, combining measured surface-seawater pH, Chl-a, and SST data sets can provide an estimate of pH over the North Pacific using a direct pH algorithm, as described in [4, 5]. Furthermore, an empirical relation for subtropical surface TA can be derived as a function of SSS and SST [4].
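The text does not give the form of the empirical TA relation. One widely cited form for subtropical surface waters is a quadratic in the salinity and temperature anomalies. We attribute it to Lee et al. (2006) from memory, so the coefficients below are assumptions that should be checked against that paper before any use. The function simply evaluates this form.

```python
def subtropical_ta(sss, sst):
    """Total alkalinity (umol/kg) from sea surface salinity and SST (deg C),
    as a quadratic in the anomalies (SSS - 35) and (SST - 20).
    Coefficients attributed to Lee et al. (2006), subtropical zone; treat them
    as assumptions to be verified against the original paper."""
    ds, dt = sss - 35.0, sst - 20.0
    return 2305.0 + 58.66 * ds + 2.32 * ds ** 2 - 1.41 * dt + 0.040 * dt ** 2

print(round(subtropical_ta(36.0, 25.0), 2))
```

This is a subtropical fit and is not expected to hold in the colder New England coastal waters without re-deriving the coefficients.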

This work is supported by the Office of Naval Research (N00014-14-1-0166) and MIT Sea Grant (NA14OAR 4170077 Department of Commerce).

978-1-5090-5210-3/17/$31.00 ©2017 IEEE

SST can be derived from the radiance measured by the infrared channels of NOAA's Advanced Very High Resolution Radiometer (AVHRR) instrument on board the POES and MetOp satellites. It can also be derived from the radiant and reflected solar energy measured by the GOES I-M Imager on board NOAA's Geostationary Operational Environmental Satellites (GOES). The measured radiance is converted to brightness temperatures. Nonlinear or multichannel regression algorithms [6, 7, 8], calibrated against global drifting buoys, then combine these brightness temperatures into different SST products. The quality of an SST product can be characterized by three main parameters: the bias, the standard deviation of the retrieved SST, and the sensitivity to the true SST [7]. Because these three parameters vary spatially, the SST quality of a given regression algorithm is region dependent, as characterized by the Quality Retrieval Domain (QRD). Gaussian Process Regression (GPR) can therefore be applied to improve the quality of SST estimation and to predict its associated uncertainty bounds in the coastal waters of the BH and MA bay areas.
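The sketch below makes the regression step and the first two quality parameters concrete. It fits a generic multichannel split-window form, SST = a0 + a1·T11 + a2·(T11 − T12) + a3·(T11 − T12)(sec θ − 1), to synthetic buoy matchups by least squares. It then reports the bias and standard deviation on held-out matchups. The form, the channel names, and all numbers are assumptions for illustration only. The operational algorithms in [6, 7, 8] differ in detail, and the third parameter, sensitivity to true SST, would require radiative-transfer modelling that is not shown here.

```python
import numpy as np

def mcsst_design(t11, t12, sat_zenith_deg):
    """Columns for SST = a0 + a1*T11 + a2*(T11-T12) + a3*(T11-T12)*(sec(zenith)-1)."""
    dt = t11 - t12
    sec_minus_1 = 1.0 / np.cos(np.deg2rad(sat_zenith_deg)) - 1.0
    return np.column_stack([np.ones_like(t11), t11, dt, dt * sec_minus_1])

def fit_coefficients(t11, t12, zen, buoy_sst):
    """Least-squares calibration of the coefficients against buoy matchups."""
    coef, *_ = np.linalg.lstsq(mcsst_design(t11, t12, zen), buoy_sst, rcond=None)
    return coef

def quality(coef, t11, t12, zen, buoy_sst):
    """Bias and standard deviation of (retrieved - buoy) SST on held-out matchups."""
    diff = mcsst_design(t11, t12, zen) @ coef - buoy_sst
    return diff.mean(), diff.std(ddof=1)

# Synthetic matchups (hypothetical values, not real AVHRR/GOES data).
rng = np.random.default_rng(0)
n = 2000
t11 = rng.uniform(0.0, 25.0, n)              # 11-um brightness temperature, deg C
t12 = t11 - rng.uniform(0.2, 2.0, n)         # 12-um channel reads colder
zen = rng.uniform(0.0, 55.0, n)              # satellite zenith angle, degrees
true_coef = np.array([0.5, 1.0, 2.5, 0.8])   # hypothetical "true" coefficients
buoy = mcsst_design(t11, t12, zen) @ true_coef + rng.normal(0.0, 0.3, n)

train, test = slice(0, n // 2), slice(n // 2, n)
coef = fit_coefficients(t11[train], t12[train], zen[train], buoy[train])
bias, sd = quality(coef, t11[test], t12[test], zen[test], buoy[test])
print("coefficients:", np.round(coef, 2), "bias:", round(bias, 3), "SD:", round(sd, 2))
```

Evaluating on held-out matchups matters. With an intercept term, the in-sample least-squares residuals have zero mean by construction, so an in-sample bias would always look perfect.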

The objectives of this research are 1) to identify the quality of SST data extracted from both in-situ measurements and satellite observations, and 2) to compute the optimal mean and variance of SST time series by regressing data of different fidelity levels. First, SST, a coastal state variable, is obtained for the BH, MA, and RI Bay areas from two kinds of sources, as described in Section II. The in-situ data come from the Massachusetts Water Resources Authority (MWRA) sampling measurements. The satellite data are acquired by the Geostationary Operational Environmental Satellite (GOES) and the Advanced Very High Resolution Radiometer (AVHRR) instrument and provided by the National Oceanic and Atmospheric Administration (NOAA). Second, indicators of data quality and data association are presented in Section III-A. Third, the single- and multi-fidelity GPRs are explained in Section III-B. Finally, Section III-C presents data assimilation using GPRs of SST data from platforms of different accuracy (multi-fidelity data). These data are high-accuracy physical data from sparse MWRA in-situ SST measurements and low-accuracy satellite observations from the AVHRR- and GOES-SST products. Combining them allows best estimates of the yearly SST variation, with confidence limits, to be evaluated and quantified.

II. SST DATA OBSERVATIONS/MEASUREMENTS

This section describes the three sources of coastal SST data for the BH region and the MA bay areas in more detail, together with the availability, quality, accuracy, and limitations of each data source.

A. Satellite Observations

Satellite observations of the MA and RI Bay areas obtained from the NOAA website are available for the entire year. However, the quality of the satellite data is limited by the spatial resolution and by the percentage of cloud coverage in the daily-averaged SST products. The SST data in this analysis are obtained from two infrared imagers: the AVHRR instrument on board four main satellites (NOAA-18 and -19, METOP-1 and -2) [12], and the GOES instrument on board the GOES satellites [13]. Because SST is derived from the infrared radiance leaving the sea surface, as measured by these instruments, AVHRR- and GOES-SST represent the skin temperature. Three-hourly composite AVHRR-SST products with a spatial resolution of 1 km², covering the MA and RI bay areas, are averaged into daily, 3-day, and 7-day composites that include both day and night overpasses. Similarly, GOES-SST products with a spatial resolution of 4 km² for the full U.S. East Coast are provided by NOAA in three-hourly, daily, 3-day, and 7-day averaged formats. Both the AVHRR- and GOES-SST products are represented on a Mercator projection.
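Extracting a satellite SST value at an MWRA station requires mapping the station's latitude and longitude to a pixel of the Mercator-projected image. The sketch below is a minimal version. It assumes a grid that is uniform in longitude and in spherical-Mercator northing, with hypothetical extents and dimensions. The actual CoastWatch file layout is not described in this paper.

```python
import math

def mercator_y(lat_deg):
    """Spherical Mercator northing for a unit-radius sphere."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))

def station_pixel(lat, lon, lat_min, lat_max, lon_min, lon_max, nrows, ncols):
    """(row, col) of the pixel containing (lat, lon) in a grid that is uniform in
    longitude and in Mercator northing; row 0 is the northern edge."""
    x_frac = (lon - lon_min) / (lon_max - lon_min)
    y_top, y_bottom = mercator_y(lat_max), mercator_y(lat_min)
    y_frac = (y_top - mercator_y(lat)) / (y_top - y_bottom)
    if not (0.0 <= x_frac <= 1.0 and 0.0 <= y_frac <= 1.0):
        raise ValueError("station lies outside the image extent")
    # Clamp so that points exactly on the south/east edge fall in the last pixel.
    return min(int(y_frac * nrows), nrows - 1), min(int(x_frac * ncols), ncols - 1)

# Hypothetical image extent and size; MWRA #139 coordinates from Table I.
print(station_pixel(42.286667, -70.96833, 41.0, 43.0, -72.0, -69.5, 500, 600))
```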

To demonstrate the effect of different spatial resolutions, daily-averaged SST data on Day 172 of 2014 are shown in Fig. 1 for the MA-RI bay area and in Fig. 2 for the BH region. The finer the spatial resolution, the more accurately the SST at a specific location can be extracted from the satellite observations. The GOES satellite shows only coarse features, covering just 37 pixels of the daily-averaged SST image along the Boston Harbor coast. AVHRR-SST resolves finer variations of the daily-averaged SST, covering 1,234 pixels in the same coastal area. NOAA's AVHRR-SST is therefore treated as a higher-fidelity data set than GOES-SST in the multi-fidelity GPR discussed in Section III.

Fig. 1. NOAA daily-averaged SST (°C) in MA-RI bay on Day 172 of year 2014 from AVHRR instrument (Left) and from GOES satellite (Right).

Fig. 2. NOAA daily-averaged SST (°C) in Boston Harbor on Day 172 of year 2014 from AVHRR instrument (Left) and from GOES satellite (Right).

B. MWRA In-Situ Sampling Measurements

The SST measurements at MWRA in-situ sampling locations in both the MA bay area and the BH region are obtained from the MWRA website [14]. These MWRA measurements, taken at specific times of day, include physical, clarity, nutrient, and bacteria data. Only SST, a physical parameter measured 1 m below the sea surface, is extracted from the six MWRA sampling locations shown in Fig. 3. Of these, only MWRA #139 and #F22 are selected, to represent SST variations in the inner-BH region and in the outer-BH/coastal region, respectively. The outer-BH area is less sensitive to heat flux from inland. The locations (latitude/longitude) and brief descriptions of the six MWRA stations are given in Table I.

III. DATA FUSION USING GAUSSIAN PROCESS REGRESSION

After SST data are extracted from the NOAA satellite observations and the MWRA in-situ measurements, the SST from these different sources is compared at two MWRA locations, #139 and #F22. Data fusion using Gaussian Process Regression is then employed to integrate the coastal MWRA-SST measurements (high-fidelity data) with the AVHRR-SST observations (low-fidelity data). This fusion yields the best estimates of the SST time series and their associated confidence limits.

Fig. 3. Six MWRA sampling locations in the Boston Harbor region (Left) and one sampling location in MA bay area (Right) for SST parameter estimations.

TABLE I. MWRA SAMPLING LOCATIONS FOR WATER-QUALITY MONITORING STATIONS IN THE BOSTON HARBOR AREA AND FOR THE WATER-COLUMN SAMPLING STATION IN THE MASSACHUSETTS BAY AREA [14]

Sampling location | Location description | Latitude (°N) | Longitude (°, negative = W)
139 | Hangman's Island (Quincy Bay, near Nut Island discharges) | 42.286667 | -70.96833
124 | Hingham Bay | 42.272667 | -70.89767
141 | Peddock's Island (Nantasket Roads) | 42.305000 | -70.93083
142 | President Roads (Broad Sound) | 42.339167 | -70.93150
106 | Long Island (near Deer Island discharges) | 42.333333 | -70.96000
F22 | MA Bay | 42.479830 | -70.61767

A. Data Extraction from Different Buoy Locations

First, same-day SST values from MWRA measurements and from NOAA's AVHRR and GOES products are linearly correlated, as illustrated by the scatter plots in Fig. 4 for the MWRA #139 and #F22 sampling locations. In particular, MWRA-SST at the #F22 location, which is away from the coast, exhibits a strong linear correlation for all seven SST data points from both the AVHRR and GOES satellite observations.

Fig. 4. Scatter plots of MWRA-SST measurements as function of SST values from AVHRR and GOES observations display the linear correlation between three data sets at both MWRA #139 and #F22 sampling locations.
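The same-day comparison behind Fig. 4 amounts to matching records by date, then computing a correlation coefficient and a least-squares line. The sketch below uses hypothetical day-of-year dictionaries, not the MWRA or NOAA records. The function name is illustrative.

```python
import numpy as np

def match_and_correlate(insitu, satellite):
    """insitu, satellite: dicts mapping day of year -> SST (deg C).
    Uses only days present in both records (satellite values may be missing,
    e.g. under cloud) and returns (n_days, Pearson r, slope, intercept) of the
    least-squares line insitu ~ slope * satellite + intercept."""
    days = sorted(set(insitu) & set(satellite))
    if len(days) < 3:
        raise ValueError("need at least three matching days")
    x = np.array([satellite[d] for d in days], dtype=float)
    y = np.array([insitu[d] for d in days], dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)
    return len(days), r, slope, intercept

# Hypothetical values for illustration only.
mwra = {40: 2.1, 95: 4.0, 130: 9.5, 172: 15.2, 210: 19.0, 250: 17.4, 300: 11.8}
avhrr = {40: 1.5, 95: 4.4, 172: 16.0, 210: 19.8, 250: 17.9, 300: 12.5, 330: 8.0}
n, r, slope, intercept = match_and_correlate(mwra, avhrr)
print(n, round(r, 3), round(slope, 2), round(intercept, 2))
```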

Second, daily SST data from MWRA in-situ measurements in 2014 are overlaid on daily-averaged SST data from NOAA's AVHRR and GOES observations in Figs. 5 and 6 for the MWRA #139 and #F22 sampling locations, respectively. At the MWRA #139 location in the inner BH area, SST from NOAA's AVHRR and GOES observations varies similarly to the MWRA sampling measurements. The exception is winter, when river run-off and inland heat flux strongly affect SST variation.

At the MWRA #F22 location, farther out in the MA bay area, MWRA-SST measurements are well correlated with the AVHRR-SST and GOES-SST observations throughout 2014. There, heat transfer between the coastal region and rivers or land has only a minor influence, so SST varies less. Nevertheless, the daily-averaged SST values from the GOES observations are mostly higher than those from the other sources because of the coarse spatial resolution of GOES.

Fig. 5. Comparison of daily averaged SST data at MWRA #139 location inside the BH region from MWRA measurements, GOES and AVHRR satellites in the year of 2014.

Fig. 6. Comparison of daily averaged SST data at MWRA #F22 location in the MA bay area from MWRA measurements, GOES and AVHRR satellites in the year of 2014.

Third, cloud coverage can affect the quality of SST products from satellite observations, so the percent cloud coverage is defined as in (1) to quantify this effect. The percent cloud coverage provides a relative measure of SST-data quality for each satellite observation. Fig. 7 shows the percent cloud coverage in the inner BH area as gray-scale bars beneath the daily-averaged SST values at the MWRA #139 sampling location. The coverage is high during winter (December-February) and lower in spring and summer (March-July).

(1)

Fig. 7. At MWRA #139 sampling location, % Cloud Coverage bar graph from GOES SST products in the year of 2014 as a data-quality indicator.
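The body of Eq. (1) is not recoverable from the text. One plausible definition, assumed here, is the percentage of water pixels within the area of interest that carry no valid SST retrieval because they are cloud-flagged. The sketch assumes that cloud-flagged pixels are stored as NaN and that land is excluded by the region mask.

```python
import numpy as np

def percent_cloud_coverage(sst_image, region_mask):
    """Percentage of pixels inside region_mask without a valid SST value.
    Assumption: cloud-flagged pixels are stored as NaN in sst_image, and
    region_mask is True only for water pixels in the area of interest."""
    sst = np.asarray(sst_image, dtype=float)
    region = np.asarray(region_mask, dtype=bool)
    n_region = np.count_nonzero(region)
    if n_region == 0:
        raise ValueError("region_mask selects no pixels")
    n_cloud = np.count_nonzero(np.isnan(sst) & region)
    return 100.0 * n_cloud / n_region

# Hypothetical 3 x 3 image with two cloud-flagged pixels, region = whole image.
image = np.array([[12.1, np.nan, 12.4], [12.0, 12.2, np.nan], [11.9, 12.3, 12.5]])
print(percent_cloud_coverage(image, np.ones_like(image, dtype=bool)))
```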

B. Single- and Multi-fidelity Gaussian Process Regressions

To learn the SST function y(x) from the MWRA measurements (XM, yM) and the AVHRR satellite observations (XA, yA), the high- and low-fidelity data sets respectively, we adopt a Bayesian non-parametric regression approach based on Gaussian process priors. This approach allows us to construct probabilistic interpolation or regression schemes that model the covariance among the observed/measured values and quantify the error/uncertainty associated with their predictions. Gaussian Process (GP) regression has two characteristic weaknesses. Estimates in unsampled regions with large gaps are less reliable and less accurate. In addition, the regression generally overestimates minima (valleys) and underestimates maxima (peaks). Ideally, data would be sampled at the local minima and maxima, but in a real environment these locations are unknown beforehand. The parameters of the Gaussian basis function must therefore be optimized so that the estimation error is minimized.

A random function y(x) of SST at each observation location x is assumed to consist of two parts, a true part, ytrue(x), and a noise/error part, e(x), as shown in Eq. (2).

$$y(x) = y_{\text{true}}(x) + e(x) \qquad (2)$$

The covariance function of this random SST function, Cov[y(xi), y(xj)] = σ²ψij, can be written as in Eq. (3). Here the correlation function ψij is expanded in terms of the basis function, discretely approximated by the squared distance between sampled points.

$$\mathrm{Cov}[y(x_i), y(x_j)] = E\big[(y(x_i) - \mu_i)(y(x_j) - \mu_j)\big] = \sigma^2 \psi_{ij} \qquad (3)$$
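The correlation function ψij is not written out in the text. A common choice consistent with a Gaussian basis function of the squared separation, as in [9], is ψij = exp(−θ(xi − xj)²) for one-dimensional (time) inputs. The sketch below builds σ²ψ for a few days. The values of θ and σ² are chosen arbitrarily for illustration.

```python
import numpy as np

def gaussian_correlation(t_a, t_b, theta):
    """psi_ij = exp(-theta * (t_i - t_j)^2) for 1-D inputs such as day of year.
    Larger theta means a narrower basis function (shorter correlation time)."""
    d = np.asarray(t_a, float)[:, None] - np.asarray(t_b, float)[None, :]
    return np.exp(-theta * d ** 2)

days = np.array([10.0, 40.0, 70.0, 100.0])
sigma2 = 25.0                                    # hypothetical process variance, (deg C)^2
cov = sigma2 * gaussian_correlation(days, days, theta=1e-3)   # sigma^2 * psi, as in Eq. (3)
print(np.round(cov, 2))
```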

The objective is to maximize the likelihood [9, 10] of the sampled points y(xi), given in Eq. (4). The estimated mean and variance are then obtained as the maximum likelihood estimates (MLEs) expressed in Eq. (5). The widths of the Gaussian basis function, represented by the vector θ, must be optimized with respect to the sampled data by a global optimization technique such as a genetic algorithm.

(4)

(5)

Here λ is a regression constant that accounts for noise in the sampled data and prevents the covariance matrix σ²ψ from becoming singular when it is inverted. The vector 1 is a column vector whose elements are all one. The prediction of the single-fidelity GP can then be calculated by Eq. (6).

(6)

where ψ is the i-th column of the ψ matrix. The predictions of the yearly SST time series from MWRA in-situ measurements at the #139 and #F22 sampling locations are shown in the left and right panels of Fig. 8, respectively. Because λ > 0 makes this a regression rather than an interpolation, the predicted mean does not pass through every sampled point.
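The bodies of Eqs. (4)-(6) are not recoverable here. What follows is therefore a sketch of a standard regression-kriging formulation in the style of [9], under these assumptions:

- Gaussian correlation in time and a constant mean.
- A fixed regression constant λ.
- Closed-form MLEs of μ and σ².
- A concentrated ln-likelihood used to choose θ.

A simple grid search stands in for the genetic algorithm used in the text. The data are a synthetic seasonal cycle, not MWRA measurements. All function names are illustrative.

```python
import numpy as np

def corr(t_a, t_b, theta):
    """Gaussian correlation psi_ij = exp(-theta * (t_i - t_j)^2) for 1-D inputs."""
    d = np.asarray(t_a, float)[:, None] - np.asarray(t_b, float)[None, :]
    return np.exp(-theta * d ** 2)

def chol_solve(L, b):
    """Solve (L L^T) x = b given the lower Cholesky factor L."""
    return np.linalg.solve(L.T, np.linalg.solve(L, b))

def fit(t, y, theta, lam):
    """Closed-form MLEs of mu and sigma^2 and the concentrated ln-likelihood for
    the correlation matrix Psi + lam*I (lam > 0 turns interpolation into regression)."""
    n = len(y)
    L = np.linalg.cholesky(corr(t, t, theta) + lam * np.eye(n))
    one = np.ones(n)
    mu = (one @ chol_solve(L, y)) / (one @ chol_solve(L, one))
    resid = y - mu
    sigma2 = (resid @ chol_solve(L, resid)) / n
    # -n/2 ln(sigma^2) - 1/2 ln|Psi + lam I|, with ln|.| = 2 * sum(ln diag(L)).
    ln_like = -0.5 * n * np.log(sigma2) - np.sum(np.log(np.diag(L)))
    return {"t": t, "theta": theta, "L": L, "mu": mu, "sigma2": sigma2,
            "weights": chol_solve(L, resid), "ln_like": ln_like}

def predict(m, t_new):
    """Mean mu + psi^T (Psi + lam I)^-1 (y - 1 mu) and the variance of the
    underlying noise-free function (uncertainty in mu is ignored)."""
    psi = corr(m["t"], t_new, m["theta"])                      # n x len(t_new)
    mean = m["mu"] + psi.T @ m["weights"]
    var = m["sigma2"] * (1.0 - np.sum(psi * chol_solve(m["L"], psi), axis=0))
    return mean, np.maximum(var, 0.0)

def truth(t):
    """Hypothetical seasonal SST cycle in deg C (for illustration only)."""
    return 10.0 - 8.0 * np.cos(2.0 * np.pi * (t - 30.0) / 365.0)

rng = np.random.default_rng(1)
t_obs = np.arange(5.0, 365.0, 18.0)                  # sparse "in-situ" sampling days
y_obs = truth(t_obs) + rng.normal(0.0, 0.5, t_obs.size)

# Grid search over theta stands in for the genetic algorithm; lam is held fixed.
lam = 1e-2
best = max((fit(t_obs, y_obs, th, lam) for th in np.logspace(-5, -2, 31)),
           key=lambda m: m["ln_like"])
days = np.arange(5.0, 348.0)
mean, var = predict(best, days)
upper, lower = mean + 2.0 * np.sqrt(var), mean - 2.0 * np.sqrt(var)   # 2-sigma band
```

The variance returned is that of the underlying noise-free function. Other variants, for example adding the λ term when predicting noisy observations, are equally defensible.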

A multi-fidelity Gaussian Process (GP) can combine the high-fidelity data (XM, yM) with one or more sets of low-fidelity data (XA, yA) to learn the underlying function while giving more accurate predictions. We assume that cov[yM(xi), yA(X) | yA(xi)] = 0 for all X ≠ xi, where X = [XM, XA]T. That is, once the low-fidelity value at xi is known, low-fidelity data at other locations provide no further information about the high-fidelity value at xi. Let NM(.) and NA(.) denote the GPs of the high- and low-fidelity data. Using an auto-regressive model, the GP of the high-fidelity data, NM(.), can be expressed as a scaled GP of the low-fidelity data plus a difference GP, Nd(.):

$$N_M(X) = \rho\, N_A(X) + N_d(X) \qquad (7)$$

where ρ is a scaling constant. By maximizing the log-likelihood function expressed in Eq. (8), the estimated mean and variance of the low-fidelity data can be derived as in [9, 10].

(8)

where ψA(XA,XA) is the satellite-observation covariance matrix among the low-fidelity sampling points, and ψA(XA,XM) is the corresponding matrix between the low- and high-fidelity sampling points. Here, θA and λA can be found optimally for the covariance matrix ψA by the genetic algorithm. We also define the difference d between the high-fidelity data and the scaled low-fidelity data, as in Eq. (9):

$$\mathbf{d} = \mathbf{y}_M - \rho\, \mathbf{y}_A(X_M) \qquad (9)$$

Likewise, the estimated mean and variance of the high-fidelity data, derived in [9, 10], can be obtained by maximizing the ln-likelihood function given in Eq. (10).

(10)

where ψd(XM,XM) is the difference covariance matrix at the high-fidelity sampling points. Again, the genetic algorithm can yield globally optimal values of θM, λM, and the scaling constant ρ by maximizing ln(LM). For regression, λA and λM are added to the diagonals of the covariance matrices ψA and ψd, respectively. By stacking the low- and high-fidelity sampled data into y = [yA yM]T, the predictor of the multi-fidelity GP can be computed as in Eq. (11).

(11)

where C and c denote the combined covariance matrix and the scaled covariance vector, respectively.
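The sketch below is a compact two-level auto-regressive co-kriging in the spirit of Eqs. (7)-(11). It rests on several assumptions:

- Gaussian correlations in time.
- High-fidelity days that are a subset of the low-fidelity days, so that yA(XM) exists.
- A single constant mean for the stacked data.
- Hyperparameters θA, θd, ρ, λA, λM supplied by hand rather than found by a genetic algorithm.

Function names and all data are hypothetical.

```python
import numpy as np

def corr(a, b, theta):
    """Gaussian correlation exp(-theta * (a_i - b_j)^2) for 1-D inputs (days)."""
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return np.exp(-theta * d ** 2)

def gls_mean_var(R, y):
    """Closed-form MLEs of a constant mean and process variance for correlation R."""
    one = np.ones(len(y))
    mu = (one @ np.linalg.solve(R, y)) / (one @ np.linalg.solve(R, one))
    r = y - mu
    return mu, (r @ np.linalg.solve(R, r)) / len(y)

def cokriging(xa, ya, xm, ym, idx_m_in_a, theta_a, theta_d, rho, lam_a, lam_m):
    """Two-level auto-regressive co-kriging, y_M = rho * y_A + d.
    idx_m_in_a gives the positions of the high-fidelity inputs within xa."""
    Ra = corr(xa, xa, theta_a) + lam_a * np.eye(len(xa))   # lam_A on the diagonal
    _, s2a = gls_mean_var(Ra, ya)
    d = ym - rho * ya[idx_m_in_a]                          # difference data, cf. Eq. (9)
    Rd = corr(xm, xm, theta_d) + lam_m * np.eye(len(xm))   # lam_M on the diagonal
    _, s2d = gls_mean_var(Rd, d)
    # Combined covariance C of the stacked data y = [y_A; y_M].
    C = np.block([
        [s2a * Ra, rho * s2a * corr(xa, xm, theta_a)],
        [rho * s2a * corr(xm, xa, theta_a),
         rho ** 2 * s2a * corr(xm, xm, theta_a) + s2d * Rd]])
    y = np.concatenate([ya, ym])
    one = np.ones(len(y))
    mu = (one @ np.linalg.solve(C, y)) / (one @ np.linalg.solve(C, one))
    w = np.linalg.solve(C, y - mu)

    def predict(x):
        """High-fidelity mean mu + c^T C^-1 (y - 1 mu) and variance at inputs x."""
        c = np.vstack([rho * s2a * corr(xa, x, theta_a),
                       rho ** 2 * s2a * corr(xm, x, theta_a) + s2d * corr(xm, x, theta_d)])
        mean = mu + c.T @ w
        var = rho ** 2 * s2a + s2d - np.sum(c * np.linalg.solve(C, c), axis=0)
        return mean, np.maximum(var, 0.0)

    return predict

def truth(t):
    """Hypothetical seasonal SST cycle in deg C (for illustration only)."""
    return 10.0 - 8.0 * np.cos(2.0 * np.pi * (t - 30.0) / 365.0)

rng = np.random.default_rng(2)
xa = np.arange(1.0, 366.0, 3.0)                         # "satellite" days
ya = 0.9 * truth(xa) + 1.0 + rng.normal(0.0, 0.3, xa.size)
idx = np.arange(0, xa.size, 10)                          # nested "in-situ" days
xm = xa[idx]
ym = truth(xm) + rng.normal(0.0, 0.2, xm.size)
predict = cokriging(xa, ya, xm, ym, idx, theta_a=5e-4, theta_d=1e-5,
                    rho=1.1, lam_a=1e-2, lam_m=1e-1)
days = np.arange(1.0, 365.0)
mean, var = predict(days)
```

The dense solves cost O((nA + nM)³). This cost is one reason the text turns to sparse methods once a second low-fidelity source is added.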

If both AVHRR- and GOES-SST satellite observations are incorporated with the high-fidelity MWRA-SST in-situ measurements, the computational complexity and cost of the multi-fidelity GPR grow quickly. We therefore use Collaborative multi-output Gaussian Processes (COGP) [11], which leverage the sparse Gaussian process framework to construct a scalable inference scheme for multi-output probabilistic regression. A variational inference approach yields a lower bound on the marginal likelihood that factorizes across data points. This factorization makes the inference scheme amenable to stochastic gradient descent, so the regression algorithm scales to very large multi-output data sets at a reduced computational cost. The key to multi-output learning is to model the dependencies between the outputs (output correlations) through a sparse structure created by the shared inducing points. To learn all model parameters, including the kernel hyperparameters and the inducing points, stochastic optimization is performed on the evidence lower bound (ELBO) using standard stochastic gradient descent.
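COGP [11] models each output as a weighted sum of Q shared latent GPs plus its own individual GP, fi(t) = Σj wij gj(t) + hi(t). The shared GPs are what couple the outputs. The sketch below only builds the resulting dense prior covariance for P = 3 stacked outputs (for example MWRA, AVHRR, GOES), with hypothetical weights and length-scales. The sparse variational inference with inducing points and the stochastic optimization used in [11] are not reproduced.

```python
import numpy as np

def se_kernel(a, b, lengthscale, variance=1.0):
    """Squared-exponential covariance between 1-D inputs."""
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def cogp_prior_cov(t, W, shared_ls, indiv_ls, indiv_var):
    """Dense prior covariance of P stacked outputs under the COGP structure
    f_i(t) = sum_j W[i, j] g_j(t) + h_i(t): Q shared GPs g_j, P individual GPs h_i.
    Rows/columns are ordered output by output (all n times of output 0 first)."""
    P, Q = W.shape
    n = len(t)
    K = np.zeros((P * n, P * n))
    for j in range(Q):
        # Shared process j contributes W[i, j] * W[k, j] * k_g(t, t') to every block (i, k).
        K += np.kron(np.outer(W[:, j], W[:, j]), se_kernel(t, t, shared_ls[j]))
    for i in range(P):
        # Individual processes only add to the diagonal block of their own output.
        K[i * n:(i + 1) * n, i * n:(i + 1) * n] += se_kernel(t, t, indiv_ls[i], indiv_var[i])
    return K

# Hypothetical setting: P = 3 outputs (MWRA, AVHRR, GOES), Q = 1 shared process.
t = np.linspace(0.0, 360.0, 25)
W = np.array([[1.0], [0.8], [0.6]])
K = cogp_prior_cov(t, W, shared_ls=[60.0], indiv_ls=[20.0, 20.0, 20.0],
                   indiv_var=[0.2, 0.5, 0.5])
n = t.size
cross = K[:n, n:2 * n]      # prior covariance between output 0 and output 1
```

The off-diagonal blocks come only from the shared process. In the sparse setting of [11], the shared inducing points that summarize g carry information between outputs.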

C. Temporal Data Fusion Using Single- and Multi-fidelity Gaussian Process Regressions

According to Fig. 8, the single-fidelity GPR yields a smooth and continuous estimated mean of MWRA-SST at the #139 sampling location, but its associated uncertainty bound is large for the entire year of 2014. With the multi-fidelity GPR, the estimated mean and uncertainty bound can be predicted with much better accuracy, as illustrated in Fig. 9. The regression mean is less smooth and captures the variation trends better than that of the single-fidelity GPR. Furthermore, in the multi-fidelity GPR, the AVHRR-SST observations help to significantly lower the uncertainty bound (2σ), particularly at the MWRA #139 location, where regularly sampled SST data are available.

Fig. 8. At MWRA #139 (Left) and #F22 (Right) sampling locations, the estimated mean and uncertainty bound (2σ) from single-fidelity GP regression of MWRA-SST measurements are shown as a black line and gray band, respectively.

Fig. 9. At MWRA #139 sampling location, the estimated mean and tighter uncertainty bound (2σ) from multi-fidelity GPR of high-fidelity MWRA-SST measurements and low-fidelity AVHRR-SST products are shown as a black line and gray band, respectively.

At the MWRA #F22 sampling location, high-fidelity MWRA-SST data become sparser and are unavailable in winter, at the beginning and end of 2014, because the station lies farther out in the MA bay. Even so, the single-fidelity GPR is still able to predict a smooth mean variation with a large uncertainty bound, as displayed in Fig. 8. When the AVHRR-SST product is integrated with the sparse MWRA-SST measurements, the multi-fidelity GPR yields a more accurate estimated mean and a further reduced uncertainty bound, as shown in Fig. 10. Note that where the high-fidelity MWRA-SST data are missing, the uncertainty bound increases but still covers the AVHRR-SST observations. Overall, SST varies between 0 and 20 °C over the year in both the BH region and the MA bay area.

Fig. 10. At MWRA #F22 sampling location, the estimated mean and tighter uncertainty bound (2σ) from multi-fidelity GPR of high-fidelity MWRA-SST measurements and low-fidelity AVHRR-SST products are shown as a black line and gray band, respectively.

We used the MATLAB COGP implementation from [11] to perform stochastic optimization with the following learning rates:

- 0.01 for the variational parameters.
- 1×10⁻⁵ for the noise precision and the GP prior hyperparameters, using the squared exponential covariance function for both the Q shared and the H individual functions.
- 1×10⁻⁴ for the weights.

Figs. 11 and 12 show the resulting SST regression mean and 1-standard-deviation bound at the MWRA #139 and #F22 locations, respectively. These results combine the high-fidelity MWRA-SST measurements with the low-fidelity AVHRR- and GOES-SST observations. They were obtained with COGP using P = 3 outputs, Q = 1 shared function, and a fixed number of shared inducing points, Mj, of 15.

At the #139 location, MWRA-SST measurements are available regularly all year round. There, COGP with fixed numbers of shared inducing points, Mj, of 15 and individual inducing points, Mi, of 10 can incorporate the partial satellite observations. With a larger batch size, Nb, of 50 for stochastic optimization, it provides a smooth predicted mean and a tighter predicted bound (standard deviation). The COGP predicted mean in Fig. 11 is very similar to that of the multi-fidelity GPR. The COGP predicted variance at the #139 location can be reduced by increasing the batch size Nb, as shown in Fig. 13. Note that the predicted variances are lower on dates with MWRA high-fidelity sampling measurements. Thus, for regularly sampled high-fidelity data, the batch size used for stochastic optimization directly influences the uncertainty of the COGP prediction.

On the other hand, the high-fidelity MWRA-SST measurements at the #F22 location are irregularly sampled during 2014, so the in-situ data become gappy, especially in winter. In this case, COGP with a fixed number of shared inducing points, Mj, of 16 needs to weight its prediction more heavily on the low-fidelity AVHRR- and GOES-SST observations. A predicted mean with a tighter uncertainty bound can be achieved by using COGP with a number of individual inducing points, Mi, of 2, as shown in Fig. 12. The COGP prediction at this measurement location is much better than that of the multi-fidelity GPR. The reason is that the COGP mean follows the trend of the low-fidelity AVHRR-SST observations when the high-fidelity data are unavailable after day 303 of 2014. As a result, the predicted COGP variance with a lower Mi has a lower uncertainty bound, particularly at the beginning and end of the year, as illustrated in Fig. 13. COGP is therefore sensitive to the number of inducing variables when high-fidelity data are missing over certain intervals.

Fig. 11. At MWRA #139 location, using Collaborative multi-output Gaussian Process (COGP) of high-fidelity MWRA-SST measurements and low-fidelity AVHRR- and GOES-SST products, the estimated mean and uncertainty bound (2σ) of SST are shown as a black line and gray band, respectively.

Fig. 12. At MWRA #F22 location, using Collaborative multi-output Gaussian Process (COGP) of high-fidelity MWRA-SST measurements and low-fidelity AVHRR- and GOES-SST products, the estimated mean and uncertainty bound (2σ) of SST are shown as a black line and gray band, respectively.

Fig. 13. Comparison of SST variance (s²) among the single-fidelity GPR, multi-fidelity GPR, and COGP approaches at MWRA #139 sampling location (Left) and at MWRA #F22 sampling location (Right).

IV. SUMMARY

The main purpose of this paper is to estimate Sea Surface Temperature (SST) using single-fidelity GPR, multi-fidelity GPR, and Collaborative multi-output GP (COGP). Specifically, we compute the mean SST at a specific point as a function of time and quantify the corresponding uncertainty in the estimate. SST is one of the significant Ocean Acidification (OA) indicators. Both the best estimate of SST variation and its associated uncertainty bound can be obtained by incorporating low-fidelity NOAA AVHRR and/or GOES satellite observations with high-fidelity MWRA in-situ measurements. The results of data assimilation using GPR reveal that multi-fidelity GP and COGP provide more accurate predictions of the SST time-series mean, with significantly lower uncertainty bounds, for coastal waters, particularly in the Boston Harbor area. By extension, this approach of fusing information from diverse sources could also yield the accuracy and confidence limits (data quality) of other OA indicators, such as TA, pCO2, and DIC.

ACKNOWLEDGMENT

This work is supported by the Office of Naval Research (N00014-14-1-0166) and MIT Sea Grant (NA14OAR 4170077 Department of Commerce).

REFERENCES

[1] R. Sabia, D. Fernández-Prieto, J. Shutler, C. Donlon, P. Land, and N. Reul, “Remote sensing of surface ocean pH exploiting sea surface salinity satellite observation,” 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 106-109, 2015.

[2] J.M. Guinotte and V.J. Fabry, “Ocean Acidification and Its Potential Effects on Marine Ecosystems,” Annals of the New York Academy of Sciences, vol. 1134, no. 1, pp. 320-342, 2008.

[3] F.W. Meyer, U. Cardini, and C. Wild, “Ocean Acidification and Related Indicators,” in Environmental Indicators, 1st ed., Springer, Netherlands, pp. 723-737, 2015.

[4] Q. Sun, D. Tang, and S. Wang, “Remote-sensing observations relevant to ocean acidification,” International Journal of Remote Sensing, vol. 33, no. 23, pp. 7542-7558, 2012.

[5] Y. Nakano and Y.W. Watanabe, “Reconstruction of pH in the surface seawater over the north Pacific basin for all seasons using temperature and chlorophyll-a,” Journal of Oceanography, vol. 61, pp. 673-680, 2005.

[6] R.W. Reynolds, T.M. Smith, C. Liu, D.B. Chelton, K.S. Casey, and M.G. Schlax, “Daily high-resolution-blended analyses for sea surface temperature,” Journal of Climate, vol. 20, pp. 5473-5496, 2007.

[7] B. Petrenko, A. Ignatov, Y. Kihai, J. Stroup, and P. Dash, “Evaluation and selection of SST regression algorithms for JPSS VIIRS,” Journal of Geophysical Research: Atmospheres, vol. 119, pp. 4580-4599, 2014.

[8] E. Maturi, A. Harris, C. Merchant, J. Mittaz, B. Potash, W. Meng, and J. Sapper, “NOAA’s Sea Surface Temperatures Products from Operational Geostationary Satellites,” Bull. American Met. Society, vol. 89, pp. 1877-1888, 2008.

[9] A.I.J. Forrester, A. Sóbester, and A. Keane, Engineering Design via Surrogate Modeling: A Practical Guide, John Wiley & Sons, 2008.

[10] C.E. Rasmussen and C.K.I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

[11] T.V. Nguyen and E.V. Bonilla, “Collaborative Multi-output Gaussian Processes,” 30th Conference on Uncertainty in Artificial Intelligence, Quebec, Canada, July 2014.

[12] The NOAA CoastWatch (2016, June 28), Surface water temperature product for Massachusetts & Rhode Island Bays from NOAA’s Advanced Very High Resolution Radiometer (AVHRR) [Online]. Available: http://eastcoast.coastwatch.noaa.gov/data/avhrr/sst/daily/mr/

[13] The NOAA CoastWatch (2016, June 28), Surface water temperature product for the entire U.S. east coast from NOAA’s Geostationary Operational Environmental Satellite (GOES) [Online]. Available: http://eastcoast.coastwatch.noaa.gov/data/goes/sst/daily/ec/.

[14] Massachusetts Water Resources Authority (MWRA) (2016, June 30), Water Quality Data in Boston Harbor [Online]. Available: http://www.mwra.state.ma.us/harbor/html/wq_data.htm.