
AEGC 2019: From Data to Discovery – Perth, Australia 1

Identification of deep coal seam families using machine learning

Tauqir Moughal* (1), Irina Emelyanova (1), David S. Warner (2), Erik C. Dunlop (2), Mohinudeen Faiz (3), David N. Dewhurst (1), Prue E.R. Warner (2), Marina Pervukhina (1)

(1) CSIRO Energy, 26 Dick Perry Avenue, ARRC, Kensington, WA, Australia
(2) Deep Coal Technologies Pty Ltd, Torrens Park, SA, Australia
(3) CSIRO Energy, 1 Technology Court, Pullenvale, QLD, Australia

SUMMARY

The Cooper Basin of Australia is a world-class unconventional gas resource with estimated gas resources of 29.8 trillion cubic feet. However, producing this gas is challenging because the significant gas resources are located in deep coal seams, which are poorly cleated and characterised by extremely low matrix permeability. The feasibility of gas production from the Cooper Basin Deep Coal Gas (CBDCG) play was demonstrated by Santos; however, its commercial viability is yet to be proven.

Recent studies provided new insight into the gas generation ability of Cooper Basin coal seams and showed that multiple environmental features affect gas concentration and flow capacity. Fortunately, a large historical dataset exists that includes wireline and mud log data from wells drilled in the Cooper Basin. Up to 10,000 individual coal seams were identified in 1400 wells, and various parameters of the individual reservoir intersections, including gas in place, thermal maturity, temperature and other petrophysical readings, were compiled into the Cooper Basin Deep Coal Reservoir (CBDCR) database. Such a database is suitable for assessing the potential of the ultra-deep Permian coal gas reservoirs of the Cooper Basin using machine learning.

In this study, we explore the data using traditional statistical methods and propose a hierarchical clustering procedure to identify various coal seam families. The quality of the identified coal seam families (clusters) is then examined by domain experts. The gas in place, geomechanical parameters, pore pressure and other parameters important for successful production can be further assessed for all confirmed clusters.

Key words: deep coal seams, clustering.

INTRODUCTION

The coals of the Cooper Basin, Australia, that are deeper than 1000 m (~3000 ft) contain large quantities of gas and liquid hydrocarbons which could potentially supply all of the long-term gas feedstock requirements of the combined Gladstone Liquefied Natural Gas (LNG) facilities using existing under-utilized infrastructure.

To date these resources have not been documented and they remain essentially uncommercialized despite some initial attempts to complete them by fracture stimulation and horizontal drilling. Just as successful commercialisation of conventional sandstone reservoirs and coal bed methane (CBM) requires a good knowledge of in situ geological and mechanical characteristics, the same is required for Deep Coal Reservoirs. That is, the geology drives the technology.

There exists a large historical dataset of log data from wells penetrating the deep Permian gas reservoirs in the Cooper Basin, located in Queensland and South Australia. The mud logs from the dataset were used to assess various physical parameters of the coal seams and to compile the Cooper Basin Deep Coal Reservoir (CBDCR) database of thousands of coal seam intersections (Dunlop et al., 2017). We explore relationships between the physical parameters in the CBDCR database by applying Correlation Analysis (CA) and Principal Component Analysis (PCA) to determine data dependencies and, where possible, reduce the dimensionality of the data. Further, we apply a hierarchical spectral clustering procedure to assemble the Cooper Basin Deep Coal Seams into ten families, which show spatial separation and/or apparent differences in maturation trends.

DATA

The CBDCR database describes the geology of all coal horizons greater than 3 m (10 ft) thick. These horizons are called target coals. Each target coal is characterised by 30 individual deep coal characterisation parameters (DCCPs) including stratigraphy, thickness, temperature, thermal maturity, drill rate, mud gas, mud gas wetness and gamma ray. The legacy well dataset for the Cooper Basin contains almost no lab analyses or electric logging measurements that are specifically designed to describe coal reservoir characteristics. Thus, much of the raw data is extracted from the mud logs. Also, some of the raw data has been transformed into related criteria which assist in describing the types of coal reservoirs. For example:

1. Mud gas readings are normalised for drilling rates and then transformed into reservoir hydrocarbon gas content.

2. Reservoir temperature is derived from the geothermal gradient established from the electric logging information.

3. Gamma ray readings indicate the clay content of the coals.

4. Caliper logs can help describe the stress state of the coals.

In total, up to 10,000 target coal horizons in 1400 wells have been described. For each well, ten DCCPs were identified as most relevant and selected for the analysis. These parameters are top of target coal (Top D), thickness of target coal (Thick M), minimum gamma ray (GR Min), average mud gas reading (TG Ave), average rate of penetration (ROP Ave), gas wetness ratio (Wetness), thermal maturity (VRo), drilling mud weight (MW), latitude (Lat) and longitude (Long).

METHODS

To identify the Cooper Basin Deep Coal Reservoir families, we conducted a two-phase data analysis. The first phase consists of exploratory data analysis; in the second phase, the coal seam families are identified. Both phases are described below.

Exploratory Data Analysis

The Pearson product-moment correlation coefficient is used to uncover dependencies among the DCCPs. Principal component analysis (PCA) is a traditional statistical technique used to reduce redundant information and dimensionality (Pearson, 1901). PCA finds uncorrelated linear combinations of the input parameters, the principal components (PCs), ordered by decreasing variance. The standard measure of PCA performance is the proportion of the total variance that the retained PCs account for. It is common practice to retain only the first two or three PCs when they capture most (e.g. >90%) of the data variance.

Identification of Deep Coal Reservoir Families

In order to identify the Deep Coal Reservoir families, we propose a data-driven approach based on the Spectral Clustering (SC) technique (Shi and Malik, 2000). Clustering aims to find groups of objects in a dataset such that objects within a group are alike and dissimilar to those in other groups. SC groups the data using the eigenvectors of the Laplacian matrix derived from the pairwise similarity matrix of the data. Like many other clustering algorithms, SC requires the user to specify the number of clusters to be generated. In this study, we use a cluster consistency criterion, known as the Silhouette Index (SI), to select the optimal number of clusters. The SI ranges from -1 to +1, where a higher value indicates that objects match their own cluster well and match all other clusters poorly. The SI is calculated sequentially assuming two, three and more clusters, and the number of clusters that gives the maximum SI is taken as the optimal number of natural groupings in the dataset.

We develop a decision tree (DT) for partitioning the original dataset into clusters that can be interpreted as reservoir families. At Level 1, we apply SC to split the input dataset into an optimal number of clusters, identified by the maximum SI. Each Level 1 cluster is then partitioned in the same way to form Level 2 of the DT. The process continues until a reasonable number of clusters is achieved; this number is determined from expert knowledge of the region's geology. Figure 1 visualises the three levels of the DT built for the CBDCR database. There are two clusters at Level 1, four clusters at Level 2 and ten clusters at Level 3, corresponding to the maximum SI values estimated at each DT node.
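The hierarchical procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper does not specify the similarity function, the Laplacian normalisation, the range of cluster counts tested or any preprocessing, so the RBF similarity with `gamma`, the symmetric normalised Laplacian, the k-means step on the eigenvectors, the `k_values` range, the fixed `depth` and the `min_size` guard are all assumptions, and all function names are hypothetical. In the paper the stopping point is set by expert geological judgement, which a fixed depth only approximates.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def kmeans(Z, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, k, gamma=1.0):
    """Cluster rows of X using the k smallest eigenvectors of the Laplacian."""
    W = np.exp(-gamma * sq_dists(X))                      # assumed RBF similarity
    deg = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(deg, deg))  # normalised Laplacian
    _, vecs = np.linalg.eigh(L)                           # eigenvalues ascending
    Z = vecs[:, :k]
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    return kmeans(Z, k)

def silhouette_index(X, labels):
    """Mean of (b - a) / max(a, b) over all points; lies in [-1, +1]."""
    D = np.sqrt(sq_dists(X))
    np.fill_diagonal(D, 0.0)
    ids = np.unique(labels)
    if len(ids) < 2:
        return -1.0
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def best_split(X, k_values=(2, 3, 4, 5)):
    """Try each cluster count and keep the labelling with the highest SI."""
    best_si, best_labels = -np.inf, None
    for k in k_values:
        labels = spectral_clustering(X, k)
        si = silhouette_index(X, labels)
        if si > best_si:
            best_si, best_labels = si, labels
    return best_labels, best_si

def hierarchical_sc(X, depth=3, min_size=20):
    """Split every cluster again at each level; labels grow like '1' -> '11' -> '113'."""
    names = [""] * len(X)
    for _ in range(depth):
        for parent in sorted(set(names)):
            idx = [i for i, name in enumerate(names) if name == parent]
            if len(idx) < min_size:
                child = np.zeros(len(idx), dtype=int)  # too small to split further
            else:
                child, _ = best_split(X[idx])
            ids = list(np.unique(child))
            for i, c in zip(idx, child):
                names[i] = parent + str(ids.index(c) + 1)
    return names
```

Called as `hierarchical_sc(X, depth=3)` on an array with one row per target coal and one column per DCCP, the function returns string labels such as `'113'`, mirroring the cluster naming used in Figure 1. Because the clustering is distance-based, the scaling of the input columns matters; any standardisation applied beforehand would be a further assumption.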

Figure 1. Schematic of hierarchical spectral clustering.

RESULTS AND DISCUSSION

Figure 2 shows the heat map of the correlations among the DCCPs. The correlation matrix is symmetric, with diagonal elements equal to 1 and off-diagonal elements representing the pairwise correlations. Overall, correlations among the input parameters are very low, except for moderate correlations between Top D and VRo, between Top D and Lat, and between MW and VRo.

The PCA applied to the DCCPs confirms this observation. Figure 3 shows the biplot of the first and second PCs, with the contributions of the individual parameters represented by blue vectors. Top D and VRo have the largest magnitudes with respect to PC1, and Long and GR Min have the largest with respect to PC2; these parameters therefore contribute more to the respective PCs than the others. Figure 4 shows the percentage of the variance explained by each PC and the cumulative percentage curve. No individual PC captures a significant share of the total variance, and even the first three PCs together account for only approximately 52%. Because each PC explains only a small percentage of the total variance, the original dataset of ten DCCPs, rather than a reduced set of PCs, was used as input to the hierarchical spectral clustering procedure.

Figure 5 shows the spatial locations of the target coals grouped into ten clusters. The hierarchical spectral clustering procedure formed three well-separated groups: cluster 113 (red), cluster 211 (green) and cluster 221 (grey).
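The exploratory quantities discussed above (the Pearson correlation matrix, the variance explained by each PC and the cumulative curve) can be computed as in the following sketch. It assumes the parameters are standardised before PCA, which is equivalent to an eigen-decomposition of the correlation matrix; the paper does not state which scaling was used. The function name is hypothetical, and the input in the usage note is synthetic random data, not the CBDCR database.

```python
import numpy as np

def correlation_and_pca(X):
    """Return the correlation matrix, PC loadings and explained-variance ratios.

    X has one row per sample (target coal) and one column per parameter.
    """
    # Pearson product-moment correlations between columns.
    R = np.corrcoef(X, rowvar=False)
    # PCA of standardised data: eigenvectors of R are the PC loadings,
    # eigenvalues are the variances along each PC (assumed scaling).
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # proportion of total variance
    cumulative = np.cumsum(explained)          # cumulative percentage curve
    return R, eigvecs, explained, cumulative
```

For example, with `X = np.random.default_rng(1).standard_normal((200, 10))`, `cumulative[2]` gives the share of variance captured by the first three PCs. When that share is low, as the roughly 52% reported here, replacing the original parameters with a few PCs would discard much of the information.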

CONCLUSIONS

In this paper we present a data-driven approach for identification of coal seam families in the Cooper Basin. A decision tree was developed for partitioning the CBDCR dataset into ten clusters interpreted as deep coal seam families. These families show well-separated boundaries and significant differences in maturation trends. The study demonstrates the potential of machine learning to improve traditional practices of reservoir modelling.

Figure 1 node labels (Level 1, Level 2, Level 3):

Input Dataset
    Cluster 1
        Cluster 11: Clusters 111, 112, 113
        Cluster 12: Clusters 121, 122, 123
    Cluster 2
        Cluster 21: Clusters 211, 212
        Cluster 22: Clusters 221, 222

Figure 2. Correlation among DCCPs.

Figure 3. Biplot of the first and second principal components and the DCCPs.

Figure 4. The total variance explained by the principal components.

Figure 6 shows the moderate positive correlation between Top D and VRo, together with the cluster labels. Clusters 211 and 221 are the largest clusters and are well separated. Figure 7 shows line graphs of the maximum, average and minimum values, with standard deviations as grey bars, corresponding to the Level 3 cluster labels. In general, depth and latitude show no clear trend, while longitude shows a decreasing trend from the first to the last cluster. Descriptive statistics of all input parameters for each cluster are given in Table 1.

Figure 5. Well locations in the Cooper Basin. The colours indicate the coal seam families identified by the hierarchical clustering procedure.

Figure 6. Scatter plot of VRo vs depth.

ACKNOWLEDGEMENTS

The authors would like to acknowledge CSIRO strategic funds that allowed us to work on this important scientific problem. Deep Coal Technologies are thanked for providing their databases and extensive advice associated with this work.

REFERENCES

Dunlop, E.C., Warner, D.S., Warner, P.E.R., and Coleshill, L.R., 2017, Ultra-deep Permian coal gas reservoirs of the Cooper Basin: insights from new studies: The APPEA Journal, 57, 218-262.

Pearson, K., 1901, On lines and planes of closest fit to systems of points in space: The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559-572.

Shi, J., and Malik, J., 2000, Normalized cuts and image segmentation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905. doi: 10.1109/34.868688


Figure 7. Descriptive statistics of DCCPs. Blue, orange and grey lines show the maximum, average and minimum values, respectively. Grey bars represent the standard deviation values.


Table 1. Mean and standard deviation (SD) of level 3 clusters.

Cluster    N  Stat    Top D Thick M GR Min   TG Ave  ROP Ave  Wetness   VRo     MW     Lat    Long
                       (ft)    (ft)  (API)  (units)  (min/m)      (%)   (%)  (ppg)   (deg)   (deg)
111      146  Mean  8154.21    4.64  32.28   855.64     0.88     8.28  1.27   9.33  -28.15  140.91
              SD     738.59    1.96  13.62   819.17     0.40     5.31  0.20   0.23    0.16    0.12
112       34  Mean  6094.54    5.02  34.94  1317.65     0.66     9.88  0.91   9.34  -28.08  141.79
              SD     418.82    2.01  10.21  2559.33     0.33     5.54  0.07   0.17    0.24    0.32
113       89  Mean  8255.77    4.86  41.18   850.67     0.90    10.09  1.24   9.44  -27.15  142.01
              SD     751.81    2.48  20.72   602.53     0.43     4.69  0.21   0.31    0.48    0.40
121      127  Mean  6131.29    5.28  21.47   340.43     0.59    17.26  0.81   9.17  -28.43  140.87
              SD     357.93    2.53   4.10   330.14     0.24     8.76  0.12   0.23    0.14    0.30
122      150  Mean  6465.14    4.43  22.07   225.92     0.60    14.11  0.77   9.21  -28.40  140.00
              SD     219.05    1.75   5.86   174.23     0.21     6.73  0.05   0.22    0.11    0.12
123       84  Mean  6660.43    3.80  30.74   753.94     0.74    14.26  0.95   9.31  -28.30  140.68
              SD     334.84    1.62  13.66   695.59     0.27     7.32  0.09   0.19    0.10    0.24
211     1304  Mean  9088.53    6.27  22.78   736.65     0.78    11.47  1.06   9.27  -27.75  140.03
              SD     517.56    4.28  11.09   576.46     0.40     5.30  0.14   0.20    0.19    0.20
212       38  Mean  9727.32    4.20  26.61   790.84     0.75    10.16  0.91  10.40  -27.48  140.34
              SD     212.87    1.71  13.16   392.40     0.26     2.34  0.03   0.26    0.01    0.01
221      590  Mean  8439.03    5.02  25.79   888.87     0.74     7.04  1.36   9.46  -28.18  140.18
              SD     927.22    3.44  16.98   865.59     0.37     6.49  0.46   0.71    0.18    0.16
222       60  Mean  6639.83    5.55  23.60   393.08     0.65    27.10  0.72   9.24  -28.16  139.68
              SD     397.52    3.93  11.29   481.36     0.16    20.44  0.07   0.15    0.11    0.16

N is the number of samples in each cluster. TG Ave is given in mud gas units (50 units = 1%).