University of Calgary
PRISM: University of Calgary's Digital Repository
Graduate Studies
The Vault: Electronic Theses and Dissertations
2020-12-09
Machine Learning Applications for Production
Prediction and Optimization in Multistage
Hydraulically Fractured Wells
Chaikine, Ilia
Chaikine, I. (2020). Machine Learning Applications for Production Prediction and Optimization in
Multistage Hydraulically Fractured Wells (Unpublished doctoral thesis). University of Calgary,
Calgary, AB.
http://hdl.handle.net/1880/112817
doctoral thesis
University of Calgary graduate students retain copyright ownership and moral rights for their
thesis. You may use this material in any way that is permitted by the Copyright Act or through
licensing that has been assigned to the document. For uses that are not allowable under
copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca
UNIVERSITY OF CALGARY
Machine Learning Applications for Production Prediction and Optimization in Multistage
Hydraulically Fractured Wells
By
Ilia Chaikine
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
GRADUATE PROGRAM IN CHEMICAL AND PETROLEUM ENGINEERING
CALGARY, ALBERTA
DECEMBER, 2020
© Ilia Chaikine 2020
Abstract
Due to improvements in horizontal drilling and completion technologies over the past several
decades, multistage hydraulic fracturing has become widespread and has led to explosive
growth of shale and tight oil and gas production worldwide. Even though the completion
techniques are well known and relatively simple, the dynamics of fracture formation and
hydrocarbon flow within the reservoir are extremely complex. Even with the recent
developments, little is known about how the rock mechanical properties, completion design and
well spacing affect the morphology of fracture networks and the production of hydrocarbons at
the wellhead. Because of this lack of understanding, no model yet exists that can forecast
production performance with good accuracy. The focus of this thesis is the
Montney Formation in Alberta. The research presented in this thesis describes a method to use a
convolutional-recurrent neural network (c-RNN) to generate synthetic shear sonic logs with high
accuracy and to link a broad range of input parameters, both geological and stimulation, at every
stage along a horizontal wellbore to the production performance at the wellhead. The results
show that the production performance is driven more by the rock mechanical properties
surrounding the perforation clusters than the design of the hydraulic fracture. The results also
show that well spacing affects production performance. The outcomes of the research
provide tools for improving the accuracy of rock mechanical models, optimizing hydraulic
fracturing operations with respect to water usage, and placing future wells in the reservoir
to maximize gas production.
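The c-RNN described in the abstract pairs convolutional filters, which pick out local patterns in the stage-by-stage geological and stimulation inputs along the wellbore, with a recurrent layer that turns those patterns into a production time series. The following is only a rough, self-contained NumPy sketch of that data flow; the feature names, filter shapes, and random weights are hypothetical placeholders, not the architecture or training procedure developed in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1-D convolution along the stage axis.
    x: (n_stages, n_features); kernels: (n_filters, width, n_features).
    Returns (n_stages - width + 1, n_filters)."""
    n_filters, width, _ = kernels.shape
    n_out = x.shape[0] - width + 1
    out = np.empty((n_out, n_filters))
    for t in range(n_out):
        window = x[t:t + width]  # local slice of stages
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.tanh(out)

def rnn(seq, W_in, W_rec, W_out):
    """Simple Elman-style recurrence over the convolved features;
    emits one scalar per step (e.g., a monthly gas rate)."""
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for x_t in seq:
        h = np.tanh(W_in @ x_t + W_rec @ h)
        outputs.append(W_out @ h)
    return np.array(outputs)

# Hypothetical inputs: 15 stages, 6 features per stage (e.g., Young's
# modulus, Poisson's ratio, proppant mass, fluid volume, spacing, depth).
stages = rng.normal(size=(15, 6))
kernels = rng.normal(scale=0.1, size=(8, 3, 6))  # 8 filters of width 3
W_in = rng.normal(scale=0.1, size=(16, 8))
W_rec = rng.normal(scale=0.1, size=(16, 16))
W_out = rng.normal(scale=0.1, size=(16,))

features = conv1d(stages, kernels)      # shape (13, 8)
rates = rnn(features, W_in, W_rec, W_out)  # shape (13,)
print(features.shape, rates.shape)
```

In the thesis the network is trained on measured well data; here random weights merely illustrate how per-stage inputs flow through convolutional and recurrent stages to a predicted rate sequence.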
Acknowledgements
I would like to thank my supervisor Dr. Ian Gates for taking me in as his student. His knowledge,
encouragement and support have made me passionate about research and its limitless potential.
I would also like to thank Sproule Associates Limited for providing me the financial support
needed to pursue my goals. Special thanks also go to my colleagues at Sproule, firstly to my
mentor and boss Scott Pennell, for always making time to listen and discuss ideas and for always
supporting my goals; without your help, this research project would not have happened. I would
also like to thank Irina Baisitova for mentoring me since the start of my engineering career and
helping me get through some of those tough graduate courses; Alexey Romanov for always
making time to show me how to use Petrel and build geomodels and for teaching me all about
statistics; Surya Karri for teaching me all I needed to know about petrophysics and sonic logs;
Victor Verkhogliad for teaching me all about Montney geology and geology in general; and
Richard Holst for your enthusiasm about my project and always being there to answer any
question I had.
I give my greatest thanks to my wife and best friend Kaitlyn Johnson: you have made this
graduate school experience wonderful, and your unfaltering love, faith and support
have made my life the only life I want to live.
Dedication
To my wife Kaitlyn, thank you for being there for me through thick and thin and supporting me in
all my goals and dreams.
Table of Contents
Abstract ........................................................................................................................................... ii
Acknowledgements ........................................................................................................................ iii
Dedication ...................................................................................................................................... iv
Table of Contents ............................................................................................................................ v
List of Tables ................................................................................................................................. ix
List of Figures ................................................................................................................................ xi
Chapter 1: Introduction ................................................................................................................... 1
1.1 Background ......................................................................................................................................... 1
1.2 Montney Formation and Study Area .................................................................................................. 4
1.3 Problem Statement ............................................................................................................................. 6
1.4 Organization of Thesis ......................................................................................................................... 8
Chapter 2: Literature Review ........................................................................................................ 10
2.1 Multi-Stage Hydraulic Fracturing ...................................................................................................... 10
2.2 Variables Affecting Post Fracture Production Performance ............................................................. 11
2.2.1 Geological properties surrounding a wellbore location ............................................................ 12
2.2.2 Completion design ..................................................................................................................... 13
2.2.3 Well spacing and completion order ........................................................................................... 14
2.3 Hydrocarbon Production Forecasting ............................................................................................... 17
2.3.1 Forecasting ................................................................................................................................. 17
2.3.2 Forecasting Hydrocarbon Production ........................................................................................ 18
2.4 Time Series Forecasting and Machine Learning Algorithms ............................................................. 21
2.4.1 Parametric Models ..................................................................................................................... 23
2.4.2 Supervised Machine Learning and Artificial Neural Networks .................................................. 24
2.5 History of Machine Learning in Production Forecasting and Optimization ...................................... 34
2.6 What is Missing in Literature? .......................................................................................................... 35
Chapter 3: A New Neural Network Procedure to Generate Highly Accurate Synthetic Shear
Sonic Logs in Unconventional Reservoirs .................................................................................... 37
3.1 Preface .............................................................................................................................................. 37
3.2 Abstract ............................................................................................................................................. 38
3.3 Introduction ...................................................................................................................................... 39
3.4 Study Area and Data Processing ....................................................................................................... 42
3.4.1 Study Area and Structure Model ............................................................................................... 42
3.4.2 Data Preparation ........................................................................................................................ 44
3.5 Experimental Setup and Types of Neural Network Algorithms ........................................................ 47
3.5.1 Experimental Setup .................................................................................................................... 47
3.5.2 Comparisons .............................................................................................................................. 49
3.5.3 Networks used in the experiments ............................................................................................ 50
3.5.4 Overfitting .................................................................................................................................. 51
3.5.5 Stopping procedure ................................................................................................................... 53
3.5.6 Summary of Procedure .............................................................................................................. 57
3.6 Results and Discussion ...................................................................................................................... 59
3.7 Conclusions ....................................................................................................................................... 72
Chapter 4: A Convolutional-Recurrent Neural Network Model for Predicting Multi-Stage
Horizontal Well Production .......................................................................................................... 74
4.1 Preface .............................................................................................................................................. 74
4.2 Abstract ............................................................................................................................................. 75
4.3 Introduction ...................................................................................................................................... 76
4.4 Input Data Preparation ..................................................................................................................... 78
4.4.1 Geological Properties ................................................................................................................. 80
4.4.2 Completion Variables ................................................................................................................. 81
4.4.3 Well spacing and completion order ........................................................................................... 83
4.4.4 Production Data ......................................................................................................................... 84
4.5 Experimental Setup ........................................................................................................................... 88
4.5.1 Networks used in the experiments ............................................................................................ 88
4.5.2 Input shape and normalization .................................................................................................. 89
4.5.3 Experimental Setup .................................................................................................................... 91
4.5.4 Hyperparameter Tuning ............................................................................................................. 95
4.6 Results and Discussion ...................................................................................................................... 97
4.7 Conclusions ..................................................................................................................................... 105
Chapter 5: Optimizing Water Usage during Multi-Stage Hydraulic Fracturing with a
Convolutional-Recurrent Neural Network .................................................................................. 107
5.1 Preface ............................................................................................................................................ 107
5.2 Abstract ........................................................................................................................................... 108
5.3 Introduction .................................................................................................................................... 109
5.4 Study Area and Proposed Wells ...................................................................................................... 114
5.4.1 Well spacing and completion order ......................................................................................... 115
5.4.2 Rock mechanical properties ..................................................................................................... 116
5.4.3 Completion Parameters ........................................................................................................... 116
5.4.4 Input shape and normalization ................................................................................................ 117
5.5 Neural Network Algorithms and Experimental Setup..................................................................... 118
5.6 Results and Discussion .................................................................................................................... 120
5.7 Conclusions ..................................................................................................................................... 134
Chapter 6: Using a Convolutional-Recurrent Neural Network Forecasting Model to Optimize the
Positioning of New Wells in a Partially Developed Field .......................................................... 136
6.1 Preface ............................................................................................................................................ 136
6.2 Abstract ........................................................................................................................................... 137
6.3 Introduction .................................................................................................................................... 138
6.4 Study Area and Completion Scenarios ............................................................................................ 141
6.4.1 Well spacing and completion order ......................................................................................... 141
6.4.2 Rock mechanical properties ..................................................................................................... 142
6.4.3 Completion Parameters ........................................................................................................... 143
6.5 Neural Network Algorithm for Gas Production Forecasting ........................................................... 143
6.6 Well Combinations and Procedure ................................................................................................. 145
6.7 Results and Discussion .................................................................................................................... 150
6.8 Conclusions ..................................................................................................................................... 159
Chapter 7: Conclusions and Recommendations ......................................................................... 161
7.1 Conclusions ..................................................................................................................................... 161
7.2 Recommendations .......................................................................................................................... 163
References ................................................................................................................................... 166
Appendix A: Temperature Transient Analysis of the Steam Chamber during a SAGD Shutdown
Event ........................................................................................................................................... 175
A.1 Abstract ........................................................................................................................................... 175
A.2 Introduction .................................................................................................................................... 176
A.3 Literature Review ............................................................................................................................ 178
A.4 Temperature Transient Analysis ..................................................................................................... 179
A.4.1 Non-Condensing Model for SAGD Start up ............................................................................. 180
A.4.2 Condensing Model for SAGD Ramp up .................................................................................... 184
A.4.3 Variable Thermal Diffusivity .................................................................................................... 189
A.5 Results and Discussion .................................................................................................................... 192
A.5.1 Constant Thermal Diffusivity ................................................................................................... 192
A.5.2 Variable Thermal Diffusivity .................................................................................................... 196
A.6 Conclusions ..................................................................................................................................... 198
List of Tables
Table 2.1: Completion variables. .................................................................................................. 14
Table 2.2: Parametric and non-parametric algorithms for analyzing time series data. ................. 22
Table 3.1: Total number of data points in each well used for the study. ...................................... 47
Table 3.2: Starting hyperparameters for the c-RNN developed in this study. .............................. 57
Table 3.3: Results of the study comparing the performance of the various methods of generating
synthetic DTS curves. ................................................................................................................... 60
Table 4.1: The stage variables that were used as inputs in the experiments. ................................ 80
Table 4.2: Completion variables. .................................................................................................. 81
Table 4.3: Starting hyperparameters for the c-RNN developed in this study. .............................. 96
Table 4.4: Results of the leave-one-out experiments performed on only the geological variables.
....................................................................................................................................................... 97
Table 4.5: Results of the leave-one-out experiments performed on only the completion variables.
....................................................................................................................................................... 97
Table 4.6: Results of the leave-one-out experiments performed on geological, completion and
spacing variables. .......................................................................................................................... 97
Table 5.1: The stage variables that were used as inputs in the sensitivity experiment. .............. 115
Table 5.2: The ranges of all the variable input parameters and the increments used for the
sensitivity experiment. ................................................................................................................ 117
Table 5.3: Results from the sensitivity analysis using crosslinked gel as the fracture fluid. ...... 121
Table 5.4: Results from the sensitivity analysis using slickwater as the fracture fluid. ............. 123
Table 6.1: Results from the sensitivity analysis using crosslinked gel as the fracture fluid. ...... 151
Table 6.2: Results from the sensitivity analysis using slickwater as the fracture fluid. ............. 154
List of Figures
Figure 1.1: Resource triangle: the top region of the triangle represents conventional resources
available on the planet whereas the larger bottom region represents unconventional resources.
(Holditch, 2013). ............................................................................................................................. 1
Figure 1.2: Areal extent of the Montney Formation (Canadian Energy Regulator, 2018). ............ 5
Figure 1.3: Location of Study Area (Canadian Energy Regulator, 2018). ..................................... 6
Figure 2.1: Diagram of a hydraulic fracturing operation (US Environmental Protection Agency,
2013). ............................................................................................................................................ 10
Figure 2.2: Example of multiple horizontal wells drilled from a pad (Energy Essentials, 2015). 15
Figure 2.3: Bounded and unbounded wells: well B is bounded on both sides, wells A
and C are bounded on one side, and well D is unbounded (Belyadi et al., 2017). .................... 17
Figure 2.4: The accuracy of forecasts increases with the degree of system understanding. ....... 18
Figure 2.5: A schematic diagram of two bipolar neurons. (Mohaghegh, 2000). .......................... 25
Figure 2.6: Schematic diagram of a typical artificial neuron (Mohaghegh, 2000). ...................... 26
Figure 2.7: Structure of a three-layer ANN (Neville et al. 2004). ................................................ 27
Figure 2.8: ANN training model using PSO (adapted from Panja et al. 2017). ........................... 28
Figure 2.9: Visualization of overfitting (Bhande, 2018). .............................................................. 32
Figure 2.10: A plot showing how training and validation errors evolve over the number of
epochs. .......................................................................................................................................... 32
Figure 2.11: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right:
An example of a thinned net produced by applying dropout to the network on the left. Crossed
units have been dropped (Srivastava et al., 2014). ......................................................... 33
Figure 3.1: Plot of shear slowness (DTS) versus compressional slowness (DTP) measurements
for 14 wells in the Montney Formation. ....................................................................................... 41
Figure 3.2: A 3D visualization of all wells in the study area. Red = horizontal, blue = deviated,
and black = vertical. ...................................................................................................................... 43
Figure 3.3: 3D generated Montney and Belloy surfaces in Petrel. The vertical axis is 25x
exaggerated compared to the X and Y scales and represents the subsea depth. ............ 44
Figure 3.4: Areal view of the study area generated in Petrel. Black points with no labels are wells
with no DTS logs; labeled wells have the DTS logs..................... 45
Figure 3.5: Architecture of the ANN (A) and c-RNN (B) used in the study. ............................... 50
Figure 3.6: Format of input and output data of the c-RNN. ......................................................... 51
Figure 3.7: Effect of batch size on the evolution of the validation error for blind well VT_42. .. 52
Figure 3.8: Training and validation error for two blind well tests. ............................................... 54
Figure 3.9: Validation error for all blind well tests, results are split into two graphs for clarity. 55
Figure 3.10: 20,000 iterations of the blind well DV_28 experiment. ........................................... 55
Figure 3.11: A zoomed in plot showing the training and blind well error tremor. ....................... 56
Figure 3.12: General procedure for training the synthetic DTS tool that can be applied to any
formation. ...................................................................................................................................... 58
Figure 3.13: Process for using the training tool to generate synthetic DTS curves. ..................... 59
Figure 3.14: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for one of the worst blind wells – VT_10. ....................... 64
Figure 3.15: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for the best blind well – VT_18. ...................................... 65
Figure 3.16: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for the most important well - VT_42. .............................. 66
Figure 3.17: Synthetic logs generated by the c-RNN for wells VT_26 and VT_29 compared to
the true logs. .................................................................................................................................. 68
Figure 3.18: Synthetic logs generated by the c-RNN for wells VT_69 and VT_108 compared to
the true logs. .................................................................................................................................. 69
Figure 3.19: Sequence of synthetic well logs generated for well VT_18 at different numbers of
epochs. Y-axis is DTS (µs/ft) and x-axis is the thickness steps from the top of the formation. ... 72
Figure 4.1: Areal extent of the 74 horizontal wells used in the study. ......................................... 79
Figure 4.2: An example of the long-range rock mechanical profiles surrounding the horizontal
wells.
Figure 4.3: A visual representation of the Arps decline overlaying the true production. This is
shown on the rate-cumulative production plot for one of the wells. The red line represents the gas
rate assuming 100% on time every month, the orange line represents gas rate scaled down to
account for the actual on time during the month. ......................................................................... 85
Figure 4.4: A plot showing the differences in rate restrictions between the actual production and
that of the Arps decline curve. ...................................................................................................... 85
Figure 4.5: The three common types of plots used to describe the production performance of a
well. Cumulative versus time (top), rate versus cumulative (middle), rate versus time (bottom).
These plots were generated using one of the wells in the study. .................................................. 87
Figure 4.6: c-RNN structure that was used in the study. .............................................................. 89
Figure 4.7: Format of input and output data of the c-RNN model ............................................... 90
Figure 4.8: Plot showing the numerous predictions that a model trained on the same dataset could
make. ............................................................................................................................................. 93
Figure 4.9: Plots showing how the average prediction changes based on the number of runs ..... 94
Figure 4.10: Distribution of individual well MAPE from the best case. .................................... 100
Figure 4.11: Plot of the worst (left) and best (right) well production profiles created by the best-case
model. The red line is the average of the 30 runs, the green line is the true profile. .................. 100
Figure 4.12: Plot of the best-case aggregate production profile of all 74 wells vs time. ................ 102
Figure 4.13: Plot of how aggregating wells together affects the mean absolute percentage error (MAPE).
..................................................................................................................................................... 103
Figure 5.1: 40 proposed well locations (blue) added to the existing 74 wells (red) in the study
area generated in Petrel. .............................................................................................................. 114
Figure 5.2: Aggregated cumulative 5-year production vs proppant amount and fluid amount per
stage using 15 stages per well and the crosslinked gel (top) and the slickwater (bottom) as the
fracture fluid................................................................................................................................ 127
Figure 5.3: Effect of stage count per well on the aggregated cumulative 5-year production using
crosslinked gel (top) and slickwater (bottom) as the fracture fluid. ........................................... 128
Figure 5.4: crosslinked gel vs slickwater results for 15 stage count. .......................................... 129
Figure 5.5: Aggregated cumulative production versus the total water injected for various stage
counts of crosslinked gel and slickwater wells using 120 tonnes of proppant per stage. ........... 130
Figure 5.6: Injection efficiency for different stage counts and fluid types versus total fluid injected
..................................................................................................................................................... 131
Figure 5.7: 5-year cumulative production (top) and injection efficiency (bottom) for the 74
existing crosslinked gel and slickwater wells versus total fluid injected.................... 133
Figure 6.1: One random combination of 20 wells (top), another random combination of 20 wells
(bottom)....................................................................................................................................... 147
Figure 6.2: Best and worst 20 well positions found in the 40 well prediction using the slickwater
(water intense) stimulation. ......................................................................................................... 149
Figure 6.3: Best and worst 20 well positions found in the 40 well prediction using the crosslinked
gel (water conservative) stimulation. .......................................................................................... 150
Figure 6.4: Results of the 104 combinations presented in a histogram using both crosslinked
(top) and slickwater (bottom) as the stimulation fluid. ............................................................... 157
Figure A.1: Cross sectional schematic of a typical SAGD process (Gotawala and Gates, 2010).
..................................................................................................................................................... 177
Figure A.2: Schematic diagram of a three-zone non-condensing model used for SAGD start
up (modified from Zhu et al., 2012). ........................................................................................ 181
Figure A.3: Discretization domain for one-dimensional radial heat transfer - temperature grid
layout. ......................................................................................................................................... 183
Figure A.4: Schematic diagram of a three-zone condensation model used for SAGD ramp up
(modified from Zhu and Zeng, 2014). ...................................................................................... 185
Figure A.5: Decision tree for steam quality grid calculation. .................................................... 187
Figure A.6: Decision tree for steam temperature grid calculation. ............................................. 187
Figure A.7: Discretization domain for one-dimensional radial phase change - steam quality
grid layout. ................................................................................................................................. 188
Figure A.8: Non-condensing model evolution of radial temperature profile. ............................ 193
Figure A.9: Condensing model evolution of radial temperature profile. .................................... 194
Figure A.10: Condensing model evolution of radial steam quality profile. ............................... 194
Figure A.11: Temperature change with time at the center of the well bore during start up and
ramp up. ...................................................................................................................................... 195
Figure A.12: Slope of temperature plotted against time at the center of the well bore during start
up and ramp up............................................................................................................................ 196
Figure A.13: Non-condensing radial temperature profile after 2 weeks for variable and constant
kh cases. ....................................................................................................................................... 197
Figure A.14: Non-condensing temperature evolution at the center of the wellbore for variable
and constant kh cases. .................................................................................................................. 197
Chapter 1: Introduction
1.1 Background
Unconventional hydrocarbon resources differ from conventional ones in that they reside in tight,
low quality formations and are more difficult to extract (Cander, 2012). Unconventional
resources are also much more abundant on the planet than conventional ones as shown in Figure
1.1.
Figure 1.1: Resource triangle: the top region of the triangle represents conventional resources
available on the planet whereas the larger bottom region represents unconventional resources.
(Holditch, 2013).
Many types of geological formations are classified as bearing unconventional resources
including tight gas sands, gas shales, heavy oil sands, coalbed methane, oil shales, and gas
hydrates. The focus of this research is on tight sands and shale, which have low permeability;
the methods and technology required to extract hydrocarbons from these reservoirs differ from
those used for conventional reservoirs. For production, tight sand and shale reservoirs require
higher well density. Furthermore, the wells drilled will not produce economically until they are
stimulated using methods such as hydraulic fracturing (Ma et al., 2016).
Hydraulic fracturing has become the most widely used technique of producing hydrocarbons
from low permeability reservoirs. The process includes pumping massive amounts of sand, water
and chemicals at a high rate and pressure into the formation to induce and prop open fractures
that increase the surface area of the reservoir that is connected to the wellbore (Belyadi et al.,
2017). Modern hydraulic fracturing is typically performed in multiple stages along the entire
horizontal wellbore. Advancements in hydraulic fracturing and horizontal drilling techniques
over the past several decades have made producing hydrocarbons from unconventional reservoirs
economically feasible – this has led to the shale boom in North America and now throughout the
world (Morton, 2013; Trembath et al., 2012).
The geological heterogeneity and flow mechanisms in unconventional reservoirs are not well
understood especially in the context of the geomechanical dynamics during the fracturing
process and during production of fluids from the reservoir. Furthermore, stimulation adds
complexity stemming from interactions between pre-existing natural fracture networks and
fractures that are formed during the stimulation. Fracture dynamics are extremely complex and
the exact fracture morphology before and after stimulation cannot be observed directly, although
passive seismic and microseismic monitoring provide options to visualize aspects of the fractured reservoir rock.
The injection of proppant during stimulation only adds to these complexities. One of the
challenges that the industry is facing today is the lack of understanding of unconventional
reservoir stimulation mechanisms and their effect on production as well as the impact of pre-
existing hydraulic fractures within the formation. The development of robust and reliable tools
would make it possible to forecast production from these types of reservoirs with reasonable certainty
(Cohen et al., 2012). Accurate forecasts would allow for better economic evaluations of assets,
implementation of optimization strategies, minimizing expenditures, and lowering environmental
impact of oil and gas operations.
Over the past 15 years, the drastic increase in hydraulic fracturing activity has been paralleled by
significant advances in digital data capture and storage (Cavanillas et al., 2016). The
combination has generated a massive, ever-expanding dataset of information on hydraulic
fracturing. Over the same period, computing power has increased exponentially, and innovations
in machine learning models have made them well positioned to find hidden patterns in data
and make accurate predictions (Goel et al., 2019).
Machine learning is a subset of artificial intelligence which utilizes computer algorithms capable
of improving performance automatically through experience (Mitchell, 1997). There are three broad
categories of machine learning: supervised learning, unsupervised learning and reinforcement
learning. In supervised learning both the inputs and outputs are presented with the goal of finding
their general correlations and being able to make predictions when only inputs are shown.
Supervised learning is used for classification and regression problems. In unsupervised learning
no outputs are presented and the algorithm is left to try and find similarities in the inputs.
Unsupervised learning is most commonly used for clustering, association, anomaly detection and
dimensionality reduction (Leung and Leckie, 2005). In reinforcement learning the algorithm
interacts with a dynamic environment and learns to achieve goals through rewards
(Chandramouli et al., 2018). The research presented in this thesis deals with forecasting and
regression and the supervised learning approach was utilized.
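To make the supervised setting concrete, the toy sketch below fits a model from paired inputs and outputs and then predicts the output for an unseen input. It is a minimal illustration using ordinary least squares on synthetic data, not a method or dataset from this thesis.

```python
import numpy as np

# Toy supervised-learning example: both inputs X and outputs y are
# presented during training; the fitted model then predicts outputs for
# inputs it has not seen. (Illustrative only; synthetic data.)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))      # inputs (features)
y = 3.0 * X[:, 0] + 2.0                   # outputs (labels): y = 3x + 2

# Fit a linear model by ordinary least squares.
A = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the output for a new, unseen input.
x_new = 4.0
y_pred = coef[0] * x_new + coef[1]
print(round(y_pred, 2))                   # → 14.0
```

In unsupervised learning, by contrast, only X would be available and the algorithm would have to find structure (for example, clusters) on its own.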
The motivation for this research was driven by three factors:
1. the lack of current understanding of unconventional reservoirs,
2. the very large and largely unexplored digital data set generated from hydraulic
fracturing operations in unconventional reservoirs, and
3. recent advances in machine learning models and computing power.
By utilizing new machine learning techniques, we may be able to increase the understanding of
unconventional reservoirs and how hydraulic fracturing affects production by exposing patterns
and input-output relationships that have not been identified before in physics-based modelling.
The application of machine learning algorithms could lead to significant opportunities for
increasing production, lowering costs and lowering the environmental impact of oil and gas
operations.
1.2 Montney Formation and Study Area
The Montney Formation, shown in Figure 1.2, is a tight siltstone/shale oil and gas play that spans
over 500 km from Alberta in the southeast to British Columbia in the northwest and covers over
130,000 km2. The quality of reservoir varies significantly over the formation, with the general
trend of conventional sandstone in the southeast to pure shale in the northwest. The thickness and
depth also vary greatly from less than 1 m thickness and 500 m depth in the southeast to over
350 m thickness and 4,000 m depth in the northwest. The in-place resource volumes for the play
are estimated at over 2,000 trillion cubic feet of gas and over 150 billion barrels of oil and
condensate (Reynolds et al., 2014; Wang and Chen, 2016).
Figure 1.2: Areal extent of the Montney Formation (Canadian Energy Regulator, 2018).
Due to the enormous volumes of in-place hydrocarbons and the major developments in
horizontal drilling and multi-stage hydraulic fracturing technologies, the Montney Formation has
become a major gas and natural gas condensate producer. With thousands of multistage
horizontal wells drilled to date, it accounts for over 38% of total Canadian natural gas production
(Canadian Energy Regulator, 2018). Due to its size and expected production growth, the
Montney Formation was chosen for the research documented in this thesis.
The study area is a small section of the Montney Formation and is shown in Figure 1.3. It is
located near Dawson Creek in the province of British Columbia, Canada, covering an area of
around 30 km by 30 km with an average thickness of 300 m. In this area, the Montney Formation
underlies the Doig Formation and overlays the Belloy Formation. This particular study area was
chosen because it contains the highest density of vertical and horizontal wells – this creates a
large data set. The study area contains 145 vertical wells, 35 deviated wells, and 255 horizontal wells
that penetrate the Montney Formation with only 60 of these wells penetrating all the way to the
Belloy Formation. Most of the horizontal and vertical wells are drilled in the middle of the study
area with some horizontal wells located in the northwest. All the horizontal wells in the study
target the Montney Formation with the majority drilled in the top 20% of its thickness.
Figure 1.3: Location of Study Area (Canadian Energy Regulator, 2018).
1.3 Problem Statement
The overall objective of the research presented in this thesis is to explore the use of machine
learning algorithms for understanding relationships between geology, geomechanical properties,
completion design, and fluid injection properties and the production performance of the well.
From this analysis, models can be devised to predict the production performance of multistage
hydraulically fractured horizontal wells. The approach makes minimal assumptions about the
exact structure of the reservoir and fluid flow mechanisms within the reservoir but instead
attempts to link the horizontal well input parameters and the surrounding geology and geomechanical
properties to its production profile. As there are a myriad of variables driving production
behavior, machine learning was applied in the research documented in this thesis. More
specifically, the convolutional recurrent neural network (c-RNN) was chosen for its ability to
process large amounts of data as well as to find patterns in sequential data. This machine learning
method also needs to have the ability to aid in optimization problems such as optimizing
completion strategies and well placements. In general, for the research conducted, the approach
was developed following these steps:
1. With a machine learning model, create synthetic shear sonic logs to get a better
understanding of rock mechanical properties within a formation.
2. Create a three-dimensional (3D) rock mechanical model of the study area by using the
density, compressional sonic and shear sonic (real and synthetic) logs of vertical and
deviated wells in the study area.
3. Use the rock mechanical properties, completion design, well spacing and timing
parameters as inputs to the machine learning algorithm to predict the five-year
cumulative gas production profiles of horizontal wells.
4. Use the machine learning model to minimize water usage while maximizing
production during hydraulic fracturing in future wells.
5. Apply the machine learning model to find optimal positioning of future wells to
maximize production.
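As a rough illustration of the convolutional-recurrent idea underlying these steps, the sketch below runs a 1D convolution over a feature sequence and feeds the result through a simple recurrent cell. This is not the c-RNN architecture developed in the later chapters; all weights, sizes, and feature counts are arbitrary, untrained placeholders.

```python
import numpy as np

# Minimal forward-pass sketch of the convolutional-recurrent (c-RNN) idea:
# a 1D convolution extracts local features from a sequence, and a simple
# recurrent cell then accumulates them in order. NOT the thesis
# architecture; all weights and sizes are arbitrary placeholders.
rng = np.random.default_rng(1)
seq = rng.normal(size=(20, 3))    # 20 sequence steps (e.g. stages), 3 features

kernel = rng.normal(size=(3, 3))  # one conv filter spanning 3 steps
conv = np.array([np.sum(seq[i:i + 3] * kernel) for i in range(len(seq) - 2)])

w_in, w_rec = 0.5, 0.8            # input and recurrent weights
h = 0.0                           # hidden state carries sequence order
for x in conv:
    h = np.tanh(w_in * x + w_rec * h)

print(f"final hidden state: {h:.3f}")
```

The convolution captures local patterns across neighbouring steps, while the recurrence makes the output depend on the order of the whole sequence, which is the property that suits stage-by-stage well data.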
1.4 Organization of Thesis
Chapter 2 is a literature review of multistage hydraulic fracturing, the variables that affect
post-fracture production performance, machine learning tools, as well as forecasting and
optimizing post-fracture production performance.
In Chapter 3, a procedure is developed that can generate highly accurate synthetic shear sonic
logs. This procedure utilizes a c-RNN as it is capable of learning patterns in sequential data and
links the deviation survey along with the compressional and bulk density logs to the shear sonic
log. This procedure is a cost effective and fast alternative to running shear sonic (DTS) logs and
provides a greater insight into the rock mechanics of a formation.
In Chapter 4, a model is proposed that utilizes a c-RNN to predict the five-year cumulative gas
production profiles in multistage hydraulically fractured wells. The model was trained by linking
the cumulative production to a combination of completion parameters, rock mechanical
properties, and well spacing for every stage of each of 74 wells in the Montney Formation. The
accuracy of the model’s predictions was found to increase exponentially as the production of
multiple wells was aggregated.
In Chapter 5, we use the model developed in Chapter 4 to forecast the production of 40
proposed wells to be drilled alongside existing producers. 1,080 sensitivities were run to explore
how different combinations of stage count, fluid type, water amount, and proppant amount affect
the aggregated 5-year cumulative gas production of the 40 proposed wells. The study focuses on
reducing water usage while maximizing production and shows that it is possible to achieve a
cumulative production that is 76% of the maximum by injecting only 3% of the total water
volume required to achieve this maximum.
In Chapter 6, we use the model developed in Chapter 4 to forecast the production of 20
proposed wells. These 20 wells can only be drilled in the 40 fixed well locations identified in
Chapter 5, and the objective is to find the best placement for these wells. This resembles a
situation where budget constraints have forced a partial field development to be the only option.
Optimal combinations were found using two types of completion strategies: a water-intense
30-stage slickwater treatment and a water-conservative 15-stage crosslinked gel treatment.
Chapter 7 highlights the major findings and conclusions of the thesis as well as
recommendations for future work.
Appendix A is a separate study, not directly related to the main research, that was completed over
the course of the degree. Here, temperature fall-off from steam chambers where injection has
stopped is examined and a new three-zone model is designed with heat losses taking into account
temperature-dependent thermal conductivity. The model solves the transient heat conduction
and steam condensing equations.
Chapter 2: Literature Review
2.1 Multi-Stage Hydraulic Fracturing
First used in 1947, hydraulic fracturing has become the most widely used technique of producing
hydrocarbons from low permeability reservoirs. The process includes pumping massive amounts
of sand, water and chemicals at a high rate and pressure into the formation in order to induce and
prop open fractures that increase the surface area of the reservoir that is connected to the
wellbore. This, in turn, increases the volumes that can be recovered (Belyadi et al., 2017). Figure
2.1 shows a visualization of the hydraulic fracturing process.
Figure 2.1: Diagram of a hydraulic fracturing operation (US Environmental Protection Agency,
2013).
In the field, the two most popular methods to hydraulically fracture a formation are the plug and
perf and the sliding sleeve. Although there is no overall recommendation on which method is
better, typically the choice between the two is driven by economics as well as an operator’s
success in a particular formation. The plug and perf method is typically used for cased holes and
offers more flexibility in adjusting the position of the perforations. The sliding sleeve is used in
open hole wells and contains pre-perforated sleeves, so that a perforating gun is not needed
(Belyadi et al., 2017). Most hydraulic fracturing fluids are water-based and contain 90 to 97% water by
volume (U.S. Environmental Protection Agency, 2016). Two of the most common water-based
fluids used today are slickwater and crosslinked gel (Montgomery, 2013). Modern hydraulic
fracturing is typically performed in multiple stages along the entire horizontal wellbore.
Advancements in hydraulic fracturing and horizontal drilling techniques over the past several
decades have made producing hydrocarbons from unconventional reservoirs economically
feasible, which has led to an unconventional boom in North America and throughout the world
(Morton, 2013).
2.2 Variables Affecting Post Fracture Production Performance
The production performance of a horizontal well that has been hydraulically fractured is
influenced by a multitude of variables; in this study, we identify three major categories:
1. geological properties of the formation surrounding the perforations,
2. completion design of each stage, and
3. well spacing and completion order.
2.2.1 Geological properties surrounding a wellbore location
The geological properties surrounding the perforations in a horizontal well have arguably the
greatest influence on the production performance. Geological properties include the volume of
hydrocarbons in place, the permeability, the natural fracture geology, in situ stresses, and rock
mechanical properties. The selection of the right place to drill a well is critical to ensure an
ongoing expanding commercial operation.
The geological properties of a formation can be described as a mixture of both the properties of
the matrix and the morphology of the fracture network. The matrix properties include the gas and
fluid saturation, porosity, as well as matrix permeability. These properties cannot be altered to
any meaningful extent. Fractures, whether natural or induced, are not static and evolve with
changing conditions (Holland et al., 2009). Fractures, on the other hand, do have a significant
effect on fluid flow within a reservoir, either in the form of increased reservoir permeability or
increased permeability anisotropy (Nelson, 2001). The fracture morphology, distribution and
connectivity to a wellbore will significantly impact the production that is measured at the
wellhead. Mechanical rock properties are a major driver of how the fracture network evolves and
are especially important during a hydraulic fracture treatment, when the natural fracture network
gets drastically altered.
The horizontal wells in this study area are drilled within a small areal extent of the Montney
Formation and are all located at a similar depth near the top of the formation. The Montney
Formation was deposited in a shallow marine shelf environment which makes the geological
properties of the matrix fairly constant through the study area (Moslow, 2000). The rock
mechanics, which govern the fracture networks, differ significantly both areally and vertically.
gives each horizontal well a unique rock mechanical profile. Mechanical properties can be
estimated using three sets of log measurements: shear sonic travel time (DTS), compressional
sonic travel time (DTP) and bulk density (RHOB) (Fjar et al., 2008). Horizontal wells do not
usually have these logs, and the only reasonable way to estimate them along a horizontal wellbore is by
constructing a 3D rock mechanical model upscaled and interpolated using logs from nearby
vertical wells.
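For context, the standard dynamic elasticity relations that turn these three logs into mechanical properties can be sketched as follows. These are the textbook relations (of the kind covered by Fjar et al., 2008), not formulas specific to this thesis; the log values below are hypothetical, and units are assumed to be microseconds per metre for the sonic logs and kg/m3 for bulk density.

```python
# Dynamic elastic moduli from sonic and density logs using the standard
# elasticity relations. DTP/DTS are assumed in microseconds per metre and
# RHOB in kg/m3; the example values are hypothetical, not thesis data.
def dynamic_moduli(dtp_us_per_m, dts_us_per_m, rhob_kg_m3):
    vp = 1.0e6 / dtp_us_per_m    # compressional velocity, m/s
    vs = 1.0e6 / dts_us_per_m    # shear velocity, m/s
    g = rhob_kg_m3 * vs ** 2     # dynamic shear modulus, Pa
    nu = (vp ** 2 - 2.0 * vs ** 2) / (2.0 * (vp ** 2 - vs ** 2))  # Poisson's ratio
    e = 2.0 * g * (1.0 + nu)     # dynamic Young's modulus, Pa
    return e, nu

# Hypothetical readings for a tight siltstone interval.
E, nu = dynamic_moduli(dtp_us_per_m=200.0, dts_us_per_m=340.0, rhob_kg_m3=2650.0)
print(f"E = {E / 1e9:.1f} GPa, Poisson's ratio = {nu:.2f}")
```

In a 3D model, these relations would be applied sample by sample along each vertical log before upscaling and interpolation between wells.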
2.2.2 Completion design
After a well has been drilled, the completion design is the only thing the operator can control.
Completion design is a complicated process with many variables. A typical completion is
divided into multiple stimulation stages along the wellbore that are isolated from each other;
each stage can contain one or more perforation clusters. Since each stage is stimulated
separately, the stimulation process may differ between stages.
The stage count and spacing are among the first decisions to be made during a completion design;
the idea is to contact as much surface area as possible while minimizing fracture interference
within clusters (Belyadi et al., 2017). The optimal spacing is difficult to determine and is tied to
both economics and the geology of the formation. The amount of proppant and water pumped
per stage are also very important variables as they dictate how large the fracture network will get
and how efficiently the fractures will stay propped open once the well starts to flowback. There
are many other variables in the design of a completion that influence production, some of the
common variables are described in Table 2.1.
Table 2.1: Completion variables.
• Service company performing the stimulation
• Sliding sleeve or plug and perf
• Gun diameter
• Charge size
• Number of perforations per cluster
• Distance of the cluster from the well head
• Amount of acid used in spearhead stage
• Fluid type (slickwater or gelled)
• Size of pad stage
• Amount of energizer (CO2 or N2)
• Amount of foam
• Proppant concentration
• Proppant size and type
• Size of flush stage
• Amount of crosslinked gel
• Flowrate of each stage
• Pressure of each stage
• Length of time of each stage
• Flowback design
2.2.3 Well spacing and completion order
Whether a well is bounded or unbounded, the distance between producing wells, and the length
of time one well has been on production before an offsetting well is drilled all influence
production performance (Belyadi et al., 2017). During the early stages of field development,
many single wells are drilled. These primary wells usually produce unbounded, meaning that
they have no offsetting wells to interact with. As production data is gathered from the primary
wells, it makes more economic sense to drill infill wells beside them; this type of drilling is
done from a pad, as shown in Figure 2.2.
Figure 2.2: Example of multiple horizontal wells drilled from a pad (Energy Essentials, 2015).
The production performance from primary wells is known to be better than that of the secondary
infill wells. This is because primary wells produce from the original reservoir, which has been
unaltered for millions of years. Production from a primary well depletes the original reservoir
pressure in a region, known as the drainage area, that propagates away from it. As the pressure in
the drainage area drops so too does the flow rate to the wellbore. The reservoir stresses along
with the geological properties such as porosity and permeability also get altered. If drilled close
enough, the primary and secondary wells will share a drainage area, leading to lower production
rates in the secondary well. Lindsay et al. (2018) analyzed 564 wells in the Wolfcamp Formation
in the Permian’s Delaware Basin and found that 66% of the time the primary wells outperformed
secondary wells that were drilled within 300 m. When the results were adjusted for the more
modern, larger completions of the secondary wells, 79% of the primary wells performed better.
It is not just the distance between wells that has an effect on production, but also the length of
unbounded depletion time and, consequently, the volume that one well had already
produced before a nearby secondary well was drilled. Defeu et al. (2018) showed both of these
effects in an optimization study on wells from the Wolfcamp Formation. Wells with unbounded
depletion time of 6 months had the same minimal impact on the production performance of
secondary wells drilled at 180, 230 and 275 m away from the primary. Wells with unbounded
depletion time of 36 months or more showed a 50% reduction in production performance of
secondary wells drilled 180 m away versus a 15% reduction for wells drilled 275 m away.
A primary well produces unbounded until a secondary infill well is drilled nearby, making both
wells bounded. Wells can be bounded from one or both sides; as more wells are drilled in a field,
more wells become bounded from both sides. The spacing and completion order impact the
overall performance of a field and need to be carefully selected before a field is developed.
Figure 2.3 depicts typical bounded and unbounded configurations. Accounting for the distance
between neighboring wells and the length of time and volume that each neighbor produced is an
indirect way to account for changes in the geological properties and reservoir stresses which are
caused by well production.
Figure 2.3: Bounded and unbounded wells. Here, well B is bounded from both sides, wells A
and C are bounded from one side, and well D is unbounded (Belyadi et al., 2017).
2.3 Hydrocarbon Production Forecasting
2.3.1 Forecasting
Forecasting is a useful tool that is used in many situations that require decision making: a city
decides on the width of highways based on forecasts of population growth and vehicle usage;
stores decide on how much stock to buy based on forecasts of customer purchases. The length
of forecasts differs based on what they are used for: telecommunication routing only requires
forecasting several minutes ahead, while large capital investments require multiyear forecasts
(Brockwell and Davis, 1996).
Certain things are easy to predict while others are extremely difficult. The trajectory of a comet
can be forecasted thousands of years into the future with extreme accuracy; however, forecasting
even one day of a stock market price with any degree of certainty is nearly impossible. This is
because the accuracy of a prediction depends on our understanding of the system, as shown in
Figure 2.4.
Figure 2.4: The accuracy of forecasts increases with the amount of system understanding.
2.3.2 Forecasting Hydrocarbon Production
Forecasting the production of oil and gas has been a major part of petroleum engineering for
most of the industry’s history. This is because having a forecast that can reasonably predict
future oil production grants the ability to estimate an asset's remaining value and to optimize the
operation of existing wells and future drilling programs.
Forecasting production is possible because reservoirs behave in a manner that is driven by the
underlying physics and geology. The oil rate at the well head is a function of drawdown pressure,
quality of the wellbore, porosity, permeability, net pay, oil saturations and several other
variables. Further to that, many reservoirs around the world have been on production for several
decades and have produced large datasets of reservoir geology, drilling, production, and
pressure data. Even though reservoir physics is well understood and there are large data sets
available, the error in many forecasts still remains high. This is because:
• reservoirs are very large,
• reservoir geology is very heterogenous,
• reservoirs contain vast complex natural fracture networks,
• reservoirs are deep underground and are not visible,
• the only direct measurement of the reservoir comes from vertical well bores which only
cover a tiny fraction of the entire reservoirs, and
• reservoir geology can change as hydrocarbons are produced and pressures drop.
For a long time, forecasting oil production for existing producers was done empirically by
decline curve analysis (DCA) or material balance (Adelman and Jacoby, 1979). Forecasting
production for undeveloped (not yet drilled) wells was only done by analogy and by volumetric
calculations taking recovery efficiencies into account. The problem with traditional methods is
that they are based on subjective interpretation of the data, for example, picking the proper slope
for DCA. Traditional methods suffer from human cognitive biases, and because of this, long-term
production forecasts have been inaccurate and have to be revised through time as the reservoir is
produced. This is the reason why estimates of the volumetric accumulation in petroleum
reservoirs (e.g., the original oil or gas in place) have tended to be updated every few years as the
petroleum resource is produced.
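To illustrate where the subjectivity enters, the classic Arps hyperbolic decline at the heart of DCA can be sketched as below. The initial rate, decline rate, and b-exponent are illustrative picks only, not values from this thesis; a different subjective "slope pick" (different di and b) yields a materially different long-term forecast.

```python
import numpy as np

# Classic Arps decline curve used in DCA. qi = initial rate, di = initial
# decline (1/yr), b = hyperbolic exponent. Parameter values below are
# illustrative picks, not taken from the thesis.
def arps_rate(t_years, qi, di, b):
    if b == 0.0:                            # exponential special case
        return qi * np.exp(-di * t_years)
    return qi / (1.0 + b * di * t_years) ** (1.0 / b)

t = np.arange(0.0, 5.0, 1.0 / 12.0)         # monthly steps over 5 years
q = arps_rate(t, qi=1000.0, di=0.8, b=0.9)  # declining rate forecast

# The subjective "slope pick" lives in di and b: a slightly different pick
# changes the long-term cumulative materially.
cum = float(np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(t)) * 365.0)
print(f"rate after 5 years: {q[-1]:.0f}, approx. cumulative: {cum:.0f}")
```

Two analysts fitting the same early-time data can defensibly choose different (di, b) pairs and arrive at very different reserves estimates, which is the bias the text describes.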
Numerical simulation is another popular method to forecast production, based on historical data
and physics-based models. Numerical simulation develops a dynamic model using available
static and dynamic data. The dynamic model is based on a static geological model which is built
by upscaling interpreted data from vertical well logs. The flow model is developed using the
well-known engineering fluid flow principles followed by the history matching process. Despite
handling complex physics and chemistry, phase behavior, and approximations to the
heterogeneity of the geological system, numerical methods tend to do even worse than empirical
methods at long-term forecasting (Mohaghegh, 2017).
Numerical reservoir simulators build a model from the bottom-up by combining historical data
measurements (flowing bottom hole pressure, production rates, etc.) and geological
interpretations (static geological model of faults, porosity, permeability, etc.) with functional
relationships (heat and mass transfer, multiphase flow in porous media with fluid constitutive
equations and thermodynamics). In most cases, reservoir simulation modelers assume that field
measurements and geological models are uncertain and the functional relationships are certain
and fundamentally true. To match historical data the geological model is tuned until a good
historical match is achieved. It is then assumed that the model provides a reasonable
representation of the reservoir system and the model is suitable for forecast. The problem is that
the geological model is highly uncertain and the functional relationships that are assumed to be
true are nowhere close to being able to model the entire complexity of nature (Mohaghegh,
2017).
Empirical and numerical methods have proven to be poor at long-term forecasting, yet today they
are still the most widely used methods for forecasting oil production, reserves, and resources.
Governments, operators and investors rely heavily on these forecasts to make major decisions,
and that is why an improvement in the forecast accuracy may be of high practical value. Since oil
and gas plays such a large role with respect to global energy supply and economics, these
decisions not only affect the industry but the entire global economy, the environment, and overall
quality of human life on this planet. To reduce the number of bad decisions made, there is a need
for a more accurate, data-driven way of forecasting long-term reserves and resources.
2.4 Time Series Forecasting and Machine Learning Algorithms
Time series (TS) modeling traces its beginnings to the work of Yule (1927). A time series is a set
of data recorded over a period of time, most commonly taken at regular intervals to ensure
uniformity. Hourly power consumption, monthly product sales or daily oil production are all
examples of a time series. The importance of time series is twofold: firstly, it can be used to
identify patterns or behaviors in the past and secondly it can be used to make predictions about
the future. The latter has gained much popularity because of its vast applications in nearly every
field from forecasting sales to predicting the weather. Time series analysis is important since it
has the ability to identify causal factors, and these factors can then be manipulated to optimize
the future. Today, it is heavily used by investors, statisticians, meteorologists, engineers,
governments, and has countless applications in any field that requires decision making. Due to its
usefulness and high popularity in industry, it has also become a highly active field of research
(Kumura et al., 2013).
Most time series analytics in the past was done manually, but the recent trend of data
digitization and the ease of storage and transmission have made manual processing of the
available raw data an impossible task (De Gooijer and Hyndman, 2006). Today most data
processing is done automatically, making use of modern computing power.
As described in detail by Chen et al. (1997), time series algorithms fall mainly into two classes:
parametric and non-parametric. A summary of these is listed in Table 2.2. Early methods were
mostly parametric approaches with the goal of fitting a statistical model to time series data.
Parametric approaches make certain assumptions and tend to suffer from severe limitations. Due
to this, most models used today are non-parametric which make no assumptions about the
underlying structure of the process (Meir, 2000).
Table 2.2: Parametric and non-parametric algorithms for analyzing time series data.

Parametric:
• Exponential Smoothing
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)

Non-parametric:
• Multivariate Local Polynomial Regression
• Functional Coefficient Autoregressive
• Adaptive Function Coefficient Autoregressive
• Additive Autoregressive
• Artificial Neural Networks (ANN)
• Support Vector Machines (SVM)
• Genetic and Evolutionary Approaches (GA)
• Particle Swarm Optimization (PSO)
• Simulated Annealing (SA)
2.4.1 Parametric Models
Exponential smoothing was first proposed in the late 1950s (Brown, 1959; Holt, 1957; Winters,
1960) and has been the foundation of many successful forecasting methods. Forecasts produced
using exponential smoothing are weighted averages of past observations, with the weights
decreasing exponentially as the observations get older. Exponential smoothing models are based
on a description of the trend and seasonality in the data and can quickly generate fairly reliable
forecasts for a wide variety of time series. The AutoRegressive Integrated Moving Average
(ARIMA) model is the most popular and frequently used stochastic time series model (Box and
Jenkins, 1970). ARIMA models aim to describe the autocorrelations in the data and provide a
complementary approach to exponential smoothing.
Other examples of parametric models include the autoregressive conditional heteroskedasticity
(ARCH) method introduced by Engle (1982), which captures time-varying conditional variance
or volatility. The generalized autoregressive conditional heteroskedasticity (GARCH) model
(Bollerslev, 1986) represents the variance of the error term as a function of its autoregressive
terms, thereby allowing a more parsimonious representation of the time series. Krishnamurthy
and Yin (2002) combined a hidden Markov model with AR models under a Markov regime,
where the AR parameters switch in time according to the realization of a finite-state Markov
chain, for nonlinear time series forecasting. These parametric models tend to be limited when
modeling nonlinear and nonstationary time series because they assume local linearity with
an AR-type structure.
2.4.2 Supervised Machine Learning and Artificial Neural Networks
Most non-parametric time series approaches utilize supervised machine learning. The goal in
supervised machine learning is to learn how the outputs are connected to the inputs. During
supervised learning the algorithm is presented with training data that contains both the inputs and
outputs; the samples are presented multiple times and gradually the machine learns the
patterns and reduces its forecast error. During training, a validation dataset is also monitored to
see how well the algorithm performs on data it has not seen and to detect overfitting
(Chandramouli et al., 2018).
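A minimal sketch of this train/validate workflow, using a synthetic dataset and scikit-learn rather than the thesis data or models:

```python
# Supervised learning with a held-out validation set (illustrative data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))                          # inputs
y = 2 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=500)    # outputs

# Hold out 20% of the samples to monitor performance on unseen data.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))
print("val   R^2:", model.score(X_val, y_val))   # gauges generalization
```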
Out of the non-parametric methods of forecasting, Artificial Neural Networks (ANNs) have
become the most popular (Agrawal and Adhikari, 2013). ANNs are information processing
systems inspired by biological neural networks found in the brains of animals and allow for
complex nonlinear relationships between the input and response variables. ANNs have the ability
to both memorize and reason and are a method in which a computer can learn or train itself to
solve a problem it was not programmed to solve. ANNs are trained by reading a large number of
input patterns and have been applied to forecasting problems in various disciplines including
finance, sociology, medicine, engineering and many others (Cartwright, 2015). To understand
how an ANN works a basic understanding of the workings of a biological neural network is
required.
2.4.2.1 Biological Neural Networks
A very simple explanation of biological neurons is as follows. Typical biological neurons contain
three main parts: a cell body in which the nucleus is contained, dendrites, and an axon. Figure 2.5
depicts two bipolar neurons. Information enters the cell body through the dendrites in the form of
electrical pulses or signals. Depending on the nature of the input signal, the cell body will activate
in either an excitatory or inhibitory way, and will then send an output signal through the axon to
the surrounding neurons. The electrical signal travels from one neuron and activates a train of
signals in another. A synapse is the point where the ends of an axon of one neuron come into
close contact with the dendrites of another, and a neural pathway is a chain of neurons connected
in this way. Neurons are the basic building blocks of neural networks.
Figure 2.5: A schematic diagram of two bipolar neurons. (Mohaghegh, 2000).
Biological neural networks can contain up to 100 billion neurons and are extremely
complex. In a neural network, one neuron is usually connected to thousands of other neurons.
Although individual electrical pulses in a neuron travel much slower than signals in a computer,
the parallel structure of the brain allows for extremely fast processing, hundreds of times faster
than the capabilities of household computers (Mohaghegh, 2000).
2.4.2.2 Basic Structure of ANNs
Like biological neural networks, ANNs are composed of basic processing elements known as
perceptrons or artificial neurons (Mohaghegh, 2000). Each neuron is connected to multiple other
neurons and passes signals along those connections. Each input signal carries an associated
weight that multiplies the signal being transmitted. The neuron applies an activation function to
the weighted sum of all its inputs to determine the output signal, and the output of one neuron
then becomes an input to another. Neurons have multiple inputs but only one output. Figure 2.6
is a schematic of a typical artificial neuron.
Figure 2.6: Schematic diagram of a typical artificial neuron (Mohaghegh, 2000).
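The weighted-sum-plus-activation behavior of a single artificial neuron can be sketched as follows; the sigmoid is one common choice of activation function, and the weights here are arbitrary:

```python
# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through an activation function (here, a sigmoid).
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias   # weighted sum of all inputs
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation -> one output

x = np.array([0.5, -0.2, 0.1])   # input signals from three other neurons
w = np.array([0.4, 0.3, -0.5])   # one weight per incoming connection
print(neuron(x, w, bias=0.1))    # single output signal in (0, 1)
```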
Individual neurons in an ANN are arranged in rows or layers known as a multilayer feed-forward
network (Hyndman and Athanasopoulos, 2017). As depicted in Figure 2.7 the basic feed-forward
ANN has three layers: an input layer, hidden layer and an output layer. The hidden layer enables
the network to model nonlinear and complex functions and is responsible for feature extraction
and provides increased dimensionality. ANNs can have multiple hidden layers, and the more
hidden layers that are added, the more complex and nonlinear the model becomes (Mohaghegh,
2000). The number of neurons in the input layer corresponds to the number of parameters
presented to the network.
Figure 2.7: Structure of a three-layer ANN (Neville et al. 2004).
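A forward pass through a network of this three-layer shape can be sketched as follows; the layer sizes and random weights are illustrative only (a real network would learn the weights during training):

```python
# Forward pass of a three-layer feed-forward network like Figure 2.7:
# input layer -> one hidden layer -> output layer.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 1          # neurons per layer (arbitrary sizes)

W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)             # hidden layer: nonlinear feature extraction
    return W2 @ h + b2                   # output layer: linear combination

y = forward(np.array([0.1, -0.3, 0.7, 0.2]))
print(y.shape)   # one output neuron
```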
2.4.2.3 ANN training methods
Training is an optimization process where the weights of the inputs between neurons along with
the biases are calibrated until a desired output is reached. Once the network is trained it can
predict with greater accuracy (Ahmadi et al., 2015). A widely used training technique is the
backward propagation of errors (backpropagation), in which the difference between the desired
and actual output (the error) is calculated at the output layer and propagated backwards toward
the input layer, making small incremental adjustments to the weights along the way (Cheng et
al., 2012). Backpropagation is an iterative procedure that takes time until the weights have been
calibrated. Other methods of adjusting the weights include genetic algorithms (GA), particle
swarm optimization (PSO), unified particle swarm optimization, and imperialist competitive
algorithms. Their ability to optimize makes them valuable not only for training networks but also
for optimizing an objective function of a field. Figure 2.8 shows how PSO can be used to
train an ANN.
Figure 2.8: ANN training model using PSO (adapted from Panja et al. 2017).
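The core weight-update idea behind backpropagation can be sketched on a toy linear neuron; this is plain gradient descent on a single layer, not the full multi-layer algorithm or the thesis code:

```python
# Sketch of the idea behind error-driven training: compute the error at
# the output, take its derivative with respect to each weight, and make
# small incremental weight adjustments.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w                      # targets generated by known weights

w = np.zeros(2)                     # initial weights
lr = 0.1                            # learning rate (size of each adjustment)
for _ in range(500):                # iterative procedure
    err = X @ w - y                 # difference between actual and desired output
    grad = X.T @ err / len(X)       # derivative of mean squared error w.r.t. w
    w -= lr * grad                  # small incremental adjustment

print(w)   # converges toward true_w
```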
2.4.2.4 Deep Neural Networks
Deep neural networks (networks with many successive layers) were introduced in the early
1990s and, due to their impressive performance, have become one of the most popular machine
learning tools today. With each successive layer in the network, more complex representations
are developed; through this, deep networks largely automate the task of feature engineering,
which in the past was a very time-consuming and laborious process. There are many types of
network layers, with more being introduced every year. The studies in this thesis make use of
three types of layers (Chollet, 2017):
1. Densely connected – this is the simplest type of layer: every node in one layer is fully
connected to every node in the next. The traditional ANN uses only densely connected
layers, as shown in Figure 2.7. The main problem with a network that uses only densely
connected layers is that it cannot capture sequential information in the input data. ANNs
also suffer from the vanishing and exploding gradient problem, which is associated with
the backpropagation algorithm and appears when a network has too many hidden layers.
If the derivatives of the error at the output layer are too large, the gradient will increase
exponentially with each layer and eventually explode; if they are too small, the gradient
will decrease exponentially with each layer and eventually vanish (Pykes, 2020).
2. Recurrent – this type of layer is used for sequence-based data such as time series or logs.
It iterates through a sequence while maintaining a state of memory relative to what has
already been seen. A network that uses recurrent layers is known as a recurrent neural
network (RNN). The two most popular types of recurrent layers are the Long Short-Term
Memory (LSTM) unit and the gated recurrent unit (GRU). Both have similar performance
and are related in that they gate information to prevent the vanishing gradient problem.
The GRU was chosen for this thesis because of its simpler structure and its higher
computational efficiency compared to the LSTM (Chung et al., 2014). RNNs also suffer
from the vanishing and exploding gradient problem.
3. Convolutional – this is a layer that learns spatial hierarchies of patterns. When stacked
together, these layers make a convolutional neural network (CNN), which is able to
capture both local and global patterns. Filters (or kernels) are the building blocks of
CNNs and are used to extract relevant features from the input using the convolution
operation. Because of this, CNNs are mostly used for image identification tasks;
however, they have also found success in processing sequence data, where their
performance is competitive with recurrent layers and usually less computationally
intensive. As with the other types of networks, the CNN suffers from the vanishing and
exploding gradient problem.
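The vanishing and exploding gradient problem noted for all three layer types arises because backpropagation multiplies layer derivatives together; a toy numeric illustration:

```python
# The backpropagated gradient is (roughly) a product of per-layer
# derivatives, so it shrinks or grows exponentially with network depth.
def gradient_through_layers(layer_derivative, n_layers):
    return layer_derivative ** n_layers   # repeated multiplication across layers

print(gradient_through_layers(0.5, 50))   # vanishes toward zero
print(gradient_through_layers(1.5, 50))   # explodes
```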
Deep networks are able to combine many different types of layers, which opens nearly
unlimited potential for experimenting with configurations to find an optimal structure for a
certain type of problem. There is no universally optimal network structure since it depends
largely on the problem at hand. Some rules of thumb exist that can be used as a starting guide;
however, the only way to find the optimal configuration is by manually tuning hyperparameters
and running many experiments. Hyperparameters are parameters whose values are set before
training occurs. Important hyperparameters include the type and number of layers, the number of
nodes in each layer, dropout (which introduces randomness), batch size, and the optimizer.
2.4.2.5 Convolutional Recurrent Neural Network
For the experiments presented in this thesis a convolutional recurrent hybrid (c-RNN) network
was chosen. This is because it is able to combine the speed and ability to process large amounts
of data of a convolutional network (CNN) with the sequence processing ability of a recurrent
network (RNN). The study presented in Chapter 3 of this thesis required forecasting a sequence
4,000 steps long, but standalone RNNs are able to memorize patterns in only a few hundred steps
and would not be able to predict this long a sequence with great accuracy. Convolutional
networks are able to convert long sequences into shorter ones of higher-level features, which
gives a c-RNN hybrid the ability to process sequences with thousands of steps (Chollet, 2017).
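To illustrate why the convolutional front end matters, a strided 1-D convolution can shorten a 4,000-step sequence into roughly 120 higher-level steps that a recurrent layer can then process; this is a sketch of the principle only, not the thesis architecture:

```python
# Strided 1-D convolution as a sequence shortener: each output step
# summarizes a window of the input, so stacking a few such layers turns
# thousands of steps into a length an RNN can memorize.
import numpy as np

seq = np.random.default_rng(0).normal(size=4000)   # a long input sequence

def conv1d_downsample(x, kernel, stride):
    k = len(kernel)
    out = [np.dot(x[i:i + k], kernel) for i in range(0, len(x) - k + 1, stride)]
    return np.array(out)

kernel = np.ones(8) / 8            # simple averaging filter (learned in a real CNN)
short = conv1d_downsample(seq, kernel, stride=8)
short = conv1d_downsample(short, kernel, stride=4)  # stack layers to shorten further

print(len(seq), "->", len(short))  # the RNN now sees ~120 steps, not 4,000
```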
2.4.2.6 Overfitting
Deep neural networks are good at learning complicated relationships between inputs and outputs;
however, if trained too long the networks start to confuse noise with signal and overfit to
the training set, which results in lower generalization capability (Srivastava et al., 2014). This is
especially problematic when the amount of training data is limited, as in the studies presented in
this thesis. A visual of overfitting is shown below in Figure 2.9.
Figure 2.9: Visualization of overfitting (Bhande, 2018).
Most machine learning experiments are run with three sets of data: training, validation, and test.
The model is trained on the training set, the validation set is used to monitor overfitting, and the
test set is run after the network has finished training. The typical approach to limit overfitting is
to stop training at the point where the validation error is at a minimum, as shown in Figure 2.10.
Figure 2.10: A plot showing how training and validation error evolve over the number of
epochs.
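The stop-at-minimum-validation-error rule can be sketched as below; Keras offers this behavior through its EarlyStopping callback, and the function here is an illustrative reimplementation of the rule, not the thesis code:

```python
# Minimal early-stopping logic: stop when the validation error has not
# improved for `patience` consecutive epochs, and report the epoch where
# the validation error was at its minimum.
def early_stop_epoch(val_errors, patience=3):
    best, best_epoch = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch            # roll back to the minimum-error epoch
    return best_epoch

# Validation error falls, then rises as the network starts to overfit:
errors = [1.0, 0.7, 0.5, 0.45, 0.47, 0.55, 0.6, 0.7]
print(early_stop_epoch(errors))   # epoch 3 had the minimum validation error
```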
This approach works well when at least a few hundred samples are available. The studies in
this thesis had a maximum of 74 samples, and running multiple experiments showed that the
point at which the validation error was at a minimum was inconsistent across different samples in
the training data. After numerous experiments we found that the best way to limit
overfitting with the small sample sizes was to maximize the batch size and apply dropout.
Dropout is one of the most effective and commonly used methods to reduce overfitting (Chollet,
2017). The term “dropout” refers to the temporary removal (setting to zero) of a fraction of the
nodes in a network, along with their incoming and outgoing connections, during training. The
units to drop are chosen at random with probability p, known as the dropout rate, which is
usually set between 0.2 and 0.5 (Chollet, 2017). At test time, no units are dropped; instead, layer
output values are scaled down by p. At its core, dropout adds randomness that breaks up patterns
that are not significant (Srivastava et al., 2014). A visualization of dropout is shown in
Figure 2.11.
Figure 2.11: Dropout neural net model. Left: a standard neural net with 2 hidden layers. Right:
an example of a thinned net produced by applying dropout to the network on the left. Crossed
units have been dropped (Srivastava et al., 2014).
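Training-time dropout can be sketched in a few lines; the layer size and dropout rate here are illustrative:

```python
# Dropout during training: each unit's output is zeroed with probability
# p (the dropout rate), injecting randomness that breaks up patterns
# that are not significant.
import numpy as np

def dropout(layer_output, p, rng):
    mask = rng.random(layer_output.shape) >= p   # keep each unit with prob. 1 - p
    return layer_output * mask

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, p=0.3, rng=rng)

print(dropped.mean())   # roughly 0.7 of the units survive
```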
Batch size refers to the number of training examples that are seen before a network update is
made. Batch size comes in three options (Brownlee, 2017):
• Batch mode: batch size is equal to the total number of samples in the training set,
• Mini-batch mode: batch size is smaller than that in batch mode but greater than one, and
• Stochastic mode: batch size is equal to one.
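For a concrete sense of the three modes, the number of weight updates per epoch for a 74-sample training set (the sample size used later in this thesis):

```python
# Updates per epoch for each batch-size mode on a 74-sample training set.
import math

n_samples = 74
for name, batch_size in [("batch", 74), ("mini-batch", 16), ("stochastic", 1)]:
    updates = math.ceil(n_samples / batch_size)   # partial final batch counts too
    print(f"{name:10s} batch_size={batch_size:2d} -> {updates} updates/epoch")
```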
2.5 History of Machine Learning in Production Forecasting and Optimization
Time series forecasting in petroleum engineering is relatively new, with mostly parametric
models having been used. Ayeni and Pilat (1992) used the ARIMA technique to forecast crude oil
reserves in South Louisiana. Ediger and Akar (2007) used ARIMA and S-ARIMA models to
forecast total primary energy demand over time in Turkey.
Machine learning-based, non-parametric forecasting methods have also been used. He et al.
(2001) used ANN forecasting to predict existing and infill oil well production using only
production data. The experiments included two data sets (for existing and infill well prediction)
of 9 wells each. One and a half years of production history was used to train the ANN, and the
authors concluded it had good predictive capacity for short-term (1-2 year) forecasts. Chakra et
al. (2013) applied a higher order neural network (HONN) using limited reservoir data to predict
the production of oil, gas and water. Qiao et al. (2017) used a least squares SVM coupled with
PSO to predict the production of oil and gas. These studies, although showing promising results
in short-term forecasting, have had significant errors in forecasts of two or more years.
Aizenberg et al. (2016) constructed a complex multilayer network with multi-valued neurons
(MLMVN) to forecast oil production 12 years into the future for 14 long-producing wells in an
oil field in Mexico, using their own production history. The forecasts were then compared with
actual production; the results had an average symmetric mean absolute percentage error
(SMAPE) of 17%, with one 12-year forecast having a SMAPE of only 6.14%.
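For reference, SMAPE can be computed as follows; this is one common formulation (definitions vary slightly across the literature, so the exact variant used by Aizenberg et al. may differ):

```python
# Symmetric mean absolute percentage error: absolute error scaled by the
# average magnitude of the actual and forecast values, reported in %.
import numpy as np

def smape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs(forecast - actual) /
                           ((np.abs(actual) + np.abs(forecast)) / 2.0))

print(smape([100, 200, 300], [110, 190, 330]))
```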
In terms of optimization, algorithms such as GA or PSO have mostly been used as
complementary tools to speed up the training process in ANNs and SVMs (Ahmadi et al., 2015).
However, there have been numerous studies that employed these algorithms for optimizing field
development and operations. GAs were used to maximize NPV by optimizing the schedule and
location of horizontal wells with fixed orientations (Abukhamsin, 2009); GAs were also
employed in large-scale field development optimization involving the placement of several
hundred wells (Tanaka et al., 2018). PSO has been applied to geophysical inverse problems
(Fernández-Martínez et al., 2010) as well as to determining optimum well locations and types
(Onwunalu and Durlofsky, 2010).
2.6 What is Missing in Literature?
There are many models that have been built to forecast production from horizontal oil and gas
wells, many of which employ new and complex machine learning models. The overall trend in
these studies is that they do not sufficiently cover the broad spectrum of inputs that affect
production, especially in the case of hydraulically fractured reservoirs. Many studies on
hydraulically fractured reservoirs train a network to link production to completion parameters
and top-hole x,y coordinates; however, they miss the fundamental fact that geology is
heterogeneous, and the proper inputs, the entire geological profile surrounding a 2 km long
horizontal wellbore, need to be taken into account. Also, horizontal wells are completed in
multiple stages (sometimes up to 100), and many studies take the average completion parameters
over the entire wellbore. Taking the average is a huge oversimplification because each stage is
unique. Some stages are completed quickly, some take longer. Some stages only place partial
amounts of the proppant or fail to break the rock at all. The stages are also spaced far apart (60
meters or more), which means each stage has a unique surrounding geology, leading to every
stage having an extremely different fracture network. Furthermore, many studies do not account
for the distance between producing wells and the interaction effects that occur when wells share
a common drainage area.
To the author’s knowledge there are currently no studies that have attempted to link the
completion and geological parameters at a stage level (instead of averaging to a well level) along
with well spacing information to the production performance of horizontal wells. The research
documented in this thesis attempts to fill this gap.
Chapter 3: A New Neural Network Procedure to Generate Highly Accurate Synthetic Shear
Sonic Logs in Unconventional Reservoirs
3.1 Preface
This chapter has been published as SPE paper 201453-MS after being presented at the SPE
Annual Technical Conference and Exhibition, October 26-29, 2020.
https://doi.org/10.2118/201453-MS. This manuscript is co-authored by Ian D. Gates.
3.2 Abstract
Shear sonic travel time (DTS), along with compressional sonic travel time and bulk density, is
required to estimate rock mechanical properties, which play an important role in fracture
propagation and the success of hydraulic fracture treatments in horizontal wells. DTS logs are
often missing from the log suite due to their cost and processing time. The following study
presents a machine learning procedure capable of generating highly accurate synthetic DTS
curves. A hybrid convolutional-recurrent neural network (c-RNN) was chosen in the
development of this procedure as it can learn from sequential data, which a traditional neural
network (ANN) cannot. The accuracy of the c-RNN was superior when compared to that of the
ANN, simple baselines and empirical correlations. This procedure is a cost-effective and fast
alternative to running DTS logs and, with further development, has the potential to be used for
predicting production performance from unconventional reservoirs.
3.3 Introduction
Mechanical rock properties such as Young's modulus and Poisson's ratio are important for
understanding the way fractures propagate within reservoir rocks during hydraulic fracturing.
Furthermore, the mechanical properties of the rocks in an unconventional reservoir are
anisotropic and heterogeneous, which makes the production performance of a horizontal well
highly dependent upon its location and orientation within the reservoir. It remains unclear how
fractures propagate within these tight reservoirs, especially in the context of the state of stress,
but a better understanding of the mechanical properties can yield insights that ultimately enable
prediction of the production performance of each stage and, cumulatively, of the well.
In typical practice, measuring mechanical properties is done through direct and indirect methods
(Maleki et al., 2014). Direct methods involve taking core samples from the reservoirs and
performing laboratory tests, for example, triaxial stress tests, to understand the stress-strain
behavior of the rock sample as well as the stress at which the rock fails. Due to the high costs
and difficulty of rock sample extraction, typically only a few core samples are retrieved from the
formation of interest. Also, laboratory tests are expensive and time consuming (Maleki et al.,
2014). Even when obtained, core data only provides point-value (discrete) information on the
formation of interest – the fraction of the reservoir sampled is typically under 1%. Discrete data
is not very useful when describing the entire formation since the mechanical properties of rock
vary greatly with depth and the loading conditions of the surrounding formations which can vary
areally due to variable thickness and densities of the formations and unconformities. A
continuous mechanical property measurement along the thickness of the formation is desired.
Fortunately, mechanical rock properties can be derived indirectly from a combination of three
sets of log measurements, specifically: bulk density (RHOB), compressional slowness (DTP),
and shear slowness (DTS) (Fjar et al., 2008). The dynamic Young's modulus and dynamic
Poisson's ratio can be derived from the compressional and shear wave velocities (which are the
inverses of the DTP and DTS log measurements) along with the bulk density using the following
formulas:
\[
E_{MDYN} = \frac{\rho V_s^{2}\,\left(3V_p^{2} - 4V_s^{2}\right)}{V_p^{2} - V_s^{2}}
\]

\[
\nu_{DYN} = \frac{V_p^{2} - 2V_s^{2}}{2\left(V_p^{2} - V_s^{2}\right)}
\]
where EMDYN is the dynamic Young's modulus (Pa), νDYN is the dynamic Poisson's ratio, Vp
is the compressional wave velocity (m/s), Vs is the shear wave velocity (m/s), and ρ is the bulk
density (kg/m3).
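A direct implementation of the two formulas above; the input values are illustrative only (in practice the log slownesses would first be inverted to velocities):

```python
# Dynamic Young's modulus (Pa) and Poisson's ratio from velocities and density.
def dynamic_moduli(rho, vp, vs):
    e_dyn = rho * vs**2 * (3 * vp**2 - 4 * vs**2) / (vp**2 - vs**2)
    nu_dyn = (vp**2 - 2 * vs**2) / (2 * (vp**2 - vs**2))
    return e_dyn, nu_dyn

# Illustrative values: rho in kg/m3, velocities in m/s.
E, nu = dynamic_moduli(rho=2650.0, vp=4500.0, vs=2700.0)
print(E / 1e9, nu)   # E reported in GPa
```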
The problem is that shear sonic data is often missing from a well log suite due to its high cost
and length of time to acquire. Without log or core data, the only way to estimate the shear sonic
value is either through empirical correlations or statistical methods (Hadi and Nygaard, 2018).
Most empirical correlations attempt to express the relationship between the DTP and DTS
measurements obtained from the wells with DTS logs. Empirical correlations were the main way
to estimate DTS for many years with the more familiar works from Caroll et al. (1969), Castagna
et al. (1985), and Han et al. (1986). The difficulty is that although DTS is highly correlated to
DTP, multiple variables influence the DTS including pore pressure, fluid saturation, clay and
shale content, stress profiles and other factors (Barre, 2009). Empirical correlations are also
highly dependent upon the location and size of the study area, and result in poor estimates if
taken outside that area.
Figure 3.1: Plot of shear slowness (DTS) versus compressional slowness (DTP) measurements
for 14 wells in the Montney Formation.
The plot reveals the highly nonlinear, scattered nature of the relationship, which makes
predicting DTS difficult. Since it is impossible to know all of the causal factors contributing to
DTS, a better approach is to apply statistical methods. Machine learning algorithms are better
able than simple regression models to integrate subtleties between data sets; they work by
learning patterns between input and output variables that are too complex and subtle for
ordinary statistics to capture. There have been multiple studies
applying various machine learning techniques to DTS prediction. Eskandari et al. (2004) used
artificial neural networks (ANN) and showed that they performed better than empirical
correlations. Rajabi et al. (2010) used genetic algorithms (GA) and fuzzy logic (FL) which were
able to capture the general shape of the log but did not perform well on localized variations.
Maleki et al. (2014) used both support vector regression (SVR) and ANN and found that,
although SVR was the better predictor, its accuracy was limited and it was not a good
replacement for the true log. Al-Anazi and Gates (2015) used support vector regression (SVR) to predict
Poisson’s ratio and Young’s modulus with a fuzzy-based ranking algorithm to select the most
significant input variables and filter out dependency. Their results demonstrated that the learning
and predictive capabilities of SVR was similar or superior to that of a backpropagation neural
network depending on the control parameters of the SVR. There is no single algorithm that is
superior to others and several types should be used in a particular study; however, ANNs have
recently become the popular choice (Parapuram et al. 2018; Hadi and Nygaard, 2018). Recurrent
neural networks (RNNs) are of particular interest as they are useful for sequence data such as
logs as they retain a memory of previous patterns. RNNs have shown to overperform ANNs
(Zhang et al. 2018). Studies to date have been able to capture the general trend of the
relationship; however, they do not adequately capture smaller localized variations. These
localized variations are important since they can play a major role in the mechanical properties
which in turn can affect the way fractures propagate.
In the following, we present a procedure that uses a hybrid variation of the RNN to generate
synthetic DTS curves from DTP and RHOB logs and well deviation surveys. The procedure is
applied to a set of wells in the Montney Formation in Alberta, Canada.
3.4 Study Area and Data Processing
3.4.1 Study Area and Structure Model
The study area is the same for all research chapters; it is described in Chapter 1 of this thesis and
shown in Figures 1.2 and 1.3. A three-dimensional (3D) structural model of the study area was
constructed to get a better visual understanding of the reservoir and to serve as a placeholder for
data. Figure 3.2 shows the model along with the Montney and Belloy tops.
Figure 3.2: A 3D visualization of all wells in the study area. Red = horizontal, blue = deviated,
and black = vertical.
The geological model was constructed in the Petrel geological modelling package
(Schlumberger, 2018) using well information, deviation surveys, formation tops as well as logs
from public data sources. Public data requires quality checks which are performed on deviation
surveys, surface coordinates, surface elevation and well tops. It was found that most publicly
reported well top depths needed correction. This was done by using the gamma ray (GR) and
resistivity logs. The Montney formation top surface was generated by using ~400 well tops. Due
to the high number and density of penetrations, the Montney surface is believed to have
relatively low uncertainty. The Belloy Formation on the other hand only had 60 penetrations.
The Montney has a much lower variation in thickness as compared to the variation in true
vertical depth, so a thickness map should have more certainty than a Belloy surface generated
using only 60 penetrations. The thickness map was generated using Montney thickness data from
44
the 60 wells that penetrate the entire formation and the Belloy surface was generated by adding
the thickness map to the Montney surface map. In this way, the uncertainty of the Belloy surface
is low. The 3D map of the generated Montney and Belloy tops is shown in Figure 3.3.
Figure 3.3: 3D generated Montney and Belloy surfaces in Petrel. The vertical axis is
exaggerated 25x compared to the X and Y scales and represents subsea depth.
Having a 3D model not only helps with visualization of the study area but also provides a means
of estimating the reservoir thickness for the 85% of wells that do not penetrate the entire
formation. The Montney Formation has a dip and varies in its true vertical depth, so a geological
layer or feature at location x1, y1 and a given depth z would not necessarily be present at the
same depth z at location x2, y2. The same layer is likely, however, to be present at the same
relative position within the formation thickness, so this type of measurement is a better baseline
for comparison than true vertical depth.
3.4.2 Data Preparation
Estimating rock mechanical properties requires three sets of logs: RHOB, DTP, and DTS. The
majority of the 180 vertical and deviated wells have RHOB and DTP logs, but only 14 (8% of
the wells) have all three sets of logs covering >90% of the reservoir thickness. Figure 3.4 shows
all the vertical and deviated wells without DTS marked by black points and the wells with DTS
marked by a label. If accurate synthetic DTS logs could be generated for the remaining 166
vertical and deviated wells with no DTS, the certainty of a rock mechanical model could
potentially increase significantly.
Figure 3.4: Areal view of the study area generated in Petrel. Black points with no labels are wells
with no DTS logs, labeled wells have the DTS logs.
The following data was extracted for the 14 wells in the Montney Formation: deviation survey,
RHOB, DTP, and DTS logs. Initially, GR logs were also extracted; however, preliminary testing
showed that GR did not improve model accuracy and in some cases even worsened
performance. Even after the GR log was converted to a volume of shale (Vsh) log, it was still
found to be unhelpful, since the GR response for similar layers of shale varied greatly from well
to well.
Log measurements for each well have a specific resolution or measurement frequency that
ranges from 1 to 6 inches (2.5-15 cm). This caused some wells to have 6 times more
measurements than others for an interval of equal thickness. This is a problem since all wells
need to carry the same weight to have an unbiased analysis. To combat this discrepancy, all logs
were converted to a resolution of 4,000 steps per 100% reservoir thickness, which is 0.025% per
step. The 4,000 step scale was chosen since it represented the average number of measurements
in the well logs and did not lower the resolution for the majority of the wells. Wells that did not
penetrate the entire Montney Formation had an estimated formation thickness which was
calculated using the top and bottom surfaces. As a result, every well had 4,000 measurements of
RHOB, DTP, and DTS from top to bottom. Each step also contained x and y coordinates
(constant for vertical wells). Deviated well coordinates for every step were calculated from the
deviation survey. Finally, the data was checked for quality and consistency; sections of the
formation with missing values in any one log were removed. Although infrequent, measurement
anomalies (such as a sudden 40% spike in density) were also removed. This cleaning of the data
reduced the total number of points per well; however, the step size of 0.025% thickness remained
the same, keeping the experiment unbiased towards any particular well. Table 3.1 summarizes
the number of data points for each of the 14 wells.
Table 3.1: Total number of data points in each well used for the study.
Well # of data points % thickness coverage
VT_02 3,913 98%
VT_10 3,543 89%
VT_18 3,999 100%
VT_26 3,574 89%
VT_29 3,999 100%
VT_42 3,520 88%
VT_49 3,988 100%
VT_69 3,845 96%
VT_108 3,904 98%
VT_132 4,000 100%
VT_137 3,946 99%
DV_16 3,952 99%
DV_19 3,372 84%
DV_28 3,999 100%
Total 53,554 96%
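The resampling step described in Section 3.4.2 — converting every log onto a fixed grid of 4,000 points spanning 0–100% of formation thickness — can be sketched as follows. The function name and the use of linear interpolation are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def resample_log(depths, values, top, bottom, n_steps=4000):
    """Resample a well log onto a fixed grid of n_steps points spanning
    0-100% of formation thickness (0.025% per step for n_steps=4000)."""
    # Fractional position of each measurement within the formation.
    frac = (np.asarray(depths, dtype=float) - top) / (bottom - top)
    grid = np.linspace(0.0, 1.0, n_steps)
    # Linear interpolation onto the uniform fractional-thickness grid.
    return np.interp(grid, frac, np.asarray(values, dtype=float))
```

Logs sampled at 1-inch and at 6-inch spacing then yield the same 4,000 values per well, so every well carries equal weight in the analysis.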
3.5 Experimental Setup and Types of Neural Network Algorithms
3.5.1 Experimental Setup
The synthetic DTS curve was generated by using the following five inputs: x coordinate, y
coordinate, depth measured in true vertical depth subsea (TVD subsea), RHOB and DTP for
every available point along the Montney thickness. Since there are only 14 wells in the data set,
the only suitable method to determine accuracy is the “leave-one-out” cross-validation method.
In this method, 13 wells are used as training data with one well held out for validation (the
blind well); the experiment is then run a total of 14 times (once for each blind well) and the
results of all 14 experiments are examined. This “blind” well is not truly blind as its performance can be
measured. However, it is important that an individual blind well does not change the way the
tests are performed. This means the network structure and training procedure must be fixed
throughout the evaluation. The network can be tuned after all 14 experiments are completed to
see if an improvement is made. As the network makes 4,000-point sequence predictions, multiple
metrics are required to understand the accuracy of the prediction. For this study we compare four
types of accuracy metrics:
1. MAPE – Mean Absolute Percentage Error,
2. MaxPE – Maximum Percentage Error,
3. percentage of points with >3% error, and
4. percentage of points with >5% error.
MAPE is an indicator of the network performance on all points in the sequence. It does not
reveal how much of the sequence has poor accuracy, or what the maximum error is. MaxPE
shows the maximum error in the entire predicted sequence, but does not give insight into the
average performance (Zhang et al. 2015). The amount of points greater than (3 or 5%) is a metric
between MAPE and MaxPE and gives insight into what fraction of the predicted sequence is
reliable. These metrics although helpful, are not very intuitive and the best way to compare
algorithm performance is by plotting the entire synthetic log generated by the algorithm overtop
the measured log. Visually comparing synthetic logs generated by different algorithms can more
49
easily point to how well each algorithm predicts small variations and which log sections are
predicted better than others.
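The four metrics can be computed directly from a predicted sequence and its measured counterpart; the sketch below (function name illustrative) returns all four at once.

```python
import numpy as np

def sequence_error_metrics(measured, predicted):
    """Compute the four accuracy metrics used to judge a 4,000-point
    synthetic DTS prediction against the measured log."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_err = np.abs(predicted - measured) / np.abs(measured) * 100.0
    return {
        "MAPE": pct_err.mean(),                      # average error over the sequence
        "MaxPE": pct_err.max(),                      # worst single-point error
        "pct_gt_3": (pct_err > 3.0).mean() * 100.0,  # share of points with >3% error
        "pct_gt_5": (pct_err > 5.0).mean() * 100.0,  # share of points with >5% error
    }
```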
Consistency is also important: if network performance differs too much between repeated runs of
the same blind well experiment, the network cannot be trusted on true blind wells. To measure
consistency, each of the 14 blind well experiments was reset and trained three times without
modifying any network parameters or training procedures. Metrics including the maximum
minus minimum error and the standard deviation of the errors of the three runs were recorded.
Running multiple experiments on each blind well also averages out unusually good or bad
results.
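The consistency metrics reported later in Tables 3.3e and 3.3f reduce to two one-liners; a minimal sketch:

```python
import numpy as np

def run_consistency(errors_per_run):
    """Spread of a metric (e.g. MAPE) across repeated runs of the same
    blind well: max-minus-min range and standard deviation."""
    e = np.asarray(errors_per_run, dtype=float)
    return {"range": e.max() - e.min(), "std": e.std()}
```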
3.5.2 Comparisons
To test if an algorithm has statistical significance, it must be compared to a baseline such as:
1. Taking the average DTS/DTP ratio of training wells and applying it to the blind well, or
2. Taking the average DTS minus DTP value of training wells and applying it to the blind
well.
The networks are also compared to the DTS estimates of two popular empirical correlations:
1. Castagna et al. (1985) for shales:
Vs (km/s) = 0.77 Vp – 0.8674
2. Han et al. (1986) for shaly sandstones:
Vs (km/s) = 0.85 Vp – 1.14
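As an illustration, the ratio baseline and the Castagna et al. (1985) correlation might be applied to slowness logs as sketched below. The unit conversion (velocity in km/s = 304.8 / slowness in µs/ft) and the function names are assumptions of this sketch; the thesis does not spell out its unit handling.

```python
import numpy as np

US_PER_FT_TO_KM_S = 304.8  # v (km/s) = 304.8 / slowness (us/ft)

def ratio_baseline(train_dtp, train_dts, blind_dtp):
    """Baseline 1: average DTS/DTP ratio of the training wells applied
    to the blind well's DTP log."""
    ratio = np.mean(np.concatenate(train_dts) / np.concatenate(train_dtp))
    return ratio * np.asarray(blind_dtp, dtype=float)

def castagna_dts(dtp):
    """Castagna et al. (1985) shale correlation Vs = 0.77 Vp - 0.8674
    (km/s), applied to a slowness log in us/ft and returned as slowness."""
    vp = US_PER_FT_TO_KM_S / np.asarray(dtp, dtype=float)
    vs = 0.77 * vp - 0.8674
    return US_PER_FT_TO_KM_S / vs
```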
3.5.3 Networks used in the experiments
Two types of networks were chosen to run experiments: a simple feedforward network (ANN)
and c-RNN. The networks were programmed in Python using the keras library (Chollet, 2015).
The ANN was chosen as it is the simplest type of network and a good baseline to compare to the
more complicated c-RNN. The c-RNN hybrid network was chosen because it combines the
speed and ability to process large amounts of data of a convolutional network with the sequence
processing ability of a recurrent network. Traditional RNNs are only able to hold memory for a
few hundred steps, and since each of the wells has 4,000 steps in the input sequence, a standalone
RNN would not be able to predict with great accuracy. Convolutional networks are able to
convert long sequences into shorter ones of higher-level features, which gives a c-RNN hybrid
the ability to process sequences with thousands of steps (Chollet, 2017). For both the ANN and
the c-RNN hybrid the loss function was chosen to be the mean absolute error (MAE); this loss
function was found through trial and error and was chosen because it trained the network faster
than other loss functions such as the root mean square error (RMSE) or the MAPE. The
architecture of the networks is shown in Figure 3.5, and Figure 3.6 is a visual representation of
the input and output data format of the c-RNN.
Figure 3.5: Architecture of the ANN (A) and c-RNN (B) used in the study.
Figure 3.6: Format of input and output data of the c-RNN.
3.5.4 Overfitting
Deep neural networks are good at learning complicated relationships between inputs and outputs;
however, if trained too long the networks start to confuse noise with signal and overfit to the
training set, which results in lower generalization capability (Srivastava et al. 2014). This is
especially problematic when the amount of training data is limited, as is the case in this study.
Since the aim of this study is to generate synthetic DTS for wells with no actual DTS there
should be a way to determine the stopping point of the training without looking at blind well
performance. Most machine learning experiments are run with three sets of data: training,
validation and test. The model is trained on the training set, the validation set is used to monitor
overfitting, and the test set is run after the network has finished training. The typical approach is
to stop the training at the point where validation error is at a minimum. This approach is good
when there are at least a few hundred samples available; this study only has 14 wells in total.
Running multiple experiments showed that the point at which the validation error was at a
minimum was inconsistent between different blind wells and in repeat trials of the same blind
well; this implied that a way to limit overfitting was needed. To limit overfitting, we maximized
the batch size and applied dropout.
Figure 3.7 plots how the validation error evolves with different batch sizes (as a percent of the
maximum batch size) for the VT_42 blind well test. The plot reveals that fluctuations in blind
error become less aggressive and overfitting becomes less of an issue with larger batch sizes. For
these reasons it was decided to run all experiments in full batch mode.
Figure 3.7: Effect of batch size on the evolution of the validation error for blind well VT_42.
3.5.5 Stopping procedure
3.5.5.1 ANN
The ANN is meant to serve as a baseline to compare the c-RNN to. If the c-RNN performs
better than the best possible ANN results, then the complexity and longer run time of the c-RNN
are justifiable. To reach the best possible result of the ANN, a cheat run was performed for every
blind well. In the cheat run, the blind well error was observed during training and the network
was forced to stop when the blind error value was at its lowest. The point at which the blind error
is minimum was unique for every blind well and the only way to find it was to observe the blind
error rate at every epoch. Each blind well test was reset and run multiple times and the run with
the lowest blind error was used to generate a synthetic DTS log. This type of procedure is only
meant to test the maximum potential of the network and is not applicable when generating
synthetic DTS for wells with no DTS since the error on these wells cannot be observed.
3.5.5.2 c-RNN
Figure 3.8: Training and validation error for two blind well tests.
Figure 3.8 shows how the training and validation error evolves over 1,000 training epochs for
two of the blind well experiments. “Tremors” in both the training set and blind well error start to
occur at fairly regular intervals after ~400 epochs and the error rate seems to stabilize at a very
low constant error between them. These tremors were found to be universal across all wells, as
can be seen in Figure 3.9.
Figure 3.9: Validation error for all blind well tests, results are split into two graphs for clarity.
A long run was performed on one of the wells to see if this trend continued. Figure 3.10 shows
the training and test error over 20,000 iterations.
Figure 3.10: 20,000 iterations of the blind well DV_28 experiment.
Overfitting appeared to have stopped, and the blind well error is more or less constant between
tremors. After about 2,000 iterations the tremors in the test data become very large but still
converge back to a low and stable error rate. The tremors in the blind well point to a possibility for
improvement in accuracy if a way to stop the algorithm at the low point of the blind tremor is
found.
Since the validation error does not exist when generating synthetic DTS curves for wells with no
DTS, one can only use the training error to determine when to stop the network. From the figures
it is seen that the tremors in both the training set and blind well occur nearly at the same time
(validation tremor starts a bit earlier and ends a bit later than the training tremor). Figure 3.11
presents a zoom into one of the tremors.
Figure 3.11: A zoomed in plot showing the training and blind well error tremor.
Since the training and validation tremors were found to coincide with one another, a simple
empirical procedure was developed to determine the stopping criteria for the c-RNN:
A. Run the network for 1,000 epochs (by which point the network has stopped improving).
B. If the training error is stable and far enough away from the previous tremor, stop training.
C. If the training error is experiencing a tremor or is very close to the previous tremor,
continue running the network for 50-100 more epochs (but before the next tremor occurs)
until the condition in step B is reached, then stop training.
D. If another tremor occurs, repeat step C until the condition in step B is reached.
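Steps A–D can be turned into a simple numeric check on the training-error history. The window length and the spike threshold below are illustrative assumptions; the thesis identifies tremors by inspection rather than by a fixed rule.

```python
import numpy as np

def should_stop(train_losses, min_epochs=1000, window=50, spike_factor=1.5):
    """Heuristic version of steps A-D: stop once past min_epochs and the
    recent training error shows no tremor (no spike well above the
    rolling baseline) within the last `window` epochs."""
    losses = np.asarray(train_losses, dtype=float)
    if len(losses) < min_epochs:
        return False  # step A: always run the initial 1,000 epochs
    recent = losses[-window:]
    baseline = np.median(losses[-5 * window:])
    # Steps B/C: a tremor shows up as points far above the stable baseline.
    in_tremor = np.any(recent > spike_factor * baseline)
    return not in_tremor
```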
Finally, to see how close this procedure comes to the maximum potential of this network, a cheat
run was performed for every blind well. As in the ANN cheat run, the blind well error was
observed during training and the network was stopped at the lowest blind value.
3.5.6 Summary of Procedure
Table 3.2 presents the architecture and hyperparameters of the c-RNN network used in this
study. Figure 3.12 shows the procedure required to generate a trained version of the synthetic
DTS tool presented in this study.
Table 3.2: Starting hyperparameters for the c-RNN developed in this study.
Layer                 Input   Conv   Conv   Max Pooling   Conv   GRU    GRU    Output
Number of nodes       5       32     32     -             32     100    100    1
Activation function   -       ReLU   ReLU   -             ReLU   ReLU   ReLU   -
Recurrent dropout     -       -      -      -             -      0.5    0.5    -
(ReLU = rectified linear unit)
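Following Table 3.2, the c-RNN can be sketched in Keras. The kernel sizes, pooling factor, optimizer, and the UpSampling1D step (to return one DTS value per input step after pooling) are assumptions of this sketch — the thesis does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_crnn(seq_len=4000, n_features=5):
    """Sketch of the Table 3.2 c-RNN: three Conv1D layers with max
    pooling feed two GRU layers; a Dense head emits one DTS value per
    step. Kernel sizes, pool size, and upsampling are assumed."""
    return tf.keras.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.GRU(100, return_sequences=True, recurrent_dropout=0.5),
        layers.GRU(100, return_sequences=True, recurrent_dropout=0.5),
        layers.UpSampling1D(2),                   # restore the 4,000 steps
        layers.TimeDistributed(layers.Dense(1)),  # one DTS value per step
    ])

model = build_crnn()
model.compile(optimizer="adam", loss="mae")  # MAE loss, as in the thesis
```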
Figure 3.12: General procedure for training the synthetic DTS tool that can be applied to any
formation.
1. Generate top and bottom surfaces for the formation of interest.
2. Load all pertinent data into Petrel (well tops, logs and deviation surveys).
3. Extract RHOB, DTP and DTS logs as well as deviation surveys in the formation of interest
for all wells with DTS logs.
4. Convert well logs to a set number of points per 100% formation thickness.
5. Select 1 DTS well as the validation set and the remaining wells as the training set.
6. Train the c-RNN network on the training set using the recommended starting architecture
and hyperparameters listed in Table 3.2.
7. Retrain the network with the previous hyperparameters on different train/validation sets to
ensure the network generalizes to other wells (if generalization is poor, re-tune the
hyperparameters).
8. Save the network hyperparameters.
9. Run the network for 10,000 epochs on several validation wells to confirm overfitting has
been removed.
10. Tune the network hyperparameters until the best validation accuracy is achieved.
11. Take note of the epoch at which the validation error is stable for most validation wells;
this is the stopping point (a stopping point of 1,000 epochs was chosen for this study).
Figure 3.13 shows how to use the trained tool to generate synthetic DTS curves. The training and
generating procedures are not limited to the Montney and can be applied to any formation of
interest.
Figure 3.13: Process for using the training tool to generate synthetic DTS curves.
3.6 Results and Discussion
Tables 3.3a to 3.3f summarize the results of the leave-one-out experiments. Tables 3.3a to 3.3d
compare the four different metrics for the two empirical correlations, the two simple baselines,
the best possible results achieved with the ANN cheat run, the average of the three c-RNN runs
along with the best possible result achieved with the c-RNN cheat run. Tables 3.3e and 3.3f show
the consistency of the three c-RNN runs. These metrics are the difference between maximum and
minimum error values and the standard deviation of the three runs for each blind well. The
MAPE and MaxPE (Tables 3.3a and 3.3b) are not the best metrics as they do not describe the
amount of local errors, but they do provide more information about the results. The more
revealing metrics are the fraction of points with >3% error (Table 3.3c) and the visual comparison.
Table 3.3: Results of the study comparing the performance of the various methods of generating
synthetic DTS curves.
Table 3.3a - MAPE
Well    Castagna et al.    Han et al.    DTS/DTP    DTS-DTP    ANN Cheat    Avg of 3 c-RNN Runs    c-RNN Cheat
VT_02 2.80% 2.19% 1.24% 1.03% 0.91% 1.00% 0.83%
VT_10 2.31% 3.17% 1.61% 1.53% 1.88% 1.76% 1.27%
VT_18 1.79% 3.47% 1.35% 0.93% 0.62% 0.66% 0.50%
VT_26 3.39% 2.55% 1.80% 2.29% 1.12% 1.02% 0.83%
VT_29 3.80% 1.89% 1.87% 2.15% 1.16% 1.27% 1.09%
VT_42 7.05% 3.27% 4.07% 3.11% 1.17% 1.02% 0.85%
VT_49 1.53% 3.95% 1.66% 1.01% 1.81% 1.40% 0.66%
VT_69 6.41% 2.84% 3.29% 2.11% 1.28% 1.20% 0.91%
VT_108 2.05% 3.01% 1.43% 1.31% 0.92% 0.99% 0.82%
VT_132 2.73% 2.73% 1.36% 1.37% 1.01% 0.95% 0.75%
VT_137 4.59% 2.93% 2.17% 1.37% 1.26% 1.99% 1.31%
DV_16 1.60% 4.11% 2.17% 1.75% 0.74% 1.01% 0.75%
DV_19 2.57% 4.92% 2.95% 2.32% 1.59% 1.27% 1.18%
DV_28 2.67% 3.12% 1.78% 2.15% 1.62% 1.52% 0.85%
Total 3.21% 3.14% 2.03% 1.73% 1.23% 1.22% 0.89%
Table 3.3b - MaxPE
Well    Castagna et al.    Han et al.    DTS/DTP    DTS-DTP    ANN Cheat    Avg of 3 c-RNN Runs    c-RNN Cheat
VT_02 8.37% 9.50% 4.96% 4.06% 4.36% 4.18% 3.52%
VT_10 10.44% 14.00% 12.90% 13.32% 14.63% 14.00% 12.98%
VT_18 6.85% 10.93% 5.87% 5.71% 4.72% 4.44% 3.82%
VT_26 14.76% 11.14% 8.45% 6.15% 5.09% 5.32% 5.52%
VT_29 11.42% 7.70% 7.03% 6.59% 5.11% 4.87% 4.12%
VT_42 16.84% 13.59% 9.34% 7.61% 5.66% 6.07% 5.18%
VT_49 6.02% 10.54% 6.31% 5.50% 4.79% 7.18% 4.32%
VT_69 15.84% 12.57% 8.54% 6.31% 3.87% 4.43% 3.73%
VT_108 6.79% 11.06% 7.10% 5.93% 3.97% 4.03% 3.77%
VT_132 12.15% 11.89% 6.65% 8.88% 6.07% 7.08% 6.75%
VT_137 14.48% 11.36% 7.13% 7.47% 6.15% 5.98% 6.74%
DV_16 8.73% 13.20% 6.73% 5.88% 5.71% 5.05% 5.51%
DV_19 10.59% 14.82% 10.53% 7.53% 7.32% 6.32% 5.67%
DV_28 11.49% 14.23% 8.64% 9.65% 10.07% 10.36% 11.32%
Total 16.84% 14.82% 12.90% 13.32% 14.63% 14.00% 12.98%
Table 3.3c – Percentage of points above 3% error
Well    Castagna et al.    Han et al.    DTS/DTP    DTS-DTP    ANN Cheat    Avg of 3 c-RNN Runs    c-RNN Cheat
VT_02 41.53% 27.17% 5.39% 2.12% 1.97% 1.92% 0.43%
VT_10 32.30% 50.65% 10.33% 9.37% 19.00% 15.50% 8.58%
VT_18 14.60% 53.84% 4.25% 1.18% 0.15% 0.18% 0.13%
VT_26 47.01% 34.42% 18.33% 25.63% 3.19% 2.63% 1.15%
VT_29 61.24% 19.55% 19.08% 25.81% 4.80% 7.87% 4.20%
VT_42 94.29% 42.67% 69.03% 56.65% 3.55% 4.01% 1.99%
VT_49 13.59% 65.92% 11.69% 1.71% 10.88% 2.96% 0.68%
VT_69 92.09% 32.35% 49.67% 18.99% 3.07% 3.34% 0.60%
VT_108 23.95% 44.54% 8.97% 5.84% 2.38% 1.71% 0.54%
VT_132 35.08% 43.78% 6.70% 7.08% 0.78% 0.97% 1.20%
VT_137 62.19% 38.55% 28.08% 5.22% 3.57% 24.84% 5.25%
DV_16 13.36% 68.09% 21.18% 16.37% 0.89% 3.13% 1.01%
DV_19 35.29% 67.53% 42.41% 28.32% 13.14% 5.13% 5.46%
DV_28 31.48% 48.66% 13.05% 17.75% 8.35% 5.50% 2.10%
Total 42.30% 45.42% 21.45% 15.37% 5.26% 5.66% 2.31%
Table 3.3d – Percentage of points above 5% error
Well    Castagna et al.    Han et al.    DTS/DTP    DTS-DTP    ANN Cheat    Avg of 3 c-RNN Runs    c-RNN Cheat
VT_02 14.16% 5.49% 0.00% 0.00% 0.00% 0.00% 0.00%
VT_10 8.98% 16.23% 3.81% 3.13% 5.51% 4.64% 3.11%
VT_18 2.13% 21.71% 0.15% 0.10% 0.00% 0.01% 0.00%
VT_26 20.03% 9.26% 5.65% 1.71% 0.03% 0.09% 0.08%
VT_29 25.81% 3.85% 4.95% 4.93% 0.08% 0.00% 0.00%
VT_42 75.28% 21.73% 26.90% 5.40% 0.17% 0.49% 0.06%
VT_49 0.85% 25.53% 0.53% 0.13% 0.00% 0.05% 0.00%
VT_69 65.80% 15.14% 16.33% 0.52% 0.00% 0.00% 0.00%
VT_108 2.87% 19.19% 0.38% 0.41% 0.00% 0.00% 0.00%
VT_132 14.03% 10.55% 0.25% 0.68% 0.20% 0.30% 0.60%
VT_137 39.38% 17.82% 6.06% 0.23% 0.10% 0.74% 0.15%
DV_16 1.77% 32.82% 0.63% 0.51% 0.25% 0.03% 0.10%
DV_19 12.51% 42.26% 16.96% 4.51% 1.99% 0.34% 0.33%
DV_28 16.35% 16.33% 0.90% 3.20% 1.15% 0.52% 0.53%
Total 21.09% 18.22% 5.67% 1.76% 0.63% 0.49% 0.34%
Table 3.3e – Maximum minus minimum error of the three c-RNN runs
Well    MAPE    MaxPE    Percentage of points > 3% error    Percentage of points > 5% error
VT_02 0.29% 1.24% 2.58% 0.00%
VT_10 0.78% 1.36% 11.94% 0.62%
VT_18 0.18% 1.07% 0.08% 0.03%
VT_26 0.14% 0.70% 0.39% 0.11%
VT_29 0.07% 0.32% 0.95% 0.00%
VT_42 0.42% 2.14% 4.89% 1.25%
VT_49 0.57% 4.67% 3.06% 0.05%
VT_69 0.50% 1.28% 5.28% 0.00%
VT_108 0.03% 0.55% 0.56% 0.00%
VT_132 0.52% 0.89% 0.08% 0.03%
VT_137 0.29% 0.21% 8.64% 0.30%
DV_16 0.32% 0.76% 1.49% 0.05%
DV_19 0.06% 0.84% 0.50% 0.06%
DV_28 0.16% 0.49% 1.20% 0.03%
Average 0.31% 1.18% 2.97% 0.18%
Table 3.3f – Standard deviation of errors of the three c-RNN runs
Well    MAPE    MaxPE    Percentage of points > 3% error    Percentage of points > 5% error
VT_02 0.14% 0.68% 1.32% 0.00%
VT_10 0.41% 0.76% 6.39% 0.36%
VT_18 0.10% 0.54% 0.04% 0.01%
VT_26 0.07% 0.35% 0.22% 0.06%
VT_29 0.04% 0.18% 0.50% 0.00%
VT_42 0.22% 1.11% 2.55% 0.72%
VT_49 0.33% 2.34% 1.53% 0.03%
VT_69 0.25% 0.67% 2.65% 0.00%
VT_108 0.01% 0.30% 0.30% 0.00%
VT_132 0.26% 0.46% 0.04% 0.01%
VT_137 0.15% 0.11% 4.42% 0.18%
DV_16 0.16% 0.38% 0.78% 0.03%
DV_19 0.03% 0.48% 0.26% 0.03%
DV_28 0.08% 0.28% 0.61% 0.01%
Average 0.16% 0.62% 1.54% 0.10%
The results show that the empirical correlations did not work for this study. The likely reason is
that the empirical correlations were developed from a very broad (global) set of reservoirs and do
not translate well to specific (local) reservoirs. The simple baselines
performed about twice as well as the empirical formulas. This is no surprise as they were derived
from the average of 13 training wells, making them very local and only applicable to this
particular study area. Both neural networks show statistical significance as they outperformed the
simple baselines and the empirical correlations.
The metrics show that the c-RNN performance is similar to that of the ANN cheat run; however,
a visual comparison reveals that the c-RNN has much better predictive capabilities. Figures 3.14,
3.15 and 3.16 compare the ANN cheat and average c-RNN synthetic logs for one of the worst
wells (VT_10), the best well (VT_18), and the most central well (VT_42), respectively. The visual
comparison shows how the average c-RNN run can predict the small variations much better than
the best result achieved by the cheat ANN run.
Figure 3.14: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for one of the worst blind wells, VT_10 (ANN cheat
MAPE = 1.88%; c-RNN MAPE = 1.76%).
Figure 3.15: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for the best blind well, VT_18 (ANN cheat
MAPE = 0.62%; c-RNN MAPE = 0.66%).
Figure 3.16: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN
(right) runs compared to the true logs for the most important well, VT_42 (ANN cheat
MAPE = 1.17%; c-RNN MAPE = 1.02%). An annotation in the figure marks the depth of the
majority of nearby horizontal wells.
Wells VT_10 and VT_137 had the highest error rate in the c-RNN runs. As shown in Figure 3.4
these wells are located near the edge of the study area where not many wells have been drilled,
so their accuracy is not of great importance for the purpose of this study. Most of the vertical and
horizontal wells are drilled near the center of the study area closest to well VT_42 and four other
nearby wells: VT_26, VT_29, VT_69 and VT_108. It is important to focus on these wells as they
dictate the success of the network to generate synthetic curves for the majority of wells with no
DTS. From the results, VT_42 had a MAPE of only 1% and only ~7% of the points had an error
>3%; the other four wells also performed fairly well, with MAPEs close to 1%. Figures
3.17 and 3.18 show the synthetic curves generated by the c-RNN compared to the true DTS
curves for the more centrally located wells VT_26, VT_29, VT_69 and VT_108. The cheat run
of the c-RNN had the best performance. This suggests potential for improvement in the stopping
criteria and design of the c-RNN network.
Figure 3.17: Synthetic logs generated by the c-RNN for wells VT_26 and VT_29 compared to
the true logs (VT_26 MAPE = 1.02%; VT_29 MAPE = 1.27%).
Figure 3.18: Synthetic logs generated by the c-RNN for wells VT_69 and VT_108 compared to
the true logs (VT_69 MAPE = 1.20%; VT_108 MAPE = 0.99%).
Figure 3.19 shows how the synthetic curves generated for blind well VT_18 improve in accuracy
with the number of training iterations. The results show that the broad trend of the DTS log is
captured early in the network training and more detailed variations are captured with more
training iterations.
Figure 3.19: Sequence of synthetic well logs generated for well VT_18 at epochs 1, 10, 40, 70,
120, and 380. The y-axis is DTS (µs/ft) and the x-axis is the thickness step from the top of the
formation.
3.7 Conclusions
A procedure to generate highly accurate synthetic DTS curves has been developed. This
procedure can act as a cost-effective and fast alternative to running actual DTS logs. This does
come with the condition that the area of interest is small enough and that there are enough wells
with true DTS measurements available. Generation of a synthetic DTS measurement at a given
point along the reservoir thickness only required five inputs: x and y coordinates, depth subsea,
RHOB and DTP. The GR log was tested and was found to not improve the performance of the
network. As shown by the cheat run, the procedure does still have room for improvement. Some
potential reasons for the error in the synthetic DTS curves are:
• 13 training wells are not enough to learn every possible signal.
• Some signals are caused by variables not captured in the study (e.g., fractures, stress
profiles, pore fluids, high porosity, shale, thin beds, gas presence, high total organic
content (TOC)).
• DTS measurement error is caused by the tool itself or poor borehole conditions.
Although the network did perform poorly on two of the blind well tests (VT_137 and VT_10), it
is more important to look at the performance of wells located closer to the majority of vertical
and horizontal wells. As there is no real way to measure synthetic DTS curve accuracy for wells
with no DTS one can only rely on the results of the network for nearby wells. Most wells are
located in the center of the study area, most of the horizontal wells reside in the top 25% of the
thickness, and the network has been shown to generate fairly accurate curves for that region of
the study. The c-RNN has also been shown to produce fairly consistent results between runs, but it
is still suggested to take the average of three runs when generating synthetic curves for wells
with no DTS. This procedure is not limited to the Montney Formation. As long as the area is
small enough and contains enough wells with DTS logs it could be applied to any other
formation. Even in this small window of the Montney, the 250 existing horizontal wells only
cover a small fraction of the thick reservoir and there is remaining opportunity for further
development.
Having a predictive procedure like this could potentially give the user a way to optimize where
to drill and how to complete a horizontal well before it is actually drilled. If the number of wells
per area is large enough the procedure could be applied to any other formation.
Chapter 4: A Convolutional-Recurrent Neural Network Model for Predicting Multi-Stage
Horizontal Well Production
4.1 Preface
This Chapter has been published as a manuscript in the Journal of Petroleum Science &
Engineering on November 21, 2020. This article is co-authored by Ian D. Gates.
4.2 Abstract
In this study, a hybrid convolutional-recurrent neural network (c-RNN) is evaluated for making
predictions of the five-year cumulative production profiles in multistage hydraulically fractured
wells. The model was trained by using a combination of completion parameters, rock mechanical
properties, and well spacing and completion order for each stage of 74 wells in the Montney
Formation in British Columbia. The prediction accuracy of the various combinations was
measured by using the mean absolute percentage error (MAPE) generated through the leave-one-out
method. The best combination of inputs was found to be the rock mechanical properties
surrounding each perforation cluster, the proppant amount used for every stage, and the spacing
and completion order of neighboring wells. The novelty of this study is that the input variables
used are at the stage level rather than the average of the entire well. The accuracy of the model
was found to increase exponentially as the production of multiple wells was aggregated. The
approach yields insights for planning new well drills in fields with existing development since it
provides the ability to run multiple field development scenarios without having to spend capital.
4.3 Introduction
The post-hydraulic-fracture production performance of a horizontal well is influenced by a
multitude of variables; in this study we identify three major categories:
1. Geological properties of the formation surrounding the perforations,
2. Completion design of each stage, and
3. Well spacing and completion order.
The geological properties surrounding the perforations in a horizontal well have arguably the
greatest influence on the production performance. Geological properties include the volume of
hydrocarbons in place, the permeability, the natural fracture geology, in situ stresses, and rock
mechanical properties. The selection of the right place to drill a well is critical to ensure an
ongoing expanding commercial operation.
A model that predicts the production profile of a well before it is drilled with reasonable
accuracy would help to ensure a commercial operation, allowing an operator to run multiple
scenarios of well placements and completion types and give the ability to optimize the
development of the field to their needs. Such a model could only predict the production
performance given a set of input parameters at each perforation cluster along the wellbore; it
would be difficult to predict how much each individual cluster parameter contributes to the
output. This is partly because the production in a horizontal wellbore is measured as a sum of the
production coming into each perforation cluster. The wells in our study had between 5 and 47
clusters that were spaced from 58 to 300 m apart and the completion design of each stage in the
same well was not identical. This results in each cluster having a unique post-stimulation
fracture network, so each cluster contributes an unequal amount to the overall production.
Furthermore, fracture stimulation is far from a perfect process: sometimes the reservoir never
reaches breakdown pressure and/or a plug breaks down and the fluid meant for one stage goes
into another. In other cases, the injected fluid may only flow into one thick natural fracture
thereby not really increasing the contact surface area. If there are multiple clusters in one stage,
there is no guarantee that the total stage fluid would divide itself evenly into each cluster and
create identical copies of fracture networks. This non-uniformity in production is well known:
He et al. (2017) and Al-Shamma et al. (2014) showed that up to 30% of fracture clusters in a
horizontal wellbore are non-productive.
The typical way to predict production from hydraulically fractured wells is to use numerical
reservoir simulators. Numerical simulation develops a dynamic model using available static and
dynamic data. The dynamic model is based on a static geological model which is built by
upscaling interpreted data from vertical well logs. The flow model is developed using well-known
engineering fluid-flow principles, followed by the history matching process. Fracture
propagation is modeled in various 2D or 3D geometries: some common types include the
Kristianovich-Geertsma-de Klerk (KGD) and the Perkins-Kern-Nordgren (PKN) geometry as
well as the pseudo-three-dimensional (P3D) model. Traditional numerical simulation to predict
production is known as a bottom-up approach (Mohaghegh, 2017). This is because these models
assume that all of the reservoir's complexities and fracture networks are known. Typically, the
geological models are upscaled from single wells and have large uncertainty (Mohaghegh, 2011).
Furthermore, the models that describe the morphology of fractures are grossly oversimplified:
real hydraulic fracture networks rarely resemble perfect ellipsoids and are impossible to perfectly
model. Due to these assumptions, numerical simulators are usually poor estimators of actual
production performance.
An alternative approach is to use an empirical, top-down approach using machine learning
algorithms. In this approach, minimal assumptions are made about the structure of the reservoir
or how fluid flows through it. Rather the input variables of each stage are presented to a machine
learning algorithm that attempts to link them to the production. The purpose of this study is to develop a model that can predict, with reasonable accuracy, the five-year post-hydraulic-fracture production performance of horizontal wells that target a field with existing development. The model utilizes a convolutional-recurrent neural network (c-RNN) hybrid to
link the input variables of each stage to a well’s production performance. The input variables are
highly detailed and include the rock mechanical properties surrounding each perforation interval,
the type and size of stimulation of each stage along with the spacing, and completion order of
neighboring wells. Different combinations of inputs are tested to find which combination leads to
the best predictions. The study focuses on 74 horizontal wells located in the Montney Formation
in Alberta, Canada.
4.4 Input Data Preparation
The study area is the same for all research chapters; it is described in Chapter 1 of this thesis and shown in Figures 1.2 and 1.3. To make sure that matrix properties are similar across the wells, the study focuses only on the top layer of the Montney. Figure 4.1 shows the areal extent of the 74 wells in the study area.
Figure 4.1: Areal extent of the 74 horizontal wells used in the study.
For this study we gathered the geological properties and completion variables of each stage and
perforation interval for all 74 wells. We also gathered the well spacing and completion order.
Table 4.1 shows the stage input variables that were used in the experiments. The five-year
production profile of the wells was used to train the network.
Table 4.1: The stage variables that were used as inputs in the experiments.

| Rock mechanical properties | Completion | Well spacing and completion order |
|---|---|---|
| RHOBnear | Type of fluid | Length of unbounded time |
| DTSnear | Total proppant placed | Unbounded gas production |
| DTPnear | Total fluid injected | Length of time each offset well produced before well is drilled |
| RHOBfar | Total CO2 injected | Volume of gas each offset well produced before well is drilled |
| DTSfar | Distance between perforation clusters | Percentage of length that each offset well covers this well |
| DTPfar | | Average perpendicular distance from each offset well to this well |
The number of clusters per stage was typically only one. However, some of the newer wells had
up to three clusters per stage. The clusters in a stage share the total proppant and fluid amounts
of the stage. This makes it difficult to allocate the amount of proppant and fluid entering each
cluster. To account for this, the rock mechanical variables of those stages were taken as the
average of all the clusters in that stage.
4.4.1 Geological Properties
Both the matrix and the natural fracture network contribute to the overall geological properties.
As previously mentioned, the matrix properties are expected to be similar between wells.
Fracture networks, which are based on the rock mechanics, would be expected to be vastly different
not only between wells but between perforation clusters along the wellbore. The rock mechanical
properties were used to describe the geology surrounding each perforation cluster. Rock
mechanics are a function of the DTS, DTP and RHOB logs. As shown in Figure 3.4, only 14/180
of the vertical and deviated wells in the study area had DTS measurements. In Chapter 3 a
procedure was described that generates accurate synthetic shear sonic logs (with an average
MAPE of 1.2%). For this study, this procedure was used to generate DTS logs for 166 of the
vertical and deviated wells that were missing one. The Petrel software (Schlumberger, 2019) was
used to upscale and interpolate each of the three logs from all 180 wells throughout the
formation, generating a 3D model of each of the three logs and enabling an estimation of the DTS, DTP and RHOB log profiles along the horizontal wellbores. Two types of profiles were built: the near and the far. The near profiles were used to estimate the rock mechanical properties at the wellbore; these would be expected to directly influence how the fractures start to propagate. The far profiles were built using 50 m3 cubes that follow the wellbore profile. Each cube represented the average of the properties within it. The cubes were an attempt to describe how hydraulic fractures would propagate through the formation after they were initiated. The measured depth (MD) of each cluster was used to obtain the corresponding rock mechanics surrounding it. Each perforation cluster in each well had six rock mechanical values: (DTS, DTP, RHOB)near and (DTS, DTP, RHOB)far.
4.4.2 Completion Variables
Table 4.2: Completion variables

| Completion Variable | Range per stage |
|---|---|
| Service company performing the stimulation | Numerous companies |
| Type of completion | Sliding sleeve or plug and perf |
| Gun diameter | 73 mm – 86 mm |
| Charge size | 15 – 31 |
| Number of perforations per cluster | 11 – 42 |
| Distance between successful stages | 27 m – 875 m |
| Amount of acid used in spearhead stage | 0 m3 – 9 m3 |
| Fluid type | Slickwater or crosslinked gel |
| Size of pad stage | 1 m3 – 138 m3 |
| Amount of energizer (CO2) | 20 m3 – 560 m3 |
| Proppant concentration | 25 – 1,920 kg/m3 |
| Proppant size and type | 50/40, 30/50, 20/40 mesh |
| Total proppant placed | 0 tonnes – 300 tonnes |
| Total fluid injected | 10 m3 – 1,400 m3 |
| Size of flush stage | 0.5 m3 – 98 m3 |
| Amount of crosslinked gel | 0 m3 – 8 m3 |
| Flowrate of each stage | 1 m3/min – 13 m3/min |
| Pressure of each stage | 5 – 75 MPa |
All of the completion variables listed in Table 4.2 were gathered for each stage of the 74 horizontal wells in the study using publicly available completion reports. Not every well had all of these variables publicly available. To limit the number of variables, the experiments were conducted with the following: the type of fluid, total proppant placed, total fluid injected, total CO2 injected, and the distance between perforation clusters. These variables were chosen because they were available for every well.
The 74 horizontal wells in the study were completed between 2006 and 2017, and the number of stages per well ranged from 5 to 31. In total there were 943 attempted stages across all 74 wells. Some of the stages failed due to pump problems, screen-off, or the breakdown pressure never being reached. There were a total of 23 (about 2%) failed stages, and all of them resulted in minimal amounts of proppant entering the formation (a maximum of 3 tonnes). For the purposes of this study the failed stages were ignored, so the total number of stages used was 920. Each successful stage has completion variables as well as rock mechanical properties linked to it.
4.4.3 Well spacing and completion order
Well spacing and completion order were considered at a well level, not a stage level. Because of this, the input values representing spacing and completion order would be equal for every stage in that well. The following inputs were gathered for each well:
a) The length of time the well produced unbounded;
b) The volume of gas the well produced unbounded;
c) The length of time that each offset well produced before this well began production
(could be positive or negative depending on the order);
d) The volume of gas that each offset well produced before this well began production;
e) The percentage of length that each offset well covers this well;
f) The average perpendicular distance from each offset well to this well.
Each well in the study could have up to three offsetting wells (one on one side and two wells sharing a portion of the length on the other). The reason that (c) above could be positive or negative is that the number of offsets each well has during field development is not constant; a negative value in (c) means that this well was drilled before the offset. This is important to account for, as future offsetting wells may negatively impact the production of the current well. If we use the model to predict the production performance of a new well, accounting for when a new offset would be drilled beside it would require some forecasting; however, this decision is fully controlled by the operator, and for all of the wells in this study the timing is already known. Having the ability to tell the network when a future offset well would be drilled enables the model to run various sensitivities.
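Input (c) above reduces to a signed difference of production start dates. The sketch below is illustrative only; the function name and date handling are assumptions, not taken from the thesis.

```python
from datetime import date

def offset_lead_time(offset_start, well_start):
    """Days an offset well produced before this well started production;
    negative if the offset came on production after this well."""
    return (well_start - offset_start).days

# Offset producing one year before this well starts: positive lead time.
print(offset_lead_time(date(2014, 6, 1), date(2015, 6, 1)))   # 365
# Offset drilled after this well: negative lead time.
print(offset_lead_time(date(2016, 1, 1), date(2015, 6, 1)))   # -214
```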
4.4.4 Production Data
Predicting the gas production profile is the focus of this study. Gas production data, along with
the operational hours were gathered from the public database on a monthly basis for each well
from the start of production to January 2020. The 74 wells in this study produced mainly dry gas; condensate production was rare and reported in only a few wells for a small portion of their production history. Condensate made up only 0.01% of the total barrel of oil equivalent (BOE) production of the 74 wells, so it was ignored for this study and all wells were assumed to produce dry gas.
The amount of gas that a well produces in a month is controlled by the reservoir and the operating conditions. Reservoir conditions behave in a somewhat predictable manner, but well operation is human driven, which makes it impossible to predict. For example, most of the wells in the study had a rate restriction between 5,000 and 8,000 Mscf/day, and most of the wells did not produce 100% of the time every month. In order to have a model that predicts reservoir behavior, the operating conditions needed to be held constant. To do this, we created an Arps decline curve that mimicked the true production data on the rate-cumulative plot. The rate restriction for all wells was set at 5,000 Mscf/day and the production time was set to 100%. An example of the Arps overlaying process for one of the wells is shown in Figure 4.2. Figure 4.3 shows how the Arps overlay rate restriction is lower than the actual for some of the wells. The Arps decline is a smooth curve and represents what the production trend would look like under perfect conditions.
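The overlay described above can be sketched as an Arps hyperbolic decline capped at the fixed rate restriction. The decline parameters (qi, di, b) below are hypothetical placeholders, not fitted values from the study.

```python
import numpy as np

def arps_rate(t, qi, di, b):
    """Arps hyperbolic decline rate q(t) = qi / (1 + b*di*t)^(1/b)."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def arps_overlay(t_months, qi, di, b, rate_cap=5000.0):
    """Decline rate capped at a fixed restriction, mimicking 100% on-time
    production under constant operating conditions."""
    return np.minimum(arps_rate(t_months, qi, di, b), rate_cap)

t = np.arange(60)                          # five years in monthly steps
rates = arps_overlay(t, qi=9000.0, di=0.05, b=1.2)   # illustrative parameters
```

The capped early-time plateau mirrors the 5,000 Mscf/day restriction applied to all wells, after which the smooth hyperbolic decline takes over.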
Figure 4.2: A visual representation of the Arps decline overlaying the true production. This is
shown on the rate-cumulative production plot for one of the wells. The red line represents the gas
rate assuming 100% on time every month, the orange line represents gas rate scaled down to
account for the actual on time during the month.
Figure 4.3: A plot showing the differences in rate restrictions between the actual production and
that of the Arps decline curve.
Production trends can be described by volume produced, time produced or the production rate
which is a derivative of the two. Figure 4.4 shows the various ways to visualize a production
trend. The most commonly used are the cumulative production versus time plot, the rate versus cumulative production plot and the rate versus time plot. This study focuses on the cumulative
versus time plot, specifically a five-year cumulative production curve at one-year increments.
The cumulative versus time plot was chosen because it provides a simple, high level view of
production performance that can aid an operator when running sensitivities on where to drill a
new well and how to complete it. Although the rate-based plots provide much more detail such
as the initial rate, decline rate, decline exponents, shut down time and flush production, these
would add unnecessary complexity and cause the network to make less accurate predictions.
Figure 4.4: The three common types of plots used to describe the production performance of a
well. Cumulative versus time (top), rate versus cumulative (middle), rate versus time (bottom).
These plots were generated using one of the wells in the study.
Some wells in the study have not yet reached five years of production, which creates a problem when training the network. For these wells, the Arps decline curve was used to forecast production beyond the end of the historical data. Only 16 of the 74 wells required their production to be forecasted, by between 1 and 30 months beyond January 2020. Although this inevitably introduced some uncertainty to the analysis, it was very useful in that it allowed the experiment to be run on all 74 wells. The experiment was also run using only the 58 wells with no forecasted production to see if forecasting production for the 16 wells led to better prediction accuracy.
4.5 Experimental Setup
4.5.1 Networks used in the experiments
Two types of networks were used in this study: a simple feedforward network (ANN) and a multi-headed convolutional-recurrent hybrid (c-RNN). The networks were programmed in Python
using the keras library (Chollet, 2015).
The ANN was chosen as it is the simplest type of network and serves as a good baseline against which to compare the more complicated c-RNN. The c-RNN hybrid network was
chosen because it combines the speed and ability to process large amounts of data of a
convolutional network with the sequence processing ability of a recurrent network. The c-RNN
also uses a 3D input which allows it to use a stage level input. In Chapter 3, the c-RNN was
found to have superior performance to that of the ANN.
In multi-headed architectures each input variable is handled by a separate convolutional network (head), and the outputs of these heads are merged and fed into a recurrent network before a prediction is made; these types of models are known to offer better performance in some instances (Bagnall, 2015). The architecture of the multi-headed c-RNN is
shown in Figure 4.5.
Figure 4.5: c-RNN structure that was used in the study.
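A minimal Keras sketch of a multi-headed c-RNN of this general shape is given below. The number of heads, kernel sizes and padding are assumptions, and the layer widths are patterned loosely on Table 4.3; this is not the exact thesis configuration.

```python
from tensorflow.keras import layers, models

N_STAGES, N_VARS = 31, 4            # stages per well; 4 variables is assumed

heads, inputs = [], []
for _ in range(N_VARS):             # one convolutional head per input variable
    inp = layers.Input(shape=(N_STAGES, 1))
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(inp)
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    inputs.append(inp)
    heads.append(x)

merged = layers.concatenate(heads)  # merge the heads along the channel axis
x = layers.Conv1D(32, 3, padding="same", activation="relu")(merged)
x = layers.GRU(50, recurrent_dropout=0.5, return_sequences=True)(x)
x = layers.GRU(50, recurrent_dropout=0.5)(x)
out = layers.Dense(1)(x)            # production target

model = models.Model(inputs=inputs, outputs=out)
model.compile(optimizer="adam", loss="mae")
```

Each head sees only one variable's stage sequence, so the convolutions learn per-variable stage patterns before the GRU layers process the merged sequence.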
4.5.2 Input shape and normalization
An ANN requires a 2D input (sample, variable). Since the output of the networks (the production profile) is forecasted at a well level, the input into an ANN must be at a well level as well. The problem is that the input variables are different for every stage in a well, so the only way to use an ANN for these experiments is to average each input variable over all stages in a well. The expectation is that this leads to higher error rates, because certain variables such as rock mechanical properties can change drastically from stage to stage over a 2 km horizontal wellbore.
The c-RNN, on the other hand, uses a 3D input (step, variable, sample). The c-RNN input shape works much better for our study because our input data is at a stage level; each stage goes into the step axis. The input shape is depicted in Figure 4.6 as a series of tables, where the columns represent the input variables, the rows represent the stages, and each table is a different well.
Figure 4.6: Format of input and output data of the c-RNN model
Each value in the tables represents a particular input variable at a particular stage. The maximum number of successful stages per well in this study was 31. The input shape cannot change during an experiment, so wells with fewer than 31 successful stages simply had a "0" value associated with the stages they did not have. This made the total size of the training input (n, 31, 73) and the total size of the test input (n, 31, 1), where n is the number of input parameters chosen for the experiment. Because the number of stages in a well is in a different axis than the rest of the variables, it can be thought of as a baseline parameter that is always present no matter what combination of input parameters is shown to the network. The c-RNN was used as the main network in the experiment because of its 3D input shape. The ANN performance was compared to the c-RNN in the last experiment, which used the best configuration of input parameters.
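The zero-padding of wells with fewer than 31 stages can be sketched as follows; the function name and the 5-variable example well are illustrative.

```python
import numpy as np

MAX_STAGES = 31

def pad_well(stage_matrix, max_stages=MAX_STAGES):
    """Zero-pad a (n_stages, n_vars) stage matrix up to max_stages rows,
    so every well presents the same fixed input shape to the network."""
    n_stages, n_vars = stage_matrix.shape
    padded = np.zeros((max_stages, n_vars))
    padded[:n_stages, :] = stage_matrix
    return padded

well = np.ones((12, 5))          # a well with 12 stages and 5 variables
print(pad_well(well).shape)      # (31, 5)
```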
All the input variables, along with the 5-year predictions, were normalized to values between 0 and 1, where 0 represents the minimum value of that parameter and 1 represents the maximum. Normalizing the inputs and outputs is a very common pre-processing step, as it removes the differences in scale across variables that may decrease model performance. For example, the RHOB values in this study ranged between 2.45 g/cc and 2.66 g/cc, but the total fluid placed in a stage ranged from 66 m3 to 1,805 m3. If the data were presented in its raw form, the network would put more weight on the fluid placed per stage, as its numerical values are orders of magnitude larger than RHOB.
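Min-max normalization as described can be sketched as:

```python
import numpy as np

def minmax_normalize(x):
    """Scale an array to [0, 1]: the minimum maps to 0, the maximum to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

rhob = np.array([2.45, 2.50, 2.66])   # g/cc, within the range quoted above
print(minmax_normalize(rhob))         # min -> 0.0, max -> 1.0
```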
4.5.3 Experimental Setup
The study only contained 74 wells, which in machine learning is considered a small sample size. The best method to determine model accuracy for limited sample sizes is "leave-one-out" cross-validation. In this method, 73 wells are used to train the network and one well, known as the blind well, is used to evaluate it. The network is trained 74 times using 74 different blind wells, and the average accuracy over all 74 wells is examined.
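The leave-one-out loop can be sketched as below; `train_and_predict` is a hypothetical placeholder for the full training-and-prediction step, and the toy mean-of-targets model exists only to make the example runnable.

```python
import numpy as np

def leave_one_out(wells, targets, train_and_predict):
    """Train on all wells but one, evaluate on the held-out blind well,
    and return the average per-well percentage error."""
    errors = []
    for i in range(len(wells)):
        train_idx = [j for j in range(len(wells)) if j != i]
        pred = train_and_predict([wells[j] for j in train_idx],
                                 [targets[j] for j in train_idx],
                                 wells[i])                  # blind well
        errors.append(abs(pred - targets[i]) / targets[i])  # per-well MAPE
    return float(np.mean(errors))

# Toy stand-in "model" that predicts the mean of the training targets.
mean_model = lambda Xs, ys, x_blind: sum(ys) / len(ys)
print(leave_one_out([1, 2, 3], [10.0, 12.0, 14.0], mean_model))   # ~0.171
```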
The metrics used to measure the accuracy of the model were the mean absolute percentage error (MAPE) and the mean absolute error (MAE):

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{|A_t - F_t|}{A_t} \qquad\qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}|A_t - F_t|$$
where At is the actual value, Ft is the forecasted value and n is the number of samples. MAPE is the most common accuracy metric and works best if there are no extremes in the data. The MAPE was used as the main metric in this study because it scales the error to the data and gives the ability to compare results between wells with different cumulative productions. Due to this scaling of errors, each individual well MAPE contributes an equal amount to the entire field MAPE, which removes bias towards wells with larger production volumes. For this study the individual well MAPE was calculated by averaging the volumetric error of every year in the 5-year cumulative production trend. The field MAPE (the average over all 74 wells) was then used as the ultimate metric when comparing the results of various combinations of input variables and network hyperparameters. The MAE was used as a secondary measure of accuracy to make sure that the MAPE is not affected by extreme outliers; the MAE is a volumetric measurement and represents the average error of all 74 wells over all 5 years of the production forecast. If there are no extremes in the data, the total field MAE and MAPE are expected to follow similar trends.
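The two metrics translate directly into code; the sample data below are made up for illustration.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (x100 for percent)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast) / actual))

def mae(actual, forecast):
    """Mean absolute error, in the units of the data (e.g. MMcf)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))

a = [100.0, 200.0, 400.0]
f = [ 90.0, 220.0, 400.0]
print(mape(a, f))   # (0.1 + 0.1 + 0.0) / 3 = 0.0666...
print(mae(a, f))    # (10 + 20 + 0) / 3 = 10.0
```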
The outcomes of a neural network depend not only on the training data, but also on the starting values of the weight parameters. In machine learning, a model is typically initialized with random weights, so the model can produce a range of predictions from the same training data; an example of this is shown in Figure 4.7. Having a range of outcomes is problematic, as it is not consistent and makes the model less trustworthy. The only way to have consistent predictions is to take the average of multiple runs. The more times a model is restarted and run, the less the average prediction will vary; Figure 4.8 shows how the average prediction changes with the number of runs for one well. We chose to use 30 runs per blind well because the average did not change much with additional runs, which would only have required more computing time. Each run had a total of 300 training epochs; this value was chosen because the blind test error for all 74 wells no longer decreased at that point.
Figure 4.7: Plot showing the numerous predictions that a model trained on the same dataset could
make.
Figure 4.8: Plots showing how the average prediction changes based on the number of runs (panels for 1, 2, 5, 10, 30, 100 and 500 runs).
4.5.4 Hyperparameter Tuning
Network hyperparameters such as the number of nodes and layers, and number of epochs can be
tuned after all 74 experiments are completed to see if an improvement is made. The network
hyperparameters used for the experiments were chosen by a trial and error procedure that
includes the following steps:
1. Choose which hyperparameters can be modified – in our case this was the number of filters
and pooling layers in the convolutional heads as well as the number of nodes and layers in
the RNN part.
2. Define a range for these hyperparameters (i.e., maximum and minimum number of nodes and layers).
3. Run sensitivities using the same experimental setup but changing the hyperparameters. First change one hyperparameter at a time, then run different combinations of each to see which configuration yields the best result.
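The trial-and-error loop in steps 1–3 can be sketched as a small grid search; the search space and the `evaluate` function below are hypothetical stand-ins for running a full leave-one-out experiment per configuration.

```python
from itertools import product

search_space = {
    "conv_filters": [32, 64, 128],   # assumed sweep ranges
    "gru_nodes":    [25, 50, 100],
}

def evaluate(config):
    """Toy stand-in returning a fake MAPE; lower is better. In the real
    procedure this would run all 74 leave-one-out experiments."""
    return (abs(config["conv_filters"] - 64) / 100
            + abs(config["gru_nodes"] - 50) / 100)

best, best_err = None, float("inf")
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    err = evaluate(config)
    if err < best_err:
        best, best_err = config, err

print(best)   # {'conv_filters': 64, 'gru_nodes': 50}
```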
The number of possible combinations is effectively infinite, so finding the truly optimal hyperparameters is impossible; however, this approach does help tune the network to produce fairly accurate results. The tuned hyperparameters used for the network are presented in Table 4.3.
Table 4.3: Starting hyperparameters for the c-RNN developed in this study (the Input, Conv, Conv and Max Pooling columns are the same for every head).

| Layer | Input | Conv | Conv | Max Pooling | Merge | Conv | GRU | GRU | Output |
|---|---|---|---|---|---|---|---|---|---|
| Number of nodes | 1 | 64 | 64 | – | – | 32 | 50 | 50 | 1 |
| Activation function | – | rectified linear unit | rectified linear unit | – | – | rectified linear unit | rectified linear unit | rectified linear unit | – |
| Recurrent dropout | – | – | – | – | – | – | 0.5 | 0.5 | – |
Once the network hyperparameters were chosen, multiple experiments were run on different combinations of stage input parameters to see how they affected the error metrics. The first experiment was a base-case prediction run using zero stage variables. In this case the input shape of the training data was (1, 31, 73), where the x-axis was either a 1 or a 0 representing a stage being present or not. By doing this, the only information presented to the network was the number of stages in a well. Next, experiments were run on individual factors, followed by combinations of factors from the geological and completion categories. Finally, once the best combination of parameters was found, the well spacing and completion order information was added to see if it had any effect on the outcome.
4.6 Results and Discussion
Table 4.4: Results of the leave-one-out experiments performed on only the geological variables

| Variables Used | Average MAPE | Maximum MAPE | Minimum MAPE | MAE (MMcf) |
|---|---|---|---|---|
| No variables (base case) | 21.18% | 112.31% | 1.73% | 685 |
| RHOBnear | 19.60% | 84.49% | 1.82% | 648 |
| DTSnear | 19.56% | 83.44% | 2.08% | 663 |
| DTSfar, DTPfar and RHOBfar | 19.08% | 89.79% | 1.63% | 634 |
| DTPnear and DTSnear | 18.74% | 91.22% | 0.90% | 629 |
| DTPnear | 18.39% | 91.71% | 1.72% | 612 |
| RHOBnear and DTSnear | 17.91% | 68.86% | 1.56% | 606 |
| RHOBnear and DTPnear | 16.73% | 85.17% | 1.74% | 565 |
| (DTS, DTP, RHOB)far and (DTS, DTP, RHOB)near | 16.60% | 84.91% | 1.64% | 563 |
| DTSnear, DTPnear and RHOBnear | 16.57% | 84.42% | 1.78% | 562 |
Table 4.5: Results of the leave-one-out experiments performed on only the completion variables

| Variables Used | Average MAPE | Maximum MAPE | Minimum MAPE | MAE (MMcf) |
|---|---|---|---|---|
| No variables (base case) | 21.18% | 112.31% | 1.73% | 685 |
| Distance between successful stages | 23.07% | 100.84% | 2.00% | 757 |
| Total CO2 injected | 21.96% | 124.02% | 1.94% | 737 |
| Total proppant injected and total fluid injected | 21.14% | 105.50% | 1.72% | 687 |
| Total fluid injected | 21.08% | 124.23% | 1.17% | 682 |
| Type of fluid used | 20.68% | 118.72% | 1.36% | 658 |
| Total proppant injected | 20.43% | 99.04% | 1.88% | 662 |
| Total proppant, fluid and CO2 injected | 19.38% | 100.55% | 0.95% | 636 |
| Total proppant, fluid, CO2 injected, and type of fluid used | 19.21% | 100.30% | 0.80% | 633 |
Table 4.6: Results of the leave-one-out experiments performed on geological, completion and spacing variables

| Variables Used | Average MAPE | Maximum MAPE | Minimum MAPE | MAE (MMcf) |
|---|---|---|---|---|
| No variables (base case) | 21.18% | 112.31% | 1.73% | 685 |
| (DTS, DTP, RHOB)near, total proppant, fluid, CO2 injected, and type of fluid used | 17.34% | 79.88% | 1.41% | 583 |
| (DTS, DTP, RHOB)near, total proppant and fluid injected | 16.37% | 84.95% | 1.85% | 563 |
| (DTS, DTP, RHOB)near and total proppant injected | 16.27% | 89.99% | 0.82% | 556 |
| (DTS, DTP, RHOB)near, total proppant injected, well spacing and completion order | 14.90% | 83.12% | 0.97% | 534 |
Tables 4.4 through 4.6 summarize the results of the leave-one-out experiments using the c-RNN network. Each table includes the base-case prediction for comparison; the MAPE and MAE of the base case came out to 21.18% and 685 MMcf, respectively. The MAE was used as a secondary check to make sure the MAPE is not affected by very large or very small measurements. From the tables it is clear that the MAE and MAPE follow similar trends; because of this, we will only comment on the MAPE for the remainder of the study.
Table 4.4 shows the results using combinations of only rock mechanical variables. Using all three near profile logs resulted in an average prediction error of 16.57%, which is lower than the 19.08% error resulting from the far profile logs. Also, using both far and near profile logs as inputs did not improve the prediction performance over using just the near profile logs. This is likely because the far profile logs were averages of 50 m3 blocks that ended up being poor representations of how fractures propagate through a formation, whereas the near profile logs were not averaged and describe the conditions at fracture initiation. Predictions were also made using individual near profile logs; of these, the DTP log led to a better prediction than the DTS or RHOB. Using all three rock mechanical logs as inputs resulted in better performance than just one or two; this suggests that all three logs are important when describing the rock mechanics.
Table 4.5 shows the results using only completion variables. Runs were done on individual variables as well as combinations of variables. Of the individual variables, the total proppant injected had the lowest average error at 20.43%. The combination of total proppant, fluid, CO2 injected, and the type of fluid used had the lowest overall error of 19.21%. Using the distance between successful stages as the only input made the model predict with an error of 23.07%, which was worse than the base case. The prediction accuracy of the model was greater using only the rock mechanical properties than using only the stimulation variables. This suggests that the rock mechanics surrounding a well have a greater effect on production performance than the stimulation design.
Table 4.6 shows the results from the combination of rock mechanical properties and completion variables, as well as the addition of the well spacing and completion order. The best prediction, with an average error rate of 16.27%, was made using the three near profile logs in combination with the total proppant injected. Adding more stimulation variables to the input, such as total fluid/CO2 injected or the type of fluid used, did not improve the accuracy of the model. Finally, when information about the well spacing and completion timing was added to the input, the network was able to make even better predictions, with an average error rate of 14.90%; this was the best case in this study. This points to the fact that this information plays an important role in production performance. The difference between the best case and the base case was only 6.28%, which suggests that the number of stimulation stages in a well (the input of the base case) has a large effect on production.
Figure 4.9 shows the distribution of the individual well MAPE values for the best case: 47 of the 74 wells had a MAPE below 15%, and almost all the wells had a MAPE below 50%; only one well had an error rate of 83%. Figure 4.10 shows the production profiles of the best and worst wells.
Figure 4.9: Distribution of individual well MAPE from the best case.
Figure 4.10: Plots of the worst (left) and best (right) well production profiles created by the best-case model. The red line is the average of the 30 runs; the green line is the true profile.
There were 16 wells in the training set that had not yet reached five years of production time, and their trend had to be forecasted out using the Arps decline. To see if this forecasting of production had an impact on the results, the best case was also run with the 16 wells removed. The average error rate for the 58 non-forecasted wells was 15.2% when the 16 forecasted wells were included in the training set. When the 16 forecasted wells were removed from the training set, the average error rate for the 58 non-forecasted wells turned out to be 18.0%. This suggests that using more wells in the training set is beneficial even if the production from some of those wells needed to be partially forecasted.
The input parameters for the best-case experiment were also used to test the ANN accuracy. Since the ANN can only take a 2D input, the input variables had to be averaged along the stages. The MAPE from the ANN turned out to be 20.6%, demonstrating that the c-RNN is the better network to use.
Predicting individual well production before the well is drilled is difficult, and it comes as no surprise that the best-case MAPE of the model was 14.9%. There are several reasons for this. Firstly, the information about the geology surrounding a horizontal well comes with great uncertainty. Secondly, the process of how hydraulically induced fractures propagate through a formation is extremely complex and depends on a myriad of factors that cannot be measured; the possible network structures are endless. Also, the inability to measure the production coming out of each perforation interval makes it impossible to link the rock mechanics and stage parameters directly to the stage production. Finally, the interaction effects between wells are difficult to quantify and add more complexity to the problem.
An interesting observation was that the accuracy of the model increased if the individual well predictions were added together and compared to the actual totals. Adding up the best-case production profiles of all 74 wells resulted in an average MAPE of only 0.8%; the aggregated production profile of the 74 wells is depicted in Figure 4.11.
Figure 4.11: Plot of the best-case aggregate production profile of all 74 wells versus time.
This reduction in MAPE is due to the network underpredicting the performance of some wells and overpredicting the performance of others; when these differences are added together, they begin to cancel each other out. To see how the number of wells in an aggregated group affects the MAPE, we chose aggregate sizes of 2, 3, 5, 10, 30 and 50 wells, ran 500 different combinations of randomly chosen wells for each aggregate size, and recorded each combination's MAPE. The aggregate MAPE was taken as the average of the 500 runs. Running 500 combinations for each aggregate size produces a statistical representation of the average of all possible combinations of wells in that aggregate. The results of this analysis are shown in Figure 4.12.
Figure 4.12: Plot showing how aggregating wells together affects the mean absolute percentage error (MAPE).
The MAPE drops exponentially as the number of aggregated wells increases. When predicting the summed production profile of 10 wells, the prediction error drops to 4.5%, a significant improvement over the 15% error rate of individual wells. Because of this, the model developed in this study becomes more useful with larger development plans. The aggregated production profile assumes all wells come on production at the same time, which is not the case in real-life development, but it would still be useful for providing an overall picture of the total production expected from the new wells.
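The cancellation effect behind the aggregate MAPE can be sketched with synthetic data; the roughly 15% per-well error level is borrowed from the text, but the well volumes and error distribution below are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy actual/predicted cumulatives for 74 wells; errors are random with
# ~15% spread, mimicking under- and over-prediction that cancel in sums.
actual = rng.uniform(2000.0, 8000.0, size=74)
pred = actual * rng.normal(1.0, 0.15, size=74)

def aggregate_mape(group_size, n_trials=500):
    """Average error of the summed prediction over random well groups."""
    errs = []
    for _ in range(n_trials):
        idx = rng.choice(74, size=group_size, replace=False)
        errs.append(abs(actual[idx].sum() - pred[idx].sum())
                    / actual[idx].sum())
    return float(np.mean(errs))

single = aggregate_mape(1)
grouped = aggregate_mape(10)
print(single, grouped)   # the grouped error is markedly smaller
```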
Since the DTP was an important input, it is worth understanding how the DTP at the
perforation intervals affects the cumulative production at the wellhead. The DTP affects the
brittleness of a rock, which dictates the way fractures form within a reservoir. A more brittle
reservoir is expected to produce more hydrocarbons than a ductile one, since brittle rock forms
more complex fracture networks which increase the contact area between the reservoir and the wellbore.
Brittleness is related to the mechanical properties: rocks with a large Young's modulus and
small Poisson's ratio are more brittle than those with a small Young's modulus and large
Poisson's ratio (Wang and Gale, 2009). When the DTP is increased and all other variables,
including the DTS, are held constant, both the Poisson's ratio and the Young's modulus decrease;
however, the Poisson's ratio decreases more drastically. Conversely, when the DTP is lowered,
both the Poisson's ratio and the Young's modulus increase, but the Poisson's ratio increases
more drastically. Because of this, the brittleness increases with increasing DTP, and since
brittleness is related to production, an increase in DTP is expected to increase production as well. To
test this, a sensitivity was run to see what effect the DTP input has on cumulative production. In
this sensitivity, all input variables for all 74 wells were held constant and only the DTP stage
inputs were multiplied by a factor ranging between 0.5 and 1.5 to see how this affected the
aggregated 74-well 5-year cumulative production. Figure 4.13 shows the results of the
sensitivity. The results show a linear correlation between DTP and cumulative production, with
cumulative production increasing with increased DTP as expected.
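The behaviour of the moduli described above can be illustrated with the standard dynamic elastic-moduli relations from sonic logs. This is a sketch using textbook formulas and made-up log values, not necessarily the exact calculation performed in the thesis.

```python
# Standard dynamic moduli from sonic slownesses: DTP and DTS are
# slownesses, so Vp = 1/DTP and Vs = 1/DTS; consistent units cancel
# in the ratios used below. Log values are hypothetical.
def poisson_ratio(dtp, dts):
    r2 = (dts / dtp) ** 2                      # (Vp/Vs)^2
    return (0.5 * r2 - 1.0) / (r2 - 1.0)

def youngs_modulus(rhob, dtp, dts, k=1.0):
    # E = rho*Vs^2*(3Vp^2 - 4Vs^2)/(Vp^2 - Vs^2); k absorbs unit conversion
    vp2, vs2 = 1.0 / dtp**2, 1.0 / dts**2
    return k * rhob * vs2 * (3 * vp2 - 4 * vs2) / (vp2 - vs2)

base_dtp, dts, rhob = 60.0, 120.0, 2.65        # hypothetical us/ft and g/cm3
for f in (0.9, 1.0, 1.1):                      # DTP scaled, DTS held constant
    dtp = f * base_dtp
    print(f, round(poisson_ratio(dtp, dts), 3),
          round(youngs_modulus(rhob, dtp, dts), 6))
# Both nu and E fall as DTP rises, with nu falling proportionally faster.
```

Running the loop shows the behaviour stated in the text: raising DTP with DTS fixed lowers both moduli, but Poisson's ratio drops by a much larger fraction than Young's modulus.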
Figure 4.13: Results of a sensitivity showing how the 5-year cumulative production from all 74
wells is affected by changes in the DTP.
4.7 Conclusions
In this study we developed a machine learning model to predict the five-year cumulative
production profiles of multistage hydraulically fractured wells. The novelty of this study is that it
uses not just the completion variables of a well but also the rock mechanical properties
surrounding perforation clusters, along with the well spacing and completion timing of wells. The
model was able to predict individual well production profiles with an average error rate of
14.9%, with the best well having an error of only 1% and the worst well having an error of 83%.
The average error rate decreases exponentially as the production of multiple wells is aggregated.
There are many ways to improve this model's performance; arguably the biggest would be to
have a flow-rate measuring tool installed at every perforation cluster. Having this tool would
enable linking the completion variables and the rock mechanics of each stage directly to the
amount of production coming from that stage. Having more vertical wells with actual DTS logs
would lessen the need to generate synthetic logs, which would improve the accuracy of the 3D
models of the DTS logs. It would also be useful to have a 3D seismic survey of the formation,
since it can be used to identify faults which could serve as additional inputs to the network.
Finally, it is unlikely that the neural network used in this study has the optimal configuration;
finding this optimal configuration would improve the accuracy but would require many more tests.
Installing this many measuring devices and running DTS logs on every well would most likely be
uneconomic and would probably never occur in a profit-driven industry.
Unfortunately, even if economics were not a factor, there is no way to build a model capable of
predicting the production performance of each well with 100% certainty. This is because we are
not able to see with full clarity what goes on under the ground. Geological formations are very
heterogeneous, and any computer model will fall short of describing their true structure. In
addition, fracture dynamics are extremely complex, and the exact fracture configurations
before and after stimulation are not possible to observe.
That being said, this model would be of great use to anyone planning to drill multiple wells
in fields with existing development, because the model accuracy was shown to improve
exponentially when it is used to predict the production of more than one well. The lower error gives
the ability to run multiple field development scenarios without having to spend capital. This
would grant the ability to optimize well placement and completion, which would lower overall
costs and potentially reduce the environmental impact of hydraulic fracturing operations.
Chapter 5: Optimizing Water Usage during Multi-Stage Hydraulic Fracturing with a
Convolutional-Recurrent Neural Network
5.1 Preface
This Chapter has been submitted as a manuscript to the Journal of Petroleum Science &
Engineering in 2020. This article is co-authored by Ian D. Gates.
5.2 Abstract
A machine learning model was developed and trained on 74 existing wells in the Montney
formation in Canada, to run sensitivities on 40 proposed wells that are to be drilled alongside the
existing producers with the objective of minimizing water usage. 1,080 sensitivities were run to
explore how different combinations of stage count, fluid type, water amount, and proppant
amount affect the aggregated 5-year cumulative gas production of the 40 proposed wells. The
results from the machine learning analysis find that the injection efficiency (cumulative
gas/total fluid injected) drops exponentially as more water is injected. The results show that it
is possible to achieve a cumulative production that is 76% of the maximum by using only 3% of
the water required to achieve that maximum. This represents a significant reduction in water
usage, which supports cleaner and more efficient drilling operations, lowers the costs
associated with water treatment and disposal, and reduces the potential for induced seismicity.
5.3 Introduction
Hydraulic fracturing has become the most common method for producing
hydrocarbons from ultra-low permeability reservoirs. Advancements in hydraulic fracturing and
horizontal drilling techniques over the past several decades have made producing hydrocarbons
from unconventional reservoirs economically feasible, which led to a shale boom in North
America and throughout the world (Morton, 2013). In hydraulic fracturing, large amounts of
water and sand are pumped down a wellbore and injected into a target formation at high rates
and pressures (Montgomery and Smith, 2010). The injected fluid creates new fracture networks
within the formation, starting at the well, that are propped open by the sand, effectively increasing
the surface area of the reservoir that is connected to the wellbore (Yew, 1997). Modern hydraulic
fracturing is typically performed in multiple stages along the entire horizontal wellbore. After
hydraulic fracturing, the well is put on production with a relatively short peak production period
followed by a rapid decline to a plateau value that can be maintained for an extended period of time (Tan
et al., 2018).
Most hydraulic fluids are water-based and contain 90 to 97% water by volume (U.S.
Environmental Protection Agency, 2016). Two of the most common water-based fluids used
today are slickwater and crosslinked gel (Montgomery, 2013). Crosslinked fluids were
introduced in the 1960s after it was found that fractures heal unless a propping agent is
co-injected with the water (Palisch et al., 2010). Crosslinked gel is a relatively viscous fluid
(compared to water) typically used in ductile formations with higher permeability.
Commonly, guar gum is used as the thickening agent and borate ions as the crosslinking agent.
The advantages of using crosslinked gels are that they exhibit low fluid leak-off, low pump rates, low
water usage, and the ability to carry high sand concentrations (Fink, 2015). The biggest concern with
this type of fluid is that the gel residue that remains in the formation may block the newly created
fractures (Belyadi et al., 2017). Slickwater fluid is typically used in low-permeability, high net
pay reservoirs. Slickwater is water that has had its viscosity reduced by the addition of polymers
such as polyacrylamides (Montgomery, 2013). The lower viscosity of slickwater leads to a
reduction of viscous effects (friction). From a pumping point of view, this lowers the energy
required to move the fluid but on the other hand, this reduces the ability of the liquid to suspend
and transport the proppant from the surface to the reservoir (Palisch et al., 2010). This also leads
to narrower fracture aperture which in turn results in lower hydraulic conductivity of the fracture
(Barati and Liang 2014). To offset this, slickwater is pumped at relatively higher rates (100+
bbl/min) and relatively low concentrations of proppant resulting in massive water requirements
(Barati and Liang 2014). The primary technical advantage of slickwater is the reduced damage
within fractures and reduced fracture height growth (Fink, 2015). Depending on water
availability and costs, low concentration slickwater treatments may also be lower cost than
crosslinked gel since they require fewer chemicals (Ba et al., 2019). Given the relatively large
volume of water for slickwater stimulation, there is a pressing need to reduce the amount of
water injected into the formation. Furthermore, there might be advantages of low volume
hydraulic fracturing due to the lower chance of induced seismicity (Schultz et al., 2018).
Although there has been a lot of research on which fluid is better for a particular type of tight
formation, for the most part, the choice of fluid is driven by the actual production data and the
operator’s success in a particular formation (Belyadi et al., 2017). Lately, slickwater treatments
have become the more popular choice for hydraulic fracturing stimulation. The three main
reasons for this are as follows (Palisch et al., 2010):
1. industry's need to cut costs (via the use of fewer chemicals),
2. reservoirs being fractured today have lower permeability, and gel clean-up from the
induced fractures has become a stimulation challenge, and
3. fractures created by crosslinked gel were not performing as well as expected due to
formation damage.
Based on a data survey of around 40,000 wells in the United States, the average volume of water
used to create a hydraulic fracture is about 9,500 m3 (Freyman, 2014) with around 30,000 wells
being fractured per year between 2011 and 2014 (U.S. Environmental Protection Agency, 2016).
The fracture intensity, well length and stage count of multi-stage horizontal wells have also
increased over time, which has led to increased water usage. Kondash et al. (2018) found that
between 2011 and 2016, the average water injected along with the average flowback per well
increased in all six of the major shale producing regions in the United States. The large amounts
of water injected during hydraulic fracturing impact surrounding water resources and require
more wastewater management (U.S. Environmental Protection Agency, 2016). The greater the
water used, the larger the impacts (Schultz et al., 2018).
The water that is used in hydraulic fracture fluid is typically sourced from groundwater and
surface water resources (U.S. Environmental Protection Agency, 2016). The water may also be
sourced from wastewater reused from previous fracture jobs; however, this is not widely practiced,
and only ~5% of fracture jobs use recycled wastewater (U.S. Environmental Protection Agency,
2016). Groundwater and surface water resources are also the main source of water for drinking,
household use, irrigation, livestock, and industrial processes. Water used for a hydraulic
fracture on a single well does not usually impact local water resources; however, if multiple
treatments for many wells are being performed in a single area, the total volume of fluid needed
to fracture the wells may take up a significant portion of locally available water resources
(Scanlon et al., 2014). Areas that are prone to drought or hot weather are significantly impacted
since high withdrawals of local surface and ground water may reduce drinking water availability
(U.S. EPA, 2015). For example, in 2011, water wells overlying the Haynesville Shale used for
extracting drinking water were excessively drained due to local hydraulic fracture operations and
drought (Louisiana Ground Water Resources Commission, 2012). Hydraulic fracturing may also
impact groundwater levels: a study conducted by Scanlon et al. (2014) in Texas showed that
groundwater levels dropped by 31 meters to 61 meters after hydraulic fracturing activity
increased in 2009. High local water withdrawal may lead to erosion, sedimentation, and habitat
fragmentation (Lin et al., 2018).
Large volumes of injected water result in large volumes of flowback wastewater which needs to
be treated, handled, transported, and disposed of (Veil, 2015). Wastewater is typically reinjected
or disposed of above ground. Reinjecting can either be done via injection wells or in other
hydraulic fracture operations. To handle the wastewater above ground, it can either be processed
in wastewater treatment facilities and released back into rivers or evaporated in evaporation
ponds (U.S. Environmental Protection Agency, 2016).
Wastewater handling carries risks: reinjected water can potentially contaminate freshwater
aquifers (Faruque and Goldowitz, 2017) or induce seismicity (Eaton et al., 2018). Reusing
wastewater in hydraulic fracturing is rarely practiced since the decision depends on the
quality, quantity, and cost associated with reuse. Inadequate wastewater
treatment has been known to impact drinking water resources. For example, in Pennsylvania,
wastewater from Marcellus Shale gas wells was treated and released to surface waters. The
wastewater treatment facilities were unable to properly remove the high levels of total dissolved
solids and the discharged wastewater contributed to elevated levels of total dissolved solids
(particularly bromide) in the Monongahela River Basin (Pennsylvania Department of
Environmental Protection, 2015).
The literature suggests that there is a compelling desire to reduce the amount of water used in
hydraulic fracturing jobs. In this study, we use the model developed and trained in Chapter 4 to
run sensitivities on a proposed future field development scenario. In this scenario, 40 additional
wells are to be drilled adjacent to the existing 74 wells and are scheduled to all come on
production at the same time. 1,080 sensitivities are run to explore how different combinations of
stage count, fluid type, water amount, and proppant amount affect the aggregated 5-year
cumulative gas production of the 40 proposed wells. The ultimate purpose of the study is to see if
injected water can be minimized without having a large negative impact on the 5-year
cumulative production.
5.4 Study Area and Proposed Wells
The study area is the same for all research chapters; it is described in Chapter 1 (Introduction)
of this thesis and shown in Figures 1.2 and 1.3. Figure 4.1 shows the areal extent of the 74
producing wells that currently exist in the area. The 40 proposed wells chosen to be drilled are
shown in Figure 5.1.
Figure 5.1: 40 proposed well locations (blue) added to the existing 74 wells (red) in the study
area generated in Petrel.
The positioning of these 40 locations represents a full-field infill drilling plan that would take
place in a field with existing development. The length of the proposed wells was fixed at 2 km as
this was the representative length of existing wells. Variable stage counts of 10, 15 and 30 per
well were used for the sensitivities. This range was chosen as it represents the minimum, average
and maximum number of completion stages that were done in the existing 74 producers.
The model used in this study takes three groups of input parameters at every stage along the
wellbore to make a prediction of the gas production. These three groups are:
• well spacing and completion order,
• rock mechanical properties, and
• completion parameters
The individual parameters of each group are listed in Table 5.1.
Table 5.1: The stage variables that were used as inputs in the sensitivity experiment.
Rock mechanical properties:
• Density (RHOB)
• Compressional Sonic (DTP)
• Shear Sonic (DTS)
Completion:
• Type of fluid
• Total fluid injected
• Total proppant placed
Well spacing and completion order:
• Length of unbounded time
• Unbounded gas production
• Length of time each offset well produced before well is drilled
• Volume of gas each offset well produced before well is drilled
• Percentage of length that each offset well covers this well
• Average perpendicular distance from each offset well to this well
5.4.1 Well spacing and completion order
Existing producers in this study are drilled close to each other so the production of gas from one
well affects the production of its neighbors. This was shown in Chapter 4 where the individual
well error rate of the model decreased from 16.3 to 14.9% when well spacing and completion
order parameters (time and gas volume that a well produced in an unbounded state, time and gas
volume that offset wells produced before a well was drilled, percentage of length that each offset
well covers a well, average perpendicular distance between offsetting wells) were taken into
account. Because the well spacing and completion order parameters were important, they were also
used for the 40 proposed wells in this study. The spacing parameters were based on the distance
from a proposed well location to existing producers as well as the distances between the proposed
wells themselves. Well spacing and completion order make more sense at the well level rather
than the stage level, so the input values representing spacing and completion order for a well
are constant at every stage in that well. This group of input parameters was also constant for every
sensitivity run, as the trajectories of the wells were fixed.
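This repetition of well-level values across stage rows can be sketched as follows, with placeholder feature values.

```python
# Hypothetical sketch: well-level spacing/completion-order features are
# repeated at every stage row so the input table keeps one row per stage.
import numpy as np

n_stages = 15
well_features = np.array([0.42, 0.17, 0.88])     # placeholder normalized values
stage_block = np.tile(well_features, (n_stages, 1))
print(stage_block.shape)  # (15, 3): identical spacing inputs at each stage
```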
5.4.2 Rock mechanical properties
The rock mechanical properties at each stage of the proposed wellbore trajectories were extracted
using the rock mechanical model built in Chapter 4. Briefly, this model was developed by
upscaling and 3D interpolating the density (RHOB), compressional sonic (DTP) and shear sonic
(DTS) logs of 180 vertical and deviated wells located in the study area. The DTS log was present
in only 14 of these wells and had to be synthetically generated for the other 166 using the neural
network developed in Chapter 3. Because the stage count per well can be either 10, 15 or 30
depending on the scenario, the stage locations, and therefore the extracted rock mechanical
properties, differ with stage count. For each proposed well we extracted three sets of rock
inputs, one for each stage count.
5.4.3 Completion Parameters
For this study the fluid type, fluid amount and proppant amount are the completion parameters.
Just like the stage count, the completion parameters had a range of values in the sensitivity
analysis. The ranges of fluid and proppant amounts, as well as the proppant density, were limited
to the values seen in the existing 74 wells. This was done because the model was trained on the
input parameters of the 74 existing wells, and extrapolating outside the ranges that already
exist may lead to large errors. Table 5.2 shows the ranges of all the variable input parameters
used for the sensitivities as well as the increments used. On average, the 26 existing slickwater
wells used 650 m3 of liquid per stage while the 48 existing crosslinked gel wells used 145 m3.
Due to the much higher stage water volumes of the slickwater wells, we split the input water
amount into two ranges, one for each fluid type.
Table 5.2: The ranges of all the variable input parameters and the increments used for the
sensitivity experiment.
Parameter: Range
Stage Count: 10, 15 or 30
Fluid Type: Slickwater or crosslinked gel
Proppant Density: 85-1,050 kg/m3
Proppant Amount per Stage: 40-300 tonnes (20 tonne increments)
Fluid Amount per Stage: Crosslinked: 90-260 m3 (10 m3 increments);
Slickwater: 250-1,000 m3 (50 m3 increments)
5.4.4 Input shape and normalization
As depicted in Figure 4.7, the input to the neural network is a series of tables where the columns
represent the stage variables and the rows represent the stages. Each value in the table represents a
particular input variable at a particular stage. The maximum number of successful stages per well
in the existing 74 wells was 31. Since the network was trained on the existing wells, the input shape
cannot change, so wells with fewer than 31 successful stages simply had a “0” value for the
stages they did not have. All the input variables were normalized to values between 0 and 1,
where 0 represents the minimum value of that parameter and 1 represents the maximum from the
existing 74 wells.
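The padding and scaling described above can be sketched as follows. The values and the two columns are placeholders; the real tables carry one column per stage variable listed in Table 5.1.

```python
# Sketch of zero-padding to the 31-stage input shape and min-max scaling
# against training-set extremes. All numbers are placeholders.
import numpy as np

MAX_STAGES = 31

def prepare_input(stage_table, train_min, train_max):
    """Scale each column to [0, 1] using training extremes, then pad to 31 rows."""
    scaled = (stage_table - train_min) / (train_max - train_min)
    pad_rows = MAX_STAGES - scaled.shape[0]
    return np.vstack([scaled, np.zeros((pad_rows, scaled.shape[1]))])

table = np.array([[100.0, 650.0],        # 2 real stages, 2 variables
                  [120.0, 700.0]])
x = prepare_input(table,
                  train_min=np.array([80.0, 90.0]),
                  train_max=np.array([300.0, 1000.0]))
print(x.shape)   # (31, 2); rows beyond the 2 real stages are all zero
```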
5.5 Neural Network Algorithms and Experimental Setup
The neural network used in this study was developed in Chapter 4. In that chapter, the network was
trained on the 74 existing producers in the area using a supervised learning approach. During
supervised learning, both the inputs and the outputs are provided to the network. The inputs are
the well spacing and completion order, the rock mechanical properties, and the completion
parameters. The output is the 5-year cumulative production profile in one-year increments. The
network predicts an output given the inputs; this predicted output is compared to the actual output,
and the error is backpropagated through the system, adjusting the weight parameters. The weight
parameters start with random values but are adjusted as more of the well inputs and outputs
are shown to the network. Once the network has seen the entire dataset hundreds of times over, it
begins to find general trends and patterns within the study area and the overall prediction error
rate drops (Reed and Marks, 1998). Once the network is trained to a sufficiently low error rate, it
can be used to make predictions.
The trained network cannot distinguish between existing and proposed wells; as long as the input
data is in the correct format, the network will make a prediction. The trained network can be used
to see how the 5-year cumulative production of a proposed well is affected when the inputs are
changed, i.e., it can be used to run sensitivities. As long as the proposed wells are located within
the study area (where the geology is similar to what the network has been trained on), of similar
length, and of similar completion designs, the aggregate error rate should be similar to that of the
existing producers. Figure 4.12 shows how the error of the model drops as more wells are aggregated;
at 40 wells the error rate should be around 2%.
The model used in this study is a multi-headed convolutional-recurrent hybrid network (c-RNN).
The c-RNN hybrid network was chosen because it combines the speed and ability to
process large amounts of data of a convolutional network with the sequence processing ability of
a recurrent network. The c-RNN hybrid was shown to outperform traditional neural
networks in Chapter 3. In multi-headed architectures, each input variable is handled by a separate
convolutional network (head) and the outputs of these heads are merged and
input into a recurrent network before a prediction is made; these types of models offer better
performance (Bagnall, 2015). The network was programmed in Python using the keras library
(Chollet, 2015). The architecture of the network is shown in Figure 4.6 and the network
hyperparameters are shown in Table 4.3.
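The multi-headed layout can be sketched in Keras as follows. This is an illustrative reconstruction, not the thesis's actual network: the head count, filter sizes, LSTM width, and the use of RepeatVector to bridge the merged features into the recurrent layer are all placeholder choices; the real architecture and hyperparameters are given in Figure 4.6 and Table 4.3.

```python
# Minimal multi-headed c-RNN sketch: one Conv1D "head" per stage variable,
# merged features fed into an LSTM that emits a yearly cumulative profile.
# Layer sizes here are placeholders, not the thesis's tuned values.
from tensorflow.keras import layers, models

N_STAGES, N_VARS, N_YEARS = 31, 12, 5   # 12 input variables is illustrative

inputs, heads = [], []
for _ in range(N_VARS):                  # one convolutional head per variable
    inp = layers.Input(shape=(N_STAGES, 1))
    h = layers.Conv1D(16, kernel_size=3, activation="relu")(inp)
    h = layers.Flatten()(h)
    inputs.append(inp)
    heads.append(h)

merged = layers.concatenate(heads)                # merge the head outputs
merged = layers.RepeatVector(N_YEARS)(merged)     # hand merged features to the RNN
rnn = layers.LSTM(32, return_sequences=True)(merged)
out = layers.TimeDistributed(layers.Dense(1))(rnn)  # one value per year

model = models.Model(inputs=inputs, outputs=out)
model.compile(optimizer="adam", loss="mae")
model.summary()
```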
As depicted in Figure 4.7, the inputs to the neural network are a series of tables where the
columns represent the stage variables and rows represent the stages. Each value in this table
represents a particular input parameter at a particular stage. The maximum number of successful
stages per well in the existing 74 wells was 31. Since the network was trained on existing wells
the input shape cannot change, so wells with less than 31 successful stages simply had a “0”
value associated with the stages it did not have. All the input variables were normalized to values
between 0 and 1, where 0 represents the minimal value in that parameter and 1 represents the
maximum from the existing 74 wells.
The network was used to run 1,080 sensitivities to explore how different combinations of stage
count, fluid type, water amount, and proppant amount affect the aggregated 5-year cumulative
gas production of the 40 proposed wells.
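The enumeration of these runs can be sketched as below. This is a hypothetical reconstruction: the grid follows Table 5.2, and the count printed is the full grid before any screening (the blank cells in Tables 5.3 and 5.4 indicate that combinations outside the existing proppant density range were excluded, presumably reducing the total to the 1,080 runs reported).

```python
# Sketch of the sensitivity grid from Table 5.2. Each surviving combination
# would be formatted into stage tables and passed to the trained network.
from itertools import product

stage_counts = (10, 15, 30)
fluid_ranges = {
    "crosslinked": range(90, 261, 10),    # m3 per stage, 10 m3 increments
    "slickwater": range(250, 1001, 50),   # m3 per stage, 50 m3 increments
}
proppant = range(40, 301, 20)             # tonnes per stage, 20 t increments

runs = [
    (stages, fluid, water, sand)
    for stages in stage_counts
    for fluid, waters in fluid_ranges.items()
    for water in waters
    for sand in proppant
]
print(len(runs))  # full grid size before any density screening
```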
5.6 Results and Discussion
Tables 5.3 and 5.4 show the results from the 1,080 sensitivity runs; the tables depict how the
aggregated 5-year cumulative production (referred to from here on as “the output”) from the proposed
40 wells is affected by changes in the inputs. Tables 5.3a, 5.3b and 5.3c depict the crosslinked gel
results using 10, 15 and 30 stages per well, respectively. Tables 5.4a, 5.4b and 5.4c depict the
slickwater results using 10, 15 and 30 stages per well, respectively.
Out of the 74 existing wells, 48 were completed using crosslinked gel and 26 were
completed using slickwater. Crosslinked gel was the original fluid choice and was used from
2006 to 2014, after which point all wells were fractured using slickwater. Since the slickwater
stimulations used newer technology, they tended to have more stages. The stage count for the
crosslinked gel wells ranged from 5 to 17 with an average of 10. The stage count for the
slickwater wells ranged from 9 to 31 with an average of 18. For this study, the
cumulative production is forecasted using stage counts of 10, 15 and 30 for both fluids. Since the
largest crosslinked gel stage count was 17, forecasting using 30 stages per crosslinked gel well
is an extrapolation. Only two of the existing slickwater wells have stage counts lower than
14, so forecasting using 10 stages per slickwater well is also an extrapolation. The 15-stage
results should be used when comparing the two types of fluids, as these are not extrapolations
and have the best basis for comparison.
Table 5.3: Results from the sensitivity analysis using the crosslinked gel as the fracture fluid
Table 5.3a – 10 Stages Crosslinked Gel (Results in Bcf)
Column 1 - Proppant amount per stage (tonnes), Column 2 - Well proppant amount (tonnes)
Row 1 - Fluid injected per stage (m3), Row 2 - Fluid injected per well (m3)
90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260
900 1,000 1,100 1,200 1,300 1,400 1,500 1,600 1,700 1,800 1,900 2,000 2,100 2,200 2,300 2,400 2,500 2,600
40 400 193.0 193.1 193.2 193.3 193.5 193.6 193.8 194.1 194.4 194.7 195.0 195.3 195.5 195.7 195.9 196.1 196.3 196.5
60 600 194.0 194.1 194.2 194.3 194.5 194.6 194.8 195.1 195.4 195.7 196.0 196.3 196.5 196.8 197.0 197.2 197.4 197.6
80 800 197.0 197.1 197.2 197.3 197.4 197.6 197.8 198.1 198.4 198.7 199.0 199.3 199.5 199.7 200.0 200.2 200.4 200.6
100 1,000 203.8 203.9 204.0 204.1 204.2 204.4 204.8 205.1 205.4 205.7 205.9 206.2 206.4 206.6 206.8 207.1 207.3
120 1,200 204.6 204.7 204.8 205.0 205.3 205.7 206.0 206.3 206.5 206.8 207.0 207.2 207.4 207.6 207.8
140 1,400 202.4 202.6 202.9 203.2 203.5 203.8 204.1 204.3 204.5 204.8 205.0 205.2 205.4
160 1,600 200.4 200.7 201.0 201.3 201.5 201.7 202.0 202.2 202.4 202.6 202.8
180 1,800 198.6 198.9 199.1 199.4 199.6 199.8 200.0 200.2 200.4
200 2,000 197.0 197.3 197.5 197.7 197.9 198.1 198.3
220 2,200 195.4 195.6 195.8 196.0 196.2 196.4
240 2,400 194.0 194.2 194.4 194.6
260 2,600 192.8 193.0
Table 5.3b – 15 Stages Crosslinked Gel (Results in Bcf)
Column 1 - Proppant amount per stage (tonnes), Column 2 - Well proppant amount (tonnes)
Row 1 - Fluid injected per stage (m3), Row 2 - Fluid injected per well (m3)
90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260
1,350 1,500 1,650 1,800 1,950 2,100 2,250 2,400 2,550 2,700 2,850 3,000 3,150 3,300 3,450 3,600 3,750 3,900
40 600 195.2 195.4 195.5 195.6 195.8 195.9 196.2 196.5 196.9 197.3 197.6 197.9 198.2 198.4 198.7 198.9 199.1 199.4
60 900 198.9 199.1 199.2 199.3 199.5 199.6 199.8 200.2 200.6 201.0 201.3 201.6 201.8 202.1 202.4 202.6 202.8 203.1
80 1,200 204.8 204.9 205.1 205.2 205.3 205.5 205.7 206.1 206.5 206.8 207.1 207.4 207.7 208.0 208.2 208.4 208.7 208.9
100 1,500 214.8 214.9 215.0 215.2 215.3 215.5 215.9 216.3 216.6 216.9 217.2 217.4 217.7 217.9 218.1 218.4 218.6
120 1,800 217.5 217.6 217.8 218.0 218.3 218.7 219.0 219.3 219.6 219.8 220.1 220.3 220.5 220.7 221.0
140 2,100 216.8 217.1 217.4 217.7 218.1 218.4 218.6 218.9 219.1 219.3 219.6 219.8 220.0
160 2,400 216.2 216.6 216.9 217.2 217.5 217.7 218.0 218.2 218.4 218.6 218.8
180 2,700 215.8 216.0 216.3 216.5 216.8 217.0 217.2 217.4 217.6
200 3,000 215.2 215.5 215.7 215.9 216.1 216.3 216.5
220 3,300 214.5 214.7 214.9 215.1 215.3 215.5
240 3,600 213.9 214.1 214.3 214.5
260 3,900 213.4 213.6
Blank cells are out of the existing proppant density range.
Table 5.3c – 30 Stages Crosslinked Gel (Results in Bcf)
Column 1 - Proppant amount per stage (tonnes), Column 2 - Well proppant amount (tonnes)
Row 1 - Fluid injected per stage (m3), Row 2 - Fluid injected per well (m3)
90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260
2,700 3,000 3,300 3,600 3,900 4,200 4,500 4,800 5,100 5,400 5,700 6,000 6,300 6,600 6,900 7,200 7,500 7,800
40 1,200 222.7 222.9 223.0 223.2 223.3 223.5 223.8 224.2 224.7 225.1 225.5 225.9 226.3 226.7 227.0 227.4 227.7 228.1
60 1,800 226.8 226.9 227.0 227.2 227.4 227.5 227.8 228.2 228.6 229.0 229.4 229.8 230.1 230.5 230.8 231.2 231.5 231.8
80 2,400 232.5 232.6 232.7 232.9 233.0 233.1 233.4 233.7 234.1 234.5 234.9 235.2 235.5 235.8 236.1 236.4 236.7 237.0
100 3,000 241.3 241.4 241.5 241.6 241.7 241.9 242.1 242.5 242.7 243.0 243.3 243.5 243.8 244.0 244.2 244.5 244.7
120 3,600 243.9 244.0 244.1 244.3 244.5 244.8 245.1 245.3 245.5 245.8 246.0 246.2 246.4 246.6 246.8
140 4,200 245.1 245.3 245.5 245.8 246.1 246.3 246.6 246.8 247.0 247.2 247.5 247.7 247.9
160 4,800 246.5 246.8 247.0 247.3 247.5 247.7 248.0 248.2 248.4 248.6 248.9
180 5,400 247.7 248.0 248.2 248.4 248.7 248.9 249.1 249.3 249.5
200 6,000 248.9 249.2 249.4 249.6 249.8 250.0 250.2
220 6,600 249.8 250.1 250.3 250.5 250.7 250.9
240 7,200 250.9 251.1 251.3 251.5
260 7,800 251.8 252.0
Blank cells are out of the existing proppant density range.
Table 5.4: Results from the sensitivity analysis using the slickwater as the fracture fluid.
Table 5.4a – 10 Stages Slickwater (Results in Bcf)
Column 1 - Proppant amount per stage (tonnes), Column 2 - Well proppant amount (tonnes)
Row 1 - Fluid injected per stage (m3), Row 2 - Fluid injected per well (m3)
250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000
2,500 3,000 3,500 4,000 4,500 5,000 5,500 6,000 6,500 7,000 7,500 8,000 8,500 9,000 9,500 10,000
40 400 225.2 226.1 226.7 226.7 226.5 226.1 225.6 225.0 224.3 223.7 223.0 222.3 221.6 221.0 220.3 219.7
60 600 225.5 226.4 227.0 227.1 226.9 226.5 226.1 225.5 224.8 224.2 223.5 222.9 222.2 221.6 220.9 220.3
80 800 227.5 228.4 229.0 229.1 228.9 228.6 228.2 227.6 227.0 226.4 225.7 225.1 224.5 223.9 223.3 222.7
100 1,000 232.9 233.7 234.3 234.5 234.3 234.0 233.6 233.1 232.5 231.9 231.4 230.8 230.2 229.6 229.1 228.5
120 1,200 232.7 233.5 234.1 234.2 234.1 233.8 233.4 232.9 232.4 231.8 231.2 230.7 230.1 229.6 229.1 228.5
140 1,400 229.6 230.4 231.0 231.2 231.0 230.7 230.4 229.9 229.3 228.8 228.2 227.6 227.1 226.6 226.0 225.5
160 1,600 226.3 227.1 227.7 227.8 227.7 227.4 227.1 226.5 226.0 225.5 224.9 224.4 223.8 223.3 222.8 222.3
180 1,800 223.0 223.8 224.4 224.5 224.4 224.1 223.8 223.3 222.8 222.2 221.7 221.2 220.7 220.2 219.7 219.2
200 2,000 219.8 220.6 221.2 221.4 221.2 221.0 220.7 220.2 219.7 219.2 218.7 218.2 217.7 217.3 216.8 216.4
220 2,200 216.9 217.7 218.3 218.4 218.3 218.1 217.8 217.4 216.9 216.4 216.0 215.5 215.1 214.7 214.2 213.8
240 2,400 214.1 214.9 215.5 215.7 215.6 215.4 215.1 214.7 214.3 213.9 213.4 213.0 212.6 212.2 211.8 211.5
260 2,600 211.5 212.3 213.0 213.1 213.1 212.9 212.6 212.3 211.9 211.5 211.1 210.7 210.3 210.0 209.6 209.3
280 2,800 209.9 210.6 210.8 210.7 210.6 210.3 210.0 209.6 209.2 208.9 208.5 208.2 207.8 207.5 207.2
300 3,000 207.7 208.3 208.5 208.5 208.4 208.1 207.8 207.5 207.1 206.8 206.5 206.2 205.9 205.6 205.4
Table 5.4b – 15 Stages Slickwater (Results in Bcf)
Column 1 - Proppant amount per stage (tonnes), Column 2 - Well proppant amount (tonnes)
Row 1 - Fluid injected per stage (m3), Row 2 - Fluid injected per well (m3)
250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000
3,750 4,500 5,250 6,000 6,750 7,500 8,250 9,000 9,750 10,500 11,250 12,000 12,750 13,500 14,250 15,000
40 600 199.6 200.7 201.6 202.0 202.1 202.2 202.2 202.1 202.0 201.9 201.8 201.7 201.6 201.5 201.4 201.3
60 900 203.2 204.3 205.2 205.5 205.7 205.8 205.8 205.7 205.6 205.5 205.4 205.3 205.3 205.2 205.1 205.0
80 1,200 208.8 209.9 210.7 211.1 211.3 211.4 211.4 211.3 211.3 211.2 211.1 211.0 210.9 210.8 210.8 210.7
100 1,500 218.1 219.1 219.9 220.3 220.4 220.5 220.6 220.5 220.4 220.3 220.3 220.2 220.1 220.0 220.0 219.9
120 1,800 220.4 221.4 222.2 222.6 222.7 222.8 222.9 222.8 222.7 222.7 222.6 222.5 222.4 222.4 222.3 222.2
140 2,100 219.6 220.6 221.4 221.7 221.9 222.0 222.0 222.0 221.9 221.8 221.7 221.7 221.6 221.5 221.4 221.4
160 2,400 218.5 219.5 220.3 220.7 220.9 220.9 221.0 220.9 220.9 220.8 220.7 220.6 220.5 220.5 220.4 220.3
180 2,700 217.5 218.5 219.3 219.6 219.8 219.9 219.9 219.9 219.8 219.7 219.6 219.6 219.5 219.4 219.4 219.3
200 3,000 216.5 217.5 218.3 218.6 218.8 218.9 218.9 218.9 218.8 218.7 218.7 218.6 218.5 218.5 218.4 218.4
220 3,300 215.6 216.6 217.4 217.7 217.9 218.0 218.0 218.0 217.9 217.9 217.8 217.7 217.7 217.7 217.6 217.6
240 3,600 214.7 215.7 216.5 216.8 217.0 217.1 217.2 217.1 217.1 217.0 217.0 216.9 216.9 216.8 216.8 216.8
260 3,900 213.9 214.8 215.6 216.0 216.2 216.3 216.3 216.3 216.3 216.2 216.2 216.1 216.1 216.1 216.0 216.0
280 4,200 214.0 214.8 215.2 215.3 215.4 215.5 215.5 215.4 215.4 215.4 215.3 215.3 215.3 215.3 215.3
300 4,500 213.2 214.0 214.4 214.5 214.6 214.7 214.7 214.7 214.7 214.6 214.6 214.6 214.6 214.6 214.6
Table 5.4c – 30 Stages Slickwater (Results in Bcf)
Proppant amount per stage (tonnes) | Well proppant amount (tonnes)
Fluid injected per stage (m3): 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000
Fluid injected per well (m3): 7,500 9,000 10,500 12,000 13,500 15,000 16,500 18,000 19,500 21,000 22,500 24,000 25,500 27,000 28,500 30,000
40 1,200 233.2 234.7 236.0 236.8 237.4 237.9 238.3 238.7 239.0 239.3 239.6 239.9 240.2 240.4 240.7 240.9
60 1,800 236.5 237.9 239.1 239.8 240.4 240.9 241.3 241.6 241.9 242.2 242.5 242.8 243.1 243.3 243.6 243.8
80 2,400 241.0 242.2 243.2 243.9 244.4 244.9 245.2 245.5 245.8 246.1 246.4 246.6 246.9 247.1 247.4 247.6
100 3,000 247.6 248.4 249.1 249.6 250.0 250.3 250.6 250.9 251.1 251.3 251.6 251.8 252.0 252.2 252.4 252.6
120 3,600 249.4 250.1 250.7 251.2 251.5 251.8 252.1 252.3 252.5 252.7 252.9 253.1 253.3 253.5 253.7 253.8
140 4,200 250.5 251.3 251.9 252.3 252.7 253.0 253.3 253.5 253.7 253.9 254.0 254.2 254.4 254.5 254.7 254.8
160 4,800 251.5 252.3 253.0 253.4 253.8 254.1 254.3 254.5 254.7 254.9 255.0 255.2 255.3 255.5 255.6 255.7
180 5,400 252.4 253.2 253.8 254.3 254.6 254.9 255.1 255.3 255.5 255.6 255.8 255.9 256.1 256.2 256.3 256.4
200 6,000 253.0 253.8 254.5 255.0 255.3 255.6 255.8 255.9 256.1 256.3 256.4 256.6 256.7 256.9 257.0 257.1
220 6,600 253.7 254.5 255.2 255.6 256.0 256.2 256.4 256.6 256.8 257.0 257.1 257.3 257.4 257.5 257.6 257.8
240 7,200 254.4 255.2 255.8 256.3 256.6 256.9 257.1 257.3 257.4 257.6 257.7 257.9 258.0 258.1 258.2 258.3
260 7,800 254.9 255.8 256.4 256.9 257.2 257.4 257.6 257.8 257.9 258.1 258.2 258.4 258.5 258.6 258.7 258.8
280 8,400 256.2 256.9 257.3 257.6 257.9 258.1 258.2 258.3 258.5 258.6 258.7 258.8 259.0 259.1 259.2
300 9,000 256.6 257.2 257.7 257.9 258.2 258.4 258.5 258.6 258.8 258.9 259.0 259.1 259.2 259.3 259.4
Figure 5.2 compares how the 15-stage output for both the crosslinked gel and slickwater is affected by changes in proppant and fluid amounts. The crosslinked gel shows a small improvement in output as the fluid per stage is increased, while the slickwater case peaks at 400 m3/stage with no further improvement. For both fluids, the output peaks at 120 tonnes of proppant per stage; when more proppant is added, the output begins to drop.
Figure 5.3 shows how the stage count affects the output; for each stage count, separate lines represent the varying fluid amounts. With crosslinked gel, the output increases with stage count. The 10-stage slickwater output outperformed the 15-stage slickwater output; as mentioned previously, the 10-stage slickwater cases are likely extrapolated and may be unreliable. The 30-stage crosslinked gel output is also extrapolated, but it follows a trend similar to that of the 30-stage slickwater output.
Figure 5.4 compares the 15-stage output of the crosslinked gel and slickwater wells. The results are close between the two fluid types, especially when the fluid amount per stage is similar (for example, 260 m3/stage for crosslinked gel versus 250 m3/stage for slickwater).
Figure 5.2: Aggregated cumulative 5-year production vs proppant amount and fluid amount per
stage using 15 stages per well and the crosslinked gel (top) and the slickwater (bottom) as the
fracture fluid.
Figure 5.3: Effect of stage count per well on the aggregated cumulative 5-year production using crosslinked gel (top) and slickwater (bottom) as the fracture fluid.
Figure 5.4: Crosslinked gel vs slickwater results for 15 stage count.
To effectively observe how the injected water amounts affect the output, the cases with 120 tonnes of proppant per stage were examined, since this is the amount at which most of the cases reach either their maximum or an inflection point. Figure 5.5 shows the output versus the total water injected for various stage counts for both crosslinked gel and slickwater wells using 120 tonnes of proppant per stage. These results show that output increases with stage count (with the exception of the 10-stage slickwater treatment), that output is only minimally impacted when more fluid is injected per stage, and that beyond a certain amount additional fluid no longer improves output.
Figure 5.5: Aggregated cumulative production versus the total water injected for various stage
counts of crosslinked gel and slickwater wells using 120 tonnes of proppant per stage.
The maximum output from the sensitivities was 259 Bcf. If maximum output were the ultimate goal, the best way to complete the 40 wells would be with slickwater, 30 stages, 1,000 m3 of fluid per stage, and 300 tonnes of proppant per stage. This would require a total of 1,200,000 m3 of injected fluid and 360,000 tonnes of proppant. If, however, the goal is to minimize the fluid input, the best strategy would be to complete with crosslinked gel, 10 stages, 120 m3 of fluid per stage, and 120 tonnes of proppant per stage, for a total of 48,000 m3 of injected fluid and 48,000 tonnes of proppant. This type of completion would yield an output of 205 Bcf, which is 79% of the maximum possible output, while using only 4% of the water and 13% of the proppant of the maximum case.
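As a quick sanity check, the totals quoted above can be reproduced with a few lines of arithmetic. This is an illustrative sketch, not code from the thesis; the function name and structure are mine, and all inputs come from the two completion strategies described in the text.

```python
# Hedged sanity check of the fluid and proppant totals quoted in the text.
N_WELLS = 40

def field_totals(stages, fluid_m3_per_stage, proppant_t_per_stage, n_wells=N_WELLS):
    """Return (total injected fluid in m3, total proppant in tonnes) for the field."""
    return (n_wells * stages * fluid_m3_per_stage,
            n_wells * stages * proppant_t_per_stage)

max_fluid, max_prop = field_totals(30, 1000, 300)  # slickwater, maximum-output case
min_fluid, min_prop = field_totals(10, 120, 120)   # crosslinked gel, minimum-fluid case

print(max_fluid, max_prop)                 # 1200000 360000
print(min_fluid, min_prop)                 # 48000 48000
print(round(100 * min_fluid / max_fluid))  # 4  (% of the water)
print(round(100 * min_prop / max_prop))    # 13 (% of the proppant)
print(round(100 * 205 / 259))              # 79 (% of the maximum output)
```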
Since the goal of this study is to minimize water injection, a new parameter can be defined, the injection efficiency:

Injection Efficiency (Mcf/m3) = Aggregated 5-year cumulative production (Mcf) / Total fluid injected in all wells (m3)
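The definition above translates directly into code. The function below is a minimal sketch (the name and the 1 Bcf = 1,000,000 Mcf conversion are mine, not from the thesis), applied to the water-conservative case quoted earlier.

```python
def injection_efficiency(cum_production_bcf, total_fluid_m3):
    """Injection efficiency in Mcf/m3 (1 Bcf = 1,000,000 Mcf)."""
    return cum_production_bcf * 1_000_000 / total_fluid_m3

# Water-conservative example from the text: 205 Bcf from 48,000 m3 injected.
print(round(injection_efficiency(205, 48_000)))  # 4271 Mcf/m3
```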
Figure 5.6 shows how the injection efficiency for different stage counts and fluid types changes as more fluid is injected. The results show that crosslinked gel has a better injection efficiency than slickwater and that injection efficiency decreases as more stages are added to a well. This makes sense, since crosslinked gel treatments use far less water than slickwater, and the lower the stage count, the less water is used.
Figure 5.6: Injection efficiency for different stage counts and fluid types versus total fluid injected.
If the goal were to maximize injection efficiency, the best strategy would be to complete with crosslinked gel, 10 stages, 90 m3 of fluid per stage, and 80 tonnes of proppant per stage, for a total injected fluid of only 36,000 m3 and 32,000 tonnes of proppant. This type of completion would yield an output of 197 Bcf, which is 76% of the maximum possible output, while using only 3% of the water and 9% of the proppant of the maximum case.
With all proposed well forecasts, it is worth cross-checking the outputs against the existing wells. Figure 5.7 shows the individual-well 5-year cumulative production and injection efficiency for the 74 existing crosslinked gel and slickwater wells versus total fluid injected. The injection efficiency of the crosslinked gel wells is much higher than that of the slickwater wells, and the results are similar to the injection efficiency curves generated from the sensitivity runs in Figures 5.5 and 5.6.
Figure 5.7: 5-year cumulative production (top) and injection efficiency (bottom) for the 74 existing crosslinked gel and slickwater wells versus total fluid injected.
5.7 Conclusions
In this study we used a machine learning model developed and trained on 74 existing wells in the Montney Formation in Canada to run sensitivities on 40 proposed wells to be drilled adjacent to the existing producers. A total of 1,080 sensitivities were run to explore how different combinations of stage count, fluid type, water amount, and proppant amount affect the aggregated 5-year cumulative gas production of the 40 proposed wells.
The results of the sensitivities show that injecting greater amounts of water during a hydraulic fracture does result in slightly more cumulative production in both slickwater and crosslinked gel completions. Injection efficiency, on the other hand, drops exponentially with greater water injection. The results show that it is possible to achieve a cumulative production that is 76% of the maximum by using only 3% of the water required to achieve that maximum. They also show that cumulative production peaks at about 120 tonnes of proppant per stage for wells with 10 or 15 stages, and only marginally higher for wells with 30 stages.
The model in this study can be used to find the optimal stage count, fluid type, water and
proppant usage when adding new wells to a field with existing development. The model can be
used for any formation as long as there are sufficient vertical wells with rock mechanical logs in
the area. The accuracy of the model increases when the field has more existing development.
Over the last decade the industry has shifted away from crosslinked gel in favor of slickwater; at the same time, stage count per well and water injected per stage have also increased. This has led to a dramatic increase in water usage, which has strained local water resources and increased the need for wastewater management. This study has shown that bigger is not necessarily better and that slickwater with many stages should not be the automatic choice for completion design, even though it is the most popular. Conservation of water should be a top consideration together with cost and profit, and every aspect of completion design should be carefully considered for every type of play. The choice should be driven by the resulting production history from the field and should evolve as more data is generated. Several types of completion design should be experimented with to see which has the highest injection efficiency at the lowest cost.
Chapter 6: Using a Convolutional-Recurrent Neural Network Forecasting Model to Optimize the
Positioning of New Wells in a Partially Developed Field
6.1 Preface
This chapter was submitted as a manuscript to the Journal of Petroleum Science & Engineering in 2020. The article is co-authored by Ian D. Gates.
6.2 Abstract
The ultimate recoverable hydrocarbon volumes and economic value of a partially developed field are controlled by the positioning of future wells. The placement of wells involves an understanding of the geology and heterogeneity of the resource as well as the costs of surface pads and facilities. In this study, a convolutional-recurrent neural network (c-RNN) was developed and trained to forecast shale gas production given well positions, spacing parameters, and geomechanical properties within the reservoir, as well as the completion strategy: either a water-intense 30-stage slickwater treatment or a water-conservative 15-stage crosslinked gel treatment. The forecasting model is used to guide the optimal positioning of 20 new wells selected from 40 possible well positions; there are exactly 137,846,528,820 ways to position 20 wells in 40 predetermined positions, which defines the well placement space. Running this many positioning combinations is computationally infeasible. Here, an approach is described that optimizes well placement using a subspace of the well placement space. The results show that with only 100 random combinations, a normal distribution describing the entire well placement space begins to form. The approach permits a simple and intuitive method to find a well placement combination with a high aggregated 5-year cumulative gas production volume and demonstrates a practical application of the c-RNN forecasting model.
6.3 Introduction
Maximizing hydrocarbon recovery while minimizing costs has always been a priority for the oil and gas industry, and with green fields becoming increasingly rare, optimization of developed fields has become of great interest to operators. If completion design is held constant in a particular area, two factors drive the production performance of a future well: (1) geology and (2) well placement (location, orientation, and depth or trajectory). The geological properties of a reservoir, such as permeability, porosity, fluid saturation, rock mechanical properties, and in-situ stresses, have a large effect on how fracture networks form and interact during stimulation and how fluids flow through the formation during production. Geological properties are also heterogeneous and can vary dramatically even over distances on the order of one hundred meters. Thus, no two wells in a formation will have exactly the same geological profile along the wellbore.
Well placement, which gives rise to well spacing, is another important factor because wells drilled near each other can share drainage areas, which can negatively impact the production of both wells. This is especially true in multistage hydraulically fractured horizontal wells, since the fracture networks of neighboring wells can easily link up. Well spacing depends ultimately on geology: in less permeable rock, wells can be drilled closer together than in reservoirs with high permeability. Because of geological heterogeneity, the position of a wellbore path within a reservoir plays an important role in the future production performance of that well. Most fields are only partially developed, which implies that the hydrocarbon volume left to produce is driven by the position of future wells. Therefore, a method that can optimize future well placement would be of great use in field development.
Historically, the most common way for industry to find optimal well placements has been to use exhaustive physics-based reservoir simulation together with a set of candidate well locations, typically defined by the user. In that approach, a geological model and a numerical reservoir simulator are used to forecast production from all possible well placements in the candidate set to see which combination results in the highest cumulative field production, economic value, or both (Jang et al., 2018). This approach, however, is time consuming, as full-field simulation models typically contain tens of millions of grid blocks, requiring large compute times to evaluate multiple development options. To overcome this limitation, various optimization techniques have been developed, including genetic algorithms (GA), simulated annealing (SA), artificial neural networks (ANN), and particle swarm optimization (PSO) (Bittencourt and Horne, 1997; Centilmen et al., 1999; Yeten et al., 2002; Emerick et al., 2009; Salmachi et al., 2013; Onwunalu and Durlofsky, 2010). Despite these optimization methods, many of which are automated, the required computation time is still significant.
An optimized well placement strategy is only as good as the geological, reservoir, and forecasting models used to produce it. If the geological or reservoir model is poor, or the forecasting model is inaccurate or uncertain, then any well placements chosen by the optimization algorithm are unlikely to produce optimal volumes. Numerical reservoir simulators are a bottom-up approach (Mohaghegh, 2017) in which a geological model, or an ensemble of equiprobable models, is defined and the system is optimized against an objective function over the set of candidate well locations. In most cases, a single geological model is defined and constructed from upscaled data from wells that are kilometers apart, yielding an oversimplification of the reservoir, its geological properties, and the complexities of its fracture networks. Thus, many operators have little confidence in using numerical reservoir simulation optimization to forecast and place wells to maximize future production.
An alternative approach that has recently been getting attention is the use of machine learning algorithms. These algorithms make minimal assumptions about the structure or geology of the reservoir or how fluid flows through it. Rather, they are trained on existing data to find links between the inputs (such as geology, completion type, and spacing between wells) and the output (such as cumulative production). The richer the input data, the better the accuracy of the forecasting model, and the more capable the machine learning methods are of optimizing well placements.
In this study, we use a machine learning algorithm to determine where best to locate 20 wells amongst 40 potential locations. In the context of a company, this could be interpreted as having 40 well location options for a full field development but a budget for a partial development that allows only a subset of 20 of them. The objective is to maximize the aggregated 5-year cumulative gas production using only 20 of the well locations, taking into account the factors described above: geology, the order in which wells are drilled relative to their neighbors, spacing from adjacent older wells, and how these impact cumulative production.
There are 137,846,528,820 different combinations (the well placement space) in which 20 wells can be positioned among 40 predetermined positions. With a physics-based reservoir simulator, if each simulation took 10 minutes to run, it would take over 2,000,000 years to run all the cases. Even if only 1% of the well placement space were evaluated, this would take 20,000 years of simulations. Even with parallel processing capabilities, for example with 1,000 cores, a 1% subset evaluation would take 20 years. Thus, an option to do this in a meaningful time, say on the order of days to weeks, is desired. Machine learning offers such an option.
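The size of the well placement space and the run-time estimate above are easy to verify. The sketch below (assuming 10 minutes per simulation, as stated in the text) reproduces them.

```python
import math

# C(40, 20): number of ways to choose 20 well positions out of 40.
n_combinations = math.comb(40, 20)
print(n_combinations)  # 137846528820

# At 10 minutes of simulation per combination:
years = n_combinations * 10 / (60 * 24 * 365)
print(f"{years:,.0f}")  # on the order of 2.6 million years for the full space
```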
6.4 Study Area and Completion Scenarios
The study area is the same for all research chapters and is described in Chapter 1: introduction of
this thesis and shown in Figures 1.2 and 1.3. The 40 possible future well positions are shown in
Figure 5.1 and are the same as the 40 well positions that where used to run sensitivities on
completion designs in Chapter 5.
The forecasting model developed in this study uses three groups of input parameters at every stage along the wellbore to make a prediction of gas production. These three groups are:
• well spacing and completion order,
• rock mechanical properties, and
• completion parameters.
The individual parameters of each group are listed in Table 5.1.
6.4.1 Well spacing and completion order
The existing producers in this study are drilled close to each other, so the production of gas from one well affects the production of its neighbors. This was shown in Chapter 4, where the individual-well error rate of the forecasting model decreased from 16.3% to 14.9% when well spacing and completion order parameters (the time and gas volume a well produced in an unbounded state, the time and gas volume offset wells produced before a well was drilled, the percentage of a well's length covered by each offset well, and the average perpendicular distance between offsetting wells) were taken into account. Because well spacing and completion order were important, they were also used for the proposed wells in this study. The spacing parameters were based on the distances from a proposed well location to the existing producers as well as the distances between the proposed wells themselves. Well spacing and completion order are defined at the well level rather than the stage level, so the input values representing them are constant across every stage of a well.
6.4.2 Rock mechanical properties
The rock mechanical properties at each stage of the proposed wellbore trajectory were extracted using the rock mechanical model built in Chapter 4. Briefly, this model was developed by upscaling and 3D-interpolating the density (RHOB), compressional sonic (DTP), and shear sonic (DTS) logs of 180 vertical and deviated wells located in the study area. The DTS log was present in only 14 of these wells and had to be synthetically generated for the other 166 using the neural network developed in Chapter 3. Because the stage count per proposed well can be either 15 or 30 depending on the scenario, the rock mechanical properties differ with stage count; for each proposed well, two sets of rock inputs were extracted based on the number of stages.
6.4.3 Completion Parameters
For this study, the fluid type, fluid amount, and proppant amount are used as the completion parameters. Two completion cases are considered for each well: (1) water intense and (2) water conservative. This makes the optimization problem more challenging. The completion parameters for the two cases are as follows:
• Water intense:
o Fracture fluid - slickwater
o 30 stages per well
o 150 tonnes of proppant per stage
o 650 m3 of fluid per stage
• Water conservative:
o Fracture fluid - crosslinked gel
o 15 stages per well
o 100 tonnes of proppant per stage
o 150 m3 of fluid per stage
The stage count, proppant amount, and fluid amount for each of these cases were chosen based on the averages of the existing 26 slickwater wells and 40 crosslinked gel wells in the area of interest.
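For reference, the two completion scenarios above can be captured as a small configuration table; the dictionary keys are illustrative, not the thesis's actual data format. Note that the per-well water volumes differ by almost an order of magnitude.

```python
# The two completion cases above written out as data (field names are hypothetical).
SCENARIOS = {
    "water_intense": {
        "fluid_type": "slickwater", "stages": 30,
        "proppant_t_per_stage": 150, "fluid_m3_per_stage": 650,
    },
    "water_conservative": {
        "fluid_type": "crosslinked gel", "stages": 15,
        "proppant_t_per_stage": 100, "fluid_m3_per_stage": 150,
    },
}

for name, s in SCENARIOS.items():
    fluid_per_well = s["stages"] * s["fluid_m3_per_stage"]
    print(name, fluid_per_well, "m3 per well")
# water_intense: 19500 m3 per well; water_conservative: 2250 m3 per well
```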
6.5 Neural Network Algorithm for Gas Production Forecasting
The neural network used in this study was developed in Chapter 4. The network was trained on the 74 existing producers in the area using a supervised learning approach, in which both the inputs and the outputs are provided to the network. The inputs are the well spacing and completion order, the rock mechanical properties, and the completion parameters; the output is the 5-year cumulative production profile in one-year increments. The network predicts an output given the inputs, the predicted output is compared to the actual output, and the error is backpropagated through the network to adjust the weight parameters. The weights start with random values and are adjusted as more well inputs and outputs are shown to the network. Once the network has seen the entire dataset hundreds of times, it begins to find general trends and patterns within the study area and the overall prediction error rate drops (Reed and Marks, 1998). Once the network is trained to a sufficiently low error rate, it can be used to make predictions.
The trained network cannot distinguish between existing and proposed wells; as long as the input data is in the correct format, the network will make a prediction. The trained network can therefore be used to see how the 5-year cumulative production of a proposed well is affected when the inputs are changed, i.e., to run sensitivities. As long as the proposed wells are located within the study area (so the geology is similar to what the network was trained on), of similar length, and of similar completion design, the aggregate error rate should be similar to that of the existing producers. Figure 4.13 shows how the error of the model drops as more wells are aggregated; at 20 wells the error rate is expected to be around 2.8%.
The model used in this study is a multi-headed convolutional-recurrent hybrid network (c-RNN). The c-RNN hybrid was chosen because it combines the speed and large-data capacity of a convolutional network with the sequence-processing ability of a recurrent network, and it was shown to outperform traditional neural networks in Chapter 3. In a multi-headed architecture, each input variable is handled by a separate convolutional network (head), and the outputs of these heads are merged and fed into a recurrent network before a prediction is made; these types of models offer better performance (Bagnall, 2015). The network was programmed in Python using the Keras library (Chollet, 2015). The architecture of the network is shown in Figure 4.6 and the network hyperparameters are shown in Table 4.3.
As depicted in Figure 4.7, the inputs to the neural network are a series of tables in which the columns represent the stage variables and the rows represent the stages. Each value in a table represents a particular input parameter at a particular stage. The maximum number of successful stages per well among the existing 74 wells was 31. Since the network was trained on the existing wells, the input shape cannot change, so wells with fewer than 31 successful stages simply had a value of 0 for the stages they did not have. All input variables were normalized to values between 0 and 1, where 0 represents the minimum and 1 the maximum of that parameter across the existing 74 wells.
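The zero-padding and min-max normalization described above can be sketched as follows. This is a simplified illustration, assuming padding is applied after scaling; the thesis's preprocessing code is not shown, so the function and variable names here are mine.

```python
MAX_STAGES = 31  # most successful stages observed among the 74 training wells

def prepare_column(stage_values, train_min, train_max, max_stages=MAX_STAGES):
    """Scale one input variable to [0, 1] using the training-well extremes,
    then zero-pad wells with fewer than max_stages stages."""
    scaled = [(v - train_min) / (train_max - train_min) for v in stage_values]
    return scaled + [0.0] * (max_stages - len(scaled))

# A 3-stage well's proppant column, scaled against training min=0 t, max=200 t:
col = prepare_column([100, 150, 200], train_min=0, train_max=200)
print(col[:3], len(col))  # [0.5, 0.75, 1.0] 31
```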
6.6 Well Combinations and Procedure
Although there are 137,846,528,820 ways to position 20 wells in 40 predetermined positions, the approach used here requires only a very small number of predictions with the forecasting model. Two steps are used to create the sample space for evaluating well placement, as follows:
Step 1: 100 combinations of the well placements were chosen at random and the gas volumes were forecasted using the c-RNN model. Different positioning combinations of the 20 wells lead to different well spacing and completion order parameters, which must be recalculated for each arrangement of the 20 wells relative to themselves and to the existing wells. Recalculating these parameters is important because the produced gas volume depends on spacing. As an example, Figure 6.1 shows two random positioning combinations of the 20 wells, illustrating how different arrangements lead to different spacings between all wells. To generate the results, the spacing input parameters were updated for every well in each combination. For every combination, the c-RNN generated a unique 5-year cumulative gas production profile for each well, and the production of all 20 wells was then aggregated. The 100 well location combinations were run with both the slickwater (water-intense) and crosslinked gel (water-conservative) stimulation methods, giving 200 total combinations of well locations and stimulation designs. The purpose of the random combinations was to roughly approximate the normal distribution of all possible combinations. This approximate distribution serves as a background against which to compare the performance of other well combinations, and it is also used to validate or falsify claims of optimal well combinations.
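Step 1 amounts to sampling the well placement space uniformly at random. A minimal sketch of that sampling (position labels and function names are mine, not from the thesis):

```python
import random

POSITIONS = list(range(1, 41))  # the 40 candidate well positions

def random_combinations(n, k=20, seed=None):
    """Draw n distinct random placements of k wells from the 40 positions."""
    rng = random.Random(seed)
    combos = set()
    while len(combos) < n:  # duplicates are astronomically unlikely here
        combos.add(tuple(sorted(rng.sample(POSITIONS, k))))
    return [list(c) for c in combos]

combos = random_combinations(100, seed=42)
# Each combination would then be fed to the c-RNN with its spacing and
# completion-order parameters recalculated for that arrangement.
```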
Figure 6.1: One random combination of 20 wells (top), another random combination of 20 wells
(bottom).
Step 2: Given that there are over 100 billion combinations, it is unlikely that 100 random combinations would reveal which combination yields the highest or lowest cumulative gas. For this reason, a second set of combinations was examined, using the following procedure to generate four additional combinations:
1. Run a full field scenario using all 40 possible well positions with the slickwater (water-intense) stimulation; denote this set of cases as SW40.
2. Identify the 20 wells with the highest production volume in the SW40 set; denote this subset as SW20H. The remaining 20 wells form subset SW20L (the lowest-production wells). Figure 6.2 displays the SW20H and SW20L subsets.
3. Run a full field scenario using all 40 possible well positions with the crosslinked gel (water-conservative) stimulation; denote this set of cases as CG40.
4. Identify the 20 wells with the highest production volume in the CG40 set; denote this subset as CG20H. The remaining 20 wells form subset CG20L (the lowest-production wells). Figure 6.3 shows the CG20H and CG20L subsets.
This procedure generated four well combinations that were added to the 100 random combinations from Step 1, giving a total of 104 positioning combinations for each stimulation method. In total, 208 different aggregated 5-year cumulative gas production values were determined (104 per stimulation method). The point of generating these four additional combinations is that they provide a shortcut to the "approximate" lowest and highest combinations when only 20 wells are used. The shortcut is only approximate, however, because it is derived from a 40-well run whose spacing parameters involve all 40 wells. The four additional cases from Step 2 were therefore re-run using only the 20 wells, with both the slickwater and crosslinked gel stimulations, using spacing parameters recalculated for the 20 wells.
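The ranking step in the procedure above reduces to sorting the 40-well forecast and splitting it in half. A hedged sketch (the well labels and forecast values below are invented for illustration, not taken from the thesis results):

```python
def split_high_low(predicted_bcf_by_well):
    """Return the 20 highest- and 20 lowest-producing positions
    (the SW20H/SW20L- and CG20H/CG20L-style subsets)."""
    ranked = sorted(predicted_bcf_by_well,
                    key=predicted_bcf_by_well.get, reverse=True)
    return ranked[:20], ranked[20:]

# Illustrative 40-well forecast: well i predicted to produce 5 + 0.1*i Bcf.
forecast = {f"W{i:02d}": 5 + 0.1 * i for i in range(1, 41)}
high, low = split_high_low(forecast)
print(high[0], low[-1])  # W40 W01
```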
Figure 6.2: Best and worst 20 well positions found in the 40 well prediction using the slickwater
(water intense) stimulation.
Figure 6.3: Best and worst 20 well positions found in the 40 well prediction using the crosslinked
gel (water conservative) stimulation.
6.7 Results and Discussion
Tables 6.1 and 6.2 show the results of the 100 random positioning combinations along with the four additional combinations found by forecasting the drilling of all 40 wells in scenarios SW40 and CG40. The results in Table 6.1 are for the crosslinked gel (water-conservative) stimulation and the results in Table 6.2 are for the slickwater (water-intense) stimulation.
Table 6.1: Results from the sensitivity analysis using crosslinked gel as the fracture fluid.
20 Well Cumulative Gas Production (Bcf)
Combination Year 1 Year 2 Year 3 Year 4 Year 5
001 33 57 75 87 96
002 32 55 71 83 91
003 33 57 74 86 95
004 33 57 75 88 98
005 34 58 76 88 97
006 33 56 73 85 94
007 34 58 76 89 98
008 33 57 74 87 95
009 32 54 71 82 90
010 34 60 78 92 102
011 33 57 74 86 95
012 34 60 78 91 101
013 34 59 77 90 99
014 34 60 78 91 100
015 34 58 76 89 99
016 34 58 75 87 96
017 33 56 73 85 93
018 34 60 79 93 103
019 32 55 71 83 91
020 33 57 74 86 95
021 33 56 72 84 92
022 33 57 74 86 95
023 34 59 78 91 101
024 33 58 76 88 97
025 34 60 79 93 103
026 33 57 75 87 96
027 35 62 81 95 106
028 34 60 79 92 102
029 32 54 70 81 89
030 34 58 76 89 98
031 35 61 80 93 103
032 35 60 78 91 100
033 33 57 74 86 96
034 34 59 77 90 100
035 35 59 77 89 98
036 33 56 72 84 92
037 34 58 75 87 96
038 33 58 76 89 99
039 33 56 73 85 93
040 32 56 72 84 93
041 34 59 77 90 100
042 32 55 72 84 92
043 33 58 77 91 100
044 34 59 77 89 99
045 34 59 77 91 100
046 35 60 79 92 101
047 34 58 74 86 95
048 33 56 72 84 92
049 35 60 78 92 101
050 32 55 71 83 91
051 34 60 78 92 101
052 33 58 76 89 99
053 34 59 77 90 99
054 33 58 76 90 100
055 35 61 81 94 104
056 32 55 72 84 92
057 34 59 77 90 99
058 33 56 73 85 94
059 34 59 77 90 100
060 33 58 76 89 98
061 34 59 77 91 100
062 33 57 74 86 95
063 33 57 75 87 96
064 35 62 82 96 106
065 34 60 79 93 103
066 35 60 78 91 100
067 34 58 76 89 98
068 33 56 73 84 93
069 32 55 71 82 91
070 34 59 78 91 100
071 34 58 76 89 98
072 34 59 77 90 100
073 34 58 76 88 98
074 34 59 77 90 99
075 33 58 77 90 100
076 33 56 73 85 93
077 1 59 78 91 101
078 33 58 77 90 100
079 34 58 76 89 98
080 33 58 76 88 98
081 34 59 77 90 100
082 33 57 75 88 98
083 32 56 72 84 93
084 33 55 72 83 92
085 34 58 76 89 99
086 33 57 74 86 94
087 33 57 74 85 94
088 33 57 74 86 95
089 34 57 74 85 94
090 34 59 77 91 100
091 33 58 76 88 98
092 33 57 74 86 95
093 34 59 77 89 98
094 33 58 76 90 100
095 34 60 79 93 104
096 33 57 75 88 97
097 34 59 78 91 101
098 34 60 79 93 103
099 33 58 77 90 99
100 33 57 73 85 94
CG20H well combination 37 66 88 104 116
CG20L well combination 32 53 68 79 87
SW20H well combination 36 64 84 99 110
SW20L well combination 34 58 75 88 97
Table 6.2: Results from the sensitivity analysis using slickwater as the fracture fluid.
20 Well Cumulative Gas Production (Bcf)
Combination Year 1 Year 2 Year 3 Year 4 Year 5
001 34 62 84 102 115
002 34 60 80 96 109
003 33 59 81 98 111
004 33 59 80 97 110
005 35 64 86 104 118
006 34 60 81 97 109
007 33 60 82 99 113
008 34 61 82 99 111
009 32 57 77 92 104
010 34 63 86 105 119
011 34 61 82 100 113
012 34 62 85 103 117
013 34 61 83 101 114
014 34 61 84 102 116
015 35 63 86 105 119
016 35 62 85 103 117
017 33 60 82 99 112
018 35 63 87 106 120
019 34 60 81 97 109
020 35 62 84 101 115
021 33 59 80 97 110
022 34 61 82 99 112
023 33 60 83 100 114
024 33 59 81 98 111
025 35 64 87 106 120
026 33 59 81 97 110
027 35 64 87 106 121
028 34 61 83 101 115
029 32 58 79 95 107
030 35 63 86 104 118
031 35 64 87 106 120
032 36 65 89 108 123
033 32 58 79 96 109
034 33 60 81 98 111
035 35 62 84 101 115
036 34 61 83 99 112
037 35 61 82 99 112
038 33 60 82 100 114
039 35 61 82 99 111
040 33 58 79 95 107
041 34 62 84 102 116
042 34 61 82 99 112
043 33 60 82 100 114
044 34 61 83 101 114
045 33 60 83 100 114
046 35 64 88 107 122
047 35 63 85 102 115
048 34 61 82 99 112
049 34 63 86 105 119
050 33 60 81 98 110
051 36 64 87 105 120
052 34 62 85 103 118
053 35 62 85 102 116
054 33 59 81 98 111
055 35 64 87 106 121
056 32 57 77 93 105
057 35 62 85 102 116
058 33 60 81 97 110
059 34 62 84 102 116
060 33 60 82 99 112
061 34 61 83 101 115
062 36 64 86 104 117
063 34 61 83 100 113
064 35 64 89 108 124
065 34 63 86 105 120
066 35 64 87 105 119
067 34 62 84 102 115
068 35 62 84 101 113
069 33 59 79 95 107
070 34 63 86 104 119
071 35 63 86 104 118
072 33 60 82 99 112
073 35 62 84 102 115
074 34 62 84 101 115
075 33 60 82 99 113
076 35 61 83 99 112
077 34 61 83 101 114
078 33 59 80 97 110
079 34 62 84 101 115
080 32 58 79 96 110
081 35 63 86 104 119
082 33 59 80 97 111
083 32 58 79 96 108
084 33 59 79 94 106
085 34 61 84 102 117
086 34 60 81 98 111
087 34 61 83 100 113
088 34 60 82 99 112
089 34 61 83 99 112
090 34 61 83 101 114
091 33 59 81 98 111
092 34 61 82 99 113
093 35 63 86 104 118
094 33 60 83 100 114
095 34 61 84 102 116
096 33 60 81 98 111
097 35 63 87 105 119
098 34 62 85 103 117
099 34 62 85 103 117
100 35 62 84 101 114
SW20H well combination 36 66 91 111 127
SW20L well combination 34 61 83 100 114
CG20H well combination 36 66 91 112 129
CG20L well combination 34 60 80 96 108
Figure 6.4 shows two histograms (one for each stimulation case) depicting the distribution of the aggregated 5-year cumulative gas production of the 100 random positioning combinations, along with the 4 best/worst combinations generated by running the full field development scenario in Step 2 of the procedure. The 100 random combinations in the histograms are rough approximations of what the true normal distribution curve of all possible combinations would look like.
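The sampling step described above can be sketched as follows. The combination count is exact (137,846,528,820 = C(40, 20)), but the production function below is a stand-in placeholder, not the thesis's actual c-RNN forecast:

```python
import math
import random

# Total number of ways to choose 20 drilling positions out of 40 candidates:
total = math.comb(40, 20)
assert total == 137_846_528_820  # the full sample space cited in the text

random.seed(0)

def predicted_5yr_gas(combo):
    """Stand-in for the c-RNN forecast: returns an aggregated 5-year
    cumulative gas value (Bcf) for a 20-well positioning combination.
    Placeholder only -- NOT the real model."""
    return 90 + sum(combo) % 17

# Draw 100 random 20-of-40 positioning combinations and forecast each one.
samples = [tuple(sorted(random.sample(range(40), 20))) for _ in range(100)]
forecasts = [predicted_5yr_gas(c) for c in samples]

# The spread of these 100 forecasts approximates the distribution over
# all ~1.4e11 possible combinations.
mean_forecast = sum(forecasts) / len(forecasts)
```

A histogram of `forecasts` is the analogue of Figure 6.4: with only 100 draws, the sample mean already gives a usable estimate of the average over the full combination space.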
Figure 6.4: Results of the 104 combinations presented as histograms using both crosslinked gel (top) and slickwater (bottom) as the stimulation fluid. [Each panel plots the number of combinations per bin against bins of aggregated 5-year cumulative production of 20 wells (Bcf): [87, 88] through (115, 116] for the crosslinked gel (water conservative) case and [104, 105] through (128, 129] for the slickwater (water intense) case. The CG20H, SW20H, CG20L and SW20L combinations are annotated on both panels.]
The average 5-year aggregated cumulative gas production was approximately 97 Bcf for the water conservative (crosslinked gel) case and 114 Bcf for the water intense (slickwater) case. The histograms also show how much better the best combination was than the average. The combination that yielded the best results in both the water intense and water conservative cases was the CG20H combination. The CG20H combination performed 19% higher than the average and 34% higher than the worst combination in the water conservative case. In the water intense case, the CG20H combination performed 13% higher than the average and 24% higher than the worst combination. The cost to drill and complete 20 wells should be the same no matter where they are drilled in the area, so by running these scenarios before drilling, it is possible, for the same price, to achieve a recoverable volume roughly 19% higher than the average.
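As a quick arithmetic check, the water intense percentages quoted above follow directly from the values in Table 6.2 (CG20H = 129 Bcf, average of the 100 random combinations ≈ 114 Bcf, worst random combination = 104 Bcf):

```python
# 5-year aggregated cumulative gas values read from Table 6.2 (Bcf).
best = 129     # CG20H well combination under slickwater stimulation
average = 114  # approximate mean of the 100 random combinations
worst = 104    # lowest of the 100 random combinations (combination 009)

uplift_vs_average = (best - average) / average * 100
uplift_vs_worst = (best - worst) / worst * 100

print(round(uplift_vs_average))  # 13 (% above average)
print(round(uplift_vs_worst))    # 24 (% above the worst combination)
```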
The CG20H combination subset outperformed the SW20H combination subset in terms of cumulative gas produced in both the water conservative and water intense stimulation scenarios. This is an interesting result: when slickwater was used as the stimulation fluid, the best 20 wells in the full field development (SW20H) were not the best 20 wells in a partial field development, but when crosslinked gel was used, the best 20 wells in the full field development (CG20H) were also the best 20 wells in a partial field development. Another interesting observation is that the CG20L subset was the worst positioning combination in the water conservative case, but neither CG20L nor SW20L was the worst in the water intense case; in fact, both were closer to the average.
The approximate best and worst combinations generated by running the full 40 well development were not the best/worst for slickwater but were the best/worst for crosslinked gel. The only difference between the 20 well scenario and the 40 well scenario is the well spacing. Since the approximate best/worst combination ranking changed between the 20 and 40 well scenarios only when slickwater was used, this suggests that well spacing plays a larger role when completing with slickwater; when crosslinked gel was used, the ranking was unaffected. These results are driven by the larger well spacing of the 20 well field development versus the full field 40 well development, and they suggest that the distance between wells is a large factor in production performance when stimulating with slickwater. This makes intuitive sense: the amount of fluid injected during a slickwater stimulation is much larger than during a crosslinked gel stimulation, and would therefore be expected to create fracture networks that extend further from the wellbore than if crosslinked gel were used. Also, the fact that the CG20H well combination performed the best in both the water intense and water conservative stimulation cases suggests that the rock mechanical profile near the wellbore plays a greater role than completion type when it comes to choosing the best drilling locations.
On average, completing the same 20 wells with slickwater led to a cumulative gas production 10% higher than when those wells were completed with crosslinked gel. However, for that 10% more gas, the slickwater completions had to use 300% more proppant and 867% more fluid than the crosslinked gel completions.
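The water trade-off above can be expressed as a per-unit efficiency ratio. The multipliers are taken from the comparison above; the absolute volumes are not needed for the ratio:

```python
# Relative resource use of slickwater vs. crosslinked gel completions:
# 10% more gas for 300% more proppant and 867% more fluid.
gas_ratio = 1.10      # slickwater gas / crosslinked gel gas
proppant_ratio = 4.0  # 300% more proppant -> 4x
fluid_ratio = 9.67    # 867% more fluid   -> 9.67x

# Gas recovered per unit of injected fluid, slickwater relative to gel:
fluid_efficiency = gas_ratio / fluid_ratio

# Equivalently, crosslinked gel recovers ~8.8x more gas per unit of
# injected fluid than slickwater.
print(round(fluid_ratio / gas_ratio, 1))
```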
6.8 Conclusions
In this study, a convolutional-recurrent neural network previously developed and trained on 74 existing wells was used to find the aggregated 5-year cumulative gas production of various positioning combinations of 20 new wells. The positioning of the 20 new wells was selected from 40 predetermined positions that were chosen to have similar length, depth and well spacing as the existing wells in the field. The results show that with only 100 random combinations, a normal distribution describing the entire sample space of 137,846,528,820 possible combinations begins to form. This distribution is useful because it shows how a particular positioning combination compares to the average of all possible locations. The best combination in the study was found to be the CG20H combination, i.e., the top 20 wells from the crosslinked gel full field scenario (shown in green in Figure 6.3). The results also suggest that neither the well spacing nor the completion fluid affected the best well position combination, which indicates that the rock mechanical profile near the wellbore plays the greatest role when it comes to choosing the best drilling locations.
The method did not sample the complete space (consisting of 137,846,528,820 well combinations) but rather demonstrated a simple and intuitive way to use the c-RNN gas production model to guide the choice of the best 20 locations. The tool developed in this study does not rely on the bottom-up approach of conventional simulators and instead links well inputs to well performance using a machine learning algorithm. Because the algorithm was trained and evaluated on existing wells in the study area and had a 5-year, 20-well aggregate forecast error of only 2.8%, it is arguably more useful than forecasts produced by numerical simulators, and the optimized well positioning combination is therefore also more trustworthy. The approach developed in this study is useful to anyone planning to optimize the positioning of new wells in a partially developed field. It allows the user to test how various well placements would perform and provides a guideline for finding the best possible well placement.
Chapter 7: Conclusions and Recommendations
The main objectives of this thesis were to improve production forecasting as well as optimize
completion and well placement of multi-stage horizontal wells by using machine learning
methods. The research in this thesis focuses on a 30 km by 30 km section of the unconventional
part of the Montney Formation in the province of Alberta, Canada. This area contains a large
number of vertical and horizontal wells thus providing a rich data set for analysis.
7.1 Conclusions
1. A procedure was designed that can generate highly accurate synthetic shear logs using a convolutional recurrent neural network (c-RNN). This procedure can act as a cost-effective and fast alternative to running actual shear sonic (DTS) logs in a field with prior development. The rock mechanical properties surrounding a wellbore can be calculated using the density, compressional and shear sonic logs of wells penetrating the formation. The rock mechanical properties of the wells can also be upscaled and interpolated to construct a three-dimensional (3D) model. Since shear sonic logs are missing from the majority of wells, generating accurate synthetic shear logs helps increase the accuracy of the rock mechanical model.
2. A 3D rock mechanical model of the study area was generated using the density and compressional sonic logs along with the shear sonic (14 actual and 166 synthetic) logs of the 180 vertical and deviated wells in the study area. This 3D model made it possible to generate rock mechanical profiles along the horizontal wellbores.
3. The c-RNN was employed to generate five-year cumulative gas production profiles for
74 horizontal multistage hydraulically fractured wells in the study area. The model was
trained using a combination of completion parameters, rock mechanical properties, and the well spacing and completion order for each stage in the wells. The best combination
of inputs was found to be the rock mechanical properties surrounding each perforation
cluster, the proppant amount used for every stage, and the spacing and completion order
of neighboring wells. The novelty of this study is that the input variables used are at the
stage level rather than the average of the entire well. The accuracy of the model was
found to increase exponentially as the production of multiple wells was aggregated.
4. The model trained to predict the production performance of the 74 horizontal wells in the
area was used to predict the production performance of 40 new wells planned to be
drilled alongside the existing producers. A total of 1,080 sensitivities were run to explore how
different combinations of stage count, fluid type, water amount, and proppant amount
affect the aggregated 5-year cumulative gas production. The study finds that the injection
efficiency (cumulative gas/total fluid injected) drops exponentially when more water is
injected. The results show that it is possible to achieve a cumulative production that is
76% of the maximum by using only 3% of the water required to achieve this maximum.
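The diminishing-returns claim in point 4 can be quantified the same way; the fractions are those stated above:

```python
# From conclusion 4: 76% of the maximum cumulative production can be
# reached with only 3% of the water needed to reach that maximum.
gas_fraction = 0.76
water_fraction = 0.03

# Injection efficiency (gas per unit of water) relative to the
# maximum-water case: ~25x the gas recovered per unit of water injected.
relative_efficiency = gas_fraction / water_fraction
print(round(relative_efficiency, 1))
```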
5. The model trained to predict the production performance of the 74 horizontal wells in the
area was used to predict the production performance of 20 new wells that can be drilled
in 40 pre-determined well positions. The results show that with only 100 random combinations, a crude normal distribution describing the entire sample space of 137,846,528,820 possible combinations begins to form, revealing the average of all possible combinations. The best combination was found by taking the top 20 wells from the crosslinked gel full field scenario. This combination performed better than all other combinations in the 20 well scenarios for both the water intense and water conservative cases. It was also found that the rock mechanical profile near the wellbore plays the greatest role when it comes to choosing the best drilling locations.
7.2 Recommendations
1. Investigate whether the accuracy of the model used to generate synthetic shear logs can be improved by training it on a field with more than 14 wells that contain actual shear sonic logs. Also, test whether the accuracy can be improved by incorporating other logs into the training set.
2. There are many ways to improve the accuracy of the production forecasting model. Arguably the most impactful would be to install a flow-rate measuring tool at every perforation cluster; such a tool would enable linking the completion variables and the rock mechanics of each stage directly to the amount of production coming from that stage. Another way would be to add microseismic measurements at each stage of the well, as these would aid in understanding the fracture morphology at each stage along the wellbore. Also, it is unlikely that the neural network used in this study has the optimal configuration; finding this optimal configuration would improve the accuracy but would require many more tests. Of course, more installed devices require additional cost, and since the industry is driven more by profit than scientific curiosity, this is unlikely to happen.
3. Construct the study area in a simulation model to see whether the simulation model prediction is better or worse than the machine learning model. Perhaps also see whether the machine learning model accuracy can be enhanced by incorporating the geology, fracture network modeling and flow mechanics of the simulation model.
4. Run an economic analysis on water injection efficiency to see if using the water
conservative crosslinked gel fluid during a hydraulic fracture leads to a better or worse
net present value than fracturing with the water intense slickwater fluid.
5. Conduct a more in-depth investigation to see which inputs drive the top 20 well positions in Figures 6.2 and 6.3 to be located mostly in the bottom right corner of the study area.
6. Test the predictor models on different study areas, both conventional and unconventional, to see how the prediction accuracy changes.
165
166
References
A. Y. Abukhamsin. Optimization of well design and location in a real field. 2009. Master’s
thesis, Department of Energy Resources Engineering, Stanford University.
Abdullah Faruque and Joshua Goldowitz (December 20th 2017). Effect of Hydrofracking on
Aquifers, Aquifers - Matrix and Fluids, Muhammad Salik Javaid and Shaukat Ali Khan,
IntechOpen, DOI: 10.5772/intechopen.72327. Available from:
https://www.intechopen.com/books/aquifers-matrix-and-fluids/effect-of-hydrofracking-
on-aquifers
Adelman, M.A. and H.D. Jacoby, 1979, Alternative methods of oil supply forecasting, in: R.S.
Pindyck, ed., The production and pricing of energy resources (JAI Press, Greenwich,
CT).
Aizenberg I, Sheremetov L, Villa-Vargas L, Martinez-Mun˜oz J (2016) Multilayer neural
network with multi-valued neurons in time series forecasting of oil production.
Neurocomputing 175:980–989
Al-Anazi, A.F. and Gates, I.D. On Support Vector Regression to Predict Poisson’s Ratio and
Young’s Modulus of Reservoir Rock, Chapter 5 in Cranganu et al. (eds.), Artificial
Intelligent Approaches in Petroleum Geosciences. 2015, Springer International
Publishing, DOI 10.1007/978-3-319-16531-8_5.
Alberta Energy Regulator ST98-2014: Alberta’s Energy Reserves 2013 and Supply/Demand
Outlook 2014–2023, ISSN 1910-4235, Alberta Energy Regulator, Suite 1000, 250-5th St.
SW, Calgary, Alberta, Canada, T2P 0R4.
Andrew J. Kondash, Nancy E. Lauer, and Avner Vengosh. The intensification of the water
footprint of 461 hydraulic fracturing. Science Advances, 4(8), 2018.
Ayeni, B. and Pilat, R. (1992) Crude Oil Reserve Estimation: An Application of the
Autoregressive Integrated Moving
B. Al-Shamma, H. Nicole, P. R. Nurafza,W. C. Feng, et al., Evaluation of multi-fractured
horizontal well performance: Babbage field case study, in: SPE Hydraulic Fracturing
Technology Conference, Society of Petroleum Engineers, 2014.
B. Zivkovic, I. Fujii, An analysis of isothermal phase change of phase change material within
rectangular and cylindrical containers, Sol. Energy 70 (2001) 51–61.
Ba Geri, M., Imqam, A., & Flori, R. (2019, April 8). A Critical Review of Using High Viscosity
Friction Reducers as Fracturing Fluids for Hydraulic Fracturing Applications. Society of
Petroleum Engineers. doi:10.2118/195191-MS
Barati, R. and J.-T. Liang (2014). "A Review of Fracturing Fluid Systems Used For Hydraulic
Fracturing of Oil and Gas Wells." Journal of Applied Polymer Science 131(16).
Barre, D. (2008). Mechanical Properties Log Processing and Calibration (accessed 29 May 2019)
167
Belyadi, H., Fathi, E., Belyadi, F., 2017. Hydraulic Fracturing in Unconventional Reservoirs,
first edition. USA. Gulf Professional Publishing.
Bhande, A., 2018, What is underfitting and overfitting in machine learning and how to deal with
it, https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-
learning-and-how-to-deal-with-it-6803a989c76, (Accessed June 1, 2019)
Bittencourt AC and Horne RN (1997) Reservoir development and design optimization. In: SPE
annual technical conference and exhibition, San Antonio, Texas, 5–8 October 1997.
Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics, 31(3), 307–327.
Brockwell, P.J. and Davis, R.A. (1996). Introduction to Time Series and Forecasting. New York:
Springer.
Brown, Robert Goodell. 1959. Statistical Forecasting for Inventory Control. McGraw/Hill.
Brownlee, J., 2017, A Gentle Introduction to Mini-Batch Gradient Descent and How to
Configure Batch Size, https://machinelearningmastery.com/gentle-introduction-mini-
batch-gradient-descent-configure-batch-size/, (Accessed April 16, 2020)
Butler, R.M. Thermal Recovery of Oil and Bitumen. Prentice-Hall, Englewood Cliffs, New
Jersey, 1991.
Canada Energy Regulator, 2018, Market Snapshot: Evolving technology is a key driver of
performance in modern gas wells: a look at the Montney Formation, one of North
America’s biggest gas resources, https://www.cer-
rec.gc.ca/nrg/ntgrtd/mrkt/snpsht/2018/04-04-1vlvngtchnlg-eng.html
Cander, H., 2012. What are unconventional resources? A simple definition using viscosity and
permeability. In: Geologist, A.A.o.P. (Ed.), AAPG Annual Convention and Exhibition.
American Association of Petroleum Geologists and Society for Sedimentary Geology,
Tulsa, OK, USA.
Cartwright, Hugh. Artificial Neural Networks. 2nd ed., Springer-Verlag New York, 2015.
Castagna, J.P., 1985, Shear-wave time-average equation for sandstones: Presented at the 55th
Ann. Internat. Mtg., Soc. Expl. Geophys.
Cavanillas, J.M., Curry, E., Wahlster, W. New Horizons for a DataDriven Economy – Springer.
2016; 152-154.
Centilmen, A., Ertekin, T., Grader, A.S., 1999. Applications of neural networks in multiwell
field development. In: Presented at the SPE Annual Technical Conference and
Exhibition, Houston, Texas, USA, 3–6 October. https://doi.org/10.2118/56433-MS.
Chaikine, I., and Gates, I. 2020b “A Machine Learning Model to Predict Multi-Stage Horizontal
Well Production”, PETROL21107, Journal of Applied Petroleum Science & Engineering
(number).
Chaikine, I., and Gates, I., 2020a “A New Machine Learning Procedure To Generate Highly
Accurate Synthetic Shear Sonic Logs In Unconventional Reservoirs”, SPE-201453-MS,
168
presented at the DPE Annual Technical Conference and Exhibition, Denver CO, October
5-7, 2020
Chen, Gemai, Bovas Abraham, and Greg W. Bennett. "PARAMETRIC AND
NONPARAMETRIC MODELLING OF TIME SERIES—AN EMPIRICAL STUDY",
Environmetrics 8.1 (1997): 63-74.
Chollet, F. (2015) keras, GitHub. https://github.com/fchollet/keras
Chollet, F. Deep Learning with Python, Manning Publications Co., Greenwich, CT, 2017
Cohen CE, Xu W, Weng X, Tardy P. Production Forecast After Hydraulic Fracturing in
Naturally Fractured Reservoir: Coupling a Complex Fracturing Simulator and a Semi-
Analytical Production Model. SPE paper 152541 presented at the SPE Hydraul‐ ic
Fracturing Technology Conference and Exhibition held in The Woodlands, Texas, USA,
6-8 February 2012.
De Gooijer, J. G., & Hyndman, R. J. (2006). 25 years of time series forecasting. International
journal of forecasting, 22(3), 443-473.
Defeu, C., Garcia, G., Ejofodmi, E. et al. 2018. Time Dependent Depletion of Parent Well and
Impact on Well Spacing in the Wolfcamp Delaware Basin. Presented at the SPE Liquids-
Rich Basins Conference-North America held in Midland, TX, USA, 05-06 September.
SPE-191799-MS
Douglas Bagnall. 2015. Author identification using multi-headed recurrent neural networks. In
Working Notes Papers of the CLEF 2015 Evaluation Labs, volume 1391.
Ediger, V.S. and Akar, S. (2007) ARIMA Forecasting of Primary Energy Demand by Fuel in
Turkey. Energy Policy, 35, 1701-1708. http://dx.doi.org/10.1016/j.enpol.2006.05.009
Edmunds, N.R. and Gittins, S.D. 1993. Effective Application of Steam Assisted Gravity
Drainage to Long Horizontal Well Pairs. JCPT 32 (6): 49-55.
Emerick AA, Silva E, Messer B, et al. (2009) Well placement optimization using a genetic
algorithm with nonlinear constraints. In: SPE reservoir simulation symposium, The
Woodlands, Texas, USA, 2–4 February 2009.
Energy Essentials, 2015, A guide to shale gas,
https://knowledge.energyinst.org/__data/assets/pdf_file/0020/124544/Energy-Essentials-
Shale-Gas-Guide.pdf, (Accessed April 16, 2020)
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance
of United Kingdom inflation. Econometrica, 50(4), 987–1007.
Eskandari H, Rezaee MR, Mohammadnia M (2004) Application of multiple regression and
artificial neural network techniques to predict shear wave velocity from well log data for
a carbonate reservoir, south-west Iran. In: CSEG RECORDER, pp 42–48
Fernández-Martínez, J. L., E. García-Gonzalo, J. P. Fernández Álvarez, H. A. Kuzma, and C. O.
Menéndez-Pérez, 2010a, PSO: A powerful algorithm to solve geophysical inverse
problems. Application to a 1D-DC resistivity case: Journal of Applied Geophysics, 71,
no. 1, 13–25, doi: 10.1016/j.jappgeo.2010.02.001.
169
Fink, J. Petroleum Engineer’s Guide to Oil Field Chemicals and Fluids (Gulf Professional,
Oxford, 2015).
Fjar, E.; Holt, R.M.; Raaen, A.M.; Risnes, R. Petroleum Related Rock Mechanics, 2nd ed.;
Elsevier: Amsterdam, The Netherlands, 2008; pp. 309–339.
Freyman, M. Hydraulic Fracturing & Water Stress: Water Demand by the Numbers; Technical
Report for Ceres, 2014, February, p 85.
G.E.P. Box, G. Jenkins, “Time Series Analysis, Forecasting and Control”, Holden-Day, San
Francisco, CA, 1970.
Gotawala, D.R. and Gates, I.D. 2010. On the Impact of Permeability Heterogeneity on SAGD
Steam Chamber Growth. Natural Resources Research 19(2): 151-164.
Hadi, F., and Nygaard, R. 2018. Shear Wave Prediction in Carbonate Reservoirs: Can Artificial
Neural Network Outperform Regression Analysis? ARMA Paper 18-905 presented at the
52nd US Rock Mechanics / Geomechanics Symposium. Seattle, Washington, US, 17-20
June 2018.
Han, De-hua, Nur, A., and Morgan, D., 1986, Effects of porosity and clay content on wave
velocities in sandstones: Geophysics, 51, 2093-2107.
Holditch, S.A., 2013. Unconventional oil and gas resource development Let’s do it right. Journal
of Unconventional Oil and Gas Resources 1–2, 2–8.
Holland, M., J. L. Urai, P. Muchez, and J. M. Willemse (2009), Evolution of fractures in a highly
dynamic thermal, hydraulic, and mechanical system –(I) Field observations in Mesozoic
carbonates, Jabal Shams, Oman Mountains, GeoArabia [Manama], 14(1), 57–110.
Holt, Charles E. 1957. “Forecasting Seasonals and Trends by Exponentially Weighted
Averages.” O.N.R. Memorandum 52. Carnegie Institute of Technology, Pittsburgh USA.
Hyndman & Athanasopoulos (2017) Forecasting: principles and practice, 2nd edition, OTexts:
Melbourne.
I. Jang, S. Oh, Y. Kim, C. Park, and H. Kang, “Well-placement optimisation using sequential
artificial neural networks,” Energy Exploration & Exploitation, vol. 36, no. 3, pp. 433–
449, 2017.
I.D. Gates, 2013, Basic Reservoir Engineering. ISBN: 978-1-4652-3684-5.
Irani, M., and Cokar, M., 2014. Understanding the impact of temperature-dependent thermal
conductivity on the steam-assisted gravity-drainage (SAGD) process. Part 1: Temperature
front prediction. Paper SPE 170064 presented at the SPE Heavy Oil Conference-Canada,
Calgary, 10-12 June. doi: 10.2118/170064-MS.
J. Li, J. H. Cheng, J. Y. Shi, and F.Huang, “Brief introduction of back propagation (BP) neural
network algorithm and its improvement,” in Advances in Computer Science and
Information Engineering—Volume 2, D. Jin and S. Lin, Eds., vol. 169 of Advances in
Intelligent and Soft Computing, pp. 553–558, Springer, Berlin, Germany, 2012.
170
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation
of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs],
December 2014. URL http://arxiv.org/abs/1412.3555.
K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion detection using
clusters,” Proceedings of the Twenty-eighth Australasian conference on Computer
Science-Volume 38, pp. 333–342, 2005
Krishnamurthy, V. and Yin, G.G. (2002) Recursive algorithms for estimation of hidden Markov
models and autoregressive models with Markov regime. IEEE Transactions on
Information Theory, 48(2), 458–476.
Kumara M.P.T.R., Fernando W.M.S., Perera J.M.C.U., Philips C.H.C, Time Series Prediction
Algorithms – Literature Review, Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka, 2013.
Lin, W.; Bergquist, A. M.; Mohanty, K.; Werth, C. J., Environmental Impacts of Replacing
Slickwater with Low/No-Water Fracturing Fluids for Shale Gas Recovery. ACS
Sustainable Chemistry & Engineering 2018, 6 (6), 7515-7524.
Lindsay, G., Miller, T., Xu, T.. 2018. Production Performance of Infill Horizontal Wells vs. Pre-
Existing Wells in the Major US Unconventional Basins. Presented at the SPE Hydraulic
Fracturing Technology Conference, The Woodlands, Texas, USA, 23-25 January. SPE-
189875-MS. https://doi-org.ezproxy.lib.ucalgary.ca/10.2118/189875-MS
Louisiana Ground Water Resources Commission. (2012). Managing Louisiana’s groundwater
resources: An interim report to the Louisiana Legislature. Baton Rouge, LA: Louisiana
Department of Natural Resources.
http://dnr.louisiana.gov/index.cfm?md=pagebuilder&tmp=home&pid=907.
M. Ahmadi , R. Soleimani, M. Lee, T. Kashiwao, and A. Bahadori, Determination of oil well
production performance using artificial neural network (ANN) linked to the particle
swarm optimization (PSO) tool, Petroleum, Volume 1, Issue 2, pp. 118–132, June 2015.
Ma, Y. Zee Holditch, Stephen A.. (2016). Unconventional Oil and Gas Resources Handbook -
Evaluation and Development - 1.1 Introduction. Elsevier. Retrieved from
https://app.knovel.com/hotlink/pdf/id:kt010QJP51/unconventional-oil-gas/introduction
Maleki, S., A. Moradzadeh, R. Ghavami, and R. Gholami. 2014. Prediction of shear wave
velocity using empirical correlations and artificial intelligence methods. NRIAG Journal
of Astronomy and Geophysics, 3(1), 70–81. https://doi.org/10.1016/j.nrjag.2014.05.001
Mitchell, Tom (1997). Machine Learning. New York: McGraw Hill. ISBN 0-07-042807-7.
OCLC 36417892.
Mohaghegh, S. D. (2011, January). Reservoir Simulation and Modeling Based on Pattern
Recognition. In SPE Digital Energy Conference and Exhibition. Society of Petroleum
Engineers.
Mohaghegh, S. D. (2017). Data-Driven Reservoir Modeling. Society of Petroleum Engineers.
171
Mohanty S (2005) Estimation of vapour liquid equilibria of binary systems, carbon dioxide-ethyl
caproate, ethyl caprylate and ethyl caprate using artificial neural networks. Fluid Phase
Equilib 235(1):92–98
Montgomery, C. T., & Smith, M. B. (2010, December 1). Hydraulic Fracturing: History of an
Enduring Technology. Society of Petroleum Engineers. doi:10.2118/1210-0026-JPT
Montgomery, C., Fracturing Fluid Components. InTech 2013,
http://dx.doi.org/10.5772/56422;.10.5772/56422
Morton M Q, 2013, “Unlocking the earth – a short history of hydraulic fracturing” GEOExPro,
10(6) 86-90
Moslow, T.F., 2000. Reservoir architecture of a fine-grained turbidite system: Lower Triassic
Montney Formation, Western Canada Sedimentary Basin, in: Deep Water Reservoirs of
the World. pp. 686–713.
National Energy Board, 2018, Market Snapshot: Evolving technology is a key driver of
performance in modern gas wells: a look at the Montney Formation, one of North
America’s biggest gas resources,
http://www.nebone.gc.ca/nrg/ntgrtd/mrkt/snpsht/2018/04-04-1vlvngtchnlg-eng.html
(Accessed April 16, 2020)
Nelson, Ronald A.. (2001). Geologic Analysis of Naturally Fractured Reservoirs (2nd Edition).
Elsevier.
Neville, Suzanna & Hjellvik, Vidar & Mackinson, Steven & van der Kooij, Jeroen. (2020).
Using artificial neural networks to combine acoustic and trawl data in the Barents and
North Seas. ICES CM 2004/R:05
Onwunalu JE and Durlofsky LJ (2010) Application of a particle swarm optimization algorithm
for determining optimum well location and type. Computers & Geosciences 14: 183–198.
Palisch, T. T., Vincent, M., & Handren, P. J. (2010, August 1). Slickwater Fracturing: Food for
Thought. Society of Petroleum Engineers. doi:10.2118/115766-PA
Parapuram, G.; Mokhtari, M.; Ben Hmida, J. An Artificially Intelligent Technique to Generate
Synthetic Geomechanical Well Logs for the Bakken Formation. Energies 2018, 11, 680.
Pennsylvania Department of Environmental Protection. (2015). Oil and gas reporting website,
statewide data downloads by reporting period.
https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/DataExports/DataEx
ports.aspx.
Pykes, K., 2020, The Vanishing/Exploding Gradient Problem in Deep Neural Networks,
https://towardsdatascience.com/the-vanishing-exploding-gradient-problem-in-deep-
neural-networks-191358470c11
R. K. Agrawal and R. Adhikari, “An Introductory Study on Time Series Modeling and
Forecasting” CoRR, 2013.
R.M. Butler. 1997. Thermal Recovery of Oil and Bitumen. GravDrain Inc., Calgary, Alberta,
Canada ISBN: 0-9682563-0-9.
Rajabi M, Bohloli B, Ahangar EG (2010) Intelligent approaches for prediction of compressional,
shear and Stoneley wave velocities from conventional well log data: a case study from
the Sarvak carbonate reservoir in the Abadan Plain (Southwestern Iran). Comput Geosci
36(5):647–664
Reynolds, M., Bachman, R., Peters, W., 2014: “A Comparison of the Effectiveness of Various
Fracture Fluid Systems Used in Multi-Stage Fractured Horizontal Wells: Montney
Formation, Unconventional Gas”; SPE 168632 presented at the Hydraulic Fracturing
Technology Conference, The Woodlands TX, Feb. 4 – 6.
Ron Meir, Nonparametric Time Series Prediction Through Adaptive Model Selection, Machine
Learning, Kluwer Academic Publishers, 39, 5-34, 2000. [online] Available:
http://webee.technion.ac.il/Sites/People/rmeir/Publications/MeirTimeSeries00.pdf
Russell D. Reed and Robert J. Marks. 1998. Neural Smithing: Supervised Learning in
Feedforward Artificial Neural Networks. MIT Press, Cambridge, MA, USA.
S. Goel, I. Khan and T. Garg, "An Applicative Study of Advances in Machine Learning," 2019
6th International Conference on Computing for Sustainable Global Development
(INDIACom), New Delhi, India, 2019, pp. 255-259.
S. Mohaghegh, Virtual-intelligence applications in petroleum engineering: part 1—artificial
neural networks, J. Pet. Technol. 52 (2000) 64–73.
Salmachi A, Sayyafzadeh M and Haghighi M (2013) Infill well placement optimization in coal
bed methane reservoirs using genetic algorithm. Fuel 111: 248–258.
Sarker, N. N. and Ketkar, M. A., “Developing Excel Macros for Solving Heat Diffusion
Problems,” ASEE- 2004-1520, Proceedings of the 2004 American Society for
Engineering Education Annual Conferences & Exposition, Salt Lake City, Utah.
Scanlon, BR; Reedy, RC; Nicot, JP. (2014). Will water scarcity in semiarid regions limit
hydraulic fracturing of shale plays? Environmental Research Letters 9.
http://dx.doi.org/10.1088/1748-9326/9/12/124011.
Schlumberger 2019, Petrel E&P Software Platform.
Schultz, R., Atkinson, G., Eaton, D. W., Gu, Y. J., & Kao, H. (2018). Hydraulic fracturing
volume is associated with induced earthquake productivity in the Duvernay play.
Science, 359(6373), 304–308. https://doi.org/10.1126/science.aao0159
Sharma, J. and Gates, I.D. Multiphase Flow at the Edge of a Steam Chamber. Canadian Journal
of Chemical Engineering, 88(3):312-332,2010.
Shen, C. 2011. Evaluation of Wellbore Effect on SAGD Start-up. Paper presented at the
Canadian Unconventional Resources Conference held in Calgary, Alberta, Canada, 15-17
November 2011.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple
way to prevent neural networks from overfitting. J. Machine Learning Res. 15, 1929–
1958 (2014)
Subramanian Chandramouli; Amit Kumar Das; Saikat Dutt. (2018) Machine Learning. Pearson
Education India
Tanaka, S., Wang, Z., Dehghani, K., Jincong He, H., Velusamy, B., Wen, X.-H., Large Scale
Field Development Optimization Using High Performance Parallel Simulation and Cloud
Computing Technology, SPE 191728, 2018 SPE Annual Technical Conference and
Exhibition held in Dallas, Texas, 24-26 September 2018.
Temizel, C., Purwar, S., Abdullayev, A., Urrutia, K., Tiwari, A., Erdogan, S.S., 2015 Efficient
Use of Data Analytics in optimization of hydraulic fracturing in unconventional gas
reservoirs. SPE-177549-MS
Trembath, Alex, Jesse Jenkins, Ted Nordhaus, and Michael Shellenberger. 2012. “Where the
Shale Gas Revolution Came From: Government’s Role in the Development of Hydraulic
Fracturing in Shale.” Breakthrough Institute.
U.S. EPA. Hydraulic Fracturing For Oil and Gas: Impacts From the Hydraulic Fracturing Water
Cycle on Drinking Water Resources In the United States (Final Report). U.S.
Environmental Protection Agency, Washington, DC, EPA/600/R-16/236F, 2016.
United States Environmental Protection Agency, 2013, The Hydraulic Fracturing Water Cycle,
https://web.archive.org/web/20130525032531/http://www2.epa.gov/hfstudy/hydraulic-
fracturing-water-cycle
Veil, J. (2015). U.S. produced water volumes and management practices in 2012. Oklahoma,
City, OK: Ground Water Protection Council.
http://www.gwpc.org/sites/default/files/Produced%20Water%20Report%202014-
GWPC_0.pdf.
Wang, S., and Chen, S. 2016. Evaluation and Prediction of Hydraulic Fractured Well
Performance in Montney Formations Using a Data-Driven Approach. Presented at the
SPE Western Regional Meeting, Anchorage, Alaska, 23–26 May. SPE-180416-MS.
http://dx.doi.org/10.2118/180416-MS.
Wang, S.; Chen, S. Insights to fracture stimulation design in unconventional reservoirs based on
machine learning modeling. J. Pet. Sci. Eng. 2019a, 174, 682–695. [CrossRef]
Wang F.P., Gale J.F.W., Screening criteria for shale-gas systems. Gulf Coast Association of
Geological Societies Transactions, 2009, 59, 779-793
Winters, P R. 1960. “Forecasting Sales by Exponentially Weighted Moving Averages.”
Management Science 6:324–42.
Y. He, S. Cheng, J. Qin, J. Chen, Y. Wang, N. Feng, H. Yu, et al., Successful application of well
testing and electrical resistance tomography to determine production contribution of
individual fracture and waterbreakthrough locations of multifractured horizontal well in
changqing oil field, china, in: SPE Annual Technical Conference and Exhibition, Society
of Petroleum Engineers, 2017.
Yeten B, Durlofsky LJ and Aziz K (2002) Optimization of nonconventional well type, location,
and trajectory. In: SPE annual technical conference and exhibition, San Antonio, TX,
USA, 29 September–2 October 2002.
Yuan, J. Y., and Mcfarlane, R. 2009. Evaluation of steam circulation strategies for SAGD start-
up. JCPT 50(1): 20-32. SPE143655-PA.
Yudeng Qiao, Jun Peng, Lan Ge, Hongjin Wang, "Application of PSO LS-SVM forecasting
model in oil and gas production forecast", 2017 IEEE 16th International Conference on
Cognitive Informatics & Cognitive Computing (ICCI*CC), vol. 00, no. , pp. 470-474,
2017, doi:10.1109/ICCI-CC.2017.8109791
Yule, G.U. 1927. On a method of investigating periodicities in disturbed series with special
reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society
of London Series A, 226, 267–98.
Z. He, L. Yang, J. Yen, C. Wu, Neural-Network Approach to Predict Well Performance Using
Available Field Data, in: Proceedings of SPE Western Regional Meeting, Bakersfield,
California, 26–30 March, SPE68801, 2001.
Zhang, D., Yuntian, C., Jin, M., 2018. Synthetic well logs generation via Recurrent Neural
Networks. Pet. Explor. Dev. 45, 629–639.
Zhang, J., Florita, A., Hodge, B. M., Lu, S., Hamann, H. F., Banunarayanan, V., & Brockway,
A.M. (2015). A suite of metrics for assessing the performance of solar power forecasting.
Solar Energy, 111, 157-175. http://dx.doi.org/10.1016/j.solener.2014.10.016
Zhu L, Zeng F, Huang Y. A correlation of steam chamber size and temperature falloff in the
early-period of the SAGD Process. Fuel, 2015 148: 168–177.
Zhu, L., Zeng, F. A Condensation Heating Model for Evaluating Early-Period SAGD
Performance. J. Petrol. Sci. Eng. 100, 131–145. Transport in Porous Media, 104(2), 363-
383 (2014)
Zhu, L., Zeng, F., Zhao, G., Duong, A.: Using transient temperature analysis to evaluate steam
circulation in SAGD start-up processes. J. Petrol. Sci. Eng. 100, 131–145 (2012)
Appendix A: Temperature Transient Analysis of the Steam Chamber during a SAGD
Shutdown Event
A.1 Abstract
In 2015, major forest fires in Northeast Alberta, Canada shut down nearly all major oil sands
extraction and mining operations, leading to a loss of 47 million barrels of bitumen within four
weeks; at a $20 netback, this is equivalent to nearly $1 billion in lost value. For in-situ
Steam-Assisted Gravity Drainage operations, the shutdown of operations meant that steam
injection stopped and as a consequence, with continued heat losses from the steam chambers to
the surrounding rock formations, the temperature of the steam chambers fell. This implies that
when the steam injection resumes, some fraction of the injected energy is consumed bringing the
steam chamber back to the temperature and pressure it was at prior to the shutdown. Here, we
examine temperature fall-off from steam chambers where injection has stopped by using a new
three-zone model with heat losses taking into account temperature-dependent thermal
conductivity. The model solves the transient heat conduction and steam condensing equations.
The results show that while the steam chamber collapses it heats the surrounding reservoir
through latent heat loss. A warmer reservoir leads to more mobile oil and less heat loss once
steam injection restarts.
A.2 Introduction
The performance and efficiency of oil sands thermal recovery processes such as Steam-Assisted
Gravity Drainage (SAGD) is tied to heat transfer within the reservoir. As illustrated in cross-
section in Figure A.1, when the steam enters the oil sands formation through a horizontal steam
injection well, it flows to the outer edge of the steam chamber and releases its latent heat which
in turn is conducted and convected into the colder oil sands at the chamber’s edge (Butler, 1991).
The bitumen is heated to the steam’s temperature and as a consequence, its viscosity drops by
over five orders of magnitude until it is less than 10 cP. The mobilized bitumen then drains
under gravity to a horizontal production well at the base of the steam chamber (Butler, 1991).
The key to successful performance of SAGD is the conformance of steam along the horizontal
well pair. Providing the well pairs are parallel to each other and reasonably horizontal, steam
conformance is controlled in turn by the geological properties and heterogeneity of the oil sands
formation. In particular, since bitumen mobilization provides pore space for steam to invade,
steam conformance is controlled by the ability of the bitumen to be both mobilized and drained
from the reservoir. Thus, the key underlying mechanism of steam conformance is heat transfer
from steam to bitumen and bitumen drainage as illustrated by Austin-Adigio and Gates (2018).
This also implies that a continuous supply of steam must be provided to the reservoir to enable
continuous mobilization, drainage, and production of bitumen.
Figure A.1: Cross sectional schematic of a typical SAGD process (Gotawala and Gates, 2010).
If steam injection stops, then the pressure of the steam chamber declines. If production were to
continue while steam injection was halted, the pressure would decline in the chamber simply
through the production of fluids from the chamber. The other consequence of a shutdown of
steam injection is that with the pressure declining, the saturation temperature of the steam would
also fall. Furthermore, with continued heat losses from the steam chamber to the surrounding
formation as well as to the overburden and understrata, together with the pressure decline, the
temperature of the steam chamber would drop. With sufficient time, the temperature of the
chamber would fall to the point that oil mobilization suffers and drainage declines lead to
potentially non-economic production. If the shutdown is long enough, an extended period of
steam injection may be required afterwards to re-pressurize the chamber and return it to steam
temperature. A key question then arises: what is the evolution of the steam chamber
temperature after steam injection is stopped, in the case with no production from the
chamber? Here, we examine this question.
A.3 Literature Review
Very few papers describe methods for, or applications of, temperature decline analysis.
Duong (2008a) was the first to attempt using
temperature to model steam chamber development. Duong’s study introduced two novel
concepts of cooling time and heating ring and proposed a new technique that uses temperature
measurements taken during a shut in to estimate the effectiveness of conductive heating. The
mathematical model developed by Duong represents the horizontal well as a one-dimensional
radial-cylindrical heat flux equation for an infinite-acting formation. The model is considered
analogous to pressure transient analysis of a pressure build-up test. Temperature fall off data
during shut ins can be used to estimate cooling time and thermal diffusivity of the target
formation.
Zhu et al. (2012) developed semi-analytical models that predict the change in steam chamber
temperature during a shut in for a two-dimensional system in the cross-well plane. The models
were developed for both the circulation (startup) and early steam injection (ramp up) phases of
SAGD. The study also developed an inverse model which interprets actual temperature falloff
data at a given location along the wellbore to back calculate the steam chamber size. The inverse
model works by generating hundreds of temperature profiles corresponding to different steam
chambers and iterating until a generated temperature profile matches the input temperature
profile. In 2015, Zhu et al. showed that plotting the temperature falloff derivative curve against
time resulted in a straight line for the early stage of SAGD. Using numerical simulation, they
developed a correlation between steam chamber size and temperature falloff data, and also
developed a practical method to estimate the steam chamber size variation along the horizontal
well bore. The models developed by Zhu et al. (2012) solve partial differential equations
analytically by using Laplace transforms. This approach, although accurate, limits the analysis to
simple linear systems where properties are constant. To deal with non-linear systems arising
from temperature-dependent properties, numerical methods such as the finite difference method
are required (Ketkar and Reddy 2003; Sarker and Ketkar 2013).
A.4 Temperature Transient Analysis
Temperature transient analysis uses the evolution of temperature to evaluate properties of a
system. For the case of SAGD, the decline of the temperature can be used to determine heat loss
rates or to determine the size of the steam chamber. Here, we derive a new transient temperature
numerical model that takes temperature-dependent thermal conductivity into account. During
the early life of a SAGD operation there are two stages, which correspond to two types of
heating scenarios (Edmunds and Gittins, 1991). The first stage is the startup stage, during
which steam is circulated in both wells of the pair to heat the surrounding formation via thermal
conduction. Circulation works by injecting steam into the tubing and circulating it out the
annulus. Once the bitumen around the wells has reached a sufficient temperature to be mobile
and thermal communication between the well pair is established, the operation moves to Stage
2, which is known as ramp up. During ramp up, steam is injected into the formation and the steam
chamber expands upwards until it hits the overburden formation. While the steam chamber is
growing vertically the rate of oil production increases. After the steam chamber begins to expand
horizontally the production rate levels off, and conventional SAGD begins.
If a shut in occurs during the circulation phase, the dominant form of heat loss will be via
conduction, with no latent heat transfer occurring. If the shut in occurs during ramp up, the steam
chamber will lose energy to the surrounding reservoir through both sensible and latent heat
transfer. Although convection will exist while the steam chamber collapses, it can be assumed
that the dominant mode is still conduction.
The mathematical model developed here follows the model first proposed by Zhu et al. (2012).
The models were built on the following assumptions:
1. Beyond the edge of the steam chamber, heat transfer via conduction is the dominant
mode of heat transfer.
2. All temperature zones are modelled as circles with radial temperature distributions. If no
steam is present, then the center of the circle is the highest temperature. If steam is
present, the shape of the steam chamber is assumed to be a circle with constant
temperature until the radius where the steam reaches the saturation curve with zero
quality beyond which point the temperature drops with increasing radius.
3. The steam chamber is modelled as a radial system in the cross-well plane.
4. The reservoir has uniform thermal properties.
A.4.1 Non-Condensing Model for SAGD Start up
During the startup phase of a SAGD operation, steam is circulated through the well to establish
thermal communication between the injector and producer and also make the nearby oil mobile.
In effect, the wells act as line heat sources. Since the steam is circulated into and out of the well,
no actual steam is injected into the reservoir so the SAGD process can be modeled as simple
conduction from the well to the reservoir. To model the start-up process of SAGD we employed
the non-condensing three zone model first proposed by Zhu et al. (2012) depicted in Figure
A.2.
Figure A.2: Schematic diagram of a three zone non-condensing model used for SAGD start
up (modified from Zhu et al., 2012).
The three temperature zones are bounded by three circles of increasing radius. The innermost
circle is the steam injection well which is at steam temperature. Zone 1 is the region
immediately surrounding the well and is also at the steam temperature. Zone 2 is saturated liquid
water; its outer limit, and Zone 3 beyond it, are at reservoir temperature. The three-region model
is also useful when modeling a steam chamber, since the hot inner zone contains wet steam at
saturation conditions and its temperature is therefore equal to the steam temperature.
A.4.1.1 Mathematical form of non-condensing model
For heat transfer in a cylindrical geometry, the dimensionless heat transfer equation is given by
(Zhu et al.2012):
$$\frac{\partial^2 T_D}{\partial r_D^2} + \frac{1}{r_D}\frac{\partial T_D}{\partial r_D} = \frac{\partial T_D}{\partial t_D}$$

where $T_D$, $t_D$, and $r_D$ are the dimensionless temperature, time, and radius, defined by

$$T_D = \frac{T - T_r}{T_s - T_r}, \qquad r_D = \frac{r}{r_w}, \qquad t_D = \frac{t\,\alpha}{r_w^2}$$
and Ts is the steam temperature, Tr is the initial reservoir temperature, rw is the wellbore radius,
and α is the thermal diffusivity of the reservoir. The three zones have the following initial
conditions:
Zone 1 (hot zone): $T_D(r_D, t_D = 0) = 1$ for $0 < r_D < R_{D1}$

Zone 2 (intermediate zone): $T_D(r_D, t_D = 0) = \dfrac{\ln(r_D/R_{D2})}{\ln(R_{D1}/R_{D2})}$ for $R_{D1} < r_D < R_{D2}$

Zone 3 (cold zone): $T_D(r_D, t_D = 0) = 0$ for $R_{D2} < r_D$
The boundary conditions are as follows:
• Since all zones are in contact with each other the heat transfer rate is equal at the boundaries,
and
• The temperature at infinite radius is assumed to always be at Tr.
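The three-zone initial condition lends itself to direct discretization. The following Python sketch (illustrative only; the function and parameter names are not from the thesis, whose implementation is in Excel) builds the initial dimensionless temperature profile on a uniform radial grid:

```python
import math

def initial_profile(n_nodes, r_max, R_D1, R_D2):
    """Initial dimensionless temperature T_D on a uniform radial grid.

    Zone 1 (r_D <= R_D1): T_D = 1 (steam temperature).
    Zone 2 (R_D1 < r_D < R_D2): logarithmic decline from 1 to 0.
    Zone 3 (r_D >= R_D2): T_D = 0 (reservoir temperature).
    """
    dr = r_max / (n_nodes - 1)
    r = [i * dr for i in range(n_nodes)]
    T = []
    for ri in r:
        if ri <= R_D1:
            T.append(1.0)
        elif ri < R_D2:
            T.append(math.log(ri / R_D2) / math.log(R_D1 / R_D2))
        else:
            T.append(0.0)
    return r, T
```

Choosing r_max of at least 5·R2 places the outer node far enough away to act as the "infinite" boundary, consistent with the grid-sizing observation in the numerical solution.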
A.4.1.2 Numerical solution of the non-condensing model
The finite difference method is used to solve the radial heat transfer equation. The
discretization grid is shown in Figure A.3.
Figure A.3: Discretization domain for one-dimensional radial heat transfer - temperature grid
layout.
The finite difference method can be solved either explicitly or implicitly. The explicit method
uses a forward difference approach and solves for temperature at a new time step using the
temperature at the previous time step. The implicit method uses both the previous time step and
the current time step to solve for temperature at the current time step using an iterative
procedure. The model designed in this study uses the implicit Crank-Nicolson method to solve
for temperature at the jth time and ith radial nodes (Sarker and Ketkar, 2004). Since the non-
condensing model assumes no latent heat transfer (condensation), the only form of heat transfer
is sensible heat transfer. The grid is built by first specifying RD1 and RD2 and the number of
nodes needed in the radial direction. Since RD1 and RD2 represent the size of the steam chamber
they are selected based on the length of steam injection prior to shut in and operator experience.
It was found that placing the node representing the "infinite" boundary condition at a radius
r > 5R2 is far enough away that the results become invariant to the boundary location.
The first radial node (i = 0) represents the center of the SAGD injector wellbore, and the grid is
divided to contain the three zones described above. Populating the grid involves three steps:
1. Fill in the temperature for all nodes in the top row at the initial conditions (tD = 0):
TD = 1 at the hot zone nodes and TD = 0 at the cold zone nodes.
2. Fill in the temperature for all nodes in the first (rD = 0) and last (rD → ∞) columns. The TD
at the infinite boundary node is always 0, since it is assumed that no heat reaches it.
The finite difference equation for Node 0 is then (Sarker and Ketkar 2004):
$$T_0^j = \frac{(1 - 2P)\,T_0^{j-1} + 2P\,T_1^{j-1} + 2P\,T_1^{j}}{1 + 2P}$$

where $P = \dfrac{\Delta t}{\Delta r^2}$
3. Solve for the temperature at the other interior nodes using (Sarker and Ketkar 2004):

$$T_i^j = \frac{T_i^{j-1}\,(1-P) + \dfrac{P}{2}\left[\left(1+\dfrac{1}{2i}\right)T_{i+1}^{j-1} + \left(1+\dfrac{1}{2i}\right)T_{i+1}^{j} + \left(1-\dfrac{1}{2i}\right)T_{i-1}^{j-1} + \left(1-\dfrac{1}{2i}\right)T_{i-1}^{j}\right]}{1+P}$$
A.4.2 Condensing Model for SAGD Ramp up
During the ramp up stage of SAGD, steam is injected into the reservoir and a steam chamber
grows within the reservoir. Since it loses its latent heat to the surrounding formation, steam
needs to be continuously injected to maintain growth. If a shut in occurs, heat losses cause the
steam within the chamber to condense into liquid water. During condensation, the temperature of
the steam stays constant and only the overall quality of the steam decreases. To model a shut in
during the ramp up phase we used the condensing three system SAGD model first proposed by
Zhu and Zeng (2014) depicted in Figure A.4. Since the dominant form of heat transferred beyond
the chamber is conduction, the steam chamber can be modeled as a shrinking circle.
Figure A.4: Schematic diagram of a three-zone condensation model used for SAGD ramp up
(modified from Zhu and Zeng, 2014).
A.4.2.1 Mathematical form of condensing model
The governing equation for transient heat transfer, including the effect of latent heat, in 1D radial
coordinates is given by:
$$\frac{\partial h}{\partial t} = \frac{1}{r}\,\frac{k_{TH}}{\rho}\,\frac{\partial^2 T}{\partial r^2} - \lambda\,\frac{\partial f_{st}}{\partial t}$$

where h is the specific enthalpy, kTH is the thermal conductivity, ρ is the density, λ is the
specific latent heat of vaporization, and fst is the steam quality. The three zones have the following
initial conditions:
Zone 1 (hot zone): $f_{st}(r_D, t_D = 0) = 1$ for $0 < r_D < R_{D1}$

Zones 2 (intermediate zone) and 3 (cold zone): $f_{st}(r_D, t_D = 0) = 0$ for $R_{D1} < r_D$

The boundary condition is as follows: the steam quality at infinite radius is equal to 0.
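The steam-quality initial condition can be discretized the same way as the temperature field; a small Python sketch (hypothetical names, not from the thesis):

```python
def initial_quality(n_nodes, r_max, R_D1, f_inj=1.0):
    """Initial steam quality on a uniform radial grid: f = f_inj inside the
    steam chamber (r_D < R_D1) and zero elsewhere (Zones 2 and 3)."""
    dr = r_max / (n_nodes - 1)
    return [f_inj if i * dr < R_D1 else 0.0 for i in range(n_nodes)]
```

The injected quality f_inj would be set to a typical operating value (for example, the 0.9 used later in the Results section).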
A.4.2.2 Numerical solution of the condensing model
The finite difference solution to the latent heat transfer problem is more complex and requires
two mirror grids. The first grid calculates the steam quality (fst) and the second grid calculates T.
Both grids communicate with each other and are considered “mirrors” since the steam quality at
the radial node i and time node j is directly linked to the enthalpy and temperature at the same
node. Latent heat transfer is done before sensible heat transfer so if the steam quality > 0 at a
particular node, then the temperature at that node is equal to the injected steam temperature Ts.
To test if there is still steam left at each new time step and to solve for the steam quality at the
new time step, two decision trees are required: one tree for the steam quality grid and another
for temperature grid as depicted in Figures A.5 and A.6.
Figure A.5: Decision tree for steam quality grid calculation.
Figure A.6: Decision tree for steam temperature grid calculation.
A thorough search of the literature revealed that an implicit finite difference formula has not
been developed to solve for the transient radial steam quality. For the study, we developed a
Crank Nicholson scheme adapting the finite difference method for melting ice derived by
Zivkovic and Fujii (2001). The discretization domain for steam quality is shown in Figure
A.7.
Figure A.7: Discretization domain for one-dimensional radial phase change - steam quality
grid layout.
To populate both the temperature and steam quality grids, the following steps are taken:
1. Fill in the temperature for all nodes at the initial conditions (tD = 0): TD = 1 at the hot
zone nodes and TD = 0 at the cold zone nodes.
2. Fill in the steam quality for all nodes in the top row at the initial conditions (tD = 0).
3. Fill in the temperature for all nodes in the first (rD = 0) and last (rD → ∞) columns. The TD
at the infinite boundary node is always equal to 0. The temperature at Node 0 is
calculated using (Sarker and Ketkar 2004):
$$T_0^j = \frac{(1 - 2P)\,T_0^{j-1} + 2P\,T_1^{j-1} + 2P\,T_1^{j}}{1 + 2P}$$
4. The steam quality at the infinite boundary node is equal to 0. The steam quality at node 0
is calculated by using:
$$f_{st,0}^{\,j} = f_{st,0}^{\,j-1} + \frac{cP}{L}\left(T_1^{j-1} + T_1^{j} - T_0^{j-1} - T_0^{j}\right)$$
5. Using the decision trees in Figures A.5 and A.6, the steam quality and temperature at all
other nodes are calculated. The "Latent Heat Calc" referenced in the decision tree is given
by:

$$f_{st,i}^{\,j} = f_{st,i}^{\,j-1} + \frac{cP}{2L}\left[\left(1+\frac{1}{2i}\right)\left(T_{i+1}^{j-1} - T_i^{j-1} + T_{i+1}^{j} - T_i^{j}\right) + \left(1-\frac{1}{2i}\right)\left(T_i^{j-1} - T_{i-1}^{j-1} + T_i^{j} - T_{i-1}^{j}\right)\right]$$
The finite difference model was encoded in Microsoft Excel. The code reads the user inputs
from the spreadsheet and simultaneously builds two finite difference grids, one for the
temperature and one for the steam quality. The grids are built using three FOR loops: one for
the spatial nodes, one for the time steps, and one for the iterative passes used in the implicit
method. The mirror grids communicate with each other using the decision trees in Figures A.5
and A.6.
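A minimal sketch of the mirror-grid idea, assuming a simplified version of the decision-tree logic (the sensible-heat branch uses an explicit update for brevity; the names and the cP_over_L parameter are illustrative, not the thesis's Excel code):

```python
TS = 1.0  # dimensionless steam temperature

def condensing_step(T_old, f_old, P, cP_over_L):
    """One time step on the mirror grids (temperature T and steam quality f).

    Where steam remains (f > 0), the node is held at steam temperature and
    its quality falls by the latent-heat calculation; once f reaches 0, the
    node cools by sensible conduction only.
    """
    n = len(T_old)
    T_new = list(T_old)
    f_new = list(f_old)
    for i in range(1, n - 1):
        ap = 1 + 1/(2*i)
        am = 1 - 1/(2*i)
        if f_old[i] > 0.0:
            # Latent heat calc: quality drops, temperature pinned at TS
            df = cP_over_L/2 * (ap*(T_old[i+1] - T_old[i] + T_new[i+1] - T_new[i])
                                + am*(T_old[i] - T_old[i-1] + T_new[i] - T_new[i-1]))
            f_new[i] = max(f_old[i] + df, 0.0)
            T_new[i] = TS
        else:
            # Sensible conduction only (explicit form for the sketch)
            T_new[i] = T_old[i] + P*(ap*(T_old[i+1] - T_old[i])
                                     + am*(T_old[i-1] - T_old[i]))
    T_new[-1] = 0.0   # infinite boundary at reservoir temperature
    f_new[-1] = 0.0   # no steam at the infinite boundary
    return T_new, f_new
```

Deep inside the chamber, where all neighbours are at steam temperature, the quality change is zero; only nodes near the chamber edge lose quality, which reproduces the inward-shrinking steam front described in the text.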
A.4.3 Variable Thermal Diffusivity
One advantage of having a numerical solution over an analytical one is that non-linear complexity
can be added to the problem. The model developed by Zhu et al. (2012) assumes a constant
thermal diffusivity as derived in classic SAGD models such as those developed by Butler (1997)
and Gates and Sharma (2010). In reality, however, the thermal diffusivity of an oil sands
reservoir changes with temperature and does play a role in the temperature profile. Thermal
diffusivity α, as defined below, is the quotient of the thermal conductivity kH (W/m·°C) and the
volumetric heat capacity Ms (J/m³·°C):

$$\alpha = \frac{k_H}{M_s}$$
Irani and Cokar (2014) explain that both density and heat capacity experience negligible
change with temperature, and that the change in thermal diffusivity due to temperature
is dominated by the change in thermal conductivity. Oil sands are a mixture of various
components such as methane, steam, water, bitumen and rock (quartz), which all have different
thermal conductivities. Irani and Cokar (2014) describe that the thermal conductivities of gas and
steam rise as temperature increases, that the thermal conductivities of bitumen and water
change little, and that the thermal conductivity of reservoir rock drops as temperature increases.
The bulk thermal conductivity of an oil sand is driven primarily by the rock and secondly by the
water and oil saturation. The thermal conductivity of gas and steam are very small and can be
ignored. Irani and Cokar (2014) explain that oil sands thermal conductivity is essentially a linear
function of temperature that can be written in the following form:
$$k_H = a + b\phi + c\sqrt{S_w} + d\sqrt{S_o} - eT$$
where ϕ is the porosity, Sw is the water saturation, So is the oil saturation, T is temperature and a,
b, c, d and e are constants. Since the first four parts of the equation tend to remain constant, the
equation can be simplified to:
$$k_H = k_{H,ref} - B\,(T - T_{ref})$$
where kHref is the formation thermal conductivity at a reference temperature Tref and B is a
constant. From Irani and Cokar, the overall thermal conductivity of oil sands decreases linearly
with temperature, with a slope (B) that varies between 0.001 and 0.002 K⁻¹ and a y-intercept
of 1.7 to 2.7 W/m·°C. For our study, the reference temperature was taken as the common SAGD
steam temperature of 250°C, and kHref was chosen as 1.8 W/m·°C. As suggested by Irani and Cokar
(2014), the coefficient B was set at 0.002 K⁻¹. The formula for thermal diffusivity is then given
by:
$$\alpha = \frac{k_H}{M_s} = \frac{k_{H,ref} - B\,(T - T_{ref})}{M_s} = \frac{1.8 - 0.002\,(T - 250)}{M_s}$$

with T in °C.
A.4.3.1 Incorporation into Numerical Model
To incorporate variable thermal diffusivity into the numerical model the equations must be
modified. For the non-condensing model, the temperature of grid block i and j is a function of
the temperature in the surrounding grid blocks and the parameter P (a function of ΔtD and ΔrD²).
Since ΔtD is proportional to α, and α is now variable, P can be re-written as:

$$P = \frac{\Delta t}{\Delta r^2}\,\alpha = \frac{\Delta t}{\Delta r^2}\cdot\frac{1.8 - 0.002\,(T - 250)}{M_s}$$
Δt and Δr² are both constant. Since the thermal conductivity changes little over small
temperature differences, the P value used to calculate the temperature in grid block (i, j) is
based on the temperature of the same grid block at the previous time step, (i, j−1):

$$P_i^j = \frac{\Delta t}{\Delta r^2}\cdot\frac{1.8 - 0.002\,(T_i^{\,j-1} - 250)}{M_s}$$
Consequently, to solve for the temperature in the i and j block, the following formulas are used:
For temperature:

$$T_i^j = \frac{T_i^{j-1}\left(1-P_i^j\right) + \dfrac{P_i^j}{2}\left[\left(1+\dfrac{1}{2i}\right)T_{i+1}^{j-1} + \left(1+\dfrac{1}{2i}\right)T_{i+1}^{j} + \left(1-\dfrac{1}{2i}\right)T_{i-1}^{j-1} + \left(1-\dfrac{1}{2i}\right)T_{i-1}^{j}\right]}{1+P_i^j}$$
For steam quality:

$$f_{st,i}^{\,j} = f_{st,i}^{\,j-1} + \frac{c\,P_i^j}{2L}\left[\left(1+\frac{1}{2i}\right)\left(T_{i+1}^{j-1} - T_i^{j-1} + T_{i+1}^{j} - T_i^{j}\right) + \left(1-\frac{1}{2i}\right)\left(T_i^{j-1} - T_{i-1}^{j-1} + T_i^{j} - T_{i-1}^{j}\right)\right]$$
A.5 Results and Discussion
A.5.1 Constant Thermal Diffusivity
Figure A.8 depicts how the radial temperature profile (in dimensional values) evolves during a
shut in that occurs while the SAGD operation is in the startup period. The hot zone radius R1 is
selected as 1 m and the intermediate zone radius R2 is taken to be 3 m. The well, steam and
reservoir properties used were reflective of typical parameters found for Athabasca oil sands
reservoirs where SAGD operations are practiced (rw = 0.084 m, Ts = 250°C, Tr = 10°C, kH =
1.8×10⁻³ kW/m·°C, Ms = 3,000 kJ/m³·°C). The results show that the temperature drops rapidly
after steam is stopped during the startup period. Within one month, the temperature at the well
has dropped to less than half its original value and after six months, the temperature is lower than
15% of the original temperature. This implies that if a SAGD well is in the startup period and if
steam injection is stopped, a substantial amount of heat will have to be injected to get the
interwell pair temperature distribution back to the target temperature (typically about 80°C)
sufficient to mobilize bitumen from this part of the reservoir.
Figure A.8: Non-condensing model evolution of radial temperature profile.
Figure A.9 depicts how the radial temperature profile evolves during a shut-in event that
occurs while the SAGD operation is in the ramp up period. In this period, the hot inner zone is at
steam temperature. Here, the hot zone radius R1 is taken to be 1 m and intermediate zone radius
R2 is equal to 3 m. The steam quality fst is equal to 0.9 which is reflective of typical SAGD
operations. The well, steam and reservoir properties used were the same as the non-condensing
model listed above. Figure A.10 shows how the radial steam quality evolves over time. The
results show that after steam injection stops, first, the latent heat within the system is largely
consumed and thereafter, the sensible heat in the water is then lost. The steam quality drops to a
maximum of 0.3 after 6 months of cooling with live steam only occupying a very small radius
around the well. The results show that the even though the hot and intermediate zone radii are
the same for both the condensing and non-condensing models, the steam chamber takes longer
too cool due to the extra energy stored as latent heat.
Figure A.9: Condensing model evolution of radial temperature profile.
Figure A.10: Condensing model evolution of radial steam quality profile.
Figure A.11 depicts how the temperature at the center of the wellbore evolves with shut-in time during the start-up and ramp-up phases. The temperature during start-up is always lower than during ramp-up since there is no latent heat transfer; however, the change is much more gradual since the heat has not had time to distribute throughout the reservoir. The temperature during the ramp-up phase remained constant for about 6 weeks until the steam chamber fully condensed and then dropped very suddenly, because the heat had over 6 weeks to distribute. This sudden drop is better visualized in Figure A.12, which shows the negative slopes of the temperature profiles versus time. The start-up phase reached a maximum value of about 0.34°C/hour, whereas the ramp-up phase reached values of over 19°C/hour the moment the steam chamber collapsed.
Figure A.11: Temperature change with time at the center of the well bore during start up and
ramp up.
Figure A.12: Slope of temperature plotted against time at the center of the well bore during start up and ramp up.
The temperature at the center of the wellbore during the ramp-up phase leveled off at around 67°C, whereas the temperature during the start-up phase kept decreasing to close to the initial reservoir temperature of 10°C. This is because the latent heat released by the condensing steam had warmed up the surrounding reservoir much more than sensible heat transfer alone did during start-up. The temperature will continue to drop until it reaches 10°C, but this will take orders of magnitude longer than the sensible heat transfer.
A.5.2 Variable Thermal Diffusivity
Figure A.13 shows the results of incorporating the variable heat conductivity into the model. From the results, it is clear that variable heat conduction plays a relatively small role in the overall result; however, it demonstrates the model's ability to incorporate such complexities. Figure
A.14 shows how a variable kh speeds up the cooling: the formation kh increases as the temperature drops, causing heat to dissipate faster.
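One way the temperature dependence could be folded into the scheme is to recompute the local grid Fourier number from a temperature-dependent conductivity at every node and sweep. The sketch below assumes a linear kh(T) relation; the linear form, the slope beta, and the function names are illustrative assumptions, not the correlation used in the thesis.

```python
def k_h(T, k_ref=1.8e-3, T_ref=10.0, beta=-2.0e-6):
    # Thermal conductivity in kW/(m*C). beta < 0 makes kh rise as the
    # temperature drops, which is the behaviour described in the text.
    # The linear form and coefficients are assumptions for illustration.
    return k_ref + beta * (T - T_ref)

def local_fourier(T, Ms=3000.0, dt=3600.0, dr=0.1):
    # Local grid Fourier number P = alpha*dt/dr**2 with alpha = kh/Ms
    # evaluated at the node temperature, so P varies across the grid
    # and between time steps as the formation cools.
    return k_h(T) * dt / (Ms * dr ** 2)
```

Replacing the constant P in the update formula with `local_fourier(T[i])`, recomputed each sweep, would be expected to reproduce the trend in Figure A.14: as the formation cools, kh and hence P grow, so heat dissipates faster.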
Figure A.13: Non-condensing radial temperature profile after 2 weeks for variable and constant
kh cases.
Figure A.14: Non-condensing temperature evolution at the center of the wellbore for variable
and constant kh cases.
A.6 Conclusions
In this study, a heat transfer model was developed to analyze how SAGD steam chambers respond when steam injection is stopped. The model was solved numerically using the finite difference method. The results show that once the steam chamber has fully condensed, the temperature drops rapidly but levels off at a value higher than the initial reservoir temperature. While the steam chamber shrinks, it heats the surrounding formation through latent heat release, which causes the temperature in the middle of the wellbore to drop very gradually. A higher reservoir temperature is helpful during a SAGD restart, since the reservoir takes less energy to heat and the oil is more mobile. The effect of temperature-dependent thermal diffusivity is relatively small, and a constant thermal diffusivity provides a good estimate of the temperature transients.