
  • Topic: BASIC GEOSTATISTICS

    Subtopic: Status

    Introduction

    Classical Statistical Concepts

    Data Posting and Validation

    Regionalized Variables

    Kriging

    Data Integration

    Conditional Simulation

    Public Domain Geostatistics Programs

    Case Studies

    Selected Readings

    Geostatistics Glossary

    COMPILED BY: JIMBA OLUWAFEMI SOLOMON

  • THERE IS LOVE IN SHARING!!

    THIS IS FOR THE BENEFIT OF MY COLLEAGUES IN

    PETROLEUM GEOSCIENCE IMPERIAL COLLEGE LONDON

    WHO WILL AND MUST GRADUATE IN SEPTEMBER 2008.

    WISH YOU ALL SUCCESS!!!

  • INTRODUCTION

    Before undertaking any study of Geostatistics, it is necessary to become familiar with certain key concepts drawn from Classical Statistics, which form the basic building blocks of Geostatistics. Because the study of Statistics generally deals with quantities of data, rather than a single datum, we need some means to deal with that data in a manageable form. Much of Statistics deals with the

    organization, presentation, and summary of data. Isaaks and Srivastava (1989) remind us that Data speaks most clearly when organized.

    This section reviews a number of classic statistical concepts that are frequently used during the course of geostatistical analysis. By understanding these concepts, we will gain the tools needed to analyze and describe data, and to understand the relationships between different variables.

    STATISTICAL NOTATION

    Statistical notation uses Roman or Greek letters in equations to represent similar concepts, with the distinction being that:

    Greek notation describes Populations: measures of a population are called parameters

    Roman notation describes Samples: measures of a sample are called statistics

    Now might be a good time to review the list of Greek letters. Following is a list of Greek letters and their significance within the realm of statistics.

    Letter Name    Upper & Lower Case    Significance in Statistics

    alpha          Α, α
    beta           Β, β
    gamma          Γ, γ
    delta          Δ, δ
    epsilon        Ε, ε
    zeta           Ζ, ζ
    eta            Η, η
    theta          Θ, θ
    iota           Ι, ι
    kappa          Κ, κ
    lambda         Λ, λ
    mu             Μ, μ                  μ: Mean of a Population
    nu             Ν, ν
    xi             Ξ, ξ
    omicron        Ο, ο
    pi             Π, π
    rho            Ρ, ρ                  ρ: Correlation Coefficient
    sigma          Σ, σ                  Σ: Summation; σ: Standard Deviation of a Population
    tau            Τ, τ
    upsilon        Υ, υ
    phi            Φ, φ
    chi            Χ, χ                  (cf. x̄: Mean of a Sample)
    psi            Ψ, ψ
    omega          Ω, ω

    It is important to note that in some cases, a letter may take on a different meaning, depending on whether the letter is upper case or lower case. Certain

    Roman letters take on additional importance as part of the standard notation of Statistics or Geostatistics.

  • Letter Name    Statistical Notation

    E              Event
    F, f           Distribution function; frequency; probability function for a random variable
    h              Lag distance (distance between two sample points)
    m              Sample mean
    N, n           Population size; sample size (or number of observations in a data set)
    O, o           Observed frequencies; outcomes
    P, p           Probability; proportion
    s              Standard deviation of a sample
    V              Variance
    X, x           Random variable; a single value of a random variable

    MEASUREMENT SYSTEMS

    Because the conclusions of a quantitative study are based in part on inferences drawn from measurements, it is important to consider the nature of the measurement systems from which data are collected. Measurements are numerical values that reflect the amount or magnitude of some property. The manner in which numerical values are assigned determines the measurement scale, and thereby determines the type of data analysis (Davis, 1986).

    There are four measurement scales, each more rigorously defined than its predecessor; and thus containing more information. The first two are the nominal and ordinal scales, in which we classify observations into exclusive categories. The other two scales, interval and ratio, are the ones we normally think of as

    measurements, because they involve determinations of the magnitude of an observation (Davis, 1986).

    Nominal Scale

    This measurement classifies observations into mutually exclusive categories of equal rank, such as red, green, or blue. Symbols like A, B, C, or numbers are also often used. In geostatistics, we may wish to predict facies occurrence, and may therefore code the facies as 1, 2 and 3, for sand, siltstone, and shale, respectively. Using this scale, there is no connotation that 2 is twice as much as 1, or that 3 is greater than 2.

  • Ordinal Scale

    Observations are sometimes ranked hierarchically. A classic example taken from geology is Mohs scale of hardness, in which mineral rankings extend from one to ten, with higher ranks signifying increased hardness. The step between successive states is not equal in this scale. In the petroleum industry, kerogen types are based on an ordinal scale, indicative of stages of organic diagenesis.

    Interval Scale

    This scale is so named because the width of successive intervals is constant.

    The most commonly cited example of an interval scale is temperature. A change from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C. This scale is commonly used for many measurements. An interval scale does not have a natural zero, or a point where the magnitude is nonexistent. Thus, it is possible to have negative values. Within the petroleum industry, reservoir properties are measured along a continuum, but there are practical limits for the measurements. (It would be hard to conceive of negative porosity, permeability, or thickness, or of porosity greater than 100%.)

    Ratio Scale

    Ratios not only have equal increments between steps, but also have a zero point.

    Ratio scales represent the highest forms of measurement. All types of mathematical and statistical operations are performed with them. Many geological measurements are based on a ratio scale, because they have units of length, volume, mass, and so forth.

    For most of our geostatistical studies, we will be primarily concerned with the analysis of interval and ratio data. Typically, no distinction is made between the two, and they may occur intermixed in the same problem. For example, in trend surface analysis, the independent variable may be measured on a ratio scale, whereas the geographical coordinates are on an interval scale.

    POPULATIONS AND SAMPLES

    INTRODUCTION

    Statistical analysis is built around the concepts of populations and samples.

    A population consists of a well-defined set of elements (either finite or infinite).

    More specifically, a population is the entire collection of those elements. Commonly, such elements are measurements or observations made on items of a specific type (porosity or permeability, for example). A finite population might consist of all the wells drilled in the Gulf of Mexico in 1999, whereas, the infinite population might be all wells drilled in the Gulf of Mexico, past, present, and future.

    A sample is a subset of elements drawn from the population (Davis, 1986). Samples are studied in order to make inferences about the population itself.

    Parameters, Data, And Statistics

    Populations possess certain numerical characteristics (such as the population mean) which are known as parameters. Data are measured or observed values obtained by sampling the population. A statistic is similar to a parameter, but it applies to numerical characteristics of the sample data.

  • Within the population, a parameter consists of a fixed value, which does not change. Statistics are used to estimate parameters or test hypotheses about the parent population (Davis, 1986). Unlike the parameter, the value of a statistic is not fixed, and may change by drawing more than one sample from the same population.

    Remember that values from Populations (parameters) are often assigned Greek letters, while the values from Samples (statistics) are assigned Roman letters.

    Random Sampling

    Samples should be acquired from the population in a random manner. Random sampling is defined by two properties.

    First, a random sample must be unbiased, so that each item in the sample

    has the same chance of being chosen as any other item in the sample.

    Second, the random sample must be independent, so that selecting one item from the population has no influence on the selection of other items in the population.

    Random sampling produces an unbiased and independent result, so that, as the sample size increases, we have a better chance of understanding the true nature (distribution) of the population.

    One way to determine whether random samples are being drawn is to analyze sampling combinations. The number of different samples of n measurements that can be drawn from a population of N elements is given by the equation:

    C(N, n) = N! / [n! (N - n)!]

    Where: C(N, n) = the number of possible sample combinations
    N = the number of elements in the population
    n = the number of elements in the sample

    If the sampling is conducted in a manner such that each of the C(N, n) samples has an equal chance of being selected, the sampling program is said to be random and the result is a random sample (Mendenhall, 1971).
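    As a quick numerical check of the formula above, the short Python sketch below counts the possible sample combinations; the population size N = 10 and sample size n = 3 are hypothetical values chosen only for illustration.

    import math

    N = 10   # hypothetical population size
    n = 3    # hypothetical sample size

    # Number of distinct samples of size n that can be drawn from N elements:
    # C(N, n) = N! / [n! (N - n)!]
    combinations = math.comb(N, n)
    print(combinations)   # 120 possible samples; under random sampling each has probability 1/120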

    Sampling Methods

    The method of sampling affects our ability to draw inferences about our data (such as estimation of values at unsampled locations) because we must know the probability of an observation in order to arrive at a statistical inference.

    Replacement

    The issue of replacement plays an important role in our sampling strategy. For example, if we were to draw samples of cards from a population consisting of a deck, we could either:

    Draw a card from the deck, and add its value to our hand, then draw another card

    Or


  • Draw a card from the deck, note its value, and put it back in the deck, then draw a card from the deck again.

    In the first case, we sample without replacement; in the second case we sample with replacement. Sampling without replacement prevents us from sampling that value again, while sampling with replacement allows us the chance to pick that

    same value again in our sample.

    Oilfield Applications to Sampling

    When observations having certain characteristics are systematically excluded from the sample, whether deliberately or inadvertently, the sampling is considered biased. In the oil industry, we face this situation quite frequently. Suppose, for example, we may be interested in the pore volume of a particular reservoir unit for pay estimation. Typically, we use a threshold or porosity cutoff when making the calculation, thus deliberately biasing the true pore volume to a larger value.

    Similarly, the process of drilling wells in a reservoir necessarily involves sampling without replacement.

    Furthermore, any sample data set will provide only a sparse and incomplete picture of the entire reservoir. The sampling routine (also known as the drilling program) is highly biased and dependent, and rightly so -any drilling program will

    be biased toward high porosity, high permeability, high structural position, and ultimately, high production. And the success or failure of nearby wells will influence further drilling. Because the sample data set represents a minuscule subset of the population, we will never really know the actual population distribution function of the reservoir. (We will discuss bias in more detail in our discussion of summary statistics.)

    However, despite these limitations, our task is to infer properties about the entire reservoir from our sample data set. To accomplish this, we need to use various statistical tools to understand and summarize the properties of the samples to make inferences about the population (reservoir).

    TRIALS, EVENTS, AND PROBABILITY

    INTRODUCTION

    In statistical parlance, a trial is an experiment that produces an outcome consisting of either a success or a failure. An event is a collection of possible outcomes of a trial. Probability is a measure of the likelihood that an event will occur, or a measure of that event's relative frequency. The following discussion introduces events and their relation to one another, then provides an overview of probability.

    EVENTS

    An event is a collection of possible outcomes, and this collection may contain zero or more outcomes, depending on how many trials are conducted. Events can be classified by their relationship to one another:

    Independent Events

    Events are classified as Independent if the occurrence of event A has no bearing on the occurrence of event B, and vice versa.

  • Dependent Events

    Events are classified as Dependent if the occurrence of event A influences the occurrence of event B.

    Mutually Exclusive Events

    Events are Mutually Exclusive if the occurrence of either event precludes the occurrence of the other. Two events that are independent cannot be mutually exclusive.

    PROBABILITY

    Probability is a measure of the likelihood that an event will occur, or a measure of that event's relative frequency. The measure of probability is scaled from 0 to 1, where:

    0 represents no chance of occurrence, and

    1 represents certainty that the event will occur.

    Probability is just one tool that enables the statistician to use information from samples to make inferences or describe the population from which the samples were obtained (Mendenhall, 1971). In this discussion, we will review discrete and conditional probabilities.

    Discrete Probability

    All of us have an intuitive concept of probability. For example, if asked to guess whether it will rain tomorrow, most of us would reply with some confidence that rain is either likely or unlikely. Another way of expressing the estimate is to use a numerical scale, such as a percentage scale. Thus, you might say that there is a 30% chance of rain tomorrow, and imply that there is a 70% chance it will not rain.

    The chance of rain is an example of discrete probability; it either will or it will not

    rain. The probability distribution for a discrete random variable is a formula, table, or graph providing the probability associated with each value of the random variable (Mendenhall, 1971; Davis, 1986). For a discrete distribution, probability can be defined by the following:

    P(E) = (number of outcomes corresponding to event E) / (total number of possible outcomes)

    Where: P = the probability of a particular outcome, and E = the event

    Consider the following classic example of discrete probability, used almost universally in statistics texts.

    Coin Toss Experiment

    Coin tossing is a clear-cut example of discrete probability. The event has two states and must occupy one or the other; except for the vanishingly small possibility that the coin will land precisely on edge, it must come up either heads or tails (Davis, 1986; Mendenhall, 1971).

  • The experiment is conducted by tossing two unbiased coins. When a single coin is tossed, it has two possible outcomes: heads or tails. Because each outcome is equally likely, the probability of obtaining a head is 1/2. This does not imply that every other toss results in a head, but given enough tosses, heads will appear one-half the time.

    Now let us look at the two-coin example. The sample points for this experiment with their respective probabilities are given below (taken from Mendenhall, 1971).

    Sample Point   Coin 1   Coin 2   P(Ei)   y

    E1             H        H        1/4     2
    E2             H        T        1/4     1
    E3             T        H        1/4     1
    E4             T        T        1/4     0

    Let y equal the number of heads observed. We assign the value y = 2 to sample point E1, y = 1 to sample point E2, etc. The probability of each value of y may be

    calculated by adding the probabilities of the sample points in the numerical event.

    The numerical event y = 0 contains one sample point, E4; y =1 contains two sample points, E2 and E3; while y =2 contains one sample point, E1.

    The Probability Distribution Function for y, where y = Number of Heads

    y    Sample Points in y    p(y)

    0    E4                    1/4
    1    E2, E3                1/2
    2    E1                    1/4

    Thus, for this experiment there is a 25% chance of observing two heads from a single toss of the two coins. The histogram contains three classes for the random variable y, corresponding to y = 0, y = 1, and y = 2. Because p(0) = 1/4, the theoretical relative frequency for y = 0 is 1/4; p(1) = 1/2, hence the theoretical relative frequency for y = 1 is 1/2, etc. The histogram is shown in Figure 1 (Probability Histogram for p(y) (modified from Davis, 1986)).

  • Figure 1

    If you were to draw a sample from this population, by throwing two balanced coins, say 100 times, and recorded the number of heads observed each time to construct a histogram for the 100 measurements, your histogram would appear very similar to that of Figure 1. If you repeated the experiment with 1000 coin tosses, the similarity would be even more pronounced.
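    The following Python sketch repeats the experiment just described; the 100 and 1000 tosses are the sample sizes mentioned above, and the seed is an arbitrary choice. It simulates tossing two fair coins and tallies the number of heads so the empirical frequencies can be compared with the theoretical values p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.

    import random
    from collections import Counter

    def toss_two_coins(n_trials, seed=0):
        # Return the relative frequency of y = number of heads in n_trials tosses of two fair coins.
        rng = random.Random(seed)
        counts = Counter(rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_trials))
        return {y: counts[y] / n_trials for y in (0, 1, 2)}

    print(toss_two_coins(100))    # roughly 0.25, 0.50, 0.25
    print(toss_two_coins(1000))   # closer to the theoretical values as the sample size grows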

    Conditional Probability

    The concept of conditional probability is key to oil and gas exploration, because once a well is drilled, it makes more information available, and allows us to revise our estimates of the probability of further outcomes or events. Two events are often related in such a way that the probability of occurrence of one event depends upon whether the other event has or has not occurred. Such a dependence on a prior event describes the concept of Conditional Probability: the

    chance that a particular event will occur depends on whether another event occurred previously.

    For example, suppose an experiment consists of observing weather on a specific day. Let event A = snow and B = temperature below freezing. Obviously, events A and B are related, but the probability of snow, P(A), is not the same as the probability of snow given the prior information that the temperature is below freezing. The probability of snow, P(A), is the fraction of the entire population of observations which result in snow. Now examine the sub-population of observations resulting in B, temperature below freezing, and the fraction of these resulting in snow, A. This fraction, called the conditional probability of A given B, may equal P(A), but we would expect the chance of snow, given freezing temperatures, to be larger.

    In statistical notation, the conditional probability that event A will occur given that event B has occurred already is written as:

    P(A|B)

  • where the vertical bar in the parentheses means "given", and the events appearing to the right of the bar are those that have occurred (Mendenhall, 1971).

    Thus, we define the conditional probability of A given B as:

    P(A|B) = P(AB) / P(B)

    and we define the conditional probability of B given A as follows:

    P(B|A) = P(AB) / P(A)

    Bayes Theorem on Conditional Probability

    Bayes Theorem allows the conditional probability of an event to be updated as newer information becomes available. Quite often, we wish to find the conditional probability of an event, A, given that event B occurred at some time in the past. Bayes Theorem for the probability of causes follows easily from the definition of conditional probability:

    P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|A') P(A')]

    Where: P(A | B) = the probability that event A will occur, given that event B has already occurred P(B | A) = the probability that event B will occur, given that event A has already occurred P(A) = the probability that event A will occur P(B | A') = the probability that event B will occur, given that event A has not

    already occurred P(A') = the probability that event A will not occur

    A practical geostatistical application using Bayes Theorem is described in an article by Doyen; et al. (1994) entitled Bayesian Sequential Indicator Simulation of Channel Sands in the Oseberg Field, Norwegian North Sea.
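    As a simple numerical illustration of Bayes Theorem (the prior and likelihood values below are hypothetical and are not taken from the Doyen et al. study), the sketch updates the probability of a discovery, A, once a favorable seismic indicator, B, has been observed.

    def bayes(p_a, p_b_given_a, p_b_given_not_a):
        # P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A')P(A')]
        p_not_a = 1.0 - p_a
        numerator = p_b_given_a * p_a
        return numerator / (numerator + p_b_given_not_a * p_not_a)

    # Hypothetical values: prior chance of a discovery, and the chance of a favorable
    # seismic response when a reservoir is present versus absent.
    print(bayes(p_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.2))   # posterior is about 0.63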

    Additive Law of Probability

    Another approach to probability problems is based upon the classification of compound events, event relations, and two probability laws. The first is the Additive Law of Probability, which applies to unions.

    The probability of the union (A ∪ B) is equal to:

    P(A ∪ B) = P(A) + P(B) - P(AB)

    If A and B are mutually exclusive, P(AB) = 0 and

    P(A ∪ B) = P(A) + P(B)

    Multiplicative Law of Probability

    The second law of probability is called the Multiplicative Law of Probability, which applies to intersections.


  • Given two events, A and B, the probability of the intersection, AB, is equal to

    P(AB) = P(A)P(B|A)

    = P(B)P(A|B)

    If A and B are independent, then P(AB) = P(A)P(B)

    RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS

    INTRODUCTION

    Geoscientists are often tasked with estimating the value of a reservoir property at a location where that property has not been previously measured. The estimation procedure must rely upon a model describing how the phenomenon behaves at unsampled locations. Without a model, there is only sample data, and no inference can be made about the values at locations that were not sampled. The underlying model and its behavior is one of the essential elements of the geostatistical framework.

    Random variables and their probability distributions form the foundation of the geostatistical method. Unlike many other estimation methods (such as linear regression, inverse distance, or least squares) that do not state the nature of their model, geostatistical estimation methods clearly identify the basis of the models used (Isaaks and Srivastava, 1989). In this section, we define the random variable and briefly review the essential concepts of important probability distributions. The random variable is further explained later, in Spatial Correlation Analysis and Modeling.

    THE PROBABILISTIC APPROACH

    Deterministic models are applicable only when the process that generated the data is known in sufficient detail to enable an accurate description of the entire population to be made from only a few sample values. Unfortunately, few reservoir processes are understood well enough to permit application of deterministic models. Although we know the physics or chemistry of the fundamental processes, the variables we study in reservoir data sets are often the product of complex interactions that are not fully quantifiable. These processes include, for example, depositional mechanisms, tectonic processes, and diagenetic alterations.

    For most reservoir data sets, we must accept that there is an unavoidable degree of uncertainty about how the attribute behaves between sample locations (Isaaks and Srivastava, 1989). Thus, a probabilistic approach is required, and the following random function models introduced herein recognize this fundamental uncertainty, providing us with tools to estimate values at unsampled locations.

    The following discussion describes the two kinds of random variables. Next, we'll discuss the probability distributions or functions associated with each type of random variable.

  • RANDOM VARIABLE DEFINED

    A random variable can be defined as a numerical outcome of an experiment whose values are generated randomly according to some probabilistic mechanism. A random variable associates a unique numerical value with every outcome, so the value of the random variable will vary with each trial as the experiment is repeated.

    The throwing of a die, for example, produces values randomly from the set 1,2,3,4,5,6. The coin toss is another experiment that produces numbers randomly. (In the case of a coin toss, however, we need to assign numerical values, designating heads as 0 and tails as 1; then we can draw randomly from the set 0,1.)

    TWO CLASSES OF RANDOM VARIABLES

    There are two different classes of random variables, with the distinction based on the sample interval associated with the measurement. The two classes are the discrete and the continuous random variable. We will discuss each in turn.

    Discrete Random Variables

    A discrete random variable may be identified by the number and nature of the values it assumes; it may assume only a finite range of distinct values (distinct values being the operative phrase here, e.g.: 0,1,2,3,4,5 -as opposed to each and every number between 0 and 1 -which would produce an infinite number of values).

    In most practical problems, discrete random variables represent count (or enumerated) data, such as point counts of minerals in a thin section. The die and coin toss experiments also generate discrete random variables.

    Discrete random variables are characterized by a probability distribution, which

    may be described by a formula, table or graph that provides the probability associated with each value of the discrete random variable. The probability distribution function of discrete random variables may be plotted as a histogram. Refer to Figure 1 (Probability histogram) as an example histogram for a discrete

    random variable.

  • Figure 1

    Frequency Tables and Histograms

    Discrete random variables are often recorded in a frequency table, and displayed as a histogram. A frequency table records how often data values fall within certain intervals or classes. A histogram is a graphical representation of the frequency table.

    It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the number of values within that class. Data are conventionally ranked in ascending order, and can thus also be represented as a cumulative frequency histogram, where the total number of values below certain cutoffs is shown, rather than the total number of values in each class.

  • Table 1: Frequency and Cumulative Frequency tables of 100 values, X, with a class width of one (modified from Isaaks and Srivastava, 1989).

    Class Interval   Frequency (Occurrences)   Frequency (%)   Cumulative Number   Cumulative (%)
    0-1                1     1     1     1
    1-2                1     1     2     2
    2-3                0     0     2     2
    3-4                0     0     2     2
    4-5                3     3     5     5
    5-6                2     2     7     7
    6-7                2     2     9     9
    7-8               13    13    22    22
    8-9               16    16    38    38
    9-10              11    11    49    49
    10-11             13    13    62    62
    11-12             17    17    79    79
    12-13             13    13    92    92
    13-14              4     4    94    94
    >14                4     4   100   100

    Figure 2a and 2b display frequency and cumulative frequency histograms of the data in Table 1 (modified from Isaaks and Srivastava, 1989).

  • Figure 2a, 2b
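    A frequency table like Table 1 can be assembled with a few lines of Python, as sketched below; the randomly generated data set and the class width of one are stand-ins for the 100 values of Isaaks and Srivastava, not the actual data.

    import random

    rng = random.Random(1)
    values = [rng.gauss(10.0, 2.5) for _ in range(100)]   # hypothetical data, roughly centered on 10

    class_width = 1.0
    counts = {}
    for v in values:
        lower = int(v // class_width) * class_width       # lower edge of the class containing v
        counts[lower] = counts.get(lower, 0) + 1

    # Print class interval, frequency, and cumulative number of occurrences
    cumulative = 0
    for lower in sorted(counts):
        cumulative += counts[lower]
        print(f"{lower:.0f}-{lower + class_width:.0f}  {counts[lower]:3d}  {cumulative:4d}")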

  • (Sometimes, the histograms are converted to continuous curves by running a line from the midpoint of each bar in the histogram. This process may be convenient for comparing continuous and discrete random variables, but may tend to confuse the presentation.)

    Continuous Random Variables

    These variables are defined by an infinitely large number of possible values (much like a segment of a number-line, which can be repeatedly subdivided into smaller and smaller intervals to create an infinite number of increments).

    In most practical problems, continuous random variables represent measurement data, such as the length of a line, or the thickness of a pay zone.

    The probability density function of the continuous random variable may be plotted

    as a continuous curve. Although such curves may assume a variety of shapes, it is interesting to note that a very large number of random variables observed in nature approximate a bell-shaped curve. A statistician would say that such a curve approximates a normal distribution (Mendenhall, 1971).

    Probability Distributions Of The Discrete Random Variable

    The probability distribution of a discrete random variable consists of the relative frequencies with which the random variable takes each of its possible values. Four common probability distributions for discrete random variables are: Binomial, Negative Binomial, Poisson, and Hypergeometric. Each of these distributions is discussed using practical geological examples taken from Davis (1986).

    Binomial Probability Distribution

    Binomial distributions only apply to a special type of discrete random variable, called a binary variable. Binary variables can only have two values: such as ON or OFF, SUCCESS or FAILURE, 0 or 1. (Oftentimes, values such as ON or OFF, and SUCCESS or FAILURE will be assigned the numerical values of 1 or 0, respectively.) Similarly, binomial distributions are only valid for trials in which there are only two possible outcomes for each trial. Furthermore, the total number of trials must be fixed beforehand, all of the trials must have the same probability of success, and the outcomes of all the trials must not be influenced by the outcomes of previous trials. The probability distribution governing a coin toss or die-throwing experiment is a binomial distribution.

    We'll consider how the binomial distribution can be applied to the following oilfield example.

    Problem: Forecast the probability of success of a drilling program.

    Assumptions: Each wildcat is classified as either:

    0 = Failure (dry hole)

    1 = Success (discovery)

    The binomial distribution is appropriate when a fixed number of wells will be

    drilled during an exploratory program or during a single period (budget cycle) for which the forecast is made.

    In this case, each well that is drilled in turn is presumed to be independent; this means that the success or failure of one hole does not influence the outcome of the next. Thus, the probability of discovery remains unchanged as successive

  • wildcats are drilled (true initially -as Davis pointed out in 1986, this assumption is difficult to justify in most cases, because a discovery or failure influences the selection of subsequent drilling locations).

    The probability p that a wildcat well will discover gas or oil is estimated using an

    industry-wide success ratio for drilling in similar areas, or based on the company's own success ratio. Sometimes the success ratio is a subjective guess. From p, the binomial model can be developed for exploratory drilling as follows:

    p                        The probability that a hole will be successful.

    1 - p                    The probability of failure.

    P = (1 - p)^n            The probability that n successive wells will be dry.

    P = (1 - p)^(n-1) p      The probability that the nth hole will be a discovery, but the preceding (n - 1) holes will be dry.

    P = n (1 - p)^(n-1) p    The probability of drilling one discovery well in a series of n wildcat holes, where the discovery can occur in any of the n wildcats.

    P = (1 - p)^(n-r) p^r    The probability that (n - r) dry holes will be drilled, followed by r discoveries.

    However, the (n - r) dry holes and the r discoveries may be arranged in C(n, r) combinations, or equivalently, in n! / [(n - r)! r!] different ways, resulting in the equation:

    P = [n! / ((n - r)! r!)] (1 - p)^(n-r) p^r    The probability that r discoveries will be made in a drilling program of n wildcats.

    This is an expression of the binomial distribution, and gives the probability that r successes will occur in n trials, when the probability of success in a single trial is p.

    For example, suppose we want to find the probability of success associated with a 5-well exploration program in a virgin basin where the success ratio is anticipated to be about 10%. What is the probability that the entire exploration program will be a total failure, with no discoveries?

    The terms of the equation are:

    n = 5, r = 0, p = 0.10

    P = [5! / (5! 0!)] (0.10)^0 (0.90)^5 = 0.59

    Where: P = the probability that r discoveries will be made
    r = the number of discovery wells
    p = the anticipated success ratio
    n = the number of holes drilled in the exploration program

    The probability of no discoveries resulting from the exploratory effort is almost 60%. Using either the binomial equation or a table for the binomial distribution, Figure 3 (Discrete distribution giving the probability of making n discoveries in a five-well drilling program when the success ratio (probability of discovery) is 10%; modified from Davis, 1986) shows the probabilities associated with all possible outcomes of the five-well drilling program.

    Figure 3
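    The probabilities plotted in Figure 3 can be reproduced with the short sketch below, which simply evaluates the binomial formula for r = 0 through 5 discoveries using the values from the example (n = 5 wells, p = 0.10).

    import math

    def binomial_prob(r, n, p):
        # P = [n! / ((n - r)! r!)] (1 - p)**(n - r) * p**r
        return math.comb(n, r) * (1 - p) ** (n - r) * p ** r

    for r in range(6):
        print(r, round(binomial_prob(r, n=5, p=0.10), 4))
    # r = 0 gives 0.5905, i.e., roughly a 59% chance that the program is a total failure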

    Negative Binomial Probability Distribution

    Other discrete distributions can be developed for experimental situations with different basic assumptions. We can develop a Negative Binomial Probability Distribution to find the probability that x dry holes will be drilled before r discoveries are made.

    Problem: Drill as many holes as needed to discover two new fields in a virgin basin.

    Assumption: The same conditions that govern the binomial distribution are assumed, except that the number of trials is not fixed.

    The probability distribution governing such an experiment is the negative binomial. Thus we can investigate the probability that it will require 2, 3, 4, ..., up to n exploratory wells before two discoveries are made.

    The expanded form of the negative binomial equation is

    P = [(r + x - 1)! / ((r - 1)! x!)] (1 - p)^x p^r

  • Where: P = the probability that exactly x dry holes are drilled before the r-th discovery
    r = the number of discovery wells
    x = the number of dry holes
    p = the regional success ratio

    If the regional success ratio is 10%, the probability that a two-hole exploration program will meet the company's goal of two discoveries can be calculated:

    r = 2, x = 0, p = 0.10, P = 0.01

    The calculated probabilities are low because they relate to the likelihood of obtaining two successes and exactly x dry holes (in this case: x = zero). It may be more appropriate to consider the probability distribution that more than x dry holes must be drilled before the goal of r discoveries is achieved. We do this by first calculating the cumulative form of the negative binomial. This gives the probability that the goal of two successes will be achieved in (x + r) or fewer holes, as shown in Figure 4 (Discrete distribution giving the cumulative probability that two discoveries will be made by or before a specified hole is drilled, when the success ratio is 10% (modified from Davis, 1986)).

    Figure 4

  • Each of these probabilities is then subtracted from 1.0 to yield the desired probability distribution illustrated in Figure 5 (Discrete distribution giving the probability that more than a specified number of holes must be drilled to make two discoveries, when the success ratio is 10% (modified from Davis, 1986)).

    Figure 5
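    A minimal sketch of the negative binomial calculation described above, using r = 2 discoveries and a regional success ratio p = 0.10: it prints, for increasing numbers of holes, the cumulative probability of reaching the goal in (x + r) or fewer holes (as in Figure 4) and its complement (as in Figure 5).

    import math

    def neg_binomial_prob(x, r, p):
        # P(exactly x dry holes before the r-th discovery) = [(r + x - 1)! / ((r - 1)! x!)] (1 - p)**x * p**r
        return math.comb(r + x - 1, x) * (1 - p) ** x * p ** r

    r, p = 2, 0.10
    cumulative = 0.0
    for x in range(0, 20):
        cumulative += neg_binomial_prob(x, r, p)
        holes = x + r
        # goal met in 'holes' or fewer wells vs. more than 'holes' wells still needed
        print(holes, round(cumulative, 3), round(1.0 - cumulative, 3))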

    Poisson Probability Distribution

    A Poisson random variable is typically a count of the number of events that occur within a certain time interval or spatial area. The Poisson probability distribution seems to be a reasonable approach to apply to a series of geological events. For example, the historical record of earthquakes in California, the record of volcanic eruptions in the Mediterranean, or the incidence of landslides related to El Nino along the California coast can be characterized by Poisson distributions.

    The Poisson probability model assumes that:

    events occur independently,

    the probability that an event will occur does not change with time,

    the length of the observation period is fixed in advance,

    the probability that an event will occur in an interval is proportional to the length of the interval, and

    the probability of more than one event occurring at the same time is vanishingly small.

  • When the probability of success becomes very small, the Poisson Distribution can be used to approximate the binomial distribution with parameters n and p.

    This is a discrete probability distribution regarded as the limiting case of the binomial when:

    n, the number of trials becomes very large, and

    p, the probability of success on any one trial becomes very small.

    The equation in this case is

    p(X) = λ^X e^(-λ) / X!

    Where: p(X) = probability of occurrence of the discrete random variable X
    λ = rate of occurrence

    Note that the rate of occurrence, λ, is the only parameter of the distribution. The Poisson distribution does not require either n or p directly, because we use the product np = λ instead, which is given by the rate of occurrence of events.
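    A brief sketch of the Poisson formula, using a hypothetical rate of λ = 2 events per observation period; for comparison, it also evaluates a binomial with a large n and a small p chosen so that np = λ, which the Poisson distribution approximates.

    import math

    def poisson_prob(x, lam):
        # p(X) = lam**X * exp(-lam) / X!
        return lam ** x * math.exp(-lam) / math.factorial(x)

    def binomial_prob(r, n, p):
        return math.comb(n, r) * (1 - p) ** (n - r) * p ** r

    lam = 2.0                                             # hypothetical rate of occurrence
    for x in range(5):
        approx = binomial_prob(x, n=1000, p=lam / 1000)   # large n, small p, np = lam
        print(x, round(poisson_prob(x, lam), 4), round(approx, 4))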

    Hypergeometric Probability Distributions

    The binomial distribution would not be appropriate for calculating the probability of discovery when the chance of success changes with each wildcat well. For example, we can use Statistics to argue two distinctly contradictory cases:

    Discovery of one reservoir increases the odds against finding another (fewer fields remaining).

    Drilling a dry hole increases the probability that the remaining untested features will prove productive.

    What we need is to find all possible combinations of producing and dry features within the population, then enumerate those combinations that yield the desired number of discoveries.

    The probability distribution generated by sampling without replacement, is called a hypergeometric distribution. Consider the following:

    Problem: An offshore concession contains 10 seismic anomalies, with a historical success ratio of 40%. Our limited budget will permit only six anomalies to be drilled. Assume that four of the structures are productive, so that the discovery of one reservoir increases the odds against finding another. What will be the number of discoveries?

    The probability of making x discoveries in a drilling program consisting of n holes, when sampling from a population of N prospects of which S are believed to

    contain commercial reservoirs, is

    P = [C(S, x) · C(N - S, n - x)] / C(N, n)

    Where: x = the number of discoveries
    N = the number of prospects in the population
    n = the number of holes drilled
    S = the number of commercial reservoirs

    This expression represents the number of combinations of reservoirs, taken by the number of discoveries, times the number of combinations of barren anomalies, taken by the number of dry holes, all divided by the number of combinations of all prospects taken by the total number of holes in the drilling program (Davis, 1989).

    Applying this to our offshore concession example containing ten seismic anomalies, from which four are likely to be reservoirs, what are the probabilities associated with a three-well drilling program?

    The probability of total failure, with no discoveries among the three structures is about 17%.

    The probability of one discovery is about 50%.

    A histogram of all possible outcomes of this exploration strategy is shown in Figure 6 (Discrete distribution giving the probability of n discoveries in three holes drilled on ten prospects, when four of the ten contain reservoirs (modified from Davis, 1986)). Note that the probability of some success (at least one discovery) is (1.00 - 0.17), or 83%.

    Figure 6
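    The probabilities quoted for the three-well program (about 17% for no discoveries and about 50% for exactly one) can be checked with this sketch of the hypergeometric formula, using N = 10 prospects, S = 4 reservoirs, and n = 3 holes.

    import math

    def hypergeometric_prob(x, N, S, n):
        # P(x discoveries) = C(S, x) * C(N - S, n - x) / C(N, n)
        return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)

    for x in range(4):
        print(x, round(hypergeometric_prob(x, N=10, S=4, n=3), 3))
    # x = 0 -> 0.167 (about 17%); x = 1 -> 0.5 (about 50%)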

  • Frequency Distributions Of Continuous Random Variables

    Frequency distributions of continuous random variables follow a theoretical probability distribution or probability density function that can be represented by a

    continuous curve. These functions can take on a variety of shapes. Rather than displaying the functions as curves, the distributions may be displayed as histograms, as shown in Figures 7a, 7b, 7c, and 7d (Examples of some continuous variable probability distributions).

    Figure 7a, 7b, 7c, 7d

    In this section, we will discuss the following common distribution functions:

    Normal Probability Distribution

    Lognormal Distribution

  • Normal Probability Distribution

    It is often assumed that random variables follow a normal probability density function, and many statistical (and geostatistical) methods are based on this supposition. The Central Limit Theorem is the foundation of the normal probability distribution.

    Central Limit Theorem

    The Central Limit Theorem (CLT) states that under rather general conditions, as the sample size increases, the sums and means of samples drawn from a population of any distribution will approximate a normal distribution (Sokal and Rohlf, 1969; Mendenhall, 1971). The Central Limit Theorem is defined below:

    Central Limit Theorem: If random samples of n observations are drawn from a population with finite mean, μ, and standard deviation, σ, then, as n grows larger, the sample mean, ȳ, will be approximately normally distributed with mean equal to μ and standard deviation σ/√n. The approximation will become more and more accurate as n becomes large (Mendenhall, 1971).

    The Central Limit Theorem consists of three statements:

    1. The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.

    2. The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn, divided by the size of the samples.

    3. If the original population is distributed normally (i.e. it is bell shaped), the sampling distribution of means will also be normal. If the original population is not normally distributed, the sampling distribution of means will increasingly approximate a normal distribution as sample size increases (i.e. when increasingly large samples are drawn).

    The significance of the Central Limit Theorem is twofold:

    1. It explains why some measurements tend to possess (approximately) a normal distribution.

    2. The most important contribution of the CLT is in statistical inference. Many algorithms that are used to make estimations or simulations require knowledge about the population density function. If we can accurately predict its behavior using only a few parameters, then our predictions should be more reliable. If the CLT applies, then knowing the sample mean and sample standard deviation, the density distribution can be recreated precisely.

    However, the disturbing feature of the CLT, and most approximation procedures, is that we must have some idea as to how large the sample size, n, must be in

    order for the approximation to yield useful results. Unfortunately, there is no clear-cut answer to this question, because the appropriate value of n depends

    upon the population probability distribution as well as the use we make of the approximation. Fortunately, the CLT tends to work very well, even for small samples, but this is not always true.
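    One way to see the Central Limit Theorem at work is the sketch below; the exponential parent population and the sample sizes are arbitrary choices made only for illustration. The mean of the sample means stays near the population mean, while their spread shrinks roughly as σ/√n, even though the parent population is strongly skewed.

    import random
    import statistics

    rng = random.Random(42)
    population = [rng.expovariate(1.0) for _ in range(100_000)]   # skewed, non-normal parent population

    def sample_means(n, n_samples=2000):
        # Means of repeated random samples of size n drawn from the population.
        return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]

    print(round(statistics.mean(population), 3), round(statistics.stdev(population), 3))   # both near 1.0
    for n in (5, 25, 100):
        means = sample_means(n)
        print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))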

  • Properties of the Normal Distribution

    Formally, the Normal Probability Density Function is represented by the following expression:

    Z = [1 / (σ √(2π))] e^( -(Y - μ)² / (2σ²) )

    Where Z is the height of the ordinate (y-axis) of the curve and represents the density of the function. It is the dependent variable in the expression, being a function of the variable Y.

    There are two constants in the equation: π, well known to be approximately 3.14159, making 1/√(2π) equal to 0.39894; and e, the base of the Naperian or natural logarithms, whose value is approximately 2.71828.

    There are two parameters in the normal probability density function. These are the parametric mean, μ, and the standard deviation, σ, which determine the location and shape of the distribution (these parameters are discussed under Summary Statistics). Thus, there is not just one normal distribution; rather, there is an infinity of such curves, because the parameters can assume an infinity of values (Sokal and Rohlf, 1969).

    Figure 8a

    Figure 8a (Illustration of how changes in the two parameters of the normal distribution affect the shape and position of histograms. Left: μ = 4, σ = 1. Right: μ = 8, σ = 0.5) illustrates the impact of the parameters on the shape of a probability distribution histogram.

    The histogram (or curve) is symmetrical about the mean. Therefore the mean, median and mode (described later under this subtopic) of the normal distribution occur at the same point. Figure 8b (Bell curve) shows that the curve of a


  • Gaussian normal distribution can be described by the position of its maximum,

    Figure 8b

    which corresponds to its mean (μ) and its points of inflection. The distance between μ and one of the points of inflection represents the standard deviation, sometimes referred to as the mean variation. The square of the mean variation is the variance.

    In a normal frequency distribution, the standard deviation may be used to characterize the sample distribution under the bell curve. According to Sokal and Rohlf (1969): 68.3% of all sample values fall within -1σ to +1σ of the mean, while 95.4% of the sample values fall within -2σ and +2σ of the mean, and 99.7% of the values are contained within -3σ and +3σ of the mean. This bears repeating, in a different format this time:

    ±1σ (1 standard deviation) contains 68.3% of the data

    ±2σ (2 standard deviations) contain 95.46% of the data

    ±3σ (3 standard deviations) contain 99.73% of the data

    How are the percentages calculated? The direct calculation of any portion of the area under the normal curve requires an integration of the function shown in the above expression. Fortunately, for those who have forgotten their calculus, the integration has been recorded in tabular form (Sokal and Rohlf, 1969). These tables can be found in most standard statistical books; for example, see Statistical Tables and Formulas, Table 1 (Hald, 1952).
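    Rather than consulting the tables, the area under a normal curve within k standard deviations of the mean can be computed directly from the error function, as in this small sketch; it reproduces the 68.3 / 95.4 / 99.7 percentages quoted above.

    import math

    def area_within(k):
        # Fraction of a normal distribution lying within k standard deviations of the mean.
        return math.erf(k / math.sqrt(2.0))

    for k in (1, 2, 3):
        print(k, round(100.0 * area_within(k), 2))   # 68.27, 95.45, 99.73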

    Application of the Normal Distribution

    The normal frequency distribution is the most widely used distribution in statistics. There are three important applications of the density function (Sokal and Rohlf, 1969).

  • 1. Sometimes we need to know whether a given sample is normally distributed before we can apply certain tests. To test whether a sample comes from a normal distribution we must calculate the expected frequencies for a normal curve of the same mean and standard deviation, then compare the two curves.

    2. Knowing when a sample comes from a normal distribution may confirm or reject underlying hypotheses about the nature of the phenomenon studied.

    3. Finally, if we assume a normal distribution, we may make predictions based upon this assumption. For the geosciences, this means a better and unbiased estimation of reservoir parameters between the well data.

    Normal Approximation to the Binomial Distribution

    Recall that approximately 95% of the measurements associated with a normal distribution lie within two standard deviations of the mean and almost all lie within three standard deviations. The binomial probability distribution would nearly be symmetrical if the distribution were able to spread out a distance equal to two standard deviations on either side of the mean, which in fact is the case. Therefore, to determine the normal approximation we calculate the following when the outcome of a trial (n) results in a 0 or 1 success with probabilities q and p, respectively:

    μ = np

    σ = √(npq)

    If the interval μ ± 2σ lies within the binomial bounds, 0 and n, the approximation will be reasonably good (Mendenhall, 1971).

    Lognormal Distribution

    Many variables in the geosciences do not follow a normal distribution, but are highly skewed, such as the distribution in Figure 7b, and as shown below.

    Figure 9 Schematic histogram of sizes and numbers of oil field discoveries of hundred thousand-barrel equivalent.

  • Figure 9

    The histogram illustrates that most fields are small, with decreasing numbers of larger fields, and a few rare giants that exceed all others in volume. If the histograms of Figure 7b and Figure 9 are converted to logarithmic forms (that is, we use Yi = log Xi instead of Yi =Xi for each observation), the distribution becomes nearly normal. Such variables are said to be lognormal.

    Transformation of Lognormal data to Normal

    The data can be converted into logarithmic form by a process known as transformation. Transforming the data to a standardized normal distribution (i.e., zero mean and unit variance) simplifies data handling and eases comparison to different data sets.

    Data which display a lognormal distribution, for example, can be transformed to resemble a normal distribution by applying the formula ln(z) to each z variate in the data set prior to conducting statistical analysis. The success of the transformation can be judged by observing its frequency distribution before and after transformation. The distribution of the transformed data should be markedly less skewed than the lognormal data. The transformed values may be back-transformed prior to reporting results.
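    A minimal sketch of the transformation workflow just described, using a synthetic lognormal data set (the parameters of the synthetic data are arbitrary): take natural logs before analysis, then back-transform the result for reporting.

    import math
    import random
    import statistics

    rng = random.Random(7)
    z = [rng.lognormvariate(2.0, 0.8) for _ in range(5000)]   # synthetic, positively skewed data

    logs = [math.log(v) for v in z]              # ln(z) transform prior to statistical analysis
    mean_log = statistics.mean(logs)
    var_log = statistics.variance(logs)

    geometric_mean = math.exp(mean_log)          # back-transform: antilog of the mean of the logs
    print(round(mean_log, 3), round(var_log, 3), round(geometric_mean, 3))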

    Because of its frequent use in geology, the lognormal distribution is extremely important. If we look at the transformed variable Yi rather than Xi itself, the properties of the lognormal distribution can be explained simply by reference to the normal distribution.

    In terms of the original variable Xi, the mean of Y corresponds to the nth root of the product of the Xi:

    GM = (X1 · X2 · ... · Xn)^(1/n) = (∏ Xi)^(1/n)

  • Where: GM is the geometric mean
    ∏ is analogous to Σ, except that all the elements in the series are multiplied rather than added together (Davis, 1986).

    In practice, it is simpler to convert the measurements into logarithms and compute the mean and variance. If you want the geometric mean and variance, compute the antilogs of Ȳ and s²Y. If you work with the data in the transformed state, all of the statistical procedures that are appropriate for ordinary variables are applicable to the log-transformed variables (Davis, 1986).

    The characteristics of the lognormal distribution are discussed in a monograph by Aitchison and Brown (1969) and in the geological context by Koch and Link (1981).

    Random Error

    Random errors for normal distributions are additive, which means that errors of

    opposite sign tend to cancel one another, and the final measurement is near the true value. Lognormal distribution random errors are multiplicative, rather than

    additive, thus produce an intermediate product near the geometric mean.

    UNIVARIATE DATA ANALYSIS

    INTRODUCTION

    There are several ways in which to summarize a univariate (single attribute) distribution. Quite often we will simply compute the mean and the variance, or plot its histogram. However, these statistics are very sensitive to extreme values (outliers) and do not provide any spatial information, which is the heart of a geostatistical study. In this section, we will describe a number of different methods that can be used to analyse data for a single variable.

    SUMMARY STATISTICS

    The summary statistics represented by a histogram can be grouped into three categories:

    measures of location,

    measures of spread, and

    measures of shape.

    Measures of Location

    Measures of location provide information about where the various parts of the data distribution lie, and are represented by the following:

    Minimum: Smallest value.

    Maximum: Largest value.

    Median: Midpoint of all observed data values, when arranged in ascending order. Half the values are above the median, and half are below. This statistic represents the 50th percentile of the cumulative frequency histogram and is not generally affected by an occasional erratic data point.

  • Mode: The most frequently occurring value in the data set. This value falls within the tallest bar on the histogram.

    Quartiles: In the same way that the median splits the data into halves, the

    quartiles split the data in quarters. Quartiles represent the 25th, 50th and 75th percentiles on the cumulative frequency histogram.

    Mean: The arithmetic average of all data values. (This statistic is quite sensitive to extreme high or low values. A single erratic value or outlier can significantly bias the mean.) We use the following formula to determine the mean of a Population:

    Mean = μ = (Σ Zi) / N

    where:
    μ = population mean
    N = number of observations (population size)
    Σ Zi = sum of the individual observations

    We can determine the mean of a Sample in a similar manner. The below formula for the sample mean is comparable to the above formula, except that population notations have been replaced with those for samples.

    Mean = x̄ = (Σ Zi) / n

    where:
    x̄ = sample mean
    n = number of observations (sample size)
    Σ Zi = sum of the individual observations

    Measures of Spread

    Measures of spread describe the variability of the data values, and are represented by the following:

    Variance: Average squared difference of the observed values from the

    mean. Because the variance involves squared differences, this statistic is very sensitive to abnormally high/low values.

    Variance = σ² = Σ (Zi - μ)² / N

    Kachigan (1986) notes that the above formula is only appropriate for defining variance of a population of observations. If this same formula was applied to a sample for the purpose of estimating the variance of the

    parent population from which the sample was drawn, then the formula above will tend to underestimate the population variance. This

    underestimation occurs as repeated samples are drawn from the population and the variance is calculated from each, using the sample

    mean (x̄), rather than the population mean (μ). The resulting average of


  • these variances would be lower than the true value of the population variance (assuming we were able to measure every single member of the population).

    We can avoid this bias by taking the sum of squared deviations and dividing that sum by the number of observations less one. Thus, the sample estimate of population variance is obtained using the following

    formula:

    Variance = s² = Σ (xi - x̄)² / (n - 1)

    Standard Deviation: Square root of the variance.

    Standard Deviation = √(Variance)

    This measure is used to show the extent to which the data is spread around the vicinity of the mean, such that a small value of standard deviation would indicate that the data was clustered near to the mean. For example, if we had a mean equal to 10, and a standard deviation of 1.3, then we could predict that most of our data would fall somewhere between (10 - 1.3) and (10 + 1.3), or between 8.7 to 11.3. The standard deviation is often used instead of the variance, because the units are the same as the units of the attribute being described.

    Interquartile Range: Difference between the upper (75th percentile) and the lower (25th percentile) quartile. Because this measure does not use the mean as the center of distribution, it is less sensitive to abnormally high/low values.
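    The location and spread statistics defined above can be computed with a short sketch like the one below; the small porosity-like data set is made up purely for illustration. Note the (n - 1) divisor used by the sample variance.

    import statistics

    data = [12.1, 14.3, 15.0, 15.2, 15.8, 16.4, 17.0, 18.9, 22.5]   # hypothetical porosity values (%)

    print(min(data), max(data))                    # minimum and maximum
    print(statistics.median(data))                 # 50th percentile
    print(statistics.mean(data))                   # arithmetic average (sensitive to outliers)
    print(statistics.variance(data))               # sample variance, divides by (n - 1)
    print(statistics.stdev(data))                  # sample standard deviation
    q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (25th, 50th, 75th percentiles)
    print(q3 - q1)                                 # interquartile range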

    Figure 1a and 1b illustrate histograms of porosity with a mean of about 15 %, but different variances.


  • Figure 1a, 1b

    Outliers or Spurious Data

    Another statistic to consider is the Z-score, a summary statistic expressed in terms of standard deviations. Data values whose Z-scores have absolute values greater than a specified cutoff are termed outliers. The typical cutoff is 2.5 standard deviations from the mean. The formula is the ratio of the difference between the data value and the sample mean to the sample standard deviation:

  • Z-score = (Zi - x̄) / s

    This statistic serves as a caution, signifying either bad data, or a true local anomaly, which must be taken into account in the final analysis.

    Note: The Z-score transform does not change the shape of the histogram. The transform re-scales the histogram with a mean equal 0 and a variance equal 1. If the histogram is skewed before being transformed, it retains the same shape

    after the transform. The X-axis is now in terms of standard deviation units about the mean of zero.
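    A minimal sketch of the Z-score screen described above, using the typical cutoff of 2.5 standard deviations and a made-up data set containing one suspicious value.

    import statistics

    data = [15.0] * 10 + [14.5] * 5 + [15.5] * 4 + [30.0]   # hypothetical values; 30.0 looks anomalous
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)

    cutoff = 2.5
    z_scores = [(v - mean) / stdev for v in data]
    outliers = [v for v, z in zip(data, z_scores) if abs(z) > cutoff]
    print(outliers)   # flags 30.0, which may be bad data or a true local anomaly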

    Measures of Shape

    Measures of shape describe the appearance of the histogram and are represented by the following:

    Coefficient of Skewness: The average cubed difference between the data values and the mean, divided by the cube of the standard deviation. This measure is very sensitive to abnormally high/low values:

    CS = [(1/n) Σ (Zi - μ)³] / σ³

    where:
    μ is the mean
    σ is the standard deviation
    n is the number of data values

    The coefficient of skewness allows us to quantify the symmetry of the data distribution, and tells us when a few exceptional values (possibly outliers?) exert a disproportionate effect upon the mean.

    positive: long tail of high values (median < mean)

    negative: long tail of low values (median > mean)

    zero: a symmetrical distribution
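A small Python sketch of the coefficient of skewness as defined above, assuming a hypothetical data list (not the figure data):

```python
import statistics

# Hypothetical data with a long tail of high values.
data = [5.2, 6.1, 6.8, 7.0, 7.3, 7.9, 8.4, 9.0, 12.5, 15.8]

n = len(data)
m = statistics.mean(data)
sigma = statistics.pstdev(data)                 # standard deviation

# Average cubed deviation from the mean, divided by the cube of the standard deviation.
cs = sum((x - m) ** 3 for x in data) / n / sigma ** 3

print(f"coefficient of skewness = {cs:.2f}")    # positive here: median < mean
```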

Figure 2a, 2b, and 2c illustrate histograms with negative, symmetrical, and positive skewness, respectively.

    Coefficient of Variation: Often used as an alternative to skewness as a measure of asymmetry for positively skewed distributions with a minimum at zero. It is defined as the ratio of the standard deviation to the mean. A value of CV > 1 probably indicates the presence of some high erratic values (outliers).

CV = σ / μ

where:

σ is the standard deviation

μ is the mean
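A one-line sketch of the coefficient of variation, using another hypothetical, positively skewed sample with a minimum near zero:

```python
import statistics

# Hypothetical positively skewed data with a minimum near zero.
data = [0.2, 0.5, 0.8, 1.1, 1.4, 2.0, 2.8, 4.5, 9.7, 21.0]

cv = statistics.pstdev(data) / statistics.mean(data)
print(f"CV = {cv:.2f}")   # a value above 1 hints at erratic high values (possible outliers)
```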

    SUMMARY OF UNIVARIATE STATISTICAL MEASURES AND DISPLAYS

    Advantages

    Easy to calculate.

    Provides information in a very condensed form.

    Can be used as parameters of a distribution model (e.g., normal distribution defined by sample mean and variance).

    Limitations

    Summary statistics are too condensed, and do not carry enough information about the shape of the distribution.

Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g., the mean, the variance, and CS).

    Offers only a limited description, especially if our real interest is in a multivariate data set (attributes are correlated).

    BIVARIATE STATISTICAL MEASURES AND DISPLAYS

    INTRODUCTION

    Methods for bivariate description not only provide a means to describe the relationship between two variables, but are also the basis for tools used to analyze the spatial content of a random function (to be described in the Spatial Correlation and Modeling Analysis section). The bivariate summary methods described in this section only measure the linear relationship between the variables - not their spatial features.

    THE RELATIONSHIP BETWEEN VARIABLES

Bivariate analysis seeks to determine the extent to which one variable is related to another variable. We can reason that if one variable is indeed related to another, then information about the first variable might help us to predict the behavior of the second. If, on the other hand, our analysis of these two variables shows absolutely no relationship between the two, then we might need to discard one from the pair in favor of a different variable that is more predictive of the other variable's behavior.

The relationship between two variables can be described as complementary, parallel, or reciprocal. Thus, we might observe a simultaneous increase in value between two variables, or a simultaneous decrease. We might instead see the value of one variable decrease while the other increases. An alternative way of characterizing the relationship between two variables is to describe their behaviors in terms of variance. In this case, we observe how the value of one variable may change (or vary) in a manner that leaves the relationship with the second variable unchanged. (For instance, if the relationship were defined by a 1:10 ratio, then as the value of one variable changed, the other would vary by 10 times that amount, thus preserving the relationship.)

    Dependent and Independent Variables

    Where a relationship between variables does exist, we can characterize each variable as being either dependent or independent. We use the behavior of the

    independent (or predictor) variable to determine how the dependent (or criterion) variable will react. For instance, we might expect that an increase in the value of the independent variable would result in a corresponding increase in the value of the dependent variable.

    COMMON BIVARIATE METHODS

    The most commonly used bivariate statistical methods include:

    Scatterplots

    Covariance

    Product Moment Correlation Coefficient

    Linear Regression

    We will discuss each of these methods in turn, below.

    SCATTERPLOTS

    The most common bivariate plot is the Scatterplot, Figure 1 (Scatterplot of Porosity (dependent variable) versus Acoustic Impedance (independent variable)).

    Figure 1

    This plot follows a common convention, in which the dependent variable (e.g., porosity) is plotted on the Y-axis (ordinate) and the independent variable (e.g.,

    acoustic impedance) is plotted on the X-axis (abscissa). This type of plot serves several purposes:

    detects a linear relationship,

    detects a positive or inverse relationship,

identifies potential outliers,

    provides an overall data quality control check.

    This plot displays an inverse relationship between porosity and acoustic impedance, that is, as porosity increases, acoustic impedance decreases. This display should be generated before calculating bivariate summary statistics, like the covariance or correlation coefficient, because many factors affect these statistical measures. Thus, a high or low value has no real meaning until verified visually.

    A common geostatistical application of the scatterplot is the h-scatterplot. (In geostatistics, h commonly refers to the lag distance between sample points.)

These plots are used to show how continuous the data values are over a certain distance in a particular direction. If the data values at locations separated by h are identical, they will fall on the line x = y, a 45-degree line of perfect correlation. As the data become less and less similar, the cloud of points on the h-scatterplot becomes fatter and more diffuse. A later section will present more detail on the h-scatterplot.
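As a minimal illustration of the idea, the sketch below pairs up hypothetical values sampled along a regular 1-D transect for a lag h expressed in sample spacings. (The real procedure, covered later, works with irregularly spaced 2-D data and distance/direction tolerances.)

```python
# Hypothetical values sampled every 100 ft along a line.
values = [10.1, 10.4, 10.2, 9.8, 9.5, 9.9, 10.6, 11.0, 10.7, 10.3]

def h_scatter_pairs(v, lag):
    """Return (head, tail) pairs of values separated by `lag` sample spacings."""
    return [(v[i], v[i + lag]) for i in range(len(v) - lag)]

pairs_1 = h_scatter_pairs(values, 1)   # near-neighbour pairs: tight cloud along x = y
pairs_4 = h_scatter_pairs(values, 4)   # more distant pairs: fatter, more diffuse cloud

print(pairs_1)
print(pairs_4)
```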

    COVARIANCE

Covariance is a statistic that measures the joint variation between two variables (e.g., porosity and acoustic impedance) across all data pairs. This statistic is a very important tool used in Geostatistics to measure spatial correlation or dissimilarity between variables, and forms the basis for the correlogram and variogram (detailed later).

The magnitude of the covariance statistic depends upon the magnitude of the two variables. For example, if the Xi values are multiplied by a scalar factor k, then the covariance increases by a factor of k. If both variables are multiplied by k, then the covariance increases by a factor of k². This is illustrated in the table below.

VARIABLES          COVARIANCE
X and Y            3035.63
X*10 and Y         30356.3
X*10 and Y*10      303563

The covariance formula is:

COVx,y = (1/n) Σ (Xi - μx)(Yi - μy)

where:

Xi is the X variable
Yi is the Y variable
μx is the mean of X
μy is the mean of Y
n is the number of X and Y data pairs

    It should be emphasized that the covariance is strongly affected by extreme pairs (outliers).
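A short sketch of the covariance calculation, using hypothetical X and Y lists (not the values in the table above); note how multiplying X by 10 scales the covariance by 10:

```python
def covariance(x, y):
    """Population covariance of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

x = [10.2, 11.5, 9.8, 12.1, 10.9, 11.8]   # e.g., porosity (hypothetical)
y = [25.0, 23.1, 26.4, 22.0, 24.2, 22.6]  # e.g., acoustic impedance (hypothetical)

print(covariance(x, y))                        # negative: an inverse relationship
print(covariance([xi * 10 for xi in x], y))    # scaled by a factor of 10
```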

    Product Moment Correlation Coefficient

The product moment correlation coefficient (ρ) is more commonly called simply the correlation coefficient; it is a statistic that measures the linear relationship between two variables (e.g., porosity and velocity). This linear relationship is assigned a value that ranges between +1 and -1, depending on the degree of correlation:

+1 = perfect positive correlation

0 = no correlation (a totally random relation)

-1 = perfect inverse correlation

    Figure 2 illustrates scatterplots showing positive correlation, no correlation, and inverse correlation between two variables.

    Figure 2

The numerator of the correlation coefficient is the covariance. This value is divided by the product of the standard deviations of variables X and Y. This normalizes the covariance, removing the impact of the magnitude of the data values. As with the covariance, the correlation coefficient is adversely affected by outliers.

The Correlation Coefficient formula (for a population) is:

ρx,y = COVx,y / (σx σy) = (1/n) Σ (Xi - μx)(Yi - μy) / (σx σy)

where:

Xi is the X variable
Yi is the Y variable
μx is the mean of X
μy is the mean of Y
σx is the standard deviation of X
σy is the standard deviation of Y
n is the number of X and Y data pairs

As with other statistical formulas, the Greek letter (ρ) signifies the measure of a population, while Roman notation (r) is used for samples.
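Continuing the hypothetical example used for the covariance, the correlation coefficient is simply the covariance normalized by the two standard deviations. Unlike the covariance, it is unchanged if either variable is rescaled.

```python
import statistics

def correlation(x, y):
    """Pearson (product moment) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

x = [10.2, 11.5, 9.8, 12.1, 10.9, 11.8]   # hypothetical porosity values
y = [25.0, 23.1, 26.4, 22.0, 24.2, 22.6]  # hypothetical acoustic impedance values

r = correlation(x, y)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # r^2: fraction of variance explained
```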

    Rho Squared

The square of the correlation coefficient, ρ² (also referred to as r²), is a measure of the variance accounted for in a linear relation. This measure tells us about the extent to which two variables covary; that is, it tells us how much of the variance seen in one variable can be predicted by the variance found in the other variable.

Thus, a value of ρ = -0.83 between porosity and acoustic impedance tells us that as porosity increases in value, acoustic impedance (and hence velocity) decreases, which has a real physical meaning. However, only about 70% (actually (-0.83)², or 68.89%) of the variability in porosity is explained by its relationship with acoustic impedance.

In keeping with statistical notation, the Greek symbol ρ² is used to denote this measure for a population, while the Roman equivalent r² refers to the measure for a sample.

    Linear Regression

    Linear regression is another method we use to indicate whether a linear relationship exists between two variables. This is a useful tool, because once we establish a linear relationship, we may later be able to interpolate values between points, extrapolate values beyond the data points, detect trends, and detect points that deviate away from the trend.

Figure 3 (Scatterplot of the inverse linear relationship between porosity and acoustic impedance, with a correlation coefficient of -0.83) shows a simple display of regression.

Figure 3

    When two variables have a high covariance (strong correlation), we can predict a linear relationship between the two. A regression line drawn through the points

    of the scatterplot helps us to recognize the relationship between the variables. A positive slope (from lower left to upper right) indicates a positive or direct relationship between variables. A negative slope (from upper left to lower right) indicates a negative or inverse relationship. In the example illustrated in the above figure, the porosity clearly tends to decrease as acoustic impedance increases.

The regression equation has the following general form:

Y = a + bXi

where:

Y is the dependent variable, or the variable to be estimated (e.g., porosity)
Xi is the independent variable, or the estimator (e.g., acoustic impedance)
b is the slope, defined as b = ρ (σy / σx), where:
ρ is the correlation coefficient between X and Y
σx is the standard deviation of X
σy is the standard deviation of Y
a is a constant, which defines the ordinate (Y-axis) intercept:

a = μy - b μx

where:

μx is the mean of X
μy is the mean of Y

With this equation, we can plot a regression line that will cross the Y-axis at the point a, and will have a slope equal to b.
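A minimal sketch (hypothetical data again) showing that the regression line can be computed directly from the correlation coefficient, the standard deviations, and the means, exactly as in the formulas above:

```python
import statistics

x = [25.0, 23.1, 26.4, 22.0, 24.2, 22.6]  # hypothetical acoustic impedance (predictor)
y = [10.2, 11.5, 9.8, 12.1, 10.9, 11.8]   # hypothetical porosity (to be estimated)

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.pstdev(x), statistics.pstdev(y)
rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n / (sx * sy)

b = rho * sy / sx        # slope
a = my - b * mx          # intercept: a = mean(Y) - b * mean(X)

print(f"porosity ~ {a:.2f} + {b:.3f} * impedance (rho = {rho:.2f})")
```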

Regression models of this kind can include polynomial terms of any degree, and may include logarithmic, exponential, or other non-linear transformations of the variables; the model remains linear in the coefficients that are estimated.

The terms in the equation for which coefficients are computed are independent terms, and can be simple (a single variable) or compound (several variables multiplied together). It is also common to use cross terms (the interaction between X and Y), or power terms.

Z = a + bX + cY: uses X and Y as predictors and a constant

Z = a + bX + cY + dXY: adds the cross term

Z = a + bX + cY + dXY + eX² + fY²: adds the power terms
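As an illustration of these extended forms, the sketch below (with hypothetical X, Y, and Z arrays, and assuming numpy is available) builds the design matrix for the third equation, with cross and power terms, and solves for the coefficients by least squares:

```python
import numpy as np

# Hypothetical predictor coordinates and an attribute to fit.
X = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5, 1.0, 2.0])
Y = np.array([0.0, 0.5, 1.0, 2.0, 1.5, 0.5, 2.5, 2.0])
Z = np.array([5.1, 6.0, 7.2, 6.5, 7.0, 7.8, 7.4, 8.1])

# Columns: constant, X, Y, XY (cross term), X^2, Y^2 (power terms).
A = np.column_stack([np.ones_like(X), X, Y, X * Y, X**2, Y**2])

coeffs, *_ = np.linalg.lstsq(A, Z, rcond=None)
a, b, c, d, e, f = coeffs
print(f"Z ~ {a:.2f} + {b:.2f}X + {c:.2f}Y + {d:.2f}XY + {e:.2f}X^2 + {f:.2f}Y^2")
```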

    SUMMARY OF BIVARIATE STATISTICAL MEASURES AND DISPLAYS

    Advantages

    Easy to calculate.

    Provides information in a very condensed form.

    Can be used to estimate one variable from another variable or from multiple variables.

    Limitations

Summary statistics can sometimes be too condensed, and do not carry enough information about the shape of the distribution.

Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g., covariance, correlation coefficient). Outliers can strongly bias a regression prediction equation.

    No spatial information.

    EXPLORATORY DATA ANALYSIS

    The early phase of a geostatistics project often employs classical statistical tools in a general analysis and description of the data set.

This process is commonly referred to as Exploratory Data Analysis, or simply EDA. It is conducted as a way of validating the data itself: you need to be sure each value that you plug into the geostatistical model is valid. (Remember: garbage in, garbage out!) By analyzing the data itself, you can determine which points represent anomalous values of an attribute (outliers) that should either be disregarded or scrutinized more closely.

EDA is an important precursor to the final goal of a geostatistical study, which may be interpolation, or simulation and assessment of uncertainty. Unfortunately, in many studies, including routine mapping of attributes, EDA is often overlooked. However, it is absolutely necessary to have a good understanding of your data, so taking the time in EDA to check the quality of the data, and to explore and describe the data set, will reward you with improved results.


The classical statistical tools described in previous sections, along with the tools that we will introduce in the sections under the Data Validation heading, will help you to conduct a thorough analysis of your data.

    EDA PROCESS

    Note that there is no one set of prescribed steps in EDA. Often, the process will include a number of the following tasks, depending on the amount and type of data involved:

    data preprocessing

    univariate and multivariate statistical analysis

    identification and probable removal of outliers

    identification of sub-populations

    data posting

    quick maps

    sampling of seismic attributes at well locations

    At the very least, you should plot the distribution of attribute values within your data set. Look for anomalies in your data, and then look for possible explanations for those anomalies. By employing classical statistical methods to analyze your data, you will not only gain a clearer understanding of your data, but will also discover possible sources of errors and outliers.

    Geoscientists tasked with making predictions about the reservoir will always face these limitations:

    Most prospects provide only a very few direct hard observations (well data)

    Soft data (seismic) is only indirectly related to the hard well data

    A scarcity of observations can often lead to a higher degree of uncertainty

    These problems can be compounded when errors in the data are overlooked. This is especially troublesome with large data sets, and when computers are involved; we simply become detached from our data. A thorough EDA will foster an intimate knowledge of the data to help you flag bogus results. Always take the time to explore your data.

    SEARCH NEIGHBORHOOD CRITERIA

    INTRODUCTION

    All interpolation algorithms require a standard for selecting data, referred to as the search neighborhood. The parameters that define a search neighborhood include:

    Search radius

    Neighborhood shape

Number of sectors (4 or 8 are common)

    Number of data points per sector

    Azimuth of major axis of anisotropy

When designing a search neighborhood, we should remember the following points:

Each sector should have enough points (at least 4) to avoid directional sampling bias.

    CPU time and memory requirements grow rapidly as a function of the number of data points in a neighborhood.

    We will see a further example of the search neighborhood in our later discussion on kriging.

    SEARCH STRATEGIES

    Two common search procedures are the Nearest Neighbor and the Radial Search methods. These strategies calculate the value of a grid node based on data points in the vicinity of the node.

    Nearest Neighbor

    One simple search strategy looks for data points that are closest to the grid node, regardless of their angular distribution around the node. The nearest neighbor search routine is quick, and works well as long as samples are spread about evenly. However, it provides poor estimates when sample points are concentrated too closely along widely spaced traverses.

    Another drawback to the nearest neighbor method occurs when all nearby points are concentrated in a narrow strip along one side of the grid node (such as might be seen when wells are drilled along the edge of a fault or pinchout). When this occurs, the selection of points produces an estimate of the node that is essentially unconstrained, except in one direction. This problem may be avoided by specifying search parameters which select control points that are evenly distributed around the grid node.

    Radial Searches

    Two common radial search procedures are the quadrant search, and its close relative, the octant search. Each is based on a circular or elliptical area, sliced into four or eight equal sections. These methods require a minimum number of control points for each of the four or eight sections surrounding the grid node.

    These constrained search procedures test more neighboring control points than the nearest neighbor search, which increases the time required. Such constraints on searching for nearest control points will expand the size of the search neighborhood surrounding the grid node because a number of nearby control points will be passed over in favor of more distant points that satisfy the requirement for a specific number of points being selected from a single sector.

    In choosing between the simple nearest neighbor approach and the constrained quadrant or octant searches, remember that the autocorrelation of a surface decreases with increasing distance, so the remote data points sometimes used by the constrained searches are less closely related to the location being estimated. This may result in a grid node estimate that is less realistic than that produced by the simpler nearest neighbor search.
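The following sketch (with hypothetical well coordinates, not the data set used elsewhere in this document, and not any particular program's implementation) illustrates the quadrant search idea: gather candidate points within a search radius, assign each to one of four sectors around the grid node, and keep at most a fixed number of the closest points per sector.

```python
import math

def quadrant_search(node, points, radius, max_per_sector=3):
    """Select nearby control points, at most `max_per_sector` from each of 4 quadrants."""
    sectors = {0: [], 1: [], 2: [], 3: []}
    for (px, py, value) in points:
        dx, dy = px - node[0], py - node[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > radius:
            continue
        sector = (0 if dx >= 0 else 1) + (2 if dy < 0 else 0)   # NE, NW, SE, SW
        sectors[sector].append((dist, value))
    selected = []
    for members in sectors.values():
        members.sort()                       # closest first within the sector
        selected.extend(members[:max_per_sector])
    return selected                          # list of (distance, value) pairs

# Hypothetical control points: (x, y, porosity)
wells = [(120, 340, 9.5), (450, 80, 11.2), (-300, 150, 8.7), (-90, -410, 10.4), (600, -220, 12.0)]
print(quadrant_search((0, 0), wells, radius=5000))
```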

SPATIAL DESCRIPTION

    One of the distinguishing characteristics of earth science data is that these data sets are assigned to some particular location in space. Spatial features of the data sets, such as the degree of continuity, directional trends and location of extreme values, are of considerable interest in developing a reservoir description. The statistical descriptive tools presented earlier are not able to capture these spatial features. In this section, we will use a data set from West Texas to demonstrate tools that describe spatial aspects of the data.

    DATA POSTING

    Data posting is an important initial step in any study (Figure 1: Posted porosity data for 55 wells from North Cowden Field in West Texas).

    Figure 1

Not only do these displays reveal obvious errors in data location, but they often also highlight data values that may be suspect. Lone high values surrounded by low values (or vice versa) are worth investigating. Data posting may provide clues as to how the data were acquired. Blank areas may indicate inaccessibility (another company's acreage, perhaps); heavily sampled areas indicate some initial interest. Locating the highest and lowest values may reveal trends in the data.

    In this example, the lower values are generally found on the west side of the area, with the larger values in the upper right quadrant. The data are sampled on a nearly regular grid, with only a few holes in the data locations. The empty spots in the lower right corner are on acreage belonging to another oil company. Other missing points are the result of poor data, and thus are not included in the final data set. More information is available about this data set in an article by Chambers, et al. (1994). This data set and acoustic impedance data from a high-resolution 3D seismic survey will be used to illustrate many of the geostatistical concepts throughout the remainder of this presentation.

DATA DISTRIBUTION

A reservoir property must be mapped on the basis of a relatively small number of discrete sample points (most often consisting of well data). When constructing maps, either by hand or by computer, attention must be paid to the distribution of those discrete sample points. The distribution of points on maps (Figure 2: Typical distribution of data points within the map area) may be classified into three categories: regular, random, or clustered (Davis, 1986).

Figure 2

Regular: The pattern is regular (Figure 2, part a) if the points are located on some sort of grid pattern, for example, a 5-spot well pattern. The pattern of points is considered uniform in density if the density of points in any sub-area is equal to the density of points in any other sub-area.

Random: When points are distributed at random (Figure 2, part b) across the map area, the coverage may be uniform; however, we do not expect to see the same number of points within each sub-area.

Clustered: Many of the data sets we work with show a natural clustering (Figure 2, part c) of points (wells). This is especially true when working on a more regional scale.

GRIDS AND GRIDDING

    INTRODUCTION

    One of our many tasks as geoscientists is to create contour maps. Although contouring is still performed by hand, the computer is used more and more to map data, especially for large data sets. Unfortunately, data are often fed into the computer without any special treatment or exploratory data analysis. Quite often, defaults are used exclusively in the mapping program, and the resulting maps are accepted without question, even though the maps might violate sound geological principles.

Before using a computer to create a contour map, it is necessary to create a grid and then use the gridding process to create the contours. This discussion will introduce the basic concepts of grids, gridding, and interpolation for making contour maps.

    WHAT IS A GRID?

    Taken to extremes, every map contains an infinite number of points within its map area. Because it is impractical to sample or estimate the value of any variable at an infinite number of points within the map area, we define a grid to describe locations where estimates will be calculated for use in the contouring process.

A grid is formed by arranging a set of values into a regularly spaced array, commonly a square or rectangle, although other grid forms may also be used. The locations of the values represent the geographic locations in the area to be mapped and contoured (Jones, et al., 1986). For example, well spacing and known geology might influence your decision to calculate porosity every 450 feet in the north-south direction, and every 300 feet in the east-west direction. By specifying a regular interval of columns (every 450 feet in the north-south direction) and rows (every 300 feet in the east-west direction), you have, in effect, created a grid.

    Grid nodes are formed by the intersection of each column with a row. The area enclosed by adjacent grid nodes is called a grid cell (three nodes for a triangular

    arrangement, or more commonly, four nodes for a square arrangement). Because the sample data represent discrete points, a grid should be designed to

    reflect the average spacing between the wells, and designed such that the individual data points lie as closely as possible to a grid node.
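A minimal sketch of setting up such a grid, assuming a hypothetical map area defined by its corner coordinates, the 300 ft by 450 ft spacing mentioned above, and the availability of numpy:

```python
import numpy as np

# Hypothetical map extents (feet) and the node spacing discussed above.
x_min, x_max = 0.0, 6000.0      # east-west
y_min, y_max = 0.0, 9000.0      # north-south
dx, dy = 300.0, 450.0           # node spacing east-west / north-south

x_nodes = np.arange(x_min, x_max + dx, dx)
y_nodes = np.arange(y_min, y_max + dy, dy)
grid_x, grid_y = np.meshgrid(x_nodes, y_nodes)   # coordinates of every grid node

print(grid_x.shape)   # rows x columns of nodes; attribute values get estimated here
```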

    GRID SPACING

The grid interval controls the detail that can be seen in the map. No features smaller than the interval are retained. To accurately define a feature, it must cover two to three grid intervals; thus the cell should be small enough to show the required detail of the feature. However, there is a trade-off involving grid size. Large grid cells produce quick maps with low resolution and a coarse appearance. While small grid cells may produce a finer appearance with better resolution, they also tend to increase the size of the data set, thus leading to longer computer processing time; furthermore, a fine grid often imparts gridding artifacts that show up in the resulting map (Jones, et al., 1986).

A rule of thumb says that the grid interval should be specified so that a given grid cell contains no more than one sample point. A useful approach is to estimate, by eye, the average well spacing, and use it as the grid interval, rounded to an even increment (e.g., 200 rather than 196.7).

    GRIDS AND GRIDDING

    Within the realm of geostatistics, you will often discover that seemingly similar words have quite different meanings. In this case, the word gridding should not be considered as just a grammatical variation on the word grid.

Gridding is the process of estimating the value of an attribute from isolated points onto a regularly spaced mesh, called a grid (as described above). The attribute's values are estimated at each grid node.

    Interpolation And Contouring

    The objective of contouring is to visually describe or delineate the form of a surface. The surface may represent a structural surface, such as depth to the top of a reservoir, or may represent the magnitude of a petrophysical property, such as porosity. Contour lines, strictly speaking, are isolines of elevation. However, geologists are rather casual about their use of terminology, and usually call any isoline a contour, whether it depicts elevation, porosity, thickness, composition, or other property.

Contour maps are a type of three-dimensional graph or diagram, compressed onto a flat, two-dimensional representation. The X- and Y-axes usually correspond to the geographical coordinates east-west and north-south. The Z-axis typically represents the value of the attribute, for example: elevation with respect to sea level, or porosity, thickness, or some other quantity (Davis, 1986).

    Contour lines connect points of equal value on a map, and the space between two successive contour lines contains only points falling within the interval defined by the contour lines. It is not possible to know the value of the surface at every possible location, nor can we measure its value at every point we might

    wish to choose. Thus, the purpose of contouring is to summarize large volumes of data and to depict its three-dimensional spatial distribution on a 2-D paper surface. We use contour maps to represent the value of the property at unsampled locations (Davis, 1986; Jones, et al., 1986).

The Interpolation Process

The mapping (interpolation) and contouring process involves four basic steps. According to Jones, et al. (1986), the four mapping and contouring steps are:

1. Identifying the area and the attribute to be mapped (Figure 1: Location and values of control points within the mapping area, North Cowden Field, West Texas);

Figure 1

2. Designing the grid over the area (Figure 2: Grid design superimposed on the control points);

    Figure 2

3. Calculating the values to be assigned at each grid node (Figure 3: Upper left quadrant of the grid shown in Figure 2. The values shown are interpolated values at the grid nodes; these values are used to create the contours shown in Figure 4);

Figure 3

4. Using the estimated grid node values to draw contours (Figure 4: Contour map of porosity, created from the control points in Figure 1 and the grid mesh values shown in Figure 3).

Figure 4

    To illustrate these steps, we will use porosity measurements from the previously mentioned West Texas data set.

TRADITIONAL INTERPOLATION METHODS

    INTRODUCTION

The point-estimation methods described in this section are common methods used to make contour maps. These methods use non-geostatistical interpolation algorithms and do not require a spatial model. They provide a way to create an initial, quick-look map of the attributes of interest. This section is not meant to provide an exhaustive treatment of the subject, but will introduce certain concepts needed to understand the principles of the geostatistical interpolation and simulation methods discussed in later sections.

    Most interpolation methods use a weighted average of values from control points in the vicinity of the grid node in order to estimate the value of the attribute assigned to that node. With this approach, the attribute values of the nearest control points are weighted according to their distance from the grid node, with the heavier weights assigned to the closest points. The attribute values of grid nodes that lie beyond the outermost control points must be extrapolated from values assigned to the nearest control points.
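A small sketch of this weighted-average idea, using inverse-distance weights (the basis of the inverse distance method described below) and hypothetical control points:

```python
import math

def idw_estimate(node, points, power=2.0):
    """Inverse-distance weighted average of control point values at a grid node."""
    weights, weighted_values = [], []
    for (px, py, value) in points:
        dist = math.hypot(px - node[0], py - node[1])
        if dist < 1e-6:
            return value                 # node sits on a sample: copy its value
        w = 1.0 / dist ** power          # closer points get heavier weights
        weights.append(w)
        weighted_values.append(w * value)
    return sum(weighted_values) / sum(weights)

# Hypothetical control points: (x, y, porosity %)
wells = [(100, 200, 9.5), (800, 150, 11.2), (400, 900, 8.7), (950, 850, 10.4)]
print(round(idw_estimate((500, 500), wells), 2))
```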

Many of the following methods require the definition of neighborhood parameters to characterize the set of sample points used during the estimation process, given the location of the grid node. For the upcoming examples, we've specified the following neighborhood parameters:

    Isotropic ellipse with a radius = 5000 feet

    4 quadrants

    A minimum of 7 sample points, with an optimum of 3 sample points per quadrant

    These examples use porosity measurements, located on a nearly regular grid. See Figure 1 (Location and values of control points within the mapping area at North Cowden Field, West Texas) for the sample locations and porosity values.

    Figure 1

The following seven estimation methods will be discussed in turn:

    Inverse Distance

    Closest Point

    Moving Average

    Least Squares Polynomial

    Spline

    Polygons of Influence

    Triangulation

    The first five estimation methods are accompanied by images that illustrate the patterns and relative magnitude of the porosity values created by each method. All images have the same color scale. The lowest value of porosity is dark blue (5%) and the highest value is red (13%), with a 0.5% color interval. However, for the purpose of this illustration, the actual values are not important at this time. (No porosity mapping images were produced for the polygons of influence and triangulation methods.)

    INVERSE DISTANCE

This estimation method uses a linear combination of attribute values from neighboring control points. The weights assigned to the measured values used in the interpolation process are based on distance from the grid node, and are inversely proportional to that distance raised to a given power (p). If the smallest distance is smaller than a given threshold, the value of the corresponding sample is copied to the grid node. Large values of p (5 or greater) create maps similar to the closest point method (Isaak