International Journal Of Scientific & Engineering Research, Volume 4, Issue 1, January-2013 1 ISSN 2229-5518
Comparison of Various Classification Techniques for Satellite Data
1Manoj Pandya, 1Astha Baxi, 1M.B. Potdar, 1M.H. Kalubarme, 2Bijendra Agarwal
Abstract - Computer interpretation of remote sensing data is referred to as quantitative analysis because of its ability to identify pixels based upon their numerical properties and to count pixels for area estimates. It is also generally called classification: a method by which labels are attached to pixels in view of their spectral character. This labeling is implemented by a computer trained beforehand to recognize pixels with spectral similarities. Clearly, the image data for quantitative analysis must be available in digital form. This is an advantage of image data types such as those from Landsat, SPOT, IRS, etc., as against more traditional aerial photographs; the latter require digitization before quantitative analysis can be performed. The classification techniques adopted in this study include unsupervised classifiers such as K-Means and ISODATA, and supervised classifiers such as Minimum Distance, Maximum Likelihood, Parallelepiped and the Enhanced Seeded Region Growing technique.
Index Terms – Satellite Data, Classification, supervised, unsupervised, K-Means, ISODATA, MXL, Parallelepiped, Seeded Region Growing
—————————— ——————————
1. INTRODUCTION
Satellite imagery is a source of a large amount of information.
A two dimensional image that is recorded by satellite
sensors is the mapping of the three dimensional visual
world. The captured two dimensional signals are sampled
and quantized to yield digital images. An image is worth a thousand words. Satellite image interpreters from various domains extract information by marking polygon features on the image. Various classification methods make it possible to classify objects in an image automatically. There are several methods which can be used to extract features.
2. IMAGE SEGMENTATION
Segmentation of an image is defined by a set of regions that
are connected and non-overlapping, so that each pixel in a
segment in the image acquires a unique region label that
indicates the region it belongs to. Segmentation is one of the
most important elements in automated image analysis, mainly
because at this step the objects or other entities of interest
are extracted from an image for subsequent processing,
such as description and recognition.
3 CLASSIFICATION METHODS
There are basically two kinds of classification methods.
Unsupervised and supervised.
Unsupervised Classifiers:
These classifiers do not require training sites or training signatures to classify objects.
K-MEANS
In statistics and data mining, k-means clustering is a
method of cluster analysis which aims to partition n
observations into k clusters in which each observation belongs
to the cluster with the nearest mean.
It is an iterative procedure.
• In the first step, it assigns an arbitrary initial cluster vector.
• The second step classifies each pixel to the closest
cluster.
• In the third step the new cluster mean vectors are
calculated based on all the pixels in one cluster.
• The second and third steps are repeated until the "change" between iterations is small. The "change" can be defined in several ways: either by measuring the distance the mean cluster vectors have moved from one iteration to the next, or by the percentage of pixels that have changed cluster between iterations.
• The objective of the k-means algorithm is to minimize the within-cluster variability. The objective function (which is to be minimized) is the sum of squared distances (errors) between each pixel and its assigned cluster centre:

SS_distances = Σx [x − C(x)]²
————————————————
Manoj Pandya, Astha Baxi, M.B. Potdar & M.H. Kalubarme are currently affiliated with Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]
Dr. Bijendra Agarwal is Director, VJMS College, VADU, Kadi, Mahesana, Gujarat, India
Where, C(x) is the mean of the cluster that pixel x is
assigned to.
Minimizing SS_distances is equivalent to minimizing the Mean Squared Error (MSE). The MSE is a measure of the within-cluster variability.
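The iterative procedure above can be sketched in Python with NumPy. This is an illustrative sketch, not the implementation used in the study; the random initialisation, the tolerance, and the iteration cap are assumptions:

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, tol=1e-4, seed=0):
    """Cluster pixels (one row per pixel, one column per band) into k clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: arbitrary initial cluster mean vectors (k distinct pixels).
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(max_iter):
        # Step 2: classify each pixel to the closest cluster mean.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each cluster mean from the pixels assigned to it.
        new_centers = np.array([pixels[labels == i].mean(axis=0) for i in range(k)])
        # Repeat steps 2-3 until the "change" (movement of the means) is small.
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers
```

Convergence here is tested on the movement of the mean vectors; the percentage-of-changed-pixels criterion mentioned above would serve equally well.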
ISO Data (Iterative Self-Organizing) Classifier
ISO Data stands for Iterative Self-Organizing Data Analysis
Techniques. This algorithm allows the number of clusters to
be automatically adjusted during the iteration by merging
similar clusters and splitting clusters.
Clusters are merged if either the number of members (pixels)
in a cluster is less than a certain threshold or if the centres of
two clusters are closer than a certain threshold. Clusters are
split into two different clusters if the cluster standard
deviation exceeds a predefined value and the number of
members (pixels) is twice the threshold for the minimum
number of members. [1]
The ISODATA algorithm is similar to the k-means algorithm, with the distinct difference that ISODATA allows for a different number of clusters, while k-means assumes that the number of clusters is known a priori.
• Each pixel is compared to each cluster mean and
assigned to the cluster whose mean is closest in Euclidean
distance.
• A new cluster center is computed by averaging the
locations of all the pixels assigned to that cluster.
• The Sum of Squared Errors (SSE) computes the
cumulative squared difference (in the various bands) of each
pixel from its cluster center for each cluster individually, and
then sums these measures over all the clusters.
• The algorithm stops either when the iteration-count threshold is reached or when the maximum percentage of unchanged pixels is reached.
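The merge/split behaviour that distinguishes ISODATA from k-means can be sketched as a single adjustment pass. This is illustrative only; the thresholds `min_members`, `merge_dist` and `max_std` are assumed parameters, and the reassignment of pixels between passes is omitted:

```python
import numpy as np

def isodata_adjust(pixels, labels, centers, min_members=10, merge_dist=1.0, max_std=2.0):
    """One ISODATA-style pass: split clusters whose spread is too large,
    then merge clusters whose centres are closer than merge_dist."""
    new_centers = []
    for i, c in enumerate(list(centers)):
        members = pixels[labels == i]
        # Split: std deviation exceeds threshold AND membership is at least
        # twice the minimum number of members.
        if len(members) >= 2 * min_members and members.std(axis=0).max() > max_std:
            offset = members.std(axis=0)
            new_centers += [c + offset, c - offset]   # two clusters replace one
        else:
            new_centers.append(c)
    # Merge: centres closer than merge_dist are replaced by their midpoint.
    merged, used = [], set()
    for i, c in enumerate(new_centers):
        if i in used:
            continue
        for j in range(i + 1, len(new_centers)):
            if j not in used and np.linalg.norm(c - new_centers[j]) < merge_dist:
                c = (c + new_centers[j]) / 2
                used.add(j)
        merged.append(c)
    return np.array(merged)
```

A full ISODATA run would alternate this pass with the k-means-style assignment and mean-update steps until the stopping criteria above are met.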
Supervised Classifiers:
These kinds of classifiers require adequate training sites or
training signatures for each class of a given image.
Minimum Distance Classifier
This classifier, also referred to as the central (centroid) classifier, is the simplest classifier.
• The analyst first computes the mean of each training class.
• Next, the Euclidean distance of each pixel from each class mean is calculated.
• The pixel is assigned to the class whose mean is nearest (i.e. the Euclidean distance between the pixel and the class mean is minimum).
The distance is defined as an index of similarity so that the
minimum distance is identical to the maximum similarity.
Figure 1 shows the concept of a minimum distance classifier.
The following distances are often used in this procedure. [7]
Figure 1: Minimum Distance Classifier
It is used to classify unknown image data into the class which minimizes the distance between the image data and the class in multi-feature space. In the Euclidean plane, if p = (p1, p2) and q = (q1, q2), then the distance is given by

d(p, q) = √((q1 − p1)² + (q2 − p2)²)
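The steps above amount to a nearest-mean rule, which can be sketched as follows (illustrative; the array shapes are assumptions):

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Assign each pixel (row of `pixels`) to the class whose mean vector is
    nearest in Euclidean distance (class_means: one row per training class)."""
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1)
```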
Parallelepiped Classifier:
The parallelepiped classifier (often termed multi-level slicing) divides each axis of the multi-spectral feature space, as shown in the example in Figure 2.
Figure 2: Schematic concept of the parallelepiped classifier in three-dimensional feature space
The decision region for each class is defined on the basis of a lowest and highest value on each axis. The accuracy of classification depends on the selection of the lowest and highest values in consideration of the population statistics of each class. In two-dimensional feature space this forms a rectangular box, and we can have as many boxes as there are classes. All data points which fall within a box are labeled with that class. In this respect, it is most important that the distribution of the population of each class is well understood. [2]
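A minimal sketch of the box test described above (illustrative; ties between overlapping boxes are resolved by first match, an assumption not specified in the text):

```python
import numpy as np

def parallelepiped_classify(pixels, low, high, unclassified=-1):
    """Label each pixel with the first class whose per-band [low, high] box
    contains it; pixels falling in no box remain unclassified."""
    low, high = np.asarray(low), np.asarray(high)
    labels = np.full(len(pixels), unclassified)
    for k in range(len(low)):   # one box (lowest/highest value per axis) per class
        inside = np.all((pixels >= low[k]) & (pixels <= high[k]), axis=1)
        labels[inside & (labels == unclassified)] = k
    return labels
```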
Maximum Likelihood Classifier (MXL):
The maximum likelihood classifier is one of the most popular
methods of classification in remote sensing, in which a pixel
with the maximum likelihood is classified into the
corresponding class. The likelihood Lk is defined as the
posterior probability of a pixel belonging to class k. [4]
Lk = P(k)·P(X|k) / Σi P(i)·P(X|i)

where P(k): prior probability of class k
P(X|k): conditional probability to observe X from class k, i.e. the probability density function

Usually the P(k) are assumed to be equal to each other, and Σi P(i)·P(X|i) is common to all classes. Therefore Lk depends on P(X|k), the probability density function.
For mathematical reasons, a multivariate normal distribution is applied as the probability density function. In the case of normal distributions, the likelihood can be expressed as follows. [5]

Lk(X) = (2π)^(−n/2) |Σk|^(−1/2) exp(−(1/2)(X − Xk)ᵀ Σk⁻¹ (X − Xk))

where
n: number of bands
X: image data of n bands
Lk(X): likelihood of X belonging to class k
Xk: mean vector of class k
Σk: variance-covariance matrix of class k
|Σk|: determinant of Σk

In the case where the variance-covariance matrix is proportional to the identity, maximizing the likelihood is equivalent to minimizing the Euclidean distance.
Figure 3: concept of the maximum likelihood method
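Assuming equal priors as stated above, classification reduces to evaluating the multivariate normal density for each class and taking the maximum. A sketch (class means and covariance matrices are assumed to come from training statistics):

```python
import numpy as np

def gaussian_likelihood(x, mean, cov):
    """Multivariate normal density Lk(X) for one class."""
    n = len(mean)
    diff = x - mean
    return ((2 * np.pi) ** (-n / 2) * np.linalg.det(cov) ** -0.5
            * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff))

def mxl_classify(x, means, covs):
    """Assign pixel x to the class with the maximum likelihood (equal priors)."""
    return int(np.argmax([gaussian_likelihood(x, m, c) for m, c in zip(means, covs)]))
```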
Seeded Region Growing Technique:
Seed point selection is based on some user criterion like
pixels in a certain gray-level range, pixels evenly spaced on a
grid, etc. An image is segmented into regions with respect to a
set of q seeds. Given the set of seeds, S1, S2, ..., Sq, each step of SRG adds one additional pixel to one of the seed sets. Moreover, these initial seeds are further replaced by the centroids of the generated homogeneous regions, R1, R2, ..., Rq, as additional pixels are incorporated step by step. The pixels
in the same region are labeled by the same symbol and the
pixels in variant regions are labeled by different symbols. All
these labeled pixels are called the allocated pixels, and the
others are called the unallocated pixels. Let H be the set of all unallocated pixels which are adjacent to at least one of the labeled regions: [14]

H = { (x, y) ∉ ∪ Ri : N(x, y) ∩ ∪ Ri ≠ ∅ }

where N(x, y) is the second-order neighborhood of the pixel (x, y), as shown in Figure 1. For an unlabeled pixel (x, y) ∈ H whose neighborhood N(x, y) meets just one of the labeled image regions Ri, define i(x, y) ∈ {1, 2, ..., q} to be that index such that N(x, y) ∩ Ri(x, y) ≠ ∅. The difference δ(x, y, Ri) between the testing pixel at (x, y) and its adjacent labeled region Ri is calculated as

δ(x, y, Ri) = |g(x, y) − g(Xci, Yci)|
where g(x, y) indicates the values of the three colour components of the testing pixel (x, y), and g(Xci, Yci) represents the average values of the three colour components of the homogeneous region Ri, with (Xci, Yci) the centroid of Ri.
Figure 1: Identification of pixels around the seed pixel
If N(x, y) meets two or more of the labeled regions, i(x, y) takes the value of i such that N(x, y) meets Ri and δ(x, y, Ri) is minimized.
This seeded region growing procedure is repeated until all
pixels in the image have been allocated to the corresponding
regions.
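The growth loop described above can be sketched with a priority queue ordered by the difference δ between a candidate pixel and an adjacent region's mean. This is an illustrative grey-level, 4-neighbourhood version; the text above uses colour pixels and a second-order (8-pixel) neighbourhood:

```python
import heapq
import numpy as np

def seeded_region_growing(img, seeds):
    """Grow labelled regions from seeds: repeatedly allocate the unallocated
    neighbour whose value is closest (smallest delta) to an adjacent region's
    mean, until every pixel is allocated."""
    h, w = img.shape
    labels = np.zeros((h, w), int)                     # 0 = unallocated
    sums, counts, heap = {}, {}, []
    for lab, (r, c) in enumerate(seeds, start=1):
        labels[r, c] = lab
        sums[lab], counts[lab] = float(img[r, c]), 1
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w:
                delta = abs(img[nr, nc] - sums[lab] / counts[lab])
                heapq.heappush(heap, (delta, nr, nc, lab))
    while heap:
        delta, r, c, lab = heapq.heappop(heap)         # smallest difference first
        if labels[r, c]:
            continue                                   # already allocated
        labels[r, c] = lab
        sums[lab] += img[r, c]                         # region mean is updated
        counts[lab] += 1
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not labels[nr, nc]:
                d = abs(img[nr, nc] - sums[lab] / counts[lab])
                heapq.heappush(heap, (d, nr, nc, lab))
    return labels
```

Queued deltas can become stale as region means shift; a production implementation would re-evaluate δ on pop, which this sketch omits for brevity.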
Enhanced Seeded Region Growing:
This technique was specifically designed by Baxi et al., 2012, for mobile applications, along with a Mahalanobis distance classifier [16]. SRG is also very attractive for semantic image segmentation, since high-level knowledge of image components can be involved in the seed selection procedure. The FEATURE TABLE stores a collection of features. The location information with metadata is considered as seed points, which can be superimposed on the geo-referenced image using an Application Programming Interface (API). These seed points generate training sites for the SRG technique.
Enhanced Classifier:
The distance of a pixel from the mean of a given class is calculated using the variance-covariance matrix. It differs from the Euclidean distance in that it takes into account the correlations of the data set and is a multivariate effect size. The squared Mahalanobis distance is

D²(X) = (X − Xk)ᵀ Σk⁻¹ (X − Xk)

where X: vector of image data (n bands), X = [x1, x2, ..., xn]
Xk: mean vector of the kth class, Xk = [m1, m2, ..., mn]
Σk: variance-covariance matrix of class k
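The enhanced classifier's distance rule can be sketched directly from the Mahalanobis distance described above (illustrative; class means and covariance matrices are assumed to be estimated from the training sites):

```python
import numpy as np

def mahalanobis_classify(x, means, covs):
    """Assign pixel x to the class minimizing the squared Mahalanobis distance
    D2(X) = (X - Xk)^T inv(Sigma_k) (X - Xk)."""
    d2 = [(x - m) @ np.linalg.inv(S) @ (x - m) for m, S in zip(means, covs)]
    return int(np.argmin(d2))
```

With identity covariance matrices this reduces to the minimum (Euclidean) distance classifier; the covariance terms are what account for band correlations.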
4 COMPARISONS OF CLASSIFIERS
Figure 1 (a): FCC LISS-3 image of Surendranagar district, India, with cotton seeds marked in yellow circles
Figure 1 (b): Paddy seeds in yellow circles
CLASSIFIED IMAGE
Figure 3: Enhanced SRG for both classes
ACCURACY ASSESSMENT
The Kappa coefficient is used to evaluate the accuracy. The original intent of Cohen's Kappa was to measure the degree of agreement or disagreement of two or more people observing the same phenomenon. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The equation for K is:

K = (Pr(a) − Pr(e)) / (1 − Pr(e))
Where Pr(a) is the relative observed agreement among
raters or the total agreement probability, and Pr(e) is the
hypothetical probability of chance agreement, using the
observed data to calculate the probabilities of each observer
randomly saying each category. If the raters are in complete agreement then K = 1. If there is no agreement among the raters, then K ≤ 0. Better performing classifiers should therefore have a higher K.
Table: Comparison of various classifiers
Figure: Classifier vs. Kappa co-efficient chart
5 CONCLUSIONS
Various classification techniques for satellite images have different accuracies. The hybrid method of SRG and the Mahalanobis technique leads to better accuracy.
Possible extensions to the ESRG classifier are Artificial Neural Networks (ANN) and fuzzy logic. Object-oriented classification also helps precisely identify objects such as trees, cars, buses, etc.
ACKNOWLEDGEMENTS
The authors would like to thank the Director, BISAG, T.P. Singh, for his inspiration and motivation.
REFERENCES
[1] http://www.yale.edu/ceo/Projects/swap/landcover/Unsupervise
d_classification.htm
[2] Parallel Piped Classifier “http://stlab.iis.u-
tokyo.ac.jp/~wataru/lecture/rsgis/rsnote/cp11/11-4-1.gif”
[3] Variance
“www.stat.yale.edu/~pollard/Courses/241.fall97/Variance.pdf”
[4] Maximum Likelihood http://stlab.iis.u-
tokyo.ac.jp/~wataru/lecture/rsgis/rsnote/cp11/cp11-7.htm
[5] http://edndoc.esri.com/arcobjects/9.2/net/shared/geoprocessing/
spatial_analyst_tools/how_maximum_likelihood_classification
_works.htm
[6] Mahalanobis Distance, http://research.cs.tamu.edu/prism/lectures/iss/iss_l12.pdf
[7] Joseph, George (2003) Fundamentals of Remote Sensing ISBN:
81 7371 457 6
[8] Ahmed Rekik, Mourad Zribi, et al. An Optimal Unsupervised
Satellite image Segmentation Approach Based on Pearson
System and k-Means Clustering Algorithm Initialization
[9] Tobler, W. R. “A classification of map projections”, Annals of the
Association of American Geographers, Vol. 52, pp. 167-175.
[10] Sergio Bernabé, Antonio Plaza, A New Tool for Classification of
Satellite Images Available from Google Maps: Efficient
Implementation in Graphics Processing Units
[11] Knowledge-based semi-supervised satellite image classification, Piscataway, NJ: IEEE
[12] Aykut Akgün, A. Hüsnü Eronat, et al., Comparing Different Satellite Image Classification Methods: An Application in Ayvalik District, Western Turkey
[13] Frank Y. Shih, Shouxian Cheng,Computer Vision Laboratory,
College of Computing Sciences, New Jersey Institute of
Technology, Newark, NJ 07102, USA,Automatic seeded region
growing for color image segmentation (2005)
[14] Jianping Fan, Guihua Zeng, Frank Y. Shih, et al., Seeded region growing: an extensive and comparative study
[15] http://en.wikipedia.org/wiki/Region_growing
[16] Astha Baxi, Manoj Pandya, M.H. Kalubarme and M.B. Potdar, 2012. Supervised Classification of Satellite Imagery using Enhanced Seeded Region Growing Technique
[17] Jianping Fan, Guihua Zeng , Mathurin Body , Mohand-Said
Hacid, “Seeded region growing: an extensive and
comparative study”, Elsevier B.V. , Pattern Recognition Letters
26 , 2005
[18] Rolf Adams and Leanne Bischof, "Seeded region growing", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, June 1994
INSTITUTE OF TECHNOLOGY, NIRMA UNIVERSITY, AHMEDABAD – 382 481, 08-10 DECEMBER, 2011
Supervised Classification of Satellite Imagery using
Enhanced Seeded Region Growing Technique
Astha Baxi, Manoj Pandya, M.H. Kalubarme and M.B. Potdar
Abstract - Image classification techniques have been practiced by remote sensing experts following certain methods, namely unsupervised and supervised. Supervised classification requires precise human intervention to extract features. The Enhanced Seeded Region Growing technique is an image segmentation method in which the image pixel is seeded by the latitude and longitude recorded during ground truth data collection using GPS. The Enhanced Seeded Region Growing technique generates clusters based upon 8-nearest-neighbour pixel connections. Standard pattern recognition software is trained on the spectral signatures of the corresponding pixels; the supervised classification algorithm can then be used.
The system can leverage the potential of Location Based Services (LBS) and Information and Communication Technology (ICT) to dynamically pull the latitude and longitude from the server using web services and gateway protocols. This method requires less effort to extract features from the image. The scheme is applied to satellite imagery covering Surendranagar district in Gujarat, India.
Index Terms - Image segmentation, Location Based
Services, Satellite Image Classification, Seeded Region Growing
Technique, Supervised Classification.
I. INTRODUCTION
In the study of digital image processing, classification
techniques have been considered as basic approaches for
feature extraction and analysis. The classical methods of
unsupervised classification comprising K-Means and
ISODATA undergo limitations in terms of accuracy. The
methods of supervised classification encompassing
Maximum Likelihood (MXL), Parallelepiped Classifier,
Minimum Distance classifier etc. require precise human
intervention in identification of training sites or signature
marking for the system.
The Enhanced Seeded Region Growing (SRG) technique identifies training sites from the corresponding site locations in the ground truth data. The location, comprising
latitude, longitude and feature metadata can be superimposed
on the geo-referenced satellite image. The geographical
location is denoted as a seed pixel. This seed pixel is
epitomized as a training signature. The algorithm classifies
the image accordingly. The system requires less human
intervention compared to the conventional supervised
classification methods.
As the tone of a particular object depends upon the reflection of sunlight, it may vary on different days. A satellite travels one path in a day, and the signature may differ from path to path. Therefore, the signature can be used to classify images from the same path only.
The paper initially explains the seeded region growing
technique, and then an enhanced algorithm based on seeded
region growing technique is proposed in section III. The
algorithm is implemented on remote sensing data of a site covering Surendranagar district in Gujarat, India. The result of
the output is discussed in section IV. Applications and
limitations of proposed algorithm are described in further
sections.
II. SEEDED REGION GROWING TECHNIQUE
Seeded region growing is a simple region-based image
segmentation method. It is also classified as a pixel-based
image segmentation method since it involves the selection of
initial seed points. This approach to segmentation examines
neighboring pixels of initial “seed points” and determines
whether the pixel neighbors should be added to the region.
The seed points are the pixels representing the signature of a
class. The process is iterated on, in the same manner as
common data clustering algorithms.
In conventional SRG technique the first step is to select a
set of seed points as described in [1]. Seed point selection is
based on some user criterion like pixels in a certain gray-level
range, pixels evenly spaced on a grid, etc. An image is
segmented into regions with respect to a set of q seeds. Given
the set of seeds, S1, S2, ..., Sq, each step of SRG adds one additional pixel to one of the seed sets. Moreover, these
initial seeds are further replaced by the centroids of these
generated homogeneous regions, R1, R2, ... , Rq, by
involving the additional pixels step by step. The pixels in the
same region are labeled by the same symbol and the pixels in
variant regions are labeled by different symbols. All these
labeled pixels are called the allocated pixels, and the others
are called the unallocated pixels. Let H be the set of all unallocated pixels which are adjacent to at least one of the labeled regions:

H = { (x, y) ∉ ∪ Ri : N(x, y) ∩ ∪ Ri ≠ ∅ }

where N(x, y) is the second-order neighborhood of the pixel (x, y), as shown in Fig. 1. For an unlabeled pixel (x, y) ∈ H whose neighborhood N(x, y) meets just one of the labeled image regions Ri, define i(x, y) ∈ {1, 2, ..., q} to be that index such that N(x, y) ∩ Ri(x, y) ≠ ∅. The difference δ(x, y, Ri) between the testing pixel at (x, y) and its adjacent labeled region Ri is calculated as

δ(x, y, Ri) = |g(x, y) − g(Xci, Yci)|
978-1-4577-2168-7/11/$26.00 ©2011 IEEE
INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN TECHNOLOGY, „NUiCONE – 2011‟
where g(x, y) indicates the values of the three colour components of the testing pixel (x, y), and g(Xci, Yci) represents the average values of the three colour components of the homogeneous region Ri, with (Xci, Yci) the centroid of Ri.
Fig. 1 Identification of pixels around the seed pixel
If N(x, y) meets two or more of the labeled regions, i(x, y) takes the value of i such that N(x, y) meets Ri and δ(x, y, Ri) is minimized.
This seeded region growing procedure is repeated
until all pixels in the image have been allocated to the
corresponding regions. The definitions of Equations (1) and (3) ensure that the final classification will divide the image into a set of regions as homogeneous as possible on the basis of the given constraints. The SRG algorithm is robust, rapid, and free of tuning parameters. It can correctly separate the regions that have the properties we define. The concept is simple, and only a small number of seed points is needed to represent the property we want. We can also choose multiple criteria at the same time. It performs well with respect to noise. However, the SRG algorithm requires a mechanism to automate the seed selection procedure.
III. METHODOLOGY
GEO ICT can be leveraged in the image classification process to generate seed points. GPS-enabled mobile devices can be used to collect feature details and the position of the location. The location details are transmitted to the centralized server at the time of the ground survey in the form of a GEO SMS, using a GSM modem or gateway service protocols. The server fetches the corresponding information and stores it in the database. The location information with metadata is considered as seed points, which can be superimposed on the geo-referenced image using an Application Programming Interface (API). These seed points generate training sites for the SRG technique. All images taken from the same path on that particular day can be classified using this method. Once the signature data is collected, the image can be classified by the following steps:
Fig. 2 Use of GEO ICT in the seed pixel based image data classification
1. Select the area of interest (AOI) from the image,
which is to be classified. The entire image can also be treated
as AOI.
2. The starting pixel of the AOI is denoted as St (X, Y). If the width of the AOI is W and the height is H, then define an array list of size A[W, H] and initialize it as A[i, j] = 0 for every i ≤ W, j ≤ H and i, j ≥ 0.
3. Now plot all the signature sites (training sites) on the image and store their equivalent pixels' intensity values.
4. Find the mean of these intensity values. Say, I.
5. Define a stack (a single-dimension array list), say S, of some size. The stack size is bounded here to overcome the stack overflow problem. For better performance, machines with more memory are recommended, as the stack size must be limited.
6. Define a threshold value T such that the intensity difference between the given pixel and I should not be greater than T. The threshold value should be chosen according to the clarity of the image to be classified: if the image is clear, i.e. every prominent object is visibly distinguishable on the image, then we can reduce the value of T, and vice versa.
7. Now start with the pixel St (X, Y). Say the intensity of St (X, Y) is ǐ.
If St (X, Y) is not labelled and |I − ǐ| ≤ T, then mark A[i, j] as labelled. Find its 8-connected neighbouring pixels and push them onto S if S is not full. If S is full, then leave the neighbouring pixels unlabelled.
If St (X, Y) is not labelled and |I − ǐ| > T, then take the next available pixel in the AOI, say (X+1, Y), and repeat step 7.
8. If S is not empty then pop a pixel and find its
intensity. Continue the same process as described in step 7
until S is empty. If S is empty then go to the next available
pixel in AOI, say (X+1, Y) and repeat the same procedure
described in step 7.
9. Continue this procedure until the entire AOI is
covered.
Here, the efficiency of this algorithm depends upon factors
like Signature sites, clarity of the image to be classified, T,
size of S and the time difference between the ground truth
collection and satellite acquisition of the data.
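The numbered steps above can be condensed into a short Python sketch. The scan order, threshold test and bounded stack follow steps 1-9, but the variable names and the default stack size are assumptions, and a grey-level AOI is used for brevity:

```python
import numpy as np

def esrg_classify(aoi, seed_sites, T, stack_size=4096):
    """Steps 1-9 sketched: mean seed intensity I, threshold T, a bounded
    stack S, and 8-connected growth over the AOI (grey-level image)."""
    h, w = aoi.shape
    labelled = np.zeros((h, w), bool)                  # step 2: A[i, j] = 0
    I = np.mean([aoi[r, c] for r, c in seed_sites])    # steps 3-4: mean seed intensity
    stack = []                                         # step 5: bounded stack S
    for r in range(h):                                 # steps 7-9: scan the AOI
        for c in range(w):
            if labelled[r, c] or abs(aoi[r, c] - I) > T:
                continue                               # |I - i| > T: next pixel
            labelled[r, c] = True
            stack.append((r, c))
            while stack:                               # step 8: pop until S is empty
                pr, pc = stack.pop()
                for nr in (pr - 1, pr, pr + 1):        # 8-connected neighbours
                    for nc in (pc - 1, pc, pc + 1):
                        if (0 <= nr < h and 0 <= nc < w
                                and not labelled[nr, nc]
                                and abs(aoi[nr, nc] - I) <= T
                                and len(stack) < stack_size):
                            labelled[nr, nc] = True
                            stack.append((nr, nc))
    return labelled
```

As in step 7, neighbours are simply left unlabelled when the stack is full; the outer scan may still reach them later from another matching pixel.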
IV. EXPERIMENT AND RESULTS
The algorithm is tested on LISS-3 imagery of an area covering parts of Surendranagar and Ahmedabad districts in Gujarat, India. The size of the image is 1000 x 500. It is a multispectral false colour composite (FCC) image consisting of 3 bands: NIR, RED and GREEN. The parameters of the algorithm are as follows: T = 10%, size of S = 16, and three signature sites each for cotton and paddy, whose locations are given in Table I. Here, the average tone of the three pixels is taken as the seed pixel, so there is one seed point for each class.

Table I: Location of seed pixels

Ground Cover Type | Latitude | Longitude
Cotton 1 | 22°42'19.34"N | 70°53'52.54"E
Cotton 2 | 22°40'26.21"N | 70°58'55.18"E
Cotton 3 | 22°38'15.13"N | 70°55'59.48"E
Paddy 1 | 22°39'44.86"N | 71°4'52.46"E
Paddy 2 | 22°39'19.89"N | 71°1'46.68"E
Paddy 3 | 22°40'4.36"N | 70°57'34.48"E
From knowledge of the area we can say that mainly two types of crops, cotton and paddy, are grown in this area. Fig. 3.a is the FCC image of the area; red clusters show the cotton crop while maroon clusters show the paddy crop. The centres of the circles marked in yellow are taken as seed pixels. Fig. 3.b shows the seed point locations for paddy. First the algorithm is applied to classify only one class, cotton, whose output is shown in Fig. 4.a: green represents cotton, while all other pixels, displayed in black, represent unclassified area. Fig. 4.b shows the output of the parallelepiped classification method applied to the same image using the same training site. The output is compared with the ground truth data in Table II.
Fig. 3 (a) FCC LISS-3 Image of Surendranagar district in Gujarat, India
Fig. 3 (b) Paddy Seeds
Fig. 4 (a) Result of Enhanced SRG for single class
Fig. 4 (b) Result of the parallelepiped classification technique
Table II: Comparison of area of cotton crop

Classification Method | Area (sq km) | Actual Area (sq km) | Difference (sq km) | % Area | % Difference
Enhanced Seeded Region Growing | 25.36 | 29.8 | 4.44 | 5.10067114 | 14.89932886
The same algorithm is also used to classify multiple classes, cotton and paddy. Fig. 5 shows the output, in which red represents the cotton class while green represents the paddy. The efficiency of the algorithm here is calculated using the Kappa coefficient. A confusion matrix is prepared and displayed in Table III. The derived Kappa coefficient value is K = 0.89. From Tables II and III we can say that the efficiency of the algorithm varies between 85 and 90%, depending upon the number of signature sites per class.
Fig. 5 Enhanced SRG for 2 classes
Table III Confusion Matrix
Class Cotton Paddy
Cotton 46168 1537
Paddy 2562 28020
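The reported K = 0.89 can be reproduced from Table III by treating the rows and columns as the two raters. A sketch, evaluating K = (Pr(a) − Pr(e)) / (1 − Pr(e)) from the matrix's row and column totals:

```python
import numpy as np

def cohens_kappa(cm):
    """K = (Pr(a) - Pr(e)) / (1 - Pr(e)) from a confusion matrix."""
    cm = np.asarray(cm, float)
    total = cm.sum()
    pr_a = np.trace(cm) / total                           # observed agreement
    pr_e = (cm.sum(axis=0) @ cm.sum(axis=1)) / total**2   # chance agreement
    return (pr_a - pr_e) / (1 - pr_e)

# Table III above: rows = classified, columns = reference (Cotton, Paddy)
K = cohens_kappa([[46168, 1537], [2562, 28020]])          # approximately 0.89
```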
V. APPLICATION
The proposed method is used to identify a particular type of feature in an image. Different signatures are generated for different types of features. The algorithm requires few seed points; hence it is suitable for interactive, real-time and on-the-fly image classification. Once the image is classified, physical dimensions such as length, perimeter and area can be derived. The same signatures can also be used to classify other satellite images having the same temporal period and satellite path. The same method can be replicated to extract multiple types of classes, such as soil, water bodies, wasteland, land use, etc.
VI. LIMITATION
The efficiency of this algorithm depends upon the signature data; hence it is necessary to collect ground truth data from at least two to three sites. A sufficient number of seed points improves the efficiency of the algorithm.
VII. CONCLUSION
The proposed algorithm requires fewer signature sites (i.e. training sites) per class compared to Maximum Likelihood (MXL) and Artificial Neural Network (ANN) classifiers. Its efficiency is better than the parallelepiped classification technique and comparable with MXL and ANN, and it is faster than ANN.
VIII. ACKNOWLEDGEMENT
The authors would like to thank Mr. T.P. Singh, Director, BISAG, for his inspiration and motivation. The authors would also like to thank Mr. Apoorva Dalvadi, BISAG, for providing the required resource datasets.
IX. REFERENCES
[1] Jianping Fan, Guihua Zeng , Mathurin Body , Mohand-Said
Hacid, “Seeded region growing: an extensive and comparative
study”, Elsevier B.V. , Pattern Recognition Letters 26 , 2005
[2] Rolf Adams and Leanne Bischof, "Seeded region growing", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, June 1994
[3] Sanghoon Lee, “Change detection in land-cover pattern using
region growing segmentation and fuzzy classification”, IEEE
[4] Robert M. Haralick, M.C. Zhang, and Roger W. Ehrich ,
“Dynamic programming approach for context classification using
the Markov random field”
[5] Bryant, J. 1979. “On the clustering of multi-dimensional pictorial
data”. Pattern recognition, vol. 11, pp. 115-125
[6] Deng, Y., Manjunath, B.S., 2001. “Unsupervised segmentation of
color–texture regions in image and video”. IEEE Trans. Pattern
Anal. Machine Intell. 23 (8), 800–810.
[7] Fan, J., Yau, D.K.Y., Elmagarmid, A.K., Aref, W.G., 2001a.
“Automatic image segmentation by integrating color-based
extraction and seeded region growing”. IEEE Trans. Image
Process. 10 (10), 1454–1466.
[8] Qiyao Yu and David A. Clausi, Senior Member, IEEE, "IRGS:
Image segmentation using edge penalties and region growing".
ieee transactions on pattern analysis and machine intelligence,
vol. 30, no. 12, December 2008
[9] A.K. Jain and R.C. Dubes, Algorithms for clustering data.
Prentice Hall, 1988.
[10] John A. Richards · Xiuping Jia, "Remote sensing digital image
analysis an introduction", 4th Edition.
[11] http://en.wikipedia.org/wiki/Region_growing
International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September-2012 1 ISSN 2229-5518
Decision Support System for Industrial Estates using Geo Spatial Technology
1Krunal Patel, 1Manoj Pandya, 2Dr. N.N. Jani
Abstract – To meet the needs of various administrative operations, such as estate infrastructure planning and management, allotment and regulation of industrial plots, and establishing new estates at various locations, Geo Spatial Technology is the key solution. Geo spatial technology covers the spatial dimension, which facilitates visualizing the outlook of industrial estates at dynamic geographical scales. The integration of domain knowledge with geo-spatial datasets and technology leads to successful implementation of the system. The system should have the capability to process and render enterprise-level decisions and provide aid to plan, regulate and control land use.
Index Terms – Geo Database, Decision Support System, DSS, Oracle, Estate Management, GIS
1. INTRODUCTION
Industrial estates are specific areas zoned for industrial
activity in which infrastructure such as roads, power, and
other utility services is provided to facilitate the growth of
industries and to minimize impacts on the environment [1].
The effective use of Geo ICT leads to better planning and
monitoring of geographical entities. To leverage the potential
of Geo ICT, a Web-based geo-spatial industrial estate
management system has been developed. It allows selection of
any estate of Gujarat State; display of estate details such as
plot details, infrastructure facilities, utilities, amenities,
allotment status of plots and property account details; overlay
of revenue maps and satellite imagery on the estate map;
customized query and analysis functions for the Planning, Land
and Engineering divisions of GIDC; linkage with the Oracle
database to provide up-to-date data on land transactions; and
interactive decisions through a report generation mechanism.
2 RATIONALE OF THE MODEL
Industrial estates have led to the development of large
urban regions, especially in the states of Gujarat and
Maharashtra, where large-scale city/town development has
taken place [2]. Effective development can be achieved by
implementing a Geographical Information System (GIS); maps
improve decision-making capabilities. At the user-interface
level, it is essential for the stakeholders that MIS and GIS be
incorporated into a single application, so that the user has a
single interface with which to interact. The spatial dimension
allows various analyses to be harnessed, such as visibility,
buffer, intersection and union. Effective decisions can be
carried out by leveraging the potential of GIS. The system
should be efficient, scalable and cost-effective.
3 DATABASE PLATFORMS
A database is the foundation for various kinds of research
activities and projects. In this paper, the scenario of the
Gujarat Industrial Development Corporation (GIDC) is
considered, where the department's industrial estate database
is stored in an Oracle database. Several database platforms can
be scaled for enterprise solutions. The geo database of the
industrial estate is warehoused in Microsoft® SQL Server 2005,
as the solution has been developed with Microsoft .NET 2005,
which performs better with its native platform. The GIS engine
used in this system is indigenous and low cost.
Fig: 1 A synoptic view of industrial Estate System
————————————————
Manoj Pandya and Krunal Patel are currently working as a senior project scientist in Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected], [email protected]
Dr. N.N. Jani is affiliated with S.K. Patel Institute, Gandhinagar, Gujarat, India. E-mail: [email protected]
4 METHODOLOGIES
Spatial database creation is a vital step where interactive
decisions are required, such as finding the association of
industrial plots and performing buffer analysis, and hence
proposing site suitability based on custom criteria. Spatial
entities are digitized on Cartosat satellite imagery with
2.5-metre resolution. Cadastral maps have also been
co-registered with the satellite imagery.
Fig: 2 Workflow of the System
Spatial database:
To construct spatial data, Cartosat satellite imagery with a
resolution of 2.5 metres is used for creating base maps,
including industrial plots. Departmental data is linked with
the spatial database to form a dynamic, seamless information
matrix. Spatial datasets are derived, such as locations and
layouts of industrial estates; plot-wise details (unsold notes,
taxation, maintenance, etc.); layouts of utilities (industrial
zones, water supply system, waste water treatment plant, solid
waste management, drainage, power supply, etc.); infrastructure
facilities; superimposed revenue maps; and amenities.
5 DECISION SUPPORT SYSTEM (DSS)
A DSS is a knowledge-based system that serves the
management, operations and planning levels of an organization
and helps to make decisions, which may be rapidly changing
and not easily specified in advance [3].
The Web interface has an authentication condition that allows
only specific users to access the portal. The portal has a
built-in query builder to process users' custom criteria. In
this scenario, bridging spatial and aspatial information has
led to an enterprise-level solution by facilitating tools such
as daily land data creation; MIS report generation (monthly,
quarterly, vacant plot list, etc.); automatic notice
generation; updating of regional office transactions to the
central database server; receipt and pay order generation;
on-demand auto-generation of offer letters (lease deed,
scrutiny report, etc.); and selective dispatch of generated
reports (e-mail and/or hard copy).
Satellite imagery and geographical entities are created at the
Bhaskaracharya Institute for Space Applications and Geo-
Informatics (BISAG).
Fig: 3 Digitized layout map of Viramgam GIDC estate overlaid on
satellite imagery
The Web interface has the facility to superimpose multiple
amenities on industrial plots and to select plots based on
interactive criteria, as shown in the figure below.
Fig 4: An example of the criterion Outstanding Amount >= 500000
By executing the criterion Outstanding Amount >= 500000, the
plots that satisfy the criterion are highlighted in yellow.
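As a rough illustration of what the portal's query builder does with such a criterion, the sketch below filters plot records on a user-supplied comparison. This is not GIDC's actual code; the record fields (`plot_no`, `outstanding`) and sample values are hypothetical.

```python
# Illustrative sketch (assumed field names, not the real GIDC schema):
# filter plot records on a criterion such as "Outstanding Amount >= 500000".
import operator

OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}

def filter_plots(plots, field, op, value):
    """Return the plots satisfying `field <op> value`; in the portal these
    would be the plots highlighted in yellow on the map."""
    return [p for p in plots if OPS[op](p[field], value)]

plots = [
    {"plot_no": "A-101", "outstanding": 750000.0},
    {"plot_no": "A-102", "outstanding": 120000.0},
    {"plot_no": "B-201", "outstanding": 500000.0},
]
hits = filter_plots(plots, "outstanding", ">=", 500000)
print([p["plot_no"] for p in hits])
```

In a real deployment the comparison would be compiled into a SQL WHERE clause against the Oracle database rather than evaluated in memory.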
Fig 5: Super-imposing BORE and E.S.R. on Plot of Industrial Estates
By superimposing the BORE and E.S.R. locations, it is easy to
find the source of water; the nearest source of water supply
can be derived easily using buffer analysis.
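The buffer analysis described above can be sketched in a few lines. This is a minimal, dependency-free illustration with made-up coordinates, not BISAG's implementation: it finds every water source within a buffer radius of a plot centroid, and the nearest source overall.

```python
# Minimal sketch of the buffer analysis (hypothetical coordinates in map
# units): which BORE/E.S.R. sources fall inside a buffer around a plot,
# and which one is nearest.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def sources_in_buffer(plot, sources, radius):
    """Names of all sources within `radius` of the plot centroid."""
    return {name for name, pt in sources.items() if dist(plot, pt) <= radius}

def nearest_source(plot, sources):
    """Name of the source closest to the plot centroid."""
    return min(sources, key=lambda name: dist(plot, sources[name]))

plot = (100.0, 100.0)  # plot centroid (assumed)
sources = {"BORE-1": (130.0, 140.0),   # 50 units away
           "ESR-1": (300.0, 100.0),    # 200 units away
           "BORE-2": (90.0, 110.0)}    # ~14 units away

print(sources_in_buffer(plot, sources, 60.0))
print(nearest_source(plot, sources))
```

A production GIS would use geodesic distances and polygon buffers rather than planar point distances, but the decision logic is the same.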
6 APPLICATIONS
The proposed methodology is capable of converging and
extracting datasets from the given data warehouse. Real-time
decisions can be taken by implementing a GIS engine that
considers the spatial dimension. The system aids planning to
regulate and control land use in the vicinity of the estate by
providing buffer zones around the estates. Decision makers can
recommend land-use controls around the estates to control and
minimize adverse environmental impacts; recommend the effluent
treatment, waste disposal and other abatement infrastructure
needed for common use by all industries of the estate; and
monitor the various planning and allotment details.
7 LIMITATIONS
Processing the huge volume of data held as spatial layers is
computationally intensive. The proposed methodology can be
optimized by introducing distributed computing; parallel
processing of tasks can improve the efficiency of producing
spatial and aspatial output from complex user-defined queries.
8 CONCLUSIONS
The proposed solution is scalable to the enterprise level.
Industrial estates are crucial for the financial growth of any
nation. Geo spatial technology is useful not only for better
planning but also for better management of assets in
industries. Convergence of MIS and GIS technologies can provide
near real-time information, resulting in an efficient and
effective decision support system (DSS) that can help in
multiple areas such as health, defense and disaster management.
ACKNOWLEDGEMENTS
The authors would like to thank the Director of BISAG, T.P.
Singh, for his inspiration and motivation, and the Manager S&A,
GIDC, Mr. H. V. Bagtharia, for providing valuable technical
support.
REFERENCES
[1] http://www1.ifc.org/wps/wcm/connect/72b3340048855c0c8ad4d
a6a6515bb18/industrialestates_PPAH.pdf?MOD=AJPERES&CA
CHEID=72b3340048855c0c8ad4da6a6515bb18.
[2] http://cg.gov.in/opportunities/Industrial%20Estates.pdf
[3] http://en.wikipedia.org/wiki/Decision_support_system
[4] Hettige M, Martin P, Singh M, Wheeler D (1995). The
Industrial Pollution Projection System, Policy Research
Department, Policy Research Working Paper 1431. World Bank.
[5] Smith VK, Andrès JE (1996). Environmental and Trade Policies:
Some Methodological Lessons. Environ. Develop Econ., 1(1):
19-40.
[6] Townroe, P.M. (1972). Some behavioural considerations in the
industrial location decision. Regional Studies, 6, 261-272.
[7] Smith, D.M. (1976) Industrial Location: An Economic
Geographical Analysis. New York
[8] Visser, E.J. (2009), The complementary dynamic effects of
clusters and networks for Industry Innovation
[9] Weber, A. (1999), Theory of the Location of Industries, Chicago,
IL, University of Chicago Press.
[10] World Commission on Environment and Development 1987.
Our Common Future. Oxford: Oxford University Press.
[11] United Nations Environment Programme, 1981. Guidelines for
Assessing Industrial Impact and Environmental Criteria for the
Siting of Industry.
[12] Mohan, M., Singh, M.P. and Panwar, T.S., 1993. Industrial
Growth and Management in Urban Areas. In: Heptulla, N.
(ed.) Environmental Protection in Developing Countries. New
Delhi: Oxford & IBH Publishing Co. Pvt. Ltd., pp 29-34.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN ENGINEERING, TECHNOLOGY AND SCIENCES (IJ-CA-ETS)
ISSN: 0974-3588 | JULY’11- DEC’11 | Volume: 4 Issue 2 |
SUPERIMPOSITION OF ADAPTED GEOGRAPHICAL DATASETS ON GOOGLE MAP
1 MANOJ PANDYA, 2 JAY DAVE, 3 NIRAV SUTHAR, 4 PARTH BHAGAT, 5 DR. BIJENDRA AGRAWAL
1, 2, 3, 4 Bhaskaracharya Institute for Space Applications and Geo-Informatics, Gandhinagar, Gujarat 382007, India. E-mail: [email protected]
5 VJKM Institute, Vadu, Kadi, Mahesana, Gujarat, India
ABSTRACT: Map navigation has become an essential means of access nowadays. In the world of web-based GIS, Google has offered these services since February 2005. The Google Maps API lets the user put Google Maps in a web page. Problems arise in certain scenarios when the data is in ESRI© shapefile format and in a different map projection. As far as vector data are concerned, industry has adopted the file format provided by ESRI© since 1994. The Google Maps API takes an XML file as input, which is then deployed on Google Maps with the help of API classes, so the user has to first convert the shapefile into an XML file. To resolve these problems, software has been developed. The application takes an ESRI-standardized shapefile as input (i.e. the combined files with extensions *.shp, *.shx and *.dbf) and produces a well-defined XML file as output. This XML file can then be placed in a web application; on selecting the file, the geometry it contains is plotted in the Google Maps window with the help of the Google Maps API.
Keywords: Google API, Map Projection, Superimposition of vector data, geo-referenced data overlay, GIS.
1. INTRODUCTION
Google Maps provides geo-referenced satellite imagery and several vector data layers such as roads, rivers and cities. In certain scenarios it is essential to superimpose user-defined data layers, such as administrative boundaries and cadastral layers, over Google Maps. Different organizations store data in different map projections and datums. Each map projection has parameters such as central meridian, false easting, false northing and scale factor. If the vector geometry vertices are superimposed without run-time transformation, the vector may be superimposed with shifting at the edges. On-the-fly map projection transformation is required to convert the current map projection parameters to the equivalent latitude and longitude.
2. SYSTEM ARCHITECTURE
In general practice, Google serves the map to the client browser using Asynchronous JavaScript And XML (AJAX). Custom information in the form of XML or KML can be used to pinpoint user-defined locations on Google Maps. XML and KML have well-defined file formats, and their specifications are open standards. The enterprise architecture uses database servers such as Oracle© and SQL© Server; the spatial data can be stored as binary objects or spatial geometry in the enterprise database server.
Figure 1. Map projection Architecture
3. MAP PROJECTION
A map projection is any method of representing the surface of a sphere or other three-dimensional body on a plane. Map projections can be constructed to preserve one or more properties such as area, shape and distance, though not all of them simultaneously. Geographical entities can be projected in a single map projection; to overlay geographical entities from different projections, the map projections and datums should be the same. To overcome this problem, an 'on-the-fly' map projection can be calculated and the entities then overlaid on the reference map layer.
3.1 LOCATION BASED SERVICES (LBS)
LBS comprise services that identify the location of a person or object. LBS utilize Google Maps, as the Google API renders services in a customized and
simplified fashion. Hence Google Maps services are widely accepted in industry.
Organizations and institutions are accustomed to the conventional shapefile format, and each stores geographic information in its own map projection. The issue arises when the geometry is superimposed on Google Maps, whose coordinate reference system (CRS) is geographic: location is measured in latitude and longitude rather than in the easting and northing of a local projected system. If geographical entities are overlaid without applying an on-the-fly map projection, a shift in the map extent arises that creates trouble during superimposition. The proposed solution is to convert the map projection using a mathematical model.
4. MAP PROJECTION MATH-MODEL
Mathematical formulae can be derived to transform one map projection system into another. Google Maps uses the geographic projection, so the mathematical model converts Transverse Mercator projection coordinates (N, E) to their geographic equivalents. Universal Transverse Mercator (UTM) or Transverse Mercator (TM) projections are generally used in organizations. The conversion equations are divided into two parts: I. Projection Constants, and II. Transverse Mercator (TM) to Geographic.
I. Projection Constants: Several additional parameters must be computed before the transformation can be undertaken. They are functions of the projection parameters and are constant for a given projection; for example, m0 is obtained by evaluating m at ∅0.
II. Transverse Mercator (TM) to Geographic: The conversion of Transverse Mercator projection coordinates (N, E) to geographic coordinates is achieved in several steps. First determine N', m', n, G, σ and ∅'; then determine σ', ν', ψ', t' and E' using the standard formulae.
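The ellipsoidal series formulae used by the paper are not reproduced in this transcript. As an illustration of the same idea on a simpler model, the sketch below inverts the spherical Transverse Mercator projection: given an easting and northing it recovers latitude and longitude. This is a simplification (real UTM conversions use the ellipsoidal formulae); the central meridian and test coordinates are assumed values.

```python
# Spherical Transverse Mercator forward and inverse (a simplified,
# spherical stand-in for the ellipsoidal TM-to-geographic conversion
# described in the text). All angles are in radians internally.
import math

R = 6371000.0  # mean Earth radius in metres (spherical assumption)

def tm_forward(lat, lon, cm=0.0):
    """Geographic (lat, lon) -> spherical TM (easting, northing)."""
    b = math.cos(lat) * math.sin(lon - cm)
    easting = R * math.atanh(b)
    northing = R * math.atan2(math.tan(lat), math.cos(lon - cm))
    return easting, northing

def tm_inverse(easting, northing, cm=0.0):
    """Spherical TM (easting, northing) -> geographic (lat, lon)."""
    lat = math.asin(math.sin(northing / R) / math.cosh(easting / R))
    lon = cm + math.atan2(math.sinh(easting / R), math.cos(northing / R))
    return lat, lon

cm = math.radians(72.0)                         # assumed central meridian
lat_in, lon_in = math.radians(23.0), math.radians(72.5)  # assumed point
E, N = tm_forward(lat_in, lon_in, cm)
lat_out, lon_out = tm_inverse(E, N, cm)
print(round(math.degrees(lat_out), 6), round(math.degrees(lon_out), 6))
```

The forward and inverse functions are exact analytic inverses on the sphere, so the round trip recovers the input point; swapping in the ellipsoidal constants and series of Section 4 follows the same structure.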
The latitude and the longitude of the computation point are then computed from these quantities; hence the values of latitude and longitude are derived.
5. XML AS TRANSITIONAL STORAGE
Extensible Markup Language (XML) can be used as intermediate storage for the vertices of geographical datasets. This is required because the number of vertices is very large. The sample XML file below contains tags for markers, polylines and a polygon; it can be overlaid on Google Maps. Hence XML can store geometry and attributes that can be parsed to display geometry on Google Maps.
<markers>
  <marker lat="43.65654" lng="-79.90138" label="Marker One">
    <infowindow><![CDATA[ Some stuff to display in the<br>First Info Window ]]></infowindow>
  </marker>
  <marker lat="23.65654" lng="72.90138" label="Marker two">
    <infowindow><![CDATA[ Some stuff to display in the<br>Second Info Window ]]></infowindow>
  </marker>
  <marker lat="24.65654" lng="74.90138" label="Marker Third">
    <infowindow><![CDATA[ Some stuff to display in the<br>Third Info Window ]]></infowindow>
  </marker>
  <line colour="#FF0000" width="4" html="You clicked the red polyline">
    <point lat="43.65654" lng="-79.90138" />
    <point lat="43.91892" lng="-78.89231" />
    <point lat="43.82589" lng="-79.10040" />
  </line>
  <line colour="#10aa00" width="8" html="You clicked the green polyline">
    <point lat="43.9078" lng="-79.0264" />
    <point lat="44.1037" lng="-79.6294" />
    <point lat="43.5908" lng="-79.2567" />
    <point lat="44.2248" lng="-79.2567" />
    <point lat="43.7119" lng="-79.6294" />
    <point lat="43.9078" lng="-79.0264" />
  </line>
  <line colour="#00dfff" width="2" html="You clicked the yellow polygon">
    <point lat="23.2478" lng="73.0264" />
    <point lat="24.1027" lng="72.6294" />
    <point lat="23.2908" lng="75.2567" />
    <point lat="24.2448" lng="70.2567" />
    <point lat="23.4119" lng="62.6294" />
    <point lat="23.9048" lng="71.0264" />
  </line>
</markers>
6. APPLICATION
Superimposition of geographical datasets in custom projections becomes easier on Google Maps. Without any physical transformation, the geographical layers can be overlaid and thematic
maps can be generated. The following figure shows an organization's data, in red border color, overlaid on Google Maps.
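Before the Google Maps API can plot the transitional XML of Section 5, a client must parse it into coordinate lists. The sketch below does this with the Python standard library; the tag and attribute names match the sample file, and the embedded sample here is a shortened copy of it.

```python
# Parse the transitional marker/line XML (Section 5 format) into plain
# Python structures ready to hand to a mapping API.
import xml.etree.ElementTree as ET

SAMPLE = """<markers>
  <marker lat="43.65654" lng="-79.90138" label="Marker One"/>
  <marker lat="23.65654" lng="72.90138" label="Marker two"/>
  <line colour="#FF0000" width="4">
    <point lat="43.65654" lng="-79.90138"/>
    <point lat="43.91892" lng="-78.89231"/>
  </line>
</markers>"""

def parse_overlay(xml_text):
    """Return ([(label, lat, lng), ...], [[(lat, lng), ...], ...])."""
    root = ET.fromstring(xml_text)
    markers = [(m.get("label"), float(m.get("lat")), float(m.get("lng")))
               for m in root.findall("marker")]
    lines = [[(float(p.get("lat")), float(p.get("lng")))
              for p in line.findall("point")]
             for line in root.findall("line")]
    return markers, lines

markers, lines = parse_overlay(SAMPLE)
print(len(markers), len(lines), lines[0][0])
```

In the browser the same parsing is done in JavaScript by the Google Maps API helper classes; the point is that the XML is a simple, language-neutral interchange format.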
Figure 2. Overlaid geographical datasets on Google Map (using XML datasets)
7. CONCLUSION
The mathematical model provides leverage to integrate geographical datasets from various map projections on the same platform. The mathematical model can be evolved for numerous kinds of coordinate reference systems to produce thematic maps for various geo-spatial analyses.
8. REFERENCES
[1] http://mathworld.wolfram.com/MercatorProjection.html
[2] http://en.wikipedia.org/wiki/Mercator_projection
[3] Tobler, W. R. (1962). "A classification of map projections", Annals of the Association of American Geographers, Vol. 52, pp. 167-175.
[4] http://www.gpsy.com/gpsinfo/geotoutm/
[5] Pickles, J., ed. (1995). Ground Truth: The Social Implications of Geographic Information Systems. New York: Guilford Press.
[6] Berthon, S., and Robinson, A. (1991). The Shape of the World: The Mapping and Discovery of the Earth. Chicago: Rand McNally.
[7] http://mysite.du.edu/~jcalvert/math/mercator.htm
[8] http://surveying.wb.psu.edu/sur162/UTM/UTM.htm
[9] http://gisremote.blogspot.com/2008/02/about-map-projection-what-is-map.html
[10] C. P. Lo & Albert K.W. Yeung (2004). Concepts and Techniques of Geographic Information Systems.
[11] David Salomon (2006). Transformations and Projections in Computer Graphics.
[12] Earl F. Burkholder (2008). The 3-D Global Spatial Data Model: Foundation of the Spatial Data Infrastructure.
International Journal of Computer Applications (0975 – 8887)
Volume 52– No.19, August 2012
Distributed Computing Augmented Shortest Path Finding for Geo Spatial Datasets
ABSTRACT
Geo spatial information is a large collection of datasets
referring to real-world entities. Geo spatial information has
evolved over the last decade into a vast platform for
government administration, scientific analysis and various
other sectors, especially disaster management (DM) and site
suitability of check-dams for irrigation departments. It is
required to obtain imperative geographically analyzed
solutions, such as finding the shortest path between a source
(e.g. a location stocking food packets or clinical, remedial
and therapeutic kits) and a destination (e.g. the location
where a disaster emerges). Departmental data (e.g. village
maps) covers more detailed spatial and attribute information
than readily available sources. Hence custom solutions based on
information technology need to be constructed and processed
efficiently and quickly to deliver the seamless performance
required in mission-critical incidents.
Keywords Geo Database, Distributed Computing, Dijkstra, Shortest Path,
GIS, Hadoop, Shapefile
1. INTRODUCTION
Accidents are common disasters that cause much damage to
people's lives. Population growth and the accompanying
development of extensive infrastructure have greatly improved
people's financial condition, which has led to the use of a
large number of vehicles in recent years. When an accident
occurs, it is critical to obtain information about the location
of the incident, to reach that place and to supply the
necessary therapeutic kits and clinical remedies. The crucial
need is to find the shortest path between the location of the
incident and the nearest location providing clinical remedies
or hospitalization. Readily available sources do not cover
complete geo spatial details of the road network, especially in
rural areas. Even though detailed information is available with
the department, it is difficult to utilize at the time of a
disaster because of poor processing performance, as the server
becomes loaded with multiple concurrent requests. The vital
issue is to make a timely and effective decision.
2. DISTRIBUTED DATABASES
A distributed database is a database in which the storage
devices are not all attached to a common processing unit such
as a CPU. It may be stored on multiple computers located at the
same physical location, or may be dispersed over a network of
interconnected computers. A database may consist of two or more
data files located at different sites on a computer network,
and different users can access it without interfering with one
another. However, the DBMS must periodically synchronize the
scattered databases to ensure that they all hold consistent
data.
3. GIS NETWORK MODEL
In GIS systems, networks are modeled as points (e.g. street
intersections, switches, water valves) and lines (e.g. streets,
transmission lines, pipes). Network topological relationships
define how lines connect with each other at nodes. For the
purpose of network analysis it is also useful to define rules
about how flows can move through a network.
A network is defined as a graph G = (N, A) consisting of a set
N of nodes and a set A of arcs with associated numerical
values, such as the number of nodes n = |N|, the number of arcs
m = |A|, and the length of the arc connecting nodes i and j,
denoted l(i, j).
Figure 1: A typical Road Network
As shown in figure 1 above, each vertex is marked in black, and
the segment length (weight) is marked in red near the center of
each segment. A road network such as the one in figure 1 can be
converted into the attribute form expected by Dijkstra's
algorithm, as shown in the table below.
Table 1: Input tabular information for Dijkstra

Fnode_  Tnode_  Length
1       2       4
2       1       4
2       3       4
3       2       4
2       5       5
5       2       5
5       4       8
4       5       8
4       6       6
6       4       6
3       4       8
4       3       8
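The Fnode_/Tnode_/Length rows of Table 1 map directly onto the adjacency structure a shortest-path routine consumes. A minimal sketch of that conversion (the rows below are exactly those of Table 1):

```python
# Convert Table 1 rows (Fnode_, Tnode_, Length) into an adjacency dict
# of the form {from_node: {to_node: length, ...}, ...}.
from collections import defaultdict

rows = [(1, 2, 4), (2, 1, 4), (2, 3, 4), (3, 2, 4), (2, 5, 5), (5, 2, 5),
        (5, 4, 8), (4, 5, 8), (4, 6, 6), (6, 4, 6), (3, 4, 8), (4, 3, 8)]

def build_adjacency(rows):
    adj = defaultdict(dict)
    for fnode, tnode, length in rows:
        adj[fnode][tnode] = length  # each undirected edge appears twice
    return adj

adj = build_adjacency(rows)
print(sorted(adj[2].items()))
```

Because the graph is undirected, Table 1 already lists each edge in both directions, so no extra mirroring is needed here.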
Manoj Pandya, Abdul Zummarwala, Prashant Chauhan
BISAG, Gandhinagar, Gujarat, India
The shortest path from each node to every other node must be
pre-processed, with all possible permutations and combinations.
The graph is undirected; hence both combinations of Fnode_ and
Tnode_ are to be considered, as shown in the following table.
Table 2: Dijkstra-processed path for each Fnode_ and Tnode_

Fnode_  Tnode_  Path
1       2       1,2
2       1       2,1
1       9       1,2,3,9
9       1       9,3,2,1
1       6       1,2,5,4,6
6       1       6,4,5,2,1
1       4       1,2,5,4
4       1       4,5,2,1
6       9       6,7,8,9
9       6       9,8,7,6
5       8       5,2,3,9,8
8       5       8,9,3,2,5
4. SHORTEST PATH
In graph theory, the shortest path problem is the problem of
finding a path between two vertices (or nodes) in a graph such
that the sum of the weights of its constituent edges is minimized.
An example is finding the quickest way to get from one location
to another on a road map; in this case, the vertices represent
locations and the edges represent segments of road and are
weighted by the time needed to travel that segment.
For undirected graphs, the shortest path problem can be
formally defined as follows. Given a weighted graph (that is, a
set V of vertices, a set E of edges, and a real-valued weight
function f : E → R) and elements v and v' of V, find a path P
(a sequence of edges) from v to v' so that

f(P) = Σ_{e ∈ P} f(e)   (Eq. 1)

is minimal among all paths connecting v to v'.
Shortest paths from one (source) node to all other nodes of a
network are normally referred to as one-to-all shortest paths.
Shortest paths from one source node to a subset of the nodes
can be defined as one-to-some shortest paths, and shortest
paths from every node to every other node are normally called
all-to-all shortest paths. When the goal is to obtain a
one-to-one or one-to-some shortest path, Dijkstra's algorithm
offers some advantages, because it can be terminated as soon as
the shortest path distance to the destination node is obtained.
However, other algorithms can also be used for finding shortest
paths in various scenarios:
Dijkstra's algorithm solves the single-source shortest path
problem.
The Bellman–Ford algorithm solves the single-source problem
when edge weights may be negative.
The A* search algorithm solves the single-pair shortest path
problem, using heuristics to speed up the search.
The Floyd–Warshall algorithm solves the all-pairs shortest path
problem.
Johnson's algorithm solves the all-pairs shortest path problem,
and may be faster than Floyd–Warshall on sparse graphs.
Dijkstra's algorithm for finding the shortest path between a
source and a destination is commonly practiced and easy to
implement; it is used in this paper to demonstrate a prototype
model, as shown in the code snippet below.
Code Snippet (T-SQL):
-- Run the algorithm until all reachable nodes are settled.
WHILE 1 = 1
BEGIN
    -- Reset the variable, so we can detect getting no record in the next step.
    SELECT @FromNode = NULL

    -- Select the id and current estimate of an unfinished node with the lowest estimate.
    SELECT TOP 1 @FromNode = Id, @CurrentEstimate = Estimate
    FROM #Nodes
    WHERE Done = 0 AND Estimate < 9999999.999
    ORDER BY Estimate

    -- No candidate left: every reachable node has been settled.
    IF @FromNode IS NULL BREAK

    UPDATE #Nodes SET Done = 1 WHERE Id = @FromNode

    -- Update the estimates of all neighbour nodes of this one (all the nodes
    -- there are edges to from this node). Only update the estimate if the new
    -- proposal (to go via the current node) is better (lower).
    UPDATE #Nodes
    SET Estimate = @CurrentEstimate + e.Weight,
        Predecessor = @FromNode
    FROM #Nodes n INNER JOIN dbo.Edge e ON n.Id = e.ToNode
    WHERE Done = 0
      AND e.FromNode = @FromNode
      AND (@CurrentEstimate + e.Weight) < n.Estimate
END;
The code can be compiled into a dynamic link library (DLL) and
integrated into a program to solve the shortest path for given
datasets. The executed algorithm stores each vertex of the
resulting path in a defined format.
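The same settle-and-relax loop can be sketched as a self-contained program. The version below is not the paper's code; it is a standard heap-based Dijkstra run on the edges of Table 1 (each edge listed once, mirrored in code because the graph is undirected).

```python
# Heap-based Dijkstra on the Table 1 graph: same settle ("Done = 1") and
# relax ("better estimate via the current node") steps as the T-SQL loop.
import heapq

EDGES = [(1, 2, 4), (2, 3, 4), (2, 5, 5), (5, 4, 8), (4, 6, 6), (3, 4, 8)]

def dijkstra(edges, source, target):
    adj = {}
    for u, v, w in edges:                 # undirected: add both directions
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, pred = {source: 0}, {}
    heap, done = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)                       # settle u ("Done = 1")
        for v, w in adj.get(u, []):
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w           # relax: better estimate via u
                pred[v] = u
                heapq.heappush(heap, (d + w, v))
    path, node = [], target
    while node != source:                 # walk predecessors back to source
        path.append(node)
        node = pred[node]
    path.append(source)
    return dist[target], path[::-1]

print(dijkstra(EDGES, 1, 6))
```

On the Table 1 weights this returns a length of 22 via the node sequence 1, 2, 3, 4, 6.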
5. DATA MODEL
Spatial datasets can be stored in either normalized or binary
format. An enterprise geo database such as Microsoft SQL Server
or Oracle can be used to process large amounts of data.
Dijkstra's algorithm cannot be applied directly to GIS
datasets: the spatial data format is not known to Dijkstra,
which understands data only in the form of FROM_NODE, TO_NODE
and WEIGHT (the LENGTH of the segment). Each node and edge
(segment) length must therefore be calculated from the GIS
dataset and stored in a tabular data table.
At the higher hierarchical levels there is a great density of
roads, covering National Highways, State Highways, village
roads, ODRs, MDRs, etc. Dijkstra's algorithm evaluates all
possible permutations and combinations while finding the
shortest path. For each request, the algorithm has to process
many nodes and edges, which can affect the performance of the
system under concurrent access by multiple users.
6. PRE-PROCESSING NETWORK ROUTES
Finding the shortest paths in a large network is a
time-consuming, memory- and processor-intensive process. In a
web-based interface, multiple concurrent users accessing
various combinations of nodes require high-end servers; in an
enterprise environment a single server cannot serve concurrent
requests at the optimal level. Several optimization methods are
available.
Divide and conquer: In this technique, the task is divided into
several fragments. The fragments are processed on different
processing units and, after completion of the task, the results
are integrated. The information can be either in a database or
in file format. A database is fragmented into several parts by
taking the ratio of the number of records to the number of
processors in a machine; a file is divided by an automated tool
(Apache Hadoop) that handles the consistency and integrity of
the information. [6]
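The fragmentation step above can be sketched in a few lines. This is an illustration of the divide-and-conquer idea only, not the paper's implementation: a list of source nodes is split into ratio-sized chunks (records per worker), the chunks are processed concurrently, and the partial results are integrated. The per-fragment work is a named placeholder standing in for, say, a one-to-all Dijkstra run per source.

```python
# Divide and conquer sketch: fragment the workload by the
# records-to-workers ratio, process fragments concurrently, integrate.
from concurrent.futures import ThreadPoolExecutor

def process_fragment(sources):
    # Placeholder for the real per-fragment work (e.g. one-to-all
    # Dijkstra from each source node in the fragment).
    return {s: f"paths-from-{s}" for s in sources}

def divide_and_conquer(sources, workers=4):
    size = max(1, len(sources) // workers)      # records / processors ratio
    fragments = [sources[i:i + size] for i in range(0, len(sources), size)]
    combined = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_fragment, fragments):
            combined.update(partial)            # integrate partial results
    return combined

result = divide_and_conquer(list(range(1, 13)), workers=4)
print(len(result))
```

A thread pool is used here only for brevity; CPU-bound Dijkstra runs would use separate processes (or, as the paper proposes, Hadoop nodes) rather than threads.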
7. VIRTUALIZATION
Virtualization is the creation of a virtual (rather than
actual) version of a hardware platform, operating system (OS),
storage device or network resource. Virtualization reduces
system integration and administration costs by maintaining a
common software baseline across multiple computers in an
organization. A virtual machine is subjectively a complete
machine (or very close), but objectively merely a set of files
and running programs on an actual, physical machine. [10]
Figure 2: A diagram showing the layout for virtualization.
8. TASK DISTRIBUTION USING HADOOP
Apache Hadoop is an open-source software framework that
supports data-intensive distributed applications. [5] It
enables applications to work with thousands of computationally
independent computers and petabytes of data. Hadoop is built on
a file-based architecture: it provides a distributed file
system (HDFS) that stores data on the compute nodes, providing
very high aggregate bandwidth across the cluster. Both
Map/Reduce and the distributed file system are designed so that
node failures are handled automatically by the framework.
Map/Reduce is a programming paradigm made popular by Google,
wherein a task is divided into small portions and distributed
to a large number of nodes for processing (map), and the
results are then summarized into the final answer (reduce). [1]
Google and Yahoo use it in their search engine technology. By
executing the shortest path algorithm through the Hadoop
Map/Reduce processing mechanism, the hardware resources (CPUs)
are fully utilized, and HDFS makes it possible to handle large
files. Redundancy and replication of files are maintained
automatically by Hadoop as soon as the task executes. [6]
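The map-shuffle-reduce pattern described above can be simulated in a single process to make the data flow concrete. This is not Hadoop code: the job below merely counts how many edges touch each node of the Table 1 graph, standing in for a per-node processing step; a real job would ship the map and reduce functions to cluster nodes over HDFS.

```python
# Single-process simulation of Map/Reduce: map emits (key, value) pairs,
# a shuffle groups them by key, and reduce summarizes each group.
from collections import defaultdict

EDGES = [(1, 2), (2, 3), (2, 5), (5, 4), (4, 6), (3, 4)]

def map_phase(edge):
    u, v = edge
    return [(u, 1), (v, 1)]          # emit one count per endpoint

def reduce_phase(key, values):
    return key, sum(values)          # summarize each group

# Shuffle: group the mapper output by key.
groups = defaultdict(list)
for edge in EDGES:
    for k, v in map_phase(edge):
        groups[k].append(v)

degree = dict(reduce_phase(k, vs) for k, vs in groups.items())
print(degree)
```

The same three phases (map, shuffle, reduce) are what Hadoop distributes and makes fault-tolerant; only the scale and the transport change.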
Figure 3: A synoptic view of Hadoop processing mechanism
9. EXPERIMENTATION RESULTS
The edges exceed a count of 100,000 (1 lakh) records for the
complete dataset. The dataset corresponds to the road network
of Gujarat state, India, provided by BISAG. The shortest path
algorithm is executed with HDFS.
Figure 4: Snapshot of the system showing Hadoop Map/Reduce
functionality.
The output is obtained using distributed processing over the
Hadoop framework.
Table 3: Fnode_ and Tnode_ as input; Length and Path as output

Sr  FNODE_  TNODE_  Length (m)  Path
1   21626   1       753286.4    21626,21640,21930,...,18,22,3,2,1
2   21626   135     736610.7    21626,21640,21930,...,281,232,135
3   21626   205     727586.5    21626,21640,21930,...,198,206,205

Table 3 represents the processed nodes and their respective
lengths and shortest paths for the possible permutations and
combinations.
A GIS map is the graphical representation of entities and
objects. The derived output shown in figure 5 is linked with
the GIS dataset; the red-colored segments correspond to the
shortest path between the two node points, as shown in the
figure below.
International Journal of Computer Applications (0975 – 8887)
Volume 52– No.19, August 2012
Figure 5: The segments highlighted in red represent the shortest path between
the given two points. [3]
Figure 6: Nodes vs. Time chart
It is inferred from the results that dividing the task among multiple nodes
decreases the computation time only up to a certain extent, because the
division into tasks and the collection of results impose overhead on the
Hadoop framework. Beyond that point, both types of application ceased to yield
better results as the number of nodes was increased further. [13]
Table 4: Time taken to find the shortest path by single and multiple threads

Number of Nodes  Time (Single Threaded)  Time (Multithreaded)
1                73                      54
2                64                      47
3                53                      43
4                44                      35
5                36                      30
6                24                      25
7                16                      22
8                14                      11
9                12                      10
10               10                      11
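The speedup implied by Table 4 can be computed directly as t(1 node) / t(n nodes). A small sketch, assuming all reported times share one unit (the table does not state it), using the single-threaded column:

```java
// Speedup implied by Table 4, computed as t(1) / t(n) for the single-threaded
// run. The values are the times reported in the table; their unit is assumed
// uniform but is not stated in the paper.
public class SpeedupSketch {
    public static double speedup(double t1, double tn) {
        return t1 / tn; // ratio of one-node time to n-node time
    }

    public static void main(String[] args) {
        double[] singleThreaded = {73, 64, 53, 44, 36, 24, 16, 14, 12, 10};
        for (int n = 1; n <= singleThreaded.length; n++) {
            System.out.printf("%2d nodes: speedup %.2fx%n",
                    n, speedup(singleThreaded[0], singleThreaded[n - 1]));
        }
    }
}
```

The ratios grow sub-linearly at the end of the table (and the multithreaded column even rises from 10 to 11 seconds between 9 and 10 nodes), which is consistent with the overhead argument made above.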
10. APPLICATIONS
Distributed computing using Hadoop can be used in various areas such as
rendering 3D imagery, computing the energy of a system in a molecular model,
data mining and scientific analysis.
11. LIMITATIONS
Distributed computing using Hadoop is suitable for large files, especially for
scientific data analysis; for small files the performance gain is negligible
compared with the time required to set up the distributed environment.
12. CONCLUSIONS
For applications where mission-critical solutions with real-time, immediate
response are required, distributed computing with Apache Hadoop is a suitable
platform. Hadoop can also be used with personal computers and low-end servers.
13. ACKNOWLEDGEMENTS
The authors would like to thank the Director of BISAG, T. P. Singh, for his
inspiration and motivation.
REFERENCES
[1] Lam, Chuck (July 28, 2010). Hadoop in Action (1st Ed.).
Manning Publications. ISBN 1-935182-19-6.
[2] J. Dean and S. Ghemawat. MapReduce: Simplified Data
Processing on Large Clusters. OSDI ’04, pages 137–150.
[3] Courtesy of BISAG.
[4] Z. Mahmood and R. Hill (eds.), Cloud Computing for
Enterprise Architectures, Computer Communications and
Networks, DOI 10.1007/978-1-4471-2236-4_2, © Springer-
Verlag London Limited 2011
[5] VerticaInputFormat. http://www.vertica.com/mapreduce
[6] Tom White. Hadoop:The Definitive Guide[M]. United
States of America: O’Reilly Media, Inc. 2009.
[7] Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified data
processing on large clusters. Proceedings of the 6th
Symposium on Operating System Design and
Implementation. New York: ACM Press, 2004: 137-150.
[8] Raghavendra, C. S., Kumar, V. K. P. and Hariri S.,
“Reliability analysis in Distributed systems”, IEEE
Transactions on Computers, Volume 37, Issue 3, Pg. 352 –
358, March 1988.
[9] Kin-Sun-Wah and McAlister, D.F., Reliability optimization
of computer communication networks, IEEE Trans. on
Reliability, Vol. 37, No. 2, pp. 275-287, December 1998.
[10] GPFS in the Cloud: Storage Virtualization with NPIV on
IBM System p and IBM System Storage DS5300.
[11] Bo Peng, B. C. (2009). Implementation Issues of A Cloud
Computing Platform. IEEE , 8.
[12] R. Agrawal and J.C. Shafer , "Parallel Mining of
Association Rules," Distributed Systems Online March
2004.
[13] D.W. Cheung , et al., "Efficient Mining of Association
Rules in Distributed Databases,"IEEE Trans. Knowledge
and Data Eng., vol. 8, no. 6, 1996,pp.911-922.
International Journal Of Scientific & Engineering Research, Volume 3, Issue 3, March‐2012 1 ISSN 2229‐5518
Geo-processing using Oracle Spatial Geo Database
1Manoj Pandya, 1Pooja Nair, 1Parthi Gandhi, 1Shubhada Pareek, 2Dr. Bijendra Agarwal
Abstract - In the last two decades, Geographic Information Systems (GIS) have been widely developed and applied to various fields, including resource management, environment management, disaster prevention, area planning, education and national defense, across many sectors and subjects. However, traditional GIS application systems could not interact with each other and were considered isolated islands of information facing the problem of interoperability. This is because different GIS applications adopt different data models. The spatial data interoperability concept opens a huge amount of geospatial information to effective management, sharing and value-added usage. OGC standards are developed in a unique consensus process supported by the OGC's industry, government and academic members to enable geo-processing technologies to interoperate, or "plug and play".
Index Terms – Geo Database, Geo processing, Oracle Spatial, OGC, 3D Rendering, Extrude, GIS
1. INTRODUCTION
Within an enterprise, various software packages are used, and in many cases they cannot talk with each other. This situation is ubiquitous and also exists between different departments. The Open GIS Consortium (OGC) is an international industry consortium of more than 400 companies, government agencies and universities participating in a consensus process to develop publicly available interface standards. Within a local government or enterprise, spatial data is centrally stored. Interoperable metadata enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise or department. The application server is a mediation system; this model uses Oracle Application Server as the mediation system. Through the application server, the application client sends a WMS or WFS request and reaches the map server of the background application. The three-tier structure model exposes a GIS portal, an online GIS for users outside the department. Any client can query the server provided the request accords with the WMS or WFS specification.
2 DATABASE PLATFORMS
Several database platforms provide spatial support. Oracle® has matured to support OGC standards. Other database platforms are also available, such as Microsoft® SQL Server 2008 and IBM DB2. Open-source database platforms also exist alongside the proprietary products: MySQL, and PostgreSQL with the PostGIS extension, can serve a large-scale geo-database.
3 ORACLE SPATIAL DATA MODEL
Oracle® offers a spatial data model that provides basic enterprise access to geospatial information. This spatial data model provides a standard structure for point, line, and area features. Oracle Spatial 10g XE (Express Edition) is used in this prototype model. This database platform is free to use up to 4 GB of data; the enterprise version has no such limit. Basic DDL (CREATE, ALTER, DROP) and DML (INSERT, UPDATE, DELETE) statements can be used for database management. With Spatial, the geometric description of a spatial object is stored in a single row, in a single column of object type SDO_GEOMETRY in a user-defined table.
Fig: 1 Schema for feature tables using pre-defined data types
The GEOMETRY_COLUMNS table describes the available feature tables and their Geometry properties.
———————————————— • Manoj Pandya is currently working as a senior project scientist in
Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]
• Pooja Nair, Parthi Gandhi, Shubhada Pareek are pursuing project at BISAG
• Dr. Bijendra Agarwal is Head Of Department (HOD), VJMS College, VADU, Kadi, Mahesana, Gujarat, India
The GEOMETRY_COLUMNS table provides information on the feature table, spatial reference, geometry type, and coordinate dimension for each geometry column in the database.

CREATE TABLE GEOMETRY_COLUMNS (
  F_TABLE_CATALOG    CHARACTER VARYING NOT NULL,
  F_TABLE_SCHEMA     CHARACTER VARYING NOT NULL,
  F_TABLE_NAME       CHARACTER VARYING NOT NULL,
  F_GEOMETRY_COLUMN  CHARACTER VARYING NOT NULL,
  ...
  GEOMETRY_TYPE      INTEGER,
  COORD_DIMENSION    INTEGER,
  MAX_PPR            INTEGER,
  SRID               INTEGER NOT NULL REFERENCES SPATIAL_REF_SYS,
  CONSTRAINT GC_PK PRIMARY KEY
    (F_TABLE_CATALOG, F_TABLE_SCHEMA, F_TABLE_NAME, F_GEOMETRY_COLUMN)
)
Example 1: Creation of Geometry Table

The SPATIAL_REF_SYS table describes the coordinate system and transformations for geometry.

CREATE TABLE SPATIAL_REF_SYS (
  SRID       INTEGER NOT NULL PRIMARY KEY,
  AUTH_NAME  CHARACTER VARYING,
  AUTH_SRID  INTEGER,
  SRTEXT     CHARACTER VARYING(2048)
)
Example 2: Creation of Spatial Reference System

The FEATURE TABLE stores a collection of features.
Metadata is required to store ancillary information about the geo-database. Metadata is data about data; it allows common information to be stored at a common level. Attributes such as owner name, version number, description of the layer, remarks and annotation are stored in metadata.

CREATE TABLE ANNOTATION_TEXT_METADATA AS {
  F_TABLE_CATALOG AS CHARACTER VARYING NOT NULL,
  ...
  A_TEXT_DEFAULT_MAP_BASE_SCALE AS CHARACTER VARYING,
  A_TEXT_DEFAULT_ATTRIBUTES AS CHARACTER VARYING
}
Example 3: Creation of Metadata

The geometry metadata describing the dimensions, lower and upper bounds and tolerance in each dimension is stored in a global table owned by MDSYS (which users should never directly update). Each Spatial user has the following views available in the schema associated with that user:
USER_SDO_GEOM_METADATA contains metadata information for all spatial tables owned by the user (schema). This is the only view that you can update, and it is the one in which Spatial users must insert metadata related to spatial tables.
ALL_SDO_GEOM_METADATA contains metadata information for all spatial tables on which the user has SELECT permission.
Spatial users are responsible for populating these views. For each spatial column, you must insert an appropriate row into the USER_SDO_GEOM_METADATA view. A metadata prototype is given below.

INSERT INTO spatial_ref_sys VALUES (101, 'POSC', 32214,
  'PROJCS["UTM_ZONE_14N",
     GEOGCS["World Geodetic System 72",
       DATUM["WGS_72", ELLIPSOID["NWL_10D", 6378135, 298.26]],
       PRIMEM["Greenwich", 0],
       UNIT["Meter", 1.0]],
     PROJECTION["Transverse_Mercator"],
     PARAMETER["False_Easting", 500000.0],
     PARAMETER["False_Northing", 0.0],
     PARAMETER["Central_Meridian", -99.0],
     PARAMETER["Scale_Factor", 0.9996],
     PARAMETER["Latitude_of_origin", 0.0],
     UNIT["Meter", 1.0]]');
Example 4: Inserting Map Projection parameters
4 RATIONALE OF THE MODEL
This model is scalable to achieve certain objectives. The model implements centralized enterprise spatial data. If the database is centralized, it is more convenient for the data manager to update, maintain and distribute. The model enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise. The model also helps to implement data sharing between enterprises, and especially serves as a data source for the local emergency department, which can access and maintain all the data it needs. Above all, data redundancy can be reduced.
Fig: 2 Data translation component model
The Oracle Spatial data server can be established as a centralized server. Different users can access geospatial information through a web browser. The application server is the main tier in the internet model and includes the GIS application server, the GIS application manager server and the application connector. The application in this scenario is developed in Java using the NetBeans 6.8 Integrated Development Environment (IDE).
5 GEO PROCESSING
Geo-processing is a GIS operation used to manipulate spatial data. A typical geo-processing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset. Common geo-processing operations include geographic feature overlay, feature selection and analysis, topology processing, raster processing, and data conversion. Geo-processing allows for definition, management, and analysis of information used to form decisions. The state of Gujarat, India is used in this example. The proposed system is developed on the Java platform; JavaServer Pages (JSP) serve the solution in a web interface. The code snippet for buffer analysis using the geo-database is given below.

CODE SNIPPET:
// ASSIGN VARIABLES
int sDragx = Integer.parseInt(startDragx);
int sDragy = Integer.parseInt(startDragy);
int eDragx = Integer.parseInt(endDragx);
int eDragy = Integer.parseInt(endDragy);
...
BufferedImage buffimage1 = null;
JGeometry gmtry1, gmtry2;
STRUCT dbobj1, dbobj2;
Graphics2D g11, g22;
Shape sing_sh = null, sh_rect = null;
ResultSet rs1 = null, rsbuff = null;
Statement st1 = null;
Connection con = newconn();

// INITIALIZING MAP
st1 = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
buffimage1 = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
g11 = (Graphics2D) buffimage1.getGraphics();
g22 = (Graphics2D) g11;
g22.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
g22.setPaint(new Color(48, 0, 0));
g22.fillRect(0, 0, 800, 600);

// FIND BOUNDING BOX VALUES OF A LAYER
double boundingbox[] = boundingBox("TABLE_GUJDISTRICT");

// CALCULATE SCALE FACTOR
sx = clientwidth / mapw;
sy = clientheight / maph;

// EXECUTE BUFFER STATEMENT
rsbuff = st1.executeQuery("select sdo_geom.sdo_buffer(mdsys.sdo_geometry(2003,null,null,"
    + "mdsys.sdo_elem_info_array(1,1003,3), mdsys.sdo_ordinate_array("
    + Math.min(sDragx, eDragx) + "," + Math.min(sDragy, eDragy) + ","
    + (Math.min(sDragx, eDragx) + Math.abs(sDragx - eDragx)) + ","
    + (Math.min(sDragy, eDragy) + Math.abs(sDragy - eDragy))
    + "))," + dist + ",0.005) from " + currenttable);

// DRAW GEOMETRY ON MAP
dbobj2 = (STRUCT) rsbuff.getObject(1);
gmtry2 = JGeometry.load(dbobj2);
sing_sh = gmtry2.createShape();
g11.setColor(Color.blue);
g11.setStroke(new BasicStroke(1));
g11.draw(sing_sh);
dbobj1 = (STRUCT) rs1.getObject("GEOM");
gmtry1 = JGeometry.load(dbobj1);
JGeometry geomrect = new JGeometry(2003, gmtry1.getSRID(), arrrect, ordrect);
sh_rect = geomrect.createShape();

Example 5: A prototype code in Java Server Pages to create a buffer using the geo-database
International Journal Of Scientific & Engineering Research, Volume 3, Issue 3, March‐2012 4 ISSN 2229‐5518
Fig 3(a): Buffer creation for a given Area of Interest (AOI). Data source: BISAG
3(b): Data inside the yellow ring (buffered)
3(c): Data inside the red ring (buffered)
3(d): Union of the two rings (buffered)
Performing geo-processing in a GIS system requires very efficient algorithms
and good processing power. Geo-processing must determine the association
between two objects: for two objects A and B, the association may be A touches
B, A inside B, A equals B, and so on. Other GIS operations, such as the
difference between two geometries, union and intersection, can also be
performed.
Table 1: Various operations based on Set theory
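The set-theory operations summarized above can be sketched with the JDK's `java.awt.geom.Area` class standing in for Oracle Spatial's SDO_GEOM operators, purely to illustrate the operations themselves on two simple geometries A and B:

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Sketch of set-theory operations (union, intersection, difference) on two
// overlapping squares, using java.awt.geom.Area instead of Oracle Spatial's
// SDO_GEOM operators. Illustration only.
public class GeometrySetOps {
    public static void main(String[] args) {
        Area a = new Area(new Rectangle2D.Double(0, 0, 4, 4)); // A: square (0,0)-(4,4)
        Area b = new Area(new Rectangle2D.Double(2, 2, 4, 4)); // B: square (2,2)-(6,6)

        Area union = new Area(a); union.add(b);           // A union B
        Area inter = new Area(a); inter.intersect(b);     // A intersect B
        Area diff  = new Area(a); diff.subtract(b);       // A minus B

        System.out.println("A overlaps B: " + !inter.isEmpty());
        System.out.println("Intersection bounds: " + inter.getBounds2D()); // 2x2 square at (2,2)
    }
}
```

Associations such as "A touches B" or "A inside B" reduce to the same primitives: A touches/overlaps B when the intersection is non-empty, and A is inside B when subtracting B from A leaves nothing.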
6 TOPOLOGICAL RELATIONSHIPS Topology is a major area of mathematics concerned with
properties that are preserved under continuous deformations of objects, such as deformations that involve stretching, but no tearing or gluing. It emerged through the development of concepts from geometry and set theory, such as space, dimension, and transformation.
Fig 4 Possible association combinations between objects A and B
7 METHODOLOGY
The Oracle geo-database can be utilized by converging Information and
Communication Technology (ICT). The geo-database communicates with a web-based
GIS map engine. The
GIS map engine talks to the client, with authentication, through a Java Application Programming Interface (API).
Fig: 5 Interface to access Oracle Spatial Geometry
8 3D RENDERING
Oracle Spatial can extrude a two-dimensional geometry to three dimensions. Two-dimensional footprints of buildings can be extruded using the EXTRUDE function in the SDO_UTIL package, erecting a building on the two-dimensional footprint by specifying the ground height and the top height for each vertex of the two-dimensional geometry.
Fig 6(a) Example of a two-dimensional solid with the top heights and
ground heights specified and Fig 6(b) the extruded solid object

SDO_UTIL.EXTRUDE (
  geometry                SDO_GEOMETRY,
  groundheights           SDO_NUMBER_ARRAY,
  topheights              SDO_NUMBER_ARRAY,
  result_to_be_validated  VARCHAR2,
  tolerance               NUMBER
) RETURN SDO_GEOMETRY
Example 6: Extrude schema

The parameters can be interpreted as follows:
geometry: the input two-dimensional SDO_GEOMETRY object to be extruded.
groundheights: an array of numbers, one per vertex, used as the ground height (minimum z value). If only one number is specified, all vertices get that value.
topheights: an array of numbers, one per vertex, used as the top height (maximum z value). If only one number is specified, all vertices get that value.
result_to_be_validated: a character string set to either 'TRUE' or 'FALSE', telling Oracle whether to validate the resulting geometry.
tolerance: the tolerance to use to validate the geometry.

A prototype example extruding a polygon to a three-dimensional solid is given below.

SELECT SDO_UTIL.EXTRUDE(
  SDO_GEOMETRY(                       -- first argument: the geometry to extrude
    2003,                             -- 2-D polygon
    NULL, NULL,
    SDO_ELEM_INFO_ARRAY(1, 1003, 1),  -- a polygon element
    SDO_ORDINATE_ARRAY(0,0, 2,0, 2,2, 0,2, 0,0)  -- vertices of polygon
  ),
  SDO_NUMBER_ARRAY(-1),               -- one ground height value applied to all vertices
  SDO_NUMBER_ARRAY(1),                -- one top height value applied to all vertices
  'FALSE',                            -- no need to validate
  0.5                                 -- tolerance value
) EXTRUDED_GEOM FROM DUAL;
Example 7: Extruding sample data using a SQL statement

EXTRUDED_GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINA
--------------------------------------------------------------------------------
SDO_GEOMETRY(
  3008,                               -- 3-dimensional solid type
  NULL, NULL,
  SDO_ELEM_INFO_ARRAY(
    1, 1007, 1,                       -- solid element
    1, 1006, 6,                       -- 1 outer composite surface made up of 6 polygons
    1, 1003, 1,                       -- first polygon element starts at offset 1 in SDO_ORDINATES array
    16, 1003, 1,                      -- second polygon element starts at offset 16
    31, 1003, 1,                      -- third polygon element starts at offset 31
    46, 1003, 1,                      -- fourth polygon element starts at offset 46
    61, 1003, 1,                      -- fifth polygon element starts at offset 61
    76, 1003, 1),                     -- sixth polygon element starts at offset 76
  SDO_ORDINATE_ARRAY(                 -- ordinates storing the vertices of the polygons
    0,0,-1, 0,2,-1, 2,2,-1, 2,0,-1, 0,0,-1,
    0,0,1, 2,0,1, 2,2,1, 0,2,1, 0,0,1,
    0,0,-1, 2,0,-1, 2,0,1, 0,0,1, 0,0,-1,
    2,0,-1, 2,2,-1, 2,2,1, 2,0,1, 2,0,-1,
    2,2,-1, 0,2,-1, 0,2,1, 2,2,1, 2,2,-1,
    0,2,-1, 0,0,-1, 0,0,1, 0,2,1, 0,2,-1))
Example 8: Output of the extruded sample

The following example shows the satellite image of a building in New York City, USA, taken from Google Maps. The building ground height and top height are measured in the form of latitude, longitude and altitude with the Global Positioning System (GPS). The building corner points are marked and the respective coordinates are extruded using the SDO_UTIL.EXTRUDE method.
Fig 7(a): Building in a satellite image with edges marked with dark spots.
The extruded geometry returns an array of vertices that is plotted as an object, as shown in the figure below.
Fig 7(b): Extruded building
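Conceptually, what EXTRUDE does to the vertices can be sketched in plain Java: each 2-D footprint vertex (x, y) is lifted to two 3-D vertices, (x, y, groundHeight) and (x, y, topHeight). Oracle additionally builds the six bounding polygons of the solid; only the vertex lifting is shown in this hypothetical sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the vertex lifting performed by SDO_UTIL.EXTRUDE:
// each 2-D footprint vertex (x, y) becomes a bottom vertex (x, y, ground)
// and a top vertex (x, y, top). Construction of the solid's six bounding
// polygons, which Oracle also performs, is not shown.
public class ExtrudeSketch {
    // footprint: flat array x1,y1, x2,y2, ...  Returns (x, y, z) triples.
    public static List<double[]> extrude(double[] footprint, double ground, double top) {
        List<double[]> solid = new ArrayList<>();
        for (int i = 0; i < footprint.length; i += 2) {
            solid.add(new double[]{footprint[i], footprint[i + 1], ground}); // bottom ring
        }
        for (int i = 0; i < footprint.length; i += 2) {
            solid.add(new double[]{footprint[i], footprint[i + 1], top});    // top ring
        }
        return solid;
    }

    public static void main(String[] args) {
        // the square footprint of Example 7 (closing vertex omitted), ground = -1, top = 1
        double[] square = {0, 0, 2, 0, 2, 2, 0, 2};
        extrude(square, -1, 1).forEach(v ->
                System.out.printf("(%.0f, %.0f, %.0f)%n", v[0], v[1], v[2]));
    }
}
```

Running this on the Example 7 footprint produces the same bottom-ring and top-ring vertices that appear, grouped into polygons, in the SDO_ORDINATE_ARRAY of Example 8.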
9 APPLICATIONS
The proposed method is used to derive new datasets from the data warehouse. Geo-processing tasks that require very efficient algorithms can be carried out effectively by leveraging the potential of the Oracle Spatial geo-database. A 3D elevation model and a Triangulated Irregular Network (TIN) can also be generated from the geo-database. Moreover, centralized, data-centric and secured applications can be developed and scaled at enterprise level.
10 LIMITATIONS
The geo-processing and extrude tasks are memory intensive. The proposed methodology can be optimized by introducing a cloud-based storage and routing mechanism. Parallel processing can also optimize the performance of rendering the output in terms of geographical maps.
11 CONCLUSIONS
The proposed solution is scalable to enterprise level. A Spatial Data Infrastructure (SDI) can be established that helps derive spatial data and attributes from the datasets of different organizations in multiple combinations, providing near real-time information and resulting in an efficient and effective decision support system (DSS) for areas such as health, defense and disaster management.
ACKNOWLEDGEMENTS
The authors would like to thank Mr. T. P. Singh, Director of BISAG, for his inspiration and motivation.
REFERENCES
[1] Oracle Spatial User's Guide and Reference, Release 9.2, 2002 (a96630.pdf).
[2] OpenGIS Coordinate Transformation Service Implementation Specification, http://www.opengeospatial.org/standards.
[3] http://www.opengeospatial.org/ogc
[4] http://en.wikipedia.org/wiki/Geoprocessing
[5] http://en.wikipedia.org/wiki/Topology
[6] Z.L. Li, R.L. Zhao, and J. Chen, "A generic algebra for spatial relations," Progress in Natural Science, vol. 12, no. 7, pp. 528-536, July 2002.
[7] Ravi Kothuri, Albert Godfrind, and Euro Beinat, Pro Oracle® Spatial for Oracle Database 11g, Apress, 2007.
[8] C. Murray, "Oracle Spatial Topology and Network Data Models," Oracle Corporation, pp. 193-251, 2005.
[9] http://wikihelp.autodesk.com/Infr._Map_Server/enu/2012/Help/0000-Administ0/0104-Data_Mod104/0182-Data_Mod182
International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA Volume – No. , January 2013 – www.ijais.org
Detection of DDoS Attack in semantic web
Prashant Chauhan
BISAG
Gandhinagar
Gujarat, India
Abdul Jhummarwala
BISAG
Gandhinagar
Gujarat, India
Manoj Pandya
BISAG
Gandhinagar
Gujarat, India
ABSTRACT
We are currently living in an era where information is a very crucial and
critical measure of success. Various forms of computing are available, and
today is an era of distributed computing, in which tasks and information are
shared among participating nodes. This type of computing can have a
homogeneous or heterogeneous platform and hardware. The concepts of cloud
computing and virtualization have gained much momentum and have become popular
phrases in information technology. Many organizations have started
implementing these new technologies to further reduce costs through improved
machine utilization, reduced administration time and lower infrastructure
costs.
Cloud computing also faces challenges; one such problem is the DDoS attack.
This paper therefore focuses on the DDoS attack and how to overcome it using a
honeypot, relying on open-source tools and software.
The typical DDoS solution mechanism is single-host oriented; this paper
focuses on a distributed, host-oriented solution that meets scalability
requirements.
General Terms
Cloud Security, Network Monitoring, Distributed Log
Processing.
Keywords
Cloud Security, DDoS, Honeypot, Hadoop.
1. INTRODUCTION
In cloud computing, hardware, software etc. are available to the end user in
the form of services. The user can subscribe to the required service and is
charged as he goes, meaning he need not pay while resources are not in use. In
cloud security we have three basic characteristics: confidentiality, integrity
and availability.
Availability is one of the most important aspects of cloud security: the
service must be available whenever the user wants to consume it, but due to
security threats availability cannot always be achieved. The DDoS attack is
one such threat; it is the distributed form of the Denial of Service attack,
in which the service is consumed by the attacker so that legitimate users
cannot use it.
Solutions against DDoS attacks exist, but they are based on a single host and
lack performance, so here the Hadoop system for distributed processing is
used.
2. Cloud Computing
In cloud computing, every resource is available to the end user in the form of
a service, and the user pays only for what he has used. This kind of freedom
for the user makes cloud services popular. Cloud computing services are
separated into the layers shown in fig 1.
Fig 1 Layers of Cloud Computing
1. Application Layer
This layer provides Software as a Service (SaaS). Applications are provided on
an on-demand basis in this layer. Highly scalable internet-based applications
are hosted on the cloud and offered as a service to the end user. Google Docs,
SalesForce.com and Acrobat.com are some popular SaaS offerings.
2. Platform Layer
This layer provides Platform as a Service (PaaS). In this layer, the platforms
used to design, develop, build and test applications are provided by the cloud
infrastructure. Some popular PaaS providers are Google App Engine and the
Azure Services Platform.
3. Infrastructure Layer
This layer provides Infrastructure as a Service (IaaS). In this layer,
hardware and network equipment are provided as a service. It also provides
storage, database management and computing capabilities on demand. Some
well-known
IaaS providers are Amazon Web Services, GoGrid and 3Tera.
3. Cloud Security
Cloud security has three basic aspects: confidentiality, integrity and
availability. These are also called the basic security goals. Fig 2 shows the
cloud security goals; availability is the focus of this research paper. The
DDoS attack is an issue of availability for any cloud service.
A. Confidentiality
Confidentiality is the prevention of the intentional or unintentional
unauthorized disclosure of contents; it is sometimes referred to as secrecy or
privacy. Only an authorized person can read, write, view, print or know what
content is available.
To ensure confidentiality, network security protocols, data encryption
services and authentication services are used.
Fig 2 Cloud Security Goals
B. Integrity
Integrity is the guarantee that the message received is the message that was
sent, and that the message was not intentionally or unintentionally altered.
To ensure integrity, firewalls, intrusion detection systems etc. are used.
C. Availability
Availability is the element that creates reliability and stability in the
system; it means that the data is available whenever asked for.
To ensure availability, redundant disks, redundant network systems and backup
utilities are used.
4. Distributed Denial of Service Attack
4.1 Definition and Purpose
A denial-of-service attack (DoS attack) is an attempt to make a computer
resource unavailable to its intended users. A distributed denial-of-service
attack (DDoS attack) is a kind of DoS attack in which the attackers are
distributed and target a single victim. It generally consists of the concerted
efforts of one or more people to prevent an internet site or service from
functioning efficiently or at all, temporarily or indefinitely.
The motives of attackers performing a denial of service attack can vary: an
attacker might want to show his power by attacking a large, popular web site
so as to be recognised in the underground community. Another possible motive
is a political one; for example, a political party can ask an attacker to
launch a DoS attack on the web site of a rival party. The most common motive
is a commercial one: a company can ask an attacker to attack a competitor;
during the time the commercial website is unavailable it loses a lot of money
and, worse, the trust of users, who may go shopping on another website.
4.2 How a DDoS Attack works
A DDoS attack is performed by infected machines called bots; a group of bots
is called a botnet. These bots (zombies) are controlled by an attacker who
installs malicious code or software that acts on the commands the attacker
passes. Bots are ready to attack at any time upon receiving a command from the
attacker. Many types of agents have scanning capability that permits them to
identify open ports on a range of machines. When the scanning is finished, the
agent takes the list of machines with open ports and launches
vulnerability-specific scanning to detect machines with un-patched
vulnerabilities. If the agent finds a vulnerable machine, it can launch an
attack to install another agent on that machine.
Fig 3 shows DDoS attack architecture explaining its working
mechanism.
Fig 3 DDoS attack Architecture
To control agents, attackers use a Command and Control (C&C) server;
currently, the majority of botnets use Internet Relay Chat (IRC). Other
possibilities exist, such as Peer-to-Peer (P2P) C&C, Instant Messaging (IM)
C&C or even web-based C&C servers. When an agent is connected to the C&C it
can ask for updates such as IP addresses of other C&Cs, software updates or
exploit software. The agent can also ask for orders; a newly installed agent
may ask for orders to protect itself, for instance by asking the C&C for the
location of the latest anti-virus software and preventing it from detecting
the agent by stopping its service.
The main characteristic of a DoS attack is that the attacker attempts to
prevent one or more legitimate users of a service from using the required
resources. There are two general forms of DoS attack: those that crash
services and those that flood services. The attacker thus attempts (1) to
inhibit legitimate network traffic by flooding the network with useless
traffic,
(2) to deny access to a service by disrupting connections between two parties,
(3) to block the access of a particular individual to a service, or (4) to
disrupt the specific system or service itself. Here, HTTP flooding is
considered as the DDoS attack and a solution to it is proposed.
5. Hadoop MapReduce
5.1 Introduction
MapReduce, developed by Google, is a software paradigm for processing a large
data set in a distributed, parallel way. Since Google's MapReduce and the
Google File System (GFS) are proprietary, an open-source MapReduce software
project, Hadoop, was launched to provide capabilities similar to Google's
MapReduce platform by using thousands of cluster nodes. The Hadoop Distributed
File System (HDFS) is also an important component of Hadoop and corresponds to
GFS.
Hadoop consists of two core components: the job management framework that
handles the map and reduce tasks, and the Hadoop Distributed File System
(HDFS). Hadoop's job management framework is highly reliable and available,
using techniques such as replication and automated restart of failed tasks.
The framework has optimizations for heterogeneous environments and workloads,
e.g., speculative (redundant) execution that reduces delays due to stragglers.
Fig 4 Hadoop Multinode Cluster Architecture
Fig 4 shows the Hadoop multinode cluster architecture, which works in a
distributed manner on MapReduce problems.
5.2 DDoS Attack and Hadoop
A DDoS attack is critical but easy to detect; however, conventional solutions
are single-host oriented. A honeypot system can be used to overcome the
attack. Network monitoring tools like Wireshark and nmap are available, but
they only sniff the network and generate huge log files. To handle such
gigantic files we need a special file system like HDFS, and a MapReduce
framework for processing the logs effectively and efficiently, so Hadoop is a
well-suited solution here.
6. MapReduce for Counter-based DDoS Detection Method
In this method, a simple MapReduce algorithm to detect DDoS by URL counting is
implemented. The algorithm needs three input parameters: time interval,
threshold and unbalanced ratio, which can be loaded through the configuration
property or the distributed cache mechanism of MapReduce. The time interval
limits the monitoring duration of the page request. The threshold indicates
the permitted frequency of page requests to the server relative to the
previous normal status, which determines whether the server should be alarmed
or not. The unbalanced ratio denotes the anomaly ratio of responses per page
request between a specific client and a server; this value is used for picking
out attackers from among the clients.
The map function filters out non-HTTP-GET packets and
generates key values of server IP address, masked timestamp,
and client IP address. The timestamp, masked with the time
interval, is used for counting the number of requests from a
specific client to a specific URL within the same time
window. The reduce function summarizes the number of URL
requests, page requests, and server responses between a client
and a server. Finally, the algorithm aggregates these values per
server. When the total number of requests for a specific server
exceeds the threshold, the MapReduce job emits the records
whose response ratio against requests is greater than the
unbalanced ratio, marking them as attackers. While this
algorithm has low computational complexity and can easily be
converted to a MapReduce implementation, it requires the
threshold value to be known in advance from historical
monitoring data.
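The counter-based method above can be sketched in plain Python, simulating the map and reduce phases outside Hadoop. The packet-record fields, the parameter values, and the exact direction of the ratio test are illustrative assumptions for this sketch, not the paper's implementation:

```python
from collections import defaultdict

# Illustrative parameter values; the paper loads these through the
# MapReduce configuration or distributed cache.
TIME_INTERVAL = 60       # monitoring window (seconds) used to mask timestamps
THRESHOLD = 5            # permitted page requests per server per window
UNBALANCED_RATIO = 3.0   # request/response ratio above which a client is suspect

def map_phase(packets):
    """Map: drop non-HTTP-GET requests and key each packet by
    (server IP, masked timestamp, client IP)."""
    for pkt in packets:
        if pkt["kind"] == "request" and pkt.get("method") != "GET":
            continue
        # Timestamp masking: all packets in the same window share one key.
        masked_ts = pkt["ts"] - (pkt["ts"] % TIME_INTERVAL)
        yield (pkt["server"], masked_ts, pkt["client"]), pkt["kind"]

def reduce_phase(pairs):
    """Reduce: count requests and responses per (server, window, client)."""
    counts = defaultdict(lambda: {"requests": 0, "responses": 0})
    for key, kind in pairs:
        counts[key]["responses" if kind == "response" else "requests"] += 1
    return counts

def detect(counts):
    """Aggregate requests per server; if a server's total exceeds the
    threshold, flag clients whose request/response ratio is unbalanced."""
    per_server = defaultdict(int)
    for (server, _win, _client), c in counts.items():
        per_server[server] += c["requests"]
    attackers = set()
    for (server, _win, client), c in counts.items():
        if per_server[server] > THRESHOLD:
            ratio = c["requests"] / max(c["responses"], 1)
            if ratio > UNBALANCED_RATIO:  # many requests, few responses
                attackers.add(client)
    return attackers
```

A flooding client that issues many GET requests while receiving few responses exceeds the unbalanced ratio and is emitted as an attacker, while a normal client with a balanced request/response count is not.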
Fig 5 MapReduce for counter-based DDoS detection
7. Result
To assess scalability, the performance of the counter-based
DDoS detection method is measured while varying the number
of cluster nodes. Figure 6 shows that the detection job with ten
worker nodes completed within 25 minutes for 500 GB and 47
minutes for 1 TB of traffic, which is over 8 times faster than
one node and 2.9 times faster than 3 nodes, respectively. The
evaluation results show that the performance advantage grows
as the volume of input traffic becomes large.
International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA Volume – No. , January 2013 – www.ijais.org
Fig 6 Completion time of a counter-based DDoS detection
job for various numbers of Hadoop datanodes and traffic
sizes
8. Conclusion
This paper focuses on the scalability of DDoS attack detection
and proposes a Hadoop-based detection model. The Hadoop-based
solution solves the scalability issue by processing a huge
volume of traffic in parallel, whereas other approaches are
single-host based and suggest increasing memory for
efficiency. This kind of MapReduce job is well suited to
Hadoop, as Hadoop is a computation-oriented framework.
9. REFERENCES
[1] Nathalie Weiler, "Honeypots for Distributed Denial of
    Service Attacks", Eleventh IEEE International Workshops
    on Enabling Technologies, 2002.
[2] Yuqing Mai, Radhika Upadrashta and Xiao Su,
    "J-Honeypot: A Java-Based Network Deception Tool
    with Monitoring and Intrusion Detection", ITCC'04.
[3] Yeonhee Lee, Wonchul Kang, and Youngseok Lee,
    "A Hadoop-based Packet Trace Processing Tool",
    TMA'11, pp. 51-63.
[4] Yun Yang, Hongli Yang and Jia Mi, "Design of
    Distributed Honeypot System Based on Intrusion
    Tracking", IEEE, 2011.
[5] Anup Suresh Talwalkar, "HadoopT - Breaking the
    Scalability Limits of Hadoop", 2011.
[6] Flavien Flandrin, "A Methodology To Evaluate
    Rate-Based Intrusion Prevention System Against
    Distributed Denial Of Service", 2010.