Introduction to Artificial Intelligence: Applications in Computational Biology

Post on 07-Jan-2016

Page 1: Introduction to Artificial Intelligence:    Applications in Computational Biology

Intelligent Systems Laboratory

Introduction to Artificial Intelligence: Applications in Computational Biology

Susan M. Bridges [email protected]

Page 2: Introduction to Artificial Intelligence:    Applications in Computational Biology

Outline

• What is AI?

• Search

• Expert systems

• Uncertainty

• Machine learning

• Data mining

Page 3: Introduction to Artificial Intelligence:    Applications in Computational Biology

Intelligent Systems and Computational Biology

• First applications (DNA) in which great progress was made were digital
  • Signal processing algorithms
  • Text processing techniques

• Many of the most interesting and difficult problems to be tackled are analog
  • Protein structure
  • Gene expression
  • Metabolic networks

Page 4: Introduction to Artificial Intelligence:    Applications in Computational Biology

Definitions of AI (What is AI?)

• Rich, E. and K. Knight. 1991. Artificial Intelligence. New York: McGraw-Hill.

“Artificial intelligence (AI) is the study of how to make computers do things which at the moment, people do better.”

Page 5: Introduction to Artificial Intelligence:    Applications in Computational Biology

Another definition of AI

• Winston, Patrick Henry. 1984. Artificial Intelligence. Addison-Wesley, Reading, MA.

“Artificial Intelligence is the study of ideas that enable computers to be intelligent. Intelligence includes: ability to reason, ability to acquire and apply knowledge, ability to perceive and manipulate things in the physical world, and others.”

Page 6: Introduction to Artificial Intelligence:    Applications in Computational Biology

Why Study AI?

• Understand human intelligence

• Develop “intelligent” machines
  • Robotics
  • Programs with intelligent properties

Page 7: Introduction to Artificial Intelligence:    Applications in Computational Biology

Acting Rationally: Turing Test Approach

[Figure: Turing test setup; the only label that survives is "Interrogator"]

Page 8: Introduction to Artificial Intelligence:    Applications in Computational Biology

AI Tasks

• Mundane tasks
  • Perception
    » Vision
    » Speech
  • Natural language
    » Understanding
    » Generation
    » Translation
  • Common sense reasoning
  • Robot control

• Formal tasks
  • Games
  • Mathematics
    » Geometry
    » Logic
    » Integral calculus

• Expert tasks
  • Engineering
  • Scientific analysis
  • Medical diagnosis
  • Financial analysis

Page 9: Introduction to Artificial Intelligence:    Applications in Computational Biology

Intelligent Agents

• Agent
  • Perceives its environment using sensors
  • Acts on environment using effectors

• Rational agent
  • An agent that does the right thing
  • Basis for action
    » A measure of degree of success
    » Knowledge of what has been perceived so far
    » The actions that the agent can perform

• Autonomous agent
  • Learns from experience
  • Makes independent decisions

Page 10: Introduction to Artificial Intelligence:    Applications in Computational Biology


Major Topics

• Search

• Knowledge Representation

• Machine Learning

Page 11: Introduction to Artificial Intelligence:    Applications in Computational Biology

Problem-solving agent

• A type of goal-based agent

• Find sequence of actions that lead to a desirable state

• Intelligent agents should make a set of changes in the state of the environment that maximizes the performance measure

• Life is simpler if we can set a goal and aim to satisfy it.

Page 12: Introduction to Artificial Intelligence:    Applications in Computational Biology

Components of a problem

• Initial state

• Set of possible actions
  • Actions can be described as operators
    » An operator describes an action by specifying the state that can be reached by carrying out the action in a particular state
  • Actions can be described in terms of a successor function S. Given a particular state x, S(x) returns the set of states reachable from x by any single action.

[Figure: operator a takes state x to state y]
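The successor function S(x) can be sketched for the 8-puzzle that the slides use a little later. This is a minimal illustration, not the lecture's own code: the tuple encoding of a board (read row by row, with 0 for the blank) and the names `MOVES` and `successors` are assumptions made for the example.

```python
# A state is a 9-tuple read row by row; 0 marks the blank square (assumed encoding).
MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}

def successors(state):
    """Return S(x): the set of states reachable from `state` by one operator."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    result = set()
    for name, offset in MOVES.items():
        # Skip operators that would slide the blank off the board.
        if (name == "up" and row == 0) or (name == "down" and row == 2):
            continue
        if (name == "left" and col == 0) or (name == "right" and col == 2):
            continue
        target = blank + offset
        board = list(state)
        board[blank], board[target] = board[target], board[blank]
        result.add(tuple(board))
    return result
```

A blank in the centre has four successors and a blank in a corner has two, which is why the branching factor of this problem varies from state to state.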

Page 13: Introduction to Artificial Intelligence:    Applications in Computational Biology

State Space

• The set of all states reachable from the initial state by any sequence of actions

• A path in the state space is a sequence of actions leading from one state to another

• The agent can apply a goal test to any single state to determine if it is a goal state.

• If one path is preferable to another, then we may need to compute path cost (g).

Page 14: Introduction to Artificial Intelligence:    Applications in Computational Biology

[Figure: the 8-puzzle, showing an initial state and a goal state, labeled with the four parts of the problem definition: states, operators, goal test, and path cost. The tile layouts did not survive extraction.]

Page 15: Introduction to Artificial Intelligence:    Applications in Computational Biology

[Figure: road map connecting the towns Mathiston, Pheba, West Point, Maben, Starkville, Columbus, Ackerman, Sturgis, Artesia, Crawford, Brooksville, Louisville, and Mayhew. The roads themselves did not survive extraction.]

Problem: Find a route from Louisville to West Point

Page 16: Introduction to Artificial Intelligence:    Applications in Computational Biology

[Figure: growth of the search tree for the route problem]

A. The initial state: Louisville

B. After expanding Louisville: Louisville has children Ackerman, Starkville, Brooksville

C. After expanding Ackerman: Ackerman has children Maben, Sturgis, Louisville

Page 17: Introduction to Artificial Intelligence:    Applications in Computational Biology

Some terms

• New states are generated from old states by operators.

• This is called expanding the state.

• The choice of which state to expand first is called the search strategy

• Result is called a search tree

• The set of nodes waiting to be expanded is called the fringe or frontier

Page 18: Introduction to Artificial Intelligence:    Applications in Computational Biology

Search Strategies

• Requirements for a good search strategy
  • causes motion
  • is systematic

• State space can usually be represented as a tree or a graph

• Two important parameters of a tree
  • branching factor (b)
  • depth (d)

Page 19: Introduction to Artificial Intelligence:    Applications in Computational Biology

Two Types of Searches

• Uninformed or blind search
  • systematically generate states
  • test states to see if they are goal states

• Informed or heuristic search
  • use knowledge about the problem domain
  • explore search space more efficiently
  • may sacrifice accuracy for speed

Page 20: Introduction to Artificial Intelligence:    Applications in Computational Biology

Breadth-first search

• All nodes at each depth d are expanded before any nodes at depth d+1
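Breadth-first search on the Louisville to West Point problem might look like the sketch below. Only the neighbours of Louisville and Ackerman are taken from the search-tree slide; every other road in `ROADS` is a made-up assumption so that the example runs, and the function name is hypothetical. Unlike the tree on the slide, this version remembers visited towns so that Louisville is not regenerated.

```python
from collections import deque

ROADS = {
    "Louisville": ["Ackerman", "Starkville", "Brooksville"],  # from the slides
    "Ackerman": ["Maben", "Sturgis", "Louisville"],           # from the slides
    "Starkville": ["Louisville", "Mayhew"],                   # assumed for illustration
    "Mayhew": ["Starkville", "West Point"],                   # assumed for illustration
}

def breadth_first_search(start, goal, roads):
    """Expand every node at depth d before any node at depth d + 1."""
    frontier = deque([[start]])   # FIFO queue of paths: the fringe
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:      # goal test
            return path
        for town in roads.get(path[-1], []):
            if town not in visited:
                visited.add(town)
                frontier.append(path + [town])
    return None                   # goal not reachable
```

The FIFO queue is what makes the search breadth-first; swapping it for a stack (popping from the same end that is pushed) would give depth-first behaviour.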

Page 21: Introduction to Artificial Intelligence:    Applications in Computational Biology

Depth-first search

• Always expands one of the nodes at the deepest level of the tree

• Parameter m is the maximum depth

Page 22: Introduction to Artificial Intelligence:    Applications in Computational Biology

What is a heuristic? (rule of thumb)

• A heuristic is a formalized rule for choosing those branches in a state space that are most likely to lead to an acceptable solution (Luger and Stubblefield, 1998).

• Used two ways
  • some problems do not have exact solutions, so we just do the best we can (medical diagnosis)
  • there may be an exact solution, but it may be very expensive to find

Page 23: Introduction to Artificial Intelligence:    Applications in Computational Biology

Hill Climbing

• Use a heuristic function (or objective or evaluation function) to decide which direction to move in the search space.

• Always move toward the state that appears to be best (basing all decisions on local information).

• Assume that we want to maximize the value of the function.

• Can also be used for minimization (called gradient descent)
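The loop described above can be sketched in a few lines. The function name and the toy objective in the usage note are assumptions chosen for illustration; the slides do not prescribe an implementation.

```python
def hill_climb(start, neighbors, value):
    """Steepest-ascent hill climbing: move to the best neighbor until none improves."""
    current = start
    while True:
        # Only local information is used: the values of the immediate neighbors.
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current   # a local maximum (not necessarily the global one)
        current = best
```

For example, maximizing `-(x - 3) ** 2` over the integers with neighbors `x - 1` and `x + 1` climbs from 0 to 3. Because every decision is local, the same loop can stall on a local maximum or a plateau when the objective has several peaks.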

Page 24: Introduction to Artificial Intelligence:    Applications in Computational Biology

Steepest Ascent Hill Climbing Using Manhattan Distance Heuristic

[Figure: sequence of 8-puzzle boards explored by steepest-ascent hill climbing, each labeled with its h value. The intermediate boards and their h values did not survive extraction.]

Goal:

1 2 3
8   4
7 6 5
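The Manhattan distance heuristic named on this slide sums, over all tiles, the number of rows plus columns separating a tile from its goal position. The sketch below assumes boards are 9-tuples read row by row with 0 for the blank; the names `GOAL` and `manhattan` are hypothetical, and the blank's position in the example board is an assumption because the figure is garbled.

```python
GOAL = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)   # goal board from the slide, 0 = blank

def manhattan(state, goal=GOAL):
    """Sum of horizontal plus vertical distances of each tile from its goal square."""
    total = 0
    for tile in range(1, 9):          # the blank is not counted
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

# Board 1 2 3 / 7 8 4 / 6 _ 5 (blank position assumed): tiles 7, 8 and 6 are each one step away.
print(manhattan((1, 2, 3, 7, 8, 4, 6, 0, 5)))
```

Since h is a distance to the goal, "climbing" here means moving to the neighbor with the smallest h, or equivalently maximizing -h.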

Page 25: Introduction to Artificial Intelligence:    Applications in Computational Biology

A* Search

• Minimizing the total path cost

• Combines uniform-cost search and greedy search.

• Evaluation function: f(n) = g(n) + h(n)

g(n): cost of path from start to node n

h(n): estimate of cost of path from n to goal

f(n): estimated cost of the cheapest solution through n
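A compact sketch of A* using f(n) = g(n) + h(n) as the priority is given below. The four-node graph and its h values in the usage note are invented for illustration and are not the tree from the slides.

```python
import heapq

def a_star(start, goal, edges, h):
    """Return (cost, path) of a cheapest path, expanding nodes in order of f = g + h."""
    frontier = [(h(start), 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                              # stale entry: a cheaper path was found later
        for nxt, cost in edges.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None

# Hypothetical graph: edge lists of (neighbor, operator cost) and an admissible h.
EDGES = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
H = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star("A", "D", EDGES, H.get))   # (4, ['A', 'B', 'C', 'D'])
```

The result is guaranteed optimal only when h never overestimates the true remaining cost, that is, when h is admissible; the example's h values satisfy this.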

Page 26: Introduction to Artificial Intelligence:    Applications in Computational Biology

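Goal: Minimum length path.

Is h(n) an admissible heuristic?

f(n) = g(n) + h(n)

[Figure: search tree with levels d = 0 to d = 4. Numbers in parentheses are h(n); numbers on edges are operator costs. The links between nodes did not survive extraction, so labels are listed by level.]

d = 0: A(22)
d = 1: B(18) C(21) D(8)
d = 2: E(12) F(7) G(9) H(6) I(13) J(14)
d = 3: K(18) L(3) M(2) N(9) O(5) P(2) Q(10) R(12)
d = 4: S(18) T(0) U(0)

Operator costs, by level, in extraction order:
d = 0 to 1: 5, 10, 3
d = 1 to 2: 6, 12, 4, 7, 8, 11
d = 2 to 3: 3, 11, 4, 2, 7, 1, 5, 12, 3, 4
d = 3 to 4: 3, 4, 14, 6, 5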

Page 27: Introduction to Artificial Intelligence:    Applications in Computational Biology

Multiple Sequence Alignment

• DNA and protein sequences

• Alignment of multiple sequences created by inserting gaps to shift characters to matching positions

Unaligned   Aligned
ATCG        -ATCG-
TGA         --T-GA
GAT         GAT---

• Optimal alignment maximizes the number of matching positions
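One way to make "number of matching positions" concrete is to count, column by column, the pairs of sequences that show the same non-gap character. That pairwise reading is an assumption (the slide does not define the score), and `matching_pairs` is a hypothetical name.

```python
from itertools import combinations

def matching_pairs(alignment):
    """Count pairs of identical non-gap characters within each alignment column."""
    score = 0
    for column in zip(*alignment):             # one tuple of characters per column
        for a, b in combinations(column, 2):   # every pair of sequences
            if a == b and a != "-":
                score += 1
    return score

print(matching_pairs(["-ATCG-", "--T-GA", "GAT---"]))   # 5
```

The aligned sequences from the slide score 5 (one A/A pair, three T/T pairs, one G/G pair), whereas the same sequences padded on the right without shifting score 0.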

Page 28: Introduction to Artificial Intelligence:    Applications in Computational Biology

Multiple Sequence Alignment As State-Space Search

(Eric Hansen, Rong Zhou)

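[Figure: alignment lattice with marked start and goal nodes for the sequences ATCG, TGA, GAT and their alignment -ATCG-, --T-GA, GAT---]

Space complexity: $O(L^N)$

Time complexity: $O(2^N L^N)$

where L is the average length of the sequences and N is the number of sequences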

Page 29: Introduction to Artificial Intelligence:    Applications in Computational Biology

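An Illustration of Anytime A*

[Figure: search graph in which every node is labeled with g, h, and f, where f = g + 2h. The legend distinguishes expanded nodes, nodes stored but not expanded, and nodes pruned by Anytime A*; one node is marked Goal. Node positions and links did not survive extraction.]

Node labels (g, h, f), in extraction order:

(0, 9, 18) (5, 6, 17) (4, 7, 18) (7, 4, 15) (8, 3, 14) (9, 2, 13) (3, 7, 16)
(2, 8, 18) (11, 0, 11) (1, 9, 19) (6, 5, 16) (4, 7, 18) (3, 8, 19) (2, 9, 20)

Total number of nodes stored = 8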

Page 30: Introduction to Artificial Intelligence:    Applications in Computational Biology

Genetic Algorithms

• Search procedure based on a simple model of evolution

• Uses a “random” process to explore search space

• Has been applied in many domains

Page 31: Introduction to Artificial Intelligence:    Applications in Computational Biology

Terminology

• Begin with a population of individuals. Each individual represents a solution to the problem we are trying to solve.

• A data structure describes the genetic structure of the individual. (Assume for initial discussion that this is a string of 0’s and 1’s).

• In genetics, the strings are called chromosomes and the bits are called genes.

• The string associated with each individual is its genotype

• Selection is based on fitness of individuals

Page 32: Introduction to Artificial Intelligence:    Applications in Computational Biology

The Genetic Algorithm

• Each evolving population of individuals is called a generation.

• Given a population of individuals corresponding to one generation, the algorithm simulates natural selection and reproduction in order to obtain the next generation.

Page 33: Introduction to Artificial Intelligence:    Applications in Computational Biology

Three basic operations

• Reproduction:
  • Individuals from one generation are selected for the next generation

• Crossover:
  • Genetic material from one individual is exchanged with genetic material from another individual

• Mutation:
  • Genetic material is altered
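The three operations can be sketched on bit-string chromosomes as follows. Fitness-proportional selection, single-point crossover, and per-bit flipping are common choices assumed here for illustration; the slides do not say which variants were intended, and all function names are hypothetical.

```python
import random

def reproduce(population, fitness, rng):
    """Select individuals for the next generation with probability proportional to fitness."""
    weights = [fitness(individual) for individual in population]
    return rng.choices(population, weights=weights, k=len(population))

def crossover(a, b, point):
    """Exchange the genetic material after `point` between two chromosomes."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in individual
    )

rng = random.Random(0)
population = ["1011", "1101", "0010", "1001"]        # assumed 4-bit encodings of 11, 13, 2, 9
pool = reproduce(population, lambda s: int(s, 2), rng)
child_a, child_b = crossover("1011", "1101", 2)      # ("1001", "1111")
```

Here fitness is simply the integer value of the bit string, which is an assumption made so the example has something to optimize.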

Page 34: Introduction to Artificial Intelligence:    Applications in Computational Biology

General GA Procedure

Selection, crossover, and mutation operations

[Flowchart with the boxes Initial population, Parent candidate pool, Father and Mother, Offspring, and Next generation population; the steps Evaluate fitness, Select parents, Crossover and mutate, and Evaluate fitness and replace; and a decision Converge? with branches "no" (continue the loop) and "yes" (stop). The arrows did not survive extraction.]

Page 35: Introduction to Artificial Intelligence:    Applications in Computational Biology

Example of General GA Procedure

Selection, crossover, and mutation operations

[Figure: worked example on a population of bit strings, carried from Generation n through Reproduction, Crossover, and Mutation to Generation n + 1. The values 11, 13, 2, 9 are shown for generation n and 15, 8, 1, 13 for generation n + 1. The individual bit strings did not survive extraction.]

Page 36: Introduction to Artificial Intelligence:    Applications in Computational Biology

Two keys to the success of a GA

• Data structures for
  » Genes
  » Chromosomes
  » Population

• Fitness evaluation function

Page 37: Introduction to Artificial Intelligence:    Applications in Computational Biology


Knowledge Representation

• Semantic networks

• Frame based systems

• Rule based expert systems

• Ontologies

• Neural networks

Page 38: Introduction to Artificial Intelligence:    Applications in Computational Biology

[Figure: ontology hierarchy rooted at Anything. Node labels: AbstractObjects, Events, Sets, Numbers, RepresentationalObjects, Intervals, Places, PhysicalObjects, Processes, Categories, Sentences, Measurements, Moments, Times, Weights, Things, Stuff, Animals, Agents, Humans. The links between nodes did not survive extraction.]

Page 39: Introduction to Artificial Intelligence:    Applications in Computational Biology

Expert Systems

• Rule based systems

• Garnered a great deal of attention in the 1980s

• Most famous examples are in medical domains

• Stimulated interest in “logic programming”

• Encode knowledge of people as sets of rules

• Still widely used

• Knowledge acquisition bottleneck

Page 40: Introduction to Artificial Intelligence:    Applications in Computational Biology


Representing Uncertainty

• Fuzzy logic

• Bayesian reasoning

Page 41: Introduction to Artificial Intelligence:    Applications in Computational Biology

Uncertainty versus Vagueness

• Certainty: degree of belief
  • there is a 50% probability of rain today
  • I am 30% sure the patient is suffering from pneumonia

• Vagueness: the degree to which an item belongs to a category
  • the man is tall
  • move the wheel slightly to the left
  • the patient’s lungs are highly congested

Page 42: Introduction to Artificial Intelligence:    Applications in Computational Biology

Fuzzy Sets Represent Vagueness

• Lotfi Zadeh popularized the idea in the 1960s

• Popular concept in Eastern philosophy

• Reasoning with fuzzy sets is called fuzzy logic

• Fuzzy logic is also called
  • approximate reasoning
  • continuous logic

Page 43: Introduction to Artificial Intelligence:    Applications in Computational Biology

Fuzzy Set Definitions

• Set membership can be expressed using a characteristic (or discrimination) function

• Classic (or crisp) sets: if objects x are chosen from some universe X,

$$\mu_A(x) = \begin{cases} 1 & \text{if } x \text{ is an element of set } A \\ 0 & \text{if } x \text{ is not an element of set } A \end{cases}$$

• Fuzzy sets: an element can be a partial member of a set (grade of membership)

$$0 \le \mu_A(x) \le 1$$

Page 44: Introduction to Artificial Intelligence:    Applications in Computational Biology


Examples of Fuzzy Concepts from Natural Language

• John is tall

• The weather is rainy

• Turn the volume up a little

• Dr. Bridges’ tests are long

• Add water until the dough is the right consistency

• There was very little change in the cost

• The water bill was somewhat high

Page 45: Introduction to Artificial Intelligence:    Applications in Computational Biology

Representing Fuzzy Sets

• Enumeration of membership values of all elements with non-zero membership

TALL = {.125/5.5, .5/6, .875/6.5, 1/7, 1/7.5, 1/8}

• Represent membership with a function
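Both representations can be sketched side by side. The dictionary holds the enumerated TALL set from the slide. The function is an S-shaped curve with parameters a = 5 and c = 7 feet; that choice is an assumption, picked because it reproduces the enumerated values exactly, and the slides do not say which function was actually used.

```python
# Enumeration: height in feet -> grade of membership (values from the slide).
TALL = {5.5: 0.125, 6: 0.5, 6.5: 0.875, 7: 1, 7.5: 1, 8: 1}

def s_function(x, a=5.0, c=7.0):
    """S-shaped membership function rising from 0 at `a` to 1 at `c` (assumed form)."""
    b = (a + c) / 2                      # crossover point, membership 0.5
    if x <= a:
        return 0.0
    if x <= b:
        return 2 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1 - 2 * ((x - c) / (c - a)) ** 2
    return 1.0

# The function agrees with every enumerated value.
print(all(s_function(height) == grade for height, grade in TALL.items()))   # True
```

The functional form also answers queries the enumeration cannot, such as the membership of a height of 6.2 feet.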

Page 46: Introduction to Artificial Intelligence:    Applications in Computational Biology

Functional Representation of the Fuzzy Set Tall

[Figure: membership (0 to 1) in the fuzzy set Tall plotted against height in feet (4 to 7)]

Page 47: Introduction to Artificial Intelligence:    Applications in Computational Biology

Linguistic (or Fuzzy) Variable

• Usually corresponds to a noun

• The values of a linguistic variable are fuzzy sets (which correspond to adjectives)

• Examples:

Linguistic variable   Fuzzy sets
Height                short, medium, tall
Weight                light, average, heavy
Temperature           cold, cool, typical, warm, hot
Speed                 slow, medium, fast

Page 48: Introduction to Artificial Intelligence:    Applications in Computational Biology

Linguistic Variable Temperature

[Figure: membership functions (0 to 1) for the fuzzy sets Cold, Normal, and Hot over temperatures from 30 to 100]

Page 49: Introduction to Artificial Intelligence:    Applications in Computational Biology

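Some Fuzzy Set Operations

• Set union $A \cup B$

$$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)) \quad \text{for all } x \in X$$

alternate syntax (join operator)

$$\mu_{A \cup B}(x) = \mu_A(x) \vee \mu_B(x) \quad \text{for all } x \in X$$

• Set intersection $A \cap B$

$$\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)) \quad \text{for all } x \in X$$

alternate syntax (meet operator)

$$\mu_{A \cap B}(x) = \mu_A(x) \wedge \mu_B(x) \quad \text{for all } x \in X$$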
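With fuzzy sets stored as enumerations, the max and min definitions translate directly into code. `TALL` repeats the set from the earlier slide; `MEDIUM_HEIGHT` and the function names are made up for this example.

```python
TALL = {5.5: 0.125, 6: 0.5, 6.5: 0.875, 7: 1, 7.5: 1, 8: 1}
MEDIUM_HEIGHT = {5: 0.5, 5.5: 1, 6: 0.5}          # hypothetical fuzzy set

def fuzzy_union(a, b):
    """Membership in A union B is the max of the two memberships."""
    return {x: max(a.get(x, 0), b.get(x, 0)) for x in a.keys() | b.keys()}

def fuzzy_intersection(a, b):
    """Membership in A intersect B is the min; zero-membership elements are dropped."""
    result = {x: min(a.get(x, 0), b.get(x, 0)) for x in a.keys() | b.keys()}
    return {x: grade for x, grade in result.items() if grade > 0}

print(fuzzy_intersection(TALL, MEDIUM_HEIGHT))    # {5.5: 0.125, 6: 0.5} in some order
```

Elements missing from an enumeration are treated as having membership 0, matching the convention of listing only non-zero members.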

Page 50: Introduction to Artificial Intelligence:    Applications in Computational Biology

Fuzzy Reasoning

• A fuzzy proposition is a statement that asserts a value for a linguistic (or fuzzy) variable
  • Example: Joe’s height is medium
    » Linguistic variable (noun): Joe’s height
    » Fuzzy set (adjective): medium
    » The fuzzy set “medium” is a value of the linguistic variable “Joe’s height”

• A fuzzy rule relates two or more fuzzy propositions

• Fuzzy inference techniques are used to draw conclusions using fuzzy rules

Page 51: Introduction to Artificial Intelligence:    Applications in Computational Biology


Example Fuzzy Rule

If speed is normal

then braking.force is medium

Speed

Normal = (0/0, .1/20, .8/40, 1/60, .1/80, 0/100)

braking.force

Medium = (0/0, .5/1, 1/2, 1/3, .2/4, 0/5)
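One common way to draw a conclusion from such a rule is to clip the consequent set at the degree to which the observed input matches the antecedent. This min-clipping scheme is only one of several fuzzy inference techniques and is an assumption here; the slide does not state which was intended, and `fire_rule` is a hypothetical name.

```python
# Fuzzy sets from the slide, stored as value -> grade of membership.
NORMAL = {0: 0, 20: 0.1, 40: 0.8, 60: 1, 80: 0.1, 100: 0}      # speed
MEDIUM = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 0.2, 5: 0}              # braking.force

def fire_rule(antecedent, consequent, observed):
    """Clip the consequent at the membership of the observed value in the antecedent."""
    strength = antecedent.get(observed, 0)
    return {x: min(strength, grade) for x, grade in consequent.items()}

# A speed of 40 is "normal" to degree 0.8, so "medium" braking is capped at 0.8.
print(fire_rule(NORMAL, MEDIUM, 40))
```

The result is itself a fuzzy set over braking force; turning it into a single number would need a further defuzzification step that the slides do not cover.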

Page 52: Introduction to Artificial Intelligence:    Applications in Computational Biology

J. Dickerson, D. Berleant, Z. Cox, W. Qi, D. Ashlock, and E. Wurtele, Atlantic Symposium on Computational Biology and Genome Information Systems & Technology (CBGIST 2000), 26-30.

Page 53: Introduction to Artificial Intelligence:    Applications in Computational Biology

Bayesian Reasoning

• Bayesian networks: Represent knowledge as a network of random variables

• Many names and many variations
  • Belief networks
  • Probabilistic networks
  • Causal networks
  • Knowledge maps
  • Influence diagram (extension)
  • Decision network (extension)

Page 54: Introduction to Artificial Intelligence:    Applications in Computational Biology

Belief Network

[Figure: network in which Burglary and Earthquake are the parents of Alarm, and Alarm is the parent of JohnCalls and MaryCalls]

P(B) = 0.001

P(E) = 0.002

B   E   P(A|B,E)
T   T   0.95
T   F   0.94
F   T   0.29
F   F   0.01

A   P(J|A)
T   0.90
F   0.05

A   P(M|A)
T   0.70
F   0.01
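The tables determine the full joint distribution: the probability of any assignment is the product of one entry from each table, each conditioned on the node's parents. The sketch below uses the numbers exactly as they appear on the slide; the function and variable names are assumptions.

```python
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.01}   # P(A = T | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J = T | A)
P_M = {True: 0.70, False: 0.01}                     # P(M = T | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of the network's conditional probabilities."""
    def p(prob_true, value):
        return prob_true if value else 1 - prob_true
    return (p(P_B, b) * p(P_E, e) * p(P_A[(b, e)], a)
            * p(P_J[a], j) * p(P_M[a], m))

# Alarm sounds and both neighbors call, with neither a burglary nor an earthquake.
print(joint(False, False, True, True, True))        # about 0.00628
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the tables are being combined correctly.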

Page 55: Introduction to Artificial Intelligence:    Applications in Computational Biology


Page 56: Introduction to Artificial Intelligence:    Applications in Computational Biology

Classification of Learning Systems

• Supervised learning
  • Give the system a set of examples and an “answer” for each example.
  • Train the system until it can give the correct response to these examples (or most of them).

• Unsupervised learning
  • Give the system a set of examples and let it discover interesting patterns in the examples.

• Reinforcement learning
  • Learn from rewards and penalties

Page 57: Introduction to Artificial Intelligence:    Applications in Computational Biology

Feature Vectors

• Simple representation used by most learning systems.

• Represents each example as a vector of numbers
  • Quantities
  • Nominal data
  • Ordinal data

Example: #53 5.3 cold 3.2 blue 1
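To feed an example like the one above to a numeric learner, nominal and ordinal fields have to be turned into numbers. The sketch below is one possible encoding under stated assumptions: the field meanings, the category lists, and the choice of one-hot coding for the nominal field are all invented for illustration, since the slide gives only the raw values.

```python
COLORS = ["red", "green", "blue"]                  # nominal: assumed category list
TEMPERATURES = ["cold", "cool", "warm", "hot"]     # ordinal: assumed ordering

def encode(record):
    """Turn (quantity, ordinal, quantity, nominal, quantity) into a list of floats."""
    size, temperature, weight, color, count = record
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]   # no order implied
    rank = float(TEMPERATURES.index(temperature))            # order preserved
    return [float(size), rank, float(weight)] + one_hot + [float(count)]

print(encode((5.3, "cold", 3.2, "blue", 1)))
# [5.3, 0.0, 3.2, 0.0, 0.0, 1.0, 1.0]
```

Ordinal values keep their order as a single rank, while nominal values get one position each so that the learner does not read an ordering into arbitrary labels.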

Page 58: Introduction to Artificial Intelligence:    Applications in Computational Biology

Neural Networks

• Computational models “loosely” based on the structure of the brain

• Characteristics of the brain
  • Large number of simple processing units (neurons)
  • Highly connected
  • No central control
  • Neurons are slow devices compared to digital computers
  • Can perform complex tasks in a short period of time
  • Neurons are failure-prone devices
  • Handles fuzzy situations very well
  • Information accessed on the basis of content
  • Learns from experience

Page 59: Introduction to Artificial Intelligence:    Applications in Computational Biology

Neural Networks

• Based on model of nervous system

• Large numbers of simple processing units

• Units are highly connected and connections are weighted.

• Highly parallel distributed control

• Emphasis on learning internal representations automatically

Page 60: Introduction to Artificial Intelligence:    Applications in Computational Biology

Neural Network Concepts

• Cell or unit or neuron or node
  • Autonomous processing unit that models a neuron
  • Purpose
    » Receives information from other cells
    » Performs simple processing
    » Sends results on to one or more cells

• Layers
  • A collection of cells that perform a common function
  • Types:
    » Input layer
    » Hidden layer
    » Output layer

Page 61: Introduction to Artificial Intelligence:    Applications in Computational Biology

Layers of Neurons

[Figure: layered network with an input layer (I1, I2, I3), a hidden layer (H1, H2), and an output layer (O1)]

Page 62: Introduction to Artificial Intelligence:    Applications in Computational Biology

Properties

• In general, there is no interconnection between cells in the same layer

• Connections are one-way or two-way communication links between two cells

• Weights are the strength of the connections. A weight wij is a real number that indicates the influence that cell ui has on cell uj

Page 63: Introduction to Artificial Intelligence:    Applications in Computational Biology

More about weights

• Positive weights indicate reinforcement

• Negative weights indicate inhibition

• Weight of 0 indicates no influence or connection

• Weights may be initialized to one of these:
  • 0
  • predefined values
  • random values

• Weights are altered by experience

Page 64: Introduction to Artificial Intelligence:    Applications in Computational Biology

Multilayer Feed-Forward Networks

• Networks that are connected acyclic graphs

• Backpropagation
  • Most popular training method for feed-forward layered networks
  • Invented in 1969 by Bryson and Ho
  • Ignored until the 1980s
  • Supervised learning technique

Page 65: Introduction to Artificial Intelligence:    Applications in Computational Biology

Back Propagation

• Initialize the network with random weights

• Show it an input instance

• Compute the output

• Determine how much the output differs from the goal.

• Feed small adjustments to the weights back through the network based on the error.

Page 66: Introduction to Artificial Intelligence:    Applications in Computational Biology

General Algorithm for Training Network

Initialize NN
for epoch = 1 to MAXEPOCHS
    for each input-output pair in the training set
        present an example and compute error
        adjust weights to reduce the error
    compute mean-square-error for training set
    for each input-output pair in the test set
        compute error
    compute mean-square-error for test set
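The training procedure can be sketched in plain Python for a network with one hidden layer. Sigmoid units, a squared-error measure, the learning rate, and the tiny logical-OR data set are all assumptions made so the example is self-contained; none of them come from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Compute hidden activations and the output; each weight row ends with a bias weight."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output

def mean_square_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)

def train(train_set, test_set, n_hidden=2, max_epochs=1000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(train_set[0][0])
    # Initialize the network with small random weights.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(max_epochs):
        for x, target in train_set:
            hidden, out = forward(x, w_hidden, w_out)          # present an example
            delta_out = (target - out) * out * (1 - out)        # error at the output
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]           # error fed back
            for j, h in enumerate(hidden + [1.0]):              # adjust output weights
                w_out[j] += rate * delta_out * h
            for j in range(n_hidden):                           # adjust hidden weights
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += rate * delta_hidden[j] * v
        history.append((mean_square_error(train_set, w_hidden, w_out),
                        mean_square_error(test_set, w_hidden, w_out)))
    return w_hidden, w_out, history

# Hypothetical data: logical OR, with one input pattern held out as the test set.
TRAIN = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0)]
TEST = [([1.0, 1.0], 1.0)]
weights_hidden, weights_out, history = train(TRAIN, TEST)
```

`history` holds one (training MSE, test MSE) pair per epoch, which is the raw material for a generalization plot of the kind shown on the next slide.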

Page 67: Introduction to Artificial Intelligence:    Applications in Computational Biology

Generalization

[Figure: % correct plotted against training time (# of epochs), with one curve for the training set and one for the test set]

Page 68: Introduction to Artificial Intelligence:    Applications in Computational Biology

Computational Biology Applications

• Protein classification

• Sequence analysis

• DNA fragment assembly

• Prediction of transmembrane regions

• Phylogenetic classification of ribosome sequences

• And many more

Page 69: Introduction to Artificial Intelligence:    Applications in Computational Biology

Self-Organizing Maps

• Also called Kohonen maps

• Used for unsupervised learning

• Widely applied for comparison of gene expression data
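A minimal one-dimensional self-organizing map can be sketched as follows. The map size, learning-rate schedule, fixed neighborhood radius, and function names are all assumptions chosen for brevity; practical SOMs for gene expression data usually use a two-dimensional grid and a shrinking neighborhood.

```python
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_units=4, epochs=50, rate=0.5, radius=1, seed=0):
    """Unsupervised training: pull the winning unit and its map neighbors toward each input."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs               # learning rate shrinks over time
        for x in data:
            winner = best_matching_unit(weights, x)
            for i in range(n_units):
                if abs(i - winner) <= radius:      # neighbors along the 1-D map
                    for d in range(dim):
                        weights[i][d] += rate * decay * (x[d] - weights[i][d])
    return weights

som = train_som([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]])
```

No target outputs are used anywhere, which is what makes the method unsupervised: similar inputs simply end up mapped to the same or neighboring units.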

Page 70: Introduction to Artificial Intelligence:    Applications in Computational Biology


Principle of Self-Organizing Maps

Page 71: Introduction to Artificial Intelligence:    Applications in Computational Biology


Self-Organizing Map from Yeast Gene Expression Data

(German Florez)

Page 72: Introduction to Artificial Intelligence:    Applications in Computational Biology

Knowledge Discovery

• Definition: Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.

• Applications in biology
  • Text mining
  • Association rule mining

Page 73: Introduction to Artificial Intelligence:    Applications in Computational Biology

KDD Process

[Figure: pipeline in which each step produces the input of the next]

Data → (Selection) → Target Data → (Preprocessing) → Preprocessed Data → (Transformation) → Transformed Data → (Data Mining) → Patterns → (Interpretation/Evaluation) → Knowledge

Page 74: Introduction to Artificial Intelligence:    Applications in Computational Biology

Iterative Clustering Procedure (Wan, Bridges, J. Boyle, A. Boyle)

[Flowchart: Download data from GenBank → Represent data using an appropriate method → Construct feature vector for each gene from positional weight matrices → Cluster with K-clustering → Visualize results → Conclusions. A feedback step, "Select clustering without clear patterns for further clustering", returns selected clusters for another round.]
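The clustering step can be illustrated with k-means, under the assumption that "K-clustering" refers to a k-means style method; the slides do not say so explicitly. The function name and the toy points are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # New center = mean of its cluster; an empty cluster keeps its old center.
        centers = [[sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster
                   else list(centers[i])
                   for i, cluster in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(centers)   # [[0.0, 0.5], [10.0, 10.5]]
```

In the iterative procedure above, a cluster whose members show no clear pattern would be passed back through the same function as a new, smaller data set.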

Page 75: Introduction to Artificial Intelligence:    Applications in Computational Biology

Positional Weight Matrix Representation
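A positional weight matrix summarizes a set of equal-length sequences by recording, for each position, how often each base occurs. The sketch below builds the plain frequency form and flattens it into a feature vector. The example sequences and function names are invented, and real positional weight matrices are often stored as log-odds scores rather than raw frequencies.

```python
from collections import Counter

def position_frequency_matrix(sequences, alphabet="ACGT"):
    """One dict per position giving the fraction of sequences with each base there."""
    matrix = []
    for position in range(len(sequences[0])):
        counts = Counter(seq[position] for seq in sequences)
        matrix.append({base: counts[base] / len(sequences) for base in alphabet})
    return matrix

def as_feature_vector(matrix, alphabet="ACGT"):
    """Flatten the matrix row by row into a single list of numbers."""
    return [row[base] for row in matrix for base in alphabet]

pfm = position_frequency_matrix(["ACGT", "ACGA", "TCGT", "ACTT"])   # hypothetical sequences
print(pfm[0])   # {'A': 0.75, 'C': 0.0, 'G': 0.0, 'T': 0.25}
```

A sequence window of length n therefore yields a vector of 4n numbers, which is the kind of fixed-length input a clustering algorithm needs.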

Page 76: Introduction to Artificial Intelligence:    Applications in Computational Biology

Clustering Results

[Figure: tree of clustering results. Node labels, with counts in parentheses, in extraction order; the links between nodes did not survive extraction.]

S. solfataricus with A, G box (2976), clustering window (-48 to -1)
Nearby with A, G box (286), cluster window (-24 to -1)
With A box (1656)
With A box (146)
With G box (1320)
With A and G box (142)
Distant with A box (1370)
Nearby with G box (571)
Distant with G box (749)