From "Data Analysis" to "Topology" (De la "Data Analysis" la "Topologie")
Dan Burghelea
Department of Mathematics, Ohio State University, Columbus, OH
Pitesti 2019
Data
⇓
Topology
New Invariants
Mutual benefits
Data Analysis
1 What is Data?
2 How is the Data obtained?
3 What do we want from Data?
4 How does Topology enter the picture?
What is "Data"
1. Mathematically, Data (≡ Point Cloud Data) is
a finite metric space (X, d)
possibly equipped with
a map f : X → R or f : X → S^1
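The definition above can be made concrete with a minimal sketch, assuming a point cloud in the plane with the Euclidean metric. The four sample points and the names `f_real`, `f_angle` and `is_metric` are illustrative choices, not part of the talk.

```python
import math

# Hypothetical sample: four points in the plane (not taken from the talk).
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

# Distance matrix of the finite metric space (X, d), with d the Euclidean distance.
dist = [[math.dist(p, q) for q in points] for p in points]

def f_real(p):
    """A real-valued map f : X -> R (here: the first coordinate)."""
    return p[0]

def f_angle(p):
    """An angle-valued map f : X -> S^1, stored as an angle in [0, 2*pi)."""
    return math.atan2(p[1], p[0]) % (2 * math.pi)

def is_metric(dist, tol=1e-12):
    """Check zero diagonal, symmetry, positivity and the triangle inequality."""
    n = len(dist)
    for i in range(n):
        if abs(dist[i][i]) > tol:
            return False
        for j in range(n):
            if abs(dist[i][j] - dist[j][i]) > tol:
                return False
            if i != j and dist[i][j] <= 0:
                return False
            for k in range(n):
                if dist[i][k] > dist[i][j] + dist[j][k] + tol:
                    return False
    return True
```

Only the distance matrix and the values of f are needed downstream; the coordinates themselves play no further role.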
How is "Data" obtained
a. sampling (of a shape in three or higher dimensional Euclidean space, or of a probability distribution) ⇒ a collection of points (three coordinates)
b. scanning of a 2-dimensional picture
c. a collection of 2D (black-and-white) pictures of a 3D environment taken by a camera from different angles; each 2D picture is regarded as a vector in the pixel space, with a gray-scale coordinate for each pixel. If the camera has 100 × 100 pixels, this is a collection of vectors in R^10000
d. a list of measurements for a collection of objects/individuals; for example, observations on the patients in a hospital (EXAMPLE 2) suspected of diabetes, obtained by measuring: insulin response, glucose tolerance, relative weight, blood pressure, pulse, hemoglobin level (A1C)
What one wants to extract from "Data"
In case of sampled geometrical objects:
a) derive topological features (number of components, number of holes, number of tubes traversing the object) without reconstructing the object entirely
b) reconstruct a continuous shape from a sampling.
In case of experimental observations:
a) discover patterns and unexpected features,
b) detect missing blocks of data and notice clusterings,
c) determine the relevant number of (local) parameters the data depends on
Example 1. Lung cancer imaging.
- 3D radiological images of cancerous lungs show both tumors and blood vessels as areas of increased density.
- Blood vessels show up as long tunnels in the image.
- Tumors show up as balls.
Question: How can one distinguish automatically between tumors and blood vessels?
Example 2. Diabetes patients (after the Miller-Reaven study mentioned in [?])
- Study carried out in 1976 on 145 patients at Stanford Hospital; most of the patients had symptoms of diabetes, although some were normal.
- For each patient, 6 metabolic variables (involving insulin response, glucose tolerance, relative weight, A1C level, blood pressure, ...) were measured and recorded in a 6-dimensional space. Hence a point cloud of 145 points in R^6.
Questions: Find the relevant number of metabolic variables needed to detect diabetes. Find qualitative features (type of diabetes, etc.).
What Topology does for "Data"
GEOMETRIZE DATA
i.e. convert data into nice topological spaces, possibly equipped with real- or angle-valued maps, with computable qualitative features:
1. Simplicial complexes (components, tubes, holes are counted by Betti numbers),
2. Smooth manifolds equipped with a Riemannian metric (curvature)
TOPOLOGY
1. converts a finite metric space into a "nice topological space": a simplicial complex, or a simplicial complex + real- / angle-valued maps, or a simplicial complex + filtration.
2. uses Homology (Betti numbers, Euler-Poincaré (EP) characteristic) to make the "qualitative features" of the associated shape mathematically precise and algorithmically computable.
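As a small illustration of how such a qualitative feature becomes computable, the sketch below evaluates the Euler-Poincaré characteristic of a finite simplicial complex as the alternating sum of its simplex counts (which equals the alternating sum of the Betti numbers). The function names and the two sample complexes are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def euler_characteristic(simplices):
    """Alternating sum: (#vertices) - (#edges) + (#triangles) - ..."""
    return sum((-1) ** (len(s) - 1) for s in simplices)

def boundary_of_simplex(vertices):
    """All proper nonempty faces of the simplex on the given vertices."""
    return [
        face
        for size in range(1, len(vertices))
        for face in combinations(vertices, size)
    ]

# Hypothetical examples: simplices are tuples of vertex labels.
circle = boundary_of_simplex((0, 1, 2))     # hollow triangle, a model of S^1
sphere = boundary_of_simplex((0, 1, 2, 3))  # hollow tetrahedron, a model of S^2
```

Here `euler_characteristic(circle)` is 3 - 3 = 0 and `euler_characteristic(sphere)` is 4 - 6 + 4 = 2.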
Why simplicial complex?
1. Any reasonable shape is homeomorphic to a geometric simplicial complex.
2. A finite simplicial complex can be input in a computer as a large matrix.
SIMPLICIAL COMPLEXES
A solid k-simplex is the convex hull of (k + 1) linearly (more generally, affinely) independent points.
A geometric simplicial complex K is a "nice subspace of a Euclidean space" = a union of solid simplices which intersect each other in faces.
An abstract simplicial complex is a pair (X, Σ) with: X a finite set, Σ a family of nonempty subsets of X, covering X, s.t.
∅ ≠ σ ⊆ τ ∈ Σ ⇒ σ ∈ Σ.
The subsets in Σ of cardinality (k + 1) form the set denoted by X_k and are referred to as the k-simplexes.
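The definition of an abstract simplicial complex can be checked mechanically. The following is a minimal sketch, assuming simplices are stored as Python `frozenset`s; the helper names `closure`, `is_simplicial_complex` and `k_simplices` and the sample complex are hypothetical.

```python
from itertools import combinations

def closure(generating_simplices):
    """Return Sigma: all nonempty subsets of the given simplices."""
    sigma = set()
    for simplex in generating_simplices:
        vertices = sorted(simplex)
        for size in range(1, len(vertices) + 1):
            sigma.update(frozenset(c) for c in combinations(vertices, size))
    return sigma

def is_simplicial_complex(X, sigma):
    """Check: nonempty subsets of X, covering X, closed under nonempty subsets."""
    X = frozenset(X)
    if any(len(s) == 0 or not s <= X for s in sigma):
        return False
    if frozenset().union(*sigma) != X:
        return False
    return all(
        frozenset(face) in sigma
        for s in sigma
        for size in range(1, len(s))
        for face in combinations(sorted(s), size)
    )

def k_simplices(sigma, k):
    """X_k: the members of Sigma with k + 1 elements."""
    return {s for s in sigma if len(s) == k + 1}

# Hypothetical example: a filled triangle {0, 1, 2} with an extra edge {2, 3}.
X = {0, 1, 2, 3}
sigma = closure([{0, 1, 2}, {2, 3}])
```

In this example Σ has 9 members (4 vertices, 4 edges, 1 triangle); deleting the edge {0, 1} while keeping the triangle violates the closure condition.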
An abstract simplicial complex determines a geometric simplicial complex and vice versa.
A simplicial complex is determined by its incidence matrix, which can be fed in as the input of an algorithm.
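To indicate how an incidence matrix serves as algorithmic input, here is a minimal sketch that builds the incidence (boundary) matrices and reads off Betti numbers from their ranks. It assumes coefficients in Z/2 (a simplification chosen here so that signs can be ignored) and simplices given as sorted tuples of vertex labels; all function names are hypothetical.

```python
from itertools import combinations

def boundary_matrix(k_simplices, faces):
    """Incidence matrix mod 2: rows are (k-1)-simplices, columns are k-simplices."""
    row = {s: i for i, s in enumerate(faces)}
    matrix = [[0] * len(k_simplices) for _ in faces]
    for j, s in enumerate(k_simplices):
        for face in combinations(s, len(s) - 1):
            matrix[row[face]][j] = 1
    return matrix

def rank_mod2(matrix):
    """Rank over Z/2 by Gaussian elimination."""
    rows = [r[:] for r in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def betti_numbers(simplices_by_dim):
    """Betti numbers over Z/2; simplices_by_dim[k] lists the k-simplices."""
    top = len(simplices_by_dim) - 1
    ranks = [0] * (top + 2)  # ranks[k] = rank of the boundary map from dim k to k-1
    for k in range(1, top + 1):
        if simplices_by_dim[k]:
            ranks[k] = rank_mod2(
                boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
            )
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]

# Hypothetical examples: a hollow triangle and a filled triangle.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = [vertices, edges]
filled = [vertices, edges, [(0, 1, 2)]]
```

The hollow triangle has one component and one hole, so `betti_numbers(hollow)` is `[1, 1]`; filling in the triangle kills the hole and gives `[1, 0, 0]`.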
VIETORIS RIPS complex
To a finite metric space (X, d) and ε > 0 one associates the abstract complex R_ε(X, d) := (X, Σ_ε = ∪_k X_k), where
X_0 = X,
X_k := {(x_1, x_2, ..., x_{k+1}) | x_i ∈ X distinct, d(x_i, x_j) < ε for all i, j}.
One has: if ε < ε′ then R_ε(X, d) ⊆ R_ε′(X, d).
A map f : X → R provides the simplicial maps f : R_ε(X, d) → R.
If ε < π, a map f : X → S^1 provides the simplicial maps f : R_ε(X, d) → S^1.
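The construction of R_ε(X, d) can be sketched directly from the definition. The code below is a brute-force illustration, assuming points in Euclidean space and a cap `max_dim` on the simplex dimension (both assumptions made here for brevity); the function name `rips_complex` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Simplices of R_eps(X, d) up to dimension max_dim, as sorted index tuples."""
    n = len(points)
    close = [
        [math.dist(points[i], points[j]) < eps for j in range(n)]
        for i in range(n)
    ]
    simplices = set()
    for size in range(1, max_dim + 2):
        for combo in combinations(range(n), size):
            # A simplex is included when all pairwise distances are below eps.
            if all(close[i][j] for i, j in combinations(combo, 2)):
                simplices.add(combo)
    return simplices

# Hypothetical example: the four corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = rips_complex(square, 1.1)  # sides only: 4 vertices + 4 edges
large = rips_complex(square, 1.5)  # diagonals too: 4 vertices + 6 edges + 4 triangles
```

The inclusion `small <= large` illustrates R_ε ⊆ R_ε′ for ε < ε′. This enumeration is exponential in the number of points and is meant only to mirror the definition.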
If the data is a sample of a compact manifold embedded in Euclidean space, then:
Theorem. There exists α > 0 so that for any ε-dense sample (X, d), ε < α, the Vietoris-Rips complex R_ε(X, d) is homotopy equivalent to the manifold.
Illustration
A fixed set of points can be completed to a Čech complex C_ε or to a Rips complex R_ε based on a proximity parameter ε. This Čech complex has the homotopy type of the ε/2 cover (S^1 ∨ S^1 ∨ S^1), while the Rips complex has a different homotopy type (S^1 ∨ S^2).
The topology of the ε-complexes differs for different ε's.
It is therefore desirable to consider all these complexes.
A sequence of Rips complexes for a point cloud data set representing an annulus. Upon increasing ε, holes appear and disappear. Which holes are real and which are noise?
New algebraic topological invariants
For:
1 A simplicial complex X and a simplicial map f : X → R whose sub levels f^{-1}(−∞, t] change the homology for finitely many real values t_0 < t_1 < t_2 < ... < t_N,
2 A simplicial complex X and a simplicial map f : X → R or f : X → S^1 whose levels f^{-1}(t) change the homology for finitely many (real or angle) values t_0 < t_1 < ... < t_N,
3 A simplicial complex X with a filtration X_0 ⊂ X_1 ⊂ ... ⊂ X_{N−1} ⊂ X_N = X;
Topological persistence: alternative computable invariants for the classical topological invariants
Inspired by Morse theory / Morse-Novikov theory, one associates, for any r = 0, 1, ...,
1 to f : X → R, based on changes in the homology of the sub levels f^{-1}(−∞, a], a collection of sub level bar codes = intervals [a, b], [a, ∞)
2 to f : X → R or f : X → S^1, based on changes in the homology of the levels f^{-1}(t), a collection of four types of level bar codes = intervals [a, b], (a, b), [a, b), (a, b], and Jordan cells {(λ, k) | λ ∈ C \ 0, k ∈ Z≥1}
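For item 1 with r = 0, the sub level bar codes can be computed with a union-find structure. The sketch below is a standard technique swapped in for illustration, not necessarily the algorithm of the talk; it assumes X is a path (vertex i joined to vertex i + 1) and that an edge enters the sub level at the larger of its two end values. It returns only the end points of the bars, without encoding whether an end is open or closed. The function name is hypothetical.

```python
def sublevel_barcodes_h0(values):
    """0-dimensional sub level bar codes of f on a path graph.

    values[i] = f(vertex i). Returns (finite, infinite): finite is a sorted
    list of (a, b) end points, infinite a sorted list of the values a of the
    bars that never die.
    """
    n = len(values)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # birth[root] = minimum of f on the component represented by root.
    birth = list(values)
    finite = []
    # Process edges in the order in which they appear in the sub levels.
    edges = sorted(range(n - 1), key=lambda i: max(values[i], values[i + 1]))
    for i in edges:
        t = max(values[i], values[i + 1])
        a, b = find(i), find(i + 1)
        if a == b:
            continue
        # Elder rule: the component born later dies at t.
        young, old = (a, b) if birth[a] > birth[b] else (b, a)
        if birth[young] < t:  # skip bars of zero length
            finite.append((birth[young], t))
        parent[young] = old
    infinite = sorted(birth[r] for r in {find(i) for i in range(n)})
    return sorted(finite), infinite
```

For the hypothetical values `[0, 3, 1, 4, 2]` the local minima 1 and 2 give the bars with ends (1, 3) and (2, 4), and the global minimum 0 gives the single infinite bar.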
In (1) the ends of an interval are critical values a at which the topology of the sub level f^{-1}(−∞, a] changes.
In (2) the ends of an interval are values a at which the topology of the level f^{-1}(a) changes.
The Jordan cells (for an angle-valued map) appear from the Jordan decomposition of the matrices which describe the way the topology of a level θ is related to itself when θ moves along the circle.
One denotes by B^c_r(f), B^o_r(f), B^{c,o}_r(f) and B^{o,c}_r(f) the sets of closed, open, closed-open, and open-closed r-bar codes of f.
One collects the sets B^c_r(f) and B^o_{r−1}(f) as the configuration C_r(f) of points in C for f real-valued and in C \ 0 for f angle-valued.
One collects the sets B^{c,o}_r(f) and B^{o,c}_r(f) as the configuration c_r(f) of points in C \ ∆, ∆ := {z ∈ C | Re z = Im z}, for f real-valued and in C \ (S^1 ⊔ 0) for f angle-valued.
[Figure: bar codes represented as points in the plane.]
A bar code with ends a, b, a ≤ b, closed at a, is represented as the point a + ib, while a bar code with ends a, b, a < b, open at a, is represented as the point b + ia.
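The rule above translates into a few lines of code. This is a minimal sketch for the real-valued case using Python's built-in complex numbers; the function name `barcode_to_point` is hypothetical.

```python
def barcode_to_point(a, b, closed_at_a):
    """Represent a bar code with ends a, b as a point of the complex plane."""
    if closed_at_a:
        if a > b:
            raise ValueError("a bar code closed at a needs a <= b")
        return complex(a, b)  # the point a + ib, on or above the diagonal
    if a >= b:
        raise ValueError("a bar code open at a needs a < b")
    return complex(b, a)      # the point b + ia, strictly below the diagonal
```

For example, a bar with ends 1 and 3 closed at 1 goes to 1 + 3i, while a bar with the same ends open at 1 goes to 3 + i, so the two kinds of bars land on opposite sides of the diagonal Re z = Im z.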
For f angle-valued, replace [a, b] and [a, b) by z = e^{ia + (b−a)}, and (a, b) and (a, b] by z = e^{ib + (a−b)}, to obtain the corresponding points of C \ 0.
To the configuration C_r(f) resp. c_r(f) one associates the polynomial P^f_r(z) resp. p^f_r(z) given by the product
∏ (z − z_i)
over the points z_i of the configuration.
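Both steps, the exponential encoding for angle-valued maps and the passage from a configuration to a polynomial, can be sketched as follows. The function names are hypothetical, and the polynomial is returned as a list of coefficients, highest degree first (a representation chosen here for illustration).

```python
import cmath

def angle_barcode_to_point(a, b, closed_at_a):
    """z = e^{ia + (b - a)} if the bar is closed at a, z = e^{ib + (a - b)} if open at a."""
    if closed_at_a:
        return cmath.exp(complex(b - a, a))
    return cmath.exp(complex(a - b, b))

def polynomial_from_roots(roots):
    """Coefficients of the product of (z - z_i), highest degree first."""
    coeffs = [1 + 0j]
    for r in roots:
        # Multiply the current polynomial by (z - r): shift, then subtract r times it.
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0j], [0j] + coeffs)]
    return coeffs
```

For instance, the hypothetical configuration {1, 2} gives z^2 − 3z + 2, i.e. the coefficient list `[1, -3, 2]`; the degree of the polynomial is the number of points of the configuration, counted with multiplicity. With a < b, bars closed at a land outside the unit circle and bars open at a land inside it.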
Main results
Theorem. Betti numbers can be calculated from the degree of the polynomial P^f_r(z) and the cardinality of the J_r(f)'s (see [?]).
Theorem. The assignments f ↦ P^f_r(z) and f ↦ p^f_r(z) from the space of maps to the space of polynomials (with the appropriate topology) are continuous.
Theorem. If X = M^n is a closed topological manifold, then
P^{C,f}_r(z) = P^{C,f}_{n−r}(τ(z))
and
p^{C,f}_r(z) = p^{C,f}_{n−r−1}(τ(z)),
where for f real-valued τ(z) = iz and for f angle-valued τ(z) = 1/z.
Algorithms
A1. From Point Cloud data to the incidence matrix of a simplicial complex + the simplicial map
A2. From the incidence matrix and the simplicial map f to bar codes and Jordan cells
Dan Burghelea, New topological invariants for real- and angle-valued maps: an alternative to Morse-Novikov theory, World Scientific Publishing Co. Pte. Ltd., 2017.
G. Carlsson, Topology and Data, Bull. Amer. Math. Soc. 46 (2009), pp. 255-308.