Higher Methods in Data Science and ML
On the encoding of large, high-dimensional and unorganized datasets
Ulderico Fugacci
6th SmartData@PoliTo Workshop, Castello del Valentino
30 January 2020
Motivation
Topological Data Analysis (TDA) and Persistent Homology (PH) allow for extracting the core topological information from large, high-dimensional and unorganized datasets, e.g., point clouds, complex networks and (semi-)metric spaces.
[Figure: Dataset → Topological Information]
The dataset is first encoded as a filtered simplicial complex, from which the topological information is then extracted.
Simplicial Complex:
A family K of subsets (called simplices) of a finite set V, closed under the operation of taking subsets.
[Figure: 0-, 1-, 2- and 3-simplices]
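The definition can be checked mechanically. The following is a minimal Python sketch, assuming simplices are given as tuples of vertex labels and that the empty set is not counted as a simplex; the function name `is_simplicial_complex` is hypothetical. The test complex is the small example used later in these slides (the triangle 123 plus the edges 24 and 34).

```python
from itertools import combinations


def is_simplicial_complex(family):
    """Check that a family of vertex sets is closed under taking non-empty subsets."""
    simplices = {frozenset(s) for s in family}
    for s in simplices:
        # Every non-empty proper subset (face) of s must also be in the family.
        for k in range(1, len(s)):
            for face in combinations(s, k):
                if frozenset(face) not in simplices:
                    return False
    return True


# Assumed example: the triangle 123 plus the edges 24 and 34, with all their faces.
K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
print(is_simplicial_complex(K))         # True
print(is_simplicial_complex([(1, 2)]))  # False: the vertices 1 and 2 are missing
```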
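Issue:
If V is a point cloud with $|V| \geq 30$, the filtered complex K associated with V can consist of more than a billion simplices (the full simplex on 30 vertices alone has $2^{30} - 1 \approx 1.07 \times 10^9$ simplices).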
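The order of magnitude follows from counting faces: the full simplex on n vertices has $\binom{n}{k+1}$ k-simplices, hence $2^n - 1$ simplices overall. A short Python check, assuming the worst case in which the filtration eventually contains every subset of V (as a Vietoris–Rips filtration does once its scale parameter reaches the diameter of V); `full_simplex_size` is a hypothetical helper:

```python
from math import comb


def full_simplex_size(n):
    """Simplices of the full simplex on n vertices: C(n, k + 1) k-simplices for k = 0..n-1."""
    return sum(comb(n, k + 1) for k in range(n))


for n in (10, 20, 30):
    print(n, full_simplex_size(n))
# 10 1023
# 20 1048575
# 30 1073741823
```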
Solution:
Development of compact and efficient data structures for encoding simplicial complexes.
Outline:
• Which information should be stored?
• Data structures:
  ๏ Simplex-based representations
  ๏ Top-based representations
  ๏ Operator-driven representations
• Issues and solutions in adopting top-based representations
• Comparisons
Out of Scope:
• Data structures for specific classes of complexes, e.g., manifolds or complexes of low dimension
• Hierarchical and multi-resolution models
• Construction of a simplicial complex from a dataset
Data Structures for Simplicial Complexes
The entities which a simplicial complex consists of are:
• its simplices, $K = K_0 \cup K_1 \cup \dots \cup K_d$, where $K_i$ is the collection of the i-simplices of K
• the topological relations $R_{i,j} \subseteq K_i \times K_j$ between the simplices of K, encoding the (co-)boundary of each simplex
[Figure: example complex K on the vertices 1, 2, 3, 4, formed by the triangle 123 and the edges 24 and 34]
A data structure for K has to explicitly store a portion of its simplices and topological relations and (efficiently) retrieve the remaining part.
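For the first kind of entity, here is a minimal Python sketch that splits the example complex into the layers $K_0, K_1, K_2$, assuming K is given as an explicit list of vertex tuples; `split_by_dimension` is a hypothetical helper.

```python
from collections import defaultdict


def split_by_dimension(K):
    """Group the simplices of K into K_0, K_1, ..., K_d (an i-simplex has i + 1 vertices)."""
    layers = defaultdict(list)
    for s in K:
        layers[len(s) - 1].append(tuple(sorted(s)))
    return {i: sorted(layers[i]) for i in sorted(layers)}


K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
for i, K_i in split_by_dimension(K).items():
    print(f"K_{i}:", K_i)
# K_0: [(1,), (2,), (3,), (4,)]
# K_1: [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
# K_2: [(1, 2, 3)]
```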
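Topological Relations:
Given an i-simplex $\sigma$ and a j-simplex $\tau$ of K,
$$(\sigma, \tau) \in R_{i,j} \iff \begin{cases} \sigma \subseteq \tau & \text{for } i < j \\ |\sigma \cap \tau| = i \ (\text{equivalently, } \sigma \cap \tau \in K_{i-1}) & \text{for } i = j \\ \tau \subseteq \sigma & \text{for } i > j \end{cases}$$
Example: $(12, 123) \in R_{1,2}$, $(12, 24) \in R_{1,1}$, $(12, 1) \in R_{1,0}$
[Figure: the example complex on the vertices 1, 2, 3, 4]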
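The three cases of the definition translate directly into set operations. Below is a minimal Python sketch, assuming both arguments are simplices of K given as vertex tuples and following the three cases literally; `in_relation` is a hypothetical name.

```python
def in_relation(sigma, tau):
    """Decide whether (sigma, tau) is in R_{i,j}, with i = dim(sigma) and j = dim(tau)."""
    s, t = frozenset(sigma), frozenset(tau)
    i, j = len(s) - 1, len(t) - 1
    if i < j:
        return s <= t          # co-boundary: sigma is a face of tau
    if i > j:
        return t <= s          # boundary: tau is a face of sigma
    return len(s & t) == i     # adjacency: sigma and tau share an (i-1)-face


print(in_relation((1, 2), (1, 2, 3)))  # True:  (12, 123) in R_{1,2}
print(in_relation((1, 2), (2, 4)))     # True:  (12, 24) in R_{1,1}
print(in_relation((1, 2), (1,)))       # True:  (12, 1) in R_{1,0}
print(in_relation((1, 2), (3, 4)))     # False: 12 and 34 share no vertex
```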
An i-simplex $\sigma$ is called a top simplex of K if $R_{i,i+1}(\sigma)$ is empty, i.e., if $\sigma$ is not a face of any (i+1)-simplex of K.
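Because K is closed under subsets, an i-simplex has no (i+1)-dimensional coface exactly when it is not a proper face of any simplex of K, which gives a simple (quadratic) test. A minimal Python sketch, with the hypothetical helper `top_simplices`, applied to the example complex (the triangle 123 plus the edges 24 and 34):

```python
def top_simplices(K):
    """Return the simplices of K that are not a proper face of any other simplex of K."""
    simplices = {frozenset(s) for s in K}
    tops = [s for s in simplices if not any(s < t for t in simplices)]
    return sorted(tuple(sorted(s)) for s in tops)


K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
print(top_simplices(K))  # [(1, 2, 3), (2, 4), (3, 4)]
```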
Data structures:
๏ Simplex-based representations, which store all the entities: Incidence Graph, Simplex Tree
๏ Top-based representations, which store only the top simplices: IA* Data Structure, Stellar Tree
๏ Operator-driven representations: Skeleton Blocker
[Figure: the data structures arranged along two axes, Compactness and Efficiency]
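To make the compactness axis concrete, the Python sketch below counts, for a complex given by its top simplices, how many simplices a representation storing all the entities has to hold versus how many top simplices a top-based one keeps (ignoring the vertices, which IA* also stores). It is an illustrative count assuming one record per simplex, not a measurement of any of the listed data structures; the helper names are hypothetical.

```python
from itertools import combinations


def all_faces(tops):
    """Every non-empty face of the given top simplices, without duplicates."""
    faces = set()
    for t in tops:
        for k in range(1, len(t) + 1):
            faces.update(combinations(sorted(t), k))
    return faces


def storage_counts(tops):
    """(#simplices stored when keeping all entities, #top simplices stored)."""
    return len(all_faces(tops)), len(tops)


print(storage_counts([(1, 2, 3), (2, 4), (3, 4)]))  # (10, 3)
print(storage_counts([tuple(range(12))]))           # (4095, 1): a single 11-simplex
```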
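Simplex-based Representations
Incidence Graph:
The simplicial complex K is encoded via a directed graph $G = (N, A)$, where the nodes are in one-to-one correspondence with the simplices of K ($N \leftrightarrow K$) and $(\sigma, \tau) \in A \iff (\sigma, \tau) \in R_{i,i+1}$.
[Figure: incidence graph of the example complex, with the nodes 1, 2, 3, 4 / 12, 13, 23, 24, 34 / 123 arranged by dimension]
• All the relations between simplices can be immediately retrieved
• The representation size grows exponentially with the dimension of the complex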
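A minimal Python sketch of the incidence graph, assuming the complex is given as a list of vertex tuples; `incidence_graph` is a hypothetical helper, not the API of any library. Every simplex gets a node and every facet-to-simplex pair an arc, so boundary and co-boundary queries reduce to scanning arcs, at the cost of one node per simplex.

```python
from itertools import combinations


def incidence_graph(K):
    """Nodes: all simplices of K. Arcs: pairs (sigma, tau) in R_{i,i+1} (sigma a facet of tau)."""
    nodes = sorted(tuple(sorted(s)) for s in K)
    arcs = []
    for tau in nodes:
        if len(tau) < 2:
            continue  # vertices have no non-empty facets
        for sigma in combinations(tau, len(tau) - 1):
            arcs.append((sigma, tau))
    return nodes, sorted(arcs)


K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
nodes, arcs = incidence_graph(K)
print(len(nodes), len(arcs))                   # 10 13
print([s for s, t in arcs if t == (1, 2, 3)])  # [(1, 2), (1, 3), (2, 3)]
print([t for s, t in arcs if s == (2,)])       # [(1, 2), (2, 3), (2, 4)]
```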
Simplex Tree:
The simplicial complex K is encoded via a directed graph $G = (N, A)$, where $N \leftrightarrow K$ and $(\sigma, \tau) \in A \iff (\sigma, \tau) \in R_{i,i+1}$ and $I(\sigma) < I(\tau)$, with $I(\sigma)$ denoting the maximum vertex of $\sigma$ w.r.t. a total order on $K_0$.
[Figure: simplex tree of the example complex, with the nodes 1, 2, 3, 4 / 12, 13, 23, 24, 34 / 123]
• The graph is not uniquely determined: it depends on the chosen vertex order
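[Figure: simplex tree of the same complex with its vertices relabelled (i.e., under a different vertex order), with the nodes 1, 2, 3, 4 / 12, 13, 14, 23, 34 / 123]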
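Since $(\sigma, \tau) \in R_{i,i+1}$ and $I(\sigma) < I(\tau)$ force $\sigma$ to be $\tau$ minus its maximum vertex, every non-vertex simplex has exactly one incoming arc, and the graph is a forest rooted at the vertices. The Python sketch below, with the hypothetical helper `simplex_tree_arcs`, builds the arcs for two vertex orders of the example complex and shows that they differ.

```python
def simplex_tree_arcs(K, order):
    """Arcs (parent, child) of the simplex tree of K; `order` lists the vertices from smallest to largest."""
    rank = {v: r for r, v in enumerate(order)}
    arcs = []
    for s in K:
        if len(s) < 2:
            continue  # vertices are the roots of the forest
        tau = sorted(s, key=rank.__getitem__)  # vertices of s in increasing order
        parent = tau[:-1]                      # drop the maximum vertex I(tau)
        arcs.append((tuple(sorted(parent)), tuple(sorted(tau))))
    return sorted(arcs)


K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
a1 = simplex_tree_arcs(K, [1, 2, 3, 4])
a2 = simplex_tree_arcs(K, [4, 3, 2, 1])
print([a for a in a1 if a[1] == (1, 2, 3)])  # [((1, 2), (1, 2, 3))]
print([a for a in a2 if a[1] == (1, 2, 3)])  # [((2, 3), (1, 2, 3))]
```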
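Top-based Representations
IA* Data Structure:
The simplicial complex K is encoded via a directed graph $G = (N, A)$, where $N \leftrightarrow K_0 \cup K_{top}$ ($K_{top}$ being the set of top simplices of K) and
$$(\sigma, \tau) \in A \iff \begin{cases} \sigma, \tau \in K_{top} \text{ and } (\sigma, \tau) \in R_{i,i} \\ \sigma \in K_{top} \text{ and } (\sigma, \tau) \in R_{i,0} \\ \tau \in K_{top} \text{ and } (\sigma, \tau) \in (R_{0,i})^* \end{cases}$$
where $(R_{0,i})^*$ denotes a partial version of $R_{0,i}$.
[Figure: IA* graph of the example complex, with the vertices 1, 2, 3, 4 and the top simplices 123, 24, 34]
• Compact: it explicitly stores just a fraction of the entities of a simplicial complex
• Not all the relations between simplices are immediately available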
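A minimal Python sketch of the three kinds of arcs for the example complex. It makes a simplifying assumption: it records the full relation $R_{0,i}$ between each vertex and its incident top simplices, whereas IA* stores only the partial relation $(R_{0,i})^*$. The helper name `ia_star_sketch` is hypothetical.

```python
from itertools import combinations


def ia_star_sketch(tops):
    """Arcs among vertices and top simplices (simplified IA*: full vertex-to-top relation)."""
    tops = [tuple(sorted(t)) for t in tops]
    boundary = {t: list(t) for t in tops}      # top -> its vertices, i.e. R_{i,0}
    vertex_tops = {}                           # vertex -> incident tops (full R_{0,i} here)
    for t in tops:
        for v in t:
            vertex_tops.setdefault(v, []).append(t)
    adjacency = []                             # pairs of i-dimensional tops in R_{i,i}
    for s, t in combinations(tops, 2):
        if len(s) == len(t) and len(set(s) & set(t)) == len(s) - 1:
            adjacency.append((s, t))
    return boundary, vertex_tops, adjacency


boundary, vertex_tops, adjacency = ia_star_sketch([(1, 2, 3), (2, 4), (3, 4)])
print(adjacency)       # [((2, 4), (3, 4))]
print(vertex_tops[4])  # [(2, 4), (3, 4)]
```

Edge 12, for instance, is not stored anywhere: retrieving it requires enumerating the faces of the top simplex 123.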
Operator-driven Representations
Skeleton Blocker:
The simplicial complex K is encoded by storing its 1-skeleton (i.e. the graph consisting of the 0- and the 1-simplices) and a map returning, for each 1-simplex $\sigma$, the blockers of K containing $\sigma$, where a simplex $\tau$ is a blocker if $\tau$ does not belong to K but all its proper faces do.
[Figure: the example complex and its set of blockers { 234 }]
• Designed for flag complexes (e.g. Vietoris–Rips complexes) and edge contraction
• Too specific: inefficient for any other task
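A brute-force Python sketch of this encoding, assuming a tiny complex given as a list of vertex tuples; the helper names are hypothetical. It reports only blockers of dimension at least 2, since a missing edge is already captured by the 1-skeleton.

```python
from itertools import combinations


def blockers(K):
    """Blockers of K of dimension >= 2: vertex sets not in K whose proper faces all lie in K."""
    simplices = {frozenset(s) for s in K}
    vertices = sorted(set().union(*simplices))
    max_size = max(len(s) for s in simplices)
    found = []
    # The facets of a blocker lie in K, so a blocker has at most max_size + 1 vertices.
    for size in range(3, max_size + 2):
        for cand in combinations(vertices, size):
            c = frozenset(cand)
            # K is closed under subsets, so checking the facets of c is enough.
            if c not in simplices and all(c - {v} in simplices for v in c):
                found.append(cand)
    return found


def skeleton_blocker(K):
    """Encode K as its 1-skeleton plus a map from each edge to the blockers containing it."""
    one_skeleton = sorted(tuple(sorted(s)) for s in K if len(s) <= 2)
    B = blockers(K)
    edges = [e for e in one_skeleton if len(e) == 2]
    edge_map = {}
    for e in edges:
        containing = [b for b in B if set(e) <= set(b)]
        if containing:
            edge_map[e] = containing
    return one_skeleton, edge_map


K = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3)]
print(blockers(K))             # [(2, 3, 4)]
print(skeleton_blocker(K)[1])  # {(2, 3): [(2, 3, 4)], (2, 4): [(2, 3, 4)], (3, 4): [(2, 3, 4)]}
```

For a flag complex `blockers` returns an empty list: any vertex set whose edges all belong to K is itself a simplex of K, so only the 1-skeleton needs to be stored.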
Possible Issues in Top-based Representations
Top-based representations look like promising data structures for encoding a simplicial complex K. But how can we…
1. store information associated with each simplex of K (e.g. labels, gradient, …)?
2. efficiently perform operators when only a fraction of the entities of K is explicitly stored?
Solution to 1: attach the information to the top simplices only.
Solution to 2: re-define the algorithms performing the operators so that they extract as few non-explicitly stored entities as possible.
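A minimal Python sketch of the idea behind point 2, assuming the complex is available only as a list of top simplices: the edges incident in a vertex v (the relation $R_{0,1}$) are derived by scanning the top simplices containing v and generating only the edges through v, rather than materializing the whole complex. `incident_edges` is a hypothetical name; a top-based structure such as IA* would reach those top simplices through its vertex-to-top arcs instead of a full scan.

```python
def incident_edges(tops, v):
    """R_{0,1}(v): the edges of K containing vertex v, derived from the top simplices only."""
    edges = set()
    for t in tops:
        if v in t:
            # Generate only the edges {v, w} through v, not every face of t.
            edges.update(tuple(sorted((v, w))) for w in t if w != v)
    return sorted(edges)


tops = [(1, 2, 3), (2, 4), (3, 4)]
print(incident_edges(tops, 2))  # [(1, 2), (2, 3), (2, 4)]
print(incident_edges(tops, 4))  # [(2, 4), (3, 4)]
```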
Comparisons
Top-based vs Simplex-based
Top-based vs Operator-driven
In Summary
We have briefly overviewed the most common data structures proposed in the literature for encoding simplicial complexes.
Future Directions:
• Express the most frequently adopted operators in terms of top simplices
• Face the bottleneck concerning the construction of a simplicial complex from a dataset (possibly through an approximate construction)
• Investigate, with your help, the connections between simplicial complexes (and the data structures to store them) and itemsets in association rule learning
Thank you!
Main References:
• Simplex-based Representations:
  - Boissonnat, Maria. The Simplex Tree: an Efficient Data Structure for General Simplicial Complexes. In Algorithmica, 2014
• Top-based Representations:
  - Canino, De Floriani, Weiss. IA*: An Adjacency-Based Representation for Non-manifold Simplicial Shapes in Arbitrary Dimensions. In Computers & Graphics, 2011
  - Fellegara, Weiss, De Floriani. The Stellar Tree: a Compact Representation for Simplicial Complexes and Beyond. arXiv preprint, 2017
• Operator-driven Representations:
  - Attali, Lieutier, Salinas. Efficient Data Structure for Representing and Simplifying Simplicial Complexes in High Dimensions. In International Journal of Computational Geometry & Applications, 2012
• Comparisons:
  - Fugacci, Iuricich, De Floriani. Computing Discrete Morse Complexes from Simplicial Complexes. In Graphical Models, 2019
  - Fellegara, Iuricich, De Floriani, Fugacci. Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes. In Computer Graphics Forum, 2019