UFSQL: Unified File System Query Language

Yan Li, Yunfei Chen
<[email protected]>, <[email protected]>

Mar 8, 2012
Jack Baskin School of Engineering
University of California, Santa Cruz



Why is file system query important?


Query! Don’t Browse!

Exploding amount of information

Complex and plentiful metadata

Fading human memory

Finding the right information is hard


Anatomizing FS Query


3 Classes of File System Query

Class 1: Free text query

Class 2: Structured metadata query

Class 3: Provenance query


Class 1: Free Text

Files with “California” in contents

Files with “tax” in file name

Files with “usr” in path name

Files with “rms” in uid
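The four examples above all reduce to a substring test against a different field of the file. A minimal Python sketch of that idea follows; `FileRecord`, `free_text_match`, and the sample files are hypothetical names and data made up for illustration, not part of UFSQL.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical in-memory stand-in for one indexed file."""
    path: str      # full path name
    owner: str     # user name behind the uid
    contents: str  # extracted text contents


def free_text_match(record: FileRecord, field: str, needle: str) -> bool:
    """Case-insensitive substring test against one searchable field."""
    haystacks = {
        "contents": record.contents,
        "name": record.path.rsplit("/", 1)[-1],  # last path component
        "path": record.path,
        "uid": record.owner,
    }
    return needle.lower() in haystacks[field].lower()


# Made-up sample data.
files = [
    FileRecord("/usr/share/doc/tax-2011.txt", "rms", "California state return"),
    FileRecord("/home/yan/notes.txt", "yan", "Grocery list"),
]

# Files with "California" in contents.
hits = [f.path for f in files if free_text_match(f, "contents", "California")]
```

A real engine would presumably use a full-text index rather than scanning every file; the linear scan only shows what each query means.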


Class 2: Structured Metadata

Modern file systems can have complex hierarchical metadata

EXIF: ExifVersion, Manufacturer, Model, Orientation, Date and Time, Height, Width, ...

MakerNote: MacroMode, Quality, Metering Mode, ...

Geolocation: Longitude, Latitude, Altitude, Hemisphere, ...
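One way to picture hierarchical metadata is as a nested dictionary addressed by a dotted path. The sketch below assumes that MakerNote nests under EXIF and uses made-up sample values; `photo_metadata` and `lookup` are hypothetical names, not UFSQL constructs.

```python
# Made-up sample values; the nesting of MakerNote under EXIF is an assumption.
photo_metadata = {
    "EXIF": {
        "Manufacturer": "Canon",
        "Orientation": "landscape",
        "MakerNote": {"MacroMode": "off", "Quality": "fine"},
    },
    "Geolocation": {"Latitude": 36.99, "Longitude": -122.06},
}


def lookup(metadata, dotted_path, default=None):
    """Walk a nested dict along a path such as 'EXIF.MakerNote.Quality'."""
    node = metadata
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # path does not exist in this file's metadata
        node = node[key]
    return node


quality = lookup(photo_metadata, "EXIF.MakerNote.Quality")
altitude = lookup(photo_metadata, "Geolocation.Altitude")  # absent field
```

Returning a default for a missing path matters because different files carry different subsets of the hierarchy.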


Class 3: Provenance Graph

party.jpg

Raw Photo Converter (Windows 7)

Photoshop (Mac OS X)

ExifTool (Fedora 16)
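A provenance query amounts to walking a graph like this one. The edges of the original figure are not recoverable from the transcript, so the sketch below assumes a simple chain (converter, then Photoshop, then ExifTool, then party.jpg); `provenance`, `ancestors`, and `touched_by` are hypothetical names.

```python
# node -> list of nodes it was derived from.
# ASSUMPTION: the chain order below is illustrative, not taken from the slide.
provenance = {
    "party.jpg": ["ExifTool (Fedora 16)"],
    "ExifTool (Fedora 16)": ["Photoshop (Mac OS X)"],
    "Photoshop (Mac OS X)": ["Raw Photo Converter (Windows 7)"],
    "Raw Photo Converter (Windows 7)": [],
}


def ancestors(graph, start):
    """Return every node reachable from start by following parent edges."""
    seen = []
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph.get(node, []))
    return seen


def touched_by(graph, start, tool):
    """True if any ancestor's label mentions the tool (case-insensitive)."""
    return any(tool.lower() in node.lower() for node in ancestors(graph, start))
```

With this data, `touched_by(provenance, "party.jpg", "photoshop")` answers the question "was this photo ever processed by Photoshop?".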


Why do we need a new query language?


Today’s Database and Structured Data Query

Today’s File System Query


Designing a new FS Query Language

Unified File System Query Language: UFSQL

Uses a functional programming model (like XQuery)

Programmable

Extensible

Supports 3 classes of queries


A Complex Sample

We are running a photo studio.

We have a photo gallery.

How many pictures did each photographer take and process in the last month?


How to Get File Author?

Different cameras and software produce different metadata:

Canon: MakerNote:Author

Nikon: EXIF:ImageComment

Photoshop: XMP:Author


Solution 1

AuthorField =
    for PhotoStorage.metadata
    where field.name in ["MakerNote:Author", "EXIF:ImageComment", "XMP:Author"]
    return field
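Read as a filter, Solution 1 keeps only the metadata fields whose name appears in a fixed list. A rough Python analogue follows, assuming a hypothetical flat view of the store as (file name, field name, value) triples; the sample data is made up.

```python
AUTHOR_FIELD_NAMES = ["MakerNote:Author", "EXIF:ImageComment", "XMP:Author"]

# Hypothetical flat view of PhotoStorage.metadata; all values are made up.
photo_storage_metadata = [
    ("a.jpg", "MakerNote:Author", "Alice"),
    ("a.jpg", "EXIF:Orientation", "1"),
    ("b.jpg", "EXIF:ImageComment", "Bob"),
    ("c.psd", "XMP:Author", "Carol"),
]

# for ... where field.name in [...] return field
author_fields = [
    (file_name, name, value)
    for file_name, name, value in photo_storage_metadata
    if name in AUTHOR_FIELD_NAMES
]

authors = [value for _, _, value in author_fields]
```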


Solution 2

GetAuthor file =
    for file.metadata join fieldNames
    let fieldNames = ["MakerNote:Author", "EXIF:ImageComment", "XMP:Author"]
        authorList = file.metadata
    where authorList.value <> NIL
    return authorList
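Solution 2 wraps the same lookup in a per-file function: join the file's metadata with the candidate field names and keep the non-NIL values. The Python sketch below is one possible reading; `author_values` and `get_author` are hypothetical names, and returning the first match from `get_author` is an assumption added so the result can serve as a grouping key.

```python
from typing import Dict, List, Optional

FIELD_NAMES = ["MakerNote:Author", "EXIF:ImageComment", "XMP:Author"]


def author_values(metadata: Dict[str, Optional[str]]) -> List[str]:
    """Join the file's metadata with FIELD_NAMES and drop NIL (None) values."""
    return [
        metadata[name]
        for name in FIELD_NAMES
        if metadata.get(name) is not None
    ]


def get_author(metadata: Dict[str, Optional[str]]) -> Optional[str]:
    """ASSUMPTION: take the first non-NIL candidate as the author."""
    values = author_values(metadata)
    return values[0] if values else None
```

For example, `get_author({"EXIF:ImageComment": "Bob"})` yields `"Bob"` regardless of which vendor-specific field carried the name.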


How many pictures did each photographer take and process in the last month?

AuthorField = (...)

for PhotoStorage.* as file
where Now() - file.modifiedTime < DateRange(0,1,0)
  and file.prov.* contains "photoshop"
group by GetAuthor(file)
return GetAuthor(file) as Author, Count(*) as PhotoCount


Functional Programming Model


Dynamic Graph Scanning


Returns complex data
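To make the three annotations concrete, here is a hedged Python sketch of what the photographer query computes: a metadata filter, a provenance scan, and a grouped count. It is not a UFSQL engine. The file layout, `photos_per_author`, and the 30-day reading of `DateRange(0,1,0)` (assumed to mean zero years, one month, zero days) are assumptions, and the sample data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta

FIELD_NAMES = ["MakerNote:Author", "EXIF:ImageComment", "XMP:Author"]


def get_author(metadata):
    """First non-empty author-like field, or None (hypothetical helper)."""
    for name in FIELD_NAMES:
        if metadata.get(name):
            return metadata[name]
    return None


def photos_per_author(files, now, window=timedelta(days=30)):
    """Count recent, Photoshop-processed files per author."""
    counts = Counter()
    for f in files:
        recent = now - f["modified"] < window                     # metadata filter
        photoshopped = any("photoshop" in step.lower() for step in f["prov"])  # provenance scan
        if recent and photoshopped:
            counts[get_author(f["metadata"])] += 1                # group by + Count(*)
    return dict(counts)


# Made-up gallery contents.
gallery = [
    {"modified": datetime(2012, 3, 1), "prov": ["Raw Photo Converter", "Photoshop"],
     "metadata": {"MakerNote:Author": "Alice"}},
    {"modified": datetime(2012, 2, 20), "prov": ["Photoshop"],
     "metadata": {"XMP:Author": "Alice"}},
    {"modified": datetime(2012, 1, 1), "prov": ["Photoshop"],
     "metadata": {"XMP:Author": "Bob"}},          # too old
    {"modified": datetime(2012, 3, 5), "prov": ["ExifTool"],
     "metadata": {"EXIF:ImageComment": "Bob"}},   # never in Photoshop
    {"modified": datetime(2012, 3, 2), "prov": ["Photoshop"],
     "metadata": {"EXIF:ImageComment": "Bob"}},
]

result = photos_per_author(gallery, now=datetime(2012, 3, 8))
```

The result is a mapping from author to photo count rather than a flat list of file names, which is one reading of "returns complex data".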


Basic Syntax of UFSQL


Basic Syntax: FLWOR

QUERY FUNCTION =

for <FILE_SET>

let <VAR_BIND>

where <EXPRESSIONS>

order by <EXPRESSIONS>

return <VARS>
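The five clauses map naturally onto a small pipeline: iterate, bind, filter, sort, project. The Python sketch below illustrates that mapping only; the `flwor` helper, its parameter names, and the sample files are hypothetical and say nothing about how a UFSQL engine would be built.

```python
def flwor(file_set, let=None, where=None, order_by=None, ret=None):
    """Evaluate for / let / where / order by / return over an iterable."""
    rows = []
    for item in file_set:                    # for <FILE_SET>
        env = {"file": item}
        if let is not None:
            env.update(let(item))            # let <VAR_BIND>
        if where is None or where(env):      # where <EXPRESSIONS>
            rows.append(env)
    if order_by is not None:
        rows.sort(key=order_by)              # order by <EXPRESSIONS>
    return [ret(env) if ret is not None else env for env in rows]  # return <VARS>


# Made-up sample files.
sample_files = [
    {"name": "big.raw", "size": 4096},
    {"name": "tiny.txt", "size": 100},
    {"name": "mid.jpg", "size": 2048},
]

names = flwor(
    sample_files,
    let=lambda f: {"kb": f["size"] // 1024},
    where=lambda env: env["kb"] >= 1,
    order_by=lambda env: env["kb"],
    ret=lambda env: env["file"]["name"],
)
```

Passing each clause as a function mirrors the functional model: a query is itself a value that can be named and reused.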


Modules

import Graph.Face

for PhotoStorage.* as file
where faceList(file) contains "Joe"
group by faceList(file)
return file.name
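Module-based extensibility can be pictured as a registry that maps a module name to the query functions it contributes. In the Python sketch below, `register`, `ufsql_import`, and the stub `faceList` are hypothetical; the stub reads a precomputed tag instead of performing face recognition, and the sample photos are made up.

```python
MODULES = {}  # module name -> {function name: callable}


def register(module_name):
    """Decorator that files a function under a named module."""
    def decorator(func):
        MODULES.setdefault(module_name, {})[func.__name__] = func
        return func
    return decorator


@register("Graph.Face")
def faceList(file):
    """Stub: returns precomputed face tags rather than running recognition."""
    return file.get("faces", [])


def ufsql_import(module_name):
    """Look up the functions a module exposes to queries."""
    return MODULES[module_name]


# Made-up sample data.
photos = [
    {"name": "party.jpg", "faces": ["Joe", "Ann"]},
    {"name": "beach.jpg", "faces": ["Ann"]},
    {"name": "scan.pdf"},
]

face = ufsql_import("Graph.Face")
joe_photos = [p["name"] for p in photos if "Joe" in face["faceList"](p)]
```

The query itself stays unchanged when a better `faceList` is plugged in, which is the point of keeping such functions in modules.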


Our Innovation

First functional query language for file systems

First query language that combines 3 classes of queries: free text, structured data and provenance graph query

Extensible by using modules

In-function SQL-like dataset join


Future Research

Stronger provenance query support (needs more feedback from end users)

A test prototype engine

A natural language layer for ease of use


Related Research


Earlier FS Query Languages

Primitive tools: ls, find (1980s)

Spotlight / Windows Search (2005)

XQuery (2007)

QUASAR: replacing file system interfaces with a query language (2008)

Harvard PQL: Path Query Language (2008)


Thank you!

Acknowledgment

We would like to thank the faculty and students of the Storage Systems Research Center (SSRC), Center for Research in Intelligent Storage (CRIS), and IRKM lab for their help and guidance. This research was supported in part by the National Science Foundation, and by the industrial sponsors of the SSRC and CRIS, including EMC, Hewlett Packard Laboratories, IBM Research, NetApp, Northrop Grumman, and Samsung.


Copyright notice

The red sports car is from Peugeot, the tractor is from BC

Trademark Acknowledgment: All other products or services mentioned in this document are identified by the trademarks, service marks, or product names as designated by the companies who market these products. Inquiries concerning such trademarks should be made directly to those companies.
