Creating Streams with DataSift

Upload: datasift

Post on 13-May-2015

Category:

Technology


DESCRIPTION

This slide deck walks through how to create DataSift Streams and introduces FSDL, the Filtered Stream Definition Language.

TRANSCRIPT

Page 1: Creating streams with DataSift

Creating Streams with DataSift

Page 2: Creating streams with DataSift

Creating a Stream: Workflow

Stream Specification

Stream Definition

Filtered Data

Page 3: Creating streams with DataSift

Creating a Stream: Specification

What do you want the elements to contain?

What sources do you want the data to come from?

What is your budget for data acquisition?

Who is this data for?

Work out what you want your stream to do

Page 4: Creating streams with DataSift

Creating a Stream: Definition

Create Stream in DataSift

Create FSDL Definition

Verify with live data

Write a Stream Definition that executes your specification

Page 5: Creating streams with DataSift

Creating a Stream: Filtered Data

Retrieve the data that is filtered by your stream

JSON API

HTTP Streaming

WebSockets Streaming

RSS

Page 6: Creating streams with DataSift

Creating a Stream in DataSift

1. Select the Create Stream button on any page on DataSift

Page 7: Creating streams with DataSift

Creating a Stream in DataSift

2. Fill in the title, description, and tags for your Stream

The Title and Description will be shown next to your Stream

The Tags will be used for search and categorisation of your Stream

Enabling the Private checkbox will make your Stream visible only to you

Page 8: Creating streams with DataSift

Creating a Stream in DataSift

3. Create your first stream definition

This is the Stream Editor

There is a default stream definition already inserted for you

Why not try changing “hello world” to a different value?

e.g. interaction.content contains "cat"

Page 9: Creating streams with DataSift

Creating a Stream in DataSift

4. Hit the Save button

Your Stream is now saved

You can use the breadcrumbs to go back to see a live preview of the results

Page 10: Creating streams with DataSift

FSDL: Filtered Stream Definition Language

FSDL is the language used to write Stream Definitions for DataSift

The language takes the following basic format:

<term> <logical operator> <term> <logical operator>

There must be a minimum of 1 term in a definition.

All terms must be separated by logical operators.

A logical operator is either “and” or “or”.
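The rules on this slide can be sketched in a few lines of Python. This is only an illustration of the format, not part of DataSift: the helper name build_definition is hypothetical, and the check simply enforces "at least one term, one logical operator between each pair of terms". Operator case is ignored, since the examples in this deck use both "and" and "AND".

```python
# Hypothetical helper, not a DataSift API: assembles an FSDL definition string
# from terms and the logical operators that separate them.
VALID_OPERATORS = {"and", "or"}


def build_definition(terms, operators):
    """Return 'term1 op1 term2 op2 term3 ...' after checking the slide's rules."""
    if len(terms) < 1:
        raise ValueError("a definition needs at least one term")
    if len(operators) != len(terms) - 1:
        raise ValueError("need exactly one operator between each pair of terms")
    for op in operators:
        if op.lower() not in VALID_OPERATORS:
            raise ValueError(f"unknown logical operator: {op!r}")

    parts = [terms[0]]
    for op, term in zip(operators, terms[1:]):
        parts.extend([op, term])
    return " ".join(parts)


definition = build_definition(
    ['interaction.content contains "cat"', 'language.tag == "en"'],
    ["and"],
)
print(definition)
```

The resulting string can be pasted into the Stream Editor; the helper does not check that each term is itself valid FSDL.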

Page 11: Creating streams with DataSift

FSDL: Nested Rule

On the previous slide, we had this definition outline:

<term> <logical operator> <term> <logical operator>

Each term is either a “nested rule” or a “predicate”.

A nested rule includes the result of another stream within the logic of this one.

The syntax for a nested rule is:

rule "<stream identifier>"

Where the stream identifier is a 32-character alphanumeric string, available on the DataSift page of the stream you wish to include, or through the API.
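As a rough sketch, the nested rule syntax can be generated and sanity-checked in Python. The function name nested_rule is hypothetical, and the pattern only encodes what the slide states (32 alphanumeric characters); DataSift may apply stricter rules. Straight double quotes are assumed around the identifier.

```python
import re

# Assumption taken from the slide: an identifier is exactly 32 alphanumeric characters.
STREAM_ID_PATTERN = re.compile(r"[0-9A-Za-z]{32}")


def nested_rule(stream_id):
    """Return the FSDL term that includes another stream (hypothetical helper)."""
    if not STREAM_ID_PATTERN.fullmatch(stream_id):
        raise ValueError(f"not a 32-character alphanumeric identifier: {stream_id!r}")
    return f'rule "{stream_id}"'


# Identifier taken from the example later in this deck.
print(nested_rule("4e8e6772337d0b993391ee6417171b79"))
```

Checking the identifier locally only catches typos such as a truncated copy and paste; it cannot confirm that the stream actually exists.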

Page 12: Creating streams with DataSift

FSDL: Nested Rule Example

This is an example of a simple FSDL definition:

interaction.content contains "justin bieber"

The Stream Identifier for this definition is 4e8e6772337d0b993391ee6417171b79. The stream will include every item whose content contains “justin bieber”.

We can create another rule to filter this down further, using the nested rule syntax:

rule "4e8e6772337d0b993391ee6417171b79" and language.tag == "en"

This performs the same filtering as the first stream, and additionally keeps only content determined to be in English, using the language.tag == "en" predicate.

In this case, the logical operator separating the two terms is “and”.

Page 13: Creating streams with DataSift

FSDL: Predicates

Predicates are formed of three items (a target, an operator and an argument) in the following format:

<target> <operator> <argument>

In the previous example, we saw this predicate used to filter the results of another rule:

language.tag == "en"

In this example, the target is “language.tag”; the operator is “==” (equals); and the argument is “en”.

There is a long list of targets, operators, and the arguments they require in the DataSift Support Documentation.
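The three-part shape of a predicate maps naturally onto a small data structure. The Python sketch below is illustrative only: the Predicate class and its render method are hypothetical names, and the quoting rule (strings in double quotes, numbers bare, no argument for operators such as exists) is inferred from the examples in this deck rather than from the full FSDL documentation.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Predicate:
    """One FSDL predicate: <target> <operator> <argument> (hypothetical model)."""

    target: str
    operator: str
    # Assumption: None means the operator takes no argument, as with "exists".
    argument: Optional[Union[str, int]] = None

    def render(self) -> str:
        if self.argument is None:
            return f"{self.target} {self.operator}"
        if isinstance(self.argument, str):
            # String arguments are wrapped in straight double quotes.
            return f'{self.target} {self.operator} "{self.argument}"'
        # Numeric arguments are written without quotes.
        return f"{self.target} {self.operator} {self.argument}"


print(Predicate("language.tag", "==", "en").render())
print(Predicate("twitter.user.friends_count", ">=", 1000).render())
print(Predicate("interaction.geo", "exists").render())
```

The class does not validate targets or operators; the authoritative list remains the DataSift Support Documentation.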

Page 14: Creating streams with DataSift

FSDL: Example Predicates

The following are examples of simple predicates:

interaction.content contains "#rdgtweetup"

twitter.user.friends_count >= 1000

interaction.content contains_word "net"

interaction.geo exists

author.username in "dtsn,nickhalstead,chris_alexander,datasift"

Page 15: Creating streams with DataSift

FSDL: Example Definitions

Here are examples of more complex definitions composed of multiple terms:

(interaction.content contains "Justin Bieber" OR interaction.content contains "Justin Beiber")

(interaction.content contains "Nokia" OR interaction.content contains "Motorola" OR interaction.content contains "Palm") AND interaction.content contains "phone"

interaction.content contains "#rdgfestival" OR interaction.content contains "#readingfestival" OR rule "4315e367618830de6224c479f35db4ca"

Page 16: Creating streams with DataSift

API Calls

API calls are available to perform most of the DataSift functionality.

Stream: Get, Create, Update, Duplicate, Rate, Delete, List

Comments: Get, Create, Flag

All of these API calls are available through a semi-RESTful interface, in a similar way to the Twitter API.

Data formats supported include JSON, JSONP, XML and PHP (serialized).

Each call is fully documented on the DataSift Support site.

Page 17: Creating streams with DataSift

Retrieving Stream Data

Once you have configured your stream with a definition and verified it is correct, you can connect to your stream through a number of methods:

JSON API

HTTP Stream

WebSockets Stream

RSS

The JSON API is simple and similar to how you would access Twitter Search.

The HTTP Stream is similar to the Twitter firehose, giving a constant stream of data through a single connection. WebSockets is similar to this but meant for client-side connections through supported web browsers.

RSS is also available, but is recommended for lower-volume feeds only.

All services are fully documented on the DataSift Support site.

Page 18: Creating streams with DataSift

Questions

You can get more help, support, examples and user content on the DataSift Support website:

http://support.datasift.net

You can also ask us on Twitter: @datasift