A GENERIC FRAMEWORK FOR ENGAGING ONLINE DATA SOURCES IN INTRODUCTORY PROGRAMMING COURSES

NADEEM ABDUL HAMID



A GENERIC FRAMEWORK FOR ENGAGING ONLINE DATA SOURCES

“LIVE” DEMO

https://datahub.io/dataset/ubigeo-peru/resource/12c2cc3a-5896-496b-96f6-d95cd1618d61


import core.data.*;

public class PeruData1 {
    public static void main(String[] args) {
        DataSource ds = DataSource.connect("https://.../Ubigeo2010.csv");   // connect
        ds.load();                                                           // load
        String[] names = ds.fetchStringArray("NOMBRE");                      // fetch one column
        System.out.println(names.length);
        System.out.println(names[367]);
    }
}

CONNECT - LOAD - FETCH
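The three calls hide I/O, parsing, and column extraction. As a rough illustration of what that work involves, here is a minimal offline sketch of the same three steps for CSV text. The class `MiniCsvSource` and its methods are hypothetical stand-ins modeled on the slide's calls, not the library's implementation; the data comes from a String rather than a URL, and quoted CSV fields are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in (not the library's DataSource) that mimics connect/load/fetch
// for CSV text held in a String, so it runs without network access.
// Simplification: fields are split on commas; quoted fields are not supported.
class MiniCsvSource {
    private final String text;                        // connect: remember the source
    private List<String> header = new ArrayList<>();  // filled in by load()
    private final List<String[]> rows = new ArrayList<>();

    private MiniCsvSource(String text) { this.text = text; }

    static MiniCsvSource connect(String csvText) { return new MiniCsvSource(csvText); }

    // load: parse the header line, then every non-empty data line
    MiniCsvSource load() {
        String[] lines = text.split("\\R");
        header = Arrays.asList(lines[0].split(",", -1));
        rows.clear();
        for (int i = 1; i < lines.length; i++) {
            if (!lines[i].isEmpty()) rows.add(lines[i].split(",", -1));
        }
        return this;
    }

    // fetch: extract one named column from every row
    String[] fetchStringArray(String column) {
        int col = header.indexOf(column);
        if (col < 0) {
            throw new IllegalArgumentException("No field named " + column + "; available fields: " + header);
        }
        String[] out = new String[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            String[] r = rows.get(i);
            out[i] = col < r.length ? r[col] : "";    // short rows yield an empty value
        }
        return out;
    }
}
```

For example, `MiniCsvSource.connect("id,name\n1,Lima\n2,Cusco").load().fetchStringArray("name")` returns the array `{"Lima", "Cusco"}` (made-up sample data).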


import core.data.*;

public class PeruData1 {
    public static void main(String[] args) {
        DataSource ds = DataSource.connect("https://.../Ubigeo2010.csv");
        ds.load();
        ds.printUsageString();   // describe the structure and fields of the loaded data
        String[] names = ds.fetchStringArray("NOMBRE");
        System.out.println(names.length);
        System.out.println(names[367]);
    }
}

WHAT’S IN THE DATA?


USAGE STRING

-----
Data Source: https://commondatastorage.googleapis.com/.../Ubigeo2010.csv
URL: https://commondatastorage.googleapis.com/.../Ubigeo2010.csv

The following data is available:
A list of:
  structures with fields:
  {
    CODDIST : *
    CODDPTO : *
    CODPROV : *
    NOMBRE : *
  }
-----
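The listing treats each CSV row as a structure whose fields are the column names, with `*` marking a primitive value whose type is not yet fixed. A hypothetical sketch of producing a similar listing from a header line follows; the class name `UsageSketch` and the alphabetical ordering of fields are assumptions (the listing above happens to be alphabetical), and this is not how `printUsageString()` is actually implemented.

```java
import java.util.Arrays;

// Hypothetical sketch: build a usage description for CSV data in the same shape as the
// listing above. Each row is treated as a structure whose fields are the header names;
// "*" marks a primitive value whose type has not been decided yet.
class UsageSketch {
    static String describeCsv(String url, String headerLine) {
        String[] fields = Arrays.stream(headerLine.split(","))
                .map(String::trim)
                .sorted()                      // assumed ordering: alphabetical
                .toArray(String[]::new);
        StringBuilder sb = new StringBuilder("-----\n");
        sb.append("Data Source: ").append(url).append("\n\n");
        sb.append("The following data is available:\n");
        sb.append("A list of:\n  structures with fields:\n  {\n");
        for (String f : fields) {
            sb.append("    ").append(f).append(" : *\n");
        }
        sb.append("  }\n-----\n");
        return sb.toString();
    }
}
```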


USER-DEFINED CLASS

class Geo {
    String name;
    int pop;
    int elev;

    public Geo(String name, int pop, int elev) {
        this.name = name;
        this.pop = pop;
        this.elev = elev;
    }

    public String toString() {
        return String.format("%s (pop. %d): %d m.", name, pop, elev);
    }
}


import core.data.*;
import java.util.ArrayList;

DataSource ds = DataSource.connectAs("TSV", "http://download.geonames.org/export/dump/PE.zip");
ds.setOption("fileentry", "PE.txt");   // which file inside the ZIP archive to read
ds.setOption("header", "geoid,name,asciiname,altnames,lat,long,feature-class,feature-code,cc,cc2,admin1,admin2,admin3,admin4,ppl,elev,dem,tz,mod");   // column names for the TSV data
ds.load();

Geo g = ds.fetch("Geo", "name", "ppl", "dem");                       // a single Geo object
System.out.println(g);

ArrayList<Geo> places = ds.fetchList("Geo", "name", "ppl", "dem");   // one Geo per record
System.out.println(places.size());
for (Geo p : places)
    if (p.name.equals("Arequipa"))
        System.out.println(p);

DEMO - ADDITIONAL FEATURES


OUTPUT

Brazo Tigre (pop. 0): 0 m.
102315
Arequipa (pop. 1218168): 3351 m.
Arequipa (pop. 0): 3164 m.
Arequipa (pop. 841130): 2355 m.
Arequipa (pop. 0): 106 m.
Arequipa (pop. 0): 2327 m.
Arequipa (pop. 0): 404 m.
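Fetching `"Geo"` with three field names means turning three raw strings into a `Geo` object. One plausible way to do this in Java is reflection: find a constructor whose parameter count matches, convert each string to the declared parameter type, and invoke it. The sketch below assumes that approach; `Binder` and the example class `City` are hypothetical, and only a handful of primitive and `String` parameter types are handled.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of reflection-based binding: find a constructor whose parameter
// count matches the number of values, convert each string to the declared parameter
// type, and invoke it. An illustration only, not the library's actual algorithm.
class Binder {
    static Object convert(String raw, Class<?> type) {
        String s = raw.trim();
        if (type == String.class || type == Object.class) return raw;
        if (type == int.class || type == Integer.class) return Integer.parseInt(s);
        if (type == long.class || type == Long.class) return Long.parseLong(s);
        if (type == float.class || type == Float.class) return Float.parseFloat(s);
        if (type == double.class || type == Double.class) return Double.parseDouble(s);
        if (type == boolean.class || type == Boolean.class) return Boolean.parseBoolean(s);
        throw new IllegalArgumentException("Unsupported parameter type: " + type.getName());
    }

    static <T> T bind(Class<T> cls, String... values) {
        for (Constructor<?> c : cls.getDeclaredConstructors()) {
            Class<?>[] types = c.getParameterTypes();
            if (types.length != values.length) continue;      // arity must match
            Object[] args = new Object[types.length];
            for (int i = 0; i < types.length; i++) args[i] = convert(values[i], types[i]);
            try {
                c.setAccessible(true);
                return cls.cast(c.newInstance(args));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not construct " + cls.getSimpleName(), e);
            }
        }
        throw new IllegalArgumentException(
            "No constructor of " + cls.getSimpleName() + " takes " + values.length + " arguments");
    }
}

// Hypothetical example class, standing in for a student-defined class like Geo.
class City {
    final String name;
    final int pop;
    City(String name, int pop) { this.name = name; this.pop = pop; }
}
```

With the `Geo` class above available, `Binder.bind(Geo.class, "Arequipa", "841130", "2355")` would construct the same kind of object the demo prints.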


OUTLINE

▸ Motivation

▸ Goals

▸ Usage & Functionality

▸ Design & Implementation

▸ Related & Future Work

▸ Conclusion


MOTIVATION

▸ The “Age of Big Data”

▸ Incorporate the use of online data sets in introductory programming courses

▸ Provide a simple interface

▸ Hide I/O connection, parsing, extracting, data binding


GOALS

▸ Minimal syntactic overhead

▸ Direct access via URL (or local file path)

▸ No requirement of pre-supplied data schemas/templates

▸ Bind (instantiate) data objects based on user-defined data representations (i.e. student-defined classes)

▸ Other good stuff

▸ Caching

▸ Help/usage

▸ Error handling/reporting

ArrayList<Geo> places = ds.fetchList("Geo", ...


USAGE

▸ 3-step approach: • Connect • Load • Fetch

▸ Infer data format if possible — XML, CSV, JSON

▸ Display inferred structure of data — printUsageString()

▸ Fetching atomic values

▸ provide a path into the data

▸ Structured data:

▸ provide name of class and paths of data to be supplied to the constructor

▸ Collections: fetchStringArray / fetchArray / fetchList / …

ds.fetch("Geo", "info/name/std", "metrics/pop", "phys/elev");
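A path such as `"info/name/std"` names a route through nested structures. Below is a hypothetical sketch of following such a path through parsed data represented as nested `Map`s and `List`s; the class `PathSketch` and this data representation are assumptions, not the library's internal format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: follow a slash-separated path such as "info/name/std" through
// parsed data held as nested Maps (structures) and Lists. Whenever a List is reached,
// the rest of the path is applied to every element, so one path can select a whole
// column of values.
class PathSketch {
    static List<Object> resolve(Object root, String path) {
        List<Object> out = new ArrayList<>();
        String[] steps = path.isEmpty() ? new String[0] : path.split("/");
        walk(root, steps, 0, out);
        return out;
    }

    private static void walk(Object node, String[] steps, int i, List<Object> out) {
        if (node instanceof List) {
            for (Object child : (List<?>) node) walk(child, steps, i, out);
        } else if (i == steps.length) {
            out.add(node);                                   // path fully consumed
        } else if (node instanceof Map) {
            Object next = ((Map<?, ?>) node).get(steps[i]);
            if (next == null) {
                throw new IllegalArgumentException(
                    "No field \"" + steps[i] + "\" at step " + (i + 1) + " of the path");
            }
            walk(next, steps, i + 1, out);
        } else {
            throw new IllegalArgumentException(
                "Path continues past a primitive value at \"" + steps[i] + "\"");
        }
    }
}
```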


OTHER FUNCTIONALITY

▸ Data source specifications

▸ Query parameters

▸ Iterator-based access

▸ Cache control

▸ Processing support


DESIGN & IMPLEMENTATION

▸ Connect

▸ prepare URL/path; set parameters, options, data type

▸ Load:

▸ get the data

▸ infer a schema

▸ Fetch:

▸ build a signature for type requested by user

▸ unify schema with signature - instantiate as objects

[Diagram: data sources → load → field schema; code → fetch → signature; field schema + signature → instantiate → object(s)]


EXPERIENCE

▸ Limited to date: “Creative Computing”

▸ Tutorial-style labs

▸ Sample data sets used/discovered by students:

Name                                   Source                           Type          Records
*1000 songs to hear before you die     opendata.socrata.com             XML           1,000
Abalone data set                       UCI Machine Learning Repository  CSV           4,177
*Airport Weather Mashup                NWS + FAA                        XML           fixed
*Chicago life expectancy by community  data.cityofchicago.org           XML           ~80
Earthquake feeds                       US Geological Survey             JSON          variable
*Fuel economy data                     US EPA                           XML           35,430
*Jeopardy! question archive            reddit                           JSON          216,930
Live auction data                      Ebay                             XML           100/page
Magic the Gathering card data          mtgjson.com                      JSON          variable
Microfinance loan data                 Kiva                             XML           variable
*SEC Rushing Leaders 2014              ESPN                             CSV (manual)  variable

(Asterisk indicates data set discovered by students)

Figure 5. Examples of data sources used in lecture examples and student final projects.

One issue that arose in some instances was finding a link to the raw data provided by a service. The primary URL for many web services often provides an interactive interface or visualization of the data. It takes some digging around to find a direct URL to the underlying XML data or a JSON rendering. Related to this, some students had initial difficulty understanding the distinction between data rendered in a human-friendly form (such as is visible when a web page is viewed in a browser), and the raw data in a format amenable to machine-processing by their program. When they found a web page that displayed a set of data in tabular form, they would attempt to use that page’s URL to connect to the data source.

Another concern that did not seem to be a major problem, but may have limited the choice of data sources, was that some sites require “developer” registration and an API key to be obtained in order to access data. Some students were hesitant to register for such. Also, this required various query parameters to be set on the DataSource object before it is loaded. Determining the necessary parameters involved reading into the developer documentation of a particular web service, an experience which varies in pleasantness from site to site.

Overall, insofar as operation of the library itself was concerned, things seemed to go fairly smoothly. As noted earlier, introducing the library earlier on in the course, and integrating it more thoroughly with the course content would make the students more comfortable with it and allow them a better sense of how to access data for the purpose of analysis or visualization. And, of course, more extensive field studies, using the library in other courses and institutions, are necessary to gauge its true usefulness.

5.2 Technical Challenges

Based on the limited testing and use so far, our current implementation appears to meet many of the goals laid out in Section 2. However, there are a few areas where improvement is clearly needed. One is to provide a more robust approach to error handling and reporting. Currently, exceptions are raised when things do not match up as expected at various stages of loading and instantiating data, but the error messages can be improved. This could be achieved partly by providing a high-level description of the source of a problem, rather than just a line number in a file, to help a novice programmer understand what happened.
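As one illustration of a higher-level error report, the hypothetical helper below converts a raw field value for an `int` parameter and, on failure, names the record and field involved instead of surfacing a bare `NumberFormatException`. The class name, method name, and message wording are all assumptions.

```java
// Hypothetical sketch: convert one raw field value for an int constructor parameter and,
// if that fails, report which record and field were involved, what was found, and what
// was expected, instead of surfacing a bare NumberFormatException.
class FriendlyParse {
    static int parseIntField(String raw, String field, int record) {
        if (raw == null) {
            throw new IllegalArgumentException(String.format(
                "Record %d: field \"%s\" is missing, but a whole number was expected.", record, field));
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(String.format(
                "Record %d: field \"%s\" should contain a whole number (it is passed to an int parameter), but it contains \"%s\".",
                record, field, raw), e);
        }
    }
}
```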

As alluded to in the previous section, one complication in accessing data sources is providing a user-specific API key or access token. While this can be achieved using the existing set methods, it can be frustrating for students to figure out how to specify the necessary parameters. Also, rather than having user API tokens exposed directly in source code of a program, which might be shared with others, it would be better to have private user data stored in a separate file. We need to investigate a mechanism for doing this in a simple manner, combined with a curated set of data source specifications for common APIs (Ebay, Twitter, etc.) that require special parameters, to make it easier for students to access these. Also, a GUI tool for managing data source-related parameters, as mentioned in the next section, might be very helpful.
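A minimal sketch of keeping a key out of source code, assuming a plain `java.util.Properties` file; the class `ApiKeys` and the file name `keys.properties` are hypothetical, and no particular library mechanism is implied.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: read a private API key from a separate properties file
// (for example a "keys.properties" file that is never shared), so the key does
// not appear in the program's source code. File and key names are illustrative.
class ApiKeys {
    static String lookup(Reader keyFile, String name) {
        Properties props = new Properties();
        try {
            props.load(keyFile);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read the key file", e);
        }
        String key = props.getProperty(name);
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("No API key named '" + name + "' in the key file");
        }
        return key.trim();
    }
}
```

A program could open the file with `Files.newBufferedReader(Path.of("keys.properties"))` and pass the returned string to whichever existing set method supplies the query parameter.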

In terms of performance, the library handles many small to medium-size data sets well. However, the current implementation loads the entire data set into memory. Along with the adoption of XML as a common intermediate format, this results in significant delays when loading large data sets (thousands of records). Even with caching, there is at least a two-fold delay loading a CSV file, for example. First the data must be loaded and converted to XML; then the entire converted data is scanned again to infer its schema. Depending on the fetch operation that is then executed, there may be further delay in processing. Some ways to alleviate this would be to cache the data after it has been converted to XML, and to cache the inferred schema as well as the data itself. Even with more intelligent caching, however, XML is not the most efficient representation of data and in many cases it would be much more efficient (time and space) to access CSV data, for example, using a data structure other than an XML object.
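The caching suggested above (keeping converted data and its inferred schema so neither is recomputed on every load) can be sketched as a time-limited cache. The class `TimedCache` is hypothetical; it is keyed by an arbitrary value such as the source URL, and the timeout policy is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a time-limited cache. A value (such as converted data, or the
// schema inferred from it) is recomputed only when it is missing or older than the
// timeout; otherwise the stored copy is returned. The clock is injectable for testing.
class TimedCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long storedAt;
        Entry(T value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock;

    TimedCache(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.storedAt >= timeoutMillis) {
            e = new Entry<>(loader.apply(key), now);   // absent or expired: reload
            entries.put(key, e);
        }
        return e.value;
    }
}
```

In practice the clock could be `System::currentTimeMillis`; one cache instance might hold converted data and another the inferred schemas.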

Another approach to improve performance for large data sets is to abandon XML as the common intermediate format and adopt more abstract mechanisms to encapsulate arbitrary data format objects and allow them to be traversed using a uniform interface. This is an interesting software architecture challenge that is currently under investigation and would require a major reorganization of the internals of the library. A complementary feature would be to also provide methods for streaming or paginating data. This remains to be explored.
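Streaming could take the form of an `Iterator` that reads one record at a time on demand instead of loading the whole source up front. A hypothetical sketch for comma-separated lines read from a `BufferedReader`; the class `RowStream` is not part of the library, and quoted fields are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of streaming access: each call to next() reads and splits one more
// line, so records are read lazily instead of loading the whole source up front.
// Quoted fields are not handled.
class RowStream implements Iterator<String[]> {
    private final BufferedReader in;
    private String pending;   // next unread line, or null at end of input

    RowStream(BufferedReader in) {
        this.in = in;
        advance();
    }

    private void advance() {
        try {
            pending = in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() { return pending != null; }

    @Override
    public String[] next() {
        if (pending == null) throw new NoSuchElementException();
        String[] row = pending.split(",", -1);
        advance();
        return row;
    }
}
```

Pagination could be layered on top by calling `next()` a fixed number of times per page.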

5.3 Additional Future Directions

In addition to addressing the more immediate technical challenges of the previous section, there are some other ideas for future development that would improve the library’s usefulness. One is the ability to invoke a GUI interface to configure various settings for the library and parameters for a particular data source. Upon loading a data source, it could also allow the user to view usage info (as an alternative to printUsageString()), preview the available data, provide help for invoking methods of the library, and perhaps even generate code snippets.

As mentioned in Section 4, the fetch() methods currently only support the specification of flat signatures. It is possible to directly construct and provide a type signature object to the library that involves, for example, lists of structures with nested lists or structures. Providing a simplified way to supply such signatures via

5 2015/6/29


ISSUES

▸ Finding proper links to raw data (students can have trouble)

▸ Sites requiring “developer” registration

▸ Error messages not helpful (yet)

▸ XML as common intermediate format

▸ Better caching (of schemas as well as raw data)

▸ Streaming, pagination, sampling…


FUTURE

▸ Redo abstraction layer over data formats

▸ GUI tools

▸ Multiple language support (Python, Racket)

▸ Different language mechanisms to achieve dynamic binding (reflection, macros)

▸ Additional data formats

▸ HTML tables, web scrapers (regexps)

▸ Customized for popular APIs (ebay, twitter, etc.)

‣ Curriculum resources

▸ Evaluation of effectiveness


RELATED WORK & ACKNOWLEDGEMENTS

▸ CORGIS Dataset Project - http://think.cs.vt.edu/corgis/

▸ XML Data Access Interfaces

▸ JAXB, Castor: schema-based; compile-time setup required

▸ FasterXML (Jackson): dynamic binding to POJOs; emphasis on Java → XML direction; tight coupling

▸ XML schema inference

▸ Contributions by Steven Benzel, Stephen Jones, Alan Young


CONCLUSION

▸ Facilitate incorporation of online data sources into programming assignments

▸ Painlessly

▸ Seamlessly


cs.berry.edu/sinbad

Use a data set in your next assignment!


DATA SOURCE SPECIFICATION FILE

‣ Data source URL and format.

‣ Human-friendly name and description, along with URL to a project or informational page about the data source.

‣ A specification of pre-supplied and user-supplied (required and optional) query parameters or path parameters. The latter are user-provided strings that are substituted in for placeholders in the URL path.

‣ Programmatic options specific to the particular data source object (such as a header for CSV files).

‣ Cache settings, such as cache directory path or timeout.

‣ A data schema defining the exposed data structures and fields from the source with various helpful annotations such as textual descriptions of fields that can be displayed by printUsageString().

{
  "name": "Geographical Data - Peru",
  "format": "TSV",
  "path": "http://download.geonames.org/export/dump/PE.zip",
  "infourl": "http://www.geonames.org/",
  "options": [
    { "name": "fileentry", "value": "PE.txt" },
    { "name": "header", "value": "geoid,name,asciiname,altnames,lat,long,feature-class,feature-code,cc,cc2,admin1,admin2,admin3,admin4,pop,elev,dem,tz,mod" }
  ]
}

DataSource.connectUsing("geospec-pe.spec");
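The specification file distinguishes query parameters from path parameters, which are substituted for placeholders in the URL path. Below is a hypothetical sketch of building the final URL; the `{name}` placeholder syntax, the class `UrlTemplate`, and the sorted order of query parameters are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: substitute user-supplied path parameters into placeholders in a
// URL path and append query parameters. The "{name}" placeholder syntax is an
// assumption for this example, not the specification file's documented syntax.
class UrlTemplate {
    static String build(String template, Map<String, String> pathParams, Map<String, String> queryParams) {
        String url = template;
        for (Map.Entry<String, String> e : pathParams.entrySet()) {
            // path segments use %20 for spaces, unlike form-encoded query strings
            url = url.replace("{" + e.getKey() + "}", encode(e.getValue()).replace("+", "%20"));
        }
        if (url.contains("{")) {
            throw new IllegalArgumentException("A required path parameter was not supplied: " + url);
        }
        StringBuilder sb = new StringBuilder(url);
        char sep = url.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : new TreeMap<>(queryParams).entrySet()) {  // sorted for stable output
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```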


SCHEMAS & SIGNATURES

▸ Primitive, List, or Structure

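Schemas ($\sigma$) are either primitive, lists, or compound with fields. Signatures ($\tau$) are a tagged primitive, list, or class constructor with fields:

$$
\begin{aligned}
&\text{(base type)} && B ::= \texttt{boolean} \mid \texttt{int} \mid \texttt{String} \mid \ldots \\
&\text{(constructor)} && C ::= \text{any Java class name} \\
&\text{(schema)} && \sigma ::= \ast \mid [p\,\sigma] \mid \{f_0 p_0 : \sigma_0, \ldots\} \\
&\text{(signature)} && \tau ::= \tau_B \mid [\tau] \mid C\{f_0 : \tau_0, \ldots\} \\
&\text{(conversion)} && h ::= \mathit{parse}_B(\sigma) \mid h(\sigma[i]) \mid h(\sigma.p) \mid \texttt{new}\ C(h_0, \ldots) \mid \texttt{new list}[h_0, \ldots]
\end{aligned}
$$

$\sigma \parallel \tau \Rightarrow h$ means schema $\sigma$ unifies with signature $\tau$ to produce a conversion expression $h$.

prim-prim

$$\frac{}{\ast \parallel \tau_B \Rightarrow \mathit{parse}_B(\sigma)}$$

prim-singleton-comp

$$\frac{\ast \parallel \tau \Rightarrow h}{\ast \parallel C\{f : \tau\} \Rightarrow \texttt{new}\ C(h(\sigma))}$$

list-list

$$\frac{\sigma \parallel \tau \Rightarrow h}{[\sigma] \parallel [\tau] \Rightarrow \texttt{new list}([h(\sigma_0), \ldots])}$$

list-strip

$$\frac{\sigma \parallel \tau \Rightarrow h}{[\sigma] \parallel \tau \Rightarrow h(\sigma_0)}$$

wrap-list

$$\frac{\sigma \parallel \tau \Rightarrow h \qquad \sigma \text{ is not a list schema}}{\sigma \parallel [\tau] \Rightarrow \texttt{new list}([h(\sigma)])}$$

comp-strip

$$\frac{\sigma \parallel \tau \Rightarrow h}{\{f p : \sigma\} \parallel \tau \Rightarrow h(\sigma.p)}$$

comp-comp

$$\frac{\sigma_i \parallel \tau_i \Rightarrow h_i}{\{f_0 p_0 : \sigma_0, \ldots, f_n p_n : \sigma_n, g_0 g_0 : \sigma_{n+1}, \ldots\} \parallel C\{f_0 : \tau_0, \ldots, f_n : \tau_n\} \Rightarrow \texttt{new}\ C(h_0(\sigma.p_0), \ldots)}$$

list-comp-weird

$$\frac{\sigma \parallel \tau_i \Rightarrow h_i}{[\sigma] \parallel C\{f_0 : \tau_0, \ldots\} \Rightarrow \texttt{new}\ C(h_0(\sigma[0]), \ldots)}$$

list-comp-weird should only be applied after list-strip fails.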

The following data is available:
  A structure with fields:
  {
    row : A list of:
            A structure with fields:
            {
              Address_1 : *
              Electricity_Use_-_Grid_Purchase_kWh : *
              Energy_Cost_ : *
              ...
              Natural_Gas_Use_therms : *
              Property_GFA_-_Self-Reported_ft : *
              Property_Id : *
              Property_Name : *
              ...
              Weather_Normalized_Site_EUI_kBtu-ft : *
              Year_Ending : *
            }

ds.fetch("Prop", "row/Property_Name", "row/Year_Ending", "row/Energy_Cost_");
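To make the unification rules concrete, the sketch below implements them directly, returning the conversion expression h as a string rather than instantiating objects. It is a hypothetical illustration, not the library's code: class names are invented, and signature fields are matched against schema fields by plain name instead of by path. Trying list-comp-weird only after list-strip fails follows the note under the rules; the remaining order of attempts (stripping before wrapping) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the unification rules: unify(s, t, x) returns the conversion
// expression h as a String (x names the data being converted), or null if no rule applies.
// Simplification: signature fields are matched to schema fields by plain name, not by path.
class Unify {
    interface Schema {}
    static final class Prim implements Schema {}                       // *
    static final class ListS implements Schema {                       // [sigma]
        final Schema elem;
        ListS(Schema elem) { this.elem = elem; }
    }
    static final class Comp implements Schema {                        // {f0 : sigma0, ...}
        final Map<String, Schema> fields;
        Comp(Map<String, Schema> fields) { this.fields = fields; }
    }

    interface Sig {}
    static final class PrimSig implements Sig {                        // tau_B
        final String base;
        PrimSig(String base) { this.base = base; }
    }
    static final class ListSig implements Sig {                        // [tau]
        final Sig elem;
        ListSig(Sig elem) { this.elem = elem; }
    }
    static final class ClassSig implements Sig {                       // C{f0 : tau0, ...}
        final String cls;
        final LinkedHashMap<String, Sig> fields;
        ClassSig(String cls, LinkedHashMap<String, Sig> fields) { this.cls = cls; this.fields = fields; }
    }

    static String unify(Schema s, Sig t, String x) {
        if (s instanceof Prim) {
            if (t instanceof PrimSig) return "parse_" + ((PrimSig) t).base + "(" + x + ")";  // prim-prim
            if (t instanceof ClassSig && ((ClassSig) t).fields.size() == 1) {                // prim-singleton-comp
                String h = unify(s, ((ClassSig) t).fields.values().iterator().next(), x);
                return h == null ? null : "new " + ((ClassSig) t).cls + "(" + h + ")";
            }
            return wrapList(s, t, x);
        }
        if (s instanceof ListS) {
            Schema e = ((ListS) s).elem;
            if (t instanceof ListSig) {                                                      // list-list
                String h = unify(e, ((ListSig) t).elem, x + "[i]");
                return h == null ? null : "new list([" + h + ", ...])";
            }
            String h = unify(e, t, x + "[0]");                                               // list-strip
            if (h != null || !(t instanceof ClassSig)) return h;
            ClassSig c = (ClassSig) t;                                                       // list-comp-weird
            List<String> args = new ArrayList<>();
            int i = 0;
            for (Sig ti : c.fields.values()) {
                String hi = unify(e, ti, x + "[" + i++ + "]");
                if (hi == null) return null;
                args.add(hi);
            }
            return "new " + c.cls + "(" + String.join(", ", args) + ")";
        }
        Map<String, Schema> fs = ((Comp) s).fields;
        if (t instanceof ClassSig) {                                                         // comp-comp
            ClassSig c = (ClassSig) t;
            List<String> args = new ArrayList<>();
            for (Map.Entry<String, Sig> f : c.fields.entrySet()) {
                Schema si = fs.get(f.getKey());
                String hi = (si == null) ? null : unify(si, f.getValue(), x + "." + f.getKey());
                if (hi == null) break;
                args.add(hi);
            }
            if (args.size() == c.fields.size()) return "new " + c.cls + "(" + String.join(", ", args) + ")";
        }
        if (fs.size() == 1) {                                                                // comp-strip
            Map.Entry<String, Schema> only = fs.entrySet().iterator().next();
            String h = unify(only.getValue(), t, x + "." + only.getKey());
            if (h != null) return h;
        }
        return wrapList(s, t, x);
    }

    // wrap-list: a non-list schema matched against a list signature
    private static String wrapList(Schema s, Sig t, String x) {
        if (!(t instanceof ListSig)) return null;
        String h = unify(s, ((ListSig) t).elem, x);
        return h == null ? null : "new list([" + h + "])";
    }
}
```

For a schema shaped like the property listing above, with `row` holding a list of structures, unifying against a list signature for a hypothetical `Prop` class with `String` fields `Property_Name` and `Year_Ending` and data name `d` yields `new list([new Prop(parse_String(d.row[i].Property_Name), parse_String(d.row[i].Year_Ending)), ...])`: comp-strip steps into `row`, list-list maps over its elements, and comp-comp builds each `Prop`.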


UNIFICATION


BART, ET AL. FIGURE 2

Figure 2: A simple program demonstrating the Java Earthquake library

import java.util.List;
import java.util.HashSet;
import realtimeweb.earthquakeservice.main.EarthquakeService;
import realtimeweb.earthquakeservice.domain.Earthquake;

public class EarthquakeDemo {

    public static void main(String[] args) throws EarthquakeException {
        // Use the EarthquakeService library
        EarthquakeService es = EarthquakeService.getInstance();

        es.connect(); // Remove to use the local cache

        // 5 minute delay, but if we use the cache no delay is needed!
        int DELAY = 5 * 60 * 1000;

        HashSet<Earthquake> seenQuakes = new HashSet<Earthquake>();

        // Poll service regularly
        while (true) {
            // Get all earthquakes in the past hour
            List<Earthquake> latest = es.getEarthquakes(History.ALL);
            // Check if this is a new earthquake
            for (Earthquake e : latest) {
                if (!seenQuakes.contains(e)) {
                    // Report new earthquakes
                    System.out.println("New quake!");
                    seenQuakes.add(e);
                }
            }
            // Delay to avoid spamming the weather service
            Thread.sleep(DELAY);
        }
    }
}

Figure 3: A simple program demonstrating the Racket Weather library

(require "weather-offline.rkt")

; string -> string
; Consumes a city and returns whether its
; current temperature is "hot" or "cold"
(define (report-weather city)
  (cond [(< 70 (get-temperature city))
         "hot"]
        [else "cold"]))

; We use the offline version of the library
; for consistent check-expects
(check-expect (report-weather "Nome, AK")
              "cold")
(check-expect (report-weather "Miami, FL")
              "hot")

• API documentation and student-oriented user guides for each language and library.

• Alternative datasets for the internalized data cache (e.g., instead of business reviews from around “Blacksburg, VA”, there might be another dataset for “Indianapolis, IN”).

• Reduced variants of the libraries for targeted assignments.

• Example assignments that use the library.

All resources are open-source and fully supported. They are being continuously refined and extended.

3.3 Prototyping Tool

An important byproduct of our project is the creation of an online tool for rapidly prototyping new libraries. Most of the code used in our libraries follows the same pattern for any given language. First, requests are made to a web service and raw data is returned (typically as XML or JSON). Next, the data is parsed into some intermediary, semi-structured form using dictionary and list types that are native to the language. Finally, the data is encoded into a read-only class or struct, depending on the disposition of the lan-

EQUIVALENT

A. Earthquake Program Code

The following is the complete source code of a program duplicating the behavior of EarthquakeDemo from Figure 2 of [4].

import big.data.*;
import java.util.Date;
import java.util.HashSet;
import java.util.List;

public class EarthquakeDemo {
    public static void main(String[] args) {
        int DELAY = 5; // 5 minute cache delay

        DataSource ds = DataSource.connectJSON("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson");
        ds.setCacheTimeout(DELAY);
        ds.load();
        ds.printUsageString();

        HashSet<Earthquake> quakes = new HashSet<Earthquake>();

        while (true) {
            ds.load(); // this only actually reloads data when the cache times out
            List<Earthquake> latest = ds.fetchList("Earthquake",
                    "features/properties/title", "features/properties/time",
                    "features/properties/mag", "features/properties/url");
            for (Earthquake e : latest) {
                if (!quakes.contains(e)) {
                    System.out.println("New quake!... " + e.description + " (" + e.date() + ") info at: " + e.url);
                    quakes.add(e);
                }
            }
        }
    }
}

class Earthquake { // this class may be instructor-provided, or left to students to define as an exercise
    String description;
    long timestamp;
    float magnitude;
    String url;

    public Earthquake(String description, long timestamp, float magnitude, String url) {
        this.description = description;
        this.timestamp = timestamp;
        this.magnitude = magnitude;
        this.url = url;
    }

    public Date date() {
        return new Date(timestamp);
    }

    @Override
    public boolean equals(Object o) { // introductory CS students would probably implement a simpler version of this
        if (o == null || o.getClass() != this.getClass())
            return false;
        Earthquake that = (Earthquake) o;
        return that.description.equals(this.description)
                && that.timestamp == this.timestamp
                && that.magnitude == this.magnitude;
    }

    @Override
    public int hashCode() { // technically, hashCode() should be overridden if equals() is
        return (int) (31 * (31 * this.description.hashCode() + this.timestamp) + this.magnitude);
    }
}

Sample output of this program is provided on the next page. Note that this is a continuously-running program. Once it prints out the initial current set of reports, it continues to loop, reporting new events as they happen. A number of interesting extensions of the program are possible as programming assignments: running various statistical analyses, tracking locations with the most frequent quakes, predicting the time of the next event, visualization of magnitudes, etc.
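The comment in `equals` notes that students would likely write something simpler. For comparison, here is a compact variant of the same class using `java.util.Objects`; only the fields and equality methods are shown. It compares the same three fields (the URL is ignored, as in the original) and hashes exactly those fields, so equal objects have equal hash codes. It is nearly but not exactly equivalent: `Float.compare` treats `NaN` as equal to itself and distinguishes `0.0f` from `-0.0f`, unlike `==`.

```java
import java.util.Objects;

// Sketch of a compact equals/hashCode pair for the Earthquake class, using
// java.util.Objects. Equality uses description, timestamp, and magnitude (not url),
// and hashCode hashes exactly those fields, keeping the two methods consistent.
class Earthquake {
    final String description;
    final long timestamp;
    final float magnitude;
    final String url;

    Earthquake(String description, long timestamp, float magnitude, String url) {
        this.description = description;
        this.timestamp = timestamp;
        this.magnitude = magnitude;
        this.url = url;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Earthquake)) return false;
        Earthquake that = (Earthquake) o;
        return timestamp == that.timestamp
                && Float.compare(magnitude, that.magnitude) == 0
                && Objects.equals(description, that.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(description, timestamp, magnitude);
    }
}
```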
