Hackathon Student Pack
By Fethi Rabhi (31/3/2016). Email comments/suggestions to: [email protected] Latest version of this document is available at: http://www.cse.unsw.edu.au/~fethir/HackathonInfo/
Contents
1 Business Challenge
1.1 Background
1.2 The business opportunity to solve: dimensions of the problem
2 Datasets
2.1 Overview
2.2 Thomson Reuters Datasets
2.3 Other datasets
3 IBM software
3.1 Accessing IBM Blue Mix
3.2 DashDB
3.3 R and R studio
3.4 Watson Analytics
3.5 News processing
3.6 Spark Service
3.7 Internet of Things
4 Thomson Reuters
5 Data Elements and Relationships
5.1 Data Dictionary/Glossary
5.2 Known relationships between datasets
1 Business Challenge
1.1 Background
No other Australian and New Zealand industry is as naturally positioned as agribusiness to benefit from regional and global growth heading into the future. As populations grow, as much of the world increases its food intake, and as diets globally evolve to seek new products (or perhaps return to old ones), the demand for food can only continue to increase. In Australia, until relatively recently, agri was almost taken for granted: food was never in short supply, and most Australians identified in some way with this as a land of sheep farms and cattle stations, despite never having been to one. In recent years, however, driven particularly by global food security awareness beginning around the mid 2000s, the focus on Australian agriculture’s true level of potential, competitive advantage, and national benefits has risen to a whole new level of sophistication. Which countries and consumers will buy our products? What prices and economic value are likely to be generated from this? What primary or processed food products should Australia seek to produce in future?
The idea of the challenge is to use all the public data on this sector (macro-economic indicators, production volumes, weather patterns, prices and news are only a handful) so that whole new possibilities into what will drive this industry going forward can be investigated. The potential benefits arising from this work should not be underestimated. The results will assist farmers, agribusinesses, and government in making medium and long term decisions, to allow the sector to reach its full potential. This means a clear path to greater innovation across the Agri supply chain, more people employed, greater incentive for new capital to flow into the sector, an acceleration in the development of ground-breaking Agri technology and management techniques, and greater revenue flowing back to Australia, particularly to rural and regional areas.
1.2 The business opportunity to solve: dimensions of the problem
ANZ’s Super Regional Strategy relies on a strong network and presence across 34 countries. We are currently the leading trade bank in Australia & New Zealand (our home markets), providing financial services that link Australia and New Zealand to the world. Part of our value proposition and competitive advantage involves using our country network to intermediate trade flows and connect our customers within key economic trade corridors.
Dimension 1 – Agribusiness Trade Flows
Three important Agribusiness Trade Flows that we see as key growth sectors, and therefore as having high potential for our business, are:
• Beef
• Grains
• Dairy
Dimension 2 – Trade Corridors between our home markets (Aust/NZ) and Asia
Five important Trade Corridors linking Australia and New Zealand to Asia where we see ANZ playing an important role are:
1. Aust/NZ – China
2. Aust/NZ – Japan
3. Aust/NZ – Korea
4. Aust/NZ – Hong Kong
5. Aust/NZ – Singapore
And finally ... The Business Problem
Help ANZ to understand the value (volume & price) of these three Agribusiness Trade Flows as they leave Australia and New Zealand.
Help answer questions like: where do the Agribusiness Trade Flows go? Who participates in these value chains in our target Trade Corridors? That is, find connectivity and map out participant networks (country, company or individual) along the value chains (who buys from whom) within our Trade Corridors. This could take the form of visualisation, analysis, historical view, prediction, forecast, causality analysis or any other form of analysis that is regarded as potentially beneficial to ANZ.
Examples
Determine the drivers of the Agribusiness Trade Flows using analytical models
Visualise the movements and connections in the network of participants using open source tools
Bring life to the value chain by showing trends and movements of supply side data (e.g. weather, production) and network data (e.g. shipping, distribution) together with demand side data (e.g. population changes, rate of urbanisation)
For ideas/discussions around these topics from ANZ, see: https://bluenotes.anz.com/
2 Datasets
2.1 Overview
At a high level, the type of content can be categorized as follows (name, then description):
Commodity pricing: Commodity pricing 1960-2015 for the 3 commodities (forecast to 2020 if available)
Beef
  • Numbers of live cattle (including breeders and feeders breakdown)
  • Price & volume of feeder cattle
  • Price & volume of breeder cattle
  • Size of herd (beef/dairy split)
  • Area dedicated to beef/dairy
Grains
  • Regional wheat & barley prices
  • Prices from the Chicago Board Of Trade (CBOT), Chicago Mercantile Exchange (CME) and regional pricing
  • Production output
  • Area dedicated to grains
Dairy
  • Commodity pricing: Whole Milk Powder (WMP), Skim Milk Powder (SMP), Anhydrous Milk Fat (AMF), Cheese
  • Stock levels in countries of interest
Macro-economic indicators
  • FX currencies
  • Oil prices
  • Bond market rates reflecting credit regimes
  • Credit default swaps
  • Bank Bill Swap Rates (BBSY) / London Interbank Offered Rate (Libor)
  • Central bank rates
  • Stock market indices
  • Inflation rates
Significant regulatory changes in countries of interest: Dates
Population dynamics
  • Population growth
  • Spread of dependants (>65, <18), women vs men
  • Urbanisation
Weather: Historical weather patterns in countries of interest (including rainfall, temp, soil moisture, etc)
Shipping and logistics
  • Shipping rates, volumes, number of vessels, available capacity
  • Baltic index
News: Thomson News Archive
2.2 Thomson Reuters Datasets
The datasets that are specific to TR and which are made available for the Hackathon are listed below. Each entry gives the dataset name and description, followed by its BlueMix tables (schema TRDATA).
TR_RNA: Reuters News
  Tables: Available on request
TR_MER: Merchandise Exports with FOB (Free on Board) Value
  Tables: TBL024 TBL025 TBL026 TBL027 TBL028
  Tables (AU State to Country): TBL029 TBL030 TBL031 TBL032 TBL033 TBL034 TBL035
TR_GOS: Goods or Service Exports Value Statistics
  Tables: TBL015 TBL016 TBL017 TBL018 TBL019 TBL020 TBL021 TBL022 TBL023
TR_SER: Service Exports
  Tables: (not listed)
TR_FUT: Exchange traded commodity futures
  • Exchange-traded beef/dairy futures on CME
  • Exchange-traded grains futures on ASX
  • Exchange-traded dairy futures on NZX
  Tables: TBL001 TBL002 TBL003 TBL004 TBL005 TBL006
TR_PFRM: Australian General Purpose Wheat Pro Farmer pricing data per port (OTC)
  Tables: Available on request
TR_YCI: Young Cattle Index
  Tables: Available on request
TR_PSHP: Shipping prices data (the ones tracked by Reuters are Corn/Maize, Wheat, Barley, Rice, Sorghum, Oat, Rye, Livestock). Links every pair of port codes (Reuters) to the prices of a commodity between them
  Tables: TBL064 (DAMPIER_TO_QINGDAO__ORE); others available on request
TR_WEA: Weather information
  Columns: location varchar(50), dt date, min_temp decimal(10,2), max_temp decimal(10,2), avg_temp decimal(10,2), precip decimal(10,2)
  The data is for 1/1/2006 through to 31/12/2015.
  To date the following areas have been loaded: Australian Capital Territory - Bal (AUS AC), Barwon (AUS VI), Brisbane (AUS QL), Canberra (AUS AC), Central (AUS WA), Central Highlands (AUS VI), Central West (AUS NS), Darwin (AUS NT), Far West (AUS NS), Goulburn (AUS VI), Greater Hobart (AUS TS), Hunter (AUS NS), Kimberley (AUS WA), Loddon (AUS VI), Lower Great Southern (AUS WA), Mallee (AUS VI), Midlands (AUS WA), Murray (AUS NS), Murrumbidgee (AUS NS), North Western (AUS NS)
  Tables: TBL106
TR_ITR: International Trade (Provide more information about this data)
  Tables: TBL015 TBL016 TBL017 TBL018 TBL019 TBL020 TBL021 TBL022 TBL023 TBL024 TBL025 TBL026 TBL027 TBL028 TBL029 TBL030 TBL031 TBL032 TBL033 TBL034 TBL035
TR_PCN: Port Congestion
  Tables: TBL062
TR_CFD: Company Financials & Pricing
  • Headline Financials: Revenue, Profit, EBITDA (excluding Interest, Taxes, Depreciation and Amortization), Margin, Market Cap
  • Historical Pricing
  Tables: Available on request
TR_SHP: Shipping
  • Listing of all ports in each of the relevant trade corridors
  • Historical shipping cost between ports for commodities
  • Shipping vessel by port, listed by destination over a historical period: ship name and manifest description; departure date and arrival date; ship type, sub-type, dimensions, deadweight tonnage (DWT)
  Tables: Available on request
TR_FX: Foreign Exchange Rates
  Tables: TBL070
TR_OIL: Oil Prices
  • Brent Crude (ICE)
  • WTI Light Crude (NYMEX)
  Tables: TBL080
TR_CDS: Credit Default Swaps
  Tables: Available on request
TR_BND: Macro Economic Indicators: Bonds
  • Government Bonds
  Tables: TBL061
TR_CBR: Central Bank Rates
  Tables: TBL082
TR_NDX: Stock Market Indices
  Tables: TBL073
TR_REG: Regulatory changes. Key financial regulatory changes across countries covered by the hackathon
  Tables: Available on request
TR_INF: Inflation Rates / CPI
  Tables: TBL047_1 TBL047_2 TBL048_1 TBL048_2 TBL049_1 TBL049_2 TBL050_1 TBL050_2 TBL051_1 TBL051_2 TBL052_1 TBL052_2 TBL053_1 TBL053_2 TBL054_1 TBL054_2 TBL055_1 TBL055_2 TBL056_1 TBL056_2 TBL057_1 TBL057_2 TBL058_1 TBL058_2 TBL059_1 TBL059_2 TBL060_1 TBL060_2 TBL074 TBL086 TBL087_1 TBL087_2 TBL088_1 TBL088_2
TR_POP: Population dynamics. Historical Population Data in relevant countries
  • Population growth (Annual %)
  • Urban population (% of Total)
  • Age of dependency ratio (% of working age population)
  Tables: TBL081
TR_LSD: Livestock/Livestock Product Receipts
  Tables: TBL007
TR_LVS: Livestock Export Statistics, Livestock Production Statistics
  Tables: TBL091 TBL092_1 TBL092_2 TBL093 TBL094_1 TBL094_2 TBL095_1 TBL095_2 TBL096_1 TBL096_2 TBL097 TBL098_1 TBL098_2 TBL099_1 TBL099_2 TBL100 TBL101_1 TBL101_2 TBL102 TBL103_1 TBL103_2 TBL104 TBL105
TR_AGG: Agricultural Data
  • Source of Income
  • Labour Costs
  • Grants
  • Labour Data
  • Crop types
  • Additional Data
  Tables: TBL038 TBL039 TBL040 TBL041 TBL042 TBL043 TBL044 TBL045 TBL046
TR_FAN: Fleet analysis
  • Bulker Dry Weight Tonnage
  • Bulker Total by Number
  • Containership total by TEU
  Tables: TBL066
TR_COM: Company Identifiers. Information about company
  Tables: TBL089 TBL090
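As an illustration of working with the TR_WEA layout described above, the following is a minimal sketch that rolls daily weather rows up to monthly totals per location. It only reuses the column names and types given for TR_WEA (table TBL106). Everything else is an assumption made for the example: the sample rows are made up, dates are written in ISO form rather than the d/m/yyyy form used in this document, and an in-memory SQLite database stands in for DashDB so the snippet can be run anywhere.

```python
import sqlite3

# Column names and types follow the TR_WEA description; SQLite is only a
# stand-in for DashDB, and the rows below are made-up sample values.
SCHEMA = """
CREATE TABLE TBL106 (
    location  VARCHAR(50),
    dt        DATE,
    min_temp  DECIMAL(10,2),
    max_temp  DECIMAL(10,2),
    avg_temp  DECIMAL(10,2),
    precip    DECIMAL(10,2)
)
"""

SAMPLE_ROWS = [
    ("Brisbane (AUS QL)", "2015-01-01", 21.0, 30.0, 25.5, 4.0),
    ("Brisbane (AUS QL)", "2015-01-02", 22.0, 31.0, 26.5, 0.0),
    ("Brisbane (AUS QL)", "2015-02-01", 21.5, 29.5, 25.5, 12.5),
    ("Darwin (AUS NT)", "2015-01-01", 25.0, 33.0, 29.0, 20.0),
]

# Total rainfall and mean of the daily average temperature, per location and month.
MONTHLY_QUERY = """
SELECT location,
       strftime('%Y-%m', dt) AS month,
       SUM(precip)           AS total_precip,
       AVG(avg_temp)         AS mean_temp
FROM TBL106
GROUP BY location, month
ORDER BY location, month
"""


def monthly_weather(rows):
    """Load daily rows into an in-memory table and return the monthly summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO TBL106 VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(MONTHLY_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_weather(SAMPLE_ROWS):
        print(row)
```

Monthly aggregates of this kind are easier to line up with production or pricing series than raw daily readings. Note that `strftime` is SQLite's date function, so the query's date handling would need adapting before running it against DashDB.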
2.3 Other datasets
There is trading data available from Quandl (https://www.quandl.com/). The main agencies interested in this type of challenge are ABARES, ABS and the Department of the Environment. The datasets that are identified for the Hackathon are:
Each entry gives the dataset name, then the data source, description and references.
ABS_TRD
  Data source: ABS
  Description:
  • Supply by Product Group by Industry
  • Employment by Industry
  • Allocation of Imports
  • International Trade Pricing
  References: See ABS Web site
ABS_PRD
  Data source: ABS
  Description: Producer Price Indices
  References: See ABS Web site
ABS_STA
  Data source: ABS
  Description: Agricultural statistics:
  • Employment
  • Crop Data: Area, Production, Yield
  • Livestock Data: Production, Number Slaughtered
  • International Trade Pricing
  References: See ABS Web site
ABA_BEEF
  Data source: Department of Agriculture
  Description:
  • Australian supply and use of beef and veal
  • Australian cattle numbers, by state and territory
  • Volume of Australian exports of beef, veal and live cattle, by destination
  • Value of Australian exports of beef and veal, and live cattle
  • Prices for Australian beef and veal on principal overseas markets
  • World cattle numbers, by country
  • World beef and veal production, by country
  • Volume of trade in beef and veal, by selected countries
  • Summary of Japanese beef and veal statistics
  • Summary of Korean beef and veal statistics
  References: See ABARES Web site
WB_AGR
  Data source: World Bank
  Description:
  • Food production
  • Land Use
  • Agriculture statistics
  References: http://data.worldbank.org/topic/agriculture-and-rural-development
UN_TRD
  Data source: United Nations
  Description: Trade flows
  References: http://comtrade.un.org/
3 IBM software
3.1 Accessing IBM Blue Mix
A dedicated Blue Mix environment for the Hackathon is accessible from the following link: https://console.anz-blue-art-lab.au-south.bluemix.net/
There is a collection of great tutorials on Bluemix services: http://www.ibm.com/developerworks/cloud/bluemix/services.html
3.2 DashDB
After logging in, click on “Dashboard”, then “ANZ-HACKATHON-DATA” (under services), then “Launch”. Then click on “Tables” (in the left hand side menu). Choose the schema “ANZTRDATA” and all the tables listed earlier will be visible (see example below).
For DashDB related Bluemix apps, see http://www.ibm.com/developerworks/topics/dashdb%20service
3.3 R and R studio
Information on how to use R within a BlueMix environment is available from: https://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/learn_how/explorer_Dynamite.html
How to deploy an R Shiny app to Bluemix: http://www.ibm.com/developerworks/library/ba-bluemix-trs-predictive-analytics-with-dashdb/
3.4 Watson Analytics
A data visualisation and predictive analytics tool to enable students to quickly analyze and better understand the data. A separate account has been provided to all teams to use Watson.
Trade off analytics demo: http://tradeoff-analytics-demo.mybluemix.net/
3.5 News processing
1. AlchemyNews API: http://docs.alchemyapi.com/docs/getting-started-1
2. Twitter Insights app that was demonstrated by Shamim: https://github.com/IBM-Bluemix/insights-search.git. It is based on the Bluemix Insights for Twitter service (https://console.ng.bluemix.net/catalog/services/insights-for-twitter)
3.6 Spark Service
Provides easy access to the power of built-in machine learning libraries without the challenges of managing a Spark cluster independently. The Notebooks functionality also improves productivity for coding and running analytics. Information is available at: https://console.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html#analyticsforapachespark
This service is not available from the dedicated BlueMix but from the public Bluemix (https://console.ng.bluemix.net). Here is some information on using DashDB with Apache Spark on Bluemix: https://developer.ibm.com/clouddataservices/docs/spark/tutorials-and-samples/load-and-analyze-dashdb-data-with-spark/
3.7 Internet of Things
Source code for Shamim’s Internet of Things demo: http://www.shamimhossain.com/2016/01/control-star-wars-bb-8-droid-using-mqtt.html
4 Thomson Reuters
The following facilities are available:
Eikon terminal: available to all teams. You can book a session anytime. The terminal can be used to obtain any Thomson Reuters dataset of the same type as the ones on BlueMix DashDB.
Open Calais (http://www.opencalais.com/): processes text and returns: Entities, Topic codes, Events, Relations and SocialTags.
5 Data Elements and Relationships
5.1 Data Dictionary/Glossary
ABARES: http://www.agriculture.gov.au/abares
Age of dependency ratio: one of the characteristics of population, expressed as % of working age population
AGPW Wheat Type: as defined by AGPW. E.g. “AGP1” refers to a particular type of wheat.
Agricultural Future Contract: see Future Contract
Agricultural Future Contract RIC: specific Reuters code to identify an exchange-traded future contract, takes the form <exchange>-<commodity>-<expiry convention>
AMF: Anhydrous Milk Fat
Australian Agricultural and Grazing Industries Survey (AAGIS): conducted by ABARES, contains Production and Pricing Data for Beef, Dairy and Grains.
Australian Bureau of Statistics (ABS): http://www.abs.gov.au/
Australian General Purpose Wheat RIC: takes the form <Australian Port Code (AGPW)>-<Wheat Type>-<Current Year>. For example, "ALBN-AGP1-CY1" refers to “AGP1” type of wheat shipped from Albany in the current year.
Australian Port Code (AGPW): AGPW code where wheat is shipped from (17 codes). E.g. “ALBN” is Albany.
Beef: one of the commodities targeted by the Hackathon
Beef pricing: there are two categories, Feeder Cattle and Live Cattle. Pricing can be obtained from CME futures contracts. Young Cattle Index can be obtained from Meat and Livestock Australia
Bonds: important Economic Indicator
Bushel: dry measure of yield (in pounds or hundredweights) of wheat and other crops.
Central Bank Rates: important Economic Indicator
CPI: Consumer Price Index, there are many indices that represent inflation rates.
Commodity: commodity involved in the Hackathon. Could be Grain, Dairy or Beef
Commodity origin: where the commodity is produced. In the Hackathon, we are only concerned with commodities produced in Australia (origin code = “AU”)
Commodity pricing: can be obtained from Futures Exchanges (ASX, CME and NZX) or OTC data.
Commodity RIC: specific Reuters code to identify a commodity mostly used in OTC trading, takes the form <commodity root>-<commodity subtype>-<commodity origin>. For example “MEAT-EYCI-AU” refers to a particular type of meat produced in Australia.
Commodity Root: code used by Reuters to identify a commodity. E.g. “MEAT”.
Comtrade database: see UN Comtrade database
Crop data: important Production Indicator
Current Year: convention used by Reuters, CY1 means current year (e.g. 2016), CY2 means next year (e.g. 2017) etc.
Dairy: one of the commodities targeted by the Hackathon, includes Whole Milk Powder (WMP), Skim Milk Powder (SMP), Anhydrous Milk Fat (AMF), Cheese.
Dairy pricing: can be obtained from NZX/CME future contracts.
Demand: important Market Indicator, represents the sum of domestic consumption and exports
Department of Agriculture: see ABARES
Economic indicator: anything that affects the trade flows
Employment figures: important Production Indicator
End stock: important Market Indicator, defines the difference between supply and demand
Exchange: in this context, refers to an exchange where future contracts are exchanged. Reuters codes for exchanges concerned by Hackathon are Chicago Mercantile Exchange (IMM) for Beef and Dairy, ASX (AX) for Grains and NZ Exchange (NZX) for Dairy.
Exchange prices: these prices are determined via the exchange
Exchange-traded: trading occurs via an exchange.
Expiry convention: Reuters uses a special convention to represent expiry dates
Feeder Cattle: Category of beef, represents weaned calves 600-800 lbs
Foreign Exchange Rates: important Economic Indicator
Future Contract: in this context, a future contract on a commodity concerned by the Hackathon
Future Contract RIC: a convention used by Reuters to identify future contracts, e.g. “1YVWK6” and “1YVWN6” refer to future contracts on “Mill Wheat” with expiry date May and July 2016 respectively.
Grains: can be Soybean, Wheat or Corn. Wheat is one of the commodities targeted by the Hackathon.
Grains pricing: Wheat and Barley pricing can be obtained from ASX futures contracts. Profarmer Australia provides Barley and Wheat prices per Australian port
Imports: direct allocation and indirect allocation of imports
Inflation Rate: measured by CPI, it is an important Economic Indicator
International Trade Pricing
Land use: important Production Indicator
Live Cattle: Category of beef with average weight 1250 lbs
Margin: part of Company Financials & Pricing (headline financials)
Market Cap: part of Company Financials & Pricing (headline financials)
Meat and Livestock Australia: http://www.mla.com.au/Home
News: a news item that affects all entities
Oil prices: important Economic Indicator
Open stock: excess supply from the previous year
OTC: Over The Counter
OTC pricing: prices agreed between two partners directly with each other (as opposed to exchange pricing)
Planting Process: important to understand this process for analysing weather risk
Population: affects commodity pricing (demand side)
Population growth: expressed as an annual %, one of the characteristics of population
Port Code (Reuters): Reuters code to identify ports. E.g. “TS7309539121” is Albany.
Profarmer Australia: primary contributor of Australian grain (wheat and barley) pricing, they ship out of 17 ports in Australia and price each separately.
Production: affects commodity pricing (supply side)
Profit: part of Company Financials & Pricing (headline financials), excludes Interest, Taxes, Depreciation and Amortization
Producer Price Index: is a weighted index of prices measured at the wholesale, or producer level. It is an important Production Indicator
Regulatory changes: key financial regulatory changes
Revenue: part of Company Financials & Pricing (headline financials)
RIC: Reuters Instrument Code, used to identify a type of asset class and financial instrument in Thomson Reuters datasets
SMP: Skim Milk Powder
Southern Oscillation Index (SOI): important Weather Indicator. Gives an indication of the development and intensity of El Niño or La Niña events in the Pacific Ocean (values below −7 indicate El Niño and values above +7 indicate La Niña).
Stock Market Index: important Economic Indicator
Stock to use: ratio between end stock and consumption. A high number indicates oversupply and low prices.
Supply: important Market Indicator, is the sum of production and imports.
Trade flow: movement of commodities between countries. One important database is the UN Comtrade database
UN Comtrade database: contains important information about trade flows between countries
Urban population: one of the characteristics of population, expressed as % of total population
Weather: important Production Indicator as it affects the yield and quality of agricultural commodities
Weather Indicator: e.g. SOI
Weather risk: requires knowledge about planting and harvesting times
Wheat Planting Process: depends on wheat type, typically October/November for a harvest in June.
Wheat Type: there are different classifications. The general one has 5 categories that depend on the planting season: Hard Red Winter, Soft Red Winter, Hard Red Spring, White Wheat, Durum Wheat. In Australia, AGPW has its own codes (see AGPW Wheat Type)
WMP: Whole Milk Powder
World Bank: provides many statistics on agriculture such as land usage and production
Yield: crop yield.
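Several glossary entries describe mechanical conventions that are easy to get wrong by hand: the hyphen-separated three-part RICs, the Current Year codes, the supply/demand identities and the SOI thresholds. The sketch below is one possible way to encode them. The function names and the "neutral" label are hypothetical, the quantities in the usage section are made up, and reading "consumption" in the stock-to-use ratio as domestic consumption is an assumption, since the glossary does not say which consumption figure is meant. The splitter only covers the hyphenated forms shown in the glossary examples (such as "ALBN-AGP1-CY1"), not compact future contract codes such as "1YVWK6".

```python
from datetime import date


def split_ric(ric):
    """Split a hyphen-separated, three-part RIC into its fields."""
    parts = ric.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated fields, got {ric!r}")
    return tuple(parts)


def resolve_current_year(code, today=None):
    """Translate 'CY1' into the current calendar year, 'CY2' into the next, etc."""
    if not (code.startswith("CY") and code[2:].isdigit() and int(code[2:]) >= 1):
        raise ValueError(f"not a Current Year code: {code!r}")
    base_year = (today or date.today()).year
    return base_year + int(code[2:]) - 1


def supply(production, imports):
    """Supply as defined in the glossary: production plus imports."""
    return production + imports


def demand(domestic_consumption, exports):
    """Demand as defined in the glossary: domestic consumption plus exports."""
    return domestic_consumption + exports


def end_stock(production, imports, domestic_consumption, exports):
    """End stock: the difference between supply and demand."""
    return supply(production, imports) - demand(domestic_consumption, exports)


def stock_to_use(end_stock_value, consumption):
    """Ratio between end stock and consumption (assumed: domestic consumption)."""
    if consumption <= 0:
        raise ValueError("consumption must be positive")
    return end_stock_value / consumption


def classify_soi(soi):
    """Apply the glossary thresholds: below -7 El Niño, above +7 La Niña."""
    if soi < -7:
        return "El Niño"
    if soi > 7:
        return "La Niña"
    return "neutral"  # hypothetical label for values between the thresholds


if __name__ == "__main__":
    port, wheat_type, year_code = split_ric("ALBN-AGP1-CY1")
    print(port, wheat_type, resolve_current_year(year_code, date(2016, 3, 31)))
    print(split_ric("MEAT-EYCI-AU"))
    # Made-up quantities in arbitrary units, only to exercise the identities.
    remaining = end_stock(production=100, imports=20, domestic_consumption=70, exports=30)
    print(remaining, stock_to_use(remaining, consumption=70))
    print(classify_soi(-9.5))
```

Passing `today` explicitly keeps the Current Year resolution reproducible, which matters when historical records are decoded some time after they were written.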
5.2 Known relationships between datasets
The known relationships are illustrated in the figure below.