2017.04.06 historical shipping data and machine learning


TRANSCRIPT

Page 1: 2017.04.06 Historical Shipping Data and Machine Learning

Historical shipping data and machine learning Whitaker Institute Research Day NUI Galway 6th April 2017

Aidan Kane (NUI Galway)
Peter M Solar (Vesalius College, VUB and Université Saint-Louis-Bruxelles)

Note: data/graphs throughout reflect work-in-progress, not final results, and so should not be relied upon or cited.

Page 2: 2017.04.06 Historical Shipping Data and Machine Learning

Outline

• Database of steamships c.1810 to c.1860 built in Britain and Ireland
  • 59,000 observations
  • from c.250 sources
  • up to 50 fields for each observation

• Database of ships 1770s to 1850 listed on Lloyd’s Register of Shipping
  • transcription from originals: sample of about 260,000 observations (A, B ships most years, full sample every 5 years)
  • up to 40 fields for each observation

• How many distinct vessels in each dataset?
  • => domain knowledge/hand matching and/or machine learning
  • probabilistic record linkage?
  • (with INSIGHT, IT at NUI Galway)

Page 3: 2017.04.06 Historical Shipping Data and Machine Learning

Why?

• shipping as a key technology of the industrial revolution
  • how did technological dimensions evolve?
  • how long did vessels stay in service?
  • engine size, propulsion/hull – wood to iron/speed?
  • in which sectors were vessels adopted?
  • impacts on international trade
• indicators of regional industrial activity/skills
  • how many built, and where? who built them?
• a good application of some basic database technologies
  • and perhaps more
• picks up an interesting strand in econometric tradition of economic history ...

Page 4: 2017.04.06 Historical Shipping Data and Machine Learning

Journal of the American Statistical Association, 53 (1958), 360-381

Page 5: 2017.04.06 Historical Shipping Data and Machine Learning

sparking a debate ... (TLS (1966), pp. 899, 948, 1016)

Page 6: 2017.04.06 Historical Shipping Data and Machine Learning

Hughes
• “His complaint that we missed some ships is likely to be of no quantitative importance if our 1,945 ships contain no error larger than a good sampling error. If our ships turned out to be as good as a random sample, all our conclusions would hold without modification no matter how many ships we missed.”
• “I rather doubt, from what he says, that he ever read our paper.”
• “The question Mr Craig must answer is this: What differences do the missing numbers make? The question cannot be answered by grousing about the political background, only by hard quantitative work.”

Craig
• “I have never seen this phrase (permanent British registry) used before, and I would ask Professor Hughes which side of the Atlantic it comes from.”
• “…econometric historians, now that their work of quantification is rapidly expanding, should exercise the greatest care in the utilization of statistical material collected in the past.”

Page 7: 2017.04.06 Historical Shipping Data and Machine Learning

On steamships

• Rather than 1,945 steamships built in the period, we think probably closer to 6,000
  • wider range of sources
  • not just one cross section
  • focus not just on those ships involved in international trade
  • also coastal trade, tugs, river/canal craft/postal/military
  • attempt to trace whole life cycle, including ‘reason out of service’

Page 8: 2017.04.06 Historical Shipping Data and Machine Learning

When steamships die

wrecked, sold Denmark, sold France, driven ashore, converted, sold foreign, sold Russia, sank, lost, grounded, broken up, sold Germany, register still open, sold Australia, sold Spain, sold Portugal, captured, sold Bahamas, sold Gibraltar, destroyed in hurricane, sold Italy, missing, sold Trinidad, hulked, disposed of, boiler explosion, abandoned, sold Finland, sold Switzerland, foundered, sold New Zealand, sold India, sold Poland, scrapped, blown up, sold Brazil, sold Admiralty, sold Turkey, sold Netherlands, sold Naples, sold Greece, sold Norway, sold Moulmein, exploded, stranded, sold Canada, sold Colombia, engine transferred, sold Hong Kong, register closed, inland navigation, last note, sold Egypt, sold Mauritius, sold South America, sold Austria, sold Argentina, sold Japan, sold Uruguay, condemned, burnt, collision, sold East India Co, sold Sardinia, beached, sold Antigua, sold Belgium, sold Africa, sold Malta, sold Tripoli, struck off--owners not found, scuttled, sold Columbia, sold Chile, sold Sicily, sold China, engine removed, sold Burma, sold Sweden, sold Demerara, sold New Brunswick, laid up, withdrawn, sold Ecuador, sold Singapore, sold Mexico, sold Batavia, sold Australian, out of commission, still in existence, disarmed, last observed, ceased, sold Peru, sold Argentine, explosion, sold government, sold Hamburg, defective, failed experiment, sold Prussia, sold USA, sold West Indies, sold Bermuda, still present, sold Quebec, out of registry, destroyed, out of service, sold Jamaica, cancelled, still in operation, out of register, sold Cochin China, closed, chartered France, sold Barbadoes, sold Barbados, sold Venezuela, written off, not steam, sold Asia, sold Vatican, sold to crown, sold Ionia, sold South Africa, under tonnage req, engine removed, condemned as prize, hull for sale, coal depot, broken up, sold Tuscany, deleted, sunk, sold Cuba, sold Ceylon, used as barge, certificate given up, sold Romania, sold Prince Edward Island, sold Gualdalquivir Co, sold Papal States, sold Borneo, converted to barge, sold Paraguay, chartered Turkey, sold Australia, then Batavia, sold Hayti, sold Haiti, still in service, store ship, dismantled, still in use, damaged, sold Belize, sold Honduras, dismasted, sold Thailand, existence in doubt, sold, converted to dredger, sold Trinity House, depot,

Page 9: 2017.04.06 Historical Shipping Data and Machine Learning

What’s in a name?

Faid Gahaad, Michael, Cock o`th North, Illawana, St Kieram, Ann Scarboro, Dreadnough, George and June, Jokka, Die Schone Mainaern, Robert and Ann, William and Charles, Pride of the North, Lady Kilburn, Seyd Pacha, Bellbird, Adaline, Candare, Plato, Bonny Dundee, Will-o`-the-Wisp, Sir C Napier, Revensbourne, Cock o`the North, Aquilla, Sun Flower, Forget-me-Note, Malukhoff, Thomas Roydon, Primer Argentino, Koh-i-nor, Normandy, Pellissier, Flying Childer, Earl of Malmsebury, Dune, August Louise, Saint Andrew, Saint Mungo, Storfursten, Chili, General Monanga, Earl of Elgin, Finland, Senator, Silma, Tribune, Lubeck, Earl Douglas, Guide, Lochfyne, Malcolm Brown, Almansor, Correo, Corriere Sicillano, Cotinguiba, Gotherburg, Graaf von Rechteren, Henry Wright, John M`Adam, Nicholai 1st, Santa Cruz, St Halvard, Warata, George the Fifth, Balder, Baron von Humboldt, Eclat, Folden, Havila, Will o` the Wisp, Camoes, Dom Alfonso, Protis, Memel Packet, Der Prusse, Persenunga, Camaragibe, Berlin, Prins Oscar, Queen of the West, Acoriano, Alexandre, Alku, Antonio Varas, Aphroezza, Aracaju, Attalante, Belgique, Boyana, De Cartes, East Anglia, Foz de Douro, General Pelissier, Gipsy King, Glommon, Great Contest, Gustave Pastor, Hadjaz , Hamburgh, Ilmarmen, Iron Master, Istanboul, Italo, Kielmansegge, Killingsworth, Kroonprinces Louise, L`Imperatrice,, Leopold 1, Lisbonne, Marco Bozzaris, Mariout, Marsella, Narwa, Nordstjernan, Oliva, Panelinion, Pelissier, Peninsular, Plexavra, Pytheas, Quarry Maid, Rio de Janeiro, Robert Henry, Sagres, Salama, Salaminia, Sphendone, St Petersburgh, Stromstad, Tilset, Union No 2, Collettis, Comte de Hainaut, Congress, Corra Linn, El Correo del Riff, Eptanissos, Estephania, Fauqui, George Olympius, Guillermina, Marie de Brabant, Restlass, Briliant, Unuique, Christiana, Wiliam Swan, Pauliina, Sea Nymh, Gosport, Belldog, Ocean Pride, Inuique, Wansbeek, Sir William Pool, Admiral Misulis, Tantullan, Cockerill, Izamados, Patrus, Al Hamy Pacha, Colietis, Rendell, Bridge, Hulls, Coventanter, During, Lady Seule, Orrisa, Cockerll, Taumados, Regus Ferreos, Barbudda, Juverne, William & John, Isabelle, Powerfull, Rosaric, Ann
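Many of the names above look like spelling variants of one another (for example "Pellissier" and "Pelissier", or "Cock o`th North" and "Cock o`the North"). A minimal sketch of how such near-duplicates might be flagged for hand checking follows. It uses the standard-library `difflib.SequenceMatcher`; the helper names and the 0.85 threshold are assumptions made for illustration, not the method used in the project.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case a name and reduce punctuation and spacing to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for pairs at or above the threshold.

    The threshold is an illustrative assumption; in practice it would be
    tuned against hand-matched examples.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


# A few names taken from the slide above.
names = ["Pellissier", "Pelissier", "Cock o`th North", "Cock o`the North", "Plato"]
for a, b, score in candidate_pairs(names):
    print(a, "|", b, "|", score)
```

Pairwise comparison grows quadratically with the number of names, so on a full dataset it would probably need to be restricted to candidates sharing, say, a build year or first letter.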

Page 10: 2017.04.06 Historical Shipping Data and Machine Learning
Page 11: 2017.04.06 Historical Shipping Data and Machine Learning
Page 12: 2017.04.06 Historical Shipping Data and Machine Learning
Page 13: 2017.04.06 Historical Shipping Data and Machine Learning
Page 14: 2017.04.06 Historical Shipping Data and Machine Learning
Page 15: 2017.04.06 Historical Shipping Data and Machine Learning
Page 16: 2017.04.06 Historical Shipping Data and Machine Learning

Database coverage

VARIABLE                Coverage   Augmented
Year built              96         99
Location built          90
Year out of service     71         94
Reason out of service   64
Tonnage                 93
Horsepower              86
Builder                 78
Engine-maker            50

Page 17: 2017.04.06 Historical Shipping Data and Machine Learning

[Figure: Flows: ships built and out of service. Series: disappearances, new civilian, new military, net build. Horizontal axis: 1810 to 1860; vertical axis: -400 to 500.]

Page 18: 2017.04.06 Historical Shipping Data and Machine Learning

[Figure: Steamship Building. Series: Database, Official statistics, Hughes & Reiter. Horizontal axis: 1810 to 1860; vertical axis: 0 to 500.]

Page 19: 2017.04.06 Historical Shipping Data and Machine Learning

[Figure: Diffusion of Iron Hulls. Horizontal axis: 1810 to 1860; vertical axis: 0 to 100.]

Page 20: 2017.04.06 Historical Shipping Data and Machine Learning

[Figure: Regional Diffusion of Iron Hulls. Series: Clyde, Tyne, Thames, Mersey, Other. Horizontal axis: 1815 to 1860; vertical axis: 0 to 100.]

Page 21: 2017.04.06 Historical Shipping Data and Machine Learning

Lloyd’s Register as a source

Easy to search for ships
Much harder to search for masters, owners
Mass of information on:
  Ship characteristics
  Shipbuilding
  Ownership
  Uses

Page 22: 2017.04.06 Historical Shipping Data and Machine Learning

Lloyd’s Registers as source

• Fairly comprehensive coverage of ships involved in U.K.’s foreign trade
  • Including ships owned abroad
• But for the coasting trade only a small share of ships, mainly larger vessels
• Relatively few steamships (alternative database)
• Standard format throughout period

Page 23: 2017.04.06 Historical Shipping Data and Machine Learning

MySQL database

Page 24: 2017.04.06 Historical Shipping Data and Machine Learning


Page 25: 2017.04.06 Historical Shipping Data and Machine Learning


Page 26: 2017.04.06 Historical Shipping Data and Machine Learning

MySQL database: data transferred from Excel, can be readily interrogated.
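To illustrate what "readily interrogated" could mean in practice, here is a small sketch of two queries: counting entries per register year, and searching by master, which the printed registers make hard. So that it is self-contained it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL. The table name, column names and rows are all invented for illustration and do not reflect the project's actual schema or data.

```python
import sqlite3

# In-memory SQLite database standing in for the project's MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout; the real schema is not shown in the slides.
conn.execute(
    "CREATE TABLE register_entries ("
    "register_year INTEGER, ship_name TEXT, master TEXT, place_built TEXT)"
)

# Invented placeholder rows, not real register entries.
rows = [
    (1800, "Example A", "Smith", "Whitby"),
    (1800, "Example B", "Jones", "Hull"),
    (1805, "Example A", "Smith", "Whitby"),
]
conn.executemany("INSERT INTO register_entries VALUES (?, ?, ?, ?)", rows)


def entries_per_year(connection):
    """Count register entries in each register year."""
    cur = connection.execute(
        "SELECT register_year, COUNT(*) FROM register_entries "
        "GROUP BY register_year ORDER BY register_year"
    )
    return cur.fetchall()


def search_by_master(connection, master):
    """Find every entry for a given master, across all register years."""
    cur = connection.execute(
        "SELECT register_year, ship_name FROM register_entries "
        "WHERE master = ? ORDER BY register_year, ship_name",
        (master,),
    )
    return cur.fetchall()


print(entries_per_year(conn))
print(search_by_master(conn, "Smith"))
```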

Page 27: 2017.04.06 Historical Shipping Data and Machine Learning

Standardising locations

Newfoundland: 58 variants:
• N fl'd, N fland, N flnd, N'fdld, N'fdlnd, N'fiand, N'fid, N'find, N'fl, N'flad, N'fland, N'fld, N'fln, N'flnd, N'fndld, N'fnld, N'wfld, N'wflnd, Newf, Newfd, Newfdl, Newfdld, Newfl, Newfld, Newflnd, Newfndld, Newfoundland, Nfdland, Nfdld, Nfdlnd, Nfdnd, Nfiand, Nfind, Nfl'd, Nfl'nd, Nflad, Nfland, Nfld, Nflnd, Nflndld, Nfndid, Nfndl, Nfndl'd, Nfndld, Nfndlnd, Nfnld, Nufld, Nwfd, Nwfdl, Nwfdld, Nwfl, Nwfl'd, Nwfl'nd, Nwfld, Nwflnd, Nwfndld, Nwfndlnd, Nwfnlnd

So far:
• 6826 distinct abbreviated locations
• 1135 standard locations
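One way to picture this standardisation step is as a lookup from abbreviated spellings to a standard name. The sketch below assumes a hand-curated variant table and a simple normalising key (lower-case, letters only) so that spellings differing only in spacing or apostrophes collapse together. The function names and the key rule are illustrative assumptions, and only a handful of the 58 Newfoundland variants are included.

```python
import re

# Hand-curated variants for one standard location (a small subset of the
# Newfoundland spellings listed on the slide).
VARIANTS = {
    "Newfoundland": ["N fl'd", "N'fland", "Newfld", "Nfld", "Nwfndlnd"],
}


def location_key(raw):
    """Reduce a spelling to lower-case letters only (assumed key rule)."""
    return re.sub(r"[^a-z]", "", raw.lower())


def build_lookup(variants):
    """Map the key of every known spelling to its standard location."""
    lookup = {}
    for standard, spellings in variants.items():
        for spelling in [standard, *spellings]:
            lookup[location_key(spelling)] = standard
    return lookup


def standardise(raw, lookup):
    """Return the standard location, or None if the spelling is unknown."""
    return lookup.get(location_key(raw))


LOOKUP = build_lookup(VARIANTS)
print(standardise("Nfl'd", LOOKUP))  # shares its key with "Nfld"
print(standardise("Nwfl", LOOKUP))   # not in the table: left for hand review
```

Returning `None` for unknown spellings keeps unresolved abbreviations visible for hand review instead of guessing at a match.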

Page 28: 2017.04.06 Historical Shipping Data and Machine Learning

Source potential: one example

• Hull protection:
  • Doubling: structural solidity
  • Sheathing: protection against shipworms and fouling

Page 29: 2017.04.06 Historical Shipping Data and Machine Learning

Doubling and sheathing: broad changes (%)

Year   Doubled   Sheathed (wood)   Sheathed (metal)
1779   9.8       36.0              0.1
1800   5.5       28.4              13.1
1820   2.8       15.5              24.9
1840   3.3       8.5               32.7
1860   0.8       4.0               45.9

Page 30: 2017.04.06 Historical Shipping Data and Machine Learning

Doubling by location of ships

Year   Baltic   White Sea   Fishing
1779   18.5     22.4        64.1
1800   10.3     20.0        37.6
1820   4.6      4.7         29.7
1840   1.9      6.3         26.7
1860   1.7      0.0         50.0

Stronger ships well before iron?

Page 31: 2017.04.06 Historical Shipping Data and Machine Learning

Sheathing by destination, 1779-1860

Destination         Sheathed
Coasting            10.5
Northern Europe     16.8
Southern Europe     53.1
Americas/Atlantic   72.0
Asia/Africa         91.4

... against the dangers in warmer waters

Page 32: 2017.04.06 Historical Shipping Data and Machine Learning

Next steps

• most steamship/LR data now done
• standardisation
  • linguistic, geographic similarity/closeness of names, locations especially – others?
• machine learning
  • for steamships (in a sense, already done)
    • cluster analysis
    • probabilistic record linkage – supervised/with ground-truthed data
  • for Lloyd’s Register
    • more open-ended
    • match year-to-year a starting point
    • matches/non-matches/uncertain
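As a rough illustration of probabilistic record linkage with a three-way outcome (matches/non-matches/uncertain), the sketch below scores a pair of records in the style of Fellegi-Sunter match weights: each field that agrees adds log2(m/u) and each that disagrees adds log2((1-m)/(1-u)). This is a standard textbook technique swapped in for illustration, not the project's method. The field names, the m and u probabilities, the 5% tonnage tolerance and the thresholds are all assumed values; with ground-truthed data they would be estimated, not guessed.

```python
from math import log2

# Assumed probabilities per field: m = P(agree | same vessel),
# u = P(agree | different vessels). Illustrative values only.
FIELDS = {
    "name": (0.95, 0.01),
    "year_built": (0.90, 0.05),
    "place_built": (0.85, 0.02),
    "tonnage": (0.80, 0.03),
}


def agree(field, a, b):
    """Field comparison; tonnage is allowed a 5% tolerance (an assumption)."""
    if field == "tonnage":
        return abs(a - b) <= 0.05 * max(a, b)
    return a == b


def match_weight(rec_a, rec_b):
    """Sum of log-likelihood-ratio weights over fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if agree(field, a, b):
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Three-way decision using assumed thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "uncertain"


# Invented example records, not taken from the database.
entry_1 = {"name": "example", "year_built": 1835, "place_built": "Glasgow", "tonnage": 200}
entry_2 = {"name": "example", "year_built": 1835, "place_built": "Glasgow", "tonnage": 205}
entry_3 = {"name": "example", "year_built": 1841}

print(classify(match_weight(entry_1, entry_2)))
print(classify(match_weight(entry_1, entry_3)))
```

For Lloyd’s Register, the same scoring could be applied to pairs of entries from consecutive register years, with the uncertain band sent for hand checking.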