Carlos Rivero Predictive Analytics Presentation

Posted on 22-Jan-2018

Page 1: Carlos Rivero Predictive Analytics Presentation

Predictive Analytics

Integrating Permit Information,

Vessel Monitoring, and Fishery

Observer Programs

Carlos Rivero

Southeast Fisheries Science Center

Page 2

Data Systems Overview

PIMS (Permit Information Management System)

• Southeast Regional Office (St. Petersburg, FL)

• PostgreSQL Database Management System

VMS (Vessel Monitoring System)

• Office of Law Enforcement (Silver Spring, MD)

• Oracle Database Management System

Gulf Shrimp Observer Program

• Southeast Fisheries Science Center (Galveston, TX)

• Microsoft Access

GIS (Geospatial Information System)

• Multiple Sources

• Various Storage Formats (Shapefiles, Grids, Excel files, MS Access databases, Oracle, JPEG images, Binary & ASCII Raster)

Page 3

Permit Information Management

PIMS (Permit Information Management System)

• PostgreSQL Database Management System

• 146 Transactional Tables

• Access via a local replicated database (Disaster Recovery Server)

• MS Access configured to read and extract data (ODBC)

• Migrated data tables to Oracle 11g RDBMS

• Primary Tables Include:

1. TBL_REQMIT (Permits and Requests)

2. TBL_VESSELS (Vessel Characteristics)

3. TBL_FISHERY_TYPE (Fishing Industry)

4. TBL_REQMIT_STATUS (Permit Status)

Page 4

Vessel Monitoring System

VMS

• Oracle RDBMS

• 108 Transactional Tables

• Accessed via a local replicated database and directly via DBLink

• Receives nightly updates (~80,000 records) with a 4-day lag

• FMC_POS is the primary table of interest, containing:

  ◦ ID

  ◦ LAT_LON (SDO_GEOMETRY)

  ◦ UTC_DATE

  ◦ COURSE

  ◦ SPEED

  ◦ TRACK (SDO_GEOMETRY)

  ◦ RADIO (vessel identifier)

Page 5

Gulf Shrimp and Reef Fish Observer Program

Observer Data

• Microsoft Access RDBMS

• Data were manipulated to create TRIPS and TOWS tables:

  ◦ The TRIPS table records when each trip started and ended. This information is used to extract the trip's locations from the warehouse.

  ◦ The TOWS table identifies when trawling occurred, which defines the target variable. It is used to assign this behavior to the locations previously extracted.

TRIPS                    TOWS
Vessel Official Number   Vessel Official Number
Trip Number              Trip Number
Trip Start               Tow Number
Trip End                 Time In
Number of Days           Time Out
Number of Tows/Sets      Location
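A minimal Python sketch of how the TRIPS and TOWS tables work together: VMS pings are restricted to the trip window, then flagged as fishing when they fall inside one of that trip's Time In/Time Out tow windows. The record classes and field names are hypothetical, and the sketch assumes the pings already carry the Vessel Official Number (the VMS table identifies vessels by RADIO, so a mapping step would be needed first).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Ping:
    """One VMS position (hypothetical subset of FMC_POS columns)."""
    vessel_id: str      # assumption: RADIO already mapped to the Vessel Official Number
    utc_date: datetime  # FMC_POS.UTC_DATE
    lat: float
    lon: float


@dataclass
class Trip:
    vessel_id: str
    trip_number: int
    trip_start: datetime
    trip_end: datetime


@dataclass
class Tow:
    vessel_id: str
    trip_number: int
    tow_number: int
    time_in: datetime
    time_out: datetime


def label_trip_pings(pings, trip, tows):
    """Return (ping, fishing) pairs for one observer trip.

    A ping is kept when it comes from the trip's vessel and falls inside
    [trip_start, trip_end]; it gets fishing = 1 when it also falls inside
    any [time_in, time_out] window of that trip's tows, otherwise 0.
    """
    trip_tows = [t for t in tows
                 if t.vessel_id == trip.vessel_id and t.trip_number == trip.trip_number]
    labeled = []
    for p in pings:
        if p.vessel_id != trip.vessel_id:
            continue
        if not trip.trip_start <= p.utc_date <= trip.trip_end:
            continue
        fishing = int(any(t.time_in <= p.utc_date <= t.time_out for t in trip_tows))
        labeled.append((p, fishing))
    return labeled
```

Tows are matched on both vessel and trip number so that a tow from another trip of the same vessel cannot label these pings.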

Page 6

Geospatial Information System

Bathymetry

• High Resolution Coastal

• Low Resolution Global

Distance from Shore

Direction to Shore

Speed
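These layers are attached to each VMS location by raster cell value. The sketch below shows that lookup for a north-up grid in geographic coordinates; the Raster class, its fields, and the layer names are hypothetical, and a real implementation would read the grid origin and cell size from the source file or perform the join inside the database.

```python
import math
from dataclasses import dataclass


@dataclass
class Raster:
    """North-up grid in geographic coordinates (hypothetical layout).

    values[row][col] holds the cell value; row 0 is the northernmost row.
    """
    x_min: float        # longitude of the west edge of the grid
    y_max: float        # latitude of the north edge of the grid
    cell_size: float    # cell width and height, in degrees
    values: list
    nodata: float = math.nan

    def cell_value(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or nodata if outside."""
        col = math.floor((lon - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - lat) / self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        return self.nodata


def attach_layers(lon, lat, layers):
    """Sample every named raster layer at one VMS location."""
    return {name: raster.cell_value(lon, lat) for name, raster in layers.items()}
```

Locations outside a layer's extent get NaN rather than a silently wrong edge value, which matters when mixing the high-resolution coastal and low-resolution global bathymetry.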

Page 7

Geospatial Data Warehouse

1. Assign the fishery permit to each VMS location (by Vessel_ID and Date).

2. Spatially join bathymetry, distance from shore, and direction to shore to each VMS location (raster cell value).

3. Organize facts and dimensions according to the data warehouse design.

4. Populate a materialized view that combines the relevant data elements into one master table.

5. Identify which locations pertain to each observer trip. Assign the target variable (FISHING) a value of 1 for each location within a tow's TIME_IN to TIME_OUT window; all other locations receive 0.
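Step 1 amounts to a date-range join between permits and VMS locations. A minimal sketch, assuming each permit record has a vessel identifier, a fishery code, and inclusive effective and expiration dates; these field names are hypothetical, since the actual TBL_REQMIT columns are not shown on the slides, and the 0/1 indicator encoding is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Permit:
    """Hypothetical permit record; the real TBL_REQMIT columns may differ."""
    vessel_id: str
    fishery_code: str   # e.g. "ADW", "RS", "WRK", "KM"
    effective: date     # assumed first valid day
    expiration: date    # assumed last valid day (inclusive)


def permits_on(permits, vessel_id, day):
    """Fishery codes held by vessel_id on the given day, sorted for stable output."""
    return sorted({p.fishery_code for p in permits
                   if p.vessel_id == vessel_id and p.effective <= day <= p.expiration})


def permit_indicators(codes, all_codes):
    """One 0/1 column per fishery code (assumed encoding for model inputs)."""
    held = set(codes)
    return {code: int(code in held) for code in all_codes}
```

Because a vessel can hold several permits at once, each VMS location may map to more than one fishery code, which is why the result is a set of codes rather than a single value.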

Page 8

Distribution of VMS Locations

Page 9

Bathymetry by Fishery Code

Page 10

Distance From Shore by Fishery Code

Page 11

Vessel Speed (knots) by Fishery Code

Page 12

Suspected Fishing Locations

(Using Speed & Bathymetry as Primary Criteria)

Page 13

Predictive Analytics

1. Upload the shrimp (trawling) training data and import it into SAS Enterprise Miner.

2. Partition the data into training and validation segments that preserve the original distribution of the target:

BEHAVIOR      VALUE   SHRIMP
FISHING       1       43.69%
NOT FISHING   0       56.31%

3. Develop regression and decision tree models to predict fishing behavior. An AutoNeural (neural network) model was not used for this project because the resulting variable coefficients must be interpretable.

4. Compare the models to determine which is most effective at predicting fishing behavior.
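Step 2 describes a stratified partition: each behavior class is split separately, so both segments keep the 43.69% / 56.31% mix. SAS Enterprise Miner handles this internally; the plain-Python sketch below only illustrates the idea. The confusion-matrix counts on the Model Comparisons slide (3,520 locations per partition) are consistent with a 50/50 split, which is used as the default here; the function name and seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.5, seed=0):
    """Split rows into (train, validation) so each class keeps its original share."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)          # fixed seed for a reproducible partition
    train, valid = [], []
    for label in sorted(by_class):
        group = by_class[label][:]     # copy so the caller's rows are not reordered
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid
```

Splitting within each class, rather than shuffling all rows at once, guarantees that the validation segment is not accidentally richer or poorer in fishing locations than the training segment.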

Page 14

Model Pathway

Additional data were not scored because the regression model's validation misclassification rate (0.38551) is relatively high. The decision tree model had a similar misclassification rate (0.38636). The model must be refined before it is applied in an operational context.

Page 15

Trawling Regression Model

1. The regression model established that the following variables were most useful in predicting shrimp trawling behavior.

Parameter      DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq  Std. Estimate or Exp(Est)
Intercept      1   -2.7513   0.6571          17.53            <0.0001     0.064
ADW            1   -0.5844   0.0574          103.75           <0.0001     0.557
Bathymetry     1   -0.00663  0.00105         40.12            <0.0001     -0.1403
Freezer        1   0.3899    0.0584          44.64            <0.0001     1.477
Fuel Capacity  1   -0.00004  7.32E-6         23.12            <0.0001     -0.1666
Gross Weight   1   -0.00490  0.00236         4.31             0.0378      -0.0630
Longitude      1   -0.0355   0.00643         30.44            <0.0001     -0.1276
RS             1   0.2542    0.0922          7.60             0.0058      1.289
Steel Hull     1   -0.1832   0.0590          9.64             0.0019      0.833
WRK            1   0.7395    0.1003          54.38            <0.0001     2.095

The last column reports the standardized estimate for the continuous inputs (Bathymetry, Fuel Capacity, Gross Weight, Longitude) and exp(Estimate) for the intercept and the indicator inputs (ADW, Freezer, RS, Steel Hull, WRK); for the indicators this is an odds ratio, e.g. exp(0.7395) ≈ 2.095 for WRK.

Page 16

Trawling Regression Equation

Log-odds of fishing (FISHING = 1) =

-2.7513

+ (-0.5844*ADW)

+ (-0.00663*Bathymetry)

+ (0.3899*Freezer)

+ (-0.00004*Fuel Capacity)

+ (-0.0049*Gross Weight)

+ (-0.0355*Longitude)

+ (0.2542*RS)

+ (-0.1832*Steel Hull)

+ (0.7395*WRK)

Variable Influence

Intercept Negative

ADW Permit Negative

Depth (neg. meters) Positive

Freezer (Y/N) Positive

Fuel Capacity Negative

Gross Weight Negative

Longitude (neg. degrees) Positive

RS Permit Positive

Steel Hull Negative

WRK Permit Positive
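The equation gives the log-odds of fishing; the probability follows from the inverse logit, P = 1 / (1 + exp(-log-odds)). A minimal sketch using the slide's coefficients, assuming the indicator inputs (ADW, RS, WRK, Freezer, Steel Hull) are coded 0/1. If the model was fit with effect (deviation) coding, which SAS Enterprise Miner's Regression node can use, those inputs would need to be -1/+1 instead. Bathymetry is in negative meters and Longitude in negative degrees, per the influence table; the units of Fuel Capacity and Gross Weight are not stated on the slides.

```python
import math

# Coefficients as printed on the slide (log-odds scale).
INTERCEPT = -2.7513
COEFFICIENTS = {
    "ADW": -0.5844,          # permit indicator, assumed 0/1
    "Bathymetry": -0.00663,  # negative meters
    "Freezer": 0.3899,       # Y/N, assumed 1/0
    "Fuel Capacity": -0.00004,
    "Gross Weight": -0.0049,
    "Longitude": -0.0355,    # negative degrees
    "RS": 0.2542,            # permit indicator, assumed 0/1
    "Steel Hull": -0.1832,   # Y/N, assumed 1/0
    "WRK": 0.7395,           # permit indicator, assumed 0/1
}


def log_odds(inputs):
    """Linear predictor: intercept plus coefficient * value for every input.

    Raises KeyError if an input is missing rather than silently using 0.
    """
    return INTERCEPT + sum(coef * inputs[name] for name, coef in COEFFICIENTS.items())


def fishing_probability(inputs):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-log_odds(inputs)))
```

With every input at zero the probability is the inverse logit of the intercept, about 0.060; deeper water and more westerly longitudes (both more negative) raise the log-odds, matching the "Positive" influence listed for depth and longitude.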

Page 17

Trawling Regression Fit Statistics

Page 18

Trawling Regression Iteration Plot

Page 19

Trawling Decision Tree

Page 20

Decision Tree Explained

1. If LATITUDE is >= 35.165, there is a 60.7% chance that the vessel is fishing.

2. If LATITUDE is < 35.165, there is a 40.1% chance that the vessel is fishing.

3. If LATITUDE is < 35.165 and LONGITUDE < -81.045, there is a 44.7% chance that the vessel is fishing. Furthermore, if the vessel has a KM permit, there is a 67.3% chance that the vessel is fishing, as opposed to a 43.4% chance if the vessel does not have a KM permit.

4. If LATITUDE is < 35.165 and LONGITUDE > -81.045, there is a 32.1% chance that the vessel is fishing. If the NET_WEIGHT of the vessel is also less than 69.5 tons, there is a 41.5% chance that the vessel is fishing. In addition, if the vessel's speed is >= 0.105 knots, there is a 47.2% chance that it is fishing. If the speed is < 0.105 knots, the LONGITUDE must be greater than -79.955 degrees for the chance of fishing to reach 83.3%.

5. On the other hand, if the NET_WEIGHT of the vessel is >= 69.5 tons, there is a 24.3% chance that the vessel is fishing. In addition, if the HOLD_CAPACITY of the vessel is less than 14,000 pounds, there is a 52.0% chance that the vessel is fishing. Furthermore, if the DISTANCE to the closest shore is < 7,394 meters, the vessel is 100% likely to be fishing, as opposed to 40.0% likely if the distance is >= 7,394 meters.
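The rules above can be transcribed as nested conditions that return the stated probability of fishing at the deepest node described. This is a sketch of the slide's text, not the exported SAS model. Two assumptions are ours: the complement of LONGITUDE < -81.045 is written as >= so every location lands in a branch, and where the slide gives no probability for a branch (speed < 0.105 knots with LONGITUDE <= -79.955, and HOLD_CAPACITY >= 14,000 pounds), the function falls back to the parent node's stated probability.

```python
def tree_fishing_probability(latitude, longitude, has_km_permit, net_weight_tons,
                             speed_knots, hold_capacity_lb, distance_to_shore_m):
    """Probability of fishing from the decision-tree rules as described on the slide."""
    if latitude >= 35.165:
        return 0.607
    if longitude < -81.045:
        return 0.673 if has_km_permit else 0.434
    # Remaining branch: LATITUDE < 35.165 and LONGITUDE >= -81.045 (32.1% overall).
    if net_weight_tons < 69.5:
        if speed_knots >= 0.105:
            return 0.472
        if longitude > -79.955:
            return 0.833
        return 0.415  # leaf not given on the slide: parent node value (assumption)
    if hold_capacity_lb < 14000:
        return 1.0 if distance_to_shore_m < 7394 else 0.400
    return 0.243      # leaf not given on the slide: parent node value (assumption)
```

Writing the tree out this way makes it easy to check which inputs actually drive each leaf; note that only one path (low net weight, slow speed, east of -79.955) reaches a probability above 0.8.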

Page 21

Trawling Decision Tree Fit Statistics

Page 22

Trawling Decision Tree Iteration Plot

Page 23

Trawling Decision Tree

Important Variables

Page 24

Model Comparisons

(Event Classification)

Model          Role        False Negative  True Negative  False Positive  True Positive
Decision Tree  Train       1079            1742           240             459
Decision Tree  Validation  1084            1706           276             454
Regression     Train       1033            1688           294             505
Regression     Validation  1033            1658           324             505
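The misclassification rates quoted for the models follow directly from these counts: (false negatives + false positives) / total. The short Python sketch below recomputes them, along with sensitivity, specificity, and the share of fishing locations; the dictionary layout and function name are illustrative.

```python
# (false_negative, true_negative, false_positive, true_positive), from the table above
COUNTS = {
    ("Decision Tree", "Train"): (1079, 1742, 240, 459),
    ("Decision Tree", "Validation"): (1084, 1706, 276, 454),
    ("Regression", "Train"): (1033, 1688, 294, 505),
    ("Regression", "Validation"): (1033, 1658, 324, 505),
}


def rates(fn, tn, fp, tp):
    """Summary rates for one confusion matrix."""
    total = fn + tn + fp + tp
    return {
        "misclassification": (fn + fp) / total,
        "sensitivity": tp / (tp + fn),      # share of fishing locations caught
        "specificity": tn / (tn + fp),      # share of non-fishing locations correctly rejected
        "fishing_share": (tp + fn) / total, # error rate of always predicting NOT FISHING
    }


for (model, role), counts in COUNTS.items():
    r = rates(*counts)
    print(f"{model:13} {role:10} " + " ".join(f"{k}={v:.5f}" for k, v in r.items()))
```

This reproduces the validation misclassification rates (1,357 / 3,520 = 0.38551 for regression, 1,360 / 3,520 = 0.38636 for the decision tree). Fishing locations are 1,538 of the 3,520 validation records (43.69%), so always predicting NOT FISHING would misclassify 0.4369; on validation, the regression model identifies about 33% of fishing locations (505 of 1,538) and the tree about 30% (454 of 1,538).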

Page 25

Model Comparisons

(Event Classification)

[Chart: counts of True Positive, False Positive, True Negative, and False Negative (y-axis 0 to 4,000) for Decision-Train, Decision-Validate, Regression-Train, and Regression-Validate]

Page 26

Model Selection

(Validation Misclassification Rate)

Selected  Model          Misclassification Rate  Average Squared Error
Y         Regression     0.38551                 0.22852
N         Decision Tree  0.38636                 0.22573
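The selection criterion matters here because the two fit statistics disagree. A small sketch using the table's validation values: it ranks the models by misclassification rate (the criterion named on this slide) and by average squared error, taken here in its common form, the mean of (y - p)^2 over observations for a 0/1 target y and predicted probability p. SAS Enterprise Miner's exact ASE computation may differ in detail.

```python
VALIDATION = {
    "Regression": {"misclassification": 0.38551, "ase": 0.22852},
    "Decision Tree": {"misclassification": 0.38636, "ase": 0.22573},
}


def average_squared_error(targets, probabilities):
    """Mean of (y - p)^2 for binary targets y in {0, 1} and predicted P(y = 1)."""
    if len(targets) != len(probabilities) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((y - p) ** 2 for y, p in zip(targets, probabilities)) / len(targets)


selected = min(VALIDATION, key=lambda m: VALIDATION[m]["misclassification"])  # "Regression"
lowest_ase = min(VALIDATION, key=lambda m: VALIDATION[m]["ase"])              # "Decision Tree"
```

Ranked by validation misclassification, regression wins by 0.00085; ranked by average squared error, the decision tree would have been chosen. The two models are nearly tied on both criteria.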

Page 27

Next Steps

1. Develop observer data warehouse

2. Link VMS/Permit and Observer data warehouses

3. Use the observer data to determine fishing vs. non-fishing locations for all programs (pelagics, reef fish, shrimp, sharks)

4. Develop, test, and validate program specific models

5. Incorporate model output into operational scoring routine

6. Use validated models to quantify fishing effort