THE 1ST NATIONAL PUC DOCKETS DATABASE:
AEE POWERSUITE
Eric Fitz, Director, Engineering and Product Development
NARUC Subcommittee on Information Services
November 2014
Advanced Energy Economy
AEE is a national association of business leaders who are making the global energy system more secure, clean, and affordable.
Mission: Transform public policy to enable rapid growth of advanced energy companies.
AEE Membership Across Technologies
Two Energy Policy Data Problems:
1) Fragmented Data
2) Big Data
INDUSTRY DATA IS FRAGMENTED
[Diagram: industry stakeholder groups each consult separate databases (NREL, EIA, DSIRE, OpenEI) and individual state PUCs.]
You must follow dozens of data sources to track important issues.
Big Data
Policy work is plagued by the three “Vs”:
• Volume of policy data
• Variety of legislative/regulatory processes
• Velocity of data change
AEE DIGITAL PLATFORM VISION
[Diagram: the databases (NREL, EIA, DSIRE, OpenEI) and state PUCs feed a single AEE Big Data Asset that serves industry stakeholder groups.]
The Solution – AEE’s PowerSuite
PowerSuite is a robust set of tools, including BillBoard, DocketDash, and PowerPortal, that lets you search, track, and collaborate on energy legislation and utility regulatory proceedings from across the country through one easy-to-use interface.
PowerSuite Products
Review of Features
Core Features
Search
• First national PUC database
• Advanced energy focused bills
• Simple interface
Track
• Email notifications
• Favorites
• Reporting
Collaborate
• Summaries
• Priority and Position
• Comments
User Testimonial
Jim Kennerly, Senior Policy Analyst
“PowerSuite is really amazing…I've already discovered some incentives in California (tax exemptions and such) we didn't even have in the database! This is really going to help us tremendously - great product.”
DEMO
DocketDash System Details
DocketDash Coverage: 46 States + DC
[Map legend: Under Development; Quality Assessment (QA) Pending; Review Completed]
DocketDash Key Stats
• Dockets: 190K
• Documents: 2.6M (900GB of PDFs)
• Pages: 32M (60GB of raw text)
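To put these figures in proportion, the short sketch below derives a few per-unit averages from them. The counts come straight from the slide; treating 1 GB as 10**9 bytes is an assumption, and the function name `key_stat_ratios` is made up for this illustration.

```python
# Figures from the Key Stats slide.
DOCKETS = 190_000
DOCUMENTS = 2_600_000
PAGES = 32_000_000

# Assumption: decimal units, 1 GB = 10**9 bytes.
RAW_TEXT_BYTES = 60 * 10**9
PDF_BYTES = 900 * 10**9


def key_stat_ratios():
    """Return rough per-unit averages implied by the slide's totals."""
    return {
        "documents_per_docket": DOCUMENTS / DOCKETS,
        "pages_per_document": PAGES / DOCUMENTS,
        "text_bytes_per_page": RAW_TEXT_BYTES / PAGES,
        "pdf_bytes_per_document": PDF_BYTES / DOCUMENTS,
    }


if __name__ == "__main__":
    for name, value in key_stat_ratios().items():
        print(f"{name}: {value:,.1f}")
```

Under that assumption the totals work out to roughly 14 documents per docket, about 12 pages per document, and a little under 2 KB of extracted text per page. These are averages only; the slide does not describe how sizes are distributed.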
Number of Pages: Wikipedia vs. DocketDash
[Bar chart, # pages in millions: Wikipedia 34M*, DocketDash 32M]
*As of November 2014, http://en.Wikipedia.Org/wiki/wikipedia:statistics
DocketDash will surpass Wikipedia’s total content in a few months.
DocketDash Technology Stack
[Diagram labels: PUCs; Bills; Dockets; stages Collect, Adapt, Process, Store, Index, Display (User Interface); AEE Big Data Asset]
Technology Stack Detail
Collect
• Dynamic docket metadata collection at off-peak hours (Docket #, Title, Description, Parties, Date...)
Adapt
• Map source schema to AEE standard
Download
• Queue downloads and identify scanned documents
Process
• OCR pipeline (OCR = Optical Character Recognition): Scanned Document > Extracted Text > Reassembled PDF
• 20 CPU-Years
Validate
• Review metadata and check for failures
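The Adapt, Download, and Validate steps can be illustrated with a minimal sketch. It is not AEE's implementation: the `Docket` fields, the source codes "AA" and "BB", their field names, and the 50-characters-per-page threshold are all hypothetical, chosen only to show how a per-source field map, a scanned-document heuristic, and a metadata check might fit together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Docket:
    """Hypothetical standard schema, loosely following the slide's metadata list."""
    source: str
    docket_number: str
    title: str
    description: Optional[str] = None
    date: Optional[str] = None


# Hypothetical field maps: source field name -> standard field name.
FIELD_MAPS = {
    "AA": {"ProceedingNo": "docket_number", "Caption": "title", "FiledDate": "date"},
    "BB": {"control_number": "docket_number", "style": "title", "summary": "description"},
}


def adapt(source, record):
    """Adapt step: rename one source's fields to the standard schema."""
    mapping = FIELD_MAPS[source]
    fields = {
        std: str(record[src]).strip()
        for src, std in mapping.items()
        if record.get(src)
    }
    return Docket(
        source=source,
        docket_number=fields.get("docket_number", ""),
        title=fields.get("title", ""),
        description=fields.get("description"),
        date=fields.get("date"),
    )


def looks_scanned(page_texts, min_chars_per_page=50):
    """Flag a document for OCR when its pages carry almost no embedded text.

    page_texts is assumed to hold the text already extracted from each page.
    """
    if not page_texts:
        return True
    average = sum(len(text.strip()) for text in page_texts) / len(page_texts)
    return average < min_chars_per_page


def validate(docket):
    """Validate step: list the problems found; an empty list means the record passed."""
    problems = []
    if not docket.docket_number:
        problems.append("missing docket number")
    if not docket.title:
        problems.append("missing title")
    return problems
```

For example, `adapt("AA", {"ProceedingNo": " 14-01-001 ", "Caption": "Rate case"})` yields a `Docket` with a trimmed docket number that passes `validate`, while a record lacking both number and title is reported with two problems. Keeping the field maps as data rather than code means a new source needs only a new mapping entry.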
What Have We Learned?
• PUC docket sites vary dramatically state by state
  • Usability
  • Permalinks
  • Search
  • Data structure
  • Nomenclature
  • Digital vs. paper system
• Creating a standardized docket system is hard
QUESTIONS?
Create an account today > PowerSuite.aee.net
PowerSuite is FREE for federal, state, and municipal government employees