Transcript
Page 1: GE Proficy Historian Data Compression Introduction Stephen Friedenthal EVSystems  sfriedenthal@evsystems.net

GE Proficy Historian Data Compression

Introduction

Stephen Friedenthal
sfriedenthal@evsystems.net

Page 2:

What is data compression?

There are two fundamental classes of file compression:

• Identify repeating elements (e.g., ZIP file compression)
  Pros: No loss of information; all original data is restored
  Cons: CPU intensive; data must be compressed and decompressed, and large files take a lot of time

• Identify redundant data that can be discarded (e.g., JPEG, dead-band, rate-of-change). This method is used by the GE Historian.
  Pros: Fast, reduces network traffic, well suited for streaming data
  Cons: Some data loss

Page 3:

Customer quotes when I ask them about compression:

“Disk space is cheap.”

“We don’t want to lose any data so we store everything.”

“Today’s computers are so fast there’s no penalty for storing everything.”

“We’re a regulated industry…. We aren’t allowed to use compression.”

From all of the above, you might come to believe that data compression is an antiquated response to a problem that no longer exists. Computers are fast, storage is cheap, so store everything.

Page 4:

Why compression is (still) important

• “Needle in the haystack” problem: it is much more difficult to find the truly interesting data
• Limited network bandwidth: storing terabytes of data is only useful if you can easily extract it
• High long-term costs: disk drives are “cheap”, but managing the data gets expensive
• Superior performance: storing the minimum necessary data greatly increases system performance and speed for clients and servers

Page 5:

GE Historian Compression Methods

The Proficy Historian has two forms of data compression:

• Collector Compression (CC): Also called “dead band” compression. It works by examining data and discarding any value that does not change by more than a defined limit (e.g., +/- 0.5 Deg F).

• Archive Compression (AC): Also called “rate of change” or “swinging door” compression. It works by examining data (after CC) and discarding any value that falls within a slope range (more on this later).

Page 6:

Collector Compression

[Figure: sample values (x) plotted around a stored sample. Labels: Dead band, Discarded samples, Stored sample, Constant slope line.]

Collector compression overview

• Pros:
  • Good at filtering out noise
  • Reduces data storage by 80% to 90+%
  • Easy to understand

• Cons:
  • Unable to reduce data when the slope (vs. the value) is unchanged (see the constant slope section of the figure above)
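As a rough illustration of the dead-band idea, here is a minimal Python sketch. It is not the Historian's implementation: the function name `deadband_filter`, the `(timestamp, value)` sample layout, and the reading of `limit` as the maximum absolute change from the last stored value (as in the +/- 0.5 Deg F example) are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value); layout assumed for this sketch


def deadband_filter(samples: Iterable[Sample], limit: float) -> List[Sample]:
    """Keep a sample only when it differs from the last stored value by more than `limit`."""
    stored: List[Sample] = []
    for t, v in samples:
        # The first sample is always stored; later ones must leave the band.
        if not stored or abs(v - stored[-1][1]) > limit:
            stored.append((t, v))
    return stored


# Noise around 70.0 stays inside a +/- 0.5 band, so only the first sample is stored.
noisy = [(0, 70.0), (1, 70.2), (2, 69.8), (3, 70.4), (4, 69.7)]
print(deadband_filter(noisy, 0.5))

# A steady ramp keeps leaving the band, so samples keep being stored
# even though the slope never changes.
ramp = [(t, 70.0 + 0.3 * t) for t in range(10)]
print(len(deadband_filter(ramp, 0.5)), "of", len(ramp), "ramp samples stored")
```

The ramp case is the weakness listed under Cons: the value keeps changing, so the dead band keeps triggering, although a straight line through the first and last samples would describe the data just as well.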

Page 7:

Archive Compression

Archive compression looks at the data after collector compression.

It only stores data that “changes direction” beyond a configured range.

• In effect, it stores data based on its rate of change. Compare this to collector compression, which stores data based on the amount of change.

Page 8:

Archive Compression Effect

[Figure: trend with stored and discarded values. Red values are stored; green values are discarded. Callouts: “Discarded by archive compression”; “Large change in slope, so value is stored”.]

Archive compression overview

• Pros:
  • Can significantly reduce storage for certain signal types and noise
  • Stores only the most relevant values

• Cons:
  • More difficult to tune
  • More difficult to understand

Page 9:

Archive Compression: A deeper dive

How does it compare to OSI’s Swinging Door compression?

Page 10:

OSI PI Swinging Door Compression

PI checks to see if all points lie inside the compression blanket, a dead band parallelogram drawn from end points using the CompDev as a tolerance. If any points fall outside the dead band, an archive event is triggered.

Even though the newest point is the one that falls outside the dead band, the previous point is the one that gets archived, because it is the last end point for which all points were inside the dead band.

Page 11:

Archive Compression vs. PI

OSI PI swinging door algorithm checks if a point is inside the parallelogram:

1) Calculate slope of upper line
2) Calculate upper y for this x.
3) Calculate slope of lower line
4) Calculate lower y for this x.
5) Check if point y is < upper y
6) Check if point y is > lower y

The GE Historian algorithm checks if the line between end points intersects the tolerance bar:

1) Calculate slope of this line
2) Calculate y for this x.
3) Calculate difference
4) Check if ABS difference < CompDev
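The two per-point checks can be compared side by side in a short Python sketch. This illustrates the geometry only and is not code from either product. It assumes the parallelogram is the line between the end points shifted up and down by CompDev; the function names and the `(time, value)` tuples are invented for the example.

```python
from typing import Tuple

Point = Tuple[float, float]  # (time, value); layout assumed for this sketch


def line_value(a: Point, b: Point, x: float) -> float:
    """Value at time x of the straight line through points a and b."""
    slope = (b[1] - a[1]) / (b[0] - a[0])
    return a[1] + slope * (x - a[0])


def inside_parallelogram(a: Point, b: Point, p: Point, comp_dev: float) -> bool:
    """Point-in-parallelogram check: is p between the upper and lower edges?"""
    # Assumption: the edges are the a-b line shifted vertically by +/- comp_dev.
    upper_y = line_value((a[0], a[1] + comp_dev), (b[0], b[1] + comp_dev), p[0])
    lower_y = line_value((a[0], a[1] - comp_dev), (b[0], b[1] - comp_dev), p[0])
    return lower_y < p[1] < upper_y


def line_hits_tolerance_bar(a: Point, b: Point, p: Point, comp_dev: float) -> bool:
    """Tolerance-bar check: does the line a-b pass within comp_dev of p?"""
    difference = p[1] - line_value(a, b, p[0])
    return abs(difference) < comp_dev


archived, newest = (0.0, 10.0), (4.0, 12.0)
for p in [(2.0, 11.3), (2.0, 11.6)]:
    print(p, inside_parallelogram(archived, newest, p, 0.4),
          line_hits_tolerance_bar(archived, newest, p, 0.4))
```

Under that assumption the two checks agree for any single point; the tolerance-bar form simply needs one line evaluation instead of two.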

Page 12:

GE Archive Compression vs. PI

[Figure: two panels, “Swinging Door method” and “GE Proficy Historian”, each showing an Archived Point and a New Point.]

Instead of checking if each point is inside the parallelogram, the GE Proficy Historian checks if the line intersects the dead band of each point.

Page 13:

GE Archive Compression Example

As an additional benefit, there is no need to buffer all points between the last archived point and the newest point.

Here’s an example of how it works. The key points to understand:

• An “Archived Point” is one that is stored.

• A “Held Point” is the last good value that arrived. We don’t know if it will be stored until the next value arrives to tell us if the slope has changed sufficiently.

After a point is archived, the next point becomes the held point.

[Figure: an Archived Point followed by a Held Point.]

Page 14:

GE Archive Compression Example

Construct error bands around the held point.

PI: E = “CompDev”

GE: E = deadband / 2

[Figure: Archived Point and Held Point, with an error band of height E above and below the Held Point.]

Page 15:

GE Archive Compression Example

Step 1: Calculate the slopes of the two lines, U and L, connecting the archived point with the upper and lower ends of the error bands (dead band) associated with the held point.

[Figure: lines U and L drawn from the Archived Point to the upper and lower ends of the error band at the Held Point.]
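Written out, Step 1 amounts to two slope calculations. The symbols here are notation introduced for this note, not taken from the slides: the archived point is $(t_a, y_a)$, the held point is $(t_h, y_h)$, and $E$ is the half-width of the error band (CompDev for PI, deadband / 2 for GE).

```latex
s_U = \frac{(y_h + E) - y_a}{t_h - t_a},
\qquad
s_L = \frac{(y_h - E) - y_a}{t_h - t_a}
```

For example, with an archived point at $(0, 10)$, a held point at $(2, 11)$ and $E = 0.2$, the slopes are $s_U = 1.2 / 2 = 0.6$ and $s_L = 0.8 / 2 = 0.4$.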

Page 16:

GE Archive Compression Example

The upper and lower slopes define a critical aperture window.

[Figure: the wedge between lines U and L, drawn from the Archived Point through the error band at the Held Point, is labeled “Critical Aperture Window”.]

Page 17:

GE Archive Compression Example

If the slope of the line N, connecting the archived point with the new point, is between the upper and lower slopes, it intersects the dead band of the held point.

[Figure: line N from the Archived Point to the New Point, drawn together with lines U and L at the Held Point.]

Page 18:

GE Archive Compression Example

• As new points are added, the previous new point becomes the current held point, and the same process is repeated.

• The critical aperture window will always be constructed from the lowest upper slope and the highest lower slope to ensure that the conditions necessary to compress all previous points will be preserved.

• If the slope of the line to the new point is within the critical aperture window, the previous held point may be discarded.

[Figure: New Point and Held Point. Callouts: “You can forget about this point now.”; “Remember the lowest upper slope and the highest lower slope.”; “Forget the slope of this line” (on two lines).]
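The "lowest upper slope, highest lower slope" rule means that only two numbers have to be remembered between points. A minimal Python sketch of that bookkeeping follows; the class name `ApertureWindow` and its methods are hypothetical, chosen for this illustration rather than taken from the Historian.

```python
from dataclasses import dataclass


@dataclass
class ApertureWindow:
    """Running critical aperture window: only two slopes are remembered."""
    lower: float = float("-inf")  # highest lower slope seen so far
    upper: float = float("inf")   # lowest upper slope seen so far

    def narrow(self, lower_slope: float, upper_slope: float) -> None:
        """Fold in the two slopes to one held point's error band."""
        self.lower = max(self.lower, lower_slope)
        self.upper = min(self.upper, upper_slope)

    def contains(self, slope: float) -> bool:
        """True if a line with this slope passes through every band seen so far."""
        return self.lower <= slope <= self.upper


window = ApertureWindow()
window.narrow(0.1, 0.9)  # first held point
window.narrow(0.3, 1.2)  # second held point: only the lower bound tightens
window.narrow(0.2, 0.7)  # third held point: only the upper bound tightens
print(window, window.contains(0.5), window.contains(0.8))
```

Because the window can only shrink, a slope that fits the current window also fits every earlier held point's band, which is why those points and their individual slopes can be forgotten.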

Page 19:

GE Archive Compression Example

With each new point the process is continued, narrowing the aperture and discarding unnecessary points as you go.

[Figure: New Point and Held Point; the remembered slopes are marked “Keep” and the others “Forget”.]

Page 20:

GE Archive Compression Example

[Figure: next step of the same construction, with a New Point, a Held Point, and slopes marked “Keep” or “Forget”.]

Page 21:

GE Archive Compression Example

[Figure: a further step of the same construction, with a New Point, a Held Point, and slopes marked “Keep” or “Forget”.]

If this continues long enough, the critical aperture window will close, converging on the slope of the trend for this segment.

Page 22:

GE Archive Compression Example

When the slope of the line to the new point lies outside of the critical aperture window, an archive event is triggered.

[Figure: New Point and Held Point; the line to the New Point is marked “Outside critical aperture window.” Earlier slopes are marked “Keep” or “Forget”.]

Page 23:

GE Archive Compression Example

The held point is archived, the new point becomes the held point, and the process starts anew.

[Figure: the former held point is now labeled Archived Point (“The held point is now archived.”); the previous new point is now labeled Held Point (“The previous new point is now the held point.”).]

Page 24:

GE Archive Compression Example

The process continues: as additional data arrive, the critical aperture grows longer and thinner until a new value triggers an archive event.

[Figure: a Held Point at the end of a long, thin critical aperture.]
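Putting the steps of this example together, the whole procedure can be sketched as a small streaming function. This is a reading of the slides, not the Historian's actual code: the name `archive_compress`, the `(time, value)` tuples, the requirement of strictly increasing times, and the decision to flush the last held point at the end of the input are all assumptions of this sketch.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (time, value); layout assumed for this sketch


def archive_compress(points: Iterable[Point], deadband: float) -> List[Point]:
    """Return the points that would be archived (times must strictly increase)."""
    e = deadband / 2.0                          # half-width of the error band
    archived: List[Point] = []
    held: Optional[Point] = None
    lower, upper = float("-inf"), float("inf")  # critical aperture window

    for new in points:
        if not archived:                        # first point is always archived
            archived.append(new)
            continue
        if held is None:                        # next point becomes the held point
            held = new
            continue

        anchor = archived[-1]
        dt_held = held[0] - anchor[0]
        # Narrow the window with the slopes to the held point's error band.
        lower = max(lower, (held[1] - e - anchor[1]) / dt_held)
        upper = min(upper, (held[1] + e - anchor[1]) / dt_held)

        slope_new = (new[1] - anchor[1]) / (new[0] - anchor[0])
        if not (lower <= slope_new <= upper):
            # Archive event: store the held point and start a fresh window.
            archived.append(held)
            lower, upper = float("-inf"), float("inf")
        held = new                              # the new point is now the held point

    if held is not None:
        archived.append(held)                   # flush at end of input (sketch's choice)
    return archived


# A ramp up followed by a ramp down: only the end points and the turning point remain.
ramp_then_fall = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 3), (6, 2), (7, 1)]
print(archive_compress(ramp_then_fall, 0.2))
```

Note that the function keeps only the last archived point, the held point, and the two window slopes, which matches the earlier remark that there is no need to buffer the points in between.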

Page 25:

GE Archive Compression Example

[Figure: two trend charts with a value axis from 14.2 to 16.4. Panel titles: “PI Compression CompDev=0.4, ExcDev=0” and “xH Compression CompDev=0.4, ExcDev=0”. Legend entries include archived, compressed, and input.]

Points archived (captions in panel order): 23 out of 120; 10 out of 120.

This one example is very encouraging, but more statistically significant work must be done, as well as a data quality assessment comparing these approaches.
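To put the two archived-point counts on the same scale as the percentage figures quoted for collector compression, the arithmetic can be checked with a few lines of Python. The helper name `kept_fraction` is made up for this note; the counts are the ones shown on the slide.

```python
def kept_fraction(archived: int, total: int) -> float:
    """Fraction of incoming points that end up in the archive."""
    return archived / total


TOTAL_POINTS = 120
for archived_count in (23, 10):
    kept = kept_fraction(archived_count, TOTAL_POINTS)
    print(f"{archived_count}/{TOTAL_POINTS} archived: "
          f"{kept:.1%} kept, {1 - kept:.1%} discarded")
```

This is a single 120-point data set, so the percentages say nothing about typical behavior; they only restate the slide's counts.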

Page 26:

Questions

Stephen Friedenthal
EVSystems
www.evsystems.net
617.916.5101

