PDF Redactor
UiPathTeam PDF Redactor Custom Activity
Version 3.0
May 2021
Bernard Lawes
Content
1. What Export Control is and why it is important
2. The legislation applicable to UiPath and the penalties for not complying with it
3. Our obligations and sales responsibilities
4. The red flags you should be able to identify on your own
5. Detailed 3rd Party Screening Process
6. Whom should you contact for further guidance
Why is Redaction so challenging?
● Lack of awareness of redaction protocols
● Manual, tedious, and error-prone
● Time-consuming even with desktop software
● Batch processing is not readily available
● Wastes lots of paper (environmental impact)
● Mistakes can lead to costly compliance risks
Value Proposition
1. Reduce time required to safely publish or deliver requested documents
2. Ensure released documents are in compliance with data privacy laws
3. Fast and secure automated redaction without 3rd party software
Redaction Plugin for Document Understanding
● Digitize (Digitize OCR Activities): Leverage Studio’s available OCR libraries
● Extract (Extraction Activities): Identify PII data using Regex, Form Extractor, or Machine Learning
● Redact (Redaction Activity): Irreversibly redact any text on the document
Where to find help for PDF Redactor?
Download Custom Activity Here: https://connect.uipath.com/marketplace/components/pdf-redactor
Forum Discussion and Examples: https://forum.uipath.com/t/pdf-redaction-custom-activity/236846
How does PDF Redactor work?
Leveraging OCR, a UiPath Robot locates and redacts sensitive data based on the given regex patterns and on data extracted by UiPath Document Understanding.
The activity comes with pre-built regex patterns for items such as social security numbers, phone numbers, and dates. Users can also define their own custom regex patterns.
Because it can ingest ExtractionResults and DocumentObjectModels from Document Understanding, the activity can redact any text value in the document.
The activity also benefits from Document Understanding’s ever-growing extraction capabilities, such as Form Extraction and Machine Learning.
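The pattern-based lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the activity's implementation: the expressions in `AUTO_PATTERNS` and the helper `find_sensitive` are assumptions made for this example, and the activity's real built-in patterns are not shown in this deck.

```python
import re

# Illustrative stand-ins for the pre-built patterns (assumed, not the
# activity's actual expressions).
AUTO_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "ein": r"\b\d{2}-\d{7}\b",
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
}


def find_sensitive(text, auto=(), custom=None):
    """Return (label, matched_text) pairs for every selected pattern."""
    selected = {name: AUTO_PATTERNS[name] for name in auto}
    if custom:
        # A user-defined expression, like the Formula argument.
        selected["custom"] = custom
    hits = []
    for label, pattern in selected.items():
        for match in re.finditer(pattern, text):
            hits.append((label, match.group()))
    return hits


sample = "SSN 123-45-6789, EIN 12-3456789, account 00123456"
hits = find_sensitive(sample, auto=["ssn", "ein"], custom=r"\d{6,}")
for label, value in hits:
    print(label, value)
```

Each hit would then be mapped back to its position on the page via OCR so that a redaction box can be drawn over it; that step is outside the scope of this sketch.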
PDFRedaction Activity Arguments
Arg | Description | Available Options | Type | Direction | Example
FileInput | Input PDF file | | String | In | C:\temp\input.pdf
FileOutput | Full path of the output redacted PDF file | | String | In/Out | C:\temp\output.pdf
FormulaAuto | Pre-built and auto-generated regular expressions for PII data | ssn, phone, ein, dates, currency | String Array | In | {"ssn","ein","dates","currency","email","phone"}
Formula | User-defined custom regular expression | | String | In | \d{6,}
Keywords | String array of keywords to be redacted | | String Array | In | {"John","MacArthur","Jack","Hibbs"}
HighlightOnly | Set to True to highlight words; leave False to redact | True, False | String | In |
RedactColor | Sets the color of the redaction polygon or highlight | | System.Drawing.Color | In | System.Drawing.Color.Black
Silent | Set to True to hide the status bar | True, False | Boolean | In |
OCREngine | Sets the OCR engine. Defaults to OmniPage | OmniPage, UiPath | String | In | "omnipage"
OCRAPIKey | API key for the selected OCR engine | | String | In |
OCREndpoint | Endpoint URL for the selected OCR engine | | String | In |
DU Redaction Plugin Activity Arguments
Arg | Description | Available Options | Type | Direction | Example
FileInput | Input PDF file | | String | In | C:\temp\input.pdf
FileOutput | Full path of the output redacted PDF file | | String | In/Out | C:\temp\output.pdf
FormulaAuto | Pre-built and auto-generated regular expressions for PII data | ssn, phone, ein, dates, currency | String Array | In | {"ssn","ein","dates","currency","email","phone"}
Formula | User-defined custom regular expression | | String | In | \d{6,}
Keywords | String array of keywords to be redacted | | String Array | In | {"John","MacArthur","Jack","Hibbs"}
HighlightOnly | Set to True to highlight words; leave False to redact | True, False | String | In |
RedactColor | Sets the color of the redaction polygon or highlight | | System.Drawing.Color | In | System.Drawing.Color.Black
Silent | Set to True to hide the status bar | True, False | Boolean | In |
WatermarkFile | Path to a PNG file to use as a watermark or logo on the redacted document | | String | In |
WatermarkLocation | Sets the origin point where the watermark appears | | System.Drawing.Point | In | New System.Drawing.Point(0,0)
RedactFields | Array of the fields from the existing taxonomy that you want to redact | | String Array | In | {"name","address","invoice","dob"}
DocumentObjectModel | Result from the Document Understanding Digitize activity | | DocumentObjectModel | In |
ExtractionResult | Result from the Document Understanding Extraction Scope activity | | ExtractionResult | In |
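To illustrate how a RedactFields list could select what gets redacted, here is a minimal Python sketch. Everything in it is hypothetical: `ExtractedField` is a simplified stand-in for illustration, not the UiPath ExtractionResult type, and the case-insensitive name matching is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical stand-in for one extracted taxonomy field."""
    name: str
    value: str
    word_boxes: list  # one (x, y, width, height) tuple per word of the value


def boxes_to_redact(fields, redact_fields):
    """Collect the word boxes of every field named in redact_fields."""
    wanted = {name.lower() for name in redact_fields}  # assumed case-insensitive
    boxes = []
    for field in fields:
        if field.name.lower() in wanted:
            boxes.extend(field.word_boxes)
    return boxes


fields = [
    ExtractedField("name", "John MacArthur", [(10, 20, 40, 12), (55, 20, 80, 12)]),
    ExtractedField("total", "42.00", [(300, 400, 50, 12)]),
]
boxes = boxes_to_redact(fields, ["name", "dob"])
print(boxes)
```

Because a field carries one box per word, a multi-word value such as a full name can be covered in one pass, which is what the field-based approach adds over single-word regex matching.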
Limitations
Limitation | Description
Contiguous Words Only | The basic PDFRedaction activity does not redact phrases, sentences, paragraphs, or any other group of words or characters separated by a space, tab, or carriage return. Regex patterns will match only contiguous words. To redact groups of words, use the DU Redaction Plugin activity and specify the fields from the taxonomy to be redacted.
OCR Engines | Available OCR engines: Google Vision OCR, OmniPage, UiPath Document OCR
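The "Contiguous Words Only" limitation can be illustrated with a short Python sketch. It assumes that patterns are evaluated one OCR word at a time, which the limitation implies but this deck does not state outright; the helper names are made up for this example.

```python
import re

SPACED = re.compile(r"\(\d{3}\) \d{3}-\d{4}")  # pattern contains a space
COMPACT = re.compile(r"\d{3}-\d{4}")           # pattern fits inside one word


def match_per_word(pattern, text):
    """Test the pattern against each whitespace-separated word on its own."""
    return [word for word in text.split() if pattern.fullmatch(word)]


def match_whole_text(pattern, text):
    """Test the pattern against the full line, spaces included."""
    return [m.group() for m in pattern.finditer(text)]


line = "Call (555) 123-4567 today"
spaced_per_word = match_per_word(SPACED, line)    # no single word contains a space
spaced_whole = match_whole_text(SPACED, line)     # matches across the space
compact_per_word = match_per_word(COMPACT, line)  # matches the one word it fits
print(spaced_per_word, spaced_whole, compact_per_word)
```

Under that assumption, a custom pattern should be written so that it fits within a single word; values that span several words are better handled through taxonomy fields in the DU Redaction Plugin.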
Dependencies
Library | Description | Version | License Information
PDFSharp.Extensions | Extension methods for PDFSharp to support and simplify some common operations including image extraction. | 0.1.2.2 | https://github.com/gheeres/PDFSharp.Extensions/blob/master/LICENSE
PDFSharp.Standard | | 1.51.12 | https://github.com/CLMSUK/PDFsharp/blob/master/LICENSE
UiPath.OCR.Activities | The UiPath.OCR.Activities package contains the "UiPath Screen OCR" and "UiPath Document OCR" activities, which use UiPath's OCR engines. | 2.41 | https://www.uipath.com/hubfs/legalspot/UiPath_MSSA.pdf
UiPath.OmniPage.Activities | The UiPath OmniPage Activities pack contains the OmniPage OCR activity, using the Nuance OmniPage OCR Engine. | 1.50 |
UiPath.OmniPageBundle.Extended | The package contains the binaries needed by the OmniPage OCR Engine. Powered by OmniPage OCR. | |
UiPath.PDF.Activities | | 3.2.2 |
UiPath.System.Activities | Core activities which enable the robots to manipulate data tables and collections, work with files and folders, communicate with Orchestrator. Package also contains workflow operators, dialog forms, debugging and invoking methods. | 20.4.0 |
UiPath.UIAutomation.Activities | Package contains core activities which enable the automation of desktop applications, browsers, and virtual machines. Please check the documentation for more details. | 20.4.2 |
UiPath.IntelligentOCR.Activities | Core activities that enable the usage of a complete document processing framework, from taxonomy definition, digitization, document classification, data extraction, data validation and classifier / extractor training. | 4.5.2 |
UiPath.OCR.Activities | The UiPath.OCR.Activities package contains the "UiPath Screen OCR" and "UiPath Document OCR" activities, which use UiPath's OCR engines. | 2.40 |
UiPathTeam.StatusProgress.Activities | Status message and progress bar that can be positioned anywhere on the screen. | 1.02 |
Where to find help for PDF Redactor?
Download Custom Activity Here: https://connect.uipath.com/marketplace/components/pdf-redactor
Forum Discussion and Examples: https://forum.uipath.com/t/pdf-redaction-custom-activity/236846
License Agreement: https://www.uipath.com/hubfs/legalspot/UiPath_Activity_License_Agreement.pdf
Where can you find out more?
➢ UiPath Document Understanding webpage (for trial and more resources)
➢ How AI Can Continuously Improve and Scale Automations (webinar with customers sharing case studies)
➢ Guide on Document Understanding (white paper)
➢ Academy training
➢ Documentation on Document Understanding framework & extractors and ML model training via AI Fabric
➢ AI & RPA webpage