
Process Management Framework - Redaction of information in PDFs

Article published on the 21st of July, 2022.

The contents of this article may be subject to change due to ongoing development of the Process Management Framework product.

1. Case description

When working with documents that contain sensitive data, it is often necessary to manipulate those documents in various ways.

One example is when documents containing sensitive data need to be provided to a third party with the sensitive data removed. Another example is adding disclaimers inside documents to note confidentiality or to mark the document as a draft.

Redaction is done by replacing text in a document with a colored block, as shown in the following image:

To demonstrate the options for accomplishing these tasks within WorkPoint, we have set up the following case.

As a case manager, I need to be able to:

  • Run a process which redacts personal identification numbers from a case PDF document.
  • Have the same process also redact the title of the client of the case.

In this case, the personal identification numbers we will redact follow the format xxxxxx-xxxx, where each "x" is a digit.

2. Implementation

For this case, we will work with a PDF document stored on a case registered under a specific client. The client's name is Jesper Jensen and their name is mentioned throughout the document. Along with their name, their personal identification number, 170878-1353, is also mentioned throughout the document.

Here's a screenshot of the document:

We are going to implement a process for redacting the personal information from the document.

We begin in the Process Builder inside the WorkPoint 365 Administration:

  1. We begin by clicking "New" to create a new process.
  1. In this instance, we create the new process from scratch.
  1. We title the process "Manipulate PDF".
  2. Next, we select "User Process" and place the process in the "Item Processes" group.
  3. Finally, we click the "Begin" button to create the new process.

For this process, we are going to use two steps: a Search item form, and an Edit and manipulate PDF step.

Our Search item form is configured as follows:

With this configuration, we will be able to search for documents inside the Documents library of the current case site. We have defined a view for the library which only shows PDF documents, which we use as a default view for the search form. We have also selected to not allow multi-select.

The Edit and manipulate PDF step is configured as follows:

  1. In the Step Input field in the General tab, we select the output from the Search item form.
  1. In the Options tab, we could set a Compress level, but since our PDF document is not very large and does not contain images, we leave this field blank.
  2. Next, we add a new configuration of type "Regular expression Redaction", which we title "CPR redaction". This will be the configuration which redacts the personal information in the document.
  3. In the "Regular expression to redact" field, we can enter a regular expression that describes the pattern of the text to find. This lets us find all instances of the text we want to redact. In this instance, we use the following regular expression:
^(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}[-]\d{4}

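This regular expression matches phrases following the pattern xxxxxx-xxxx, where each x is a digit from 0 to 9. In addition, the first two digits must form a number from 01 to 31 and the next two digits a number from 01 to 12, which corresponds to a day and a month.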
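
To make the pattern concrete, here is a minimal sketch in Python that applies the same expression to a sample sentence. It is only an illustration: Python's `re` module is not necessarily the regular expression engine used by the Edit and manipulate PDF step, the sample text and the `redact_cpr` helper are made up, and the leading `^` is dropped so the pattern can be searched anywhere in a string rather than only at its start. Replacing each match with `#` characters stands in for the colored block drawn in the PDF.

```python
import re

# The expression from the step configuration, without the leading "^"
# so that it can match anywhere in the sample text (an assumption made
# for this sketch only).
CPR_PATTERN = re.compile(r"(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}[-]\d{4}")


def redact_cpr(text: str) -> str:
    """Replace every match with a run of '#' characters of equal length."""
    return CPR_PATTERN.sub(lambda m: "#" * len(m.group(0)), text)


sample = "Jesper Jensen (170878-1353) attended the evaluation."
redacted = redact_cpr(sample)
print(redacted)

# Day 00 and month 13 are rejected by the first two groups.
print(bool(CPR_PATTERN.fullmatch("170878-1353")))  # True
print(bool(CPR_PATTERN.fullmatch("000878-1353")))  # False
print(bool(CPR_PATTERN.fullmatch("171378-1353")))  # False
```

Note that the pattern checks the day and month ranges independently, so a value such as 310278-1234 (31 February) would still match.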

  1. In the "Replace color" field, we can select which color will be used to redact the matches of the regular expression in the document. In this instance, we select "Black", but there are many more options.

Next, we add another configuration to the step:

  1. We click the "New configuration" button and title it "Client name redaction".
  2. This process will be run from a Case site, but the name we want to redact from the document is the title of the client. We can get the title of the client by looking at the current case's wpParent field. In the "Text to redact" field, we use the following adaptive expression:
Entity.wpParent.LookupValue

In our case, this adaptive expression should return "Jesper Jensen", which is the name we want to redact from the document.
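
As a rough illustration of what this expression resolves to, the sketch below models the current case as a nested Python dictionary and walks the dotted path. The dictionary layout, the `Title` and `LookupId` values, and the `resolve` helper are assumptions made for this example; they do not reflect how WorkPoint evaluates adaptive expressions internally.

```python
from typing import Any

# Hypothetical snapshot of the current case entity. Only the wpParent
# field and its LookupValue are named in the article; the rest is made up.
context = {
    "Entity": {
        "Title": "Evaluation case",
        "wpParent": {"LookupId": 12, "LookupValue": "Jesper Jensen"},
    }
}


def resolve(path: str, data: dict) -> Any:
    """Follow a dotted path such as 'Entity.wpParent.LookupValue'."""
    current: Any = data
    for part in path.split("."):
        current = current[part]  # raises KeyError if a part is missing
    return current


text_to_redact = resolve("Entity.wpParent.LookupValue", context)
print(text_to_redact)  # Jesper Jensen
```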

  1. In this case, we select "Case sensitive" in the "Text match options" field. If we wanted to find and redact only instances of the whole phrase "Jesper Jensen", we could also select "Whole word".
  2. In the "Replace color" field we select "Black".
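
The effect of the two text match options can be sketched in Python as follows. This reflects how such options commonly behave and is not a description of WorkPoint's implementation; the `find_matches` helper, its parameters, and the sample text are hypothetical.

```python
import re


def find_matches(text: str, phrase: str, *, case_sensitive: bool = False,
                 whole_word: bool = False) -> list:
    """Return every occurrence of a literal phrase under the given options."""
    pattern = re.escape(phrase)  # treat the phrase as literal text
    if whole_word:
        # Reject matches that have a word character directly before or after.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.findall(pattern, text, flags)


text = "Jesper Jensen met JESPER JENSEN and Jesper Jensenius."

# No options: all three spellings match, including the start of "Jensenius".
print(find_matches(text, "Jesper Jensen"))
# Case sensitive: the upper-case spelling is skipped.
print(find_matches(text, "Jesper Jensen", case_sensitive=True))
# Case sensitive and whole word: only the exact standalone phrase matches.
print(find_matches(text, "Jesper Jensen", case_sensitive=True, whole_word=True))
```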

At this point, we can save and publish the process. In the next section of this article, we will run the process and see the result.

3. Execution

In this section, we will run the Manipulate PDF process we created previously.

A My Tools button has been set up on case sites to run the process. Let's see it in action.

We begin in a case site:

We note that the client of the case is "Jesper Jensen".

One of the documents on the case is the "Evaluation" PDF document. This is the document from which we want to redact the personal identification number as well as the client name.

The document in its current form is as follows:

Let's now run the process and redact the mentioned information:

  1. In the My Tools panel, we click the button to run the Manipulate PDF process.
  1. In the Search item form step, we select the PDF from the case we want to redact information from and click "Continue".

The Edit and manipulate PDF step now runs and redacts the information we specified:

  1. After the process has succeeded, we click the "Close" button.

Let's open the PDF and see if the information was correctly redacted:

We can see that the personal identification numbers and the client's title (Jesper Jensen) have been correctly redacted from the document. Users are not able to read or copy this information from the document.
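
A check like this could also be automated. The sketch below assumes the text of the PDF has already been extracted to a string by some other tool, which is outside the scope of this article, and reports any personal identification numbers or client names that are still present. The `leftover_sensitive_data` helper and the sample strings are hypothetical, and the pattern omits the leading `^` so it can be searched anywhere in the text.

```python
import re

# Same pattern as in the step configuration, without the leading "^"
# (an assumption made for this sketch only).
CPR_PATTERN = re.compile(r"(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}[-]\d{4}")


def leftover_sensitive_data(extracted_text: str, client_name: str) -> list:
    """List any CPR-like numbers or client-name occurrences still present."""
    leftovers = [m.group(0) for m in CPR_PATTERN.finditer(extracted_text)]
    if client_name in extracted_text:
        leftovers.append(client_name)
    return leftovers


# Made-up strings standing in for text extracted before and after redaction.
before = "Client: Jesper Jensen, CPR 170878-1353."
after = "Client: , CPR ."

print(leftover_sensitive_data(before, "Jesper Jensen"))
print(leftover_sensitive_data(after, "Jesper Jensen"))
```

An empty list for the redacted text indicates that neither the pattern nor the client name can be found in the extracted text.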
