Protect Personally Identifiable Information (PII) in Your Apps Using RudderStack
Dipanjan Biswas
Co-Founder and Director at RudderStack India
April 03, 2020
Introduction
Personally Identifiable Information (PII) is information that may be used to identify and track an individual. GDPR mandates software companies to encrypt any PII and ensure that they protect the users’ identity from any misuse. As a result, in a post-GDPR world, all organizations need to detect and mask, obfuscate, or delete PII data flowing through their information systems. Refer to this article for more information on PII and how to protect it.
A common cause of PII leaks is human error. Developers add various user-centric attributes to the “traits” structure of a message, and in doing so may inadvertently include a PII field that is then forwarded to the destination. From a data privacy perspective, this can be a serious problem.
At RudderStack, we simplify the process of running PII checks and applying corrective actions on streaming data. To that end, we provide template PII Detection and Masking code on GitHub. Developers and administrators can add this code as a user transformation in their RudderStack installation via the Config Plane. The transformation helps keep human oversights from turning into PII leaks and serious compliance violations.
By masking or obfuscating PII within RudderStack, you eliminate:
- The need to encrypt such data for GDPR compliance
- Risks associated with a potential data breach
- The need to search through and delete data in the event of withdrawal of consent
Implementation
First of all, copy the code from GitHub into the Transformation window under Transformation Settings for a user transformation:
User Transformation Page in Rudder Configuration Plane
The code in question leverages the fuzzysort implementation. The following code block demonstrates the implementation:
JAVASCRIPT
function transform(events) {
  // Traverse the JSON structure and replace PII fields with obfuscations for fuzzy-matched keys
  for (var i = 0; i < events.length; i++) {
    var event = events[i];
    walk(event, ['SSN', 'Social Security Number', 'social security no.', 'social sec num', 'ssnum'], 'XXX-XX-XXXX');
  }
  return events;
}

function walk(obj, targetKeyArray, newValue) {
  for (var key in obj) {
    var value = obj[key];
    if (value && (typeof value == 'object')) {
      // Recurse till leaf is reached
      walk(value, targetKeyArray, newValue);
    }
    // fuzzysortNew() is expected to come from the fuzzysort code copied along with the template
    var fs = fuzzysortNew();
    var matches = fs.go(key, targetKeyArray, {allowTypo: true});
    if ((typeof matches != "undefined") && Array.isArray(matches)) {
      if (matches.length > 0) {
        // The key fuzzy-matched one of the target names: mask its value
        obj[key] = newValue;
      }
    }
  }
}
PII Substitution Logic
The transform method is the entry-point for any user transformation. It takes an array of event objects as an argument and returns an array of transformed event objects.
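As a minimal sketch of that contract, the transformation below takes an array of events and returns the same array after masking one field. The event shape (a `traits` object with an `ssn` key) and the exact-match check are assumptions made for illustration only; the template itself uses fuzzy matching.

```javascript
// Entry point: receives an array of event objects and returns an array of events.
function transform(events) {
  for (var i = 0; i < events.length; i++) {
    var traits = events[i].traits; // assumed event shape, for illustration
    // Exact key check only; the template uses fuzzy matching instead.
    if (traits && Object.prototype.hasOwnProperty.call(traits, "ssn")) {
      traits.ssn = "XXX-XX-XXXX";
    }
  }
  return events;
}

// Usage with a hypothetical payload
var result = transform([{ type: "identify", traits: { ssn: "123-45-6789", plan: "pro" } }]);
console.log(JSON.stringify(result));
```

Events that do not carry the field pass through unchanged, so the returned array always has the same length as the input.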
So, in the sample above, we iterate through each event in the array and invoke the walk method. The walk method takes three arguments:
- The object itself which has to be traversed
- List of keywords (i.e., field names) for which fuzzy matching is to be performed
- The substitution value
You can include all variations of all target field names in a single list and pass that as an argument. In that case, however, the values of all matched fields, such as SSN, Social Security Number, Credit Card Number, First Name, and Last Name, are replaced with the same masked value, XXX-XX-XXXX. If you need different substitution values for different fields, you can call walk multiple times, each time with a different list of names and its own substitution value. The walk method recurses through the object hierarchy until it reaches the leaf nodes, checking each key along the way and substituting the value wherever a key matches. A leaf is a node holding a primitive data type.
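The multiple-call approach could be sketched as follows. To keep the example self-contained, the fuzzysort call is replaced by a simple stand-in (`keyMatches`, which compares keys after lowercasing and stripping punctuation); that helper, the extra field lists, the mask strings, and the sample event are all assumptions for illustration, not part of the template.

```javascript
// Stand-in for fuzzy matching (assumption): normalize and compare exactly.
function normalize(s) {
  return String(s).toLowerCase().replace(/[^a-z0-9]/g, "");
}

function keyMatches(key, targetKeyArray) {
  var k = normalize(key);
  return targetKeyArray.some(function (t) { return normalize(t) === k; });
}

// Same traversal as the sample: recurse into objects, then mask matching keys.
function walk(obj, targetKeyArray, newValue) {
  for (var key in obj) {
    var value = obj[key];
    if (value && typeof value === "object") {
      walk(value, targetKeyArray, newValue);
    }
    if (keyMatches(key, targetKeyArray)) {
      obj[key] = newValue;
    }
  }
}

function transform(events) {
  for (var i = 0; i < events.length; i++) {
    // One walk call per group of fields, each with its own mask.
    walk(events[i], ["SSN", "Social Security Number"], "XXX-XX-XXXX");
    walk(events[i], ["Credit Card Number", "card number"], "XXXX-XXXX-XXXX-XXXX");
    walk(events[i], ["First Name", "Last Name"], "REDACTED");
  }
  return events;
}

// Usage with a hypothetical event
var sample = [{ traits: { ssn: "123-45-6789", credit_card_number: "4111111111111111", plan: "pro" } }];
console.log(JSON.stringify(transform(sample)));
```

Each call traverses the whole event again, so the cost grows with the number of field groups; for a handful of groups this is usually an acceptable trade for the simpler code.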
You can choose to use other methods available in the library, as described in the fuzzysort documentation. The sample implementation uses the go method and sets allowTypo to true, so the code treats a word with a single transposed letter as a valid match. Whenever the code finds a match, it replaces the value of the matched key with the specified substitution.
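To make "a single transposed letter" concrete, the helper below checks whether one string can be turned into another by swapping one pair of adjacent characters. It is only an illustration of the idea; `isSingleTransposition` is a hypothetical name, and this is not how fuzzysort implements its matching.

```javascript
// Returns true when b equals a, or differs from a by exactly one swap
// of two adjacent characters (for example "ssn" and "sns").
function isSingleTransposition(a, b) {
  if (a.length !== b.length) return false;
  if (a === b) return true;
  // Find the first position where the strings differ.
  var i = 0;
  while (i < a.length && a[i] === b[i]) i++;
  // A swap needs two positions; a mismatch on the last character cannot be one.
  if (i + 1 >= a.length) return false;
  // The two characters must be exchanged, and the rest must be identical.
  if (a[i] !== b[i + 1] || a[i + 1] !== b[i]) return false;
  return a.slice(i + 2) === b.slice(i + 2);
}

console.log(isSingleTransposition("ssn", "sns"));
console.log(isSingleTransposition("ssn", "snn"));
```

Under this definition, "sns" counts as a typo of "ssn", while "snn" does not, because it replaces a letter rather than swapping two.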
Sample PII Detection and Masking Transformation in Action
The following screenshots show the transformation in action. The first screenshot shows an exact match with one of the specified keywords. The second shows a fuzzy match, where the field name does not exactly match any of the supplied keywords.
Replacement with Direct Match
Conclusion
The template PII Detection and Masking code is a handy way to integrate PII detection and masking into your RudderStack installation. We have made it available as part of our open-source sample transformations collection. You can vary the degree of complexity and flexibility you want in your detection and masking code and modify the walk method accordingly.