The team at FormAssembly is constantly working to improve its platform for the benefit of its customers and their clients. FormAssembly leverages cutting-edge technology to improve platform stability and performance while protecting customer privacy.
From small businesses to major metropolitan governments, FormAssembly helps organizations around the world collect a lot of data, much of it sensitive. It’s also FormAssembly’s mission to help customers be good stewards of the data entrusted to them. The company takes both of those responsibilities very seriously, which is why no employee ever accesses the data that’s collected.
FormAssembly’s desire to constantly improve while practicing proper data stewardship can create some challenges for our engineers. For example, the QA and Engineering departments need realistic response data for the purposes of testing and development. However, because of stringent internal privacy controls, the team can’t use any response data collected by its production systems.
So, can FormAssembly’s engineering team generate synthetic, yet realistic-looking data while protecting customer privacy? With a neural network, they can.
The Engineering Challenge
FormAssembly forms are open-ended and created by platform users rather than the FormAssembly team. As a result, the engineering team simply doesn’t know a priori what sort of responses they need to generate. This, in turn, means that the team can’t simply write a Python script or two to programmatically generate response data and expect it to look anything like real life.
However, given an arbitrary form, an experienced FormAssembly engineer can easily produce an appropriate mock response. The problem is that this process doesn’t scale. The team needs hundreds of thousands of realistic responses in order to stress-test FormAssembly’s systems. But there is a way to replicate the judgment of an engineer at scale: neural networks.
A neural network (NN) is “a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain.” Basically, it’s a fancy equation that’s capable of taking complicated inputs and transforming them into sophisticated outputs. NNs deal very well with ambiguous problems like “produce a realistic response for this form.”
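As a rough illustration of the “fancy equation” idea (a generic textbook example, not FormAssembly’s network, which is a transformer), a feed-forward network with a single hidden layer computes:

```latex
\mathbf{h} = \sigma\left(W_1 \mathbf{x} + \mathbf{b}_1\right), \qquad
\hat{\mathbf{y}} = W_2 \mathbf{h} + \mathbf{b}_2
```

Here x is the input vector, sigma is an elementwise nonlinearity such as ReLU, and training adjusts the weights W1, W2 and biases b1, b2 until the outputs match known examples. Larger networks, transformers included, stack many such layers, which is what lets them turn complicated inputs into sophisticated outputs.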
In FormAssembly’s case, the team realized the problem could be framed as translating one language into another: the “language” of forms into the “language” of responses. Thankfully, NNs are really good at translating. By turning the response generation problem into a translation problem, the team was able to take advantage of all the work that’s been done in that area in the past few years. The engineers can now give their in-house NN a form and have it generate a batch of realistic-looking responses, so the team can produce the data it needs for engineering and QA efforts without giving staff direct access to actual responses. Here’s how it works on a technical level.
Diving into the Nerdy, Technical Details
The non-technical TL;DR: The engineering team collected a large, random sample of real forms that FormAssembly customers use to collect data. They ran pairs of forms and responses through a transformer (a type of NN widely used for translation) to teach it how to produce responses. The end result is a neural network that, given a form field, produces an appropriate response field filled with synthetic data.
Behind the scenes, FormAssembly forms and responses are stored as structured documents. The documents describing the forms are highly variable, and there’s a lot of context that the engineering team wants the NN to pick up. For example, the team wants to generate name-looking strings for fields with labels such as “name” or “surname” or “nombre,” as well as generate address-looking strings for fields with labels such as “address” or “street” or “addr. line 1.” The desire to capture this rich-but-fuzzy context is what made the engineers look into NNs in the first place. What the team needed was a function such as this:
f(<form>) → <response>
Written this way, the task looks a lot like a translation problem: the team is mapping a lump of text describing a form to another lump of text describing a response. Characterizing the problem in this way allows the team to deploy an NN and train it using pairs such as this:
(<form field>, <response field>)
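The post doesn’t describe the document format, so the following is only a minimal sketch, assuming a hypothetical JSON-like schema in which a form has a list of `fields` (each with an `id`, `label`, `type`, and optional `options`) and a response maps field ids to answers. The helper names `serialize_form_field`, `serialize_response_field`, and `make_training_pairs` are illustrative, not FormAssembly code. Each field is flattened into a short text “sentence” and paired with its answer, which is the (source, target) shape a sequence-to-sequence translation model trains on.

```python
import json
from typing import Any


def serialize_form_field(field: dict[str, Any]) -> str:
    """Flatten one (hypothetical) form-field document into a source "sentence"."""
    # Keep the parts that carry context: the label ("nombre", "addr. line 1", ...),
    # the field type, and any predefined choices.
    parts = [f"label: {field.get('label', '')}", f"type: {field.get('type', 'text')}"]
    if field.get("options"):
        parts.append("options: " + " ; ".join(field["options"]))
    return " | ".join(parts)


def serialize_response_field(value: Any) -> str:
    """Flatten one answer into a target "sentence"; JSON keeps lists unambiguous."""
    return json.dumps(value, ensure_ascii=False)


def make_training_pairs(form: dict[str, Any], response: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (<form field>, <response field>) pairs for one form and one response."""
    pairs = []
    for field in form["fields"]:
        if field["id"] in response:  # fields left blank produce no pair
            source = serialize_form_field(field)
            target = serialize_response_field(response[field["id"]])
            pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # Invented example data; not a real form or response.
    form = {
        "fields": [
            {"id": "f1", "label": "Nombre", "type": "text"},
            {"id": "f2", "label": "Addr. line 1", "type": "text"},
            {"id": "f3", "label": "Contact method", "type": "select", "options": ["Email", "Phone"]},
        ]
    }
    response = {"f1": "Ana García", "f3": "Email"}
    for pair in make_training_pairs(form, response):
        print(pair)
```

Training per field rather than per whole form keeps each example short, which is one plausible reason the structured setting is cheaper to learn than open-ended natural language translation.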
Because the team is dealing with structured documents, training on these pairs turns out to be a lot easier, from a computational standpoint, than training a transformer to do full-blown natural language translation. The team is able to produce a transformer with acceptable output in about four hours’ worth of training on a MacBook Pro.
The trained NN can then be used in conjunction with some harness scripts to create large-scale data sets that are, from a statistical standpoint, representative of what the team sees in production. These data sets can then be used in various integration environments. Seeding these environments in this way helps the team ensure that they’re replicating real-world conditions in terms of scale and data composition.
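The harness scripts themselves aren’t described, so this is only a minimal sketch of the idea, assuming a hypothetical text serialization of form fields and a `model` callable that stands in for the trained network’s inference step (string in, string out). It assembles f(<form>) → <response> field by field and emits JSON Lines records suitable for seeding a test environment. All names here (`FieldModel`, `generate_responses`, `to_jsonl_lines`, the `form_id` key) are illustrative assumptions.

```python
import json
from typing import Any, Callable, Iterator

# Hypothetical stand-in for the trained model's inference call: it takes a
# serialized form field (source text) and returns a serialized response field.
FieldModel = Callable[[str], str]


def serialize_form_field(field: dict[str, Any]) -> str:
    """Hypothetical field serialization; it must match what the model was trained on."""
    parts = [f"label: {field.get('label', '')}", f"type: {field.get('type', 'text')}"]
    if field.get("options"):
        parts.append("options: " + " ; ".join(field["options"]))
    return " | ".join(parts)


def decode_value(target: str) -> Any:
    """Parse the model's output, falling back to the raw string if it isn't valid JSON."""
    try:
        return json.loads(target)
    except json.JSONDecodeError:
        return target


def generate_responses(form: dict[str, Any], model: FieldModel, count: int) -> Iterator[dict[str, Any]]:
    """Yield `count` synthetic responses for one form, with one model call per field."""
    for _ in range(count):
        yield {
            field["id"]: decode_value(model(serialize_form_field(field)))
            for field in form["fields"]
        }


def to_jsonl_lines(forms: list[dict[str, Any]], model: FieldModel, per_form: int) -> Iterator[str]:
    """Yield one JSON Lines record per synthetic response, ready to seed an environment."""
    for form in forms:
        for response in generate_responses(form, model, per_form):
            yield json.dumps({"form_id": form["id"], "response": response}, ensure_ascii=False)
```

To seed an environment, the records could be written out with something like `f.writelines(line + "\n" for line in to_jsonl_lines(forms, model, 1000))`. Making the output statistically representative of production is outside this sketch: it would depend on which forms are sampled and how many responses each one gets.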
Building a Better Platform for You
So, what does this all mean for you, our customers? It means FormAssembly’s platform will continue to improve, creating a better experience for you and your customers. Safer data, minimized downtime, and new features will help you be a better steward of the data entrusted to you.