We’ve entered a new era for data encryption. Early stage companies are making encryption far more accessible, more reliable, and easier for developers to use than ever before.
It’s the progress we need as more data moves to the cloud. The network and access-based approaches we’ve used for securing on-premise data don’t fit the model for cloud security. More encryption is a big part of the solution.
I spoke with Ryan Cooke, the Founder and CEO of JumpWire, about what’s happening in the data encryption market. I first covered JumpWire in my analysis of Y Combinator’s Winter 2022 batch. This discussion goes deeper into both JumpWire and much broader topics around data encryption:
- Data encryption in the cloud: How the age-old problem of data security changes when data moves to the cloud.
- Emerging solutions for data encryption: Comparing new approaches to data encryption.
- How JumpWire works: A deeper dive into how JumpWire helps companies implement data encryption on their existing data architecture.
- Doing data classification and enforcement on the fly: How the approach to data security differs when you can do data classification and enforcement on the fly.
- Approaches for implementing data encryption: Things to think about and nasty ‘gotchas’ when implementing data encryption.
- Why companies implement data encryption: A quick, anecdotal exploration into why people implement data encryption.
- Why data encryption is ready for mass adoption: Overcoming some long-held fears and philosophical concerns about encryption.
I intentionally left this discussion long because (personally) I had a lot of learning to do about how data security and encryption have evolved. Some of the discussion was background knowledge for me. I decided to share it with you rather than assuming you already know everything about the topic. The sections are shorter than usual to help you skip around if you find some parts more relevant than others.
Note: This interview has been lightly edited for clarity.
Data encryption in the cloud
Cole Grolmus: Database security has been a known issue forever, or at least as long as I’ve been in security. Why isn't this a solved problem already?! What happened that left an opportunity for JumpWire to solve such a big problem?
Ryan Cooke: A big part of it is cloud. The traditional data security and data infrastructure tooling you had in the past has not translated to the cloud at all. A firewall in your data center is very different from an AWS security group. It’s the same thing for other security categories. The technical implementation is completely different between what you would deploy in your data center versus in the cloud.
On the encryption side, TLS is great. It's encrypting bytes on your network wire. You absolutely need that when you're in the data center because anybody can look at traffic across the wire. In the cloud, you're on a virtualization plane where you don't have a network you can just snoop on. So TLS is still important, but it's not providing you with the same security provisions.
In the cloud, all of your infrastructure is an API. Anybody can get access to that infrastructure through the API. Now, your risk and intrusion model changes. In the security industry, we’re going through a pretty major shift in the tooling and the controls of, “okay, what are the actual threats to my data in a Postgres database sitting in Amazon versus a Postgres database sitting on a VM and a blade in a data center?”
Cole Grolmus: We had a heavy over-reliance on things like network security, firewalls, network segmentation, and other controls like that. We could get away with overcompensating on those controls to protect data when everything was on-premise. Fast forward to today, and it seems like encryption may have been the right answer all along. But we rarely did it because we didn't have to.
Ryan Cooke: Yeah, I think that's right. The way I think about it is: because you're in the cloud, you really need to bring everything up to the application level. You need to be putting controls in place between applications and APIs, not at network and storage, which is a lower level part of your compute and tech stack.
We’re not reinventing encryption. We're just using encryption in a different place where there is more risk for data to get lost or improperly accessed.
Emerging solutions for data encryption
Cole Grolmus: It’s an exciting time for data encryption — the most progress I’ve seen in my career, for sure. It’s difficult to keep up with the new approaches because there is so much happening all at once.
At a high level, they fall into two segments:
- Dedicated encryption infrastructure (secure enclave) you put your data into.
- An encryption layer that works with the data infrastructure you already have.
What are the reasons an engineering team would want to pick one approach over another?
Ryan Cooke: For something like PCI, secure enclaves make a ton of sense. PCI is so strongly scoped to systems that have access to PCI data, you want a narrow scope. That's a use case where you really never want to touch the data. However, there’s a whole long tail of information worth protecting that secure enclaves aren't very well built to do from an engineer’s perspective.
If you’re using a secure enclave, you end up incorporating their SDK into your application code, and your application code interacts with their SDK. From an ongoing retrieval aspect, you're talking to an API to get your data. That’s very different code than talking to your database. If you've grown up working on CRUD apps, you have a SQL interface to your database. That's how you think about structuring your data.
Implementing secure enclaves often means segmenting your data. Some of your data is going to be retrieved through a proprietary API. The rest is retrieved through your database. As your data architecture starts to expand, you have to make decisions about where to put the data. You end up putting narrow pieces of data into the secure enclave. There’s a gray area where data may be sensitive or not — it depends on who you ask, or if you’ve compiled a bunch of pieces of data together, now it becomes identifiable, but independently, it’s not.
Our perspective is to make it much easier for you to put an encryption layer around your own database. That way, you don't have to pick and choose between what data is going to go in the vault and what data is going to go in your main database. You end up owning your data architecture. That’s really powerful.
As you go through different iterations of scaling your systems, some things are going to break and be hard to fix with your data infrastructure. And when you go back and revisit that, owning the infrastructure yourself is a huge advantage for engineers.
Cole Grolmus: Native database security and encryption has been around for a while. Why should engineers use a product like JumpWire or a secure enclave instead of implementing their own solution using native features?
Ryan Cooke: That’s one of the hardest points — where we are competing against “let's just build this ourselves.” It's very easy to underestimate the complexity of key management. Even with something like KMS (offered by AWS), you can get into a situation where it's extremely expensive. You're paying for every API call to decrypt the data.
From what we've experienced starting down this path in prior roles, you quickly realize that key management is quite difficult: issuing and managing keys, segmenting and rotating keys, partitioning keys by customer, and more. As you start to scale up the solution, some of these use cases become really challenging to build yourself or to continue dedicating engineering resources for maintenance.
Where we see an opportunity is when there's a larger class of data that needs to be protected. Many companies don’t want a dedicated security team managing a data plane or some type of consolidated security layer in the architecture itself.
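To make the per-call cost concern concrete, here is a minimal sketch in Python. It assumes a hypothetical `FakeKms` class that stands in for a managed key service and does nothing except count requests; it is not the AWS KMS API. `CachingKeyProvider`, the customer IDs, and the row counts are also invented for illustration. The sketch compares unwrapping a data key for every encrypted field against caching one unwrapped data key per customer.

```python
from dataclasses import dataclass, field


@dataclass
class FakeKms:
    """Hypothetical stand-in for a managed key service; it only counts calls."""
    calls: int = 0

    def unwrap_data_key(self, customer_id: str) -> bytes:
        # A real service would decrypt a wrapped data key here and bill per request.
        self.calls += 1
        return f"data-key-for-{customer_id}".encode()


@dataclass
class CachingKeyProvider:
    """Caches unwrapped data keys so each customer costs one call, not one per field."""
    kms: FakeKms
    _cache: dict = field(default_factory=dict)

    def data_key(self, customer_id: str) -> bytes:
        if customer_id not in self._cache:
            self._cache[customer_id] = self.kms.unwrap_data_key(customer_id)
        return self._cache[customer_id]


def kms_calls_without_cache(rows):
    kms = FakeKms()
    for customer_id, _ciphertext in rows:
        kms.unwrap_data_key(customer_id)  # one key-service request per field
    return kms.calls


def kms_calls_with_cache(rows):
    kms = FakeKms()
    provider = CachingKeyProvider(kms)
    for customer_id, _ciphertext in rows:
        provider.data_key(customer_id)  # request only on first sight of a customer
    return kms.calls


# Synthetic workload: 1,000 encrypted fields spread across 10 customers.
rows = [(f"customer-{i % 10}", b"...") for i in range(1000)]
naive_calls = kms_calls_without_cache(rows)
cached_calls = kms_calls_with_cache(rows)
print(naive_calls, cached_calls)
```

Caching is not free: a real system would also need to bound how long a key stays in memory and handle rotation, which is part of the key management work described above.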
How JumpWire works
Cole Grolmus: Could you give me an explanation of how JumpWire works and why you’ve chosen to take this direction with the product?
Ryan Cooke: The product has two main components to it. It has a proxy, which we call a proxy engine or policy engine. This is a container that you deploy into your network. It implements HTTP protocols and database protocols. The proxy lives transparently between the application interfaces, whether that's an API, GraphQL, or between an application and a relational database like Postgres.
The idea is that you don't need to change anything on the application side. We're going to manage data security by inspecting requests. We're looking at data as it flows through applications, then making decisions about that data from a security and policy perspective.
We’re sitting inside your network. We're not a SaaS product — today, at least. You don't need to give up all of your most sensitive data just to secure it. You can leave it in your architecture. You don't need to change anything. It gives you control.
The second piece is field-level encryption. We're helping you identify the columns in your database, the properties of a JSON HTTP request, and so on that look like sensitive information. We run heuristics, modeling, and other techniques to say, “this looks like this is sensitive data.” Then, we're giving you a tool that says, “if this is sensitive data, it should be encrypted” with its own key using the approach you want to take for classification, segmentation, and things like that.
The result is if someone were to get into your database, or even internal tools that should not be exposing sensitive data, they're not going to walk away with everything. They're going to see things that are encrypted.
There are a couple different modes that we can operate encryption in. Depending on the database, sometimes field-level encryption happens in the database. You can feel very confident that for any data getting inserted there, encryption happens, because we'll manage triggers and keys using native database capabilities. Or, we can encrypt it in flight as data is being passed through our proxy.
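The in-flight mode can be illustrated with a small Python sketch. This is not JumpWire's implementation; it assumes a hypothetical set of sensitive property names (`SENSITIVE_FIELDS`) and an invented `enc:v1:` ciphertext prefix. The cipher is a toy keystream built from SHA-256 so the example runs with the standard library alone; a real deployment would use a vetted authenticated cipher from a proper cryptography library.

```python
import base64
import hashlib
import json
import secrets

SENSITIVE_FIELDS = {"email", "ssn"}  # hypothetical output of a classification step
PREFIX = "enc:v1:"                   # invented marker for encrypted values


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream from SHA-256 in counter mode. Illustration only:
    # it provides no integrity protection and is not meant for production.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_field(key: bytes, plaintext: str) -> str:
    nonce = secrets.token_bytes(12)  # fresh nonce per value
    data = plaintext.encode()
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return PREFIX + base64.b64encode(nonce + cipher).decode()


def decrypt_field(key: bytes, token: str) -> str:
    raw = base64.b64decode(token[len(PREFIX):])
    nonce, cipher = raw[:12], raw[12:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return data.decode()


def encrypt_in_flight(key: bytes, body: str) -> str:
    """Encrypt only the sensitive top-level properties of a JSON request body."""
    payload = json.loads(body)
    for name, value in payload.items():
        if name in SENSITIVE_FIELDS and isinstance(value, str):
            payload[name] = encrypt_field(key, value)
    return json.dumps(payload)


key = secrets.token_bytes(32)
request_body = json.dumps({"id": 7, "email": "ada@example.com", "plan": "pro"})
stored = json.loads(encrypt_in_flight(key, request_body))
print(stored)
```

The application sends and receives ordinary JSON; only the layer in the middle knows which properties are encrypted, which is the property the proxy approach relies on.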
Cole Grolmus: What is a major driver for why customers decide to buy JumpWire?
Ryan Cooke: One of the things engineering leadership finds attractive about our product is their ability to give developers production database access. There are a lot of cases where a non-security incident happens, and someone needs to log in and look at data in the database or inspect production systems. They’re thinking, “Every time I do that, do I open myself up to a risk of that data being compromised?”
What you end up with is a complete picture of data that needs to be treated as sensitive based on its classification. The only way to decrypt the data is to pass it through our proxy.
From a security perspective, this gives you a very low surface area where you need to be worried about there being a compromise. You can put a lot of people and a lot of tools into your production database. We're giving a really complete level of protection across a very large surface area with a small appliance.
Doing data classification and enforcement on the fly
Ryan Cooke: SOC2 compliance was the kernel that started JumpWire. We’ve been through SOC2 readiness and audits in the past. There’s a huge gap in data classification and data handling policies that never translate into your data architecture. Unless you've been very forward-looking, you lose tons of granularity about how you classify data.
With JumpWire, we're really trying to push those concepts into what's important about any piece of data. Anything that's highly confidential should be field-level encrypted. That's what our tool really unlocks. We do the discovery and data classification. We'll go to all of the schemas in the database, understand the columns being returned, and then run data classification heuristics based on that. We automate the labeling of all your tables and columns.
Then, we're applying policies on classifications, not on a field that says “first_name.” It gives you holistic policy enforcement and control where you can be bridging your information security program to your data infrastructure in a way that really is not possible without a lot of custom built stuff.
Now, you can audit who's accessing highly confidential data instead of who's accessing social security numbers. You can start to say:
“This application is our email sending service that’s going to send email digests. So, it’s going to need PII. We expect that application to process PII. But this other customer service tool doesn’t ever need an email address. It shouldn't have access to that type of data.”
By running database connectivity through our systems, we can give you that type of control.
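A rough Python sketch of the idea of labeling columns automatically and then writing policy against the labels rather than against column names. The detectors, classification names, threshold, and policy table here are all assumptions made for illustration; a real classifier would combine far more signals than one regular expression per type.

```python
import re

# Hypothetical detectors: (label, classification, pattern).
DETECTORS = [
    ("email", "pii", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("us_ssn", "highly_confidential", re.compile(r"^\d{3}-\d{2}-\d{4}$")),
]


def classify_column(samples, threshold=0.8):
    """Return the first classification whose pattern matches at least
    `threshold` of the sampled values, or "internal" if none does."""
    values = [s for s in samples if isinstance(s, str) and s]
    if not values:
        return "internal"
    for _label, classification, pattern in DETECTORS:
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return classification
    return "internal"


def classify_table(columns):
    """Label every column of a table from a sample of its values."""
    return {name: classify_column(samples) for name, samples in columns.items()}


# Policy is keyed by classification, never by column name (assumed policy).
POLICY = {
    "highly_confidential": {"encrypt": True, "audit": True},
    "pii": {"encrypt": True, "audit": False},
    "internal": {"encrypt": False, "audit": False},
}

# Synthetic sample data; note that no column is literally named "email" or "ssn".
users_table = {
    "contact": ["ada@example.com", "grace@example.org", "linus@example.net"],
    "tax_id": ["123-45-6789", "987-65-4321", "000-12-3456"],
    "plan": ["free", "pro", "pro"],
}
labels = classify_table(users_table)
columns_to_encrypt = sorted(c for c, cls in labels.items() if POLICY[cls]["encrypt"])
print(labels, columns_to_encrypt)
```

Because the policy never mentions `contact` or `tax_id`, renaming a column or adding a new one with similar contents does not require touching the policy.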
Cole Grolmus: This is interesting! So, you're saying that an application connecting to a database via JumpWire — that doesn't know it shouldn't have access to things like email addresses or whatever — JumpWire will block it and not return that sensitive data?
Ryan Cooke: Yeah. We're creating all these virtual proxies. You can create one proxy that says an internal application that needs confidential information can connect through this proxy, and it's going to have its own username and password. And then you have another proxy that just says, “oh, it's only an internal application, it shouldn't be reading PII.”
We're not restricting it or blocking it, but if someone does write a query in that app to pull out sensitive data like a name and email address, they're just going to get the encrypted blob back. They're not able to read it. They don't have the key to read it, so they can't decrypt it.
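The behavior described here, where a query succeeds but sensitive values come back unreadable, can be modeled in a few lines of Python. Everything in this sketch is hypothetical: the `VirtualProxy` class, the column labels, and the application names are invented, and base64 is used only as a visible stand-in for ciphertext (it is not encryption).

```python
import base64
from dataclasses import dataclass

COLUMN_LABELS = {"name": "pii", "email": "pii", "plan": "internal"}  # assumed labels


def fake_encrypt(value: str) -> str:
    # Stand-in for real encryption: base64 is NOT encryption. It only marks
    # which values would be opaque ciphertext in a real deployment.
    return "enc:" + base64.b64encode(value.encode()).decode()


def fake_decrypt(token: str) -> str:
    return base64.b64decode(token[len("enc:"):]).decode()


@dataclass(frozen=True)
class VirtualProxy:
    """One proxy per application, each allowed to read certain classifications."""
    app_name: str
    readable: frozenset

    def read(self, row: dict) -> dict:
        result = {}
        for column, value in row.items():
            label = COLUMN_LABELS.get(column, "internal")
            is_ciphertext = isinstance(value, str) and value.startswith("enc:")
            if is_ciphertext and label in self.readable:
                result[column] = fake_decrypt(value)
            else:
                # Not blocked: the query still succeeds, but the value stays opaque.
                result[column] = value
        return result


stored_row = {
    "name": fake_encrypt("Ada Lovelace"),
    "email": fake_encrypt("ada@example.com"),
    "plan": "pro",
}
digest_service = VirtualProxy("email-digest", frozenset({"pii"}))
support_tool = VirtualProxy("support-tool", frozenset())

digest_view = digest_service.read(stored_row)
support_view = support_tool.read(stored_row)
print(digest_view)
print(support_view)
```

The email digest service sees plaintext names and addresses, while the support tool receives the same rows with those fields still encrypted.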
Cole Grolmus: Data discovery and classification and application-specific enforcement are both really important. For a second, it seemed like magic to me.
You're into a whole other product category. There are entire data security platforms that scan your data repositories, find sensitive data, classify it, etc. I don't think you're sitting here saying you’re going to replace those tools — they have a purpose. But you are saying, “in addition to that, we can actually help you enforce it.”
The enforcement part is really important because after you’ve discovered sensitive data, your security team is sitting there trying to figure out, “well, now what do I do?” JumpWire can say: “Here's all this data. We helped you find it, and we can also implement what you should do to protect it right away.”
Ryan Cooke: Yeah, that's right. Where we play is in situations where you know customer data is sitting in this database, but you don't really know all the schemas it's in. We can just automate that part for you.
Approaches for implementing data encryption
Cole Grolmus: If I was starting greenfield with a brand new application and had a loose schema in my head, that seems like the easiest time to adopt a secure enclave. It's hard to put a percentage on it, but I suspect that’s less than 10% of use cases.
I'm sure the use case you see a lot more is, “I already have an existing application.” And probably worse yet, “I already have a massive existing application that's already scaled beyond the point of no return. We're not going back to do a rewrite here.”
…And then a requirement comes in where we need to secure the data better. “Now what do I do?!” If I'm in that headspace, putting security around my existing data by adding secure data infrastructure seems a whole lot more appealing than migrating data.
Ryan Cooke: Yeah, definitely. Otherwise, it's like a six-month project, right? If you think about what teams have to do to plug in an existing vault into their microservice architecture, where they have a half a dozen databases already, it’s a lot of effort.
Cole Grolmus: Can you talk more about that? What has your experience been with customers bringing existing data architectures and infrastructure in with them?
Ryan Cooke: It's been a big area of focus for us on the product side — how we've architected and implemented our proxies to speak database protocols. There's really no impact that has to happen to your architecture other than self-hosting our proxy.
All of our customers self-host the product today. We think about the rollout in multiple stages. They're not about cycles of engineering work the customer has to do. It’s really validation that everything is working and not breaking existing code in unexpected ways.
We drop our proxy, and it starts forwarding requests to the database. What you're testing is network latency and whether there are any queries that are going to be negatively impacted by JumpWire. If that looks good, we move on to manipulating the data on the fly. So, we're not going to change the data that's been stored in the database. We're simply going to manipulate it as we see the request pass through our proxy.
Then, we create policies and do the classification work that determines which of your fields are sensitive. At that point, if things are looking good, we move to the final phase, which is to rewrite the data in your database. We have tooling that helps do that efficiently. You can run a fairly large-scale data migration without impacting production.
Once that finishes, you’re at a point where new data coming in is being stored in a secure format. Your application is expecting to get it through JumpWire in a particular format, and things run as you expect.
We put a lot of thought into how to roll out a tool and migrate an existing application stack with as little interruption and as few potential availability faults as we can. What people are really concerned about is not just losing control of their architecture, but introducing something that really degrades the performance of their app. They don't discover that until it's in production and in front of users.
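The final phase, rewriting data that is already stored, is commonly done as a batched backfill. The Python sketch below shows that general pattern against an in-memory SQLite table; it is an assumption about how such a migration could look, not a description of JumpWire's tooling. The `users` table, the `enc:` marker, and `placeholder_encrypt` are invented for the example.

```python
import sqlite3


def placeholder_encrypt(value: str) -> str:
    # Stand-in only: a real migration would call an actual encryption routine.
    return "enc:" + value[::-1]


def backfill_encrypted(conn, batch_size=2):
    """Rewrite plaintext emails in small batches; returns the number of batches run."""
    batches = 0
    while True:
        # Rows already carrying the marker are skipped, so the job can be
        # stopped and restarted without double-encrypting anything.
        rows = conn.execute(
            "SELECT id, email FROM users WHERE email NOT LIKE 'enc:%' "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return batches
        # One short transaction per batch keeps locks brief on a live table.
        with conn:
            conn.executemany(
                "UPDATE users SET email = ? WHERE id = ?",
                [(placeholder_encrypt(email), row_id) for row_id, email in rows],
            )
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
with conn:
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [(f"user{i}@example.com",) for i in range(5)],
    )

batches_run = backfill_encrypted(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email NOT LIKE 'enc:%'"
).fetchone()[0]
print(batches_run, remaining)
```

Small batches trade total migration time for shorter lock durations, which matches the stated goal of migrating without degrading the application in front of users.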
Cole Grolmus: That level of thinking matters — the amount of detail and consideration that's gone into helping people implement JumpWire in their existing data architecture. When you think about customer adoption for something like this, a massive barrier to getting started is, “OMG, how do I take all this data and not break my app to make encryption happen?”
That's got to be a non-starter for a lot of people who don't have a burning platform to implement data encryption. The fact that you can do it in a staged way that doesn't require a big-bang cutover is really appealing.
Ryan Cooke: We really think about engineering teams. A lot of times, they're not the ones who have decided that they need to have this in their overall architecture. The reality of engineering work is that you get pushed things from others.
We want to make encryption as easy as possible for teams, where the organization has decided it’s the priority, and they're now on the hook to implement it. Do they want to spend several months out of their iteration cycles working on this problem? Or, can we solve it a lot faster so they can go back to product development?
Why companies implement data encryption
Cole Grolmus: What is the main driver of people wanting to adopt a data encryption solution?
Developers are a lot more security-conscious now, but an encryption solution this sophisticated isn’t super high on most people’s priority list. It seems more likely to be pushed down from somewhere else, as opposed to, “hey, we want to do the right thing here.”
Ryan Cooke: We actually do get a decent number of engineers coming to us saying, “I'm uncomfortable with how we're storing data.” Now, can they prioritize that into an initiative to adopt JumpWire? Unfortunately, not always.
Early in the life of a business, the goal is to build a viable product. But there are a set of engineers who are aware of security. There's an opportunity for that basic level of awareness to come to the next level — where I need to have a little more consideration for data that's been stored in my database and how it's getting secured.
Cole Grolmus: I’m interested in the intrinsic versus extrinsic motivation here. Is this developers wanting to do the right thing? Or is it somebody telling them that they have to do it?
I’m sure the reality is some of both. But it’s encouraging to hear people realize that there's some data that's just not okay to have sitting in plain text and know they need to do something about it.
Ryan Cooke: It’s some of both, but mostly external. My expectation when we were starting was that the motivation would be 100% external. I've been really surprised by these cases where there is some self-identification happening on the engineering side.
But you're absolutely right: the vast majority of motivation is going to come from higher levels in the organization, maybe even someone in Legal saying, “we're not doing what we should be doing to handle this data properly.”
Cole Grolmus: Motivation for adoption being driven by other people probably lends itself better to a solution like JumpWire, where they're not having to totally refactor everything. That’s my concern about companies in this category that need developers to be on board from the beginning of an app. Somebody would really have to want to do the right thing from the start.
For that business to work, it requires developers changing a behavior that’s really hard to change. JumpWire is sort of the beneficiary on the other side. The behavior that’s more likely to happen is engineering teams are going to get told down the line that they have to implement encryption. Now they’re stuck working within the data infrastructure they already have. JumpWire is a good solution for doing that.
Why data encryption is ready for mass adoption
Cole Grolmus: A lot of people are afraid of encrypting their data. They try to encrypt as little as possible because there’s this fear of “encryption is really hard.” And, “If I mess this up, I'm going to lose my data.”
The thinking was that the downside of something irreversible happening to my data was worse than the risk of having unencrypted data get hacked. That might be an outdated mindset.
What strikes me about companies like yours and other early stage companies who focus on encryption is that you’re making encryption less scary.
Ryan Cooke: Yeah. If you make a mistake, you can shoot yourself in the foot. When you automate and have a programmatic approach to key management, you're reducing the risk of losing a key considerably. That's the worst case scenario.
I remember when we would generate the root encryption keys, put them on a thumb drive, and put it in a safety deposit box. We were so afraid of losing the keys because they're everything. When you approach key management programmatically, it becomes a lot more predictable. We can build guarantees into the system that's handling the encryption that those keys aren't going to get lost and that they are recoverable.
From a cost perspective, having a system like ours, where we can make decisions around whether to decrypt or not, vastly reduces the chance that you end up spending a lot on compute to decrypt swaths of data. We’re applying a policy-driven approach. It's very predictable what is going to happen to compute needs for decrypting some quantity of data. We can demonstrate that up front.
So, those are a lot of the fears practitioners might have from a philosophy standpoint. A lot of things these days are encrypted by default. AWS made a big push on S3, where everything's now encrypted on the server side by default. Every website uses TLS. So, some of the philosophical concerns about encryption are becoming less and less concerning.