Protect customer data in your cloud app
Find out how to store sensitive data in your database so that even if it leaks, neither you nor your users are harmed. For Python lovers, there is a drop-in replacement for TextFields in your favourite ORM that you can start using today.
Suppose you have a database, either embedded in your desktop app or deployed in the cloud for your web app, and you end up storing sensitive data in it. The problem is that you don’t control the environment where the data is stored, which brings extra risk. Stay with me to see how I mitigated it in real cases and how you can do the same.
Embedded SQLite primer
My first case of this problem was a desktop app with an SQLite database. The approach was to encrypt the database so that only authorized software could read it.
How to tackle it? I considered encrypting the whole SQLite file or encrypting individual fields within it. In the end I went with encrypting fields within the database. This way the approach is reusable, and you don’t have to load the entire database into memory to decrypt it or leave it in plaintext at any moment.
Since the application already had its database logic, the goal was to minimize the number of changes needed. For this reason all the encryption magic happens inside the ORM, and for the rest of the app it is transparent, business as usual.
Of course, the migration needs to be handled as with any database update. In this case it’s a bit more complex, because it’s not only a schema change: every existing row also needs the appropriate fields encrypted for the updated logic to work.
The AES-256-CTR cipher was chosen. As for the encryption keys, I went with a single key for the whole database; there was no reason for anything more complex than that. The key is generated on the first run and on upgrade from the old version.
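A minimal sketch of the "generate the key if it is not there yet" step, using only the standard library. The file name, its location and the helper name load_or_create_key are assumptions made for illustration; where the key actually lives (a file, an OS keychain, an environment variable) depends on your threat model.

```python
import os
import secrets
import tempfile
from pathlib import Path

KEY_SIZE = 32  # 256 bits, the key length AES-256 expects


def load_or_create_key(path: Path) -> bytes:
    """Return the database key, generating it when it does not exist yet.

    Hypothetical helper: covers both the first run and the first start
    after an upgrade, because in both cases the key file is missing.
    """
    if path.exists():
        key = path.read_bytes()
        if len(key) != KEY_SIZE:
            raise ValueError(f"unexpected key length: {len(key)}")
        return key
    key = secrets.token_bytes(KEY_SIZE)
    # O_EXCL refuses to overwrite an existing file; 0o600 restricts
    # access to the owner on POSIX systems.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
    return key


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "db.key"
    first = load_or_create_key(key_path)   # key is generated
    second = load_or_create_key(key_path)  # same key is loaded again
```

Keeping the key next to the database only protects against leaks of the database file alone, so treat the storage location as a design decision of its own.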
To store the encrypted content in the database, the first idea was to simply put binary blobs there. This, however, caused some issues; for example, querying the SQLite database in a terminal for debugging was a pain. A better approach turned out to be base64-encoding the ciphertext and storing it in a text field. In addition, a new field was created to store the initialization vector (IV) for the row. The IV is a random array of bytes, and it is also base64-encoded when stored in the database.
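The storage scheme described above could look roughly like this. It is a sketch, not the original application code: the table and column names (notes, body, body_iv) and the helper names are invented for the example, and it assumes a recent release of the cryptography package, where the backend argument is optional.

```python
import base64
import os
import sqlite3
from typing import Tuple

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_field(key: bytes, plaintext: str) -> Tuple[str, str]:
    """Encrypt one value with AES-256-CTR; return (ciphertext, IV) as base64 text."""
    iv = os.urandom(16)  # fresh random IV for every row
    encryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    ciphertext = encryptor.update(plaintext.encode("utf-8")) + encryptor.finalize()
    return (
        base64.b64encode(ciphertext).decode("ascii"),
        base64.b64encode(iv).decode("ascii"),
    )


def decrypt_field(key: bytes, ciphertext_b64: str, iv_b64: str) -> str:
    """Reverse encrypt_field using the IV stored next to the ciphertext."""
    iv = base64.b64decode(iv_b64)
    decryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).decryptor()
    plaintext = decryptor.update(base64.b64decode(ciphertext_b64)) + decryptor.finalize()
    return plaintext.decode("utf-8")


key = os.urandom(32)  # single 256-bit key for the whole database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, body_iv TEXT)")
conn.execute(
    "INSERT INTO notes (body, body_iv) VALUES (?, ?)",
    encrypt_field(key, "secret note"),
)
stored_body, stored_iv = conn.execute("SELECT body, body_iv FROM notes").fetchone()
restored = decrypt_field(key, stored_body, stored_iv)
```

Because both columns hold plain ASCII text, a row can be inspected in the sqlite3 shell without binary noise. Note that CTR mode on its own does not detect tampering; that is what an additional HMAC is for.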
It is essential to understand that once fields are encrypted, it is not possible to query against them. The ciphertext is a function of the plaintext, the key and the IV, and the IV is specific to each row, so the same plaintext produces a different ciphertext in every row. This restriction is a direct consequence of ensuring privacy. When you do need to query against the now-encrypted values, the solution is very domain-specific. At times we can modify the data or compute hashes. However, the solution must be discussed with the customer with regard to the security objectives they want to achieve. For the same reasons, data migration is also domain-specific.
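To illustrate the "compute hashes" option, here is one possible shape of it: a keyed hash of the normalized value kept in its own column, so that equality lookups still work. This is an assumption about what such a solution might look like, not what was done in the project; the table, the column names and the blind_index helper are hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3


def blind_index(index_key: bytes, value: str) -> str:
    """Deterministic keyed hash of a normalized value, usable for equality lookups."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()


index_key = os.urandom(32)  # kept separate from the encryption key

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email_enc TEXT, email_idx TEXT)"
)
conn.execute(
    "INSERT INTO users (email_enc, email_idx) VALUES (?, ?)",
    # The first value stands in for the real encrypted field.
    ("<ciphertext placeholder>", blind_index(index_key, "Alice@Example.com")),
)

# The lookup hashes the search term the same way and compares the hashes.
row = conn.execute(
    "SELECT id FROM users WHERE email_idx = ?",
    (blind_index(index_key, "alice@example.com"),),
).fetchone()
```

The trade-off is that equal plaintexts now produce equal index values, so anyone holding the database learns which rows share a value. Whether that is acceptable is exactly the kind of question to settle with the customer.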
On another project I encountered a similar problem. This time, however, the database was MongoDB and there were a lot of sensitive fields. The approach there was to split the fields into sensitive ones and metadata used for querying. The sensitive fields were encrypted and added to the output JSON as a single field. On top of that, additional fields for the IV and an HMAC were added to the output.
The payload JSON can then be stored in the database.
For encryption in Python we use the standard cryptography module from the Python Cryptographic Authority. Encrypting JSON is no magic: we dump it into bytes and encrypt those, in this case using AES in CBC mode.
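A sketch of how such a payload might be built and read back, assuming AES-256 in CBC mode with PKCS7 padding and HMAC-SHA256 over the IV and ciphertext. The output field names (encrypted, iv, hmac), the function names and the use of two separate keys are assumptions for the example, not the original project's format. It relies on a recent release of cryptography, where the backend argument is optional.

```python
import base64
import json
import os

from cryptography.hazmat.primitives import hashes, hmac, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_payload(enc_key: bytes, mac_key: bytes, metadata: dict, sensitive: dict) -> dict:
    """Keep metadata queryable; fold all sensitive fields into one encrypted field."""
    plaintext = json.dumps(sensitive).encode("utf-8")
    padder = padding.PKCS7(128).padder()  # CBC needs whole 128-bit blocks
    padded = padder.update(plaintext) + padder.finalize()
    iv = os.urandom(16)
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    mac = hmac.HMAC(mac_key, hashes.SHA256())
    mac.update(iv + ciphertext)  # authenticate the IV together with the ciphertext
    return {**metadata, "encrypted": b64(ciphertext), "iv": b64(iv), "hmac": b64(mac.finalize())}


def read_payload(enc_key: bytes, mac_key: bytes, payload: dict) -> dict:
    """Verify the HMAC first, then decrypt and parse the sensitive fields."""
    iv = base64.b64decode(payload["iv"])
    ciphertext = base64.b64decode(payload["encrypted"])
    mac = hmac.HMAC(mac_key, hashes.SHA256())
    mac.update(iv + ciphertext)
    mac.verify(base64.b64decode(payload["hmac"]))  # raises InvalidSignature if tampered
    decryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return json.loads(unpadder.update(padded) + unpadder.finalize())


enc_key, mac_key = os.urandom(32), os.urandom(32)
payload = build_payload(
    enc_key, mac_key,
    metadata={"user_id": 42},
    sensitive={"ssn": "078-05-1120", "notes": "private"},
)
restored = read_payload(enc_key, mac_key, payload)
```

Checking the HMAC before decrypting means a modified document is rejected before any padding is processed.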
Do you find this useful?
Now, this solution worked in a particular case, but it was very domain-specific. Nevertheless, once I saw that there is a use case for this and that you can easily work with encryption in Python, my goal was to create something anyone could use in their project.
For this reason, I created a library that lets you add an encrypted text field to SQLAlchemy or the Django ORM.
Inside, there is a CryptoContainer. The idea is to create an object from either ciphertext and IV or plaintext, and then get access to all of ciphertext, plaintext and IV in a unified way.
Moreover, I didn’t like the original mechanism of having a separate column for the initialization vector, as it complicates the migration. What I want to achieve here is a drop-in replacement for TextFields. For this reason, I leverage the fact that base64 encoding doesn’t produce special characters. We can use a known extra character, one outside the base64 alphabet, as a separator.
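The separator trick can be sketched with the standard library alone. The "$" character and the pack/unpack helper names are arbitrary choices for this example; any character outside the base64 alphabet (A-Z, a-z, 0-9, "+", "/", "=") would do.

```python
import base64
import os

SEPARATOR = "$"  # hypothetical choice; never produced by standard base64


def pack(ciphertext: bytes, iv: bytes) -> str:
    """Join base64 ciphertext and base64 IV into one value for a single text column."""
    return (
        base64.b64encode(ciphertext).decode("ascii")
        + SEPARATOR
        + base64.b64encode(iv).decode("ascii")
    )


def unpack(stored: str):
    """Split a stored value back into raw ciphertext and IV bytes."""
    ciphertext_b64, iv_b64 = stored.split(SEPARATOR)
    return base64.b64decode(ciphertext_b64), base64.b64decode(iv_b64)


# Random bytes stand in for a real ciphertext here.
ciphertext, iv = os.urandom(40), os.urandom(16)
stored = pack(ciphertext, iv)
```

Since the separator can never appear inside either half, splitting on it is unambiguous and the existing text column can be reused as is.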
In addition, an encryptor property is passed in. This is an object with a simple interface, so we can create encryptors like AES256CTREncryptor for a chosen encryption method.
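What such an interface might look like is sketched below. Only the name AES256CTREncryptor comes from the text; the encrypt/decrypt method names and signatures are assumptions for illustration and not necessarily the library's actual API. The sketch assumes a recent release of cryptography, where the backend argument is optional.

```python
import os
from typing import Protocol, Tuple

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


class Encryptor(Protocol):
    """Hypothetical interface: anything with these two methods can be plugged in."""

    def encrypt(self, plaintext: bytes) -> Tuple[bytes, bytes]:
        """Return (ciphertext, iv)."""
        ...

    def decrypt(self, ciphertext: bytes, iv: bytes) -> bytes:
        ...


class AES256CTREncryptor:
    """AES-256 in CTR mode with a fresh random IV per encryption."""

    def __init__(self, key: bytes):
        if len(key) != 32:
            raise ValueError("AES-256 needs a 32-byte key")
        self._key = key

    def encrypt(self, plaintext: bytes) -> Tuple[bytes, bytes]:
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(self._key), modes.CTR(iv)).encryptor()
        return enc.update(plaintext) + enc.finalize(), iv

    def decrypt(self, ciphertext: bytes, iv: bytes) -> bytes:
        dec = Cipher(algorithms.AES(self._key), modes.CTR(iv)).decryptor()
        return dec.update(ciphertext) + dec.finalize()


encryptor = AES256CTREncryptor(os.urandom(32))
ciphertext, iv = encryptor.encrypt(b"card number")
plaintext = encryptor.decrypt(ciphertext, iv)
```

Keeping the interface this small means another cipher can be swapped in by writing one more class with the same two methods.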
In the end we get a simple interface to use in our ORM.
You can work with these fields the same way you do with regular text fields.
When the application code reads the value, the plaintext is returned from the container. On the other hand, when the data is written to the database, we encrypt it and leverage the __str__ implementation of the CryptoContainer, which concatenates the base64-encoded ciphertext and the IV.
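The read and write paths can be sketched as follows. This is an illustration of the idea, not the library's actual implementation: the constructor signature, the from_stored helper, the "$" separator and the small CtrEncryptor stand-in are all assumptions. It assumes a recent release of cryptography, where the backend argument is optional.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

SEPARATOR = "$"  # hypothetical; any character outside the base64 alphabet


class CtrEncryptor:
    """Minimal AES-CTR stand-in so the example is self-contained."""

    def __init__(self, key: bytes):
        self._key = key

    def encrypt(self, plaintext: bytes):
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(self._key), modes.CTR(iv)).encryptor()
        return enc.update(plaintext) + enc.finalize(), iv

    def decrypt(self, ciphertext: bytes, iv: bytes) -> bytes:
        dec = Cipher(algorithms.AES(self._key), modes.CTR(iv)).decryptor()
        return dec.update(ciphertext) + dec.finalize()


class CryptoContainer:
    """Built from plaintext OR from ciphertext + IV; exposes all three."""

    def __init__(self, encryptor, plaintext=None, ciphertext=None, iv=None):
        if plaintext is not None:
            ciphertext, iv = encryptor.encrypt(plaintext.encode("utf-8"))
        elif ciphertext is not None and iv is not None:
            plaintext = encryptor.decrypt(ciphertext, iv).decode("utf-8")
        else:
            raise ValueError("pass either plaintext or both ciphertext and iv")
        self.plaintext, self.ciphertext, self.iv = plaintext, ciphertext, iv

    @classmethod
    def from_stored(cls, encryptor, stored: str) -> "CryptoContainer":
        """Read path: parse the value found in the text column."""
        ciphertext_b64, iv_b64 = stored.split(SEPARATOR)
        return cls(
            encryptor,
            ciphertext=base64.b64decode(ciphertext_b64),
            iv=base64.b64decode(iv_b64),
        )

    def __str__(self) -> str:
        """Write path: base64 ciphertext, separator, base64 IV."""
        return (
            base64.b64encode(self.ciphertext).decode("ascii")
            + SEPARATOR
            + base64.b64encode(self.iv).decode("ascii")
        )


encryptor = CtrEncryptor(os.urandom(32))
written = str(CryptoContainer(encryptor, plaintext="4111 1111 1111 1111"))
read_back = CryptoContainer.from_stored(encryptor, written).plaintext
```

An ORM field type would call str() on the container when binding a value and from_stored() when loading a row, so application code only ever sees the plaintext.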
Are you interested in protecting the data in your Python web app? Find out how to get cryptbase and use it in your code today.