In this post you will build an app that runs GPT-2 in an Enclave. You can follow along by cloning this repo.
Using language model APIs and services is common; you’ve probably already used one for a programming task or to help you write a blog post in the past week. However, in order to use some of the most popular ones available, you must relinquish privacy and security. Personal information is passed to these providers, and sensitive data within prompts may be shared and stored.
One way to use the power of language models while protecting personal and prompt data is to run one in a secure enclave, a cloud-based Trusted Execution Environment (TEE). Evervault Enclaves allow you to build, deploy, and scale apps running in Docker containers inside secure enclaves. In this guide you will use Python to run a language model inside an Enclave.
When run, this will be the result.
As you can see, it’s not a perfect output. We’ve had to use an older, smaller language model, so the results won’t be as detailed as those from more advanced models (but of course, even newer versions make mistakes). To train this model we used the minGPT repo from Andrej Karpathy. The pretrained model weights are available to download here if you’d like to skip training it yourself.
To try training and fine-tuning GPT-2 or minGPT yourself, you can use this Google Colab notebook. It runs on CPUs, but will be faster if you have access to GPUs.
- An AWS account (sign up or log in here)
- An Evervault account (create a free account here)
- Docker installed (get it here)
Install the Enclaves CLI by running the following command.
Next, using either your own weights from the previous steps or the model weights we've provided, add the .pt file to an S3 bucket. This will allow your Enclave to access the model.
In production, loading the model weights from a private S3 bucket keeps them secret from the user, so as a model provider you could charge for their use.
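If you would rather upload the weights from Python than through the AWS console, the step can be sketched as below. This is a minimal sketch, not the guide's own code: the function name `upload_weights` and the bucket and key values are hypothetical, and it assumes `boto3` is installed and your AWS credentials are already configured locally.

```python
from pathlib import Path


def upload_weights(weights_path, bucket, key, client=None):
    """Upload a local .pt weights file to S3 and return its s3:// URI.

    `client` is optional so a stub can be passed in for testing; by default
    a real boto3 S3 client is created.
    """
    path = Path(weights_path)
    if path.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got {path.name!r}")
    if client is None:
        # Imported lazily so the function can be exercised without boto3.
        import boto3

        client = boto3.client("s3")
    # boto3 S3 client call: upload_file(Filename, Bucket, Key)
    client.upload_file(str(path), bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_weights("model.pt", "my-private-bucket", "gpt2/model.pt")` would upload the file to a bucket you own; both names here are placeholders.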
While in your AWS account, grab your access key and secret access key. You will need to add these to your Evervault account in a few steps.
The back end of the app downloads a model from S3 and passes the prompt you send in a POST request to the model to generate responses.
This bit of code will set the device as CPU since we won’t have access to GPUs. It will then get the weights from S3 and load them.
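A minimal sketch of that loading step might look like the following. The helper names, the temporary download location, and the assumption that the .pt file holds a state dict saved with `torch.save` are illustrative rather than taken from the repo; `boto3` and `torch` are imported lazily inside the functions.

```python
import os
import tempfile


def download_weights(bucket, key, client=None, dest_dir=None):
    """Download the weights object from S3 and return the local file path."""
    if client is None:
        # Imported lazily; boto3 reads AWS credentials from the environment.
        import boto3

        client = boto3.client("s3")
    dest_dir = dest_dir or tempfile.gettempdir()
    local_path = os.path.join(dest_dir, os.path.basename(key))
    # boto3 S3 client call: download_file(Bucket, Key, Filename)
    client.download_file(bucket, key, local_path)
    return local_path


def load_state_dict_on_cpu(local_path):
    """Load saved weights onto the CPU, since no GPU is available in the Enclave."""
    import torch

    device = torch.device("cpu")
    # map_location remaps any tensors saved on a GPU to the CPU.
    return torch.load(local_path, map_location=device)
```

The returned state dict would then be passed to the model's `load_state_dict` method before switching the model to evaluation mode with `model.eval()`.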
The app runs as a simple Flask app that retrieves the prompt from the data sent in a POST request. You can override the number of samples in num_samples to increase or decrease the number of responses sent back. The response will also include the time that the text generation took.
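The request handling described above can be sketched as a plain function that a Flask route would call. This is an illustration under assumptions: the function name `handle_generate`, the response keys `response` and `time_taken_seconds`, and the validation rules are hypothetical; only `prompt` and `num_samples` come from the guide.

```python
import time


def handle_generate(payload, generate_fn, default_num_samples=1):
    """Return (body, status) for a parsed JSON payload from a POST request.

    `generate_fn(prompt, num_samples)` is whatever function produces the text.
    """
    prompt = (payload or {}).get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return {"error": "a non-empty 'prompt' string is required"}, 400

    # num_samples is optional; fall back to the default when it is absent.
    try:
        num_samples = int(payload.get("num_samples", default_num_samples))
    except (TypeError, ValueError):
        return {"error": "'num_samples' must be an integer"}, 400
    if num_samples < 1:
        return {"error": "'num_samples' must be at least 1"}, 400

    # Time only the text generation so it can be reported in the response.
    start = time.perf_counter()
    text = generate_fn(prompt, num_samples)
    elapsed = time.perf_counter() - start
    return {"response": text, "time_taken_seconds": round(elapsed, 3)}, 200
```

Inside a Flask route, you could call it as `body, status = handle_generate(request.get_json(silent=True), generate)` and return `jsonify(body), status`. Keeping the logic outside the route makes it easy to test without starting a server.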
This code will take the prompt and generate the given number of responses. It uses a GPT-2 tokenizer provided by Hugging Face and returns the generated responses concatenated into a single string.
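As a rough sketch of that generation step, assuming a minGPT-style model whose `generate` method accepts `max_new_tokens`, `do_sample`, and `top_k`, and a tokenizer such as `GPT2Tokenizer.from_pretrained("gpt2")` from the `transformers` library. The function name, the default of 20 new tokens, the `top_k` value of 40, and the newline separator are assumptions, not values taken from the repo.

```python
def generate_text(model, tokenizer, prompt, num_samples=1, max_new_tokens=20):
    """Generate `num_samples` completions for `prompt` and join them into one string."""
    # Encode the prompt into token ids (a 1 x T batch when using PyTorch tensors).
    encoded = tokenizer(prompt, return_tensors="pt")
    input_ids = encoded["input_ids"]

    outputs = []
    for _ in range(num_samples):
        # Sampling (rather than greedy decoding) makes each response differ.
        tokens = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=40,
        )
        # Decode the single sequence in the batch back into text.
        outputs.append(tokenizer.decode(tokens[0]))

    # Assumption: responses are separated by a newline when concatenated.
    return "\n".join(outputs)
```

Generating one sample per loop iteration keeps the sketch simple; batching the prompt across all samples in a single `generate` call would be an alternative.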
Open up the Dockerfile. You will use a virtual environment to install the libraries needed to run the app. You’ll also tell Docker that your web server will listen on port 8008, which matches the Flask server port defined in app.py.
First, make sure that you have Docker running. Then, in your terminal, run the following command to initialize the Enclave. You can use the suggested name below or change it to one of your choosing.
You should see that new files, including enclave.toml, are generated. Open up enclave.toml. You can see that it contains important details that will help with the deployment of your Enclave.
Your Enclave will need to access your S3 credentials as well as your Enclave name and App ID. Add all of these variables by going to Enclaves > Environment in your Evervault Dashboard. Be sure to check the “secret” box on any sensitive credentials (like the access keys). This will encrypt them and they will only be decrypted in the Enclave.
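Inside the app, those values can then be read from the environment. The sketch below is illustrative only: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard names that boto3 looks for, while `ENCLAVE_NAME` and `APP_ID` are hypothetical names that should match whatever you entered in the Dashboard.

```python
import os

# Names of the variables the app expects; the last two are assumed names.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "ENCLAVE_NAME",
    "APP_ID",
)


def read_settings(environ=None):
    """Return the required settings, failing fast if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Checking for the variables at startup surfaces a misconfigured Enclave immediately, rather than on the first request that needs S3.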
Now build the Enclave using the following command. This also generates a Dockerfile, which is used to build the EIF file that runs inside the Enclave (the build may take a few minutes).
Finally, you can deploy the Enclave.
Note that if you are committing your code, you will want to add the .eif file to .gitignore as it is a large file.
In your terminal, make a curl request to the Enclave endpoint and pass in a prompt as JSON.
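If you prefer to call the endpoint from Python instead, an equivalent request can be sketched with the standard library. The function names are hypothetical, and the URL you pass in is whatever endpoint and route your deployed Enclave exposes; any extra headers your deployment requires would need to be added.

```python
import json
import urllib.request


def build_prompt_request(url, prompt, num_samples=None):
    """Build a JSON POST request carrying the prompt (and optional num_samples)."""
    payload = {"prompt": prompt}
    if num_samples is not None:
        payload["num_samples"] = num_samples
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompt(url, prompt, num_samples=None, timeout=60):
    """Send the request and return the decoded JSON response body."""
    request = build_prompt_request(url, prompt, num_samples)
    # Generation on CPU can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_prompt(enclave_url, "Once upon a time", num_samples=2)` would post the prompt, where `enclave_url` is a placeholder for your own Enclave endpoint.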
When run successfully, you should get a response that looks like the one below.
In this guide, you used Python to run GPT-2 inside an Enclave. If you ran into an issue or have any questions about this guide, feel free to raise them on GitHub. Running a different model in an Enclave? Let us know; we'd love to hear about it!