
Run GPT-2 Inside a Cage using Python

In this post you will build an app that runs GPT-2 in a Cage. You can follow along by cloning this repo.

Using language model APIs and services is common; you’ve probably already used one for a programming task or to help you write a blog post in the past week. However, in order to use some of the most popular ones available, you must relinquish privacy and security. Personal information is passed to these providers, and sensitive data within prompts may be shared and stored.

One way to harness the power of language models while protecting personal and prompt data is to run one in a secure enclave, a cloud-based Trusted Execution Environment (TEE). Evervault Cages allow you to build, deploy, and scale apps running in Docker containers inside secure enclaves. In this guide, you will use Python to run a language model inside a Cage.

When run, the app will produce results like the following.

Input:

prompt = "Encryption is"

Output:

Encryption is always a hard problem to solve.
Encryption is the process of encoding data into a cipher.
Encryption is important to us as this should keep your data safe.
Encryption is done through your browser. Here.

As you can see, it's not a perfect output. We've had to use an older, smaller language model, and the results won't be as detailed as those from more advanced models (but of course, even newer versions make mistakes). To train this model we used the minGPT repo from Andrej Karpathy. The pretrained model weights are available to download here if you'd like to skip training it yourself.

To try training and fine-tuning GPT-2 or minGPT yourself, you can use this Google Colab notebook. It will run on CPUs, but it will run faster if you have access to GPUs.
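
If you do train the model yourself, you'll need the trained weights saved as a .pt file for the next step. Below is a minimal sketch of saving them with PyTorch; the helper name and default path are illustrative, and the model argument is assumed to be your trained minGPT instance.

import torch

def save_weights(model, path='./mingpt.pt'):
    """Save the trained minGPT model to a .pt file so it can be
    uploaded to S3 and loaded again with torch.load() in the Cage."""
    # Saving the whole model object (rather than just the state_dict)
    # matches the torch.load('./mingpt.pt') call used later in app.py.
    torch.save(model, path)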

Prerequisites

To follow along, you will need an Evervault account, an AWS account with access to S3, and Docker installed and running.

Set up

Install the Cages CLI by running the following command.

curl https://cage-build-assets.evervault.com/cli/install -sL | sh

Add the model to S3

Next, add the .pt file to an S3 bucket, using either your own weights from the previous step or the model weights we've provided. This will allow your Cage to access the model.
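
For example, you could upload the file with boto3 from the machine where you trained or downloaded the weights. A quick sketch; the bucket name is a placeholder, and the object key 'mingpt' matches the key the app downloads later:

import boto3

# Upload the saved weights to S3 under the key 'mingpt', which is the
# key app.py will download inside the Cage.
# 'your-model-bucket' is a placeholder; use your own bucket name.
s3 = boto3.client('s3')
s3.upload_file('./mingpt.pt', 'your-model-bucket', 'mingpt')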

In a production instance, loading the model weights from a private S3 bucket keeps them secret from the user, so as a model provider you could charge for their use.

While in your AWS account, grab your access key and secret access key; you will need to add these to your Evervault account in a few steps.

Set up the Python app

The back end of the app downloads the model from S3, then passes the prompt you send in a POST request to the model to generate responses.
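
The snippets below all live in a single app.py. Here is a sketch of the imports and setup they assume; the names are taken from the code that follows, but your file may be organized differently.

import os
import time

import boto3
import torch
from flask import Flask, request, jsonify
from transformers import GPT2Tokenizer

app = Flask(__name__)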

Load the model

This bit of code sets the device to CPU, since we won't have access to GPUs. It then downloads the weights from S3 and loads them.

device = 'cpu'

s3 = boto3.client(
    's3',
    aws_access_key_id=os.environ.get('ACCESS_KEY'),
    aws_secret_access_key=os.environ.get('SECRET_ACCESS_KEY'),
    region_name=os.environ.get('S3_REGION')
)
s3.download_file(os.environ.get('BUCKET_NAME'), 'mingpt', './mingpt.pt')

model = torch.load('./mingpt.pt')
model.to(device)
model.eval()

Get the prompt

The app runs as a simple Flask app that retrieves the prompt from the data sent in a POST request. You can change num_samples to increase or decrease the number of responses sent back. The response also includes the time that the text generation took.

@app.route('/generate', methods=['POST'])
def generate():
    start_time = time.time()
    prompt = request.json.get('prompt')

    app.logger.info("running")

    generated_text = generate_from_prompt(prompt=prompt, num_samples=5, steps=20)
    end_time = time.time() - start_time

    app.logger.info("Total Taken => %s", end_time)

    response = {
        'text': generated_text,
        'time': end_time
    }

    return jsonify(response)

Generate the response

This code takes the prompt and generates the given number of responses. It uses a GPT-2 tokenizer provided by Hugging Face and returns the generated responses concatenated into a single string.

def generate_from_prompt(prompt='', num_samples=10, steps=20, do_sample=True):
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
    if prompt == '':
        # to create unconditional samples...
        # huggingface/transformers tokenizer special cases these strings
        prompt = '<|endoftext|>'
    encoded_input = tokenizer(prompt, return_tensors='pt').to(device)
    x = encoded_input['input_ids']

    # we'll process all desired num_samples in a batch, so expand out the batch dim
    x = x.expand(num_samples, -1)

    # forward the model `steps` times to get samples, in a batch
    y = model.generate(x, max_new_tokens=steps, do_sample=do_sample, top_k=40)

    responses = ""
    for i in range(num_samples):
        out = tokenizer.decode(y[i].cpu().squeeze())
        responses += out + "\n"

    return responses
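
For completeness, app.py also needs to start the Flask server. A minimal sketch, assuming the server listens on port 8008 (the port the Dockerfile below exposes):

if __name__ == '__main__':
    # Bind to all interfaces so traffic proxied into the Cage can reach
    # the server; the port must match the EXPOSE line in the Dockerfile.
    app.run(host='0.0.0.0', port=8008)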

Open up the Dockerfile. You will use a virtual environment to install the libraries needed to run the app. You'll also tell Docker that your web server will listen on port 8008, which matches the Flask server port defined in app.py.

FROM python:3.10-slim

# create a virtual environment
RUN python3 -m venv /opt/venv
# install the requirements to the virtual environment
COPY requirements.txt requirements.txt
RUN /opt/venv/bin/pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

COPY app.py /app.py

# this must match the port your Flask server in app.py is running on
EXPOSE 8008
# use the python binary in the virtual environment we've set up to run our server
ENTRYPOINT ["/opt/venv/bin/python", "/app.py"]
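
The requirements.txt copied into the image needs to list the libraries the app imports. A sketch, with versions left unpinned; in practice you'd pin versions that are known to work together:

flask
boto3
torch
transformers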

Initialize the Cage

First, make sure that you have Docker running. Then, in your terminal, run the following command to initialize the Cage. You can use the suggested name below or change it to one of your choosing.

ev-cage init -f ./Dockerfile \
  --name llm-cage

You should see that a cert.pem, key.pem, and cage.toml are generated. Open up the cage.toml: it contains details that will be used when deploying your Cage.

Add Environment Variables

Your Cage will need access to your S3 credentials (the ACCESS_KEY, SECRET_ACCESS_KEY, S3_REGION, and BUCKET_NAME variables read in app.py) as well as your Cage name and App ID. Add all of these variables by going to Cages > Environment in your Evervault Dashboard. Be sure to check the “secret” box on any sensitive credentials (like the access keys). This will encrypt them, and they will only be decrypted inside the Cage.

Build the Cage

Now build the Cage using the following command. This will also generate a Dockerfile used to build the EIF file, which will run inside the enclave (it may take a few minutes).

ev-cage build --write --output .

Deploy the Cage

Finally, you can deploy the Cage.

ev-cage deploy --eif-path ./enclave.eif

Note that if you are committing your code, you will want to add the .eif file to .gitignore, as it is a large file.

Make a Request

In your terminal, make a cURL request to the Cage endpoint and pass in a prompt as JSON.

curl -X POST \
  -H "API-Key: <API_KEY>" \
  -H 'Content-Type: application/json' \
  -d '{ "prompt": "Encryption is" }' \
  https://<cage_name>.<app_uuid>.cages.evervault.com/generate -k
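
If you'd rather call the endpoint from Python, a roughly equivalent request using the requests library might look like the sketch below; the placeholders are the same as in the cURL command above.

import requests

# Placeholders: substitute your API key, Cage name, and App UUID.
url = "https://<cage_name>.<app_uuid>.cages.evervault.com/generate"
headers = {"API-Key": "<API_KEY>", "Content-Type": "application/json"}

# verify=False mirrors the -k flag in the cURL example above.
response = requests.post(url, headers=headers, json={"prompt": "Encryption is"}, verify=False)
print(response.json())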

When the request succeeds, you should get a response that looks like the one below.

{
  "text": "Encryption is always a hard problem to solve. Encryption is the process of encoding data into a cipher. Encryption is important to us as this should keep your data safe. Encryption is done through your browser. Here.",
  "time": 973.0725193023682
}

Conclusion

In this guide, you used Python to run GPT-2 inside a Cage. If you ran into an issue or have any questions about this guide, feel free to raise them on GitHub or drop them in the Evervault Discord. Running a different model in a Cage? Let us know — we'd love to hear about it!