CodeGate and Everything is a Filter


By Curtis Collicutt

December 27, 2024
💡

The open source project CodeGate is a security proxy that sits between the LLM and the user’s IDE and filters input and output, with features such as preventing API key leakage and checking for insecure dependencies and insecure code. It is stewarded by Stacklok, a company focused on making open source software more secure.


Introduction

[Image: Prompt injection, everything is untrusted]

One problem with securing LLMs is that they conflate the data plane and the control plane into one big thing, a giant hairball if you like: instructions and data arrive in the same stream of text. This means that almost everything we do to secure their use is some kind of filter, be it pre- or post-processing.

So it is no surprise that we are seeing new products and new open source tools that are LLM security filters, such as the open source project CodeGate that I will be looking at in this post.
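The “everything is a filter” idea has a simple shape: check the prompt on the way in, check the response on the way out. Here is a minimal Python sketch of that shape, assuming a stand-in `fake_llm` function instead of a real model; the two rules (`block_injection`, `redact_aws_keys`) are hypothetical examples, not how CodeGate or any other product actually works.

```python
import re
from typing import Callable

# A filter takes text and returns (possibly modified) text, or raises Blocked.
Filter = Callable[[str], str]


class Blocked(Exception):
    """Raised when a filter refuses to let text through."""


def block_injection(prompt: str) -> str:
    # Hypothetical rule: reject one well-known prompt-injection phrase.
    if re.search(r"ignore (all )?previous instructions", prompt, re.IGNORECASE):
        raise Blocked("possible prompt injection")
    return prompt


def redact_aws_keys(text: str) -> str:
    # Hypothetical rule: mask strings shaped like AWS access key IDs.
    return re.sub(r"\bAKIA[0-9A-Z]{16}\b", "[REDACTED]", text)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: it just echoes the prompt back.
    return f"You said: {prompt}"


def guarded_call(prompt: str, pre: list[Filter], post: list[Filter],
                 llm: Callable[[str], str] = fake_llm) -> str:
    for f in pre:    # pre-processing: filter what goes into the LLM
        prompt = f(prompt)
    response = llm(prompt)
    for f in post:   # post-processing: filter what comes back out
        response = f(response)
    return response


print(guarded_call("my key is AKIAABCDEFGHIJKLMNOP",
                   pre=[block_injection, redact_aws_keys],
                   post=[redact_aws_keys]))
```

Every security feature discussed below slots into one of those two loops; the hard part is writing filters that are accurate enough to be useful.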

CodeGate

CodeGate is a local proxy that sits between your AI coding assistant and LLM. CodeGate vets your prompts for any potential secrets exfiltration—encrypting secrets before they leave your desktop and decrypting them in responses. And CodeGate uses Retrieval Augmented Generation to update the knowledge base of any LLM with relevant risk insight.

- CodeGate
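CodeGate’s own mechanism is encryption; as a simpler stand-in, the sketch below uses plain placeholder substitution to show the same round trip: secret values are swapped out before the prompt leaves the desktop and swapped back into the response locally. The `SecretVault` class and the `SECRET_PATTERN` regex are hypothetical, and real secret detection needs far more patterns than these three key names.

```python
import re

# Hypothetical rule: the value after "password =", "api_key =", or "token =".
SECRET_PATTERN = re.compile(
    r"(?P<prefix>\b(?:api_key|password|token)\s*=\s*)(?P<value>\S+)", re.IGNORECASE
)


class SecretVault:
    """Keeps real secret values on the local machine; only placeholders go out."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def mask(self, text: str) -> str:
        # Swap each secret value for a numbered placeholder before the prompt is sent.
        def swap(match: re.Match) -> str:
            placeholder = f"<SECRET_{len(self._values)}>"
            self._values[placeholder] = match.group("value")
            return match.group("prefix") + placeholder
        return SECRET_PATTERN.sub(swap, text)

    def unmask(self, text: str) -> str:
        # Restore the real values in the response, locally.
        for placeholder, value in self._values.items():
            text = text.replace(placeholder, value)
        return text


vault = SecretVault()
outgoing = vault.mask("password = hunter2\napi_key=abc123")
print(outgoing)                # what the LLM would see
print(vault.unmask(outgoing))  # what the user sees again
```

The LLM only ever sees the placeholders; the mapping back to the real values never leaves the machine, which is why running the proxy locally matters.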

If you’ve read any of my posts, or talked to me in person at a TAICO meetup, you’ll know that one thing I think LLMs can definitely do is write code. Integrated Development Environments (IDEs) such as Cursor have brought LLMs right into the developer workflow, and they can see everything we write, including an API key or other personal information we accidentally include. In fact, when an assistant reads an environment variable file like .env, how do we even know whether that exposure is accidental or intentional? This is part of the problem with LLMs in general: in order to help us, they need to know things about us, see our code, see our environment, possibly even see our desktop.

This is where CodeGate comes in, as a sort of security gateway.

The Proxy Problem

The CodeGate project is well aware of the “proxy problem”, but they have built their project to address specific shortcomings they found in other proxy projects.

Every other gateway we’ve found suffers from three major shortcomings: (1) they live in the cloud, so your secrets don’t stay on your desktop, (2) they are built by security professionals for security teams who want to measure risk, but not action it, and (3) they are not open source and therefore lack transparency. - CodeGate

CodeGate Security Features

  • Securing API keys from the LLM - CodeGate encrypts API keys before they leave your desktop, so they are not sent to the LLM in plain text.

  • Dependency Risk Management - LLMs have a training cutoff: they only know about libraries that were in their training data. They usually don’t know about newer versions of libraries, so they may bring in older, possibly insecure dependencies. CodeGate can help by monitoring the dependencies of the code you are writing and alerting you to any potential risks.

  • Insecure code detection - LLMs will, at some point, write insecure code. We can’t rely on them to write secure code every time, so we need safeguards to catch insecure code, and this is something CodeGate can help with too.
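The post doesn’t describe how CodeGate detects insecure code, so the following is only an illustration of the general idea: a Python sketch that walks the syntax tree of generated code with the standard `ast` module and flags two well-known risky patterns. The rules in the hypothetical `find_insecure_calls` function are examples only; real tools use much larger rule sets.

```python
import ast


def find_insecure_calls(source: str) -> list[str]:
    """Flag a few risky call patterns in Python source (hypothetical rules)."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # eval()/exec() on untrusted strings can run arbitrary code.
        if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
            warnings.append(f"line {node.lineno}: call to {node.func.id}()")
        # shell=True hands the command to a shell, which opens the door to injection.
        for kw in node.keywords:
            if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                warnings.append(f"line {node.lineno}: shell=True")
    return warnings


generated = """import subprocess
x = eval(input())
subprocess.run("ls " + x, shell=True)
"""
for warning in find_insecure_calls(generated):
    print(warning)
```

Working on the syntax tree rather than raw text avoids false positives from words like “eval” appearing in comments or strings.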

Using CodeGate

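I’ll use GitHub Copilot and CodeGate’s quickstart:

  • Run the CodeGate Docker container
  • Ensure you have VSCode and GitHub Copilot installed

Next, get the certificate from the CodeGate GUI. CodeGate needs to inspect Copilot’s encrypted HTTPS traffic, so VSCode has to trust CodeGate’s certificate.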

ℹ️

This certificate is generated for localhost; you could also create and use your own certificate instead.

Go to:

http://localhost:9090/certificates

[Screenshot: CodeGate certificates page]

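Download the certificate and install it into the system trust store. On Ubuntu, that means copying it to /usr/local/share/ca-certificates/codegate.crt and running update-ca-certificates.

For example, the output on Ubuntu: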

$ sudo update-ca-certificates
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
Processing triggers for ca-certificates-java (20240118) ...
Adding debian:codegate.pem
done.
done.

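Actually, on Ubuntu 24.04 I also had to do the following to get VSCode to trust the certificate. It seems that VSCode, being built on Electron (and therefore Chromium), uses Chromium’s NSS certificate database in ~/.pki/nssdb on Linux, so the certificate needs to be added there too. The certutil tool comes from the libnss3-tools package.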

$ certutil -d sql:$HOME/.pki/nssdb -A -t "C,," -n codegate -i /usr/local/share/ca-certificates/codegate.crt 
$ certutil -d sql:$HOME/.pki/nssdb -L

Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI

codegate 

Once that is set up, you can use the demo project to test CodeGate and its features.

Here we ask GitHub Copilot about the demo project’s config.ini file, which contains several secrets.

[Screenshot: asking GitHub Copilot about config.ini]

And here is what we see in the CodeGate GUI.

[Screenshot: the CodeGate GUI]
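To see why a file like config.ini is risky to hand to an assistant, here is a minimal Python sketch of one crude detection heuristic: flag any key whose name suggests a secret. The contents of the demo project’s config.ini aren’t reproduced in this post, so the `sample` below is made up, and the `SENSITIVE_WORDS` list is a hypothetical heuristic, not CodeGate’s detection logic.

```python
import configparser

# Hypothetical heuristic: key names that usually hold secrets.
SENSITIVE_WORDS = ("password", "secret", "token", "api_key", "apikey")


def find_secret_keys(ini_text: str) -> list[str]:
    """Return 'section.key' for every non-empty value whose key looks sensitive."""
    # interpolation=None so values containing '%' (common in secrets) don't raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    hits = []
    for section in parser.sections():
        for key, value in parser[section].items():
            if value and any(word in key.lower() for word in SENSITIVE_WORDS):
                hits.append(f"{section}.{key}")
    return hits


sample = """[database]
host = localhost
password = hunter2

[aws]
aws_secret_access_key = abc123
region = us-east-1
"""
print(find_secret_keys(sample))
```

Key-name matching is cheap but misses secrets stored under innocuous names, which is one reason tools like CodeGate also look at the values themselves.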

We can also get dependency security information for package.py. For example, CodeGate knows, via Stacklok Insights, that the invokehttp package is malicious! Very cool.

ℹ️

We can see that CodeGate has marked the invokehttp package as malicious.

[Screenshot: CodeGate flagging invokehttp as malicious]
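A dependency check starts by working out what the code imports. The Python sketch below collects top-level imports with the standard `ast` module and compares them against a local denylist; the `KNOWN_MALICIOUS` set is a hypothetical stand-in for a real lookup against Stacklok Insights, whose API isn’t covered in this post.

```python
import ast

# Hypothetical local denylist standing in for a Stacklok Insights lookup.
# invokehttp is listed because CodeGate flagged it as malicious in the demo.
KNOWN_MALICIOUS = {"invokehttp"}


def imported_packages(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports ("from . import x"); they aren't third-party.
            names.add(node.module.split(".")[0])
    return names


def risky_imports(source: str) -> set[str]:
    """Return imported names that appear on the denylist."""
    return imported_packages(source) & KNOWN_MALICIOUS


package_source = """import requests
import invokehttp
"""
print(risky_imports(package_source))
```

One caveat: import names don’t always match package names on PyPI (for example, PyYAML is imported as `yaml`), so a real check needs a mapping between the two.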

In the end, I quite like the idea of CodeGate + Insights.

[Screenshot: Stacklok Insights]

Stacklok

Stacklok is the company behind the OSS project CodeGate.

About Stacklok:

Craig McLuckie (co-creator of Kubernetes) and Luke Hinds (creator of Sigstore) founded Stacklok in 2023 with the goal of helping developers produce and consume open source software more safely.

As malicious attacks on open source software continue to grow in number and become more sophisticated (like the recent XZ Utils incident), governments and organizations are calling for increased security and protection against these attacks. Yet open source maintainers—who are often unpaid volunteers, with other full-time jobs—lack the time to stay up to speed on security best practices, and access to freely available tools that can proactively keep their software safe.

Stacklok Insights

One thing Stacklok provides is Stacklok Insights, a site, API, and database that attempts to determine the trustworthiness of software packages.

CodeGate uses Stacklok Insights to assess the trustworthiness of the dependencies in the code being written:

These insights are powered by Stacklok Insight, a free-to-use open source dependency intelligence service. - CodeGate

[Screenshot: Stacklok Insights]

For example, Insights has an entry for our own Baish tool, a Python package published on PyPI. It doesn’t know much about Baish yet because the project is new and there’s not much to know, but it’s nice to see it there.

Anyone, including me and TAICO, can publish a package to PyPI; this is a feature, not a bug. But it does mean that we need some kind of “trustworthiness” filter in our dependency management.

[Screenshot: Baish in Stacklok Insights]
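A “trustworthiness” filter ultimately comes down to a policy. Here is a minimal Python sketch of one, assuming a hypothetical `PackageInfo` record with a malicious flag and an optional score; these fields and the threshold of 5.0 are illustrative assumptions, not the Stacklok Insights data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageInfo:
    # Hypothetical record; not the Stacklok Insights data model.
    name: str
    malicious: bool
    score: Optional[float]  # None when there isn't enough data yet


def decide(pkg: PackageInfo, min_score: float = 5.0) -> str:
    """Return 'block', 'review', or 'allow' for a dependency."""
    if pkg.malicious:
        return "block"
    if pkg.score is None or pkg.score < min_score:
        # Unknown or low-scoring packages get a human look rather than a hard block.
        return "review"
    return "allow"


for pkg in [PackageInfo("invokehttp", True, None),
            PackageInfo("baish", False, None),
            PackageInfo("example-lib", False, 8.5)]:
    print(pkg.name, decide(pkg))
```

Sending unknown packages to `review` rather than `block` reflects the PyPI point above: a brand-new package like Baish isn’t suspicious just because little is known about it yet.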

Conclusion

As I mentioned previously, LLMs are both the data plane and the control plane. This makes them difficult to secure because we are mostly reduced to filtering: filtering what we put into the LLM and filtering what it gives back. (We do the same thing in our Baish project, at least for what goes into the LLM, looking for prompt injection and the like.)

CodeGate has some interesting features, such as checking the code the LLM generates for insecure dependencies and insecure patterns, in some cases using specialised tools and databases provided by its parent company, Stacklok. Looking for insecure dependencies and insecure code is important when writing applications with LLMs: it’s not just about which operational secrets we’re exposing (API keys, etc.), but also about what code the LLM is actually creating and whether it’s safe.

We already have a massive overload of cybersecurity tools, yet with the invention and popularisation of LLMs we still need more of them, and hopefully more ways to implement them easily and effectively while making our lives easier at the same time.

Clearly we can’t just feed all our information into the LLM: we need to filter it somehow, and one of those filters could be CodeGate.

Thanks to the CodeGate Team

I’d also like to thank the CodeGate team for their help in getting this post together. I had a bit of trouble with the dependency risk feature, and they were very helpful in their Discord channel.
