Blog post: How to Properly Secure Your Valkey Deployment #389
base: main
Conversation
|
Bump @madolson or @stockholmux for review |
|
On it now! Sorry for the delay, some work fires recently :) |
madolson
left a comment
I like the structure, some nits and misc recommendations, but I think it's close.
| featured_image = "/assets/media/featured/random-06.webp" | ||
| +++ | ||
|
|
||
| Most of the production incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits. |
| Most of the production incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits. | |
| Most of the production security incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits. |
|
|
||
| Security misconfiguration ranks as A05 in the [OWASP Top 10:2021](https://owasp.org/Top10/A05_2021-Security_Misconfiguration/), with 90% of applications tested showing some form of misconfiguration. That's staggering. And when it comes to infrastructure like Valkey, the stakes are even higher - your cache often sits at the heart of your application, touching every request. | ||
|
|
||
| Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threads as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/). |
| Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threads as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/). | |
| Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threats as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/). |
|
|
||
| This is where putting your Valkey node inside a VPC is necessary - but not sufficient. Security groups help reinforce access limitation to make sure that only services and people who are intended to access the cluster can do so. Your CI runners probably don't need direct cache access. Each service should have just the access it needs. | ||
|
|
||
| Modern infrastructure also handles TLS seamlessly. While it is unlikely that an attacker is sniffing your packets on your cloud network, it is best practice to have encryption in transit - even within your own network. |
Maybe also worth adding that modern hardware handles TLS handshakes and traffic much better, so the performance impact is much smaller than it used to be. Redis only added TLS support in 2020, so some people might not know about that.
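For readers who want to see what encryption in transit looks like from the client side, here is a minimal sketch using only the Python standard library. It assumes the server already exposes a TLS listener; the helper names `make_tls_context` and `ping_over_tls`, the CA file path, and any host or port passed in are hypothetical and not taken from the post.

```python
import socket
import ssl


def make_tls_context(ca_file=None):
    """Build a client context that verifies the server certificate.

    ca_file is a hypothetical path to the CA bundle that signed the
    server certificate; None falls back to the system trust store.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def ping_over_tls(host, port, ctx):
    """Send an inline PING over TLS and return the raw reply bytes.

    The reply may be an error instead of +PONG if the server
    requires authentication first.
    """
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"PING\r\n")
            return tls.recv(64)


# Only the context is built here; no network connection is opened.
ctx = make_tls_context()
```

Calling `ping_over_tls("valkey.internal.example", 6380, ctx)` would exercise the connection, with both the hostname and the port being placeholders for your own deployment.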
|
|
||
| Authentication adds a critical layer of resiliency. The authentication layer protects you if your firewall or other protections fail: unauthenticated clients still can't access your instance. | ||
|
|
||
| Valkey supports [two authentication methods](https://valkey.io/topics/security/#authentication): the newer ACL system (Access Control Lists) and the legacy `requirepass`. ACLs give you more flexibility by allowing you to create users with fine-grained permissions tailored to what each service actually needs. |
Might also consider adding https://valkey.io/topics/ldap/ which avoids having to manage a separate credential system.
| ACL SETUSER admin on >verystrongpassword ~* +@all | ||
| ``` | ||
|
|
||
| This principle of least privilege means that even if credentials are compromised, an attacker is limited to only the operations the user can perform. A read-only monitoring account can't flush your entire cache or modify configurations. |
Maybe provide suggested user permissions. I typically suggest:
`ACL SETUSER application on >password +@all -@dangerous -@scripting`
as a good base for applications. Most issues come from scripting or dangerous commands.
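As a rough illustration of the suggested base permissions, the sketch below assembles `ACL SETUSER` command lines in Python. The helper `acl_setuser`, the `app:*` key prefix, the `reporting` user, and its password are all hypothetical. The key pattern is added as an assumption: a newly created ACL user has no key access until a `~pattern` rule grants it, so a real deployment would need something along these lines.

```python
def acl_setuser(name, password, key_patterns, rules):
    """Build an ACL SETUSER command line suitable for pasting into valkey-cli."""
    parts = ["ACL", "SETUSER", name, "on", ">" + password]
    parts += ["~" + pattern for pattern in key_patterns]  # key access rules
    parts += list(rules)  # command rules are applied left to right
    return " ".join(parts)


# The reviewer's suggested base, plus an assumed key prefix for the service.
app_user = acl_setuser(
    "application",
    "password",
    ["app:*"],
    ["+@all", "-@dangerous", "-@scripting"],
)

# Hypothetical read-only account restricted to the same prefix.
readonly_user = acl_setuser("reporting", "another-password", ["app:*"], ["+@read"])

print(app_user)
print(readonly_user)
```

Because rules are evaluated left to right, `+@all` has to come before the `-@dangerous` and `-@scripting` exclusions for them to take effect.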
|
|
||
| Once Valkey is running, your operational posture determines how quickly you can detect and contain issues. Enable logging so you can see what's happening. Monitor for unusual patterns like sudden spikes in command execution, connections from unexpected sources, or commands that shouldn't be running in your environment. | ||
|
|
||
| Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion. |
| Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion. | |
| Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion. |
I don't think tcp-keepalive does much for resource exhaustion that timeout won't also cover. It's more for unreliable networks.
I also generally recommend not setting timeout anyways. There is a special timeout for unauthenticated users, and the normal timeout normally just causes unnecessary reconnects for normal applications. Maxmemory makes sense though.
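One way to act on this advice is a small check over the values returned by `CONFIG GET`. The sketch below assumes the settings have already been fetched into a dict of strings; the function name `audit_limits` and the wording of the warnings are hypothetical.

```python
def audit_limits(config):
    """Return warnings for a dict shaped like CONFIG GET output (string values)."""
    warnings = []
    # maxmemory is reported in bytes; 0 means no explicit limit is configured.
    if int(config.get("maxmemory", "0")) == 0:
        warnings.append("maxmemory is 0: no explicit memory limit is set")
    # timeout is in seconds; 0 disables the idle-client timeout.
    if int(config.get("timeout", "0")) != 0:
        warnings.append("timeout is set: idle clients will be dropped and must reconnect")
    return warnings


for warning in audit_limits({"maxmemory": "0", "timeout": "300"}):
    print(warning)
```

The check deliberately treats a non-zero `timeout` as something to review rather than as an error, matching the comment above that it mostly causes unnecessary reconnects.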
|
|
||
| Observability is part of security! Logs and metrics turn silent failures into visible signals, and visibility is what buys you time to respond before small issues become incidents. |
I would suggest adding `acl_access_denied_auth` here; it counts authentication failures.
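A minimal sketch of how that counter could be watched: parse two snapshots of `INFO stats` output and compare `acl_access_denied_auth` between them. The helper names and the sample values are made up for illustration; in practice the text would come from your client library or from `valkey-cli INFO stats`.

```python
def parse_info(text):
    """Parse INFO output ("key:value" lines, "# Section" headers) into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            stats[key] = value
    return stats


def auth_failures_since(previous, current):
    """Return how many authentication failures occurred between two snapshots."""
    field = "acl_access_denied_auth"
    return int(current.get(field, 0)) - int(previous.get(field, 0))


# Made-up sample snapshots standing in for real INFO stats output.
sample_before = "# Stats\r\nacl_access_denied_auth:3\r\nacl_access_denied_cmd:0\r\n"
sample_after = "# Stats\r\nacl_access_denied_auth:10\r\nacl_access_denied_cmd:0\r\n"

delta = auth_failures_since(parse_info(sample_before), parse_info(sample_after))
print(delta)  # prints 7
```

A negative delta would indicate that the counters were reset between snapshots, for example after a restart, so an alerting rule built on this should handle that case.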
Description
Blog post: How to Properly Secure Your Valkey Deployment
Issues Resolved
#388
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.