@allenheltondev

Description

Blog post: How to Properly Secure Your Valkey Deployment

Issues Resolved

#388

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@allenheltondev
Author

Bump @madolson or @stockholmux for review

@madolson
Member

madolson commented Nov 7, 2025

On it now! Sorry for the delay, some work fires recently :)

Member

@madolson madolson left a comment

I like the structure, some nits and misc recommendations, but I think it's close.

featured_image = "/assets/media/featured/random-06.webp"
+++

Most of the production incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits.

Member

Suggested change
- Most of the production incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits.
+ Most of the production security incidents I’ve helped debug started with misconfigurations rather than zero-days or sophisticated exploits.


Security misconfiguration ranks as A05 in the [OWASP Top 10:2021](https://owasp.org/Top10/A05_2021-Security_Misconfiguration/), with 90% of applications tested showing some form of misconfiguration. That's staggering. And when it comes to infrastructure like Valkey, the stakes are even higher - your cache often sits at the heart of your application, touching every request.

Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threads as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/).

Member

Suggested change
- Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threads as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/).
+ Engineers really care about security - but it is easy to overlook some crucial settings. This is especially true in the cloud, where everything moves really fast. You spin up a Valkey instance inside your VPC, it works, and you move on to the next problem. VPC can lock down your network to the outsiders - but I often see multiple teams being able to access the same VPC. This leaves systems vulnerable to insider threats as well as well intentioned people or microservices that just happen to have a bad day. But using default configurations or enabling unnecessary features can make systems [easy targets for attackers](https://socradar.io/redis-redishell-vulnerability-cve-2025-49844/).


This is where putting your Valkey node inside a VPC is necessary - but not sufficient. Security groups help reinforce access limitation to make sure that only services and people who are intended to access the cluster can do so. Your CI runners probably don't need direct cache access. Each service should have just the access it needs.

Modern infrastructure also handles TLS seamlessly. While it is unlikely that an attacker is sniffing your packets on your cloud network, it is best practice to have encryption in transit - even within your own network.

Member

Maybe also worth adding that modern hardware handles TLS handshakes and traffic much better, so the performance impact is much smaller than it used to be. Redis only added TLS support in 2020, so some people might not know about that.
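
To make the encryption-in-transit point concrete, here is a minimal sketch of the client-side TLS context an application might build before connecting to a TLS-enabled Valkey port. It uses only Python's standard `ssl` module. The helper name `build_tls_context` and the optional CA bundle path are assumptions for illustration, not something taken from the post.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a client-side TLS context that verifies the server certificate.

    ca_file is a hypothetical path to a private CA bundle; when omitted,
    the system trust store is used.
    """
    ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse anything older than TLS 1.2, even inside a private network.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = build_tls_context()
# Certificate verification and hostname checking are on by default.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A context like this would typically be handed to whatever client library the service uses; how it is passed in depends on that library, so treat the wiring as an exercise for the specific client.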


Authentication adds a critical layer of resiliency. If your firewall or other protections fail, unauthenticated clients still can't access your instance.

Valkey supports [two authentication methods](https://valkey.io/topics/security/#authentication): the newer ACL system (Access Control Lists) and the legacy `requirepass`. ACLs give you more flexibility by allowing you to create users with fine-grained permissions tailored to what each service actually needs.

Member

Might also consider adding https://valkey.io/topics/ldap/ which avoids having to manage a separate credential system.

```
ACL SETUSER admin on >verystrongpassword ~* +@all
```

This principle of least privilege means that even if credentials are compromised, an attacker is limited to only the operations the user can perform. A read-only monitoring account can't flush your entire cache or modify configurations.

Member

Maybe provide suggested user permissions. I typically suggest:

ACL SETUSER application on >password +@all -@dangerous -@scripting

As a good base for applications. Most issues come from scripting or dangerous commands.
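
A minimal sketch of how that base rule could be generated per service. The helper `build_acl_setuser` and the `orders:*` key pattern are hypothetical. One assumption worth calling out: a user created with `ACL SETUSER` has no key access until a key pattern is granted, so the sketch adds one explicitly rather than relying on `+@all` alone.

```python
from typing import Iterable, List


def build_acl_setuser(
    username: str,
    password: str,
    key_patterns: Iterable[str],
    denied_categories: Iterable[str] = ("dangerous", "scripting"),
) -> List[str]:
    """Build the argument list for an ACL SETUSER command.

    Grants every command category, then removes the denied ones, and
    limits the user to the given key patterns.
    """
    args = ["ACL", "SETUSER", username, "on", f">{password}"]
    args += [f"~{pattern}" for pattern in key_patterns]
    # Rules apply left to right, so the removals must follow +@all.
    args.append("+@all")
    args += [f"-@{category}" for category in denied_categories]
    return args


# Hypothetical service user limited to keys under the "orders:" prefix.
command = build_acl_setuser("application", "password", ["orders:*"])
print(" ".join(command))
```

Scoping the key pattern per service keeps the least-privilege idea intact: a compromised credential is limited both in which commands it can run and in which keys it can touch.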


Once Valkey is running, your operational posture determines how quickly you can detect and contain issues. Enable logging so you can see what's happening. Monitor for unusual patterns like sudden spikes in command execution, connections from unexpected sources, or commands that shouldn't be running in your environment.

Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion.

Member

Suggested change
- Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion.
+ Set resource limits in your configuration. Poorly written operations or runaway commands can impact your cache's availability. `maxmemory`, `timeout`, and `tcp-keepalive` settings aren't just performance tuning - they help protect against resource exhaustion.

I don't think tcp-keepalive does much for resource exhaustion that timeout won't also cover. It's more for unreliable networks.

I also generally recommend not setting timeout anyways. There is a special timeout for unauthenticated users, and the normal timeout normally just causes unnecessary reconnects for normal applications. Maxmemory makes sense though.
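
The distinction drawn here can be turned into a small config check. This is a sketch only: it parses `valkey.conf`-style text, flags a missing or zero `maxmemory` (which means no memory cap), and notes a non-zero `timeout` because it disconnects idle clients. The function names and the sample config are made up for illustration.

```python
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'directive value' lines, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def audit_limits(settings: Dict[str, str]) -> List[str]:
    """Return human-readable findings about resource limits."""
    findings: List[str] = []
    # maxmemory 0 (or unset) means the server will not cap its memory use.
    if settings.get("maxmemory", "0") == "0":
        findings.append("maxmemory is unset or 0: no memory cap")
    # A non-zero idle timeout disconnects quiet clients and forces reconnects.
    if settings.get("timeout", "0") != "0":
        findings.append("timeout is non-zero: idle clients will be disconnected")
    return findings


# Hypothetical config fragment used only to exercise the check.
sample = """
maxmemory 0
timeout 300
tcp-keepalive 300
"""
for finding in audit_limits(parse_conf(sample)):
    print(finding)
```

The check deliberately says nothing about `tcp-keepalive`, since it is about detecting dead peers on unreliable networks rather than limiting resources.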


Observability is part of security! Logs and metrics turn silent failures into visible signals, and visibility is what buys you time to respond before small issues become incidents.

Member

I would suggest adding `acl_access_denied_auth` here; it counts authentication failures.
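
As a rough illustration of how that counter could feed an alert, the sketch below parses two `INFO`-style text samples and reports how many authentication failures happened in between. The helper names and the sample values are invented; only the `name:value` line format and the `acl_access_denied_auth` field come from the server's stats output.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO-style output: 'name:value' lines, '#' section headers."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            fields[name] = value
    return fields


def auth_failures_since(previous: str, current: str) -> int:
    """Return how many authentication failures occurred between two samples."""
    before = int(parse_info(previous).get("acl_access_denied_auth", "0"))
    after = int(parse_info(current).get("acl_access_denied_auth", "0"))
    # The counter starts over after a restart, so never report a negative delta.
    return max(after - before, 0)


# Made-up samples in the shape of INFO stats output.
earlier = "# Stats\r\nacl_access_denied_auth:3\r\nacl_access_denied_cmd:0\r\n"
later = "# Stats\r\nacl_access_denied_auth:17\r\nacl_access_denied_cmd:1\r\n"
print(auth_failures_since(earlier, later))
```

A sudden jump in this delta is the kind of signal worth alerting on, since it can point to a misconfigured client or to someone guessing credentials.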

2 participants