Client cannot recover from version skew #75541
Comments
Thank you for the link, @leerob. It's not entirely clear from the documentation that the encryption key affects the non-deterministic action IDs. If it does, that doesn't fully address this issue. Even if the encryption key is kept the same across builds it is still possible to get a different action ID and the client still has no way to recover when that happens. I couldn't find any documentation anywhere on how action IDs are generated. Is it some sort of hash involving the file name, function name, and the |
After some more experimentation the following code also does not catch any client-side errors, even though the server output indicates that the action cannot be found: "use client";
import { logServer2 } from "./actions";
export function CallServerActionButton() {
  return (
    <button
      onClick={async () => {
        try {
          await logServer2();
        } catch (e) {
          // This never catches anything, even if the action is not found server-side
          console.error("Error in CallServerActionButton:", e);
        }
      }}
    >
      Server Action
    </button>
  );
} |
Having the same issue on a Next.js ^15.0.2 app deployed on AWS with SST v2.
This has been really tough to troubleshoot since there are no client-side errors, no way to identify which action is causing it, and no way to track it down. We also can’t determine the user’s experience. We assume the action isn’t executed, leaving the app broken without any way to handle or provide feedback to the user. |
I tried to open an issue to get clarity around this in docs and it was immediately closed by a bot. |
@mbranch For that it's working as expected because we need a GitHub repo link (you provided a link to an issue instead)—the bot should not run when you run the Documentation issue template. |
@knpwrs Did you confirm this with your reproduction? I am not seeing the environment variable in your reproduction. I do agree we could improve our documentation here, so I am taking a look at that as well. |
@samcx The reproduction is to rename a function or do a similar refactoring, such as moving a function. Clients that have not refreshed between deployments will attempt to call actions that no longer exist, and the client has no way to recover: no errors are thrown, and even if one were to install a service worker to intercept fetch calls, the response code is 200, even though something like 404 would probably be more appropriate (though given that the network call is abstracted away, this matters less than simply surfacing some sort of error the client can recover from). |
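For anyone who does intercept the fetch calls, the reproduction steps in this issue report a text/x-component mime type on a successful action call and text/html on the failed one, both with a 200 status. A minimal sketch of a check built on that observation follows; the function name and the labels are hypothetical, not a Next.js API, and the mime types are only what was observed in this reproduction, so they may differ between versions.

```typescript
// Hypothetical helper: the name and labels are illustrative, not part of Next.js.
type ActionResponseKind = "action-result" | "possible-skew" | "unknown";

// Classifies the Content-Type of a response to a server action POST,
// based only on the mime types observed in this issue's reproduction.
function classifyActionResponse(contentType: string | null): ActionResponseKind {
  if (contentType === null) return "unknown";
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const [first = ""] = contentType.split(";");
  const mime = first.trim().toLowerCase();
  if (mime === "text/x-component") return "action-result";
  // Observed when the server could not find the action.
  if (mime === "text/html") return "possible-skew";
  return "unknown";
}
```

A service worker or fetch wrapper could call this with response.headers.get("content-type") and prompt a reload on "possible-skew". Treat it as a heuristic only, since an HTML response may have other causes.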
Exactly the same issue here, and this is the most frequent cause when an issue like #76149 happens.
I tried a few things, but it seems hard to figure out from the client side what happened on the server, although Next.js appears to return the whole page when the server action cannot be used (I am not sure about that, but it seems so). One idea is to read the response on the client side: if the server action did not run, I won't get my own response wrapper back (with fields like code and data), so I could check whether the response has the expected props to tell whether the server action is unavailable. One more consideration: I get nothing back at all, because the whole server action is unavailable rather than part of it, and a try/catch on the server side will not help either. So I can't tell whether my server has crashed, the service is updating, or something else is going on, and I can't guide the user to reload the app if the server really has crashed. Any better ideas? I think it is more important for the client to know that the version or service has changed than to handle versioning on the server side. Of course, it would be easier if there were exposed, built-in support for version control on the server side. |
And after reading https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced, I am also wondering whether, and why, keeping the encryption key in sync would solve this issue; in my opinion it can't. I only have one pod on one machine, so the cause should not be machines being out of sync with each other, but the client side and server side being out of sync. This issue also normally doesn't happen with route handlers, which are normal APIs, because if you code well, you won't keep changing the API, like rename the Anyway, I hope we can add more information to the docs about this, and it would be much better if more were logged when this issue happens. It was very hard to debug on a live Next.js service. |
Based on my tests and some reading on Stack Overflow, I think that in this case the logServer2 call does not fail or return nothing; the server returns the whole page, which completes the request. |
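The idea of checking for a known response wrapper can be sketched as a type guard. This assumes every action in the app returns an envelope with code and data fields, as in the wrapper described in the comment above; the names ActionEnvelope, isActionEnvelope and callWithSkewCheck are hypothetical. What the awaited call actually resolves to during version skew is not documented here, so the guard treats anything that is not an envelope as suspect.

```typescript
// Hypothetical envelope shape: "code" and "data" follow the wrapper mentioned in this thread.
interface ActionEnvelope<T> {
  code: number;
  data: T;
}

// Returns true only when the value looks like the app's own envelope.
function isActionEnvelope(value: unknown): value is ActionEnvelope<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["code"] === "number" && "data" in record;
}

// Wraps an action call; anything that is not an envelope is reported as suspected skew.
async function callWithSkewCheck<T>(
  action: () => Promise<unknown>,
  onSuspectedSkew: () => void,
): Promise<ActionEnvelope<T> | null> {
  const result = await action();
  if (isActionEnvelope(result)) return result as ActionEnvelope<T>;
  onSuspectedSkew(); // e.g. show a "please reload" banner
  return null;
}
```

As noted above, a missing envelope cannot distinguish a crashed server from a new deployment, so the callback is best limited to something safe such as offering a reload.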
We've added it to the build process and the live servers and don't see any difference (still loads of untraceable errors). I'm still not clear if this is a runtime or build-time encryption key. |
|
I think I have a rough solution based on @knpwrs 's repo: LikeDreamwalker/nextjs-skew-recovery-bug To get started, we need to be clear with some concepts:
So based on this issue's original reproduction, we know that this issue happens because the client is sending a request to the server with an out-of-date action ID, which causes the version skew. My idea is to do something in middleware: if we can tell that an incoming server action request (or any other request) is out of date, we can notify the client and ask it to reload the page. For the deployment part, I chose deploymentId to keep the client and server in sync. This feature is much like a normal package version, but it is exposed in every request's headers if we set it manually. For the checking part, I use middleware: import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import deploymentId from "./staticId";
export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === "/refresh") {
    return NextResponse.next();
  }
  const clientDeploymentId =
    request.nextUrl.searchParams.get("dpl") ||
    request.headers.get("x-deployment-id");
  if (clientDeploymentId && clientDeploymentId !== deploymentId) {
    console.log("Client deployment id:", clientDeploymentId);
    console.log("Server deployment id:", deploymentId);
    const isAction =
      request.method === "POST" && request.headers.has("Next-Action");
    const refreshUrl = new URL("/refresh", request.url);
    if (isAction) {
      // For server actions, set a header to be handled by the action
      const response = NextResponse.next();
      response.headers.set("x-action-redirect", refreshUrl.toString());
      return response;
    } else {
      // For regular requests, redirect directly
      return NextResponse.redirect(refreshUrl);
    }
  }
}
export const config = {
  matcher: "/((?!_next/static|_next/image|favicon.ico).*)",
}; This is a test version; if you want to use it as a workaround, please adapt the specific parts. There are many ways to notify the client, but if you want the client to be notified as soon as possible, handling both server action requests and normal requests is better, and redirecting to a special route is also acceptable. Here is how you can test my version:
This solution is somewhat rough because:
Solve #76149 |
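The branching in the middleware above can also be pulled into a pure function, which makes the skew rule testable without a running server. This is only a sketch that mirrors the logic shown in this comment; SkewDecision and decideSkewHandling are hypothetical names, and the x-action-redirect header is the one this workaround invents, not something Next.js defines.

```typescript
// Hypothetical types and names; they mirror the middleware in this comment, not a Next.js API.
type SkewDecision =
  | { kind: "pass" }
  | { kind: "redirect"; to: string }
  | { kind: "flag-action"; header: string; value: string };

interface SkewInput {
  clientDeploymentId: string | null; // from the "dpl" query param or "x-deployment-id" header
  serverDeploymentId: string;
  isAction: boolean; // POST request carrying a "Next-Action" header
  refreshUrl: string;
}

function decideSkewHandling(input: SkewInput): SkewDecision {
  const { clientDeploymentId, serverDeploymentId, isAction, refreshUrl } = input;
  // No client id, or ids match: nothing to do.
  if (!clientDeploymentId || clientDeploymentId === serverDeploymentId) {
    return { kind: "pass" };
  }
  // Server actions cannot simply be redirected, so flag the response instead.
  if (isAction) {
    return { kind: "flag-action", header: "x-action-redirect", value: refreshUrl };
  }
  return { kind: "redirect", to: refreshUrl };
}
```

The middleware would then only translate the returned decision into NextResponse.next() or NextResponse.redirect(), keeping the comparison rule in one place.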
Does the action ID only change if you change the server action code somehow? Edit: Got the answer. I guess a workaround would be to just not use server actions when self-hosting and use route handlers instead.. :/ |
What great work you have done investigating, @LikeDreamwalker 🙌 This issue should be quite broad for people using Next 15 and standalone, right? It is good to address security in v15, but this is now a serious issue for us in production, affecting several customers, and very hard to track, understand, and fix. Is there any way to turn it off? We are considering a rework to remove all server actions in favor of route handlers. But as you can imagine, this would be too cumbersome, and a fix would be much preferred. @leerob I think this issue should not be overlooked because it could potentially cause users a lot of issues. Help from you (or someone else) in finding a solution would be very much appreciated by many, I think 🙏. |
@J4v4Scr1pt To reiterate what was mentioned above, you can essentially opt-out of this behavior when self-hosting → https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced |
Thank you so much for your response! I bet you have 100 other things to do so thx again for your help 🙏. |
Yes! So it'll be up to your discretion how you want to rotate the key. |
Thank you so much for your reply; I understand it now. But isn't this a bit of a paradox? For security, the server actions are "changed" after every build by default, which can cause version skew after every build. To avoid the version skew, we can keep the encryption key the same for every build, which makes every server action (or, we could say, the endpoint of each server action API route) stay the same across builds. This is a little complex, but the current conclusion seems to be that we either accept version skew for better security, or solve version skew and give up that security. I may also have missed some information (I only understood this after reading the encryption section a second time): does Next have a way to detect version skew on the server side, or is there a plan for one? I think a better solution would be to detect version skew and set up a business-specific response to it (like a refresh), which would solve both issues I mentioned above. Thanks again! And just a reminder for others:
|
In my personal (and I suppose professional) opinion, the encryption key for action ids does not provide increased security whatsoever. Especially since there is a mapping of action ids sent down to the client in the server-rendered HTML. I’ve never had a need to obscure endpoints in my last 13 years as a professional web developer. It would be far better to encourage good security practices for developers and avoid all this encryption rigmarole altogether. I can certainly see the argument for using an encryption key to send sensitive session state down to the client, but not for obscuring the action ids. It certainly seems to create more problems than it solves. In any case, this issue is about the client not being able to recover from version skew. The client has no way of knowing that an action call has failed. |
I really agree here. Actions should be treated like any API endpoint with respect to backwards compatibility for clients in the wild. Obfuscating them randomly isn't really security and only hurts logging and observability. I think very often features that might make sense for deploying in Vercel are leaking into Next.js as a framework without as much consideration for those who are deploying and supporting Next.js on their own. I still think there's a lot of missing information about
|
For me, after I realized that server actions are still APIs and can be called directly under some conditions, they are not secure, or only as secure as a normal API. I still think the version skew caused by the encryption key, and client recovery from it, are more important. After all, there are many other ways to secure a request, but seemingly not many ways for clients to recover. If the encryption key is good for security, it can be kept, but it should not conflict with recovery and responses. I think recovery and security are two parallel concerns; there is no need to bind them together, which only adds complexity. I also wonder whether the encryption key is really designed for security. In my opinion, it is more like a tool to make server actions anonymous. Think about it: if we write a server action named getUserInfo(), Next can't be sure there will be only one of them. So the easiest way is to abstract each server action into a random, unique ID, register it, and use that ID to call it. As for the ID changing after every build, I think that is just how it works. If I were designing this, I would worry more about an ID that does not change and causes collisions between server actions. Just a thought, with no evidence; I am sure there are more considerations that I don't know about, and I say this with respect. |
Have you all tried the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY? Edit: But a question, if I change the serverAction code it will also result in a new Id correct? |
I agree we need more details on how this variable is used. I've tested locally and it seems we only need this during build time to retain consistent action ids. Can someone confirm this? I extracted this from the source code, this is how keys are currently generated
I agree with this sentiment unless we implement our own skew protection, rotating this key would cause the same issue. It seems like we have to make a tough trade-off between user experience vs security. |
I think they can remove this completely instead and have it as it was in v14. And I still see this error in our logging service with 5 days passed since I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY... |
IDK, did we have the "encryption" part in v14? I thought it had existed ever since server actions were released... |
I based that on this, so maybe I'm wrong :). |
I just remembered, yes. But it seems this does not actually help much, especially since attackers could just observe the business actions to learn what the IDs point to. 😂 Maybe using this key to encrypt and decrypt the request and response bodies would be more complete. |
Like @J4v4Scr1pt asked;
Can we get some clarification on this? Does it change if I edit the specific action? Or other actions in the same file? |
Just adding to the cacophony of voices. We have implemented our first dozen or so server actions, which we were ready to ship to production until we discovered this problem in QA. We are deploying on AWS. As far as I can tell, there is no fix for this. Setting the encryption key NEXT_SERVER_ACTIONS_ENCRYPTION_KEY in both our build and runtime environments does not result in stable IDs. Unless someone has better info on how to achieve stable IDs, we have to delay our new launch and rewrite everything using client-side fetching. |
Just a thought: maybe one solution is to build a stable and secure API like this: // A fake API router, for illustration only
// Takes a server action name and always calls the matching server action on the server side
const callActions = async ({ name }: { name: string }) => {
  // Default result when no action matches
  let res: unknown = {};
  // A switch statement, or some other way to call the specific action
  switch (name) {
    case 'exampleServerAction':
      // Assign to the outer variable instead of redeclaring it inside the case
      res = await exampleServerAction(name);
      break;
    default:
      break;
  }
  // Return the result as the response; this can be adapted for easier use
  return res;
}; I think this solution might work because:
This solution seems logically sound, though I have not tested it. I think it could help if you want to solve the version skew issue thoroughly; it is more like building your own server action layer. The main downside is the cost, and you should also protect this API with additional security checks. |
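One variation on the router idea is a lookup table keyed by stable names instead of a switch statement, so an unknown name produces an explicit result the client can react to. This is an untested-in-production sketch: ActionHandler, Resolved, resolveAction and the registry contents are all hypothetical, and nothing here is provided by Next.js.

```typescript
// Hypothetical types and names, for illustration only.
type ActionHandler = (payload: unknown) => Promise<unknown>;

type Resolved =
  | { ok: true; handler: ActionHandler }
  | { ok: false; error: "unknown-action"; name: string };

// Looks up a handler by its stable, developer-chosen name.
function resolveAction(
  registry: Readonly<Record<string, ActionHandler>>,
  name: string,
): Resolved {
  // hasOwnProperty guards against inherited keys such as "toString".
  const handler = Object.prototype.hasOwnProperty.call(registry, name)
    ? registry[name]
    : undefined;
  if (handler === undefined) return { ok: false, error: "unknown-action", name };
  return { ok: true, handler };
}

// Example registry; the single entry stands in for a real server-side function.
const registry: Record<string, ActionHandler> = {
  exampleServerAction: async (payload) => ({ echoed: payload }),
};
```

Because the names are chosen by the developer, they stay the same across builds unless someone renames them, and a stale client gets "unknown-action" back rather than a silent success. The endpoint that exposes this lookup still needs its own authentication and input validation.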
Thanks for the thorough suggestion! We actually have an external API that we use, and it's all in the same VPN. There is very little cost to us in switching to client-side mutations and refetching using SWR for now, and waiting until Vercel comes up with a better solution (or maybe we migrate to a provider that handles this better). |
@jonschmidt If you're using vercel you can probably turn on Skew Protection and not worry about it, right?
I have tried it. While it does not prevent the generation of new server action IDs, they are more stable. I believe what I've observed is the following: Without a consistent NEXT_SERVER_ACTIONS_ENCRYPTION_KEY encryption key:
With a consistent encryption key:
So I think this is a valid workaround and what I will probably try for our deployments:
Hopefully that will work. In any case, it is not good to have to opt into using this environment variable during build steps just for server actions to be (only semi-) backward-compatible with clients. It would be great if there were some way to have explicit control over each server action's name, or otherwise some other config option that lets us make sure they do not change from build to build. |
You mean a normal endpoint huh? |
Sure if there's some very easy way to convert all server actions we've already developed into normal endpoints that would be a fine workaround. |
Link to the code that reproduces this issue
https://github.com/knpwrs/nextjs-skew-recovery-bug
To Reproduce
1. Run npm run build and npm start.
2. Open http://localhost:3000 and make sure the browser development tools are open.
3. Click the Server Action button.
4. Observe a 200 response code indicating no errors and a text/x-component mime type.
5. Rename the logServer function in actions.ts and update the import and usage in components.tsx to match (for instance, logServer can be renamed to logServer2).
6. Run npm run build and npm start.
7. Click the Server Action button.
8. Observe the following error in the server output: [Error: Failed to find Server Action "006c3c7b08402d18959b82a9692db1011f32bcc8fd". This request might be from an older or newer deployment. Original error: Cannot read properties of undefined (reading 'workers')]
9. Observe a 200 response code indicating no errors and a text/html mime type.
10. Click the Throw Error button. Observe an uncaught error in the console.
Note that I couldn't get error.tsx or global-error.tsx to work for either the failed function call or the thrown client-side error.
Current vs. Expected behavior
Currently the client is not able to recover from version skew when a server action cannot be called. Everything appears normal to the client.
I would expect the error boundary to catch an error so the client can refresh and recover.
Provide environment information
Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.2.0: Fri Dec 6 19:01:59 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6000
  Available memory (MB): 32768
  Available CPU cores: 10
Binaries:
  Node: 23.6.0
  npm: 10.9.2
  Yarn: 1.22.19
  pnpm: 9.12.2
Relevant Packages:
  next: 15.2.0-canary.33 // Latest available version is detected (15.2.0-canary.33).
  eslint-config-next: N/A
  react: 19.0.0
  react-dom: 19.0.0
  typescript: 5.7.3
Next.js Config:
  output: N/A
Which area(s) are affected? (Select all that apply)
Server Actions, Error Handling
Which stage(s) are affected? (Select all that apply)
next start (local), Other (Deployed), Vercel (Deployed)
Additional context
This is particularly problematic given the following quote from this blog post:
I couldn't find any documentation about this. It appears that action IDs can change at any time and clients which haven't refreshed yet won't have any way to deal with this.