Client cannot recover from version skew #75541

Open
knpwrs opened this issue Jan 31, 2025 · 39 comments
Labels: linear: next (confirmed issue that is tracked by the Next.js team), Server Actions (related to Server Actions)

Comments

@knpwrs

knpwrs commented Jan 31, 2025

Link to the code that reproduces this issue

https://github.com/knpwrs/nextjs-skew-recovery-bug

To Reproduce

  1. Run npm run build
  2. Run npm start
  3. Open http://localhost:3000 and make sure the browser development tools are open.
  4. Press the Server Action button.
  5. Observe logs on the server indicating the function was called.
  6. The network response has a 200 response code indicating no errors and a text/x-component mime type.
  7. Shut down the server, leave the app running in the web browser.
  8. Rename the logServer function in actions.ts and update the import and usage in components.tsx to match (for instance, logServer can be renamed to logServer2).
  9. Run npm run build
  10. Run npm start
  11. Go to the already running app
  12. Press the Server Action button.
  13. Observe an error on the server: [Error: Failed to find Server Action "006c3c7b08402d18959b82a9692db1011f32bcc8fd". This request might be from an older or newer deployment. Original error: Cannot read properties of undefined (reading 'workers')]
  14. There are no errors on the client. Error boundaries do not trigger. There are no uncaught errors in the console. There is no way for the client to know that the function call failed and no way for the client to recover.
  15. The network response has a 200 response code indicating no errors and a text/html mime type.
  16. Press the Throw Error button. Observe an uncaught error in the console.

Note that I couldn't get error.tsx or global-error.tsx to work for either the failed function call or the thrown client-side error.

Current vs. Expected behavior

Currently the client is not able to recover from version skew when a server action cannot be called. Everything appears normal to the client.

I would expect the error boundary to catch an error so the client can refresh and recover.

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.2.0: Fri Dec  6 19:01:59 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6000
  Available memory (MB): 32768
  Available CPU cores: 10
Binaries:
  Node: 23.6.0
  npm: 10.9.2
  Yarn: 1.22.19
  pnpm: 9.12.2
Relevant Packages:
  next: 15.2.0-canary.33 // Latest available version is detected (15.2.0-canary.33).
  eslint-config-next: N/A
  react: 19.0.0
  react-dom: 19.0.0
  typescript: 5.7.3
Next.js Config:
  output: N/A

Which area(s) are affected? (Select all that apply)

Server Actions, Error Handling

Which stage(s) are affected? (Select all that apply)

next start (local), Other (Deployed), Vercel (Deployed)

Additional context

This is particularly problematic given the following quote from this blog post:

Secure action IDs: Next.js now creates unguessable, non-deterministic IDs to allow the client to reference and call the Server Action. These IDs are periodically recalculated between builds for enhanced security.

I couldn't find any documentation about this. It appears that action IDs can change at any time and clients which haven't refreshed yet won't have any way to deal with this.

github-actions bot added the Error Handling (related to handling errors, e.g., error.tsx, global-error.tsx) and Server Actions labels Jan 31, 2025
@leerob
Member

leerob commented Jan 31, 2025

Have you tried this? https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced

@knpwrs
Author

knpwrs commented Feb 1, 2025

Thank you for the link, @leerob. It's not entirely clear from the documentation that the encryption key affects the non-deterministic action IDs. If it does, that doesn't fully address this issue.

Even if the encryption key is kept the same across builds it is still possible to get a different action ID and the client still has no way to recover when that happens.

I couldn't find any documentation anywhere on how action IDs are generated. Is it some sort of hash involving the file name, function name, and the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY? In this case renaming the file, renaming the function, moving anything, or even building on a new machine can change the action ID (say, if the action ID is generated with a full absolute path to the file).
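To make the concern concrete, here is a minimal sketch of a purely hypothetical ID scheme. It is not Next.js's actual algorithm, which this thread has not established; `hypotheticalActionId`, its inputs, and the example paths are all invented for illustration. It only shows why any ID derived from a file path, an export name, and a key would change whenever one of those inputs changes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: derive an ID by hashing the inputs speculated about above.
// This is NOT how Next.js is known to compute action IDs.
export function hypotheticalActionId(
  filePath: string,
  exportName: string,
  key: string
): string {
  return createHash("sha1")
    .update(`${key}:${filePath}:${exportName}`)
    .digest("hex");
}

// Same key in every case; only the export name or the absolute path differs.
const original = hypotheticalActionId("/app/actions.ts", "logServer", "k1");
const renamed = hypotheticalActionId("/app/actions.ts", "logServer2", "k1");
const otherMachine = hypotheticalActionId("/home/ci/app/actions.ts", "logServer", "k1");

console.log(original === renamed, original === otherMachine);
```

Under a scheme like this, keeping the key stable would not be enough: a rename or a different build directory would still produce a new ID.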

@knpwrs
Author

knpwrs commented Feb 3, 2025

After some more experimentation the following code also does not catch any client-side errors, even though the server output indicates that the action cannot be found:

"use client";

import { logServer2 } from "./actions";

export function CallServerActionButton() {
  return (
    <button
      onClick={async () => {
        try {
          await logServer2();
        } catch (e) {
          // This never catches anything, even if the action is not found server-side
          console.error("Error in CallServerActionButton:", e);
        }
      }}
    >
      Server Action
    </button>
  );
}

@andresg747

Having the same issue on a Next.js ^15.0.2 app deployed on AWS with SST v2.
I agree with @knpwrs. The documentation isn’t clear enough on:

  • How this helps resolve different action IDs across builds
  • How to generate the custom encryption key

This has been really tough to troubleshoot since there are no client-side errors, no way to identify which action is causing it, and no way to track it down. We also can’t determine the user’s experience. We assume the action isn’t executed, leaving the app broken without any way to handle or provide feedback to the user.

@mbranch

mbranch commented Feb 5, 2025

I tried to open an issue to get clarity around this in docs and it was immediately closed by a bot.
#75448

@knpwrs
Author

knpwrs commented Feb 5, 2025

@mbranch I tried the same thing before opening this issue and it was also closed by the bot: #75492

It seems like the bot just closes all documentation issues because there isn't a field in the template for a reproduction.

samcx removed the Error Handling label Feb 7, 2025
@samcx
Member

samcx commented Feb 7, 2025

@mbranch @knpwrs Looks like there's an issue with the GitHub actions closing these Documentation template issues, taking a look 👁

@mbranch

mbranch commented Feb 7, 2025

Thanks for looking into this @samcx. Not directly related to this issue, but I similarly tried to open an issue about issues getting closed too quickly (it also got closed 😂): #75449

@samcx
Member

samcx commented Feb 8, 2025

@mbranch For that it's working as expected because we need a GitHub repo link (you provided a link to an issue instead)—the bot should not run when you run the Documentation issue template.

@samcx
Member

samcx commented Feb 8, 2025

Even if the encryption key is kept the same across builds it is still possible to get a different action ID and the client still has no way to recover when that happens.

@knpwrs Did you confirm this with your reproduction? I am not seeing the environment variable in your reproduction.

I do agree we could improve our Documentation here, so taking a look at that as well—

@knpwrs
Author

knpwrs commented Feb 8, 2025

@samcx the reproduction is to rename a function or do a similar refactoring such as moving a function. Clients which have not refreshed between deployments will attempt to call non-existing actions, and the client has no way to recover: no errors are thrown, and even if one were to install a service worker to intercept fetch calls, the response code is 200, even though something like 404 would probably be more appropriate (though given that the network call is abstracted away, this doesn't matter as much as just surfacing some sort of error the client can recover from).

@LikeDreamwalker

Exactly the same issue here, and it is the most common cause of problems like #76149.
In my case it is just a normal build, but many users of my app keep it open longer than usual.
Consider a scenario like this:

  1. A user visits your app.
  2. For some reason they stay and do nothing. This is normal: you might open GitHub, leave it idle, and come back hours later to visit some repos. That is where the issue hits.
  3. You rebuild your app, and a server action ID changes from 123 to 456.
  4. Hours later, the user comes back and tries to call server action 123.
  5. The running service no longer has server action 123, only 456, and that causes the issue.

I tried a few things, but it is hard to figure out from the client side what happened on the server, although Next.js seems to return the whole page when the server action cannot be used (I am not sure, but it looks that way).

I have one idea based on reading the response on the client side: if the server action is down, I don't get my usual response wrapper back (with fields like code and data), so I could check whether the response actually has those props to tell whether the server action is down. One caveat: in that case I get nothing back at all, because the whole server action is down rather than part of it, and a try/catch on the server side does not help either. So I can't tell whether my server crashed, the service is updating, or something else happened. And I can't guide the user to reload the app if my server really has crashed.

Any better ideas? I think it is more important for the client to know that the version or service has changed than to do version control on the server side. Of course it would be easier with exposed, built-in support for version control on the server side.
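The response-wrapper idea above could be sketched roughly as follows. This is a minimal sketch under assumptions: the `{ code, data }` envelope is an app-level convention taken from the comment, `isActionEnvelope` and `callChecked` are hypothetical helper names, and it assumes a skewed call resolves with something other than the envelope (the thread reports that nothing is thrown, but does not establish what the call resolves to). As noted above, it cannot distinguish version skew from a crashed server.

```typescript
// Hypothetical envelope that every server action in the app agrees to return.
export type ActionEnvelope<T> = { code: number; data: T };

// Type guard: true only when the value has the agreed envelope shape.
export function isActionEnvelope<T>(value: unknown): value is ActionEnvelope<T> {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { code?: unknown }).code === "number" &&
    "data" in value
  );
}

// Wraps an action call and throws when the result is not an envelope,
// so that an ordinary try/catch or error boundary has something to catch.
export async function callChecked<T>(action: () => Promise<unknown>): Promise<T> {
  const result = await action();
  if (!isActionEnvelope<T>(result)) {
    throw new Error(
      "Server action returned no recognizable result; the client may be out of date"
    );
  }
  return result.data;
}
```

A caller would write something like `await callChecked(() => logServer2())` inside its existing try/catch and prompt a reload in the catch branch.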

@LikeDreamwalker

After reading https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced, I am also wondering whether and why keeping the encryption key in sync would solve this issue; in my opinion it can't. I only have one pod on one machine, so the cause is not machines being out of sync but the client and server being out of sync.

This issue also normally doesn't happen with route handlers, which are ordinary APIs. If you code carefully, you don't rename /passport/is-login to /passport/is-login-v2 directly, which would cause exactly the issue we are having now. Normally you add an extra prop, for example going from uid: 123 to uid: 123, uid_v2: 12345, or you add /is-login-v2 while keeping /passport/is-login.

Anyway, I hope more info about this can be added to the docs, and it would be even better if more were logged when this issue happens. It is very hard to debug on a live Next.js service.

@LikeDreamwalker

After some more experimentation the following code also does not catch any client-side errors, even though the server output indicates that the action cannot be found:

"use client";

import { logServer2 } from "./actions";

export function CallServerActionButton() {
  return (
    <button
      onClick={async () => {
        try {
          await logServer2();
        } catch (e) {
          // This never catches anything, even if the action is not found server-side
          console.error("Error in CallServerActionButton:", e);
        }
      }}
    >
      Server Action
    </button>
  );
}

Based on my tests and some reading on Stack Overflow, I think logServer2 is not returning nothing here; the request returns the whole page, which completes the request.
You can test this with fetch or curl by sending the server action request directly: if you change the Next-Action header to any other value, it returns the page DOM rather than a 404 or another error.
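A check along these lines could tell the two kinds of response apart. This is a minimal sketch based only on what the thread reports: a successful action call answered with `text/x-component`, and a call to a missing action answered with status 200 and `text/html`. The function name `looksLikeSkewedActionResponse` is hypothetical, and whether these content types hold across Next.js versions is an assumption.

```typescript
// Content type observed in this thread for a successful server action response.
const ACTION_CONTENT_TYPE = "text/x-component";

// Returns true when a 200 response to a server action request does not carry
// the action content type, e.g. when the full HTML page came back instead.
export function looksLikeSkewedActionResponse(
  status: number,
  contentType: string | null
): boolean {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [mime = ""] = (contentType ?? "").split(";");
  return status === 200 && mime.trim().toLowerCase() !== ACTION_CONTENT_TYPE;
}
```

A fetch wrapper or service worker could apply this to requests that carry the Next-Action header, passing `response.status` and `response.headers.get("content-type")`, and prompt a reload when it returns true.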

@mbranch

mbranch commented Feb 19, 2025

@LikeDreamwalker I am also wondering if and why keep the encryption key in sync can solve this issue.

We've added it to the build process and the live servers and don't see any difference (still loads of untraceable errors). I'm still not clear if this is a runtime or build-time encryption key.

@LikeDreamwalker

@LikeDreamwalker I am also wondering if and why keep the encryption key in sync can solve this issue.

We've added it to the build process and the live servers and don't see any difference (still loads of untraceable errors). I'm still not clear if this is a runtime or build-time encryption key.

Image
I think this strategy mainly addresses building the same app on different machines and serving them together. A static, synchronized key can indeed keep everything in sync at build time, but not while lots of users are using the app.
So it may solve the issue if we really have conflicts between multiple machines or pods (although I don't understand why one wouldn't deploy the same image in that scenario), but not conflicts between an old client and a new service. And this issue seems to be about the latter, not conflicts between machines. Sadly.

@LikeDreamwalker

LikeDreamwalker commented Feb 21, 2025

I think I have a rough solution based on @knpwrs 's repo: LikeDreamwalker/nextjs-skew-recovery-bug

To get started, we need to be clear with some concepts:

  1. This issue is about the client and server being out of sync, not about multiple pods or machines on the server being out of sync. If you are in the second scenario, this solution can't help.
  2. This solution is not well structured or well tested. In particular, it uses a hacky approach from the community.

Based on the original reproduction, the issue happens because the client sends a request to the server with an out-of-date action ID, which causes the version skew. My idea is to handle this in middleware: if we can tell that an incoming server action request (or any other request) is out of date, we can notify the client and ask it to reload the page.

For the deployment part, I chose deploymentId to keep the client and server in sync. It is much like an ordinary package version, but it is exposed in the headers of every request if we set it manually.

For the checking part, I use middleware to achieve this:

import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import deploymentId from "./staticId";

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === "/refresh") {
    return NextResponse.next();
  }

  const clientDeploymentId =
    request.nextUrl.searchParams.get("dpl") ||
    request.headers.get("x-deployment-id");

  if (clientDeploymentId && clientDeploymentId !== deploymentId) {
    console.log("Client deployment id:", clientDeploymentId);
    console.log("Server deployment id:", deploymentId);

    const isAction =
      request.method === "POST" && request.headers.has("Next-Action");
    const refreshUrl = new URL("/refresh", request.url);

    if (isAction) {
      // For server actions, set a header to be handled by the action
      const response = NextResponse.next();
      response.headers.set("x-action-redirect", refreshUrl.toString());
      return response;
    } else {
      // For regular requests, redirect directly
      return NextResponse.redirect(refreshUrl);
    }
  }
}

export const config = {
  matcher: "/((?!_next/static|_next/image|favicon.ico).*)",
};

This version is for testing. If you want to use it as a workaround, please adapt the relevant parts. There are many ways to notify the client, but if you want it notified as soon as possible, handling both server actions and normal requests is better, and redirecting to a special route is also acceptable.
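For anyone adapting the middleware above, the branching can be pulled out into a pure function that is easy to unit test without a running Next.js server. This is a sketch only: `decideSkew`, `SkewInput`, and `SkewDecision` are hypothetical names, and the logic simply mirrors the conditions in the middleware shown above.

```typescript
export type SkewDecision = "pass" | "redirect" | "flag-action";

export interface SkewInput {
  clientDeploymentId: string | null; // from the "dpl" query param or x-deployment-id header
  serverDeploymentId: string; // the ID baked into the current build
  method: string;
  hasNextActionHeader: boolean;
}

// Mirrors the middleware: pass when there is no client ID or the IDs match,
// otherwise flag server actions and redirect everything else.
export function decideSkew(input: SkewInput): SkewDecision {
  const { clientDeploymentId, serverDeploymentId } = input;
  if (!clientDeploymentId || clientDeploymentId === serverDeploymentId) {
    return "pass";
  }
  const isAction = input.method === "POST" && input.hasNextActionHeader;
  return isAction ? "flag-action" : "redirect";
}
```

The middleware would then map "flag-action" to setting the x-action-redirect header and "redirect" to NextResponse.redirect, as in the original.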

And here is how you could test with my version:

  1. Run npm run build
  2. Run npm start
  3. Open http://localhost:3000 and make sure the browser development tools are open.
  4. Press the Server Action button.
  5. Observe logs on the server indicating the function was called.
  6. The network response has a 200 response code indicating no errors and a text/x-component mime type.
  7. Shut down the server, leave the app running in the web browser.
  8. Rename the action and also change the deploymentId in the staticId.ts file, for example to "version-skew-v20".
  9. Run npm run build
  10. Run npm start
  11. Go to the already running app
  12. Press the Server Action button.
  13. The middleware now detects that this request comes from the old client and redirects the client to the refresh route, which sets window.location.href after 3 seconds.
  14. Once the page has reloaded, the user is free of the old client and the version skew.

This solution is rough because:

  1. I don't know why, but the first request doesn't seem to carry the deploymentId, so we should check that the deploymentId has a value before checking whether the two match. I don't know whether this has side effects, but in this scenario the old client should already have sent requests, so it should be fine.
  2. You can't handle a server action directly in middleware (see "Redirect in middleware when request is a server action fails with 'failed to forward action response'", #64993); my solution comes from there. I don't know whether it has side effects or will be blocked in the future.
  3. Because of 2, we still seem to receive one log entry with Error: Failed to find Server Action, but after that the user should be on the new client. So I can't say it eliminates the error log, but it is better than before.
  4. You must set a deploymentId for every build. To keep it in sync between server and client, generate it statically at build time rather than with a function that generates a random value when called: the Next.js server will call it again, so the values will never match if you generate it at runtime.
  5. For the redirect itself, I haven't checked whether router.refresh() works better than assigning location.href. For version skew, though, I think a forced reload is more understandable than a refresh.
    You can watch this video to understand:
20250221-0430-10.1054142.mp4
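One way to satisfy the "generate it statically at build time" point is a small prebuild script that writes staticId.ts once, so the build and the running server import the same literal value. This is a sketch under assumptions: `renderStaticIdModule` and `writeStaticId` are hypothetical helpers, and the choice of ID source (a commit SHA, a timestamp, and so on) is left to the caller.

```typescript
import { writeFileSync } from "node:fs";

// Renders the source of a module that default-exports a fixed deployment ID,
// matching the `import deploymentId from "./staticId"` used by the middleware.
export function renderStaticIdModule(id: string): string {
  return `const deploymentId = ${JSON.stringify(id)};\nexport default deploymentId;\n`;
}

// Writes the module to disk. Intended to run once before `next build`,
// never at runtime, so the value cannot differ between client and server.
export function writeStaticId(path: string, id: string): void {
  writeFileSync(path, renderStaticIdModule(id), "utf8");
}
```

A prebuild step could call `writeStaticId("./staticId.ts", someBuildId)`, where `someBuildId` is whatever stable identifier your pipeline provides for the build.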

Solve #76149

@J4v4Scr1pt

J4v4Scr1pt commented Feb 21, 2025

Does the action-id only change if you change the server-action code somehow?
Or does it get a new Id on every build?

Edit: Got the answer

Image

I guess a workaround would be to just not use serveractions when self-hosting and just use route handlers.. :/

@J4v4Scr1pt

J4v4Scr1pt commented Feb 21, 2025

What a great work you have done investigating @LikeDreamwalker 🙌
Your workaround seems solid, but it should not be needed imo.

This issue should affect quite a lot of people using Next 15 and standalone, right?

Good to address security in v15, but this is now a serious issue for us in production, affecting several customers, and it is very hard to track, understand and fix. Is there any way to turn it off?

We are considering a rework that removes all server actions in favor of route handlers. But as you can imagine, this would be too cumbersome, and a fix would be much preferred.

@leerob I think this issue should not be overlooked because it could potentially cause users a lot of issues. Your (or someone else's) help finding a solution would be very appreciated by many, I think 🙏.

@samcx
Member

samcx commented Feb 21, 2025

@J4v4Scr1pt To reiterate what was mentioned above, you can essentially opt-out of this behavior when self-hosting → https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced

@J4v4Scr1pt

@J4v4Scr1pt To reiterate what was mentioned above, you can essentially opt-out of this behavior when self-hosting → https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#overwriting-encryption-keys-advanced

Thank you so much for your response!
Sorry that I missed this information. After reading it, I just want to make sure I understand correctly. Does it mean if I set the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY that the action Id will essentially always be the same after deployments?

I bet you have 100 other things to do so thx again for your help 🙏.

@samcx
Member

samcx commented Feb 22, 2025

Does it mean if I set the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY that the action Id will essentially always be the same after deployments?

Yes! So it'll be up to your discretion on how you want to rotate the key.

@LikeDreamwalker

Does it mean if I set the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY that the action Id will essentially always be the same after deployments?

Yes! So it'll be up to your discretion on how you want to rotate the key.

Thank you so much for your reply; I understand it now. But isn't this a bit of a paradox? For security, the server action IDs change after every build by default, which can cause version skew after every build. To avoid the version skew, we can keep the encryption key the same for every build, which makes every server action ID (in effect, the endpoint of each server action) the same for every build.

It is a bit complex, but the current conclusion seems to be that we either accept version skew for better security, or solve version skew and give up that security.

I may have missed some info (I had to read the encryption section a second time to understand this): does Next.js have a way to detect version skew on the server side, or is one planned? I think the better solution would be to detect version skew and then apply a business-specific response (like a refresh), which would solve both of the issues I mentioned above.

Thanks again! And just for a reminder for others:

Image
https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations#closures-and-encryption

@knpwrs
Author

knpwrs commented Feb 22, 2025

In my personal (and I suppose professional) opinion, the encryption key for action ids does not provide increased security whatsoever. Especially since there is a mapping of action ids sent down to the client in the server-rendered HTML. I’ve never had a need to obscure endpoints in my last 13 years as a professional web developer. It would be far better to encourage good security practices for developers and avoid all this encryption rigmarole altogether.

I can certainly see the argument for using an encryption key to send sensitive session state down to the client, but not for obscuring the action ids. It certainly seems to create more problems than it solves.

In any case, this issue is about the client not being able to recover from version skew. The client has no way of knowing that an action call has failed.

@mbranch

mbranch commented Feb 22, 2025

I can certainly see the argument for using an encryption key to send sensitive session state down to the client, but not for obscuring the action ids.

I really agree here. Actions should be treated like any API endpoint with respect to backwards compatibility for clients in the wild. Obfuscating them randomly isn't really security and only hurts logging and observability.

I think very often features that might make sense for deploying in Vercel are leaking into Next.js as a framework without as much consideration for those who are deploying and supporting Next.js on their own.

I still think there's a lot of missing information about NEXT_SERVER_ACTIONS_ENCRYPTION_KEY:

  • Is this a build-time or runtime env var? Or maybe both?
  • What is the exact format of the key? See "Docs: Document expected NEXT_SERVER_ACTIONS_ENCRYPTION_KEY format", #61020 (comment).
  • The statement "This variable must be AES-GCM encrypted." doesn't clarify. I think what's meant here is something along the lines of "This key is a server-side secret and should be treated accordingly. If it is compromised, an adversary could possibly use it to decrypt server-side secrets in payloads to the client."
  • More examples of exactly how this symmetric key is used to encrypt/decrypt payloads would be helpful. What exactly are the payloads? This can help Next.js users understand the consequences of a key leaking, etc.

@LikeDreamwalker

In my personal (and I suppose professional) opinion, the encryption key for action ids does not provide increased security whatsoever.

For me, once I realized that server actions are still APIs and can be called directly under some conditions, I treat them as not secure, or only as secure as a normal API. I still think the version skew caused by the encryption key, and client recovery from it, matter more. After all, there are many other ways to secure a request, but not many ways for clients to recover.

If the encryption key is good for security, it can stay, but it should not conflict with recovery and responses. I think recovery and security are two parallel concerns; there is no need to bind them together, which only adds complexity.

I also wonder whether the encryption key is really designed for security. In my opinion it is more of a tool to make server actions anonymous. Think about it: if we write a server action named getUserInfo(), Next.js can't be sure there is only one of them. So the easiest approach is to abstract each server action into a random, unique ID, register it, and use that ID for calls. As for the ID changing after every build, I think that is just how the mechanism works. If I were designing this, I would worry more about an unchanging ID causing collisions between server actions. Just a thought with no evidence; I'm sure there are more considerations I don't know about, and I say this with respect.

@J4v4Scr1pt

J4v4Scr1pt commented Feb 25, 2025

Have you all tried the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY?
I still see this error from time to time, not as much as before but still. 🤔

Edit:
Maybe I'm too eager; I'm going to let this run in production for a while. It could be due to customers having tabs open from before the fix.

But a question, if I change the serverAction code it will also result in a new Id correct?

@petewins

petewins commented Feb 25, 2025

I still think there's a lot of missing information about NEXT_SERVER_ACTIONS_ENCRYPTION_KEY:

  • Is this a build-time or runtime env var? Or maybe both?
  • What is the exact format of the key? See "Docs: Document expected NEXT_SERVER_ACTIONS_ENCRYPTION_KEY format", #61020 (comment).
  • The statement "This variable must be AES-GCM encrypted." doesn't clarify. I think what's meant here is something along the lines of "This key is a server-side secret and should be treated accordingly. If it is compromised, an adversary could possibly use it to decrypt server-side secrets in payloads to the client."
  • More examples of exactly how this symmetric key is used to encrypt/decrypt payloads would be helpful. What exactly are the payloads? This can help Next.js users understand the consequences of a key leaking, etc.

I agree we need more details on how this variable is used. I've tested locally and it seems we only need this during build time to retain consistent action ids. Can someone confirm this?

I extracted this from the source code, this is how keys are currently generated

function arrayBufferToString(
    buffer: ArrayBuffer | Uint8Array<ArrayBufferLike>
) {
    const bytes = new Uint8Array(buffer)
    const len = bytes.byteLength

    // @anonrig: V8 has a limit of 65535 arguments in a function.
    // For len < 65535, this is faster.
    // https://github.com/vercel/next.js/pull/56377#pullrequestreview-1656181623
    if (len < 65535) {
        return String.fromCharCode.apply(null, bytes as unknown as number[])
    }

    let binary = ''
    for (let i = 0; i < len; i++) {
        binary += String.fromCharCode(bytes[i])
    }
    return binary
}

async function generateKey() {
    const key = await crypto.subtle.generateKey(
        {
            name: 'AES-GCM',
            length: 256,
        },
        true,
        ['encrypt', 'decrypt']
    )
    const exported = await crypto.subtle.exportKey('raw', key)
    const result = btoa(arrayBufferToString(exported))

    return result
}
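Going by the snippet above (a 256-bit AES-GCM key exported raw and base64-encoded), a value of the same shape can be produced in Node as sketched below. This is only a sketch: `generateActionsKey` and `isPlausibleActionsKey` are hypothetical helper names, and whether Next.js accepts any base64-encoded 32-byte value for NEXT_SERVER_ACTIONS_ENCRYPTION_KEY is an assumption drawn from this snippet, not something the documentation quoted in this thread confirms.

```typescript
import { randomBytes } from "node:crypto";

// Produces 32 random bytes (256 bits) encoded as base64, i.e. the same
// shape of value that generateKey() above returns.
export function generateActionsKey(): string {
  return randomBytes(32).toString("base64");
}

// Loose sanity check: the value base64-decodes to exactly 32 bytes.
// Buffer's decoder is lenient, so this catches wrong lengths, not all bad input.
export function isPlausibleActionsKey(value: string): boolean {
  return Buffer.from(value, "base64").length === 32;
}

console.log(isPlausibleActionsKey(generateActionsKey()));
```

Generating the key once and storing it as a secret, rather than generating it on every build, is what keeps it stable across deployments.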

Thank you so much for your reply; I understand it now. But isn't this a bit of a paradox? For security, the server action IDs change after every build by default, which can cause version skew after every build. To avoid the version skew, we can keep the encryption key the same for every build, which makes every server action ID (in effect, the endpoint of each server action) the same for every build.

It is a bit complex, but the current conclusion seems to be that we either accept version skew for better security, or solve version skew and give up that security.

I agree with this sentiment. Unless we implement our own skew protection, rotating this key would cause the same issue. It seems like we have to make a tough trade-off between user experience and security.

@J4v4Scr1pt

I think they can remove this completely instead and have it as it was in v14.
The security should be up to us developers, you will not create an api-end-point without a proper protection layer.
And a ServerAction is basically an api-end-point. Don't get me wrong it's super awesome Vercel looking into enhancing security by default 🙌, but it seems this one was done with hosting on Vercel in mind.

And I still see this error in our logging service with 5 days passed since I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY...

@LikeDreamwalker

I think they can remove this completely instead and have it as it was in v14. The security should be up to us developers, you will not create an api-end-point without a proper protection layer. And a ServerAction is basically an api-end-point. Don't get me wrong it's super awesome Vercel looking into enhancing security by default 🙌, but it seems this one was done with hosting on Vercel in mind.

And I still see this error in our logging service with 5 days passed since I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY...

IDK, did we have the "encryption" part in v14? I thought it had always existed since server actions were released...

@J4v4Scr1pt

I think they can remove this completely instead and have it as it was in v14. The security should be up to us developers, you will not create an api-end-point without a proper protection layer. And a ServerAction is basically an api-end-point. Don't get me wrong it's super awesome Vercel looking into enhancing security by default 🙌, but it seems this one was done with hosting on Vercel in mind.
And I still see this error in our logging service with 5 days passed since I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY...

IDK, did we have the "encryption" part in v14? I thought it had always existed since server actions were released...

I based that on this, so maybe I'm wrong :).
https://nextjs.org/blog/next-15#enhanced-security-for-server-actions

@LikeDreamwalker

I think they can remove this completely instead and have it as it was in v14. The security should be up to us developers, you will not create an api-end-point without a proper protection layer. And a ServerAction is basically an api-end-point. Don't get me wrong it's super awesome Vercel looking into enhancing security by default 🙌, but it seems this one was done with hosting on Vercel in mind.
And I still see this error in our logging service with 5 days passed since I added the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY...

IDK, do we have the "encryption" part in v14? I thought this is always existed since server action released...

I based that on this, so maybe I'm wrong :). https://nextjs.org/blog/next-15#enhanced-security-for-server-actions

I just remembered, yes. But it seems like this doesn't actually help much, especially since attackers could just observe the business actions to learn what the IDs point to. 😂

Maybe using this key to encrypt and decrypt the request and response bodies would be more complete.

@benjick

benjick commented Mar 5, 2025

Like @J4v4Scr1pt asked;

But a question, if I change the serverAction code it will also result in a new Id correct?

Can we get some clarification on this? Does it change if I edit the specific action? Or other actions in the same file?

@github-actions github-actions bot added the linear: next Confirmed issue that is tracked by the Next.js team. label Mar 24, 2025
@jonschmidt

Just adding to the cacophony of voices. We have implemented our first dozen or so server actions, which we were ready to ship to production until we discovered this problem in QA. We are deploying on AWS.

As far as I can tell, there is no fix for this. Setting the encryption key NEXT_SERVER_ACTIONS_ENCRYPTION_KEY in both our build and runtime environments does not result in stable IDs. Unless someone has better info on how to achieve stable IDs, we have to delay our new launch and rewrite everything using client side fetching.

@LikeDreamwalker

LikeDreamwalker commented Mar 28, 2025

Just adding cacophony of voices. We have implemented our first dozen or so server actions that we were ready to ship to production until we discovered this problem in QA. We are deploying on AWS.

As far as I can tell, there is no fix for this. Setting the encryption key NEXT_SERVER_ACTIONS_ENCRYPTION_KEY in both our build and runtime environments does not result in stable IDs. Unless someone has better info on how to achieve stable IDs, we have to delay our new launch and rewrite everything using client side fetching.

Just a thought: maybe one solution is to build a stable and secure API route such as /api/actions. API routes don't have the version skew issue as long as the pathname stays the same, so this avoids the problem. Everywhere that currently calls a server action would call this API instead, and the API would use the incoming params to call the corresponding server action.

// A fake API router, for illustration only
// Takes a server action name and always calls that action on the server side
const callActions = async ({ name }: { name: string }) => {
  let res: unknown = {};
  // Use a switch (or a lookup table) to pick the specific action
  switch (name) {
    case 'exampleServerAction':
      res = await exampleServerAction(name);
      break;

    default:
      break;
  }
  // Return the result as the response; it can be reshaped for easier use
  return res;
};

I think this solution might work because:

  • API routes don't have the version skew issue, as far as I know
  • The client sends a request to the server without involving the version skew issue
  • The server-side instance is stable most of the time, so version skew mostly won't occur there
  • The server calls server actions independently, without any client-side code (including the dynamic server action ID)

This solution seems logically sound, but I haven't tested it. It might be helpful if you want to solve the version skew issue thoroughly; it is more like building your own server actions. The only downsides are the cost of building it and that you need to protect this API with your own security checks.
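A slightly fuller sketch of the same idea, untested against a real Next.js app. It assumes a request body shaped like { name, args } and a lookup table of stable names; `exampleServerAction`, `actionRegistry`, `jsonResponse`, and `handleActionRequest` are hypothetical names for illustration. It only uses the standard Web `Request` and `Response` types, which is what route handlers receive and return.

```typescript
// Hypothetical sketch: in a Next.js app this could live in app/api/actions/route.ts,
// with handleActionRequest exported under the name POST.

type Action = (...args: unknown[]) => Promise<unknown>;

// Stand-in for a real server-side function; a real app would import it.
async function exampleServerAction(...args: unknown[]): Promise<unknown> {
  return { ok: true, received: args };
}

// Stable, human-chosen names take the place of build-generated action IDs.
const actionRegistry = new Map<string, Action>([
  ["exampleServerAction", exampleServerAction],
]);

function jsonResponse(data: unknown, status: number): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "content-type": "application/json" },
  });
}

async function handleActionRequest(request: Request): Promise<Response> {
  let body: unknown;
  try {
    body = await request.json();
  } catch {
    return jsonResponse({ error: "Invalid JSON body" }, 400);
  }

  const { name, args } = (body ?? {}) as { name?: unknown; args?: unknown };
  const action = typeof name === "string" ? actionRegistry.get(name) : undefined;
  if (!action) {
    // An unknown name yields an explicit status the client can react to.
    return jsonResponse({ error: "Unknown action" }, 404);
  }

  try {
    const result = await action(...(Array.isArray(args) ? args : []));
    return jsonResponse({ result }, 200);
  } catch {
    return jsonResponse({ error: "Action failed" }, 500);
  }
}
```

On the client, this would be called with a plain fetch to /api/actions, and a non-2xx status gives the caller something concrete to check. Authentication and input validation are deliberately left out and would still need to be added.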

@jonschmidt

Just adding cacophony of voices. We have implemented our first dozen or so server actions that we were ready to ship to production until we discovered this problem in QA. We are deploying on AWS.
As far as I can tell, there is no fix for this. Setting the encryption key NEXT_SERVER_ACTIONS_ENCRYPTION_KEY in both our build and runtime environments does not result in stable IDs. Unless someone has better info on how to achieve stable IDs, we have to delay our new launch and rewrite everything using client side fetching.

Just a thought, maybe there is a solution is building up a stable and secure API like /api/actions, and because API won't have the version skew issues if we keep the pathname still, so it can avoid this; And for everywhere is calling the server action, now call it inside this API, and this API can use the incoming params to call the related server actions.
...

Thanks for the thorough suggestion! We actually have an external API that we use, and it's all in the same VPN. There is very little cost to us in switching to client-side mutations and refetching with SWR for now, and waiting until Vercel comes up with a better solution (or maybe we migrate to a provider that handles this better).

@jklnr

jklnr commented Mar 28, 2025

(or maybe we migrate to a provider that handles this better).

@jonschmidt If you're using vercel you can probably turn on Skew Protection and not worry about it, right?
https://vercel.com/blog/version-skew-protection

Have you all tried the NEXT_SERVER_ACTIONS_ENCRYPTION_KEY?
@J4v4Scr1pt

I have tried it. While it does not prevent the generation of new server action IDs, they are more stable.

I believe what I've observed is the following:

Without a consistent NEXT_SERVER_ACTIONS_ENCRYPTION_KEY encryption key:

  • Server action IDs change between builds, even when no code changes are made at all
  • Each build generates unique IDs, regardless of whether the code is identical
  • This creates or exacerbates version skew: each build is a breaking change

With a consistent encryption key:

  • Server action IDs remain consistent between builds but only for those server actions whose function signatures did not change.
  • If you change the implementation of the function without changing its signature, the server action ID will not change, not even if you change other server actions within the same file @benjick

So I think this is a valid workaround and what I will probably try for our deployments:

  • use a constant value for NEXT_SERVER_ACTIONS_ENCRYPTION_KEY in all our builds
  • Any time we would change the function signature of a server action, instead we'll add a brand new server action and leave the old one alone.

Hopefully that will work.
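A minimal sketch of the second point, assuming the behavior observed above holds (I have not verified it against Next.js internals). `updateProfile`, `updateProfileV2`, `Profile`, and the in-memory `profiles` map are hypothetical names for illustration. In a real app both functions would be exported from a file that starts with the "use server" directive.

```typescript
// Hypothetical actions module; the Map stands in for a real database.
type Profile = { name: string; email?: string };

const profiles = new Map<string, Profile>();

// Original action: the signature is left untouched so that clients loaded
// before the new deployment can, per the observation above, still call it.
// Only the implementation changes, delegating to the new action.
async function updateProfile(userId: string, name: string): Promise<Profile> {
  return updateProfileV2(userId, name);
}

// New action added alongside the old one instead of changing its signature.
async function updateProfileV2(
  userId: string,
  name: string,
  email?: string
): Promise<Profile> {
  const next: Profile = email === undefined ? { name } : { name, email };
  profiles.set(userId, next);
  return next;
}
```

New client code would call only `updateProfileV2`; the old action can be deleted once no clients from older builds are expected to still be open.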

Anyway, it is not good to have to opt into using this environment variable during build steps just for server actions to be (only semi-) backward-compatible with clients. It would be great if there were some way to have explicit control over each server action's name, or some other config option that ensures they do not change from build to build.

@fro-furnishedfinder

Anyway this is not good to have to opt into using this environment variable during build steps in order for server actions to be (only semi-) backward-compatible with clients. It would be great if there was some way to have explicit control over what each server action's name is. Or otherwise some other config option that allows us to make sure they do not change from build to build.

You mean a normal endpoint huh?

@jklnr

jklnr commented Mar 31, 2025

Anyway this is not good to have to opt into using this environment variable during build steps in order for server actions to be (only semi-) backward-compatible with clients. It would be great if there was some way to have explicit control over what each server action's name is. Or otherwise some other config option that allows us to make sure they do not change from build to build.

You mean a normal endpoint huh?

Sure, if there's some very easy way to convert all the server actions we've already developed into normal endpoints, that would be a fine workaround.
