diff --git a/README.md b/README.md index f1c52a4c..c8cc37fe 100644 --- a/README.md +++ b/README.md @@ -5,14 +5,13 @@ A content repository for the Atlas Architecture Center documentation site. ## Report Issues To file issues or requests regarding the documentation, go to the -`Documentation Jira Project `_. +[Documentation Jira Project](https://jira.mongodb.org/browse/DOCS). ## License -All documentation is available under the terms of a `Creative Commons -License `_. +All documentation is available under the terms of a +[Creative Commons License](https://creativecommons.org/licenses/by-nc-sa/3.0/). -If you have any questions, please contact `docs@mongodb.com -`_. +If you have any questions, please contact [docs@mongodb.com](mailto:docs@mongodb.com). -- The MongoDB Documentation Team diff --git a/snooty.toml b/snooty.toml index 523de324..d8152cd5 100644 --- a/snooty.toml +++ b/snooty.toml @@ -27,8 +27,11 @@ intersphinx = [ "https://www.mongodb.com/docs/master/objects.inv", toc_landing_pages = [ "/getting-started", + "/deployment-paradigms", + "/deployment-paradigms/multi-region", "/operational-efficiency", "/security", + "/auth", "/reliability", "/performance", "/cost-optimization", @@ -39,6 +42,8 @@ adf = "Atlas Data Federation" adl = "Atlas Data Lake" ak8so = "Atlas Kubernetes Operator" app-services = "App Services" +arp = "Atlas Resource Policy" +arps = "Atlas Resource Policies" atlas-admin-api = "Atlas Administration API" atlas-app-services = "Atlas App Services" atlas-arch-center = "Atlas Architecture Center" @@ -132,6 +137,7 @@ PIT-Restore = "Continuous Cloud Backup" pit-restore = "continuous cloud backup" pkce = ":abbr:`PKCE (Proof Key of Code Exchange)`" playground = "Atlas Search Playground" +ps = "MongoDB's `Professional Services `__" qe = "Queryable Encryption" Realm = "Realm" sdk = ":abbr:`SDK (Software Development Kit)`" @@ -176,6 +182,7 @@ waf = "`Well-Architected Framework `. You can :manual:`configure manual auditing ` of most of the - documented :manual:`system event actions ` + documented :manual:`system event actions ` in |service|. Granular MongoDB database auditing allows you to track usage of all DDL (Data Definition Language), DML (Data Manipulation Language), and DCL (Data Control Language) commands in detail. See also @@ -57,48 +57,78 @@ multiple users. As an |service| administrator, you can: Accessing Audit Logs ~~~~~~~~~~~~~~~~~~~~ -.. include:: /includes/cloud-docs/logs.rst +You can use the {+atlas-cli+}, {+atlas-admin-api+}, or {+atlas-ui+} for the following auditing activities: -You can review and update your auditing configuration per project. Use -the following {+atlas-cli+} commands: +- :ref:`View and download audit logs ` to track :ref:`system event actions ` + for deployments with multiple users. {+service+} administrators can configure a custom auditing filter to + choose the actions, database users, {+service+} roles, and |ldap| groups that they want to audit. -- :ref:`atlas auditing describe ` returns the - auditing configuration for the specified project. -- :ref:`atlas auditing update ` updates - the auditing configuration for the specified project. +- :ref:`View and download MongoDB logs ` to track log events for your deployment, including incoming connections, + commands run, and issues encountered. Generally, log messages are useful for diagnosing issues, monitoring your deployment, and tuning performance. 
-You can :ref:`view authentication attempts ` that users make
-against your {+cluster+} with the {+atlas-cli+}, {+atlas-admin-api+}, or {+atlas-ui+}.
-|service| logs both successful and unsuccessful authentication attempts,
-including the timestamp of each attempt and which user tried to authenticate.
+- :ref:`View project and organization events ` in the :guilabel:`Project Activity Feed`
+  and :guilabel:`Organization Activity Feed`. These activity feeds list all events at the organization or project level, including
+  changes related to {+service+} access, alert configurations and monitoring, billing, and more.

-You can also :ref:`view and filter the activity feed `
-for an organization or project with the {+atlas-cli+}, {+atlas-admin-api+},
-or {+atlas-ui+}. The activity feed lists all events at the organization or project level.
+- :ref:`View database authentication attempts ` that users make against your {+cluster+} in your access logs
+  (that is, :guilabel:`Database access history` in the {+atlas-ui+}).
+  |service| logs both successful and unsuccessful authentication attempts,
+  including the timestamp of each attempt and which user tried to authenticate.

-To perform a full audit, you can use a combination of audit logs,
-:manual:`MongoDB log messages `, and
-:ref:`the project and organization activity feed `.
+Programmatic Access to Audit Logs
+`````````````````````````````````
+
+To integrate with tools beyond the built-in integrations, we recommend
+that you retrieve logs with the following programmatic tools and feed
+the |json|-formatted output to your external tools:
+
+- To continually push logs to an |aws| |s3| bucket, use the {+atlas-admin-api+}
+  endpoints for :oas-atlas-tag:`Push-Based Log Export `.
+- To retrieve deployment logs and lists of project events,
+  use the {+atlas-admin-api+} endpoints for :oas-atlas-tag:`Monitoring and Logs `
+  and :oas-atlas-tag:`Project and Organization Events `.
+- To retrieve deployment logs, use the :ref:`atlas deployment logs `
+  command in the {+atlas-cli+}.
+
 .. _arch-center-auditing-logging-recs:

 Recommendations for {+service+} Auditing and Logging
 ----------------------------------------------------

+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Single-region deployments have no unique considerations for auditing and logging.
+   See the next section for "All Deployment Paradigm Recommendations".
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   Multi-region and multi-cloud deployments have no unique considerations
+   for auditing and logging. See the next section for "All Deployment Paradigm Recommendations".
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
 We recommend that you :atlas:`set up database auditing `
-when you provision your {+clusters+}.
+when you provision your {+clusters+}. To perform a full audit,
+you can use a combination of :manual:`audit logs `,
+:manual:`MongoDB log messages `,
+and :ref:`the project and organization activity feed `.
+
 Auditing puts additional load on your {+clusters+} and increases costs.
-To optimize costs, you can disable auditing in lower environments for development.
+To optimize {+cluster+} performance and minimize costs, we recommend that you
+limit the number of users that you audit, and disable auditing in development environments.
 Certain industries, such as healthcare and financial services, may opt to keep
 auditing enabled in development environments for compliance reasons.

-Enabling auditing for all database users, including application
-service users, might affect cluster performance. We recommend that you
-audit only the actions of users that require auditing.
-
-For staging and production environments, enable auditing for
-additional security.
-
 We recommend that you audit the following events at a minimum:

 - Failed logon
@@ -114,38 +144,24 @@ We recommend that you audit the following events at a minimum:
 - Altering security
 - Running database start and stop commands

-For all of the previous events, you should include in the audit log the following
-information at a minimum:
-
-- Session ID
-- Client hostname and IP address
-- Database server hostname and IP address
-- Database user
-- Database name
-- OS user
-- Service/instance name
-- Port
-- Application
-- Query
-- SQL command
-- Object
-- Timestamp
-- Error code (if applicable)
-
-Programmatic Access to Audit Logs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To integrate with tools beyond the built-in integrations, we recommend
-that you retrieve logs with the following programmatic tools and feed
-the JSON-formatted output to your external tools:
-
-- To continually push logs to an |aws| |s3| bucket, use the {+atlas-admin-api+}
-  endpoints for :oas-atlas-tag:`Push-Based Log Export `.
-- To retrieve deployment logs and lists of project events,
-  use the {+atlas-admin-api+} endpoints for :oas-atlas-tag:`Logs `
-  and :oas-atlas-tag:`Project and Organization Events `.
-- To retrieve deployment logs, use :ref:`atlas deployment logs `
-  command in the {+atlas-cli+}. To learn more, see :atlas:`Atlas Deployment Logs `.
+By default, audit log messages are returned in a format designed by MongoDB,
+called the :manual:`mongo schema `.
+Audit log messages that follow the ``mongo`` schema always include the following information:
+
+- Action type (``atype``)
+- Timestamp
+- Client connection ID (UUID)
+- Client IP address and port number
+- Incoming connection IP address and port number
+- Username(s)
+- User authentication database(s)
+- User role(s)
+- User role database(s)
+- ``param`` document containing specific details for the event
+- Result value or error code
+
+For a full list of audit action types and their associated ``param`` details and ``result`` values,
+see :manual:`mongo Schema Audit Messages `.

 Automation Examples: {+service+} Auditing and Logging
 -----------------------------------------------------
@@ -163,68 +179,156 @@ In addition to the following examples, see the blogpost

    .. tab:: CLI
       :tabid: cli

-      Update Audit Configuration
-      ~~~~~~~~~~~~~~~~~~~~~~~~~~
+      Create and Enable Filter
+      ~~~~~~~~~~~~~~~~~~~~~~~~

-      Run the following {+atlas-cli+} command to audit all authentication events
-      for known users in your project:
+      The following document defines an audit filter that restricts audits to only the
+      authentication operations that occur against the ``test`` database.
+      To learn more, see :manual:`Configure Audit Filters `.

-      .. include:: /includes/examples/cli-example-audit-logs-known-users.rst
+      ..
include:: /includes/examples/cli/cli-example-audit-filter.rst - Run the following {+atlas-cli+} command to audit a known user - via a configuration file: + To enable an audit filter, run the :ref:`atlas auditing update ` + command with the ``--enabled`` flag and specify the audit filter document in single quotes + to pass the document as a string: - .. include:: /includes/examples/cli-example-audit-logs-config-file.rst + .. include:: /includes/examples/cli/cli-example-audit-filter-use.rst - Describe Audit Configuration - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The following {+atlas-cli+} command enables an audit filter that is defined in + the specified |json| configuration file: - Run the :ref:`atlas auditing describe ` - {+atlas-cli+} command to return the auditing configuration for the specified project: + .. include:: /includes/examples/cli/cli-example-audit-logs-config-file.rst - .. include:: /includes/examples/cli-example-audit-logs-describe.rst + Update Audit Configuration + ~~~~~~~~~~~~~~~~~~~~~~~~~~ - Create and Use Audit Filter - ~~~~~~~~~~~~~~~~~~~~~~~~~~~ + To update your project's audit configuration, use the :ref:`atlas auditing update ` + command and specify the new audit filter. The following command replaces the existing + audit filter configuration with a new filter that audits all authentication events for known users in the project: - Create an audit filter to only audit the authenticate operations - that occur against the test database. To learn more, - see :manual:`Configure Audit Filters `. + .. include:: /includes/examples/cli/cli-example-audit-logs-known-users.rst - .. include:: /includes/examples/cli-example-audit-filter.rst + Describe Audit Configuration + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - To use an audit filter that you created, update the audit configuration - using the :ref:`atlas auditing update ` - {+atlas-cli+} command: + Run the :ref:`atlas auditing describe ` + command to return the auditing configuration for the specified project: - .. include:: /includes/examples/cli-example-audit-filter-use.rst + .. include:: /includes/examples/cli/cli-example-audit-logs-describe.rst Retrieve Logs ~~~~~~~~~~~~~ - To retrieve the access log, use a command similar to the following. - This command returns a JSON-formatted list of all authentication - requests made against the {+clusters+} named ``Cluster0`` for the project - with the ID ``618d48e05277a606ed2496fe``: + Each |mongod| and |mongos| instance in a cluster outputs its own + :manual:`MongoDB log ` and audit log messages + with potentially different contents than other instances. + You can view these log messages in the {+atlas-cli+} using the + :ref:`atlas deployment logs ` command. + + To retrieve audit log entries for a |mongod| instance in your cluster, + provide the |mongod| hostname and specify ``mongodb-audit-log.gz`` as the name of the audit log file: + + .. include:: /includes/examples/cli/cli-example-retrieve-audit-logs-mongod.rst + + To retrieve audit log entries for a |mongos| instance in a sharded cluster deployment, + provide the |mongos| hostname and specify ``mongos-audit-log.gz`` as the name of the audit log file: + + .. include:: /includes/examples/cli/cli-example-retrieve-audit-logs-mongos.rst - .. 
include:: /includes/examples/cli-example-retrieve-logs.rst
+      To retrieve :manual:`MongoDB log messages `, provide the hostname of your |mongod| or |mongos| instance,
+      and specify the name of the log file as ``mongodb.gz`` or ``mongos.gz``, respectively:

-      Retrieve All Log Events for Organization in a JSON File
-      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+      .. include:: /includes/examples/cli/cli-example-retrieve-logs.rst

-      To return all events for the specified organization, use a command
-      similar to the following. This command returns a JSON-formatted list
-      of events for the organization with the ID ``5dd5a6b6f10fab1d71a58495``:
+      You can also use the :ref:`atlas accessLogs list ` command to view the access log for
+      a node or cluster.
+      The access log is a |json|-formatted list of all authentication requests against your specified node or cluster.
+      To retrieve the access log, run the :ref:`atlas accessLogs list ` command
+      and specify the hostname or {+cluster+} name of the target node or {+cluster+}:

-      .. include:: /includes/examples/cli-example-retrieve-logs-org.rst
+      .. include:: /includes/examples/cli/cli-example-retrieve-access-logs.rst

       Download Logs
       ~~~~~~~~~~~~~

-      Run the following {+atlas-cli+} command to download a compressed file that
-      contains the MongoDB logs for the specified host in your project.
+      Each |mongod| and |mongos| instance in a cluster has its own
+      :manual:`MongoDB log ` and audit log
+      with potentially different contents than other instances.
+      You can download each log as a compressed file using the
+      :ref:`atlas logs download ` {+atlas-cli+} command.
+
+      To download the audit log for a |mongod| instance in your cluster,
+      provide the |mongod| hostname and the audit log file name ``mongodb-audit-log.gz`` as arguments:
+
+      .. include:: /includes/examples/cli/cli-example-download-audit-logs-mongod.rst
+
+      To download the audit log for a |mongos| instance in a sharded cluster deployment,
+      provide the |mongos| hostname and the audit log file name ``mongos-audit-log.gz`` as arguments:
+
+      .. include:: /includes/examples/cli/cli-example-download-audit-logs-mongos.rst
+
+      To download the :manual:`MongoDB log ` for a |mongod| or |mongos| instance,
+      provide as arguments the hostname of the instance and the log file names ``mongodb.gz`` or ``mongos.gz``, respectively:
+
+      .. include:: /includes/examples/cli/cli-example-download-logs.rst
+
+      Retrieve All Project Alerts
+      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      You can use the {+atlas-cli+} to return alerts triggered by
+      events for your project or organization. {+service+} provides alerts such as
+      :alert:`Replica set has no primary` and :alert:`User joined the project` by default.
+      These events provide a record of significant activities and changes within the project or organization,
+      including significant database, billing, or security activities or status changes.
+      To customize which events trigger alerts for your project and organization, see :ref:`configure-alerts`.
+
+      Retrieve All Log Events for Your Project or Organization
+      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+      You can use the following {+atlas-cli+} commands to return project or organization events from your
+      :guilabel:`Project Activity Feed` or :guilabel:`Organization Activity Feed`.
+
+      To return all events for your organization, use the :ref:`atlas events organizations list `
+      command and specify your organization ID. The following command returns a JSON-formatted list of events for the
+      organization with the ID ``5dd5a6b6f10fab1d71a58495``:
+
+      .. include:: /includes/examples/cli/cli-example-retrieve-logs-org.rst
+
+      To return all events for your project, use the :ref:`atlas events projects list `
+      command and specify your project ID. The following command returns a JSON-formatted list of events for the
+      project with the ID ``64ac57bfe9810c0263e9d655``:

-      .. include:: /includes/examples/cli-example-download-logs.rst
+      .. include:: /includes/examples/cli/cli-example-retrieve-logs-org.rst

    .. tab:: Terraform
       :tabid: Terraform
@@ -253,7 +357,7 @@ In addition to the following examples, see the blogpost
       by creating audit filters. To learn more about configuring audit
       filters, see :manual:`Configure Audit Filters `.

-      .. include:: /includes/examples/tf-example-auditing-filter.rst
+      .. include:: /includes/examples/terraform/tf-example-auditing-filter.rst

       Retrieve Logs
       ~~~~~~~~~~~~~
diff --git a/source/auth.txt b/source/auth.txt
index 6a94502e..ea3276b0 100644
--- a/source/auth.txt
+++ b/source/auth.txt
@@ -21,621 +21,22 @@ Guidance for {+service+} Authorization and Authentication
 Authentication is the process of verifying the identity of a user.
 |service| requires all users to authenticate themselves in order
-to determine their access.
+to determine their access. Authorization is the process of assigning
+permissions to an authenticated user.
-Although authentication and authorization are closely connected, -authentication is distinct from authorization: +For :ref:`Authentication `, |service| +provides robust authentication mechanisms that seamlessly integrate with your +existing identity systems, providing secure access to the UI, database, and +|api|\s. -- Authentication verifies the identity of a user. +For :ref:`Authorization `, |service| +provides Role-Based Access Control (RBAC) to govern +access to |service|. You must grant a user one or more roles that determine the +user's access to database resources and operations. Outside of role +assignments, the user has no access to the system. - |service| provides robust authentication mechanisms that seamlessly - integrate with your existing identity systems, providing secure access - to the UI, database, and |api|\s through strong identity federation. - You can manage access to |service-fullname| {+clusters+} by - configuring authentication. +.. toctree:: + :titlesonly: -- Authorization determines the verified user's access to resources and - operations. - - |service| provides Role-Based Access Control (RBAC) to govern access to - |service|. You must grant a user one or more roles that determine the - user's access to database resources and operations. Outside of role - assignments, the user has no access to the system. - -Features for {+service+} Authentication ---------------------------------------- - -|service-fullname| supports a variety of authentication methods to -ensure robust security. - -- The best practice for user authentication is utilizing |service|'s - seamless integration with identity providers using federated authentication - through OpenID Connect (OIDC) or SAML 2.0, and enhancing security with Multi-Factor - Authentication (MFA) to ensure a modern authentication and security posture. -- For workload authentication, |service| supports OAuth2.0, allowing - seamless compatibility with authorization services and integration - into your federated :abbr:`IdP (Identity Provider)`. - -|service| requires all users to authenticate to access {+atlas-ui+}, -|service| database, and {+atlas-admin-api+}. The following -authentication methods for each |service| resource ensure that -authentication is both secure and adaptable. |service| provides the -following authentication mechanisms: - -.. _arch-center-authentication-recs: - -Recommendations for {+service+} Authentication ----------------------------------------------- - -{+atlas-ui+} Authentication -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -We recommend federated authentication for {+atlas-ui+} access. To -configure federated authentication, you must create {+service+} -credentials or log in with Google or Github. - -For {+service+} credentials, -we recommend that you use a strong password accompanied by a -phishing-resistant :abbr:`MFA (Multi-Factor Authentication)`, such as -biometrics. We highly recommend that you set up a secondary -:abbr:`MFA (Multi-Factor Authentication)` factor -to avoid account lock-outs. To ensure -:abbr:`MFA (Multi-Factor Authentication)` access for {+service+} -credentials, turn on :abbr:`MFA (Multi-Factor Authentication)` -enforcement in the Organization Settings. After you set up federation -for your domain, you should use {+service+} credentials to -authenticate only in the emergency break-glass scenarios when federated -authentication is broken. 
- -Federated Authentication -```````````````````````` - -Federated authentication allows you to manage all authentication to -the {+atlas-ui+} across multiple systems and applications through a -central identity provider. For UI access, |service| supports workforce -identity federation using :abbr:`SAML (Security Assertion Markup -Language)` 2.0. You can use any :abbr:`SAML (Security Assertion -Markup Language)` compatible identity provider such as Okta, Microsoft -Entra ID, or Ping Identity to enforce security policies such as password -complexity, credential rotation, and :abbr:`MFA (Multi-Factor -Authentication)` within your identity provider. - -You must configure the IP access list in the {+atlas-ui+} to allow only connections -from IP ranges that include your users and application servers. - -To learn more, see :ref:`atlas-federated-authentication`. - -Multi-Factor Authentication -``````````````````````````` - -For any human user that has access to the |service| control plane, we recommend -that you require :abbr:`MFA (Multi-Factor Authentication)` for enhanced security. -When :abbr:`MFA (Multi-Factor Authentication)` is enabled, |service| requires -two forms of identification: - -- The user's credentials -- One of the following recommended factors: - - - Security keys - - Biometrics - - OTP authenticators - - Push notifications - - SMS (not recommended as primary factor) - - Email (not recommended as primary factor) - -To learn more, see :ref:`atlas-enable-mfa`. - -|service| Database Authentication -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -|service| supports various database authentication mechanisms. - -To configure workforce (human user) access to your |service| -database via tools such as {+mongodbsh+} and Compass, -use :ref:`Workforce Identity Federation ` with {+oidc+}. - -To configure workload (application) access to your |service| -database using MongoDB drivers, use :ref:`Workload Identity Federation `, -AWS-IAM authentication, or X.509 certificate authentication. We recommend :manual:`SCRAM ` -password authentication for use only in development or testing environments. - -|service| also supports: - -- :ref:`Creating temporary database users ` - with just-in-time database access -- :ref:`Third-party secrets manangement ` - -Workforce Identity Federation -````````````````````````````` - -Workforce Identity Federation allows you to manage all authentication to -the |service| database through your identity provider. For database -access, we recommend {+oidc+}-compatible identity -providers such as Okta, Microsoft Entra ID, or Ping Identity to enforce -security policies such as password complexity, credential rotation, and -:abbr:`MFA (Multi-Factor Authentication)` within your identity provider. - -To learn more, see :ref:`oidc-authentication-workforce`. - -Workload Identity Federation and AWS IAM Role Authentication -```````````````````````````````````````````````````````````` - -Workload Identity Federation enables applications running in cloud -environments like |azure| and Google Cloud to authenticate with -|service| without the need to manage separate database user credentials. -With Workload Identity Federation, you can manage |service| database -users using |azure| Managed Identities, Google Service Accounts, or any -OAuth 2.0-compliant service. These authentication mechanisms simplify -management and enhance security by allowing for passwordless access to -the |service| database. - -We recommend Workload Identity Federation for all applications running in -production. 
You shouldn't allow human users to connect except in the most -extreme break-glass emergency scenarios. - -You can also authenticate -through |aws| |iam| roles. - -To learn more, see the following: - -- :ref:`oidc-authentication-workload` -- :ref:`set-up-pwdless-auth` - -X.509 Client Certificates and SCRAM -``````````````````````````````````` - -We recommend that you use Workforce or Workload Identity Federation -through an identity provider for security and ease of access to all aspects of the -|service| control and data plane. - -If you don't have an identity provider for federation, -|service| {+clusters+} also support X.509 client certificates for -user authentication. X.509 certificates provide the security of mutual TLS, -making them suitable for staging and production environments, and you -can bring your own certificate authority for use with X.509. -The disadvantage of X.509 is that you must manage certificates and -the security of these certificates on the application side, while -Workload Identity Federation allows for passwordless access and easier -application security. - -|service| {+clusters+} also support SCRAM password authentication for user authentication, -but we recommend SCRAM only for use in development and test environments. - -If you leverage X.509 or SCRAM authentication, we recommend that you use -third-party secrets manager like -`HashiCorp Vault `__ -or |aws| Secrets Manager to generate and store complex database credentials. - -To learn more, see the following manual pages: - -- :manual:`X.509 ` -- :manual:`SCRAM ` - -.. _arch-center-just-in-time: - -Just-in-Time Access -``````````````````` - -|service| also supports creating temporary database users -that automatically expire after the predefined times. A user can be -created for the following periods: - -- 6 hours -- 1 day -- 1 week - -To learn more, see :ref:`mongodb-users`. - -.. _arch-center-secrets: - -Secrets Management -`````````````````` - -We recommend using a third-party secrets manager like `HashiCorp Vault -`__ -or |aws| Secrets Manager to generate and store complex database credentials. -A secrets manager can generate database credentials dynamically based on configured -roles for |service| databases. - -To learn more, see the blog :website:`Manage -MongoDB Atlas Database Secrets in HashiCorp Vault -`. - -.. _arch-center-admin-api-recs: - -{+atlas-admin-api+} Authentication -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -|service| provides two ways to authenticate to the {+atlas-admin-api+}: - -- :atlas:`Service accounts ` - (currently available as a :website:`Preview feature - `\) -- |api| keys - -Service Accounts -```````````````` - -Service accounts use industry-standard OAuth2.0 to securely authenticate -with {+service+} through the {+atlas-admin-api+}. We recommend that you use service accounts instead of |api| keys when possible because they provide added security through use short-lived -access tokens and required credential rotations. - -Service accounts are -available as a Preview feature, and you can manage programmatic access for service accounts only by using the {+atlas-ui+} or the {+atlas-admin-api+}. You can't manage -programmatic access for service accounts through the {+atlas-cli+} or Terraform. - -To learn more, see :atlas:`Service Accounts Overview `. - -API Keys -```````` - -If you don't use service accounts, you can use |api| key-based authentication to securely manage -programmatic access. 
|api| key-based authentication uses |http| Digest authentication to protect requests. -The |api| public key functions as the username, and the corresponding -private key serves as the password. -You should store these keys in a third party secrets management system, -such as |aws| Secrets Manager or {+vault+}. To learn how to securely store these -keys in Vault, see the blog -`Manage MongoDB Atlas Database Secrets in HashiCorp Vault `__. - -To further enhance security and -minimize the risk of unauthorized access: - -- Follow best practices for - rotating |api| keys regularly. To learn how to rotate these keys with - {+vault+}, for example, see `the Hashicorp documentation `__. - -- Use the IP access list for your API keys. To learn more, see - :atlas:`Require an IP Access List for the {+atlas-admin-api+} `. - -To learn more, see :ref:`api-authentication`. - -Deployments -~~~~~~~~~~~ - -To learn our recommendations for deployments, which relate to authentication, see :ref:`arch-center-hierarchy`. - -.. _arch-center-authorization-recs: - -Recommendations for {+service+} Authorization ---------------------------------------------- - -You must implement |service|'s robust Role-Based Access Control (RBAC) -to effectively manage access across all resources. |service| includes -built-in roles that provide different levels of access commonly needed -for managing the |service| control plane. For connecting to |service| -{+clusters+} in the data plane, we recommend using database -fine-grained custom roles to provide granular scoping based on the access -to the data access required for the role to perform its function. -This approach enables you to follow the principle of least privilege. - -Additionally, by integrating |service| with a federated identity provider, -you can use just-in-time provisioning by mapping identity provider groups to |service| roles. -This streamlines access management and ensures secure and organized role -assignments throughout the platform. You can grant access programmatically -based on the provisioning process of your orchestration layer. - -In general, it is a best practice to always restrict access to upper -environments to only programmatic service accounts with scripts that -are tested for security and deterministic outcomes. Human access should -only be allowed in lower environments during development and testing. - -{+atlas-ui+} and {+atlas-admin-api+} (Control Plane) Authorization -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You can assign users, service accounts, and API keys to predefined -roles, specifying the -actions they can perform within |service| organizations, projects, or -both. Use Identity Federation to manage access by linking your identity -provider groups to |service| roles through group-role mappings. - -We recommend that you use a modern federated Identity Provider (IdP) that -provides SSO with SAML such as Azure Entra ID, Okta, or Ping Identity -as this makes the authorization process more secure and supports the -flexibility needed to programmatically assign :abbr:`IdP (Identity Provider)` -groups to |service| roles. You should restrict your company's domain from -preventing users from logging into |service| when they are not authorized -for access, which you can do by following the procedure in -:atlas:`Manage Domain Mapping for Federated Authentication -`. From here, we recommend that -you map your :abbr:`IdP (Identity Provider)` groups to |service| roles -as shown in :atlas:`Role Mapping Process -`. 
- -If you have followed the standard |service| hierarchy of a single billing -organization with a linked organization for each {+BU+} or department, then -you should restrict organization users to the operations or platform team admins. -In contrast, you should assign project roles to the development or product teams -responsible for building applications. Only programmatic access should be -allowed in upper environments. The following recommendations for the most -commonly used roles can serve as a general guideline: - -* The ``Organization Owner`` role should be heavily restricted and not assigned - to a human, as it has the ability to change organization-wide settings and delete - configurations. This role should be assigned to a service account which you use - only to initially set up and configure the organization. Minimize configuration - changes after the initial creation. To avoid account lockouts, you - can create the following items: - - - SAML Organization Owner group with - :ref:`arch-center-just-in-time`. - - API key with the Organization Owner role. Keep it in a - secure place with strong access management for break-glass - emergency scenarios. - -* The ``Organization Member`` role should be for admins on the operations and - platform team that can view settings and configuration for the organization. - -* The ``Organization Project Creator`` role should be a programmatic service - account used to create projects on behalf of new applications for development - and product teams. - -* The ``Organization Billing Admin`` role should be a programmatic service - account used to pull invoices programmatically from the Billing API - and feed them into your FinOps tool. This same service account should have - access to all linked organizations for which it is responsible for reporting usage. - -* The ``Project Owner`` role should be used for governance enforced by the - operations and provisioning team. Assign this role to a programmatic service - account, as it has the ability to create and delete {+clusters+}. For sandbox - environments, you may consider granting a user ``Project Owner`` access to - enable them to quickly provision {+clusters+} for testing code and use cases - without going through the orchestration deployment pipeline. - -* In lower environments, use the ``Project Data Access Admin`` role to - grant access to the development team building the application so they can - access the query and performance metrics of the {+cluster+} during - development and testing. This access allows them to debug data issues - with the Data Explorer. - Don't allow this role in production environments. It has the - ability to view and edit data, including creating and dropping databases, - collections, and indexes on the {+cluster+}, which is useful for rapid - experimentation and development. - If you are not comfortable giving development teams this level of access in the development environment, you - can grant them read-only access to the {+cluster+}\'s data and performance - statistics with the ``Project Data Access Read Only`` role. - - To grant read-only access to the {+cluster+}'s - data in production environments, use the ``Project Observability Viewer`` role. - -To learn more, see :ref:`user-roles`. - -Database Authorization -~~~~~~~~~~~~~~~~~~~~~~ - -Workforce and workload users can be assigned fine-grained database -roles, predefined or custom, with permissions tailored to specific -projects or individual {+clusters+}. 
In staging and production -environments, we recommend using Identity Federation to streamline -access management by linking your Identity Provider (IdP) to |service| -for a more modern and streamlined authentication and authorization flow -for data access. - -By configuring :atlas:`Group Membership ` in your :abbr:`IdP (Identity -Provider)`, you can map groups to database users, simplifying access -control within the :abbr:`IdP (Identity Provider)`. However, for -workload identities, we recommend assigning roles directly using the -``users`` claim instead of ``groups``. In development and test environments, -you can default to the predefined ``readWriteAny`` role to simplify the -development and testing process. When moving the application to higher -environments, you should build a custom role to restrict the access that -the application server has based on the principle of least privilege. - -To learn more, see the following: - -- :ref:`mongodb-users-roles-and-privileges` -- :ref:`mongodb-roles` - -Automation Examples: {+service+} Authentication and Authorization ------------------------------------------------------------------ - -.. include:: /includes/complete-examples.rst - -The following examples configure authentication and custom roles using -|service| :ref:`tools for automation `. - -.. tabs:: - - .. tab:: Dev and Test Environments - :tabid: devtest - - .. tabs:: - - .. tab:: CLI - :tabid: cli - - Run the following command to create a user authentication - with IAM credentials to a specified {+cluster+}. - - .. include:: /includes/examples/cli-example-auth-aws-iam-devtest.rst - - Run the following command to create a temporary user with - SCRAM authentication. - - .. include:: /includes/examples/cli-example-auth-temp-user-devtest.rst - - Run the following command to configure Workforce Identity - Federation with OIDC. - - .. include:: /includes/examples/cli-example-auth-oid-devtest.rst - - .. tab:: Terraform - :tabid: Terraform - - The following examples demonstrate how to configure - authentication and authorization. Before you can create - resources with Terraform, you must: - - - :ref:`Create your paying organization - ` and :ref:`create an API key - ` for the paying organization. - Store your API key as environment variables by running the - following command in the terminal: - - .. code-block:: - - export MONGODB_ATLAS_PUBLIC_KEY="" - export MONGODB_ATLAS_PRIVATE_KEY="" - - - `Install Terraform `__ - - Common Files - ~~~~~~~~~~~~ - - You must create the following files for each example. Place - the files for each example in their own directory. Change - the IDs and names to use your values. Then run the commands - to initialize Terraform, view the Terraform plan, and apply - the changes. - - azure.tf - ```````` - - .. include:: /includes/examples/tf-example-auth-tfazure.rst - - variables.tf - ```````````` - - .. include:: /includes/examples/tf-example-auth-variables-devtest.rst - - terraform.tfvars - ```````````````` - - .. include:: /includes/examples/tf-example-auth-tfvars-devtest.rst - - - Authentication Example - ~~~~~~~~~~~~~~~~~~~~~~ - - Use the following to create an |service| user with username - and password authentication. - - main.tf - ``````` - - .. include:: /includes/examples/tf-example-auth-scram-devtest.rst - - Use the following example to set up an :abbr:`OIDC (OpenID Connect)` - federated identity provider in |service|, for using it with - |azure| and then create an :abbr:`OIDC (OpenID Connect)` - federated authentication user. 
It uses OIDC tokens issued by |azure| Active Directory to allow access. - - main.tf - ``````` - - .. include:: /includes/examples/tf-example-auth-oidc-devtest.rst - - outputs.tf - `````````` - - .. include:: /includes/examples/tf-example-auth-tfoutputs-devtest.rst - - Authorization Example - ~~~~~~~~~~~~~~~~~~~~~ - - Use the following to grant users admin rights on the - {+cluster+} and project member rights for the projects in the - {+cluster+}. - - main.tf - ``````` - - .. include:: /includes/examples/tf-example-auth-grant-roles-devtest.rst - - .. tab:: Staging and Prod Environments - :tabid: stagingprod - - .. tabs:: - - .. tab:: CLI - :tabid: cli - - Run the following command to create a database user from a specific - group in the Identity Provider. You can manage the user - authentication and authorization through the identity provider, - Okta. The command also grants the users in the identity provider - group ``dbAdmin`` and ``readWrite`` privileges on the |service| - {+cluster+}. - - .. include:: /includes/examples/cli-example-auth-okta-stagingprod.rst - - Run the following command to create an :abbr:`OIDC (OpenID - Connect)`-compatible identity providers from your federation - settings. - - .. include:: /includes/examples/cli-example-auth-oidc-stagingprod.rst - - .. tab:: Terraform - :tabid: Terraform - - The following examples demonstrate how to configure - authentication and authorization. Before you can create - resources with Terraform, you must: - - - :ref:`Create your paying organization - ` and :ref:`create an API key - ` for the paying organization. - Store your API key as environment variables by running the - following command in the terminal: - - .. code-block:: - - export MONGODB_ATLAS_PUBLIC_KEY="" - export MONGODB_ATLAS_PRIVATE_KEY="" - - - `Install Terraform `__ - - Common Files - ~~~~~~~~~~~~ - - You must create the following files for each example. Place - the files for each example in their own directory. Change - the IDs and names to use your values. Then run the commands - to initialize Terraform, view the Terraform plan, and apply - the changes. - - azure.tf - ```````` - - .. include:: /includes/examples/tf-example-auth-tfazure.rst - - variables.tf - ```````````` - - .. include:: /includes/examples/tf-example-auth-variables-stagingprod.rst - - terraform.tfvars - ```````````````` - .. include:: /includes/examples/tf-example-auth-tfvars-stagingprod.rst - - outputs.tf - `````````` - - .. include:: /includes/examples/tf-example-auth-tfoutputs-stagingprod.rst - - Configure Federated Settings for Identity Provider - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - - Use the following to set up an :abbr:`OIDC (OpenID Connect)` - federated identity provider in |service|, for using it with - |azure|. It allows access by using OIDC tokens issued by |azure| - Active Directory. - - .. include:: /includes/examples/tf-example-auth-oidc-stagingprod.rst - - Use the following to create an :abbr:`OIDC (OpenID Connect)` - federated authentication user. - - .. include:: /includes/examples/tf-example-auth-create-oidc-user-stagingprod.rst - - Configure Custom Role - ~~~~~~~~~~~~~~~~~~~~~ - - Use the following to create a custom role named ``my_custom_role`` - which allows update, add, and delete operations on any collection - in the database named ``myDb``. - - .. include:: /includes/examples/tf-example-auth-create-custom-role-stagingprod.rst - -For an example of an |service| project with the |service| role assigned -to a specific group, see :ref:`Examples `. 
+
+   Authentication
+   Authorization
diff --git a/source/auth/authentication.txt b/source/auth/authentication.txt
new file mode 100644
index 00000000..76e2b0e6
--- /dev/null
+++ b/source/auth/authentication.txt
@@ -0,0 +1,273 @@
+
+.. _arch-center-authentication-recs:
+
+========================================
+Guidance for {+service+} Authentication
+========================================
+
+.. default-domain:: mongodb
+
+.. facet::
+   :name: genre
+   :values: reference
+
+.. meta::
+   :description: Learn about the different authentication mechanisms that Atlas supports.
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+|service-fullname| supports a variety of authentication methods to
+ensure robust security. |service| requires all users to authenticate to
+access the {+atlas-ui+}, |service| databases, and the {+atlas-admin-api+}.
+
+.. note::
+
+   In this context, a "user" can be a human or an application. We refer to
+   human users as "Workforce Identity" and applications as "Workload Identity".
+
+Two factors determine which authentication types to use:
+
+- The identity type (human or machine)
+- The resource that the identity needs access to. The resource can be one
+  of the following: |service| UI, |service| Database, or |service| APIs.
+
+Recommendations for {+service+} Authentication: Single-Region and Multi-Region
+-------------------------------------------------------------------------------
+
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Single-region deployments have no unique considerations for authentication.
+   See the next section for "All Deployment Paradigm Recommendations".
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   For authentication through :ref:`Workforce Identity Federation `,
+   use a globally-distributed identity provider.
+
+All Deployment Paradigm Recommendations
+---------------------------------------
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
+|service| UI Authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Workforce Users
+
+  Use IP access restrictions, and then either:
+
+  - Use :ref:`Federated Authentication ` with a SAML 2.0
+    identity provider, such as Okta, Microsoft Entra ID, or Ping Identity, or
+  - Use Atlas credentials with
+    :ref:`Multi-factor Authentication (MFA) `.
+
+- Workload Users
+
+  Not applicable. Only workforce (human) users access the {+atlas-ui+}.
+
+|service| Database Authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Workforce Users
+
+  Use :ref:`Workforce Identity Federation `.
+
+  For development and test environments, you can also use
+  :ref:`SCRAM `. Consider :ref:`creating temporary database
+  users ` with just-in-time database access.
+
+- Workload Users
+
+  Use one of the following:
+
+  - :ref:`Workload Identity Federation `
+  - :ref:`AWS-IAM authentication `
+  - :ref:`X.509 ` certificates
+
+  For development and test environments, you can also use
+  :ref:`X.509 ` certificates or :ref:`SCRAM `.
+
+|service| API Authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+   This applies to both Workforce and Workload users.
+
+Use :ref:`service accounts <arch-center-admin-service-accounts>`.
+For development and test environments, you can also use
+:ref:`API keys <arch-center-admin-api-keys>`.
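+
+As a concrete illustration of the service account flow, the following
+sketch exchanges service account credentials for a short-lived access
+token and then calls the {+atlas-admin-api+} with it. The endpoint,
+header, and API version values shown are assumptions based on the
+standard OAuth 2.0 client credentials flow; verify them against the
+Service Accounts documentation before use.
+
+.. code-block:: sh
+
+   # Exchange service account credentials (placeholder values) for a
+   # short-lived access token using the client credentials grant.
+   curl --request POST "https://cloud.mongodb.com/api/oauth/token" \
+        --user "${CLIENT_ID}:${CLIENT_SECRET}" \
+        --header "Content-Type: application/x-www-form-urlencoded" \
+        --data "grant_type=client_credentials"
+
+   # Use the returned bearer token to call the Atlas Administration API.
+   curl --request GET "https://cloud.mongodb.com/api/atlas/v2/groups" \
+        --header "Authorization: Bearer ${ACCESS_TOKEN}" \
+        --header "Accept: application/vnd.atlas.2023-11-15+json"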
+ +Authentication Types +~~~~~~~~~~~~~~~~~~~~~ + +The following sections provide details about the authentication methods used when +accessing the {+atlas-ui+}, |service| Database, or the {+atlas-admin-api+}. + +.. _arch-center-federated_auth: + +Federated Authentication +````````````````````````` + +Federated authentication allows you to manage all authentication to +the {+atlas-ui+} across multiple systems and applications through a +central identity provider, which reduces user management complexity. With +federated authentication, you can enforce security policies such as password +complexity, credential rotation, and :abbr:`MFA (Multi-Factor +Authentication)` within your identity provider's tools. + +For the {+atlas-ui+}, you can use any :abbr:`SAML (Security Assertion +Markup Language) 2.0` compatible identity provider such as Okta, Microsoft +Entra ID, or Ping Identity. + +.. _arch-center-workforce_if: + +Workforce Identity Federation ++++++++++++++++++++++++++++++++ + +Workforce Identity Federation allows you to manage all authentication to +the |service| database through your identity provider. To learn more, +see :ref:`oidc-authentication-workforce`. + +.. _arch-center-workload_if: + +Workload Identity Federation ++++++++++++++++++++++++++++++++ + +Workload Identity Federation enables applications running in cloud +environments like |azure| and Google Cloud to authenticate with +|service| without the need to manage separate database user credentials. +With Workload Identity Federation, you can manage |service| database +users using |azure| Managed Identities, Google Service Accounts, or any +OAuth 2.0-compliant service. These authentication mechanisms simplify +management and enhance security by allowing for passwordless access to +the |service| database. + +.. _arch-center-iam: + +AWS IAM Role Authentication +```````````````````````````` + +You can also authenticate through |aws| |iam| roles. To learn more, +see :ref:`set-up-pwdless-auth`. + +To learn more, see :ref:`oidc-authentication-workload` and +:ref:`atlas-federated-authentication`. + +.. _arch-center-mfa: + +Multi-Factor Authentication +````````````````````````````` + +For any human user that has access to the |service| control plane, we recommend +:abbr:`MFA (Multi-Factor Authentication)` for enhanced security. When +:abbr:`MFA (Multi-Factor Authentication)` is enabled, |service| requires +two forms of identification: + +- The user's credentials +- One of the following recommended factors: + + - Security keys + - Biometrics + - OTP authenticators + - Push notifications + - SMS (not recommended as primary factor) + - Email (not recommended as primary factor) + +.. note:: + + If you are using Federated Auth, you configure and manage MFA in the + :abbr:`IdP (Identity Provider)`. If you are using |service| credentials, + MFA is configured and managed within |service|. MFA is required when + using |service| credentials. + +To learn more, see :ref:`atlas-enable-mfa`. + +.. _arch-center-x509: + +X.509 Client Certificates +``````````````````````````` + +X.509 certificates provide the security of mutual TLS, +making them suitable for staging and production environments, and you +can bring your own certificate authority for use with X.509. +The disadvantage of X.509 is that you must manage certificates and +the security of these certificates on the application side, while +Workload Identity Federation allows for password-less access and easier +application security. + +To learn more, see :manual:`X.509 `. + +.. 
_arch-center-scram: + +SCRAM Password Authentication +``````````````````````````````` + +|service| clusters support SCRAM password authentication for user authentication, +but we recommend SCRAM only for use in development and test environments. + +If you leverage X.509 or SCRAM authentication, we recommend that you use +a third-party secrets manager like +`HashiCorp Vault `__ +or |aws| Secrets Manager to generate and store complex database credentials. + +To learn more, see :manual:`SCRAM `. + +.. _arch-center-admin-service-accounts: + +Service Accounts +````````````````` + +Service accounts use industry-standard OAuth2.0 to securely authenticate +with {+service+} through the {+atlas-admin-api+}. We recommend that you use +service accounts instead of |api| keys when possible because they provide added +security through use short-lived access tokens and required credential rotations. + +You can manage programmatic access for service accounts only by using the +{+atlas-ui+} or the {+atlas-admin-api+}. You can't manage programmatic access +for service accounts through the {+atlas-cli+} or Terraform. + +To learn more, see :atlas:`Service Accounts Overview `. + +.. _arch-center-admin-api-keys: + +API Keys +````````` + +Service accounts are the preferred method of authentication. {+service+} +provides legacy support for |api| key-based authentication for managing +programmatic access. |api| key-based authentication uses |http| Digest +authentication to protect requests. To learn more, see :ref:`api-authentication`. + +.. _arch-center-secrets: + +Secrets Management +``````````````````` + +We recommend using a third-party secrets manager like `HashiCorp Vault +`__ +or |aws| Secrets Manager to generate and store complex database credentials. +A secrets manager can generate database credentials dynamically based on configured +roles for |service| databases. + +To learn more, see the blog :website:`Manage +MongoDB Atlas Database Secrets in HashiCorp Vault +`. + +Deployments +````````````` + +To learn our recommendations for deployments that relate to authentication, +see :ref:`arch-center-hierarchy`. diff --git a/source/auth/authorization.txt b/source/auth/authorization.txt new file mode 100644 index 00000000..dda2f11f --- /dev/null +++ b/source/auth/authorization.txt @@ -0,0 +1,161 @@ +.. _arch-center-authorization-recs: + +====================================== +Guidance for {+service+} Authorization +====================================== + +.. default-domain:: mongodb + +.. facet:: + :name: genre + :values: reference + +.. meta: + :description: Learn about the different authorization mechanisms that Atlas supports. + +.. contents:: On this page + :local: + :backlinks: none + :depth: 2 + :class: onecol + +|service-fullname| supports a variety of authorization methods to +ensure robust access to resources. |service| requires all users to authenticate. +Once the user is authenticated, authorization determines the user's access to +resources. + +When implementing |service| authorization, you must use +:ref:`Role-Based Access Control (RBAC) `. Using an +:ref:`Identity Federation Provider's ` groups with RBAC +simplifies management. + +The following recommendations apply to both workforce (human) and workload +(application) users in all :ref:`deployment paradigms `. + +Recommendations for Authorization: Single-Region and Multi-Region +------------------------------------------------------------------ + +.. 
collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Singe-region deployments have no unique considerations for authorization. + See the next section for "All Deployment Paradigm Recommendations". + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + Multi-region and multi-cloud deployments have no unique considerations + for authorization. See the next section for "All Deployment Paradigm Recommendations". + +All Deployment Paradigm Recommendations +--------------------------------------- + +The following recommendations apply to all :ref:`deployment paradigms +`. + +.. _arch-center-rbac: + +RBAC and Predefined Roles +~~~~~~~~~~~~~~~~~~~~~~~~~ + +|service| uses Role-Based Access Control (RBAC) to simplify management of user +authorization. |service| includes predefined +:atlas:`user roles ` that +provide specific levels of access commonly needed for managing the |service| with +the UI and APIs. To simplify management, you can then map the roles to to +:abbr:`IdP (Identity Provider)` groups. + +For connecting to |service| {+clusters+}, use fine-grained custom database roles +to provide granular scoping based on the access required for the role to perform +its function. This approach enables you to follow the principle of least privilege. + +.. note:: + You should always restrict access by assigning the lowest-needed RBAC roles. + You should also use domain restrictions. + +There are two levels of roles that you can assign: organization-level and +project-level. + +Organization-Level Roles +````````````````````````` + +Organization-level roles are used by service accounts to automate tasks +such as creating new projects, managing IAM, and billing. They may also be +used for platform team members. + +* The ``Organization Owner`` role should be heavily restricted and not assigned + to a human, as it has the ability to change organization-wide settings and delete + configurations. This role should be assigned to a service account which you use + only to initially set up and configure the organization. Minimize configuration + changes after the initial creation. To avoid account lockouts, you + can create the following items: + + - SAML Organization Owner group with :ref:`arch-center-just-in-time`. + - Service account with the Organization Owner role. Keep it in a + secure place with strong access management for break-glass + emergency scenarios. + +* The ``Organization Member`` role should be for admins on the operations and + platform team that can view settings and configuration for the organization. + +* The ``Organization Project Creator`` role should be a programmatic service + account used to create projects on behalf of new applications for development + and product teams. + +* The ``Organization Billing Admin`` role should be a programmatic service + account used to pull invoices programmatically from the Billing API + and feed them into your FinOps tool. This same service account should have + access to all linked organizations for which it is responsible for reporting usage. + +Project-Level Roles +````````````````````` + +Project-level roles are for the development, test, and product teams that are +responsible for application development and maintenance. 
+As with organization-level roles, you should always follow the principle of least
+privilege. For example, the ``Project Owner`` role should only be used for
+governance enforced by the operations and provisioning team. Because a Project
+Owner can create and delete {+clusters+}, you should assign this role to a
+programmatic service account unless you are working in a sandbox environment.
+
+To learn more about project-level roles, see
+:atlas:`Atlas User Roles `.
+
+.. _arch-center-fip:
+
+Federated Identity Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By integrating |service| with a federated identity provider,
+you can use just-in-time provisioning by mapping identity provider groups to
+|service| roles. This streamlines access management and ensures secure and
+organized role assignments throughout the platform. You can grant access
+programmatically based on the provisioning process of your orchestration layer.
+
+You should use a modern federated identity provider (FIP) that
+provides SSO, such as Microsoft Entra ID, Okta, or Ping Identity.
+This makes the authorization process more secure and supports the
+flexibility needed to programmatically assign :abbr:`IdP (Identity Provider)`
+groups to |service| roles. You should restrict access to your company's domain,
+which prevents users from logging into |service| when they are not authorized
+for access.
+
+To learn more about mapping roles to federated identity provider groups, see
+:atlas:`Role Mapping Process `.
+
+.. _arch-center-just-in-time:
+
+Just-in-Time Access
+~~~~~~~~~~~~~~~~~~~~
+
+|service| also supports creating temporary database users
+that automatically expire after a predefined period. You can create a
+user that lasts 6 hours, 1 day, or 1 week.
+
+To learn more, see :ref:`mongodb-users`.
\ No newline at end of file
diff --git a/source/automation.txt b/source/automation.txt
index 63de2705..4c329340 100644
--- a/source/automation.txt
+++ b/source/automation.txt
@@ -21,6 +21,7 @@ Guidance for {+service+} Automated Infrastructure Provisioning
 |service-fullname| provides tools that enable programmatic management of
 the deployment, scaling, and maintenance of your Atlas {+clusters+}.
+
 {+service+} provides the flexibility to implement Infrastructure as Code (IaC)
 using either imperative or declarative programming. For example, developers can
 write imperative scripts that call functions from our {+atlas-go-sdk+} client,
 or manage {+service+} resources using declarative {+iac+} tools like the
 {+ak8so+}, Terraform, {+aws-cf+}, or the {+aws-cdk+}.
@@ -38,12 +39,13 @@ We recommend that our enterprise customers use {+iac+} tools for the following b
 - **Improved Change Management**: {+iac+} tools support reviews and
   standardization of infrastructure, allowing for better change
   management practices and compliance.
 
-Features for Atlas Automation
------------------------------
+
+Features for |service| Automation
+---------------------------------
 
 You can automate the configuration, provisioning, and management of
 |service| building blocks like database users and roles, and |service|
-{+clusters+}, projects, and organizations. You can also automate
+clusters, projects, and organizations. You can also automate
 various configuration and management tasks for {+cluster+} resources,
 including enabling auto-scaling compute and storage, creating and
 updating multi-cloud {+clusters+}, monitoring {+cluster+} performance and health,
@@ -144,6 +146,26 @@ To learn more, see :ref:`ak8so-quick-start-ref`.
 Recommendations for {+service+} Automation
 ------------------------------------------
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Recommendations here
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   Recommendations here
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
 If you already have an existing tool integrated into your deployment
 workflow that you use today, we recommend that you use that tool for
 automation. For example, if your developers and operations team are
@@ -163,4 +185,3 @@ tasks that are ephemeral in nature.
 The {+atlas-cli+} is great for local development as well as integration
 into a suite of tests as part of your CI/CD pipeline for application
 development because it improves response times and reduces costs.
-
diff --git a/source/backups.txt b/source/backups.txt
index 1ad92ec3..65af2edf 100644
--- a/source/backups.txt
+++ b/source/backups.txt
@@ -100,8 +100,36 @@ monthly, each with its own retention period.
 Recommendations for {+service+} Backups
 ---------------------------------------
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Recommendations here
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   If you select a multi-region deployment model, we recommend that you deploy to three or more regions to
+   support automatic failover in the event of a primary region failure. The built-in replication of a multi-region
+   cluster ensures high availability and fault tolerance without the use of backups. To learn more about which multi-region
+   deployment models ensure high availability of data, see :ref:`arch-center-deployment-topologies`.
+
+   We also recommend that multi-region and multi-cloud deployments enable local |service| cloud backups or continuous cloud backups
+   for multiple regions to support recovery to previous versions of data in case of large data loss or corruption events such as
+   accidental deletions of data. This backup configuration meets the same compliance requirements as enabling
+   :atlas:`multi-region snapshot distribution ` for a single-region deployment.
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
 Recommendations for Backup Strategy
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+````````````````````````````````````
 
 You must align your backup strategy with specific Recovery Point
 Objectives (RPO) and Recovery Time Objectives (RTO) to meet business
@@ -111,8 +139,9 @@ maximum acceptable amount of data loss during an incident, while RTO
 defines how quickly your application must recover. Since data varies in
 importance, you must evaluate RPO and RTO for each application
 individually. For example, any mission-critical data will likely have
-different requirements than clickstream analytics. Your requirements
-for RTO, RPO, and the backup retention period will influence the cost
+different requirements than clickstream analytics.
+
+Your requirements for RTO, RPO, and the backup retention period will influence the cost
 and performance considerations of maintaining backups. In development
 and test environments, we recommend that you disable backup to save
 costs. In staging and production environments, ensure that backup is
@@ -139,7 +168,7 @@ your RTO.
 .. _backup-policy-recommendations:
 
 Recommendations for Backup Policy
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`````````````````````````````````
 
 |service| provides predefined backup snapshot schedules including
 frequency of snapshots, and retention period. Retaining backup snapshots
@@ -187,7 +216,7 @@ the following:
 .. _atlas-backup-distribution:
 
 Recommendations for Backup Distribution
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+```````````````````````````````````````
 
 |service| provides options for backup locations. To further enhance
 resilience, we recommend distributing backups to a local region and to an
@@ -203,15 +232,17 @@ we recommend striking a balance between availability and cost. However,
 your critical workloads might require multiple copies of snapshots in
 various locations.
 
+.. _atlas-backup-compliance:
+
 Recommendations for Backup Compliance Policy
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+````````````````````````````````````````````
 
 We recommend enforcing |service|'s `Backup Compliance Policy
 `__ to prevent unauthorized modifications or deletions of backups,
 thereby maintaining data integrity and supporting robust disaster recovery.
 
 Recommendations for PIT Recovery
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+````````````````````````````````
 
 Continuous Cloud Backups enable precise Point In Time (PIT) recovery, which
 minimizes data loss during failures. |service| can quickly recover to the exact
@@ -229,7 +260,7 @@ for recovery, we recommend designing templates that identify the best
 compromise between reasonable recovery options and cost.
 
 Recommendations for Backup Costs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+````````````````````````````````
 
 To optimize |service| backup costs, you must adjust the backup
 frequency and retention policies to align with data criticality, reducing
@@ -243,7 +274,7 @@ By :ref:`selecting regions ` strategically for backup, you
 can avoid cross-region data transfer fees and choose the right
 {+cluster+} disk size based on workload to prevent overspending.
 By implementing these strategies, you can effectively manage costs while
-maintaining secure and reliable backups.
+maintaining secure and reliable backups. 
 
 Automation Examples: {+service+} Backups
 ----------------------------------------
@@ -261,17 +292,17 @@ backup is enabled for the {+cluster+}.
    .. tab:: CLI
       :tabid: cli
 
-      Run the following command take a backup snapshot for the {+cluster+}
+      Run the following command to take a backup snapshot for the {+cluster+}
       named myDemo and retain the snapshot for 7 days:
 
-      .. include:: /includes/examples/cli-example-backup-take-snapshot.rst
+      .. include:: /includes/examples/cli/cli-example-backup-take-snapshot.rst
 
       Enable backup compliance policy for your project with a designated,
       authorized user (``governance@example.org``) who alone can turn
      off this protection after completing a verification process with
      MongoDB support.
 
-      .. include:: /includes/examples/cli-example-backup-compliance-policy-enable.rst
+      .. include:: /includes/examples/cli/cli-example-backup-compliance-policy-enable.rst
 
       Run the following command to create a compliance policy for
       scheduled backup snapshots that enforces the number of times
@@ -279,7 +310,7 @@ backup is enabled for the {+cluster+}.
       the duration for retaining the snapshots, which is set to ``1``
       month.
 
-      .. include:: /includes/examples/cli-example-backup-compliance-policy-schedule.rst
+      .. include:: /includes/examples/cli/cli-example-backup-compliance-policy-schedule.rst
 
    .. tab:: Terraform
      :tabid: Terraform
@@ -311,7 +342,7 @@ backup is enabled for the {+cluster+}.
       variables.tf
       ````````````
 
-      .. include:: /includes/examples/tf-example-backup-variables.rst
+      .. include:: /includes/examples/terraform/tf-example-backup-variables.rst
 
      Configure Backup Schedule for the {+Cluster+}
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -322,7 +353,7 @@ backup is enabled for the {+cluster+}.
       main.tf
       ```````
 
-      .. include:: /includes/examples/tf-example-backup-snapshot-schedule-tier1.rst
+      .. include:: /includes/examples/terraform/tf-example-backup-snapshot-schedule-tier1.rst
 
       Use the following to configure a :ref:`Tier 2 ` backup schedule
       for the {+cluster+}.
@@ -330,7 +361,7 @@ backup is enabled for the {+cluster+}.
       main.tf
       ```````
 
-      .. include:: /includes/examples/tf-example-backup-snapshot-schedule-tier2.rst
+      .. include:: /includes/examples/terraform/tf-example-backup-snapshot-schedule-tier2.rst
 
       Use the following to configure a :ref:`Tier 3 ` backup schedule
      for the {+cluster+}.
@@ -338,7 +369,7 @@ backup is enabled for the {+cluster+}.
       main.tf
       ```````
 
-      .. include:: /includes/examples/tf-example-backup-snapshot-schedule-tier3.rst
+      .. include:: /includes/examples/terraform/tf-example-backup-snapshot-schedule-tier3.rst
 
      Configure Backup and PIT Restore for the {+Cluster+}
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -349,4 +380,4 @@ backup is enabled for the {+cluster+}.
       main.tf
       ```````
 
-      .. include:: /includes/examples/tf-example-backup-snapshot-pit-restore.rst
+      .. include:: /includes/examples/terraform/tf-example-backup-snapshot-pit-restore.rst
diff --git a/source/compliance.txt b/source/compliance.txt
index 7caf0f8a..4b69a468 100644
--- a/source/compliance.txt
+++ b/source/compliance.txt
@@ -61,6 +61,72 @@ standard to ensure compliance.
 To learn more, visit the |atlas-trust|, a comprehensive compliance and
 trust program for MongoDB {+service+}.
 
+.. _arch-center-compliance-atlas-gov:
+
+MongoDB {+service+} for Government
+-----------------------------------
+
+MongoDB |atlas-gov| is the first fully-managed multi-cloud developer data platform
+authorized at FedRAMP® Moderate that:
+
+- Uses a secure, fully-managed, dedicated FedRAMP® authorized environment.
+- Supports the unique requirements and missions of the U.S. Government.
+- Offers a set of features and the scalability needed to modernize legacy applications.
+
+.. _arch-center-resource-policies:
+
+{+arps+}
+-----------------------
+
+To support your compliance requirements, {+arps+} offer
+organization-wide controls for configuring and managing {+service+} resources in
+alignment with your security, compliance, and operational best practices. Organization
+Owners can define rules that govern user actions when creating or modifying resources
+such as clusters, network configurations, and project settings.
+
+{+arps+} support your compliance objectives by enabling you to:
+
+- **Enforce Minimum TLS Version:** Mandate the use of modern |tls| protocols across
+  all {+service+} deployments, enhancing security and mitigating risks associated with
+  older, less secure versions. This ensures adherence to contemporary encryption
+  standards for all data in transit.
+
+- **Customize Default TLS Ciphers:** Select a specific set of allowed |tls| ciphers
+  to optimize security based on operational needs while eliminating
+  vulnerabilities associated with legacy encryption methods. This allows for fine-tuning
+  encryption protocols to meet specific compliance requirements.
+
+- **Restrict VPC Peering Modifications:** Enable secure cross-network communication
+  through established |vpc| peering connections while preventing configuration changes.
+  Current project-level peerings remain active with their existing routing tables
+  and security protocols, allowing customers to view but not alter these one-to-one
+  |vpc| relationships and their associated network control mechanisms.
+
+- **Restrict Private Endpoint Modifications:** Maintain secure service connectivity
+  through existing private endpoint configurations with read-only access. Project-level
+  connections remain functional with their current private IP addressing scheme, while
+  customers can view but not modify these dedicated service connection points within
+  their |vpc|.
+
+- **Control IP Access Lists:** Prevent unauthorized modifications to IP access lists,
+  ensuring consistent and controlled network access to your databases. This strengthens
+  database security by preserving carefully defined network boundaries and protecting
+  against accidental configuration changes.
+
+- **Set Cluster Tier Limits:** Define deployment guardrails by establishing both
+  maximum and minimum cluster size limits that developers must adhere to when
+  provisioning resources. This boundary-setting approach ensures teams can deploy
+  appropriately sized environments within organization-approved parameters, optimizing
+  infrastructure utilization while enforcing consistent resource allocation policies
+  across all project workloads.
+
+- **Set Maintenance Window Requirement:** Enhance platform stability by requiring
+  a maintenance window for all projects. This governance control ensures organizations
+  establish a predictable update period (without dictating a specific timeframe),
+  supporting consistent system maintenance according to operational needs.
+
+To learn more, see :atlas:`Atlas Resource Policies `.
 
 .. _arch-center-compliance-encryption:
 
 Encryption
 ----------
@@ -139,7 +205,7 @@ in a different geographical location. For example, you can store
 European customer data in Europe, while storing U.S. customer data in
 the U.S. This allows you to comply with data sovereignty regulations and
 reduces latency for users accessing data from their respective regions. To learn
-more, see :atlas:`Global Clusters `.
+more, see :atlas:`Global Clusters `.
 
 .. _arch-center-compliance-backup-distribution:
diff --git a/source/cost-saving-config.txt b/source/cost-saving-config.txt
index a3c281ce..66d42fc0 100644
--- a/source/cost-saving-config.txt
+++ b/source/cost-saving-config.txt
@@ -23,4 +23,24 @@ To better understand and streamline your spending, especially as your
 usage expands, |service-fullname| offers tools to manage and control
 your organization's database costs.
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Recommendations here
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   Recommendations here
+
+All Deployment Paradigm Recommendations
+---------------------------------------
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
 .. include:: /includes/billing-optimizations.rst
diff --git a/source/data-encryption.txt b/source/data-encryption.txt
index fbf5786c..1bbdab4c 100644
--- a/source/data-encryption.txt
+++ b/source/data-encryption.txt
@@ -52,9 +52,9 @@ This process cannot be disabled.
 By default, Encryption at Rest is volume-level encryption.
 
-Additionally, you can enable database-level encryption by bringing your own
-customer-managed key (CMK) with a key management service (KMS)
-such as AWS KMS, Google Cloud KMS, or Azure Key Vault. This feature
+Additionally, you can enable database-level encryption by bringing your own key (BYOK).
+This is the customer-managed key (CMK) that you add with a key management service (KMS),
+such as AWS KMS, Google Cloud KMS, or Azure Key Vault. The BYOK feature
 provides file-level encryption and is equivalent to Transparent Data Encryption (TDE),
 meeting enterprise TDE requirements. Encryption with customer key management adds
 another layer of security for additional confidentiality and data segmentation.
@@ -139,13 +139,60 @@ client as readable plaintext.
 Recommendations for {+service+} Data Encryption
 -----------------------------------------------
 
-Consider the following security recommendations when
-provisioning your {+clusters+}.
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Single-region deployments have no unique considerations for data encryption.
+   See the next section for "All Deployment Paradigm Recommendations".
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   In multi-region and multi-cloud deployments, you can enable database-level
+   encryption by bringing your own key (BYOK).
+
+   Some KMS providers support multi-region capabilities.
+   For example, GCP KMS supports multi-region deployment and automatic key
+   replication. However, in the event of a regional outage affecting the
+   configured KMS endpoint, the |service| integration with a KMS provider
+   **doesn't provide automatic failover** of the connection or access
+   mechanism to the KMS across regions.
+
+   |service| performs periodic validation of your KMS configuration.
+   If your KMS provider credentials become invalid or the encryption key is
+   deleted or disabled, |service| shuts down the ``mongod`` and ``mongos``
+   processes on the next scheduled validation check and notifies you via email,
+   reflecting the status in the {+atlas-ui+}.
+
+   However, if |service| cannot connect to your key management provider,
+   it does not shut down your processes.
+   Therefore, if access to the KMS
+   is disrupted due to a regional issue, the cluster might continue running
+   until you manually restart it or until a later validation check fails.
+
+   To recover access to the encryption key in a multi-region outage scenario,
+   you must manually update the configuration in |service| to point to a
+   different, accessible KMS instance.
+
+   In summary, the |service| integration with BYOK providers for Encryption at Rest
+   does not fail over across regions automatically; you must update the
+   configuration manually.
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
 
 .. _arch-center-cmk:
 
 Encryption with Customer Key Management
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+```````````````````````````````````````
 
 .. include:: /includes/encryption-with-cmk-recommendations.rst
@@ -160,7 +207,7 @@ when provisioning a new |service| organization, project,
 and {+cluster+}, see :ref:`arch-center-create-hierarchy-example`.
 
 Data Classification
-~~~~~~~~~~~~~~~~~~~
+```````````````````
 
 During the provisioning process, we also recommend
 assessing the sensitivity of certain fields in your data and classifying
@@ -245,7 +292,7 @@ create the following resources. Change the IDs and names to use your values:
       Alternatively, to simplify the configuration process, you can use the
       `encryption at rest Terraform module `__.
 
-      .. include:: /includes/examples/tf-example-aws-kms.rst
+      .. include:: /includes/examples/terraform/tf-example-aws-kms.rst
 
    .. tab:: Azure
      :tabid: azure
 
      For a complete configuration example, see
      :github:`Atlas Terraform Provider Example `.
 
-      .. include:: /includes/examples/tf-example-azure-key-vault.rst
+      .. include:: /includes/examples/terraform/tf-example-azure-key-vault.rst
 
   .. tab:: GCP
      :tabid: gcp
 
-      .. include:: /includes/examples/tf-example-gcp-kms.rst
+      .. include:: /includes/examples/terraform/tf-example-gcp-kms.rst
 
       For more configuration options and info about this example,
       see `Terraform documentation
diff --git a/source/deployment-paradigms.txt b/source/deployment-paradigms.txt
new file mode 100644
index 00000000..2aecc93b
--- /dev/null
+++ b/source/deployment-paradigms.txt
@@ -0,0 +1,198 @@
+.. _arch-center-paradigms:
+
+================================
+{+service+} Deployment Paradigms
+================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+When deploying your database, you must choose between deploying to a single region or multiple regions. The following diagram
+shows these options, which are explained further below:
+
+.. figure:: /includes/images/deployment-types.png
+   :figwidth: 750px
+   :alt: An image showing the different deployment options.
+
+Single-Region Deployments
+-------------------------
+
+A :ref:`single-region deployment ` is the simplest deployment paradigm.
+In a single-region deployment, your data is stored in one of a provider's regions (such as {+aws+}'s ``us-west-2``
+or {+gcp+}'s ``asia-northeast3``). Because {+service+} always provides a
+minimum of zone-level availability, your cluster nodes are spread across the
+availability zones within a single region.
+Therefore, if a single zone fails, your
+data is still available in the other zones.
+
+With the simplicity and lower cost of a single-region deployment
+comes the risk of lower availability and potentially higher latency, depending on the
+distribution of your application's users.
+
+Multi-Region Deployments
+------------------------
+
+A :ref:`multi-region deployment ` is a more complex deployment paradigm that provides higher availability
+and low latency across a larger geographic range than a single-region deployment. There are several types of multi-region deployments:
+
+- **Multi-region in one geography**:
+  Deploys to multiple regions hosted by a single cloud provider
+  within a single geography, which is a large area like a country or continent.
+  This ensures availability if any given region fails.
+
+  For example, you deploy clusters to the {+aws+} regions ``us-west-1`` and ``us-east-1``,
+  both of which are in the United States.
+
+- **Multi-region in multiple geographies**:
+  Deploys to one or more regions in two or more geographies.
+  This ensures availability if any given region fails, or
+  if an entire geographic area is unavailable.
+
+  For example, you deploy clusters in the {+aws+} regions ``us-east-1`` and ``us-east-2``,
+  both of which are in the United States, and a third cluster in ``eu-west-2``,
+  which is in Europe.
+
+- **Multi-cloud**:
+  Deploys to one or more regions hosted by multiple cloud providers.
+  This provides the highest level of availability, ensuring
+  your data is available if any given region fails or if an entire cloud provider fails.
+
+  For example, you deploy clusters in the {+aws+} region ``us-west-1`` and the {+gcp+}
+  region ``us-east4``.
+
+- **Global**:
+  Deploys a global cluster with up to nine single or multi-region zones.
+  Consider this option for only the most complex
+  situations; for example, where you need global aggregation of user data, or
+  where legal terms dictate specific hosting requirements.
+
+.. _arch-center-choosing-paradigms:
+
+Choosing a Deployment Paradigm
+------------------------------
+
+To figure out which deployment pattern is right for you, break down your
+applications by how critical they are to your core business. The
+more important the application (in other words, the more impact an
+outage has on your business), the more you should consider an architecture that
+automatically handles any outage event.
+
+The following table provides a comparison of deployment paradigms to help you
+determine the best fit for your needs:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 20 20 20 20 20
+
+   * - Tier
+     - Description
+     - Maximum Acceptable Downtime
+     - Recommended Architecture
+     - Relative Cost
+
+   * - **Tier 1**
+     - Highest criticality applications. Requires fully-automated failover even
+       in the event of regional outages.
+     - 0
+     - :ref:`3 Regions with 5 or more nodes `
+     - $$$
+
+   * - **Tier 2**
+     - Lower criticality applications. Can experience some downtime or
+       maintenance windows without significant revenue impact.
+     - > 1 hour and < 8 hours
+     - | :ref:`2 regions with 3 nodes and backups ` or
+       | :ref:`3 regions with 1 node in each `
+     - $$
+
+   * - **Tier 3**
+     - Lowest criticality applications. Can be down for 24 hours without significant
+       revenue impact.
+     - > 8 hours
+     - :ref:`3 nodes in a single region `
+     - $+
+
+   * - **Non-Production**
+     - Noncritical applications.
+       Environments that are not directly responsible for
+       revenue and are not customer-facing. Typically development and testing
+       environments.
+     - n/a
+     - :ref:`Single node, single region `
+     - $0 and up
+
+.. note::
+
+   The cost of each deployment type depends on several factors, including the
+   provider(s) you select, the number of regions you need, the amount of storage, and
+   the processing power of the servers. For the latest pricing information,
+   refer to the `MongoDB Pricing `_ page.
+
+Use Cases
+~~~~~~~~~
+
+When choosing your deployment paradigm, start by identifying the smallest number of regions that you can deploy to serve
+the widest geographical distribution of your users. Then, consider adding regions or
+cloud providers according to your requirements for availability, performance, and
+:ref:`data sovereignty `.
+
+Consider the following use cases to help decide which deployment paradigm works for the geographical distribution of your application's users:
+
+Users Mostly in One Geography
+`````````````````````````````
+
+If the majority of your application's users are located in one geography,
+we recommend that you deploy to one or more regions within the same geography.
+While a single-region deployment can protect against an outage in a single availability zone,
+a multi-region deployment covers a larger geographical area and ensures availability during both zonal and regional outages.
+For even higher availability, you can deploy across multiple regions *and* multiple cloud providers to protect against cloud-provider outages.
+These options all support low latency and simplify compliance with data sovereignty requirements
+because all user data is accessed and stored within the same geography.
+
+To learn more about these deployment paradigms, see the following paradigm pages:
+
+- :ref:`Single-Region Deployment `
+- :ref:`Multi-Region Deployment in One Geography `
+- :ref:`Multi-Region and Multi-Cloud Deployment in One Geography `
+
+Users Distributed Across Multiple Geographies
+`````````````````````````````````````````````
+
+If your application's users are distributed across multiple geographies, such as between the US and Europe,
+we recommend that you deploy to one or more regions in each geography.
+Deploying to one region in each geography where you serve customers provides low latency and
+availability in case of a geographical outage. You can also meet data sovereignty requirements
+by partitioning your data so that user data from each geography is hosted within that geography.
+
+To ensure high availability in case of regional outages without increasing latency or violating data sovereignty requirements,
+you can also deploy to multiple regions in each geography. You can achieve the highest availability for a multi-region deployment
+by deploying clusters to multiple regions, in multiple geographies, hosted by multiple cloud providers.
+
+To learn more about these deployment paradigms, see the following paradigm pages:
+
+- :ref:`Multi-Region Deployment in Multiple Geographies `
+- :ref:`Multi-Region and Multi-Cloud Deployment in Multiple Geographies `
+
+Users Distributed Worldwide
+```````````````````````````
+
+If you are deploying an application to a worldwide audience, we recommend that you deploy a multi-region deployment
+across multiple geographies before considering a global deployment. Global |service| deployments are the most complex multi-region deployment paradigms,
+and therefore require very careful planning.
+In most cases, a multi-region deployment in multiple geographies can fulfill your requirements
+for high availability, low latency, and data sovereignty.
+
+To learn more about these deployment paradigms, see the following pages:
+
+- :ref:`Multi-Region Deployment in Multiple Geographies `
+- :ref:`Global Deployment `
+
+.. toctree::
+   :titlesonly:
+
+   Single-Region 
+   Multi-Region 
+   Multi-Cloud 
+   Hybrid 
diff --git a/source/deployment-paradigms/hybrid.txt b/source/deployment-paradigms/hybrid.txt
new file mode 100644
index 00000000..de46f528
--- /dev/null
+++ b/source/deployment-paradigms/hybrid.txt
@@ -0,0 +1,86 @@
+.. _arch-center-paradigms-hybrid:
+
+==========================
+Hybrid Deployment Paradigm
+==========================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Hybrid {+service+} deployments are a combination of cloud deployments and
+on-premises deployments. The cloud deployments can be
+:ref:`single-region ` or
+:ref:`multi-region ` deployments. On-premises
+deployments can be either local deployments that you create with the
+{+atlas-cli+}, or self-managed deployments in
+:manual:`MongoDB Enterprise ` or
+:manual:`MongoDB Community `.
+
+Hybrid deployments allow you to keep some data on premises for
+development or compliance, while still leveraging
+the following benefits of {+service+} cloud deployments:
+
+- Scalability and cost savings
+- High availability and low latency on a regional or global scale
+- Features to help meet cloud compliance requirements for
+  :ref:`data sovereignty `
+
+The following diagram shows one example of a hybrid deployment, in which the
+production application uses a cluster hosted in a single AWS region, and
+the development app points to an on-premises cluster:
+
+.. figure:: /includes/images/hybrid.svg
+   :figwidth: 750px
+   :alt: An image showing a single-region, three-node cloud deployment and a three-node on-premises deployment.
+
+Use Case for Hybrid Deployments
+-------------------------------
+
+A hybrid deployment may be best for you if you have
+the following requirements:
+
+- You want to deploy your production workloads in the cloud, which
+  allows you to scale your resources based on the needs of your
+  application and pay only for the resources you use.
+- You want to develop on local {+clusters+}, which reduces
+  costs when compared to cloud-hosted development.
+
+If this is your use case, we recommend that you deploy locally
+by using the :atlascli:`{+atlas-cli+} `. You
+can deploy your cloud databases as single-region, multi-region, or
+multi-cloud deployments.
+
+Use the following resources to select your cloud deployment type based
+on your cloud needs:
+
+- :ref:`arch-center-paradigms-single`
+- :ref:`arch-center-paradigms-multi-region`
+- :ref:`arch-center-paradigms-multi-cloud`
+
+Recommendations for Hybrid Deployments
+--------------------------------------
+
+The {+atlas-arch-center+} does not currently cover recommendations
+specific to hybrid deployments. Contact {+ps+} team to create a custom
+landing zone for your {+service+} hybrid deployments. For recommendations
+that apply to the cloud portion of a hybrid deployment, see:
+
+- :ref:`arch-center-single-region-rec-summary`
+- :ref:`arch-center-multi-region-rec-summary`
+
+Creating Hybrid Deployments
+---------------------------
+
+To learn how to configure cloud deployments and learn about the
+different types of nodes you can add, see :atlas:`Create a Cluster
+` in the {+service+} documentation.
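+
+For the local development side of a hybrid deployment, you can create an
+on-premises {+cluster+} with a single {+atlas-cli+} command. The following is
+a minimal sketch; the deployment name and port are hypothetical, and flag
+support can vary by {+atlas-cli+} version:
+
+.. code-block:: sh
+
+   # Create a local Atlas deployment (a single-node replica set) for development
+   atlas deployments setup myLocalDev --type local --port 27017
+
+   # Confirm that the local deployment is running
+   atlas deployments list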
+
+To learn how to configure an on-premises deployment, see
+:atlascli:`Create a Local {+service+} Deployment `
+with the {+atlas-cli+} and :manual:`Install MongoDB ` for
+self-managed deployments in MongoDB Enterprise or MongoDB Community.
diff --git a/source/deployment-paradigms/latency-strategies.txt b/source/deployment-paradigms/latency-strategies.txt
new file mode 100644
index 00000000..e3329f5f
--- /dev/null
+++ b/source/deployment-paradigms/latency-strategies.txt
@@ -0,0 +1,101 @@
+.. _arch-center-latency-strategies:
+
+=========================================
+Multi-Region Latency Reduction Strategies
+=========================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Multi-region {+service+} deployments can enhance performance by
+reducing latency. The following sections describe the factors that contribute
+to latency and the configuration choices you can make to reduce it.
+
+Physical Distance
+-----------------
+
+Physical distance is the primary cause of latency. The distance between
+users and your application, between your application and your data, and between
+{+cluster+} nodes all impact system latency.
+
+To reduce latency for read operations, it's crucial to place both your
+application and data geographically closer to users.
+
+Replication Configuration
+-------------------------
+
+:manual:`Replication ` is the copying of data from the primary
+node to secondary nodes. How you configure replication contributes to latency.
+Consider the following factors:
+
+- **Write Concern Levels:** There is a trade-off between write durability and
+  write latency. The write concern level you configure (for example,
+  ``w: "majority"``) determines how many nodes, potentially across multiple
+  data centers, must acknowledge a write, increasing latency for more
+  durable writes.
+
+- **Order of Regions:** The order of regions in your configuration can determine
+  the priority for the primary node location, which can impact write latency
+  depending on where your users are located. For example, if most of your users
+  are in Asia, your primary node should also be in a region in Asia.
+
+- **Mirrored Reads:** Mirrored reads reduce the impact of primary elections
+  following an outage by pre-warming the caches on the secondary nodes.
+
+- **Read Preference:** By default, applications send read operations to the
+  primary node. However, you can configure the read preference to send read
+  operations to secondary nodes. By doing so, you ensure reads go to the
+  geographically closest node.
+
+  .. important::
+
+     Keep in mind that there is the possibility of a secondary node returning
+     stale data due to replication lag.
+
+- **Data Distribution:** Distributing data across regions by using replica sets
+  or sharded clusters is an effective approach when your data is geographically
+  oriented. For example, if you have data that is only read from the EU, and other
+  data that is only read in North America, you can create shards that distribute
+  that data appropriately.
+
+Network Configuration
+---------------------
+
+You can further reduce latency using the following network connectivity options:
+
+- **Private Endpoints:**
+  :atlas:`Private endpoints `
+  establish direct and secure connections between your application's virtual
+  network and your |service| cluster, potentially reducing network hops and
+  improving latency.
+
+- **VPC Peering:**
+  Configure :atlas:`VPC peering `
+  so that applications in peered networks can connect to the regions in
+  your replica sets.
+
+Data Modeling and Query Optimization
+------------------------------------
+
+The speed at which your application accesses data contributes to latency.
+Good :manual:`data modeling ` and
+:atlas:`query optimization ` can
+improve data access speeds. For example, you can:
+
+- **Reduce Document Size:** Consider shortening field names and value lengths to
+  decrease the amount of data transferred over the network.
+
+- **Optimize Query Patterns:** Use indexes effectively to minimize the amount
+  of data that needs to be read across regions.
+
+Monitoring and Testing Latency
+------------------------------
+
+{+service+} provides the
+:atlas:`Real-Time Performance Panel (RTPP) ` to
+observe latency metrics for different regions. You can also implement
+application-level monitoring to track end-to-end latency to and from the
+application. Before final production deployment, we suggest conducting performance
+testing under various multi-region scenarios to identify and address latency
+bottlenecks.
diff --git a/source/deployment-paradigms/multi-cloud.txt b/source/deployment-paradigms/multi-cloud.txt
new file mode 100644
index 00000000..863f041a
--- /dev/null
+++ b/source/deployment-paradigms/multi-cloud.txt
@@ -0,0 +1,77 @@
+.. _arch-center-paradigms-multi-cloud:
+
+===============================
+Multi-Cloud Deployment Paradigm
+===============================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Multi-cloud {+service+} deployments are a special case of multi-region deployment
+in which you set up {+cluster+} nodes across
+multiple geographic regions *and* multiple cloud providers. Multi-cloud
+deployments enhance protection in the case of a regional outage or
+cloud provider outage by automatically rerouting traffic to a
+node in another region for continuous availability and a
+smooth user experience. Multi-cloud deployments can also protect
+against vendor lock-in, enhance performance, and help meet
+compliance requirements for :ref:`data sovereignty
+`.
+
+{+service+} supports multi-cloud deployment across any combination of
+|aws|, |azure|, and {+gcp+}.
+
+To learn how to configure multi-cloud deployments and learn about the
+different types of nodes you can add, see
+:atlas:`Configure High Availability and Workload Isolation
+` in the {+service+}
+documentation.
+
+Use Cases for Multi-Cloud Deployments
+-------------------------------------
+
+The following image shows a multi-region, multi-cloud {+service+}
+deployment for regions that support availability zones. Note that this differs
+from a multi-region deployment only in that multiple cloud providers are
+used.
+
+.. figure:: /includes/images/multi-cloud.png
+   :figwidth: 750px
+   :alt: An image showing a five-node deployment spread across three regions and two cloud providers. Each region contains one zone per node.
+
+A multi-region, multi-cloud deployment may be best for you if you have
+the following requirements:
+
+- You have critical operations that require the highest level of availability.
+- You want to move your data to different cloud providers in response to
+  provider changes (such as pricing models).
+- You have governing regulations that dictate which regions your data can be
+  stored in, and by which providers.
+  For example, for an application that
+  requires data storage in Europe, you can deploy a multi-region,
+  multi-cloud deployment to three regions within the EU (such as
+  ``eu-west-1`` and ``eu-west-2`` for |aws|, and ``uksouth`` for |azure|).
+  This ensures data sovereignty since all regions are within the EU, while
+  offering high availability if there's a regional outage or a cloud
+  provider outage that affects the primary node.
+
+.. _arch-center-multi-cloud-rec-summary:
+
+Considerations and Recommendations
+----------------------------------
+
+Multi-cloud deployments have the same
+:ref:`considerations ` and
+:ref:`recommendations ` as other
+multi-region deployments.
+
+.. important::
+
+   There may be other considerations
+   for multi-cloud deployments that are not covered in the
+   {+atlas-arch-center+}, such as cloud-provider-aware application-side
+   settings. Contact {+ps+} team to create a strategy for your
+   {+service+} multi-cloud deployments that covers these considerations.
diff --git a/source/deployment-paradigms/multi-region.txt b/source/deployment-paradigms/multi-region.txt
new file mode 100644
index 00000000..3db494a4
--- /dev/null
+++ b/source/deployment-paradigms/multi-region.txt
@@ -0,0 +1,253 @@
+.. _arch-center-paradigms-multi-region:
+
+================================
+Multi-Region Deployment Paradigm
+================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Multi-region {+service+} deployments set up {+cluster+} nodes across
+multiple regions (as defined by the cloud providers). Multi-region
+deployments enhance protection in the case of a :ref:`regional outage
+` by automatically rerouting traffic to a
+node in another region for continuous availability and a
+smooth user experience. Multi-region deployments can also enhance
+performance and can help meet compliance requirements for :ref:`data sovereignty
+`, such as the EU's General Data
+Protection Regulation (GDPR) law.
+
+A multi-region deployment might have multiple regions within the same geography
+(a large area like a continent or country), a single region in each of several
+geographies, or multiple regions in multiple geographies.
+
+Multi-region deployments can exist with a single cloud provider or
+multiple cloud providers. To learn about multi-cloud deployments, see
+:ref:`arch-center-paradigms-multi-cloud`.
+
+To learn how to configure multi-region deployments and learn about the
+different types of nodes you can add, see
+:atlas:`Configure High Availability and Workload Isolation
+` in the {+service+}
+documentation.
+
+.. note::
+
+   Multi-region deployments are available only for ``M10`` dedicated
+   {+clusters+} and larger.
+
+Multi-Region Deployment Strategies
+----------------------------------
+
+Consider the 3 deployments in the following image:
+
+.. figure:: /includes/images/multi-region-types.png
+   :figwidth: 750px
+   :alt: An image showing three types of multi-region deployments
+
+The first example shows a deployment to multiple regions in the same geography.
+This is a good solution if you have an application that has users primarily
+located in a single geography (in this case, the U.S.). You create a multi-region
+deployment in three regions within the U.S. This ensures low latency,
+since all regions are within the same geography, while also offering high
+availability if there's a regional outage on one of the nodes
+(for example, if ``us-east-1`` goes down).
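+
+You can provision a deployment like this first example by passing the
+{+atlas-cli+} a cluster configuration file. The following is a sketch only;
+the file name, cluster name, and regions are hypothetical, and the exact
+JSON schema depends on your {+atlas-admin-api+} version:
+
+.. code-block:: sh
+
+   # multi-region-cluster.json describes one replica set spread over three
+   # U.S. regions; the highest-priority region hosts the primary by default.
+   cat > multi-region-cluster.json <<'EOF'
+   {
+     "name": "multiRegionDemo",
+     "clusterType": "REPLICASET",
+     "replicationSpecs": [
+       {
+         "regionConfigs": [
+           { "providerName": "AWS", "regionName": "US_EAST_1", "priority": 7,
+             "electableSpecs": { "instanceSize": "M10", "nodeCount": 1 } },
+           { "providerName": "AWS", "regionName": "US_EAST_2", "priority": 6,
+             "electableSpecs": { "instanceSize": "M10", "nodeCount": 1 } },
+           { "providerName": "AWS", "regionName": "US_WEST_2", "priority": 5,
+             "electableSpecs": { "instanceSize": "M10", "nodeCount": 1 } }
+         ]
+       }
+     ]
+   }
+   EOF
+
+   # Create the multi-region cluster from the configuration file
+   atlas clusters create multiRegionDemo --file multi-region-cluster.json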
+
+The second example shows a deployment to a single region in each of multiple
+geographies. This is a good option if your application requires
+high availability for users in multiple geographies but you are willing to trade
+slightly higher latency for lower costs. You create a multi-region
+deployment with a region located in each of the U.S., Europe, and Asia. This is
+also a good strategy for complying with local regulations like GDPR.
+
+The most complex example of a multi-region deployment has multiple regions in
+multiple geographies, which ensures the highest level of availability with a
+single provider. If your application requires the very highest level of
+availability and lowest latency, consider a :ref:`arch-center-paradigms-multi-cloud`.
+
+Within these broad deployment strategies, there are additional configurations
+to meet your more specific needs. The following use cases outline specific
+examples that might meet your own needs.
+
+.. _arch-center-multi-region-tier-1:
+
+5-Node, 3-Region Architecture
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you have a mission-critical application that requires both high availability
+and low latency, you need to deploy your MongoDB nodes and your application in
+multiple regions. Depending on your users' locations, you might need to deploy
+to regions in several different geographies.
+
+To achieve this level of availability and latency, we recommend an architecture
+that consists of a minimum of 5 nodes spread over at least 3 regions. This
+architecture provides high availability in at least 2 regions. If a node goes
+down, traffic can still be served from the same region, preserving the expected
+latency in the primary region.
+
+.. figure:: /includes/images/multi-region-5+3.png
+   :figwidth: 750px
+   :alt: An image showing a five-node deployment spread across three regions.
+
+Notes and Considerations
+````````````````````````
+
+- Deploy your application servers in each region where MongoDB nodes are
+  deployed. This gives you the ability to route read requests to the in-region
+  node, which reduces the response time to your users, and offloads requests
+  from the primary node. Write requests are always directed to the primary.
+
+- Use :atlas:`private endpoints `
+  to connect to the cluster, and
+  :atlas:`VPC peering `
+  between your application server VPCs. VPC peering ensures that if a network connection
+  is broken or {+service+} in that region goes down, the application tier can
+  still route to the primary node, first over the VPC peering, and then over the
+  private endpoint.
+
+- This architecture has the highest cost due to network traffic between regions
+  and having 5 or more data-bearing nodes.
+
+- This architecture provides the highest resiliency. There are no interruptions
+  during {+service+} operations (like an automated upgrade), and your application
+  can sustain a full regional failure with no interruption and no manual
+  intervention required.
+
+.. _arch-center-multi-region-5+2:
+
+5-Node, 2-Region Architecture
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you have an application that can experience limited downtime or maintenance
+windows without significant revenue impact, you can create an architecture that
+provides that level of availability by using 5 nodes in 2 regions. You have 3
+electable nodes spread across the 2 regions, and also have 2 read-only nodes in
+the secondary region. The electable nodes are in the **majority region**, while
+the region with fewer electable nodes is the **minority region**.
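+
+To serve reads from the 2 read-only nodes in the secondary region, applications
+set a read preference in the connection string. The following is a sketch with
+a hypothetical hostname; ``readPreference=nearest`` routes reads to the
+lowest-latency node, and ``w=majority`` makes writes wait for acknowledgment
+from a majority of voting nodes (2 of the 3 electable nodes here):
+
+.. code-block:: sh
+
+   # Reads go to the closest node; writes are majority-acknowledged
+   mongosh "mongodb+srv://cluster0.ab1cd.mongodb.net/myDatabase?readPreference=nearest&w=majority"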
+
+This is also an appropriate architecture if you need high availability and low
+latency in a geography that only has 2 regions.
+
+This is a good option for:
+
+- Customers with only 2 approved regions.
+- Meeting governance compliance about data storage.
+- Applications where read operations need high uptime, but write operations can
+  experience some outage.
+
+.. figure:: /includes/images/multi-region-5+2.png
+   :figwidth: 750px
+   :alt: An image showing a five-node deployment spread across two regions.
+
+Notes and Considerations
+````````````````````````
+
+This architecture provides increased protection
+against data loss, even in a full regional outage, because the system remains
+available in read-only mode in the secondary region if the primary region
+is lost.
+
+While this architecture is less expensive than the 5-Node, 3-Region solution
+provided above, the reduced cost comes with some caveats:
+
+- If the majority region is lost, the minority region is not a fully-functional
+  cluster; it does not have a primary and can accept reads but not writes.
+
+  To turn it into a functional cluster again, an administrator needs to
+  reconfigure the 2 read-only nodes to electable nodes. However, there is a
+  possibility of data loss when the unavailable members are online again. If
+  your MongoDB process didn't replicate the write operations to the node that
+  becomes the new primary, then the recovered replica set rolls back these
+  writes. To learn more, refer to
+  :atlas:`Reconfigure a Replica Set During a Regional Outage `.
+
+  .. note::
+
+     If the *minority* region is lost, no action is required, since the majority
+     region remains a fully functional cluster.
+
+- In sharded clusters, if your MongoDB process didn't replicate chunk migrations,
+  the data inconsistency might cause orphaned chunks.
+
+Lower-Cost Variation
+````````````````````
+
+For further cost savings, you can design this architecture without the 2 read-only
+nodes. In addition to the caveats listed above, data size has a significant impact
+on your decision since the data needs to be synchronized to secondaries
+whenever you add new nodes to the cluster. For example, 1 TB of data averages 1
+hour of recovery and synchronization time.
+
+.. _arch-center-multi-region-3+3:
+
+3-Node, 3-Region Architecture
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you have an application that can experience some downtime, and
+the cost of deployment is more of a factor than availability or latency, you
+can create a less expensive architecture by using 1 node in each of
+3 regions. You have an electable node in each of the regions, which means if the
+primary node is unavailable, there is latency as another node is elected primary
+and writes are routed to that region. However, you still have availability if
+any region is unavailable.
+
+.. figure:: /includes/images/multi-region-3+3.png
+   :figwidth: 750px
+   :alt: An image showing a three-node deployment spread across three regions.
+
+.. _arch-center-global-deployments:
+
+Global Deployments
+------------------
+
+Global {+service+} deployments are the most complex multi-region deployment
+paradigm, and therefore require very careful planning. In almost all cases,
+a :ref:`arch-center-paradigms-multi-region` (or its subset, a
+:ref:`arch-center-paradigms-multi-cloud`) can fulfill your needs.
+
+You might consider a global deployment strategy if:
+
+- You need a single global connection string.
+- You need to perform global aggregations across all users.
+- You need the ability to read/write for all users from everywhere
+  in one logical cluster, while also having regional reads/writes.
+
+.. note::
+
+   The complexity of global deployments results in many opinions on best
+   practices. The {+atlas-arch-center+} does not currently cover recommendations
+   specific to global deployments. Contact {+ps+} team to discuss your
+   specific requirements and to design a {+service+} global deployment
+   strategy.
+
+Data Sovereignty and High Availability Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For compliance with data residency laws, data can be partitioned to
+reside in specific regions, ensuring adherence to local regulations.
+However, deploying to a single region sacrifices high availability if
+there is a regional outage.
+
+You can configure a multi-region deployment for both high availability
+and data sovereignty. For example, for an application deployed with |aws|
+that requires data storage in Europe, you can deploy a multi-region deployment
+to three regions within the EU (such as ``eu-west-1``, ``eu-west-2``,
+and ``eu-west-3``). This ensures data sovereignty since all regions are within
+the EU, while offering high availability if there's a regional outage that
+affects one of the nodes.
+
+.. _arch-center-multi-region-rec-summary:
+
+Recommendations for Multi-Region Deployments
+---------------------------------------------
+
+.. include:: /includes/rec-list.rst
+
+.. toctree::
+   :titlesonly:
+
+   Latency Reduction Strategies 
diff --git a/source/deployment-paradigms/single-region.txt b/source/deployment-paradigms/single-region.txt
new file mode 100644
index 00000000..50ee594a
--- /dev/null
+++ b/source/deployment-paradigms/single-region.txt
@@ -0,0 +1,86 @@
+.. _arch-center-paradigms-single:
+
+=================================
+Single-Region Deployment Paradigm
+=================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Single-region {+service+} deployments set up {+cluster+} nodes within
+a single region of one cloud provider. Single-region {+service+}
+deployments are supported on all {+cluster+} tiers. They provide
+the least expensive option for applications and are a good choice when
+cost is a primary factor and the risk of a regional failure is acceptable.
+
+All {+service+} cloud providers have regions that support availability zones
+within a region, which adds protection in the case of a single zone outage. The
+provider automatically reroutes traffic to a node in another availability zone
+within the region to ensure availability. This is similar to multi-region
+deployments but on a smaller scale.
+
+The following diagram shows a single-region {+service+}
+deployment for a region that has 3 availability zones:
+
+.. figure:: /includes/images/single-region.png
+   :figwidth: 600px
+   :alt: An image showing a three-zone deployment in a single region.
+
+To learn how to configure a single-region deployment, see
+:atlas:`Create a Cluster ` in the
+{+service+} documentation.
+
+Use Cases for Single-Region Deployments
+---------------------------------------
+
+A single-region deployment may be best for you if you have
+the following requirements:
+
+- You want to use one cloud provider.
+- You don't need to deploy to more than one region.
+- Your application requires low latency *and* has a majority of users
+  in one geographic location.
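+
+If these requirements describe your application, provisioning is a single
+{+atlas-cli+} call. The following is a minimal sketch; the cluster name is
+hypothetical and the tier is only an example:
+
+.. code-block:: sh
+
+   # Create a three-node replica set in a single AWS region
+   atlas clusters create singleRegionDemo --provider AWS --region US_WEST_2 \
+     --tier M10 --members 3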
+
+For example, for an application deployed on |aws| with
+users primarily located in the western US, you can deploy a
+single-region deployment to ``us-west-2`` (a region that supports availability
+zones). This ensures low latency since all nodes are within the western US, while
+offering availability if there's a zonal outage that affects the
+primary node.
+
+If your application requires low latency or data sovereignty together with
+cross-region or cross-provider high availability, consider a
+:ref:`arch-center-paradigms-multi-region` or
+:ref:`arch-center-paradigms-multi-cloud`, respectively.
+
+Considerations for Single-Region Deployments
+--------------------------------------------
+
+Single-region deployments ensure a minimum level of availability. High
+availability depends on the deployment of nodes across
+regions as well as the number, distribution, and priority order of
+nodes. To learn more about recommended
+{+cluster+} topologies for high availability, see
+:ref:`arch-center-high-availability`.
+
+For more considerations, see
+:atlas:`Considerations
+` in the
+{+service+} documentation.
+
+.. _arch-center-single-region-rec-summary:
+
+Recommendations for Single-Region Deployments
+---------------------------------------------
+
+.. include:: /includes/rec-list.rst
+
diff --git a/source/disaster-recovery.txt b/source/disaster-recovery.txt
index 35d4701b..29a5d662 100644
--- a/source/disaster-recovery.txt
+++ b/source/disaster-recovery.txt
@@ -55,6 +55,26 @@ It is imperative that you test the plans in this section regularly (ideally quar
 Some disaster recovery testing might require actions that cannot be performed
 by EDM users. In these cases, open a support case for the purpose of performing
 artificial outages at least a week in advance of when you plan on running a
 test exercise.
+
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Recommendations here
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   Recommendations here
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
+
 This section covers the following disaster recovery procedures:
 
 - :ref:`arch-center-single-node-outage`
@@ -70,7 +90,7 @@ This section covers the following disaster recovery procedures:
 .. _arch-center-single-node-outage:
 
 Single Node Outage
-~~~~~~~~~~~~~~~~~~
+``````````````````
 
 If a single node in your replica set fails due to a regional outage, your
 deployment should still be available, assuming you have followed best practices.
 If you are reading from secondaries, you might experience degraded performance
 because you have one less node to read from.
 
 You can test a primary node outage in {+service+} using the
_arch-center-regional-outage:
 
Regional Outage
-~~~~~~~~~~~~~~~
+```````````````
 
-If a single region outage or multi-region outage degrades the state of your {+cluster+}, follow these steps:
+Multi-region clusters reroute traffic to another region in the event of a
+regional outage to ensure high availability. To learn more, see
+:ref:`arch-center-paradigms-multi-region`.
+
+If a single region outage or multi-region outage degrades the state of your
+{+cluster+}, follow these steps:
 
.. procedure::
   :style: normal
@@ -128,7 +151,15 @@ You can test a region outage in {+service+} using the
 
.. _arch-center-provider-outage:
 
Cloud Provider Outage
-~~~~~~~~~~~~~~~~~~~~~
+`````````````````````
 
+With multi-cloud clusters, you can select electable nodes across cloud providers
+to maintain high availability. If the provider that hosts your primary node
+becomes unavailable, an electable node on another provider can be elected
+primary to ensure continuous operation. For example, you can create electable nodes on
+{+aws+}, {+gcp+}, and {+azure+} to ensure that if one cloud provider experiences
+an outage, an electable node on a separate provider can take over as your
+{+cluster+}'s primary node. To learn more, see :ref:`arch-center-paradigms-multi-cloud`.
 
In the highly unlikely event that an entire cloud provider is
unavailable, follow these steps to bring your deployment back online:
@@ -169,7 +200,7 @@ unavailable, follow these steps to bring your deployment back online:
 
.. _arch-center-atlas-outage:
 
{+service+} Outage
-~~~~~~~~~~~~~~~~~~
+``````````````````
 
In the highly unlikely event that the {+service+} Control Plane and the {+atlas-ui+}
are unavailable, your {+cluster+} is still available and accessible.
 
@@ -180,7 +211,7 @@ to investigate this further.
 
.. _arch-center-resource-capacity:
 
Resource Capacity Issues
-~~~~~~~~~~~~~~~~~~~~~~~~
+````````````````````````
 
Computational resource (such as disk space, RAM, or CPU) capacity issues can
result from poor planning or unexpected database traffic.
 
@@ -213,7 +244,7 @@ amount and causes a disaster, follow these steps:
 
.. _arch-center-resource-failure:
 
Resource Failure
-~~~~~~~~~~~~~~~~
+````````````````
 
.. include:: /includes/temporary-cluster-fix.rst
 
@@ -236,7 +267,7 @@ unavailable, follow these steps:
 
.. _arch-center-data-deletion:
 
Deletion of Production Data
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+```````````````````````````
 
Production data might be accidentally deleted due to human error or a
bug in the application built on top of the database.
 
@@ -270,7 +301,7 @@ If the contents of a collection or database have been deleted, follow these step
 
.. _arch-center-driver-failure:
 
Driver Failure
-~~~~~~~~~~~~~~
+``````````````
 
If a driver fails, follow these steps:
 
@@ -302,7 +333,7 @@ If a driver fails, follow these steps:
 
.. _arch-center-data-corruption:
 
Data Corruption
-~~~~~~~~~~~~~~~
+```````````````
 
.. include:: /includes/temporary-cluster-fix.rst
 
diff --git a/source/getting-started.txt b/source/getting-started.txt
index d51c9d70..2cb38cdd 100644
--- a/source/getting-started.txt
+++ b/source/getting-started.txt
@@ -21,6 +21,14 @@ Use the following resources to get started with |service| and the
 
      Learn how to define a landing zone for your organization.
 
+   .. card::
+      :headline: Deployment Paradigms
+      :url: https://mongodb.com/docs/atlas/architecture/deployment-paradigms/
+      :icon: cloud_global
+      :icon-alt: icon
+
+      Choose your deployment paradigm, such as a multi-region, global, or hybrid deployment.
+
   .. 
card::
      :headline: Orgs, Projects, and Clusters
      :url: https://mongodb.com/docs/atlas/architecture/hierarchy/
@@ -28,9 +36,26 @@
      :icon-alt: Atlas global cluster icon
 
      Set up the foundational {+service+} components.
+
+   .. card::
+      :headline: Migration
+      :url: https://mongodb.com/docs/atlas/architecture/migration/
+
+      Make a plan to move your existing data into {+service+}.
+
+   .. card::
+      :headline: Operational Readiness Checklist
+      :url: https://mongodb.com/docs/atlas/architecture/operational-readiness-checklist/
+      :icon: general_action_audit
+      :icon-alt: icon
+
+      Use a checklist to help you prepare for a deployment.
 
.. toctree::
   :titlesonly:
 
   Landing Zone Design 
+   Deployment Paradigms 
   Orgs, Projects, and Clusters 
+   Migration 
+   Operational Readiness Checklist 
diff --git a/source/global.txt b/source/global.txt
new file mode 100644
index 00000000..974fa989
--- /dev/null
+++ b/source/global.txt
@@ -0,0 +1,37 @@
+.. _arch-center-paradigms-global:
+
+==========================
+Global Deployment Paradigm
+==========================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Global {+service+} deployments are the most complex deployment paradigms, and
+therefore require very careful planning. In almost all cases,
+a :ref:`arch-center-paradigms-multi-region` (or its subset, a
+:ref:`arch-center-paradigms-multi-cloud`) will fulfill your needs.
+
+The following are a few reasons why you might consider a global deployment
+strategy:
+
+- You need a single global connection string.
+- You need to perform global aggregations across all customers.
+- You need the ability to read/write for all customers from everywhere
+  in one logical cluster, while also having regional reads/writes.
+
+Recommendations for Global Deployments
+--------------------------------------
+
+The complexity of global deployments results in many opinions on best
+practices. This makes them difficult to adopt as a general solution.
+
+The {+atlas-arch-center+} does not currently cover recommendations
+specific to global deployments. Contact {+ps+} team to discuss your
+specific requirements and to design a {+service+} global deployment
+strategy.
diff --git a/source/hierarchy.txt b/source/hierarchy.txt
index 667ea5cc..f036beb2 100644
--- a/source/hierarchy.txt
+++ b/source/hierarchy.txt
@@ -94,8 +94,30 @@ security settings and governance for your {+service+} enterprise estate:
Recommendations for {+service+} Orgs, Projects, and {+Clusters+}
----------------------------------------------------------------
+.. collapsible::
+   :heading: Single-Region Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments in a single region
+   :expanded: true
+
+   Recommendations here
+
+.. _arch-center-orgs-projects-clusters-recs-multi:
+
+.. collapsible::
+   :heading: Multi-Region and Multi-Cloud Deployment Recommendations
+   :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers
+   :expanded: true
+
+   Recommendations here
+
+All Deployment Paradigm Recommendations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following recommendations apply to all :ref:`deployment paradigms
+`.
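+
+Many of these recommendations can also be applied programmatically. As
+an illustration, the following {+atlas-cli+} sketch, which assumes a
+hypothetical organization ID and project names, creates one project for
+each of the four environments described in the next section:
+
+.. code-block::
+   :copyable: true
+
+   atlas projects create my-app-dev --orgId 5e2211c17a3e5a48f5497de3
+   atlas projects create my-app-test --orgId 5e2211c17a3e5a48f5497de3
+   atlas projects create my-app-staging --orgId 5e2211c17a3e5a48f5497de3
+   atlas projects create my-app-prod --orgId 5e2211c17a3e5a48f5497de3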
+ Development, Testing, Staging, and Production Environments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`````````````````````````````````````````````````````````` We recommend that you use the following four environments to isolate your sandbox and test projects and {+clusters+} from your application @@ -128,7 +150,7 @@ projects and {+clusters+}: end users. Local {+service+} Deployments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +````````````````````````````` For development and testing purposes, developers can use the {+atlas-cli+} to :atlascli:`create a local {+service+} deployment`. @@ -143,7 +165,7 @@ local {+service+} deployments in secure, reliable, and portable test environments. To learn more, see :atlascli:`Create a Local {+service+} Deployment with Docker `. Org and Project Hierarchies -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +``````````````````````````` Generally, we recommend a paying organization that is managed centrally, and one organization for each {+BU+} or department that is @@ -162,7 +184,7 @@ increase the limits. To learn more, see .. _project-hierarchy-1: Recommended Hierarchy -````````````````````` +##################### Consider the following hierarchy, which creates fewer |service| organizations, if you have common teams and permissions across the {+BU+} and less than the raiseable limit of 250 projects per organization. @@ -175,7 +197,7 @@ Consider the following hierarchy, which creates fewer |service| organizations, i .. _project-hierarchy-2: Recommended Hierarchy 2: Decentralized Business Units/Departments -````````````````````````````````````````````````````````````````` +################################################################# Consider the following hierarchy if your organization is highly decentralized without a centralized function to serve as the contract @@ -195,7 +217,7 @@ this hierarchy. .. _deployment-hierarchy: {+Cluster+} Hierarchy -~~~~~~~~~~~~~~~~~~~~~ +````````````````````` To maintain isolation between environments, we recommend that you deploy each cluster within its own project, as shown in the following diagram. @@ -203,7 +225,9 @@ This allows administrators to maintain different project configurations between and uphold the principle of least privilege, which states that users should only be granted the least level of access necessary for their role. -However, this hierarchy may make it more complicated to share project-level configurations such as private endpoints and |cmk|\s across clusters. + +However, this hierarchy might make it more complicated to share project-level configurations such as private endpoints and |cmk|\s across clusters. + To learn more, see :ref:`hierarchy-multiple-clusters`. .. figure:: /includes/images/deployment-hierarchy.svg @@ -238,7 +262,7 @@ Deploy multiple {+clusters+} within the same project only if both of the followi :lightbox: Resource Tagging -~~~~~~~~~~~~~~~~ +```````````````` We recommend that you :atlas:`tag {+clusters+} or projects ` with the following details to enable easy parsing for reporting and integrations: @@ -258,7 +282,7 @@ To learn more about parsing billing data using tags, see .. _arch-center-cluster-size-guide: {+service+} {+Cluster+} Size Guide -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`````````````````````````````````` In a dedicated deployment ({+cluster+} size ``M10``\+), {+service+} allocates resources exclusively. 
We recommend dedicated deployments @@ -292,9 +316,13 @@ Use the following {+cluster+} size guide to select a {+cluster+} tier that ensur This table displays the default storage and performance capabilities for each {+cluster+} tier, as well as whether or not the {+cluster+} tier is suitable for staging and production environments. +The {+cluster+} size guide also includes expected values for a {+cluster+}'s +total data size and peak IOPS, based on median {+cluster+} sizes and 98th percentile IOPS analytics from existing enterprise account {+clusters+} +in each {+cluster+} tier. + .. list-table:: :header-rows: 1 - :widths: 10 10 20 20 10 10 10 10 + :widths: 10 10 10 10 10 10 10 10 10 10 * - T-Shirt Size - Cluster Tier @@ -303,6 +331,9 @@ whether or not the {+cluster+} tier is suitable for staging and production envir - CPUs (#) - Default RAM - Default IOPS + + - Expected Total Data Size + - Expected Peak IOPS - Suitable For * - Small @@ -312,6 +343,8 @@ whether or not the {+cluster+} tier is suitable for staging and production envir - 2 - 2 GB - 1000 + - 1 GB to 10 GB + - 120 - Dev/Test only * - Med @@ -321,6 +354,8 @@ whether or not the {+cluster+} tier is suitable for staging and production envir - 2 - 8 GB - 3000 + - 20 GB to 50 GB + - 3000 - Prod * - Large @@ -330,6 +365,8 @@ whether or not the {+cluster+} tier is suitable for staging and production envir - 16 - 32 GB - 3000 + - 360 GB to 420 GB + - 3000 - Prod * - X-Large @@ -339,10 +376,16 @@ whether or not the {+cluster+} tier is suitable for staging and production envir - 32 - 128 GB - 3000 + - 1200 GB to 1750 GB + - 6000 - Prod .. [1] ``M10`` is a shared CPU tier. For highly-regulated industries or sensitive data, your minimum and smallest starting tier should be ``M30``. +For example, consider a fictional fintech company, MongoFinance, that must store a total of 400 GB of processed data. +At peak activity, MongoFinance employees and customers perform up to 3000 reads or writes to MongoFinance databases per second. +MongoFinance's storage and performance requirements are best satisfied by a large, or ``M50``, {+cluster+} tier. + To learn more about {+cluster+} tiers and the regions that support them, see the {+service+} documentation for each cloud provider: @@ -415,7 +458,7 @@ These examples also apply other recommended configurations, including: Run the following command for each application and environment pair. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-example-create-projects.rst + .. include:: /includes/examples/cli/cli-example-create-projects.rst For more configuration options and info about this example, see :ref:`atlas-projects-create`. @@ -446,7 +489,7 @@ These examples also apply other recommended configurations, including: For your development and testing environments, run the following command for each project that you created. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-example-create-clusters-devtest.rst + .. include:: /includes/examples/cli/dev-test/cli-example-create-clusters-devtest.rst .. tab:: Staging and Prod Environments :tabid: stagingprod @@ -454,13 +497,13 @@ These examples also apply other recommended configurations, including: For your staging and production environments, create the following ``cluster.json`` file for each project that you created. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-json-example-create-clusters.rst + .. 
include:: /includes/examples/cli/cli-json-example-create-clusters.rst After you create the ``cluster.json`` file, run the following command for each project that you created. The command uses the ``cluster.json`` file to create a cluster. - .. include:: /includes/examples/cli-example-create-clusters-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-create-clusters-stagingprod.rst For more configuration options and info about this example, see :ref:`atlas-clusters-create`. @@ -502,22 +545,22 @@ These examples also apply other recommended configurations, including: main.tf ``````` - .. include:: /includes/examples/tf-example-main-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-main-devtest.rst variables.tf ```````````` - .. include:: /includes/examples/tf-example-variables.rst + .. include:: /includes/examples/terraform/tf-example-variables.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-tfvars-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-tfvars-devtest.rst provider.tf ``````````` - .. include:: /includes/examples/tf-example-provider.rst + .. include:: /includes/examples/terraform/tf-example-provider.rst .. tab:: Staging and Prod Environments :tabid: stagingprod @@ -530,22 +573,22 @@ These examples also apply other recommended configurations, including: main.tf ``````` - .. include:: /includes/examples/tf-example-main-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst variables.tf ```````````` - .. include:: /includes/examples/tf-example-variables.rst + .. include:: /includes/examples/terraform/tf-example-variables.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-tfvars-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-tfvars-stagingprod.rst provider.tf ``````````` - .. include:: /includes/examples/tf-example-provider.rst + .. include:: /includes/examples/terraform/tf-example-provider.rst For more configuration options and info about this example, see |service-terraform| and the `MongoDB Terraform Blog Post diff --git a/source/high-availability.txt b/source/high-availability.txt index 60810099..ffdfb105 100644 --- a/source/high-availability.txt +++ b/source/high-availability.txt @@ -14,7 +14,7 @@ Guidance for {+service+} High Availability Consult this page to plan the appropriate cluster configuration that optimizes your availability and performance while aligning with your enterprise's -cost controls and access needs. +cost controls and access needs. Features for {+service+} High Availability ------------------------------------------ @@ -51,29 +51,92 @@ Recommendations for {+service+} High Availability .. _arch-center-deployment-topologies: -Recommended Deployment Topology for Single Region -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Recommended Deployment Topologies +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Single Region, 3 Node Replica Set / Shard (Primary Secondary Secondary) +Single-Region, 3 Node Replica Set / Shard (Primary Secondary Secondary) ``````````````````````````````````````````````````````````````````````` .. figure:: /includes/images/highavailabilityPSS.svg - :figwidth: 750px + :figwidth: 750px :alt: A topology with a single region with 3 nodes: one primary and two secondaries. -This topology is appropriate if low latency is required but high availability requirements -are limited to a single region. 
This topology can tolerate any 1 node failure and easily satisfy majority write
-concern within region secondaries. This will maintain a primary in the preferred region upon any node
-failure, limits cost, and is the least complicated from an application architecture perspective.
-The reads and writes from secondaries in this topology encounter low latency in your preferred region.
+This :ref:`single-region <arch-center-paradigms-single>` topology is
+appropriate if low latency is required but high availability
+requirements are limited to a single region. This topology can tolerate
+any one-node failure and easily satisfy majority write
+concern with in-region secondaries. It maintains a primary in the
+preferred region upon any node failure, limits cost, and is the least
+complicated option from an application architecture perspective.
+Reads and writes from secondaries in this topology encounter low
+latency in your preferred region.
 
This topology, however, can't tolerate a regional outage.
 
+3-Region, 3 Node Replica Set / Shard (Primary - Secondary - Secondary)
+``````````````````````````````````````````````````````````````````````
+
+.. figure:: /includes/images/highavailabilityP-S-S.svg
+   :figwidth: 750px
+   :alt: A topology with three regions. There is one primary node in the first region, one secondary node in the second region, and one secondary node in the third region.
+
+This topology is the standard :ref:`multi-region
+<arch-center-paradigms-multi-region>` topology that provides high
+availability during a regional outage. This topology can tolerate any
+one-node failure or any one-region outage, and it is the least
+expensive multi-region topology.
+
+If the application requires high durability and the app server code has the write majority option set
+in the driver, this topology will experience higher latency for writes as they need to replicate to a second region.
+Additionally, reads from secondaries always occur in a different region than the preferred primary region
+and require a different app server architecture. Furthermore, any failover moves the primary to a different region, and your
+application architecture must adjust as a result.
+
+3-Region, 5 Node Replica Set / Shard (Primary Secondary - Secondary Secondary - Secondary)
+``````````````````````````````````````````````````````````````````````````````````````````
+
+.. figure:: /includes/images/highavailabilityPS-SS-S.svg
+   :figwidth: 750px
+   :alt: A topology with three regions. There is one primary node and one secondary node in the first region, two secondary nodes in the second region, and one secondary node in the third region.
+
+This is the preferred topology because it balances high
+availability, performance, and cost across :ref:`multiple regions
+<arch-center-paradigms-multi-region>`. This topology can tolerate the
+failure of any two nodes, keep the new primary in the preferred region
+after a primary node failure, and tolerate any one-region outage.
+
+If the application requires high durability and the app server code has
+the write majority option set in the driver, this topology will
+experience higher latency for writes as they need to replicate to a
+second region. To accommodate a regional outage, the
+application tier must also be configured to fail over and shift work to
+the newly elected primary.
+
.. _arch-center-ha-configurations:
 
Recommended Configurations for High Availability and Recovery
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
.. 
collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Recommendations here + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + Recommendations here + +All Deployment Paradigm Recommendations +``````````````````````````````````````` + +The following recommendations apply to all :ref:`deployment paradigms +`. + Use the following recommendations to configure your {+service+} deployments and backups for high availability and to expedite recovery from disasters. @@ -166,7 +229,7 @@ Recommended Google Cloud Regions .. include:: /includes/gcp-recommended-regions.rst Use ``mongos`` Redundancy for Sharded {+Clusters+} -``````````````````````````````````````````````````` +++++++++++++++++++++++++++++++++++++++++++++++++++ When a client connects to a sharded {+cluster+}, we recommend that you include multiple :manual:`mongos ` processes, separated by commas, in the connection URI. To learn more, @@ -190,7 +253,7 @@ configuration. .. _arch-center-majority-write-concern: Use ``majority`` Write Concern -`````````````````````````````` +++++++++++++++++++++++++++++++ MongoDB allows you to specify the level of acknowledgment requested for write operations by using :manual:`write concern @@ -213,7 +276,8 @@ Using ``majority`` will allow write operations to continue when they persist dat maintaining data redundancy as well as write continuity. Consider Backup Configuration -````````````````````````````` ++++++++++++++++++++++++++++++ + Frequent data backups is critical for business continuity and disaster recovery. Frequent backups ensure that data loss and downtime is minimal if a disaster or cyber attack disrupts normal operations. @@ -234,11 +298,11 @@ We recommend that you: For more backup recommendations, see :ref:`arch-center-backups`. Plan Your Resource Utilization -`````````````````````````````` +++++++++++++++++++++++++++++++ To avoid resource capacity issues, we recommend that you monitor resource utilization and hold regular capacity planning sessions. -MongoDB Professional Services offers these sessions. +{+ps+} offers these sessions. Over-utilized clusters could fail causing a disaster. Scale up clusters to higher tiers if your utilization is regularly alerting at a steady state, @@ -325,7 +389,7 @@ These examples also apply other recommended configurations, including: For your development and testing environments, run the following command for each project. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-example-create-clusters-devtest.rst + .. include:: /includes/examples/cli/dev-test/cli-example-create-clusters-devtest.rst .. tab:: Staging and Prod Environments :tabid: stagingprod @@ -333,13 +397,13 @@ These examples also apply other recommended configurations, including: For your staging and production environments, create the following ``cluster.json`` file for each project. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-json-example-create-clusters.rst + .. include:: /includes/examples/cli/cli-json-example-create-clusters.rst After you create the ``cluster.json`` file, run the following command for each project. The command uses the ``cluster.json`` file to create a cluster. - .. 
include:: /includes/examples/cli-example-create-clusters-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-create-clusters-stagingprod.rst For more configuration options and info about this example, see :ref:`atlas-clusters-create`. @@ -381,22 +445,22 @@ These examples also apply other recommended configurations, including: main.tf ``````` - .. include:: /includes/examples/tf-example-main-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-main-devtest.rst variables.tf ```````````` - .. include:: /includes/examples/tf-example-variables.rst + .. include:: /includes/examples/terraform/tf-example-variables.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-tfvars-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-tfvars-devtest.rst provider.tf ``````````` - .. include:: /includes/examples/tf-example-provider.rst + .. include:: /includes/examples/terraform/tf-example-provider.rst After you create the files, navigate to each application and environment pair's directory and run the following command to initialize Terraform: @@ -432,22 +496,22 @@ These examples also apply other recommended configurations, including: main.tf ``````` - .. include:: /includes/examples/tf-example-main-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst variables.tf ```````````` - .. include:: /includes/examples/tf-example-variables.rst + .. include:: /includes/examples/terraform/tf-example-variables.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-tfvars-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-tfvars-stagingprod.rst provider.tf ``````````` - .. include:: /includes/examples/tf-example-provider.rst + .. include:: /includes/examples/terraform/tf-example-provider.rst After you create the files, navigate to each application and environment pair's directory and run the following command to initialize Terraform: @@ -475,4 +539,3 @@ These examples also apply other recommended configurations, including: For more configuration options and info about this example, see |service-terraform| and the `MongoDB Terraform Blog Post `__. - diff --git a/source/includes/billing-optimizations.rst b/source/includes/billing-optimizations.rst index 083207f5..c3f7c948 100644 --- a/source/includes/billing-optimizations.rst +++ b/source/includes/billing-optimizations.rst @@ -1,7 +1,7 @@ Consider these strategies for optimizing your |service| costs. -Underutilized {+Clusters+} --------------------------- +Scale Down Underutilized {+Clusters+} +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Enable :ref:`auto-scaling ` on your {+cluster+} tier to match your usage and prevent over-provisioning. @@ -47,9 +47,8 @@ Underutilized {+Clusters+} `__ by setting the ``termination_protection_enabled`` field to ``false``. - -High Backup Frequency ---------------------- +Optimize Backup Frequency +~~~~~~~~~~~~~~~~~~~~~~~~~ - :ref:`Continuous backups ` are expensive, but they give you the most safety to recover data from any point in time within the @@ -62,7 +61,7 @@ High Backup Frequency these {+clusters+} entirely for development environments. Optimize Data Transfer Patterns ---------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Whenever possible, opt for same-provider, same-region data transfer to minimize costs. 
Only use inter-region or internet transfers when
@@ -74,14 +73,14 @@ application — can greatly reduce data transfer costs.
To learn more, see :ref:`reducing-data-transfer-costs`.
 
Optimize Queries
-----------------
+~~~~~~~~~~~~~~~~
 
Queries that take a long time to execute can increase resource usage, requiring
higher-tier {+clusters+}. :ref:`Optimize these queries ` to
reduce resource consumption and lower costs as a result.
 
Optimize Storage
-----------------
+~~~~~~~~~~~~~~~~
 
Use features like :ref:`online archive ` or
:manual:`TTL indexes ` to move older data from more
@@ -90,21 +89,21 @@ that is no longer needed. After you archive data, you can access the data
through :ref:`Atlas Data Federation `.
 
Use Cost Explorer
------------------
+~~~~~~~~~~~~~~~~~
 
Regularly use the :ref:`Cost Explorer ` tool to monitor
spending patterns at the organization, project, {+cluster+}, and service levels.
Set a frequency that works for your needs.
 
Set Alerts
-----------
+~~~~~~~~~~
 
Configure :ref:`billing alerts ` for key thresholds, such as when
your monthly costs exceed a certain amount. For example, set an alert when costs
exceed $100. This proactive approach helps you avoid surprises.
 
Review Invoices
---------------
+~~~~~~~~~~~~~~~
 
Each month, review your invoice to assess the highest-cost services using the
previous billing optimization suggestions. This is a recommended best practice
@@ -115,4 +114,11 @@ computing costs, which are often the largest portion of your bill.
 
You can review cloud computing costs in the :guilabel:`Summary By Service`
card of any invoice within the |service| :guilabel:`Billing` section.
The :guilabel:`Summary By Service` view shows the costs of all
-{+clusters+} by provider, tier, and region.
\ No newline at end of file
+{+clusters+} by provider, tier, and region.
+
+Choose the Right Deployment Paradigm and Topology
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The deployment paradigm and topology that you choose can significantly affect your |service| costs.
+
+To learn more about cost savings for different topologies, see :ref:`arch-center-high-availability`.
\ No newline at end of file
diff --git a/source/includes/complete-examples-go-sdk.rst b/source/includes/complete-examples-go-sdk.rst
new file mode 100644
index 00000000..1ceb4850
--- /dev/null
+++ b/source/includes/complete-examples-go-sdk.rst
@@ -0,0 +1,6 @@
+.. cta-banner::
+   :url: https://github.com/mongodb/atlas-architecture-go-sdk
+   :icon: Code
+
+   See the complete Go example project `in GitHub `__.
+
diff --git a/source/includes/complete-examples-terraform.rst b/source/includes/complete-examples-terraform.rst
new file mode 100644
index 00000000..05e0b4fc
--- /dev/null
+++ b/source/includes/complete-examples-terraform.rst
@@ -0,0 +1,7 @@
+.. tip::
+
+   For Terraform examples that enforce our recommendations across all
+   pillars, see one of the following examples in GitHub:
+
+   - `Staging/Prod `__
+   - `Dev/Test `__
diff --git a/source/includes/complete-examples.rst b/source/includes/complete-examples.rst
deleted file mode 100644
index f0f1fafb..00000000
--- a/source/includes/complete-examples.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-.. cta-banner::
-   :url: https://github.com/mongodb/docs-atlas-architecture/blob/main/source/includes/examples/tf-example-complete-staging-prod.tf
-   :icon: Code
-
-   See Terraform examples to enforce our Staging/Prod recommendations across all pillars in one place `in Github `__.
- 
diff --git a/source/includes/data-sovereignty.rst b/source/includes/data-sovereignty.rst
new file mode 100644
index 00000000..212ee877
--- /dev/null
+++ b/source/includes/data-sovereignty.rst
@@ -0,0 +1,15 @@
+Data Sovereignty and High Availability Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For compliance with data residency laws, data can be partitioned to
+reside in specific regions, ensuring adherence to local regulations.
+However, deploying to a single region sacrifices high availability if
+there is a regional outage.
+
+You can configure a multi-region deployment for both high availability and data sovereignty.
+
+For example, for an application deployed with |aws| that requires data
+storage in Europe, you can deploy a multi-region deployment
+to three regions within the EU (such as ``eu-west-1``,
+``eu-west-2``, and ``eu-west-3``). This ensures data sovereignty since all
+regions are within the EU, while offering high availability if there's a
+regional outage that affects one of the nodes.
\ No newline at end of file
diff --git a/source/includes/examples/cli-example-audit-filter-use.rst b/source/includes/examples/cli-example-audit-filter-use.rst
deleted file mode 100644
index f5da29c1..00000000
--- a/source/includes/examples/cli-example-audit-filter-use.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-.. code-block::
-   :copyable: true
-
-   atlas auditing update --auditFilter '{"atype": "authenticate", "param.db": "test"}'
-
diff --git a/source/includes/examples/cli-example-audit-filter.rst b/source/includes/examples/cli-example-audit-filter.rst
deleted file mode 100644
index 83d6d323..00000000
--- a/source/includes/examples/cli-example-audit-filter.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-.. code-block::
-   :copyable: true
-
-   { atype: "authenticate", "param.db": "test" }
diff --git a/source/includes/examples/cli-example-audit-logs-config-file.rst b/source/includes/examples/cli-example-audit-logs-config-file.rst
deleted file mode 100644
index 3b52eec7..00000000
--- a/source/includes/examples/cli-example-audit-logs-config-file.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-.. code-block::
-   :copyable: true
-
-   atlas auditing update -f filter.json
diff --git a/source/includes/examples/cli-example-audit-logs-known-users.rst b/source/includes/examples/cli-example-audit-logs-known-users.rst
deleted file mode 100644
index 855ead63..00000000
--- a/source/includes/examples/cli-example-audit-logs-known-users.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-.. code-block::
-   :copyable: true
-
-   atlas auditing update --auditFilter '{"atype": "authenticate"}'
diff --git a/source/includes/examples/cli-example-download-logs.rst b/source/includes/examples/cli-example-download-logs.rst
deleted file mode 100644
index a548072f..00000000
--- a/source/includes/examples/cli-example-download-logs.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-.. code-block::
-   :copyable: true
-
-   atlas logs download atlas-lnmtkm-shard-00-00.ajlj3.mongodb.net mongodb.gz --projectId 56fd11f25f23b33ef4c2a331
-
\ No newline at end of file
diff --git a/source/includes/examples/cli-example-retrieve-logs.rst b/source/includes/examples/cli-example-retrieve-logs.rst
deleted file mode 100644
index 51bd73ce..00000000
--- a/source/includes/examples/cli-example-retrieve-logs.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-.. 
code-block:: - :copyable: true - - atlas accesslogs list --output json --projectId 618d48e05277a606ed2496fe --clusterName Cluster0 - \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-audit-filter-use.rst b/source/includes/examples/cli/cli-example-audit-filter-use.rst new file mode 100644 index 00000000..e8eaab1b --- /dev/null +++ b/source/includes/examples/cli/cli-example-audit-filter-use.rst @@ -0,0 +1,5 @@ +.. code-block:: + :copyable: true + + atlas auditing update --enabled --auditFilter '{"atype": "authenticate", "param.db": "test"}' + diff --git a/source/includes/examples/cli/cli-example-audit-filter.rst b/source/includes/examples/cli/cli-example-audit-filter.rst new file mode 100644 index 00000000..9c82006d --- /dev/null +++ b/source/includes/examples/cli/cli-example-audit-filter.rst @@ -0,0 +1,4 @@ +.. code-block:: + :copyable: true + + { "atype": "authenticate", "param.db": "test" } diff --git a/source/includes/examples/cli/cli-example-audit-logs-config-file.rst b/source/includes/examples/cli/cli-example-audit-logs-config-file.rst new file mode 100644 index 00000000..3a663a74 --- /dev/null +++ b/source/includes/examples/cli/cli-example-audit-logs-config-file.rst @@ -0,0 +1,4 @@ +.. code-block:: + :copyable: true + + atlas auditing update --enabled -f filter.json diff --git a/source/includes/examples/cli-example-audit-logs-describe.rst b/source/includes/examples/cli/cli-example-audit-logs-describe.rst similarity index 52% rename from source/includes/examples/cli-example-audit-logs-describe.rst rename to source/includes/examples/cli/cli-example-audit-logs-describe.rst index d67773bf..b00f7c7e 100644 --- a/source/includes/examples/cli-example-audit-logs-describe.rst +++ b/source/includes/examples/cli/cli-example-audit-logs-describe.rst @@ -1,4 +1,5 @@ -.. code-block:: - :copyable: true +.. code-block:: + :copyable: true atlas auditing describe --output json + diff --git a/source/includes/examples/cli/cli-example-audit-logs-known-users.rst b/source/includes/examples/cli/cli-example-audit-logs-known-users.rst new file mode 100644 index 00000000..50aeaf7c --- /dev/null +++ b/source/includes/examples/cli/cli-example-audit-logs-known-users.rst @@ -0,0 +1,4 @@ +.. 
code-block:: + :copyable: true + + atlas auditing update --enabled --auditFilter '{"atype": "authenticate"}' diff --git a/source/includes/examples/cli-example-backup-compliance-policy-enable.rst b/source/includes/examples/cli/cli-example-backup-compliance-policy-enable.rst similarity index 100% rename from source/includes/examples/cli-example-backup-compliance-policy-enable.rst rename to source/includes/examples/cli/cli-example-backup-compliance-policy-enable.rst diff --git a/source/includes/examples/cli-example-backup-compliance-policy-schedule.rst b/source/includes/examples/cli/cli-example-backup-compliance-policy-schedule.rst similarity index 100% rename from source/includes/examples/cli-example-backup-compliance-policy-schedule.rst rename to source/includes/examples/cli/cli-example-backup-compliance-policy-schedule.rst diff --git a/source/includes/examples/cli-example-backup-take-snapshot.rst b/source/includes/examples/cli/cli-example-backup-take-snapshot.rst similarity index 100% rename from source/includes/examples/cli-example-backup-take-snapshot.rst rename to source/includes/examples/cli/cli-example-backup-take-snapshot.rst diff --git a/source/includes/examples/cli-example-create-projects.rst b/source/includes/examples/cli/cli-example-create-projects.rst similarity index 100% rename from source/includes/examples/cli-example-create-projects.rst rename to source/includes/examples/cli/cli-example-create-projects.rst diff --git a/source/includes/examples/cli/cli-example-download-audit-logs-mongod.rst b/source/includes/examples/cli/cli-example-download-audit-logs-mongod.rst new file mode 100644 index 00000000..997c1c17 --- /dev/null +++ b/source/includes/examples/cli/cli-example-download-audit-logs-mongod.rst @@ -0,0 +1,5 @@ +.. code-block:: + :copyable: true + + atlas logs download cluster0-shard-00-00.a1b2c.mongodb.net mongodb-audit-log.gz + \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-download-audit-logs-mongos.rst b/source/includes/examples/cli/cli-example-download-audit-logs-mongos.rst new file mode 100644 index 00000000..42c442d1 --- /dev/null +++ b/source/includes/examples/cli/cli-example-download-audit-logs-mongos.rst @@ -0,0 +1,5 @@ +.. code-block:: + :copyable: true + + atlas logs download cluster0-shard-00-00.a1b2c.mongodb.net mongos-audit-log.gz + \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-download-logs.rst b/source/includes/examples/cli/cli-example-download-logs.rst new file mode 100644 index 00000000..7086951f --- /dev/null +++ b/source/includes/examples/cli/cli-example-download-logs.rst @@ -0,0 +1,5 @@ +.. code-block:: + :copyable: true + + atlas logs download cluster0-shard-00-00.a1b2c.mongodb.net mongodb.gz + diff --git a/source/includes/examples/cli-example-metrics-disks.rst b/source/includes/examples/cli/cli-example-metrics-disks.rst similarity index 100% rename from source/includes/examples/cli-example-metrics-disks.rst rename to source/includes/examples/cli/cli-example-metrics-disks.rst diff --git a/source/includes/examples/cli/cli-example-retrieve-access-logs.rst b/source/includes/examples/cli/cli-example-retrieve-access-logs.rst new file mode 100644 index 00000000..f0ad1f7c --- /dev/null +++ b/source/includes/examples/cli/cli-example-retrieve-access-logs.rst @@ -0,0 +1,5 @@ +.. 
code-block:: + :copyable: true + + atlas accessLogs list --output json --clusterName Cluster0 + \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongod.rst b/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongod.rst new file mode 100644 index 00000000..1080b89e --- /dev/null +++ b/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongod.rst @@ -0,0 +1,4 @@ +.. code-block:: + :copyable: true + + atlas deployments logs --output json --type atlas --hostname cluster0-shard-00-00.a1b2c.mongodb.net --name mongodb-audit-log.gz \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongos.rst b/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongos.rst new file mode 100644 index 00000000..343d0369 --- /dev/null +++ b/source/includes/examples/cli/cli-example-retrieve-audit-logs-mongos.rst @@ -0,0 +1,4 @@ +.. code-block:: + :copyable: true + + atlas deployments logs --output json --type atlas --hostname cluster0-shard-00-00.a1b2c.mongodb.net --name mongos-audit-log.gz \ No newline at end of file diff --git a/source/includes/examples/cli-example-retrieve-logs-org.rst b/source/includes/examples/cli/cli-example-retrieve-logs-org.rst similarity index 82% rename from source/includes/examples/cli-example-retrieve-logs-org.rst rename to source/includes/examples/cli/cli-example-retrieve-logs-org.rst index 894c1f69..d908b29a 100644 --- a/source/includes/examples/cli-example-retrieve-logs-org.rst +++ b/source/includes/examples/cli/cli-example-retrieve-logs-org.rst @@ -1,5 +1,4 @@ .. code-block:: :copyable: true - atlas events organizations list --orgId 5dd5a6b6f10fab1d71a58495 --output json - \ No newline at end of file + atlas events organizations list --orgId 5dd5a6b6f10fab1d71a58495 --output json \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-retrieve-logs-proj.rst b/source/includes/examples/cli/cli-example-retrieve-logs-proj.rst new file mode 100644 index 00000000..227fcb89 --- /dev/null +++ b/source/includes/examples/cli/cli-example-retrieve-logs-proj.rst @@ -0,0 +1,4 @@ +.. code-block:: + :copyable: true + + atlas events projects list --projectId 64ac57bfe9810c0263e9d655 --output json \ No newline at end of file diff --git a/source/includes/examples/cli/cli-example-retrieve-logs.rst b/source/includes/examples/cli/cli-example-retrieve-logs.rst new file mode 100644 index 00000000..a3859520 --- /dev/null +++ b/source/includes/examples/cli/cli-example-retrieve-logs.rst @@ -0,0 +1,4 @@ +.. 
code-block:: + :copyable: true + + atlas deployments logs --output json --type atlas --hostname cluster0-shard-00-00.a1b2c.mongodb.net --name mongodb.gz \ No newline at end of file diff --git a/source/includes/examples/cli-json-example-create-clusters-with-autoscaling.rst b/source/includes/examples/cli/cli-json-example-create-clusters-with-autoscaling.rst similarity index 100% rename from source/includes/examples/cli-json-example-create-clusters-with-autoscaling.rst rename to source/includes/examples/cli/cli-json-example-create-clusters-with-autoscaling.rst diff --git a/source/includes/examples/cli-json-example-create-clusters.rst b/source/includes/examples/cli/cli-json-example-create-clusters.rst similarity index 100% rename from source/includes/examples/cli-json-example-create-clusters.rst rename to source/includes/examples/cli/cli-json-example-create-clusters.rst diff --git a/source/includes/examples/cli-example-alerts-no-primary-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-alerts-no-primary-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-alerts-no-primary-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-alerts-no-primary-devtest.rst diff --git a/source/includes/examples/cli-example-auth-aws-iam-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-auth-aws-iam-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-auth-aws-iam-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-auth-aws-iam-devtest.rst diff --git a/source/includes/examples/cli-example-auth-oid-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-auth-oid-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-auth-oid-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-auth-oid-devtest.rst diff --git a/source/includes/examples/cli-example-auth-temp-user-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-auth-temp-user-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-auth-temp-user-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-auth-temp-user-devtest.rst diff --git a/source/includes/examples/cli-example-create-clusters-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-create-clusters-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-create-clusters-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-create-clusters-devtest.rst diff --git a/source/includes/examples/cli-example-metrics-disk-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-metrics-disk-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-metrics-disk-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-metrics-disk-devtest.rst diff --git a/source/includes/examples/cli-example-performance-advisor-enable-devtest.rst b/source/includes/examples/cli/dev-test/cli-example-performance-advisor-enable-devtest.rst similarity index 100% rename from source/includes/examples/cli-example-performance-advisor-enable-devtest.rst rename to source/includes/examples/cli/dev-test/cli-example-performance-advisor-enable-devtest.rst diff --git a/source/includes/examples/cli-example-alerts-connection-storms-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-alerts-connection-storms-stagingprod.rst similarity index 100% rename from 
source/includes/examples/cli-example-alerts-connection-storms-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-alerts-connection-storms-stagingprod.rst diff --git a/source/includes/examples/cli-example-auth-oidc-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-auth-oidc-stagingprod.rst similarity index 100% rename from source/includes/examples/cli-example-auth-oidc-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-auth-oidc-stagingprod.rst diff --git a/source/includes/examples/cli-example-auth-okta-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-auth-okta-stagingprod.rst similarity index 100% rename from source/includes/examples/cli-example-auth-okta-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-auth-okta-stagingprod.rst diff --git a/source/includes/examples/cli-example-create-clusters-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-create-clusters-stagingprod.rst similarity index 100% rename from source/includes/examples/cli-example-create-clusters-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-create-clusters-stagingprod.rst diff --git a/source/includes/examples/cli-example-metrics-processes-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-metrics-processes-stagingprod.rst similarity index 100% rename from source/includes/examples/cli-example-metrics-processes-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-metrics-processes-stagingprod.rst diff --git a/source/includes/examples/cli-example-return-suggested-indexes-stagingprod.rst b/source/includes/examples/cli/staging-prod/cli-example-return-suggested-indexes-stagingprod.rst similarity index 100% rename from source/includes/examples/cli-example-return-suggested-indexes-stagingprod.rst rename to source/includes/examples/cli/staging-prod/cli-example-return-suggested-indexes-stagingprod.rst diff --git a/source/includes/examples/generated-examples/main.snippet.get-logs-main.go b/source/includes/examples/generated-examples/main.snippet.get-logs-main.go new file mode 100644 index 00000000..750bceb0 --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-logs-main.go @@ -0,0 +1,27 @@ +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using OAuth2 with service account credentials + client, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + params := &admin.GetHostLogsApiParams{ + GroupId: config.ProjectID, + HostName: config.HostName, + LogName: "mongodb", // The type of log to get ("mongodb" or "mongos") + } + + logFileName, err := getHostLogs(ctx, *client, params) + if err != nil { + log.Fatalf("Failed to download logs: %v", err) + } + + plainTextLog := strings.TrimSuffix(logFileName, ".gz") + ".log" + if err := unzipGzFile(logFileName, plainTextLog); err != nil { + log.Fatalf("Failed to unzip log file: %v", err) + } + +} + diff --git a/source/includes/examples/generated-examples/main.snippet.get-logs.go b/source/includes/examples/generated-examples/main.snippet.get-logs.go new file mode 100644 index 00000000..ad652619 --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-logs.go @@ -0,0 +1,112 @@ +// See entire project at https://github.com/mongodb/atlas-architecture-go-sdk +package main + +import ( + "atlas-sdk-go/internal/auth" + 
"compress/gzip" + "context" + "fmt" + "io" + "log" + "os" + "strings" + + "go.mongodb.org/atlas-sdk/v20250219001/admin" +) + +func SafeClose(c io.Closer) { + if c != nil { + if err := c.Close(); err != nil { + log.Printf("Warning: failed to close resource: %v", err) + } + } +} + +// getHostLogs downloads a compressed .gz file that contains the MongoDB logs for +// the specified host in your project. +func getHostLogs(ctx context.Context, atlasClient admin.APIClient, params *admin.GetHostLogsApiParams) (string, error) { + logFileName := fmt.Sprintf("logs_%s_%s.gz", params.GroupId, params.HostName) + fmt.Printf("Fetching %s log for host %s in project %s\n", params.LogName, params.HostName, params.GroupId) + + if err := downloadLogs(ctx, atlasClient, params, logFileName); err != nil { + return "", err + } + + fmt.Printf("Logs saved to %s\n", logFileName) + return logFileName, nil +} + +func downloadLogs(ctx context.Context, atlasClient admin.APIClient, params *admin.GetHostLogsApiParams, filePath string) error { + resp, _, err := atlasClient.MonitoringAndLogsApi.GetHostLogsWithParams(ctx, params).Execute() + if err != nil { + return fmt.Errorf("fetch logs: %w", err) + } + defer SafeClose(resp) + + file, err := os.Create(filePath) + if err != nil { + return fmt.Errorf("create %q: %w", filePath, err) + } + defer SafeClose(file) + + if _, err := io.Copy(file, resp); err != nil { + return fmt.Errorf("write to %q: %w", filePath, err) + } + + return nil +} + +func unzipGzFile(srcPath, destPath string) error { + srcFile, err := os.Open(srcPath) + if err != nil { + return fmt.Errorf("open gz file: %w", err) + } + defer SafeClose(srcFile) + + gzReader, err := gzip.NewReader(srcFile) + if err != nil { + return fmt.Errorf("create gzip reader: %w", err) + } + defer SafeClose(gzReader) + + destFile, err := os.Create(destPath) + if err != nil { + return fmt.Errorf("create destination file: %w", err) + } + defer SafeClose(destFile) + + if _, err := io.Copy(destFile, gzReader); err != nil { + return fmt.Errorf("unzip copy error: %w", err) + } + + fmt.Printf("Unzipped logs to %s\n", destPath) + return nil +} + +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using OAuth2 with service account credentials + client, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + params := &admin.GetHostLogsApiParams{ + GroupId: config.ProjectID, + HostName: config.HostName, + LogName: "mongodb", // The type of log to get ("mongodb" or "mongos") + } + + logFileName, err := getHostLogs(ctx, *client, params) + if err != nil { + log.Fatalf("Failed to download logs: %v", err) + } + + plainTextLog := strings.TrimSuffix(logFileName, ".gz") + ".log" + if err := unzipGzFile(logFileName, plainTextLog); err != nil { + log.Fatalf("Failed to unzip log file: %v", err) + } + +} + diff --git a/source/includes/examples/generated-examples/main.snippet.get-metrics-dev.go b/source/includes/examples/generated-examples/main.snippet.get-metrics-dev.go new file mode 100644 index 00000000..95ce8eae --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-metrics-dev.go @@ -0,0 +1,64 @@ +// See entire project at https://github.com/mongodb/atlas-architecture-go-sdk +package main + +import ( + "atlas-sdk-go/internal/auth" + "context" + "encoding/json" + "fmt" + "go.mongodb.org/atlas-sdk/v20250219001/admin" + "log" +) + +// getDiskMetrics fetches metrics for a specified disk partition in a project and prints results to 
the console +func getDiskMetrics(ctx context.Context, atlasClient admin.APIClient, params *admin.GetDiskMeasurementsApiParams) (*admin.ApiMeasurementsGeneralViewAtlas, error) { + + resp, _, err := atlasClient.MonitoringAndLogsApi.GetDiskMeasurementsWithParams(ctx, params).Execute() + if err != nil { + if apiError, ok := admin.AsError(err); ok { + return nil, fmt.Errorf("failed to get metrics for partition: %s (API error: %v)", err, apiError.GetDetail()) + } + return nil, fmt.Errorf("failed to get metrics: %w", err) + } + if resp == nil || resp.HasMeasurements() == false { + return nil, fmt.Errorf("no metrics found for partition %s in project %s", params.PartitionName, params.GroupId) + } + jsonData, err := json.MarshalIndent(resp, "", " ") + if err != nil { + return nil, fmt.Errorf("failed to marshal response: %w", err) + } + fmt.Println(string(jsonData)) + return resp, nil +} + +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using OAuth2 with service account credentials + atlasClient, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + // Fetch disk metrics using the following parameters: + partitionName := "data" + diskMetricsGranularity := admin.PtrString("P1D") + diskMetricsPeriod := admin.PtrString("P1D") + diskMetrics := []string{ + "DISK_PARTITION_SPACE_FREE", "DISK_PARTITION_SPACE_USED", + } + + diskMeasurementsParams := &admin.GetDiskMeasurementsApiParams{ + GroupId: config.ProjectID, + ProcessId: config.ProcessID, + PartitionName: partitionName, + M: &diskMetrics, + Granularity: diskMetricsGranularity, + Period: diskMetricsPeriod, + } + _, err = getDiskMetrics(ctx, *atlasClient, diskMeasurementsParams) + if err != nil { + fmt.Printf("Error fetching disk metrics: %v", err) + } +} + diff --git a/source/includes/examples/generated-examples/main.snippet.get-metrics-main-dev.go b/source/includes/examples/generated-examples/main.snippet.get-metrics-main-dev.go new file mode 100644 index 00000000..4cd22c5b --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-metrics-main-dev.go @@ -0,0 +1,31 @@ +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using OAuth2 with service account credentials + atlasClient, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + // Fetch disk metrics using the following parameters: + partitionName := "data" + diskMetricsGranularity := admin.PtrString("P1D") + diskMetricsPeriod := admin.PtrString("P1D") + diskMetrics := []string{ + "DISK_PARTITION_SPACE_FREE", "DISK_PARTITION_SPACE_USED", + } + + diskMeasurementsParams := &admin.GetDiskMeasurementsApiParams{ + GroupId: config.ProjectID, + ProcessId: config.ProcessID, + PartitionName: partitionName, + M: &diskMetrics, + Granularity: diskMetricsGranularity, + Period: diskMetricsPeriod, + } + _, err = getDiskMetrics(ctx, *atlasClient, diskMeasurementsParams) + if err != nil { + fmt.Printf("Error fetching disk metrics: %v", err) + } +} + diff --git a/source/includes/examples/generated-examples/main.snippet.get-metrics-main-prod.go b/source/includes/examples/generated-examples/main.snippet.get-metrics-main-prod.go new file mode 100644 index 00000000..f07839ad --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-metrics-main-prod.go @@ -0,0 +1,32 @@ +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using 
OAuth2 with service account credentials + atlasClient, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + // Fetch process metrics using the following parameters: + processMetricGranularity := admin.PtrString("PT1H") + processMetricPeriod := admin.PtrString("P7D") + processMetrics := []string{ + "OPCOUNTER_INSERT", "OPCOUNTER_QUERY", "OPCOUNTER_UPDATE", "TICKETS_AVAILABLE_READS", + "TICKETS_AVAILABLE_WRITE", "CONNECTIONS", "QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED", + "QUERY_TARGETING_SCANNED_PER_RETURNED", "SYSTEM_CPU_GUEST", "SYSTEM_CPU_IOWAIT", + "SYSTEM_CPU_IRQ", "SYSTEM_CPU_KERNEL", "SYSTEM_CPU_NICE", "SYSTEM_CPU_SOFTIRQ", + "SYSTEM_CPU_STEAL", "SYSTEM_CPU_USER", + } + hostMeasurementsParams := &admin.GetHostMeasurementsApiParams{ + GroupId: config.ProjectID, + ProcessId: config.ProcessID, + M: &processMetrics, + Granularity: processMetricGranularity, + Period: processMetricPeriod, + } + _, err = getProcessMetrics(ctx, *atlasClient, hostMeasurementsParams) + if err != nil { + fmt.Printf("Error fetching host process metrics: %v", err) + } +} + diff --git a/source/includes/examples/generated-examples/main.snippet.get-metrics-prod.go b/source/includes/examples/generated-examples/main.snippet.get-metrics-prod.go new file mode 100644 index 00000000..99f27d0f --- /dev/null +++ b/source/includes/examples/generated-examples/main.snippet.get-metrics-prod.go @@ -0,0 +1,67 @@ +// See entire project at https://github.com/mongodb/atlas-architecture-go-sdk +package main + +import ( + "atlas-sdk-go/internal/auth" + "context" + "encoding/json" + "fmt" + "go.mongodb.org/atlas-sdk/v20250219001/admin" + "log" +) + +// getProcessMetrics fetches metrics for a specified host process in a project and prints results to the console +func getProcessMetrics(ctx context.Context, atlasClient admin.APIClient, params *admin.GetHostMeasurementsApiParams) (*admin.ApiMeasurementsGeneralViewAtlas, error) { + fmt.Printf("Fetching metrics for host process %s in project %s", params.ProcessId, params.GroupId) + + resp, _, err := atlasClient.MonitoringAndLogsApi.GetHostMeasurementsWithParams(ctx, params).Execute() + if err != nil { + if apiError, ok := admin.AsError(err); ok { + return nil, fmt.Errorf("failed to get metrics for process in host: %s (API error: %v)", err, apiError.GetDetail()) + } + return nil, fmt.Errorf("failed to get metrics: %w", err) + } + + if resp == nil || resp.HasMeasurements() == false { + return nil, fmt.Errorf("no metrics found for host process %s in project %s", params.ProcessId, params.GroupId) + } + jsonData, err := json.MarshalIndent(resp, "", " ") + if err != nil { + return nil, fmt.Errorf("failed to marshal response: %w", err) + } + fmt.Println(string(jsonData)) + return resp, nil +} + +func main() { + ctx := context.Background() + + // Create an Atlas client authenticated using OAuth2 with service account credentials + atlasClient, _, config, err := auth.CreateAtlasClient() + if err != nil { + log.Fatalf("Failed to create Atlas client: %v", err) + } + + // Fetch process metrics using the following parameters: + processMetricGranularity := admin.PtrString("PT1H") + processMetricPeriod := admin.PtrString("P7D") + processMetrics := []string{ + "OPCOUNTER_INSERT", "OPCOUNTER_QUERY", "OPCOUNTER_UPDATE", "TICKETS_AVAILABLE_READS", + "TICKETS_AVAILABLE_WRITE", "CONNECTIONS", "QUERY_TARGETING_SCANNED_OBJECTS_PER_RETURNED", + "QUERY_TARGETING_SCANNED_PER_RETURNED", "SYSTEM_CPU_GUEST", "SYSTEM_CPU_IOWAIT", + 
"SYSTEM_CPU_IRQ", "SYSTEM_CPU_KERNEL", "SYSTEM_CPU_NICE", "SYSTEM_CPU_SOFTIRQ", + "SYSTEM_CPU_STEAL", "SYSTEM_CPU_USER", + } + hostMeasurementsParams := &admin.GetHostMeasurementsApiParams{ + GroupId: config.ProjectID, + ProcessId: config.ProcessID, + M: &processMetrics, + Granularity: processMetricGranularity, + Period: processMetricPeriod, + } + _, err = getProcessMetrics(ctx, *atlasClient, hostMeasurementsParams) + if err != nil { + fmt.Printf("Error fetching host process metrics: %v", err) + } +} + diff --git a/source/includes/examples/generated-examples/snippet.config.json b/source/includes/examples/generated-examples/snippet.config.json new file mode 100644 index 00000000..207e973c --- /dev/null +++ b/source/includes/examples/generated-examples/snippet.config.json @@ -0,0 +1,7 @@ +{ + "MONGODB_ATLAS_BASE_URL": "https://cloud.mongodb.com", + "ATLAS_ORG_ID": "32b6e34b3d91647abb20e7b8", + "ATLAS_PROJECT_ID": "67212db237c5766221eb6ad9", + "ATLAS_CLUSTER_NAME": "myCluster", + "ATLAS_PROCESS_ID": "myCluster-shard-00-00.ajlj3.mongodb.net:27017" +} diff --git a/source/includes/examples/tf-example-alerts-replication-lag-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-alerts-replication-lag-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-alerts-replication-lag-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-alerts-replication-lag-devtest.rst diff --git a/source/includes/examples/tf-example-auth-grant-roles-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-grant-roles-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-grant-roles-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-grant-roles-devtest.rst diff --git a/source/includes/examples/tf-example-auth-oidc-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-oidc-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-oidc-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-oidc-devtest.rst diff --git a/source/includes/examples/tf-example-auth-scram-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-scram-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-scram-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-scram-devtest.rst diff --git a/source/includes/examples/tf-example-auth-tfoutputs-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-tfoutputs-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-tfoutputs-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-tfoutputs-devtest.rst diff --git a/source/includes/examples/tf-example-auth-tfvars-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-tfvars-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-tfvars-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-tfvars-devtest.rst diff --git a/source/includes/examples/tf-example-auth-variables-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-auth-variables-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-auth-variables-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-auth-variables-devtest.rst diff --git 
a/source/includes/examples/tf-example-main-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-main-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-main-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-main-devtest.rst diff --git a/source/includes/examples/tf-example-monitoring-tfvars-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-monitoring-tfvars-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-monitoring-tfvars-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-monitoring-tfvars-devtest.rst diff --git a/source/includes/examples/tf-example-monitoring-variables-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-monitoring-variables-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-monitoring-variables-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-monitoring-variables-devtest.rst diff --git a/source/includes/examples/tf-example-tfvars-devtest.rst b/source/includes/examples/terraform/dev-test/tf-example-tfvars-devtest.rst similarity index 100% rename from source/includes/examples/tf-example-tfvars-devtest.rst rename to source/includes/examples/terraform/dev-test/tf-example-tfvars-devtest.rst diff --git a/source/includes/examples/tf-example-alerts-no-primary-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-alerts-no-primary-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-alerts-no-primary-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-alerts-no-primary-stagingprod.rst diff --git a/source/includes/examples/tf-example-alerts-replication-lag-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-alerts-replication-lag-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-alerts-replication-lag-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-alerts-replication-lag-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-create-custom-role-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-create-custom-role-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-auth-create-custom-role-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-create-custom-role-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-create-oidc-user-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-create-oidc-user-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-auth-create-oidc-user-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-create-oidc-user-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-oidc-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-oidc-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-auth-oidc-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-oidc-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-tfoutputs-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-tfoutputs-stagingprod.rst similarity index 100% rename from 
source/includes/examples/tf-example-auth-tfoutputs-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-tfoutputs-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-tfvars-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-tfvars-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-auth-tfvars-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-tfvars-stagingprod.rst diff --git a/source/includes/examples/tf-example-auth-variables-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-auth-variables-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-auth-variables-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-auth-variables-stagingprod.rst diff --git a/source/includes/examples/tf-example-autoscaling-main-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-autoscaling-main-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-autoscaling-main-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-autoscaling-main-stagingprod.rst diff --git a/source/includes/examples/tf-example-main-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst similarity index 92% rename from source/includes/examples/tf-example-main-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst index 883dbbb7..06719e34 100644 --- a/source/includes/examples/tf-example-main-stagingprod.rst +++ b/source/includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst @@ -16,8 +16,10 @@ org_id = var.atlas_org_id name = var.atlas_project_name # Assign the Project the Group with Specific Roles - team_id = mongodbatlas_team.project_group.team_id - role_names = ["GROUP_READ_ONLY", "GROUP_CLUSTER_MANAGER"] + teams { + team_id = mongodbatlas_team.project_group.team_id + role_names = ["GROUP_READ_ONLY", "GROUP_CLUSTER_MANAGER"] + } } # Create an Atlas Advanced Cluster diff --git a/source/includes/examples/tf-example-monitoring-tfvars-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-monitoring-tfvars-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-monitoring-tfvars-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-monitoring-tfvars-stagingprod.rst diff --git a/source/includes/examples/tf-example-monitoring-variables-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-monitoring-variables-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-monitoring-variables-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-monitoring-variables-stagingprod.rst diff --git a/source/includes/examples/tf-example-tfvars-autoscaling-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-tfvars-autoscaling-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-tfvars-autoscaling-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-tfvars-autoscaling-stagingprod.rst diff --git a/source/includes/examples/tf-example-tfvars-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-tfvars-stagingprod.rst similarity index 100% rename 
from source/includes/examples/tf-example-tfvars-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-tfvars-stagingprod.rst diff --git a/source/includes/examples/tf-example-third-party-integration-stagingprod.rst b/source/includes/examples/terraform/staging-prod/tf-example-third-party-integration-stagingprod.rst similarity index 100% rename from source/includes/examples/tf-example-third-party-integration-stagingprod.rst rename to source/includes/examples/terraform/staging-prod/tf-example-third-party-integration-stagingprod.rst diff --git a/source/includes/examples/tf-example-access-entry-for-add-1.rst b/source/includes/examples/terraform/tf-example-access-entry-for-add-1.rst similarity index 100% rename from source/includes/examples/tf-example-access-entry-for-add-1.rst rename to source/includes/examples/terraform/tf-example-access-entry-for-add-1.rst diff --git a/source/includes/examples/tf-example-auditing-filter.rst b/source/includes/examples/terraform/tf-example-auditing-filter.rst similarity index 100% rename from source/includes/examples/tf-example-auditing-filter.rst rename to source/includes/examples/terraform/tf-example-auditing-filter.rst diff --git a/source/includes/examples/tf-example-auth-tfazure.rst b/source/includes/examples/terraform/tf-example-auth-tfazure.rst similarity index 100% rename from source/includes/examples/tf-example-auth-tfazure.rst rename to source/includes/examples/terraform/tf-example-auth-tfazure.rst diff --git a/source/includes/examples/tf-example-autoscaling-variables.rst b/source/includes/examples/terraform/tf-example-autoscaling-variables.rst similarity index 100% rename from source/includes/examples/tf-example-autoscaling-variables.rst rename to source/includes/examples/terraform/tf-example-autoscaling-variables.rst diff --git a/source/includes/examples/tf-example-aws-kms.rst b/source/includes/examples/terraform/tf-example-aws-kms.rst similarity index 100% rename from source/includes/examples/tf-example-aws-kms.rst rename to source/includes/examples/terraform/tf-example-aws-kms.rst diff --git a/source/includes/examples/tf-example-azure-key-vault.rst b/source/includes/examples/terraform/tf-example-azure-key-vault.rst similarity index 100% rename from source/includes/examples/tf-example-azure-key-vault.rst rename to source/includes/examples/terraform/tf-example-azure-key-vault.rst diff --git a/source/includes/examples/tf-example-backup-snapshot-pit-restore.rst b/source/includes/examples/terraform/tf-example-backup-snapshot-pit-restore.rst similarity index 100% rename from source/includes/examples/tf-example-backup-snapshot-pit-restore.rst rename to source/includes/examples/terraform/tf-example-backup-snapshot-pit-restore.rst diff --git a/source/includes/examples/tf-example-backup-snapshot-schedule-tier1.rst b/source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier1.rst similarity index 100% rename from source/includes/examples/tf-example-backup-snapshot-schedule-tier1.rst rename to source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier1.rst diff --git a/source/includes/examples/tf-example-backup-snapshot-schedule-tier2.rst b/source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier2.rst similarity index 100% rename from source/includes/examples/tf-example-backup-snapshot-schedule-tier2.rst rename to source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier2.rst diff --git a/source/includes/examples/tf-example-backup-snapshot-schedule-tier3.rst 
b/source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier3.rst similarity index 100% rename from source/includes/examples/tf-example-backup-snapshot-schedule-tier3.rst rename to source/includes/examples/terraform/tf-example-backup-snapshot-schedule-tier3.rst diff --git a/source/includes/examples/tf-example-backup-variables.rst b/source/includes/examples/terraform/tf-example-backup-variables.rst similarity index 100% rename from source/includes/examples/tf-example-backup-variables.rst rename to source/includes/examples/terraform/tf-example-backup-variables.rst diff --git a/source/includes/examples/tf-example-gcp-kms.rst b/source/includes/examples/terraform/tf-example-gcp-kms.rst similarity index 100% rename from source/includes/examples/tf-example-gcp-kms.rst rename to source/includes/examples/terraform/tf-example-gcp-kms.rst diff --git a/source/includes/examples/tf-example-private-link.rst b/source/includes/examples/terraform/tf-example-private-link.rst similarity index 100% rename from source/includes/examples/tf-example-private-link.rst rename to source/includes/examples/terraform/tf-example-private-link.rst diff --git a/source/includes/examples/tf-example-provider.rst b/source/includes/examples/terraform/tf-example-provider.rst similarity index 100% rename from source/includes/examples/tf-example-provider.rst rename to source/includes/examples/terraform/tf-example-provider.rst diff --git a/source/includes/examples/tf-example-variables.rst b/source/includes/examples/terraform/tf-example-variables.rst similarity index 100% rename from source/includes/examples/tf-example-variables.rst rename to source/includes/examples/terraform/tf-example-variables.rst diff --git a/source/includes/examples/tf-example-vpc-connection.rst b/source/includes/examples/terraform/tf-example-vpc-connection.rst similarity index 100% rename from source/includes/examples/tf-example-vpc-connection.rst rename to source/includes/examples/terraform/tf-example-vpc-connection.rst diff --git a/source/includes/examples/tf-dev-test-complete/aws-roles.tf b/source/includes/examples/tf-dev-test-complete/aws-roles.tf new file mode 100644 index 00000000..a3304c1e --- /dev/null +++ b/source/includes/examples/tf-dev-test-complete/aws-roles.tf @@ -0,0 +1,23 @@ +resource "aws_iam_role" "test_role" { + name = "mongodb_atlas_kms_test_role" + + assume_role_policy = < every 1 day retention_unit = "days" - retention_value = 4 + retention_value = 4 } policy_item_weekly { frequency_interval = 4 # accepted values = 1 to 7 -> every 1=Monday,2=Tuesday,3=Wednesday,4=Thursday,5=Friday,6=Saturday,7=Sunday day of the week diff --git a/source/includes/examples/tf-staging-prod-complete/private_link.tf b/source/includes/examples/tf-staging-prod-complete/private_link.tf new file mode 100644 index 00000000..c4bf020b --- /dev/null +++ b/source/includes/examples/tf-staging-prod-complete/private_link.tf @@ -0,0 +1,27 @@ +# AWS ONLY - Create a Private Link +resource "mongodbatlas_privatelink_endpoint" "aws_test" { + # Conditionally create this resource if the provider is "AWS" + count = var.cloud_provider == "AWS" ? 1 : 0 + project_id = mongodbatlas_project.atlas-project.id + provider_name = "AWS" + region = "US_EAST_1" + + timeouts { + create = "30m" + delete = "20m" + } +} + +# AZURE ONLY - Create a Private Link +resource "mongodbatlas_privatelink_endpoint" "azure_test" { + # Conditionally create this resource if the provider is "AZURE" + count = var.cloud_provider == "AZURE" ? 
1 : 0 + project_id = mongodbatlas_project.atlas-project.id + provider_name = "AZURE" + region = "US_EAST_1" + + timeouts { + create = "30m" + delete = "20m" + } +} \ No newline at end of file diff --git a/source/includes/examples/tf-staging-prod-complete/provider.tf b/source/includes/examples/tf-staging-prod-complete/provider.tf new file mode 100644 index 00000000..34daed0d --- /dev/null +++ b/source/includes/examples/tf-staging-prod-complete/provider.tf @@ -0,0 +1,12 @@ +# Define the MongoDB Atlas Provider +terraform { + required_providers { + mongodbatlas = { + source = "mongodb/mongodbatlas" + } + } + required_version = ">= 0.13" # Update to your preferred minimum Terraform CLI version + # To see which versions are supported by the MongoDB Atlas Provider, see the + # following link: + # https://registry.terraform.io/providers/mongodb/mongodbatlas/latest/docs#hashicorp-terraform-versionhttpswwwterraformiodownloadshtml-compatibility-matrix +} \ No newline at end of file diff --git a/source/includes/examples/tf-staging-prod-complete/terraform.tfvars b/source/includes/examples/tf-staging-prod-complete/terraform.tfvars new file mode 100644 index 00000000..9a76b644 --- /dev/null +++ b/source/includes/examples/tf-staging-prod-complete/terraform.tfvars @@ -0,0 +1,33 @@ +atlas_org_id = "" +atlas_project_name = "" +environment = "prod" +cluster_instance_size_name = "M30" +cloud_provider = "AZURE" +atlas_region = "" +mongodb_version = "8.0" +atlas_group_name = "Atlas Group" + +atlas_project_id = "" +atlas_cluster_name = "" +datadog_api_key = "" +datadog_region = "US5" +prometheus_user_name = "" +prometheus_password = "" + +connection_strings = [""] +token_audience = "https://management.azure.com/" + +auto_scaling_disk_gb = true +auto_scaling_compute = true +disk_size_gb = 40000 + +azure_tenant_id = "" +azure_subscription_id = "" +azure_client_id = "" +azure_client_secret = "" +azure_resource_group_name = "" +azure_key_vault_name = "" +azure_key_identifier = "" + +vm_admin_username = "" +ssh_public_key = "" diff --git a/source/includes/examples/tf-staging-prod-complete/variables.tf b/source/includes/examples/tf-staging-prod-complete/variables.tf new file mode 100644 index 00000000..14fc93dd --- /dev/null +++ b/source/includes/examples/tf-staging-prod-complete/variables.tf @@ -0,0 +1,164 @@ +# Atlas Organization ID +variable "atlas_org_id" { + type = string + description = "Atlas Organization ID" +} + +# Atlas Project Name +variable "atlas_project_name" { + type = string + description = "Atlas Project Name" +} + +# Atlas Project Environment +variable "environment" { + type = string + description = "The environment to be built" +} + +# Cluster Instance Size Name +variable "cluster_instance_size_name" { + type = string + description = "Cluster instance size name" +} + +# Cloud Provider to Host Atlas Cluster +variable "cloud_provider" { + type = string + description = "AWS, GCP, or AZURE" +} + +# Atlas Region +variable "atlas_region" { + type = string + description = "Atlas region where resources will be created" +} + +# MongoDB Version +variable "mongodb_version" { + type = string + description = "MongoDB Version" +} + +# Atlas Group Name +variable "atlas_group_name" { + type = string + description = "Atlas Group Name" +} + +# Storage Auto-scaling Enablement Flag +variable "auto_scaling_disk_gb" { + type = bool + description = "Flag that specifies whether disk auto-scaling is enabled" +} + +# Compute Auto-scaling Enablement Flag +variable "auto_scaling_compute" { + type = bool + description = "Flag that
specifies whether cluster tier auto-scaling is enabled" +} + +# Disk Size in GB +variable "disk_size_gb" { + type = number + description = "Disk Size in GB" +} + +# Atlas Project ID +variable "atlas_project_id" { + type = string + description = "MongoDB Atlas project id" +} + +# Atlas Cluster Name +variable "atlas_cluster_name" { + type = string + description = "MongoDB Atlas Cluster Name" + default = "datadog-test-cluster" +} + +# Datadog API Key +variable "datadog_api_key" { + type = string + description = "Datadog api key" +} + +# Datadog Region +variable "datadog_region" { + type = string + description = "Datadog region" + default = "US5" +} + +# Prometheus User Name +variable "prometheus_user_name" { + type = string + description = "The Prometheus User Name" + default = "puser" +} + +# Prometheus Password +variable "prometheus_password" { + type = string + description = "The Prometheus Password" + default = "ppassword" +} + +# Azure Variables + +variable "token_audience" { + type = string + default = "https://management.azure.com/" + description = "Used as resource when getting the access token. See more in the [Azure documentation](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-to-use-vm-token#get-a-token-using-http)" +} + +variable "azure_tenant_id" { + type = string + description = "Azure Tenant ID" +} + +variable "azure_subscription_id" { + type = string + description = "Azure Subscription ID" +} + +variable "azure_client_id" { + type = string + description = "Azure Client ID" +} + +variable "azure_client_secret" { + type = string + description = "Azure Client Secret" +} + +variable "azure_resource_group_name" { + type = string + description = "Azure Resource Group Name" +} + +variable "azure_key_vault_name" { + type = string + description = "Azure Key Vault Name" +} + +variable "azure_key_identifier" { + type = string + description = "Azure Key Identifier" +} + +# MongoDB Atlas variables +variable "connection_strings" { + type = list(string) + description = "MongoDB Atlas Cluster Standard Connection Strings" +} + +variable "vm_admin_username" { + type = string + description = "VM Admin Username" +} + +variable "ssh_public_key" { + type = string + description = "SSH Public Key" +} \ No newline at end of file diff --git a/source/includes/images/atlas-database-replication.png b/source/includes/images/atlas-database-replication.png new file mode 100644 index 00000000..9d47c56f Binary files /dev/null and b/source/includes/images/atlas-database-replication.png differ diff --git a/source/includes/images/atlas-self-healing.png b/source/includes/images/atlas-self-healing.png new file mode 100644 index 00000000..4997b003 Binary files /dev/null and b/source/includes/images/atlas-self-healing.png differ diff --git a/source/includes/images/deployment-types.png b/source/includes/images/deployment-types.png new file mode 100644 index 00000000..7b0a6729 Binary files /dev/null and b/source/includes/images/deployment-types.png differ diff --git a/source/includes/images/global-template.png b/source/includes/images/global-template.png new file mode 100644 index 00000000..d8836f4d Binary files /dev/null and b/source/includes/images/global-template.png differ diff --git a/source/includes/images/global.svg b/source/includes/images/global.svg new file mode 100644 index 00000000..cf6265dc --- /dev/null +++ b/source/includes/images/global.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/source/includes/images/hybrid.svg 
b/source/includes/images/hybrid.svg new file mode 100644 index 00000000..ba5318db --- /dev/null +++ b/source/includes/images/hybrid.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/source/includes/images/multi-cloud.png b/source/includes/images/multi-cloud.png new file mode 100644 index 00000000..3c5ce000 Binary files /dev/null and b/source/includes/images/multi-cloud.png differ diff --git a/source/includes/images/multi-region-3+3.png b/source/includes/images/multi-region-3+3.png new file mode 100644 index 00000000..92792962 Binary files /dev/null and b/source/includes/images/multi-region-3+3.png differ diff --git a/source/includes/images/multi-region-5+2.png b/source/includes/images/multi-region-5+2.png new file mode 100644 index 00000000..0a43077d Binary files /dev/null and b/source/includes/images/multi-region-5+2.png differ diff --git a/source/includes/images/multi-region-multi-cloud.svg b/source/includes/images/multi-region-multi-cloud.svg new file mode 100644 index 00000000..0e9bcb85 --- /dev/null +++ b/source/includes/images/multi-region-multi-cloud.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/source/includes/images/multi-region-single-cloud.svg b/source/includes/images/multi-region-single-cloud.svg new file mode 100644 index 00000000..ea291c2e --- /dev/null +++ b/source/includes/images/multi-region-single-cloud.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/source/includes/images/multi-region-tier-1.png b/source/includes/images/multi-region-tier-1.png new file mode 100644 index 00000000..99fd03de Binary files /dev/null and b/source/includes/images/multi-region-tier-1.png differ diff --git a/source/includes/images/multi-region-tier-2.png b/source/includes/images/multi-region-tier-2.png new file mode 100644 index 00000000..0a43077d Binary files /dev/null and b/source/includes/images/multi-region-tier-2.png differ diff --git a/source/includes/images/multi-region-types.png b/source/includes/images/multi-region-types.png new file mode 100644 index 00000000..0b412c01 Binary files /dev/null and b/source/includes/images/multi-region-types.png differ diff --git a/source/includes/images/single-region.png b/source/includes/images/single-region.png new file mode 100644 index 00000000..6539b54b Binary files /dev/null and b/source/includes/images/single-region.png differ diff --git a/source/includes/images/single-region.svg b/source/includes/images/single-region.svg new file mode 100644 index 00000000..a3349039 --- /dev/null +++ b/source/includes/images/single-region.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/source/includes/rec-list.rst b/source/includes/rec-list.rst new file mode 100644 index 00000000..c3f7d66d --- /dev/null +++ b/source/includes/rec-list.rst @@ -0,0 +1,31 @@ +To find recommendations for your {+service+} cloud deployments, +refer to the following resources: + +- :ref:`arch-center-orgs-projects-clusters-recs` + +- Operational Efficiency + + - :ref:`arch-center-monitoring-alerts-recs` + +- Security + + - :ref:`arch-center-network-security-recs` + - :ref:`arch-center-authentication-recs` + - :ref:`arch-center-authorization-recs` + - :ref:`arch-center-data-encryption-recs` + - :ref:`arch-center-auditing-logging-recs` + +- Reliability + + - :ref:`arch-center-ha-recs` + - :ref:`arch-center-resiliency-recs` + - :ref:`arch-center-backups-recs` + - :ref:`arch-center-dr-recs` + +- Performance + + - :ref:`arch-center-scalability-recs` + +- Cost Optimization + + - :ref:`arch-center-cost-saving-config` diff --git a/source/index.txt
b/source/index.txt index ac458131..582e8210 100644 --- a/source/index.txt +++ b/source/index.txt @@ -58,6 +58,14 @@ Explore the following resources to get started with |service| and the Learn how to define a landing zone for your organization. + .. card:: + :headline: Deployment Paradigms + :url: https://mongodb.com/docs/atlas/architecture/deployment-paradigms/ + :icon: cloud_global + :icon-alt: icon + + Choose your deployment paradigm, such as a multi-region, global, or hybrid deployment. + .. card:: :headline: Orgs, Projects, and Clusters :url: https://mongodb.com/docs/atlas/architecture/hierarchy/ @@ -66,6 +74,22 @@ Explore the following resources to get started with |service| and the Set up the foundational {+service+} components. + .. card:: + :headline: Migration + :url: https://mongodb.com/docs/atlas/architecture/migration/ + :icon: mdb_live_migration + :icon-alt: icon + + Select a method for migrating to |service|. + + .. card:: + :headline: Operational Readiness Checklist + :url: https://mongodb.com/docs/atlas/architecture/operational-readiness-checklist/ + :icon: general_action_audit + :icon-alt: icon + + Use a checklist to help you prepare for a deployment. + Learn Best Practices for {+service+} ------------------------------------ @@ -115,7 +139,6 @@ Find features and best practices for each {+service+} {+waf+} pillar. Cost-saving configurations and billing data features and best practices. - .. toctree:: :titlesonly: diff --git a/source/landing-zone.txt b/source/landing-zone.txt index b1180aaf..bbe442a3 100644 --- a/source/landing-zone.txt +++ b/source/landing-zone.txt @@ -19,193 +19,154 @@ Landing Zone Design :depth: 2 :class: onecol -This page: - -- Introduces the concept of a landing zone, why you need one, and the - considerations that influence a landing zone's design. -- Helps you design a starter landing zone that gives - prescriptive guidance on: - - - How your teams can implement {+service+} in accordance with both - the {+waf+} pillars and your organization's unique requirements. - - How {+service+} fits into your organization's larger ecosystem and - architecture. - -MongoDB's Professional Services team partners with enterprise -customers to create custom landing zones for {+service+}. If you're -working with MongoDB's Professional Services, the resources on this -page can also help you plan for those discussions. - -Landing Zone Overview ---------------------- +A landing zone is a framework for establishing a well-architected and +pre-configured cloud environment. A MongoDB {+service+} landing zone defines organization-specific requirements +for operational efficiency, security, reliability, performance, and cost, +as well as the tools, procedures, and {+service+} configurations that teams must use to meet these requirements. +We recommend that all enterprise customers design a landing zone before moving workloads to {+service+}. -What Is A Landing Zone? -~~~~~~~~~~~~~~~~~~~~~~~ +Designing and implementing a landing zone can help you avoid expensive +efforts to redesign initial setups later down the line. +For example, one enterprise customer worked with {+ps+} in 2024 to set +company-wide standards for security and cost-efficiency before moving their workload to {+service+}. +Peer financial services companies had published new reports of user data leaks and runaway cloud costs due to +inconsistent encryption policies and redundant servers across business units. 
+To avoid these risks, {+ps+} helped our enterprise customer design a landing zone that aligns all of their +teams and stakeholders on company requirements, including encryption at rest with {+byok+} and FinOps integrations to tag and track spending. -A landing zone is a framework for establishing a well-architected and -pre-configured cloud environment for {+service+} that conforms to your organization's unique requirements across security, compliance, and governance. A landing zone is often a prerequisite for enterprises to -move workloads to the cloud, and it is often provisioned -programmatically using an API or tools like Terraform. - -An {+service+} landing zone defines the default, minimum, and maximum -settings that teams use to deploy workloads in {+service+}. A landing -zone also defines the tools and settings that teams should leverage in -order to integrate systems with {+service+} and their applications -connecting to {+service+}. - -Why Do You Need a Landing Zone? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Establishing a landing zone gives you confidence that -every workload your team deploys aligns with your requirements around -security, performance, reliability, operational efficiency, and cost -optimization. Designing and sharing out a landing zone helps -ensure alignment across all teams and stakeholders on how your -organization will work in {+service+}, and it prevents developers on -all of your teams following different standards. .. _landing-zone-considerations: -What Are Some Important Landing Zone Considerations? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +What Are Some Landing Zone Considerations? +------------------------------------------ -Consider the following organization-specific -requirements when you create a landing zone: +As you create your landing zone, define requirements for the following considerations: .. list-table:: :header-rows: 1 - :widths: 20 40 + :widths: 20 80 * - Consideration - Description - - * - System Hierarchy Requirements - - Identify how you will group database {+clusters+} for management - and isolation. For example, clarify how your team should arrange - {+service+} organizations, projects, and {+clusters+}. + * - System Hierarchy + - Choose a deployment hierarchy that groups your {+service+} organizations, projects, and clusters + to provide isolation between business units, environments, and billable groups as needed. + For example, you can group business units into individual organizations to ensure that + sales credentials cannot authenticate to product resources. To get recommendations and learn more about this topic, see :ref:`arch-center-hierarchy`. - * - Access Controls - - Identify MongoDB {+service+} Control Plane access controls, and - database access controls for both workload and workforce principals. Create a comprehensive list of principals and mechanisms for how you will authenticate and authorize. Define {+service+} API key policies, including authorizations and internal policies for key rotation. - To get recommendations and learn more about this topic, see - :ref:`arch-center-auth`. - * - Change Control and Auditability Requirements - - Clarify any change control or audit controls requirements. This - can include change approval processes and tools, along with - reporting guidelines. + * - Security + - By default, {+service+} blocks all access to your clusters, enforces |tls-ssl| encryption for all connections to your databases, + and encrypts data at rest using :ref:`cloud provider disk encryption `.
+ You must define the following security requirements to enable secure access to your clusters: + + - Network security configurations such as IP access list restrictions or required private endpoints and |vpc| connections to limit + the extension of your network trust boundary + - Principals and mechanisms for how to authenticate and authorize access to + the |service| control plane (|service| UI and {+atlas-admin-api+}), database resources, and database operations + - Additional data encryption requirements for data in transit, at rest, and in use + + To get recommendations and learn more about this topic, see the following pages: + + - :ref:`arch-center-network-security` + - :ref:`arch-center-auth` + - :ref:`arch-center-data-encryption` + + * - Compliance + - Consider how your deployment's data residency determines data sovereignty + and therefore which laws apply to your data. Identify and account for any specific legal and regulatory + requirements not clearly articulated within other categories of requirements. + + Your data residency depends on which regions and geographies you choose to deploy to in your deployment paradigm, + and whether you choose to partition your data between geographies. + + To get recommendations and learn more about this topic, see the following pages: + + - :ref:`arch-center-compliance` + - :ref:`arch-center-paradigms` + + * - Disaster Recovery + - Define and record a disaster recovery plan that meets the following criteria: + + - Defines an optimal :abbr:`RPO (Recovery Point Objective)` and :abbr:`RTO (Recovery Time Objective)` for your organization + - Defines a backup snapshot schedule and requirements for snapshot retention and multi-region distribution + - Identifies recovery methods such as snapshot restores, queryable backups, document versioning, region shifting, + or provider pivot + - Defines disaster recovery procedures for possible disaster scenarios such as zonal, regional, or cloud-provider + outages, resource failures, data corruption events, and more. To get recommendations and learn more about this topic, see - :ref:`arch-center-auditing-logging`. - * - Data Residency Requirements - - Identify data residency requirements, such as data sovereignty - requirements, or application data partitioning rules. - * - Network Topology Requirements - - Identify network topology requirements, including Cloud Provider - regions, private connectivity options, and application - deployment topology. + the following pages: + + - :ref:`arch-center-dr` + - :ref:`arch-center-backups` + + * - High Availability + - Set standards for high availability that ensure system operation during planned or + unplanned outages. These requirements will partially determine the number of nodes, + regions, and cloud providers in your deployment paradigm. To get recommendations and learn more about this topic, see - :ref:`arch-center-network-security`. - * - High Availability Requirements - - Identify high availability requirements such as cluster topology - with node count and priority per region, global write - requirements, and partitioning tolerances. + the following pages: + + - :ref:`arch-center-high-availability` + - :ref:`arch-center-paradigms` + + * - Billing + - Identify any specific requirements for billing, such as + integrations with FinOps tools for reporting and charge-back. + You can build these requirements into the automation and + provisioning process for {+service+} clusters to facilitate + this integration.
To get recommendations and learn more about this topic, see - :ref:`arch-center-high-availability`. - * - Data Retention Requirements + :ref:`arch-center-billing-data`. + + * - Data Retention - Identify and record your data retention policies. This may require creating a classification for automation, including creation of archive or purge automation. In some cases, data must be preserved for a certain duration, whereas in other cases data must be purged after some duration. Identify performance characteristics of retrieval of archived records. - * - Disaster Recovery Requirements - - Identify all disaster recovery requirements, including: - - - Identify and record your current disaster recovery plan. - - Define backup snapshot schedules and retentions. - - Identify Restore Time Objective (RTO) and Restore Point - Objective (RPO). - - Define backup snapshot locations. - - Identify Point-in-time restore option selection. - - Define recovery methodologies such as restore, queryable - backups, document versioning, region shifting, and cloud - provider pivot. - - To get recommendations and learn more about this topic, see :ref:`arch-center-dr` and - :ref:`arch-center-backups`. - - * - Observability Requirements - - Identify requirements for external system integrations that you - use for observability, such as - ingestion of {+service+} log files, activity - feed data, audit logs, alerts, and metrics. Identify required alert events, thresholds, recipients, and - alert delivery mechanism(s). + + * - Monitoring and Observability + - Set standards for observability that define which logs and metrics you will monitor and how you will monitor them. For example, set up + external system integrations that ingest {+service+} log files, audit logs, or activity feed data, + or configure alerts and reporting guidelines for certain event types. To get recommendations and learn more about this topic, see - :ref:`arch-center-monitoring-alerts`. + the following pages: + + - :ref:`arch-center-monitoring-alerts`. + - :ref:`arch-center-auditing-logging`. - * - Security - - Identify and define any specific security - requirements, including requirements for encryption, network security, authentication, and authorization. Consider integration requirements with Security - Information and Event Management (SIEM) systems. + * - Auditing and Change Control + - Define any auditing or change control requirements. This + can include change approval processes and tools, along with + reporting guidelines. - To get recommendations and learn more about this topic, see the following resources: + To get recommendations and learn more about this topic, see + :ref:`arch-center-auditing-logging`. - - :ref:`arch-center-data-encryption` - - :ref:`arch-center-network-security` - - :ref:`arch-center-auth` - * - Legal and Compliance Requirements - - Identify and define any specific legal or compliance - requirements not clearly articulated within other requirements - definitions. - - To learn more about compliance, see - :ref:`arch-center-compliance`. - * - Maintenance Requirements + * - Maintenance - Identify whether you have any specific requirements regarding Maintenance windows or upgrade deferments. - * - Backup Snapshot Requirements - - Identify all backup snapshot requirements, including: - - - Identify snapshot schedule, retention, and multi-region - distribution requirements. - - Identify Point-in-time (PIT) backup requirements. - - Identify Encryption at rest requirements for backup snapshots. 
- - Identify snapshot access requirements (for example, should - snapshot download be allowed?). - - To get recommendations and learn more about this topic, see - :ref:`arch-center-backups`. - * - Billing Requirements - - Identify any specific requirements to billing, such as - integrations with FinOps tools for reporting and charge-back. - You can build these requirements into the automation and - provisioning process for {+service+} {+clusters+} to facilitate - this integration. - - To get recommendations and learn more about this topic, see - :ref:`arch-center-billing-data`. - -Finally, prioritize requirements based on their importance and impact. Design Your Landing Zone ------------------------ -Use the following resources to design an {+service+} landing zone. -We recommend that you compile all diagrams, recommendations, and -examples in a document and adjust them to meet your organization's -requirements. +{+ps+} team partners with enterprise +customers to create custom landing zones for {+service+}. If you're +working with {+ps+}, the resources on this +page can also help you plan for those discussions. +Use the following resources as a starting point for your {+service+} landing zone. Designing a landing zone is an iterative process that involves reviewing -and aligning based on the best path forward for integration to satisfy -requirements. The following resources -provide a starting point to begin your organization's first {+service+} -landing zone. +and realigning standards across teams. We recommend that you compile all diagrams, recommendations, and +examples in a document and adjust them to meet your organization's requirements. Example Landing Zone Diagram ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -229,8 +190,6 @@ Then, review the guidance and examples for each of the pages nested under each { diagrams, recommendations, tools, and examples that are relevant for your organization. -.. include:: /includes/complete-examples.rst - The {+atlas-arch-center+} pages include: - Operational Efficiency @@ -275,5 +234,6 @@ organization, see the previous section on Next Steps ---------- -See the :ref:`arch-center-hierarchy` page to learn about the building blocks of your {+service+} enterprise estate or use the left +See the :ref:`arch-center-migration` page to plan your migration +to {+service+} or use the left navigation to find features and best practices for each {+waf+} pillar. diff --git a/source/migration.txt b/source/migration.txt new file mode 100644 index 00000000..6a26a987 --- /dev/null +++ b/source/migration.txt @@ -0,0 +1,178 @@ +.. _arch-center-migration: + +========= +Migration +========= + +.. default-domain:: mongodb + +.. facet:: + :name: genre + :values: reference + +.. meta:: + :keywords: atlas architecture center + :description: Learn about live migrating data into Atlas from on-premises MongoDB databases using several automated processes. + +.. contents:: On this page + :local: + :backlinks: none + :depth: 1 + :class: onecol + +You can migrate data from your on-premises MongoDB deployments to |service| using one of +a variety of methods. We recommend using |service| :atlas:`live migration ` +when possible because it automates many of the tasks with the least amount of downtime, but you +can use other tools that accommodate the variety and complexity inherent to database migration. + +Live Migration Overview +----------------------- + +|service| live migration automates moving data from on-premises MongoDB databases to |service|. 
+You can pull data from an on-premises MongoDB database or push data using |com|, +but with either method, |service| live migration includes the following features: + +- The migration host always encrypts traffic to the |service| cluster. To encrypt + data end-to-end, :manual:`enable TLS on your source cluster `. Only users with specific + Role-Based Access Control (RBAC) :ref:`roles ` + (such as :authrole:`backup`, :authrole:`readAnyDatabase`, or :authrole:`clusterMonitor`) + can initiate live migration. Users authenticate to clusters using + :manual:`SCRAM-SHA-1 or SCRAM-SHA-256 `. + +- Live migration automates most tasks. For the fully-managed "pull" and "push" methods, live migration + monitors key metrics, provisions the host servers, and manages the strict sequencing of the migration commands. + You specify the resource requirements and scaling options to prevent over-provisioning. + +- Detailed instructions help you provision migration hosts and scale destination clusters + to control costs. Recommendations include appropriate cluster sizing + and temporary scaling, followed by resizing to optimal levels post-migration. + +- Live migration uses |mongosync| to facilitate fast cutover through parallel data copying. + Processes manage temporary network interruptions and cluster elections, using continuous + data synchronization and a final cutover phase to achieve minimal downtime. Retry + mechanisms and pre-migration validations enhance resilience against interruptions. + +- Monitor migrations with real-time status updates and notifications. + +Live Migration Methods +---------------------- + +You can use |com| to push data into |service| or use a live migration server +to pull data into |service|. + +Ensure you allocate adequate |cpu| and network resources for the migration host. +While you can run multiple concurrent migrations, each deployment must have a +dedicated migration host. + +All live migration methods require that the source and destination databases run +MongoDB 6.0.13+ or MongoDB 7.0.8+. To migrate data from databases using prior +versions of MongoDB, see :atlas:`Legacy Migration ` or +:ref:`arch-center-manual-migration`. + +* **Pull data into Atlas.** |service| pulls data from the source MongoDB deployment + and requires access to the source deployment through the deployment's firewall. When the + clusters are nearly synced, you must stop write operations on the source, + redirect applications to the |service| cluster, and restart them. The following + considerations apply: + + - Best for deployments not monitored by |com|. + - The source database must be publicly accessible to allow inbound access from the live migration server. + - Doesn't support :ref:`VPC peering ` or :ref:`private endpoints ` + for either the source or destination cluster. + - Source and destination cluster topologies must match. For example, both + must be replica sets or sharded clusters with the same number of shards. + - Plan for a short downtime window in which you stop writes and restart applications with a new connection string. + The migration process is CPU-intensive and requires significant network bandwidth. MongoDB + recommends that you run each migration host on its own dedicated server with high + network bandwidth. Provision the migration host with the following minimum configuration: + + ..
list-table:: + :header-rows: 1 + :widths: 20 40 + + * - Number of VMs + - 3 VMs total: 2 for a sharded cluster, 1 for a replica set + + * - Purpose + - Mongosync + + * - Location + - Must be able to reach both the on-premises deployment and the public cloud + + * - CPU + - 8 + + * - Memory + - 32 GB + + * - OS + - 64-bit + + * - Disk Size + - Enough disk space for logging + + To ensure a smooth migration process, confirm that the source cluster's oplog size is adequate to + cover the entire migration duration. For one way to check the current oplog window, see the sketch that follows this page. + + For full migration recommendations and instructions, see :ref:`c2c-pull-live-migration`. + +* **Push data into Atlas.** |com| pushes data to |service| using a secure :term:`link-token` + without requiring access to the source cluster through the cluster's firewall. + During migration, |service| continuously syncs real-time data between the source + and destination clusters until cutover. The following + considerations apply: + + - Data is synchronized in one direction only: changes made to the destination won't + reflect back on the source. + - Supports :ref:`VPC peering ` and :ref:`private endpoints `. + - Source and destination cluster topologies must match. For example, both + must be replica sets or sharded clusters with the same number of shards. + - Ensure that you connect to your |service| cluster from all client hardware where your + applications run. Testing your connection string helps ensure that your data migration + process completes with minimal downtime. + + For full migration recommendations and instructions, see :ref:`c2c-push-live-migration`. + +Monitoring Migrations +--------------------- + +.. include:: /includes/cloud-docs/shared-migration-monitoring-description.rst + +To learn more, see :ref:`monitor-migrations`. + +.. _arch-center-manual-migration: + +Manual Migration Methods +------------------------ + +If |service| live migration can't satisfy the constraints of your migration requirements, +you can bring data from existing MongoDB deployments, ``JSON``, or ``CSV`` files +into |service| using one of the following tools that you run outside of |service|. + +.. include:: /includes/cloud-docs/shared-migration-tools-table.rst + +If you are required to use either |service| VNet peering or Private Link configurations, if you +don't want to allow a direct connection from a third party to your source cluster, or if you +don't already have or don't want to import the source cluster into |onprem| or |mms|, then MongoDB +recommends the mongosync approach. + +If you have relatively small datasets (<300 GB) to migrate, and can afford application downtime +for an extended time period, then MongoDB recommends the :dbtools:`mongodump ` +and :dbtools:`mongorestore ` approach. + +If you have relatively small datasets (<300 GB) to migrate, no index concerns, and can afford +application downtime for an extended time period, then MongoDB recommends the :dbtools:`mongoexport ` +and :dbtools:`mongoimport ` approach. + +Cutover +------- + +.. include:: /includes/cloud-docs/shared-migration-cutover-description.rst + +To learn more, see :ref:`monitor-migrations`. + +Next Steps +---------- + +See the :ref:`arch-center-hierarchy` page to learn about the building blocks of your {+service+} enterprise estate or use the left +navigation to find features and best practices for each {+waf+} pillar.
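The pull-method guidance above asks you to confirm that the source cluster's oplog window covers the entire migration, but neither this diff nor the example repo ships a snippet for that check. The following is a minimal sketch, not taken from the Atlas documentation, assuming the MongoDB Go driver (v1) and a hypothetical ``MONGODB_URI`` environment variable that points at the source replica set; it reads the oldest and newest entries in ``local.oplog.rs`` and reports the span between them:

.. code-block:: go

   package main

   import (
   	"context"
   	"fmt"
   	"log"
   	"os"
   	"time"

   	"go.mongodb.org/mongo-driver/bson"
   	"go.mongodb.org/mongo-driver/bson/primitive"
   	"go.mongodb.org/mongo-driver/mongo"
   	"go.mongodb.org/mongo-driver/mongo/options"
   )

   func main() {
   	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
   	defer cancel()

   	// MONGODB_URI is a placeholder for the source cluster's connection string.
   	client, err := mongo.Connect(ctx, options.Client().ApplyURI(os.Getenv("MONGODB_URI")))
   	if err != nil {
   		log.Fatalf("connect to source cluster: %v", err)
   	}
   	defer func() { _ = client.Disconnect(ctx) }()

   	oplog := client.Database("local").Collection("oplog.rs")
   	var first, last struct {
   		TS primitive.Timestamp `bson:"ts"`
   	}

   	// Oldest entry: scan the oplog in natural (insertion) order.
   	oldest := options.FindOne().SetSort(bson.D{{Key: "$natural", Value: 1}})
   	if err := oplog.FindOne(ctx, bson.D{}, oldest).Decode(&first); err != nil {
   		log.Fatalf("read oldest oplog entry: %v", err)
   	}
   	// Newest entry: scan in reverse natural order.
   	newest := options.FindOne().SetSort(bson.D{{Key: "$natural", Value: -1}})
   	if err := oplog.FindOne(ctx, bson.D{}, newest).Decode(&last); err != nil {
   		log.Fatalf("read newest oplog entry: %v", err)
   	}

   	// The window is the wall-clock span between the two entries.
   	window := time.Duration(last.TS.T-first.TS.T) * time.Second
   	fmt.Printf("current oplog window: %s\n", window)
   }

If the reported window is shorter than your expected migration duration, increase the source oplog size before you begin.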
\ No newline at end of file diff --git a/source/monitoring-alerts.txt b/source/monitoring-alerts.txt index b328e699..b0bd5383 100644 --- a/source/monitoring-alerts.txt +++ b/source/monitoring-alerts.txt @@ -10,8 +10,13 @@ Guidance for {+service+} Monitoring and Alerts :name: genre :values: reference +.. facet:: + :name: programming_language + :values: go, shell + .. meta:: :description: Learn how to monitor and set up alerts on your Atlas cluster. + :keywords: code example, atlas sdk for go, terraform configuration, atlas cli .. contents:: On this page :local: @@ -107,8 +112,36 @@ Features for {+service+} Monitoring and Alerts Recommendations for {+service+} Monitoring and Alerts ----------------------------------------------------- +.. collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Single-region deployments have no unique considerations + for integrations with third-party alerting tools, + such as :ref:`Datadog ` and :ref:`Prometheus `. + See :ref:`arch-center-monitoring-alerts-recs-for-any-deployment`. + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + Multi-region and multi-cloud deployments have no unique considerations + for integrations with third-party alerting tools, + such as :ref:`Datadog ` and :ref:`Prometheus `. + See :ref:`arch-center-monitoring-alerts-recs-for-any-deployment`. + +.. _arch-center-monitoring-alerts-recs-for-any-deployment: + +All Deployment Paradigm Recommendations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following recommendations apply to all :ref:`deployment paradigms +`. + Monitor by Using Metrics -~~~~~~~~~~~~~~~~~~~~~~~~ +```````````````````````` To monitor your {+cluster+} or database performance, you can view the {+cluster+} metrics such as historical throughput, performance, and utilization metrics. The following table lists some @@ -151,7 +184,7 @@ custom metrics tools. To learn more, see :ref:`monitor-cluster-metrics`. Monitor by Configuring Alerts -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +````````````````````````````` |service| extends into your existing observability stack so that you can get alerts and make data-driven decisions without replacing your current tools or changing your workflows. {+service+} can send alert notifications with third-party tools like @@ -173,7 +206,7 @@ available filters to limit results by host, replica set, cluster, shard, and more to help focus on the most critical alerts. Recommended {+service+} Alert Configurations -```````````````````````````````````````````` +############################################ At the minimum, we recommend configuring the following alerts. These alerts recommendations provide a baseline, but you should adjust them @@ -348,13 +381,13 @@ multiple alerts for the same condition, one for a low priority level of severity To learn more, see :atlas:`Configure and Resolve Alerts `. Monitor by Using {+service+} Built-In Tools -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +``````````````````````````````````````````` |service| provides several tools that allow you to proactively monitor and improve the performance of your database.
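Before drilling into those built-in tools, note that the alert configurations recommended above can also be reviewed programmatically. The following is a minimal sketch, not part of the example repo: it calls what we understand to be the {+atlas-admin-api+} ``alertConfigs`` endpoint with a plain ``net/http`` client, and it assumes you have already minted an OAuth 2.0 access token from service-account credentials and stored it in a hypothetical ``ATLAS_ACCESS_TOKEN`` environment variable:

.. code-block:: go

   package main

   import (
   	"fmt"
   	"io"
   	"log"
   	"net/http"
   	"os"
   )

   func main() {
   	// Both values are placeholders: the project (group) ID and an OAuth 2.0
   	// access token previously minted from service-account credentials.
   	groupID := os.Getenv("ATLAS_PROJECT_ID")
   	token := os.Getenv("ATLAS_ACCESS_TOKEN")

   	url := fmt.Sprintf("https://cloud.mongodb.com/api/atlas/v2/groups/%s/alertConfigs", groupID)
   	req, err := http.NewRequest(http.MethodGet, url, nil)
   	if err != nil {
   		log.Fatalf("build request: %v", err)
   	}
   	req.Header.Set("Authorization", "Bearer "+token)
   	// Versioned media type, following the Admin API convention.
   	req.Header.Set("Accept", "application/vnd.atlas.2023-01-01+json")

   	resp, err := http.DefaultClient.Do(req)
   	if err != nil {
   		log.Fatalf("call Admin API: %v", err)
   	}
   	defer resp.Body.Close()

   	body, err := io.ReadAll(resp.Body)
   	if err != nil {
   		log.Fatalf("read response: %v", err)
   	}
   	if resp.StatusCode != http.StatusOK {
   		log.Fatalf("unexpected status %d: %s", resp.StatusCode, body)
   	}
   	fmt.Println(string(body)) // raw JSON list of the project's alert configurations
   }

The same pattern should work for other Monitoring and Logs endpoints by swapping the path; verify the exact routes and API version against the current Admin API specification.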
Real-Time Performance Panel -``````````````````````````` +########################### The Real-Time Performance Panel (RTPP) in the {+atlas-ui+} provides insights into current network traffic, database operations, and hardware @@ -377,7 +410,7 @@ with :oas-atlas-op:`Update One Project Settings To learn more, see :ref:`real-time-metrics-status-tab`. Query Profiler -`````````````` +############## The Query Profiler identifies slow queries and bottlenecks, and suggests index refinement and query restructuring to improve the performance of @@ -394,7 +427,7 @@ with :oas-atlas-op:`Return Slow Queries To learn more, see :ref:`query-profiler`. Performance Advisor -``````````````````` +################### The Performance Advisor automatically analyzes logs for slow-running queries and recommends indexes to create and drop. It analyzes slow @@ -416,7 +449,7 @@ with :oas-atlas-op:`Return Slow Queries To learn more, see :ref:`performance-advisor`. Namespace Insights -`````````````````` +################## The Namespace Insights page in the {+atlas-ui+} allows you to monitor collection-level performance and usage metrics. It displays metrics @@ -430,7 +463,7 @@ decisions about scaling, indexing, and query tuning. To learn more, see :ref:`namespace-insights`. Monitor by Using Logs -~~~~~~~~~~~~~~~~~~~~~ +````````````````````` .. include:: /includes/cloud-docs/logs.rst @@ -441,8 +474,6 @@ When you configure this feature, {+service+} continually pushes logs from ``mong Automation Examples: {+service+} Monitoring and Logging ------------------------------------------------------- -.. include:: /includes/complete-examples.rst - The following examples demonstrate how to enable monitoring using |service| :ref:`tools for automation `. @@ -472,7 +503,7 @@ The following examples demonstrate how to enable monitoring using |service| space on the specified disk. This metric can be used to determine if the system is running out of free space. - .. include:: /includes/examples/cli-example-metrics-disk-devtest.rst + .. include:: /includes/examples/cli/dev-test/cli-example-metrics-disk-devtest.rst Configure Alerts ~~~~~~~~~~~~~~~~ @@ -480,7 +511,7 @@ The following examples demonstrate how to enable monitoring using |service| Run the following command to create an alert notification to an email address when your deployment doesn't have a primary. - .. include:: /includes/examples/cli-example-alerts-no-primary-devtest.rst + .. include:: /includes/examples/cli/dev-test/cli-example-alerts-no-primary-devtest.rst Monitor Database Performance ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -488,7 +519,7 @@ The following examples demonstrate how to enable monitoring using |service| Run the following command to enable |service|-managed slow operation threshold for your project. - .. include:: /includes/examples/cli-example-performance-advisor-enable-devtest.rst + .. include:: /includes/examples/cli/dev-test/cli-example-performance-advisor-enable-devtest.rst Download Logs ~~~~~~~~~~~~~ @@ -496,10 +527,57 @@ The following examples demonstrate how to enable monitoring using |service| Run the following command to download a compressed file that contains the MongoDB logs for the specified host in your project. - .. include:: /includes/examples/cli-example-download-logs.rst + .. include:: /includes/examples/cli/cli-example-download-logs.rst + + .. tab:: Atlas Go SDK + :tabid: go-sdk + + .. 
include:: /includes/complete-examples-go-sdk.rst + + Before you can authenticate and run the example scripts using + the Atlas Go SDK, you must: + + - :ref:`Create an Atlas service account `. + Store your client ID and secret as environment variables by + running the following command in the terminal: + + .. code-block:: shell + + export MONGODB_ATLAS_SERVICE_ACCOUNT_ID="" + export MONGODB_ATLAS_SERVICE_ACCOUNT_SECRET="" + + - Set the following configuration variables in your Go project: + + .. literalinclude:: /includes/examples/generated-examples/snippet.config.json + :language: json + :caption: configs/config.json + + View Cluster Metrics + ~~~~~~~~~~~~~~~~~~~~ + + The following example script demonstrates how to retrieve the amount of + used and free space on the specified disk. This metric can be used + to determine if the system is running out of free space. + + .. literalinclude:: /includes/examples/generated-examples/main.snippet.get-metrics-dev.go + :language: go + :caption: get-cluster-metrics/main.go + + Download Logs + ~~~~~~~~~~~~~ + + The following example script demonstrates how to download and + unzip a compressed file that contains the + MongoDB logs for the specified host in your Atlas project: + + .. literalinclude:: /includes/examples/generated-examples/main.snippet.get-logs.go + :language: go + :caption: get-logs/main.go .. tab:: Terraform - :tabid: terraform + :tabid: terraform + + .. include:: /includes/complete-examples-terraform.rst Before you can create resources with Terraform, you must: - :ref:`Create your paying organization @@ -509,15 +587,14 @@ The following examples demonstrate how to enable monitoring using |service| Store your |api| key as environment variables by running the following command in the terminal: - .. code-block:: + .. code-block:: shell export MONGODB_ATLAS_PUBLIC_KEY="" export MONGODB_ATLAS_PRIVATE_KEY="" - `Install Terraform `__ - We also suggest `creating a workspace for your environment - `__. + We also recommend `creating a workspace for your environment `__. Configure Alerts ~~~~~~~~~~~~~~~~ @@ -530,12 +607,12 @@ The following examples demonstrate how to enable monitoring using |service| variables.tf ```````````` - .. include:: /includes/examples/tf-example-monitoring-variables-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-monitoring-variables-devtest.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-monitoring-tfvars-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-monitoring-tfvars-devtest.rst **Example:** Use the following to send an alert notification by email to users with the ``GROUP_CLUSTER_MANAGER`` role @@ -545,7 +622,7 @@ The following examples demonstrate how to enable monitoring using |service| main.tf ``````` - .. include:: /includes/examples/tf-example-alerts-replication-lag-devtest.rst + .. include:: /includes/examples/terraform/dev-test/tf-example-alerts-replication-lag-devtest.rst .. tab:: Staging and Prod Environments :tabid: stagingprod @@ -560,20 +637,20 @@ The following examples demonstrate how to enable monitoring using |service| Run the sample command to retrieve the following metrics: - - OPCOUNTERS - to monitor the amount of queries, updates, inserts, + - OPCOUNTERS - Monitor the number of queries, updates, inserts, and deletes occurring at peak load and ensure that load doesn't - increase unexpectedly. - - TICKETS - to ensure that the number of allowed concurrent reads - and writes doesn't lower much, or frequently.
- - CONNECTIONS - to ensure that the number of sockets used for - heartbeats and replication between members isn't above the - set limit. - - QUERY TARGETING - to ensure that number of keys and documents + increase unexpectedly. + - TICKETS - Ensure that the number of allowed concurrent reads + and writes doesn't drop significantly or frequently. + - CONNECTIONS - Ensure that the number of sockets used for + heartbeats and replication between members isn't above the + set limit. + - QUERY TARGETING - Ensure that the ratio of keys and documents scanned to the number of documents returned, averaged per second, - are't too high. - - SYSTEM CPU - to ensure that the CPU usage is steady. + isn't too high. + - SYSTEM CPU - Ensure that the CPU usage is steady. - .. include:: /includes/examples/cli-example-metrics-processes-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-metrics-processes-stagingprod.rst Configure Alerts ~~~~~~~~~~~~~~~~ @@ -582,7 +659,7 @@ The following examples demonstrate how to enable monitoring using |service| there are possible connection storms based on the number of connections in your project. - .. include:: /includes/examples/cli-example-alerts-connection-storms-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-alerts-connection-storms-stagingprod.rst Monitor Database Performance ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -590,7 +667,7 @@ The following examples demonstrate how to enable monitoring using |service| Run the following command to retrieve the suggested indexes for collections experiencing slow queries. - .. include:: /includes/examples/cli-example-return-suggested-indexes-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-return-suggested-indexes-stagingprod.rst Download Logs ~~~~~~~~~~~~~ @@ -598,11 +675,72 @@ The following examples demonstrate how to enable monitoring using |service| Run the following command to download a compressed file that contains the MongoDB logs for the specified host in your project. - .. include:: /includes/examples/cli-example-download-logs.rst + .. include:: /includes/examples/cli/cli-example-download-logs.rst + + .. tab:: Atlas Go SDK + :tabid: go-sdk + + .. include:: /includes/complete-examples-go-sdk.rst + + Before you can authenticate and run the example scripts using + the Atlas Go SDK, you must: + + - :ref:`Create an Atlas service account `. + Store your client ID and secret as environment variables by + running the following command in the terminal: + + .. code-block:: shell + + export MONGODB_ATLAS_SERVICE_ACCOUNT_ID="" + export MONGODB_ATLAS_SERVICE_ACCOUNT_SECRET="" + + - Set the following configuration variables in your Go project: + + .. literalinclude:: /includes/examples/generated-examples/snippet.config.json + :language: json + :caption: configs/config.json + + For more information on authenticating and creating a client, see + the complete Atlas SDK for Go example project `in GitHub `__. + + View Cluster Metrics + ~~~~~~~~~~~~~~~~~~~~ + + The following example script demonstrates how to retrieve the following metrics: + + - OPCOUNTERS - Monitor the number of queries, updates, inserts, + and deletes occurring at peak load and ensure that load doesn't + increase unexpectedly. + - TICKETS - Ensure that the number of allowed concurrent reads + and writes doesn't drop significantly or frequently. + - CONNECTIONS - Ensure that the number of sockets used for + heartbeats and replication between members isn't above the + set limit.
+ - QUERY TARGETING - Ensure that the ratio of keys and documents + scanned to the number of documents returned, averaged per second, + isn't too high. + - SYSTEM CPU - Ensure that the CPU usage is steady. + + .. literalinclude:: /includes/examples/generated-examples/main.snippet.get-metrics-prod.go + :language: go + :caption: get-disk-metrics/main.go + + Download Logs + ~~~~~~~~~~~~~ + + The following example script demonstrates how to download and + unzip a compressed file that contains the + MongoDB logs for the specified host in your Atlas project: + + .. literalinclude:: /includes/examples/generated-examples/main.snippet.get-logs.go + :language: go + :caption: get-logs/main.go .. tab:: Terraform :tabid: terraform + .. include:: /includes/complete-examples-terraform.rst + Before you can create resources with Terraform, you must: - :ref:`Create your paying organization @@ -611,7 +749,7 @@ The following examples demonstrate how to enable monitoring using |service| Store your |api| key as environment variables by running the following command in the terminal: - .. code-block:: + .. code-block:: shell export MONGODB_ATLAS_PUBLIC_KEY="" export MONGODB_ATLAS_PRIVATE_KEY="" @@ -630,12 +768,12 @@ The following examples demonstrate how to enable monitoring using |service| variables.tf ```````````` - .. include:: /includes/examples/tf-example-monitoring-variables-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-monitoring-variables-stagingprod.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-monitoring-tfvars-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-monitoring-tfvars-stagingprod.rst **Example 1:** Use the following to integrate with third-party services like Datadog and Prometheus for alert notifications. @@ -643,7 +781,7 @@ The following examples demonstrate how to enable monitoring using |service| main.tf ``````` - .. include:: /includes/examples/tf-example-third-party-integration-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-third-party-integration-stagingprod.rst **Example 2:** Use the following to send an alert notification to third-party services like Datadog and Prometheus when @@ -652,7 +790,7 @@ The following examples demonstrate how to enable monitoring using |service| main.tf ``````` - .. include:: /includes/examples/tf-example-alerts-no-primary-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-alerts-no-primary-stagingprod.rst **Example 3:** Use the following to send an alert notification by email to users with the @@ -662,4 +800,4 @@ The following examples demonstrate how to enable monitoring using |service| main.tf ``````` - .. include:: /includes/examples/tf-example-alerts-replication-lag-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-alerts-replication-lag-stagingprod.rst diff --git a/source/multi-cloud.txt b/source/multi-cloud.txt new file mode 100644 index 00000000..a54cfacc --- /dev/null +++ b/source/multi-cloud.txt @@ -0,0 +1,177 @@ +.. _arch-center-paradigms-multi-cloud: + +=============================== +Multi-Cloud Deployment Paradigm +=============================== + +.. default-domain:: mongodb + +.. contents:: On this page + :local: + :backlinks: none + :depth: 2 + :class: onecol + +Multi-cloud {+service+} deployments set up {+cluster+} nodes across +multiple geographic regions *and* multiple cloud providers.
Multi-cloud +deployments enhance protection in the case of a regional outage or +cloud provider outage by automatically rerouting traffic to a +node in another region for continuous availability and a +smooth user experience. Multi-cloud deployments can also protect +against vendor lock-in, enhance performance, and help meet +compliance requirements for :ref:`data sovereignty +`. + +{+service+} supports multi-cloud deployment across any combination of +|aws|, |azure|, and {+gcp+}. + +To learn how to configure multi-cloud deployments and learn about the +different types of nodes you can add, see +:atlas:`Configure High Availability and Workload Isolation +` in the {+service+} +documentation. + +Multi-region deployments can deploy nodes +across a single cloud provider or multiple cloud providers. As each +cloud provider has its own set of regions, multi-cloud deployments are +also multi-region deployments. To learn about single-cloud multi-region +deployments, see :ref:`arch-center-paradigms-multi-region`. + +To learn about global deployments, which support zone configuration +across multiple cloud providers, see +:ref:`arch-center-paradigms-global`. + +The following diagram shows a multi-region, multi-cloud {+service+} +deployment for regions that support availability zones: + +.. figure:: /includes/images/multi-region-multi-cloud.svg + :figwidth: 750px + :alt: An image showing a five-node deployment spread across three regions and two cloud providers. Each region contains one zone per node. + +Use Cases for Multi-Cloud Deployments +------------------------------------- + +To learn about use cases for multi-region, single-cloud deployments, see +:ref:`arch-center-paradigms-multi-region`. + +Multi-region, multi-cloud deployments are best for the following use +cases: + +Region-Specific Applications that Require Low Latency and High Availability (Multiple Cloud Providers) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To improve performance for critical operations, +all {+service+} deployments allow you to deploy data +close to your users' geographic location, which reduces latency. +However, deploying to a single region or single cloud provider +sacrifices high availability if there is a regional or cloud provider +outage. Additionally, deploying to a single cloud provider +can make it difficult to change vendors later in case of pricing +changes or other cloud provider changes. +You can configure a multi-region, multi-cloud deployment for +high availability and low latency, *and* to avoid vendor lock-in. + +A multi-region, multi-cloud deployment may be best for you if you have +the following requirements: + +- You want to use multiple cloud providers +- You want to deploy to more than one region for high availability +- Your application requires low latency *and* has a majority of users + in one geographic location, since multi-region deployments allow you + to deploy nodes to different regions within the same geographic area. + +For example, for an application deployed with |aws| and |azure| whose +users are primarily located in the US, you can deploy a multi-region, +multi-cloud deployment to three regions within the US (such as +``us-east-1`` and ``us-east-2`` for |aws|, and ``eastus`` for |azure|). +This ensures low latency since all regions are within the eastern US, +while offering high availability if there's a regional outage or a cloud +provider outage that affects the primary +node.
Additionally, it makes it easier to transition off of one of these +cloud providers if needed. + +If your application requires low latency for users in any region across +a global user base, consider a :ref:`arch-center-paradigms-global`. + +If your application requires low latency but doesn't require +cross-region or cross-provider high +availability, consider a :ref:`arch-center-paradigms-single`. + +If your application requires low latency, high availability, and +deployment across +a single cloud provider, consider a +:ref:`arch-center-paradigms-multi-region`. + +Applications that Require Data Sovereignty and High Availability (Multiple Cloud Providers) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For compliance with data residency laws, data can be partitioned to +reside in specific regions, ensuring adherence to local regulations. +However, deploying to a single region or single cloud provider +sacrifices high availability if there is a regional or cloud provider +outage. Additionally, deploying to a single cloud provider +can make it difficult to change vendors later in case of pricing +changes or other cloud provider changes. +You can configure a multi-region, multi-cloud deployment for +high availability and data sovereignty, *and* to avoid vendor lock-in. + +A multi-region, multi-cloud deployment may be best for you if you have +the following requirements: + +- You want to use multiple cloud providers +- You want to deploy to more than one region for high availability +- Your application requires data sovereignty, since multi-region + deployments allow you to deploy nodes to different regions within the + same geographic area. + +For example, for an application deployed with |aws| and |azure| that +requires data storage in Europe, you can deploy a multi-region, +multi-cloud deployment to three regions within Europe (such as +``eu-west-1`` and ``eu-west-2`` for |aws|, and ``uksouth`` for |azure|). +This ensures data sovereignty since all regions are within Europe, while +offering high availability if there's a regional outage or a cloud +provider outage that affects the primary node. Additionally, it makes +it easier to transition off of one of these cloud providers if needed. + +If your application requires data sovereignty but doesn't require +cross-region or cross-provider high +availability, consider a :ref:`arch-center-paradigms-single`. + +If your application requires data sovereignty and deployment across +a single cloud provider, consider a +:ref:`arch-center-paradigms-multi-region`. + +Considerations for Multi-Cloud Deployments +------------------------------------------- + +Other considerations for multi-cloud deployments include: + +- High availability depends on the deployment of nodes across + regions as well as the number, distribution, and priority order of + nodes. To learn more about recommended + {+cluster+} topologies for high availability, see + :ref:`arch-center-high-availability`. +- Multi-cloud deployments are available only for ``M10`` dedicated + {+clusters+} and larger. + +For more considerations, see +:atlas:`Considerations +` in the +{+service+} documentation. + +Recommendations for Multi-Cloud Deployments +-------------------------------------------- + +To learn about recommendations that apply to multi-region and +multi-cloud deployments, see the following note: + +.. 
important:: + + All recommendations that apply to multi-region deployments also + apply to multi-cloud deployments because all multi-cloud deployments + are also multi-region. There may be other considerations + for multi-cloud deployments that are not covered in the + {+atlas-arch-center+}, such as cloud-provider-aware application-side + settings. Contact {+ps+} + team to create a custom landing zone for your {+service+} multi-cloud + deployments that covers all considerations. diff --git a/source/multi-region.txt b/source/multi-region.txt new file mode 100644 index 00000000..801bc583 --- /dev/null +++ b/source/multi-region.txt @@ -0,0 +1,113 @@ +.. _arch-center-paradigms-multi-region: + +================================ +Multi-Region Deployment Paradigm +================================ + +.. default-domain:: mongodb + +.. contents:: On this page + :local: + :backlinks: none + :depth: 2 + :class: onecol + +Multi-region {+service+} deployments set up {+cluster+} nodes across +multiple regions (as defined by the cloud providers). Multi-region +deployments enhance protection in the case of a :ref:`regional outage +` by automatically rerouting traffic to a +node in another region for continuous availability and a +smooth user experience. Multi-region deployments can also enhance +performance and can help meet compliance requirements for :ref:`data sovereignty +`, such as the EU's General Data +Protection Regulation (GDPR). + +A multi-region deployment may have multiple regions within the same geography +(a large area like a continent or country), single regions in multiple +geographies, or multiple regions in multiple geographies. + +Multi-region deployments can exist with a single cloud provider or +multiple cloud providers. To learn about multi-cloud deployments, see +:ref:`arch-center-paradigms-multi-cloud`. + +To learn how to configure multi-region deployments and learn about the +different types of nodes you can add, see +:atlas:`Configure High Availability and Workload Isolation +` in the {+service+} +documentation. + +Use Cases for Multi-Region Deployments +-------------------------------------- +Consider the three use cases in the following image: + +.. figure:: /includes/images/multi-region-types.png + :figwidth: 750px + :alt: An image showing three types of multi-region deployments + +In the first example, where you deploy to multiple regions in the same geography, +you have an application that has users primarily located in the US. You create a +multi-region deployment in three regions within the US. This ensures low latency, +since all regions are within the US, while also offering high availability if +there's an outage in one of the regions (for example, ``us-east-1``). + +In the second example, where you deploy to one region in multiple geographies, +your application requires low latency and high availability for users in both +the US and Europe. You create a multi-region deployment with one region located +in the US and another in Europe. In this scenario, European users are served from the +European nodes and US users are served from the US nodes, ensuring lower latency +and better performance. This also helps comply with local regulations like GDPR. + +The most complex example of a multi-region deployment has multiple regions in +multiple geographies, which ensures the highest level of availability with a single +provider. If your application requires the very highest level of availability +and lowest latency, consider a :ref:`arch-center-paradigms-multi-cloud`. + +.. 
_arch-center-global-deployments: + +Global Deployments +------------------ + +Global {+service+} deployments are the most complex multi-region deployment +paradigms, and therefore require very careful planning. In almost all cases, +a :ref:`arch-center-paradigms-multi-region` (or its subset, a +:ref:`arch-center-paradigms-multi-cloud`) will fulfill your needs. + +The following are a few reasons why you might consider a global deployment +strategy: + +- You need a single global connection string. +- You need to perform global aggregations across all customers. +- You need the ability to read/write for all customers from everywhere + in one logical cluster, while also having regional reads/writes. + +.. note:: + + The complexity of global deployments results in many opinions on best + practices. The {+atlas-arch-center+} does not currently cover recommendations + specific to global deployments. Contact {+ps+} team to discuss your + specific requirements and to design a {+service+} global deployment + strategy. + +Data Sovereignty and High Availability Considerations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For compliance with data residency laws, data can be partitioned to +reside in specific regions, ensuring adherence to local regulations. +However, deploying to a single region sacrifices high availability if +there is a regional outage. + +You can configure a multi-region deployment for both high availability +and data sovereignty. For example, for an application deployed with |aws| +that requires data storage in Europe, you can deploy a multi-region deployment +to three regions within Europe (such as ``eu-west-1``, ``eu-west-2``, +and ``eu-west-3``). This ensures data sovereignty since all regions are within +Europe, while offering high availability if there's a regional outage that +affects one of the nodes. + +.. _arch-center-multi-region-rec-summary: + +Recommendations for Multi-Region Deployments +--------------------------------------------- + +.. include:: /includes/rec-list.rst + diff --git a/source/network-security.txt b/source/network-security.txt index a40d733f..79167a0d 100644 --- a/source/network-security.txt +++ b/source/network-security.txt @@ -24,7 +24,7 @@ database deployments, such as: - Mandatory |tls-ssl| connection encryption - {+vpc+}\s for all projects with one-or-more {+Dedicated-clusters+} -- Authentication that uses {+ip-access-list+}s and only accepts connections +- Authentication that uses {+ip-access-list+}s and only accepts database connections from sources you explicitly declare You can further configure these protections to meet your unique security @@ -41,6 +41,7 @@ databases. We recommend using M10+ dedicated {+clusters+} because all {+service+} projects with one or more M10+ dedicated {+clusters+} receive their own dedicated: + - |vpc| on {+aws+} or {+gcp+}. - {+vnet+} on |azure|. @@ -49,13 +50,15 @@ one or more M10+ dedicated {+clusters+} receive their own dedicated: By default, all access to your {+clusters+} is blocked. You must explicitly allow an inbound connection by one of the following methods: -- Add public IP addresses to your {+ip-access-list+}. -- Use |vpc| / {+vnet+} peering to add private IP addresses. - Add private endpoints, which {+service+} adds automatically to your {+ip-access-list+}. No other access is automatically added. +- Use |vpc| or {+vnet+} peering to add private IP addresses. +- Add public IP addresses to your {+ip-access-list+}. You can also use multiple methods together for added security. +.. 
_arch-center-tls: + |tls| ~~~~~~~~~~~~~~ @@ -65,6 +68,8 @@ databases. |tls| 1.2 is the default protocol. To learn more, see the :ref:`Configure Additional Settings `. +.. _arch-center-ip-access-list: + {+ip-access-list+}s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -75,6 +80,9 @@ attempt authentication to your database. Your {+service+} {+clusters+} allow access only from the IP addresses and |cidr| block IP ranges that you add to your {+ip-access-list+}. +We recommend that you permit access to the smallest network segments +possible, such as an individual ``/32`` address. + Application servers and other clients can't access your {+service+} {+clusters+} if their IP addresses aren't included in your {+ip-access-list+}. @@ -110,11 +118,13 @@ you must update your connection string to reflect your new cluster topology. In the case of adding new shards, the failure to reconnect the application clients may cause your application to suffer from a data outage. +.. _arch-center-private-endpoints: + Private Endpoints ~~~~~~~~~~~~~~~~~ -A private endpoint facilitates a one-way connection from your own |vpc| -to your {+service+} |vpc|, without permitting {+service+} to initiate a +A private endpoint facilitates a one-way connection from a |vpc| that you manage +directly to your {+service+} |vpc|, without permitting {+service+} to initiate a reciprocal connection. This allows you to make use of secure connections to {+service+} without extending your network trust boundary. The following private endpoints are available: @@ -130,6 +140,25 @@ private endpoints are available: :alt: "An image representing how MongoDB Atlas private endpoints work." :figwidth: 750px +.. _arch-center-multi-region-recs-private-endpoints: + +Multi-Region Considerations +``````````````````````````` + +- For global private endpoints, |service| automatically generates a |srv| record + that points to all |service| cluster nodes. The MongoDB driver attempts to + connect to each |srv| record from your application. This allows the driver to + handle a failover event without waiting for |dns| replication and without + requiring you to update the driver's connection string. +- In order to facilitate automatic |srv| record generation for all nodes in your + |service| cluster, you must establish a |vpc| peering connection between the + |vpc| in which your application is deployed and your MongoDB |vpc| with + `PrivateLink `__ or equivalent. +- Private endpoints must be enabled in every region in which you have an |service| + cluster deployed. + +.. _arch-center-vpc-vnet-peering: + VPC/{+vnet+} Peering ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -161,27 +190,182 @@ based on the |cidr| block. For example, a project with a |cidr| block of Recommendations for {+service+} Network Security ------------------------------------------------ +.. collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Single-region deployments have no unique considerations for network + security. See the following section for recommendations that apply to + all deployment paradigms. + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + For private endpoint considerations that are specific to multi-region + and multi-cloud deployments, see + :ref:`arch-center-multi-region-recs-private-endpoints`. For all other + recommendations, see the following section. + +All Deployment Paradigm Recommendations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following recommendations apply to all :ref:`deployment paradigms +`. 
+ Private Endpoints -~~~~~~~~~~~~~~~~~ +````````````````` + +We recommend that you set up private endpoints for all new staging and production +projects to limit the extension of your network trust boundary. + +In general, we recommend using private endpoints for every {+service+} project, +because this approach provides the most granular security and eases the +administrative burden that can come from managing {+ip-access-list+}\s and large +blocks of IP addresses as your cloud network scales. There is a cost associated +with each endpoint, so you might choose to forgo private endpoints in +lower environments, but you should leverage them in higher environments to limit +the extension of your network trust boundary. + +If VPCs or VNets in which your application is deployed can't be peered with one +another, potentially due to a combination of on-premises and cloud deployments, +you might want to consider a regional private endpoint. + +With regional private endpoints, you can do the following: + +- Connect a single private endpoint to multiple VNets or VPCs without peering + them directly to each other. -We recommend that you set up private endpoints for all new staging and production projects to limit the extension of your network trust -boundary. +- Mitigate partial region failures in which one or more services within a region fails. -In general, we recommend using private endpoints for every {+service+} project -because it provides the most granular security and eases the administrative burden -that can come from managing {+ip-access-list+}\s and large blocks of IP addresses -as your cloud network scales. There is a cost associated with each endpoint, so you -might consider not requiring private endpoints in lower environments but you should -leverage them in higher environments to limit the extension of your network trust -boundary. +To network with regional endpoints, you must do the following: + +- Perform regular and robust health checks to detect an outage and update routing. + +- Use a distinct connection string for each region. + +- Manage cross-region routing in |service|. + +- Deploy a ``mongos`` server and an additional metadata server if you are running a + MongoDB version older than v8.0. To learn more about private endpoints in {+service+}, including limitations and considerations, see :atlas:`Learn About Private Endpoints in {+service+} `. To learn how to set up private endpoints for your {+clusters+}, see :atlas:`Set Up a Private Endpoint for a Dedicated Cluster `. +Cloud Provider-Specific Guidance +````````````````````````````````` +- |aws|: We recommend |vpc| peering across all of your self-managed VPCs that + need to connect to |service|. Leverage global private endpoints. + +- |azure|: We recommend VNet peering across all of your self-managed VNets that + need to connect to |service|. Leverage global private endpoints. + +- |gcp|: Peering is not required across your self-managed VPCs when using + GlobalConnect. All |service| regions must be networked with private endpoints + to your self-managed VPC in each region. + +|gcp| Private Endpoints Considerations and Limitations +``````````````````````````````````````````````````````` + +Atlas services are accessed through GCP Private Service Connect endpoints on +ports 27015 through 27017. The ports can change under specific circumstances, +including (but not limited to) cluster changes. + +- {+gcp-psc+} must be active in all regions into which you + deploy a multi-region cluster.
You will receive an error + if {+gcp-psc+} is active in some, but not all, targeted + regions. + +- You can do only one of the following: + + - Deploy nodes in more than one region, and have one + private endpoint per region. + - Have multiple private endpoints in one region, and no + other private endpoints. + + .. important:: + + This limitation applies across cloud providers. For + example, if you create more than one private endpoint + in a single region in |gcp|, you can't create + private endpoints in |aws| or any other |gcp| + region. + + See :ref:`atlas_regionalized-pl` for an exception for + multi-region and global sharded clusters. + +- |service| creates 50 service attachments, each with a + subnet mask value of 27. You can change the number of + service attachments and the subnet masks that |service| + creates by setting the following limits with the + :oas-atlas-op:`Set One Project Limit ` + {+atlas-admin-api+} endpoint: + + - Set the + ``atlas.project.deployment.privateServiceConnectionsPerRegionGroup`` limit to + change the number of service attachments. + - Set the ``atlas.project.deployment.privateServiceConnectionsSubnetMask`` + limit to change the subnet mask for each service + attachment. + + To learn more, see :oas-atlas-op:`Set One Project Limit + `. + +- You can have up to 50 nodes when you create |service| projects + that use {+gcp-psc+} in a **single region**. If you need + to change the number of nodes, perform one of the + following actions: + + - Remove existing private endpoints and then + change the limit using the :oas-atlas-op:`Set One + Project Limit ` {+atlas-admin-api+} + endpoint. + - Contact :ref:`MongoDB Support `. + - Use additional projects or regions to connect to nodes + beyond this limit. + +.. important:: + + - Each private endpoint in |gcp| reserves an IP address + within your |gcp| |vpc| and forwards traffic from the + endpoints' IP addresses to the + :gcp:`service attachments `. + You must create as many private endpoints as there are + service attachments. The number of + service attachments defaults to 50. + + Addressable targets include: + + - Each |mongod| instance in a replica set deployment + (sharded clusters excluded). + - Each |mongos| instance in a sharded cluster deployment. + - Each |bic| instance across all dedicated clusters in the + project. + +- You can have up to 40 nodes when you create |service| projects + that use {+gcp-psc+} across **multiple regions**. This total + excludes the following instances: + + - |gcp| regions communicating with each other + - {+Free-clusters+} or {+Shared-clusters+} + +- |gcp| {+google-psc+} supports up to 1024 outgoing + connections per virtual machine. As a result, you can't + have more than 1024 connections from a single |gcp| + virtual machine to an |service| cluster. + + To learn more, see the |gcp| + :gcp:`cloud NAT documentation + `. + +- |gcp| {+google-psc+} is region-specific. However, you + can configure :gcp:`global access + ` + to access private endpoints from a different region. + + To learn more, see :ref:`Multi-Region Support `. + IP Access Lists -~~~~~~~~~~~~~~~ +``````````````` We recommend that you configure an {+ip-access-list+} for your API keys and programmatic access to allow access only from trusted IP addresses such as your CI/CD pipeline @@ -201,7 +385,7 @@ When you configure your {+ip-access-list+}, we recommend that you: and avoid large |cidr| blocks.
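+ As a minimal sketch of these recommendations (the IP address, comment,
+ and project ID below are placeholders), the following {+atlas-cli+}
+ command adds one specific address, rather than a large |cidr| block,
+ to a project {+ip-access-list+}:
+
+ .. code-block:: shell
+
+    # Allow a single trusted host; prefer one entry per host over
+    # broad CIDR ranges.
+    atlas accessLists create 192.0.2.15 --type ipAddress \
+      --comment "CI/CD runner" --projectId 5e2211c17a3e5a48f5497de3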
VPC/{+vnet+} Peering -~~~~~~~~~~~~~~~~~~~~ +```````````````````` If you configure |vpc| or {+vnet+} peering, we recommend that you: @@ -308,7 +492,7 @@ These examples also apply other recommended configurations, including: :copyable: true atlas privateEndpoints aws create --region us-east-1 --projectId 5e2211c17a3e5a48f5497de3 --output json - + For more configuration options and information about this example, see: - :ref:`atlas-privateEndpoints-aws-create`, for connections from {+aws+} |vpc|\s @@ -349,7 +533,7 @@ These examples also apply other recommended configurations, including: accessEntryForAddress1.tf ````````````````````````````` - .. include:: /includes/examples/tf-example-access-entry-for-add-1.rst + .. include:: /includes/examples/terraform/tf-example-access-entry-for-add-1.rst After you create the files, navigate to your project directory and run the following command to initialize Terraform: @@ -386,7 +570,7 @@ These examples also apply other recommended configurations, including: vpcConnection.tf ```````````````` - .. include:: /includes/examples/tf-example-vpc-connection.rst + .. include:: /includes/examples/terraform/tf-example-vpc-connection.rst After you create the file, navigate to your project directory and run the following command to initialize Terraform: @@ -423,7 +607,7 @@ These examples also apply other recommended configurations, including: privateLink.tf `````````````` - .. include:: /includes/examples/tf-example-private-link.rst + .. include:: /includes/examples/terraform/tf-example-private-link.rst After you create the file, navigate to your project directory and run the following command to initialize Terraform: diff --git a/source/operational-readiness-checklist.txt b/source/operational-readiness-checklist.txt new file mode 100644 index 00000000..1cd7d7c3 --- /dev/null +++ b/source/operational-readiness-checklist.txt @@ -0,0 +1,323 @@ +.. _arch-center-checklist: + +================================================== +|service-fullname| Operational Readiness Checklist +================================================== + +.. default-domain:: mongodb + +.. contents:: On this page + :local: + :backlinks: none + :depth: 2 + :class: onecol + +This checklist is designed to help you prepare your environment and team +for a successful deployment and operation of |service-fullname|. +Use this checklist to track your progress. For example, you can print it +and check off each item as you complete the tasks. + +Consult the official |service-fullname| documentation for detailed guidance +on each of these aspects. + +Account and Organization Setup +------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Create a |service-fullname| account, set up your |service| organizations + according to your internal structure, and configure a root user + with appropriate access. To get recommendations and learn more about + this topic, see :ref:`arch-center-hierarchy`. + + * - :icon-fa4:`square-o` + - Set up projects based on your environment and application needs. + Isolate environments by setting up production and non-production + projects, at a minimum, as shown in the sketch after this table. To get + recommendations and learn more about + this topic, see :ref:`arch-center-orgs-projects-clusters-recs`. + + * - :icon-fa4:`square-o` + - Consider cross-organization billing, if applicable. To get recommendations + and learn more about this topic, see :ref:`arch-center-billing-data`.
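+ As a minimal sketch of the project-isolation item above (the project
+ names and organization ID are placeholders), you can script one
+ project per environment with the {+atlas-cli+}:
+
+ .. code-block:: shell
+
+    # Create isolated production and non-production projects
+    # in the same organization.
+    atlas projects create my-app-prod --orgId 32b6e34b3d91647abb20e7b8
+    atlas projects create my-app-staging --orgId 32b6e34b3d91647abb20e7b8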
+ +Network and Security Configuration +----------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Select cloud providers and regions for your |service| clusters. + Consider data sovereignty requirements and latency. To get + recommendations and learn more about this topic, see + :ref:`arch-center-paradigms`. + + * - :icon-fa4:`square-o` + - Configure network security based on your organization's needs. + To get recommendations and learn more about this topic, see + :ref:`arch-center-network-security`. + + * - :icon-fa4:`square-o` + - Choose a network connectivity method: + + - :ref:`Private Endpoints ` + (AWS PrivateLink, Azure Private Link, or + Google Cloud Private Service Connect) for a one-way private + connection from your VPC to |service|. For multi-region clusters, + enable private endpoints in each region. To learn more, + see :ref:`Recommendations for Multi-Region Deployments `. + - :ref:`VPC Peering ` to set up private + secure traffic routing within your network boundaries. + - Public :ref:`IP Access Lists ` to restrict + inbound connections to specific IP addresses or CIDR blocks. + Consider IP access lists as an alternative to endpoints if necessary. + + * - :icon-fa4:`square-o` + - Confirm that TLS is enabled for encrypting connections to your + databases. TLS is mandatory: it is enabled by default and cannot + be disabled. TLS 1.2 or later is the default, so newer protocol + versions are supported automatically once |service| adopts them. To learn more, + see :ref:`arch-center-tls`. + + * - :icon-fa4:`square-o` + - For on-premises connectivity to |service|, ensure that your organization + completes all the necessary internal processes, which may take time for approvals. + + * - :icon-fa4:`square-o` + - Configure authentication and authorization. To get recommendations and + learn more about this topic, see :ref:`arch-center-auth`. + + - Set up database users and roles with the principle of least privilege. + - Implement Role-Based Access Control (RBAC) to manage access + across all resources. To learn more, see :ref:`arch-center-authorization-recs`. + - Consider setting up Federated Authentication, such as SAML 2.0, + for UI access via identity providers, such as Okta, Entra ID, + Ping Identity, or others. To learn more, see :ref:`arch-center-authorization-recs`. + + - Enforce Multi-Factor Authentication (MFA) for enhanced security. + - Secure |service| API access using API key-based authentication. + Consider regular key rotation. + - For database access in cloud environments, consider Workforce and + Workload Identity Federation, such as OIDC, OAuth 2.0, AWS IAM roles, + or Azure Managed Identities, for passwordless access. + - Consider LDAP Integration for user authentication and authorization. + - Explore using X.509 client certificates for authentication. + + * - :icon-fa4:`square-o` + - Implement robust encryption. To learn more, see :ref:`arch-center-data-encryption`. + + - Encryption at rest is enabled by default using cloud providers' + transparent disk encryption (AES-256). + - Optionally enable "Bring Your Own Key (BYOK)" encryption using + Key Management Service (KMS) providers (AWS KMS, Azure Key Vault, + or GCP KMS). |service| can't rotate customer-managed encryption keys. + - Consider Client-Side Field Level Encryption (CSFLE) to encrypt + data within your application before transmitting it to |service|. + - Explore :manual:`Queryable Encryption ` + for applications that run queries on encrypted data.
+ + * - :icon-fa4:`square-o` + - Configure database auditing to track database access and actions. + Create custom filters if needed. To get recommendations and learn + more about this topic, see :ref:`arch-center-auditing-logging`. + + * - :icon-fa4:`square-o` + - Be aware of hard-coded certificate authority certificates. Ensure + that you set up your applications in a way that lets you + handle potential CA certificate updates. + + |service| clusters use TLS certificates signed by a widely + trusted Certificate Authority (CA). While applications using + recent MongoDB drivers handle certificate validation automatically, + older applications or those with custom TLS configurations might + require updates to trust the new CA certificates if MongoDB updates + its certificate provider. To learn more, + see :atlas:`Hard-coded certificate authority `. + + * - :icon-fa4:`square-o` + - Understand and plan for compliance with relevant standards and + regulations, such as ISO/IEC 27001, HIPAA, GDPR, PCI DSS, FedRAMP, + and others. To learn more, see :ref:`arch-center-compliance-atlas-gov`. + +Backup and Restore Strategy +---------------------------- + +.. list-table:: + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Enable |service| Cloud Backup, which provides localized backup + storage using the cloud provider's native snapshot functionality. + To get recommendations and learn more about this topic, see + :ref:`arch-center-backups`. + + * - :icon-fa4:`square-o` + - Enable Continuous Cloud Backup with a restore window that meets + your Recovery Point Objective (RPO). We recommend having a restore + window of 7 days to allow for Point In Time (PIT) recovery using the oplog. + + * - :icon-fa4:`square-o` + - Define a backup schedule and retention policy that aligns with your + business continuity and compliance requirements. Consider hourly, + daily, weekly, and monthly snapshots with appropriate retention periods. + + * - :icon-fa4:`square-o` + - Consider :ref:`multi-region snapshot distribution ` + for increased resilience by copying snapshots to other geographic regions. + + * - :icon-fa4:`square-o` + - Enable :ref:`Backup Compliance Policy ` to + prevent unauthorized modifications or deletions of backups and comply + with strict data protection requirements. + + * - :icon-fa4:`square-o` + - Understand the process for restoring from scheduled or on-demand snapshots. + To learn more, see :ref:`backup-policy-recommendations`. + + * - :icon-fa4:`square-o` + - Learn about the process for restoring from Continuous Cloud Backup + to a specific point in time. To learn more, see :ref:`backup-policy-recommendations`. + + * - :icon-fa4:`square-o` + - Plan and test your Disaster Recovery (DR) strategy. Understand + Recovery Time Objective (RTO) and Recovery Point Objective (RPO). + Consider testing your application's resilience in |service|. + To learn more, see :ref:`arch-center-dr`. + + * - :icon-fa4:`square-o` + - Consider options for downloading and archiving snapshots, if required, + using the {+atlas-ui+}, {+atlas-admin-api+}, or {+atlas-cli+}. + To learn more, see :ref:`arch-center-resiliency`. + +Maintenance and Patching +------------------------ + +.. list-table:: + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Be aware that |service| deploys major version upgrades in a rolling + manner to minimize downtime. + + * - :icon-fa4:`square-o` + - Define a Maintenance Window for |service| automated systems to apply + automatic minor version updates.
Configure the day and hour of + allowed maintenance using the {+atlas-cli+} or the + ``mongodbatlas_maintenance_window`` Terraform resource, as shown in the + sketch at the end of this checklist. + To learn more, see :ref:`arch-center-resiliency`. + + * - :icon-fa4:`square-o` + - Understand that |service| has non-deferrable maintenance hours for critical security + patches or operational necessities. Configure Protected Hours for your + project and :atlas:`define a daily window ` + when standard updates cannot begin. |service| performs standard updates + that don't involve cluster restarts or resyncs outside of these hours. + +Monitoring and Alerts +--------------------- + +.. list-table:: + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Use built-in monitoring capabilities in |service| via the Metrics tab + to track cluster health and performance. + + * - :icon-fa4:`square-o` + - Configure alerts for various cluster metrics and events to proactively + identify and respond to potential issues. As a starting point, review + and configure recommended alerts. Consider setting up multiple alerts + for different severity levels. + + * - :icon-fa4:`square-o` + - Integrate |service| monitoring with your existing enterprise + monitoring and observability tools if required. + + * - :icon-fa4:`square-o` + - Familiarize yourself with Performance Advisor, Real-Time Performance Panel (RTPP), + and Query Profiler for performance tuning and optimization. + +To get recommendations and learn more about monitoring performance and alerts, +see :ref:`arch-center-monitoring-alerts`. + +Operational Procedures and Team Readiness +------------------------------------------ + +.. list-table:: + :widths: 5 55 + + * - Check + - Action + + * - :icon-fa4:`square-o` + - Define roles and responsibilities for managing and operating |service-fullname|. + + * - :icon-fa4:`square-o` + - Establish change control and auditability processes. To learn more, + see :ref:`arch-center-auditing-logging`. + + * - :icon-fa4:`square-o` + - Develop clear disaster recovery process documentation + specific to your applications and |service| setup. To learn more, + see :ref:`arch-center-dr`. + + * - :icon-fa4:`square-o` + - Ensure your team is trained on |service-fullname| fundamentals, + security best practices, and operational procedures. + Consider MongoDB University and Professional Services for + training and enablement. + + * - :icon-fa4:`square-o` + - Establish a process for engaging with |mdb-support| for + production issues or when MongoDB's access level is required. + + * - :icon-fa4:`square-o` + - Plan for performance improvement using tools like the Query Profiler + and Performance Advisor. To learn more, see :ref:`arch-center-monitoring-alerts`. + + * - :icon-fa4:`square-o` + - Define how you will handle data lifecycle management. + Configure archival strategies, such as :manual:`TTL indexes ` + or :ref:`online archive `. + + * - :icon-fa4:`square-o` + - Establish integration strategies with other tooling and + services, such as Datadog, Prometheus, and PagerDuty. + To learn more, see :ref:`arch-center-monitoring-alerts`. + + * - :icon-fa4:`square-o` + - Consider establishing a MongoDB Center of Excellence (CoE) + within your organization to foster best practices and + knowledge sharing. + +By completing these checklist items, you will enhance your operational +readiness for deploying and managing |service-fullname|. This will ensure +that you set up a reliable, secure, and performant database environment.
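+ As a minimal sketch of the maintenance-window item earlier in this
+ checklist (the day and hour values below are examples), you can
+ configure the window with the {+atlas-cli+}; the same setting is also
+ manageable through the ``mongodbatlas_maintenance_window`` Terraform
+ resource:
+
+ .. code-block:: shell
+
+    # Day of week uses 1 = Sunday through 7 = Saturday;
+    # hour of day is 0-23 in UTC.
+    atlas maintenanceWindows update --dayOfWeek 1 --hourOfDay 3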
+ +Next Steps +---------- + +Use the left navigation to find features and best practices for each {+waf+} pillar. + +- :ref:`arch-center-monitoring-alerts` +- :ref:`arch-center-network-security` +- :ref:`arch-center-backups` diff --git a/source/performance.txt b/source/performance.txt index 5160a6e9..d00597a7 100644 --- a/source/performance.txt +++ b/source/performance.txt @@ -27,7 +27,16 @@ performance in {+service+}: Use auto-scaling to dynamically allocate resources based on growing workload demands. For horizontal and vertical scaling, use {+service+} deployment templates. + .. card:: + :headline: Latency Reduction + :url: https://mongodb.com/docs/atlas/architecture/latency-reduction/ + :icon: general_features_realtime + :icon-alt: real time icon + + Review best practices for reducing latency. + .. toctree:: :titlesonly: Scalability + Latency Reduction diff --git a/source/release-notes.txt b/source/release-notes.txt index 99707b82..2618ea4b 100644 --- a/source/release-notes.txt +++ b/source/release-notes.txt @@ -15,6 +15,33 @@ This page lists the changes introduced in each new version of the {+atlas-arch-center+}. +v20250711 +--------- + +**Released 11 July, 2025** + +- :ref:`arch-center-landing-zone` page: Minor update to add that the primary is + mapped *randomly* to a zone, and link to new guidance pages. +- :ref:`arch-center-hierarchy` page: Adds organization creation example for Terraform. +- :ref:`arch-center-network-security` page: Minor update to mention limitation for changing the number of service attachments + and recommendation to use the DNS seedlist connection string. +- :ref:`arch-center-compliance` page: Minor update to introduce |atlas-gov| as an |service| compliance option. +- :ref:`arch-center-high-availability` page: Adds guidance for 3-node and 5-node replica set topologies. +- :ref:`arch-center-dr` page: Minor update to recommend multi-region clusters and electable nodes for high availability, + and link to other pages with existing guidance. +- :ref:`arch-center-cost-saving-config` page: Minor update to link to cost-analysis guidance for different deployment topologies. +- Adds the following new pages: + + - :ref:`arch-center-migration`: Describes how to migrate data from your on-premises MongoDB deployments to |service|. + - :atlas:`Operational Readiness Checklist `: Checklist to help you prepare a {+cluster+} for deployment. + - :ref:`arch-center-paradigms`: Introduces and compares different |service| deployment paradigms, which are described with use cases in the following new pages: + + - :ref:`arch-center-paradigms-single` + - :ref:`arch-center-paradigms-multi-region` + - :ref:`arch-center-paradigms-global` + - :ref:`arch-center-paradigms-multi-cloud` + - :ref:`arch-center-paradigms-hybrid` + v20250317 --------- diff --git a/source/resiliency.txt b/source/resiliency.txt index 971ed36f..e53f4124 100644 --- a/source/resiliency.txt +++ b/source/resiliency.txt @@ -39,6 +39,10 @@ one of your chosen cloud provider's `availability regions `__ +database availability. + +.. figure:: /includes/images/atlas-self-healing.png + :figwidth: 650px + :alt: Atlas Self-Healing Deployment diagram. + +To learn more about this process, see +`How does MongoDB Atlas deliver high availability? `__ Maintenance Window Uptime ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -116,8 +127,28 @@ For recommendations on backups, see :ref:`arch-center-backups`. Recommendations for {+service+} Resiliency ------------------------------------------ +.. 
collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Recommendations here + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + Recommendations here + +All Deployment Paradigm Recommendations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following recommendations apply to all :ref:`deployment paradigms +`. + Use MongoDB 8.0 or Later -~~~~~~~~~~~~~~~~~~~~~~~~ +```````````````````````` To improve the resiliency of your cluster, upgrade your cluster to MongoDB 8.0. MongoDB 8.0 introduces the following performance improvements and new features @@ -134,7 +165,7 @@ related to resilience: Connecting Your Application to |service| -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +```````````````````````````````````````` We recommend that you use a connection method built on the most `current driver version `__ for your application's programming language whenever possible. And while the @@ -151,7 +182,7 @@ analytics job request against the cluster. is particularly important in the context of enterprise level application deployments. Connection Pool Considerations for Performant Applications -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`````````````````````````````````````````````````````````` Opening database client connections is one of the most resource-intensive processes @@ -170,7 +201,7 @@ doesn't undermine your application's time-sensitive need for increased database operations. Min and Max Connection Pool Size -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +```````````````````````````````` If your ``minPoolSize`` and ``maxPoolSize`` values are similar, the majority of your database client connections open at application startup. For example, if your @@ -196,7 +227,7 @@ only ever be able to open and use one client connection, so your ``minPoolSize`` and your ``maxPoolSize`` should both be set to ``1``. Query Timeout -~~~~~~~~~~~~~ +````````````` Almost invariably, workload-specific queries from your application will vary in terms of the amount of time they take to execute in |service| and in terms of @@ -206,7 +237,7 @@ You can set `query timeout `__ and `retryable write `__ @@ -216,7 +247,7 @@ safeguard against intermittent network outages. .. _resiliency-read-write-concerns: Configure Read and Write Concerns -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +````````````````````````````````` |service| {+clusters+} eventually replicate all data across all nodes. However, you can configure the number of nodes across which data must be replicated before @@ -233,12 +264,12 @@ one node in your cluster .. _arch-center-move-collection: Isolate the Impact of Busy, Unsharded Collections -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +````````````````````````````````````````````````` .. include:: /includes/cloud-docs/move-collection.rst Disaster Recovery -~~~~~~~~~~~~~~~~~ +````````````````` For recommendations on disaster recovery best practices for |service|, see :ref:`arch-center-dr` and :ref:`arch-center-ha-configurations`. diff --git a/source/scalability.txt b/source/scalability.txt index 5b66a864..23f6fffb 100644 --- a/source/scalability.txt +++ b/source/scalability.txt @@ -76,6 +76,26 @@ to enhance your query performance. 
Additionally, you can leverage intelligent in Recommendations for {+service+} Scalability ------------------------------------------- +.. collapsible:: + :heading: Single-Region Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments in a single region + :expanded: true + + Recommendations here + +.. collapsible:: + :heading: Multi-Region and Multi-Cloud Deployment Recommendations + :sub_heading: Recommendations that apply only to deployments across multiple regions or multiple cloud providers + :expanded: true + + Recommendations here + +All Deployment Paradigm Recommendations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following recommendations apply to all :ref:`deployment paradigms +`. + For development and testing environments, do not enable auto-scaling compute and auto-scaling storage. This saves costs in your non-production environments. @@ -158,13 +178,13 @@ These examples also apply other recommended configurations, including: For your staging and production environments, create the following ``cluster.json`` file for each project. Change the IDs and names to use your values: - .. include:: /includes/examples/cli-json-example-create-clusters-with-autoscaling.rst + .. include:: /includes/examples/cli/cli-json-example-create-clusters-with-autoscaling.rst After you create the ``cluster.json`` file, run the following command for each project. The command uses the ``cluster.json`` file to create a cluster. - .. include:: /includes/examples/cli-example-create-clusters-stagingprod.rst + .. include:: /includes/examples/cli/staging-prod/cli-example-create-clusters-stagingprod.rst For more configuration options and info about this example, see :ref:`atlas-clusters-create`. @@ -212,22 +232,22 @@ These examples also apply other recommended configurations, including: main.tf ``````` - .. include:: /includes/examples/tf-example-main-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-main-stagingprod.rst variables.tf ```````````` - .. include:: /includes/examples/tf-example-autoscaling-variables.rst + .. include:: /includes/examples/terraform/tf-example-autoscaling-variables.rst terraform.tfvars ```````````````` - .. include:: /includes/examples/tf-example-tfvars-autoscaling-stagingprod.rst + .. include:: /includes/examples/terraform/staging-prod/tf-example-tfvars-autoscaling-stagingprod.rst provider.tf ``````````` - .. include:: /includes/examples/tf-example-provider.rst + .. include:: /includes/examples/terraform/tf-example-provider.rst After you create the files, navigate to each application and environment pair's directory and run the following command to initialize Terraform: diff --git a/source/single-region.txt b/source/single-region.txt new file mode 100644 index 00000000..91f6bfd3 --- /dev/null +++ b/source/single-region.txt @@ -0,0 +1,132 @@ +.. _arch-center-paradigms-single: + +================================= +Single-Region Deployment Paradigm +================================= + +.. default-domain:: mongodb + +.. contents:: On this page + :local: + :backlinks: none + :depth: 2 + :class: onecol + +Single-region {+service+} deployments set up {+cluster+} nodes within +one region of one cloud provider. Single-region +deployments to regions that support availability zones offer +protection in the case of a zonal outage by automatically rerouting +traffic to a node in another availability zone for continuous +availability and a smooth user experience. 
diff --git a/source/single-region.txt b/source/single-region.txt
new file mode 100644
index 00000000..91f6bfd3
--- /dev/null
+++ b/source/single-region.txt
@@ -0,0 +1,132 @@
+.. _arch-center-paradigms-single:
+
+=================================
+Single-Region Deployment Paradigm
+=================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 2
+   :class: onecol
+
+Single-region {+service+} deployments set up {+cluster+} nodes within
+one region of one cloud provider. Single-region
+deployments to regions that support availability zones offer
+protection during a zonal outage by automatically rerouting
+traffic to a node in another availability zone for continuous
+availability and a smooth user experience. Single-region {+service+}
+deployments are supported on all {+cluster+} tiers, so they include
+less expensive options for non-mission-critical deployments. When
+configured correctly, single-region deployments can meet your
+performance requirements and help meet compliance requirements for
+:ref:`data sovereignty
+`.
+
+To learn how to configure a single-region deployment, see
+:atlas:`Create a Cluster ` in the
+{+service+} documentation.
+
+The following diagram shows a single-region {+service+}
+deployment for regions that support availability zones:
+
+.. figure:: /includes/images/single-region.svg
+   :figwidth: 500px
+   :alt: An image showing a three-node deployment in a single region. The region contains one zone per node.
+
+Use Cases for Single-Region Deployments
+---------------------------------------
+
+Single-region deployments are best for the following use
+cases:
+
+Region-Specific Applications that Require Low Latency and Zonal High Availability (One Cloud Provider)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To improve performance for critical operations,
+all {+service+} deployments allow you to deploy data
+close to your users' geographic location, which reduces latency.
+You can configure a single-region deployment for
+zonal high availability and low latency.
+
+A single-region deployment may be best for you if you have
+the following requirements:
+
+- You want to use one cloud provider.
+- You want to deploy to more than one availability zone for high
+  availability, and you don't need to deploy to more than one region.
+- Your application requires low latency *and* has a majority of users
+  in one geographic location, since single-region deployments allow you
+  to choose your geographic area.
+
+For example, for an application deployed on |aws| with
+users primarily located in the eastern US, you can deploy a
+single-region deployment to ``us-east-1``
+(a region that supports availability zones). This
+ensures low latency since all nodes are within the eastern US, while
+offering high availability if there's a zonal outage that affects the
+primary node.
+
+If your application requires low latency and
+cross-region or cross-provider high availability, consider a
+:ref:`arch-center-paradigms-multi-region` or
+:ref:`arch-center-paradigms-multi-cloud`, respectively.
+
+If your application requires low latency for users in any region across
+a global user base, consider a :ref:`arch-center-paradigms-global`.
+
+Applications that Require Data Sovereignty and Zonal High Availability (One Cloud Provider)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For compliance with data residency laws, data can be partitioned to
+reside in a specific region, ensuring adherence to local regulations.
+You can configure a single-region deployment for
+zonal high availability and data sovereignty.
+
+A single-region deployment may be best for you if you have
+the following requirements:
+
+- You want to use one cloud provider.
+- You want to deploy to more than one availability zone for high
+  availability, and you don't need to deploy to more than one region.
+- Your application requires data sovereignty, since single-region
+  deployments allow you to choose your geographic area.
+
+For example, for an application deployed on |aws| that
+requires data storage in Europe, you can deploy a single-region
+deployment to a region within the EU (such as
+``eu-west-1``, a region that supports availability zones). This
+ensures data sovereignty since the region is within the EU, while
+offering high availability if there's a zonal outage that affects the
+primary node.
+
+.. include:: /includes/data-sovereignty.rst
+
+If your application requires data sovereignty and
+cross-region or cross-provider high availability, consider a
+:ref:`arch-center-paradigms-multi-region` or
+:ref:`arch-center-paradigms-multi-cloud`, respectively.
+
+Considerations for Single-Region Deployments
+--------------------------------------------
+
+Other considerations for single-region deployments include:
+
+- High availability depends on the deployment of nodes across
+  regions as well as the number, distribution, and priority order of
+  nodes. To learn more about recommended
+  {+cluster+} topologies for high availability, see
+  :ref:`arch-center-high-availability`.
+
+For more considerations, see
+:atlas:`Considerations
+` in the
+{+service+} documentation.
+
+Recommendations for Single-Region Deployments
+---------------------------------------------
+
+To learn about recommendations that apply to single-region deployments, see
+the following sections:
+
+.. include:: /includes/rec-list.rst
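Because the high-availability considerations above hinge on how many nodes you have and where they sit, it can help to confirm the topology your application actually sees. A small PyMongo sketch with a placeholder connection string; ``hello`` is the standard server handshake command:

.. code-block:: python

   from pymongo import MongoClient

   client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net/")

   # The hello command reports the replica set name, its members, and
   # which member is currently primary.
   info = client.admin.command("hello")
   print("replica set:", info.get("setName"))
   print("members:", info.get("hosts"))
   print("primary:", info.get("primary"))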