This repository contains example code for getting started with EMR Serverless and using it with Apache Spark and Apache Hive.
In addition, it provides container images for both the Spark History Server and the Tez UI so you can debug your jobs.
For full details about using EMR Serverless, please see the EMR Serverless documentation.
These demos assume you are using an Administrator-level role in your AWS account.
- Amazon EMR Serverless is currently in preview. Please follow the sign-up steps at https://pages.awscloud.com/EMR-Serverless-Preview.html to request access.
- Create an Amazon S3 bucket in the us-east-1 region:

  aws s3 mb s3://BUCKET-NAME --region us-east-1

- Create an EMR Serverless execution role, replacing BUCKET-NAME with the bucket you created above.
This role provides read access to the sample input bucket and your bucket, write access to your bucket, and create and read access to the Glue Data Catalog.
aws iam create-role --role-name emr-serverless-job-role --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "emr-serverless.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}'
aws iam put-role-policy --role-name emr-serverless-job-role --policy-name S3Access --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadFromOutputAndInputBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::noaa-gsod-pds",
                "arn:aws:s3:::noaa-gsod-pds/*",
                "arn:aws:s3:::BUCKET-NAME",
                "arn:aws:s3:::BUCKET-NAME/*"
            ]
        },
        {
            "Sid": "WriteToOutputDataBucket",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET-NAME/*"
            ]
        }
    ]
}'
aws iam put-role-policy --role-name emr-serverless-job-role --policy-name GlueAccess --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GlueCreateAndReadDataCatalog",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:GetUserDefinedFunctions"
            ],
            "Resource": ["*"]
        }
    ]
}'
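The job-run examples later in this README need the ARN of this execution role. As a quick check, you can print it with the AWS CLI (the role name simply matches the one created above):

```shell
# Print the ARN of the execution role created above; pass this ARN to
# EMR Serverless when submitting jobs.
aws iam get-role \
  --role-name emr-serverless-job-role \
  --query 'Role.Arn' \
  --output text
```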
- This sample script shows how to use EMR Serverless to run a PySpark job that analyzes data from the open NOAA Global Surface Summary of Day dataset (a job-submission sketch follows this list).
- Shows how to package Python dependencies (Great Expectations) using a virtualenv and venv-pack (see the packaging sketch after this list).
- This sample shows how to use EMR Serverless to combine both Python and Java dependencies to run genomic analysis using Glow and 1000 Genomes.
- This sample script shows how to use Hive in EMR Serverless to query the same NOAA data.
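For orientation, submitting one of the Spark samples generally follows the pattern sketched below using the released AWS CLI; during the preview you may need the additional setup described in the utilities note that follows. The application name, release label, account ID, and script path are placeholders, not values taken from the samples.

```shell
# Hypothetical sketch: create a Spark application, then submit a PySpark job.
# APPLICATION_ID, ACCOUNT_ID, BUCKET-NAME, the release label, and the
# entryPoint path are placeholders.
aws emr-serverless create-application \
  --type SPARK \
  --name noaa-gsod-analysis \
  --release-label emr-6.5.0-preview

aws emr-serverless start-job-run \
  --application-id APPLICATION_ID \
  --execution-role-arn arn:aws:iam::ACCOUNT_ID:role/emr-serverless-job-role \
  --job-driver '{
    "sparkSubmit": {
      "entryPoint": "s3://BUCKET-NAME/scripts/analysis.py"
    }
  }'
```

Check each sample for the exact script location and any extra Spark configuration it needs.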
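The Python-dependencies example relies on venv-pack to bundle a virtual environment that the job can use at runtime. A minimal packaging sketch, assuming a build environment compatible with the EMR Serverless runtime, looks like this (the environment name and S3 prefix are illustrative):

```shell
# Hypothetical sketch: build a virtualenv with the job's dependencies and
# pack it into an archive that can be uploaded to S3.
python3 -m venv great_expectations_venv
source great_expectations_venv/bin/activate
pip install great_expectations venv-pack

# venv-pack produces a relocatable archive of the active environment.
venv-pack -f -o great_expectations_venv.tar.gz

# Stage the archive next to your job code (BUCKET-NAME is a placeholder).
aws s3 cp great_expectations_venv.tar.gz s3://BUCKET-NAME/artifacts/
```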
During the preview, additional artifacts are required to access the EMR Serverless API. The utilities below show how to do this.
- You can use this Dockerfile to run the Spark History Server in a container (a build-and-run sketch follows this list).
- You can use this Dockerfile to run the Tez UI and Application Timeline Server in a container.
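As an illustration, running the Spark History Server image locally usually amounts to building it and starting a container with your AWS credentials and a pointer to the job's event logs. The image tag, port, and log location below are assumptions rather than values from this repository, and the image is assumed to start the history server on Spark's default port (18080):

```shell
# Hypothetical sketch: build the Spark UI image and run it locally.
# The tag, port mapping, and s3a:// log directory are placeholders.
docker build -t emr-serverless/spark-ui .

docker run -it --rm \
  -p 18080:18080 \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://BUCKET-NAME/logs/" \
  emr-serverless/spark-ui
```

With the port mapping above, the history server would be reachable at http://localhost:18080.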
For security issue notifications and contribution guidelines, see CONTRIBUTING.
This library is licensed under the MIT-0 License. See the LICENSE file.