Unity Catalog: Open, Multimodal Catalog for Data & AI

Unity Catalog is the industry’s only universal catalog for data and AI.

- Multimodal interface supports any format, engine, and asset
  - Multi-format support: It is extensible and supports Delta Lake, Apache Iceberg and Apache Hudi via UniForm, Apache Parquet, JSON, CSV, and many others.
  - Multi-engine support: With its open APIs, data cataloged in Unity Catalog can be read by many leading compute engines.
  - Multimodal: It supports all your data and AI assets, including tables, files, functions, and AI models.
- Open source API and implementation: OpenAPI spec and OSS implementation (Apache 2.0 license). It is also compatible with Apache Hive's metastore API and Apache Iceberg's REST catalog API. Unity Catalog is currently a sandbox project with the LF AI & Data Foundation (part of the Linux Foundation).
- Unified governance for data and AI: Govern and secure tabular data, unstructured assets, and AI assets with a single interface.

The first release of Unity Catalog focuses on a core set of APIs for tables, unstructured data, and AI assets - with more to come soon on governance, access, and client interoperability. This is just the beginning!

Vibrant ecosystem

This is a community effort.

Unity Catalog is proud to be hosted by the LF AI & Data Foundation.

Quickstart - Hello UC!

Let's take Unity Catalog for a spin. In this guide, we are going to do the following:

- In one terminal, run the UC server.
- In another terminal, explore the contents of the UC server using a CLI. An example project is provided to demonstrate how to use the UC SDK for various assets and to offer a convenient way to explore the content of any UC server implementation.

Prerequisites

Make sure your local environment meets the following prerequisites:

- Clone this repository.
- Ensure the JAVA_HOME environment variable in your terminal points to JDK 17.
- Compile the project using build/sbt package.
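
A quick way to confirm the JAVA_HOME prerequisite is sketched below. The helper names `java_major_version` and `check_java_home` are hypothetical and not part of this repository. The sketch assumes a POSIX shell and that `java -version` prints the version in double quotes on its first line, as common JDK builds do.

```shell
# Extract the major version from a `java -version` banner line,
# e.g. 'openjdk version "17.0.9" 2023-10-17' -> 17.
java_major_version() {
  # Take the text between the first pair of double quotes.
  version=$(printf '%s\n' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p')
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_392 -> 8
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # modern scheme: 17.0.9 -> 17
  esac
}

# Report whether JAVA_HOME points at a JDK 17 installation.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is unset or does not contain bin/java" >&2
    return 1
  fi
  banner=$("$JAVA_HOME/bin/java" -version 2>&1 | head -n 1)
  major=$(java_major_version "$banner")
  if [ "$major" = "17" ]; then
    echo "OK: JAVA_HOME points to JDK 17"
  else
    echo "JAVA_HOME points to JDK ${major:-unknown}, expected 17" >&2
    return 1
  fi
}
```

Calling `check_java_home` before `build/sbt package` surfaces a misconfigured JDK early instead of partway through the build.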

If you prefer to run this using the Unity Catalog Dockerized Environment, please refer to the Docker README.md.

Run the UC Server

In a terminal, in the cloned repository root directory, start the UC server.

bin/start-uc-server

For the remaining steps, continue in a different terminal.
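
The server may take a moment to start, so it can help to wait for it before running the CLI in the second terminal. The sketch below is one way to do that. `wait_for` is a hypothetical helper, not something shipped with Unity Catalog. The address http://127.0.0.1:8080 is the endpoint used in the DuckDB section of this guide, and the example assumes that default is unchanged and that `curl` is installed.

```shell
# Retry a command until it succeeds or the attempt budget is spent.
# usage: wait_for <max_attempts> <sleep_seconds> <command...>
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the default endpoint and an installed curl):
#   wait_for 30 1 curl -s -o /dev/null http://127.0.0.1:8080 && echo "UC server is reachable"
```

Without `-f`, `curl` exits successfully on any HTTP response, so the check passes as soon as the server accepts connections, whatever status it returns for the root path.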

Operate on Delta tables with the CLI

Let's list the tables.

bin/uc table list --catalog unity --schema default

You should see a few tables. Some details are truncated because of the nested nature of the data. To see all the content, you can add --output jsonPretty to any command.
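
As a small convenience, the `--output jsonPretty` flag can be wrapped in a shell function so that every command prints untruncated output. This is only a sketch: `uc_pretty` and the `UC_CLI` override variable are hypothetical names introduced here, and the function assumes you are in the repository root so that `bin/uc` resolves.

```shell
# Run any UC CLI subcommand with untruncated, pretty-printed JSON output.
# UC_CLI is a hypothetical override; it defaults to bin/uc in the repo root.
uc_pretty() {
  "${UC_CLI:-bin/uc}" "$@" --output jsonPretty
}

# Example:
#   uc_pretty table list --catalog unity --schema default
```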

Next, let's get the metadata of one of those tables.

bin/uc table get --full_name unity.default.numbers

You can see that it is a Delta table. Now, specifically for Delta tables, this CLI can print a snippet of the contents of a Delta table (powered by the Delta Kernel Java project). Let's try that.

bin/uc table read --full_name unity.default.numbers

Operate on Delta tables with DuckDB

For operating on tables with DuckDB, you will have to install it (at least version 1.0). Let's start DuckDB and install a couple of extensions. To start DuckDB, run the command duckdb in the terminal. Then, in the DuckDB shell, run the following commands:

install uc_catalog from core_nightly;
load uc_catalog;
install delta;
load delta;

If you have installed these extensions before, you may have to run update extensions and restart DuckDB for the following steps to work.

Now that we have DuckDB all set up, let's try connecting to UC by specifying a secret.

CREATE SECRET (
      TYPE UC,
      TOKEN 'not-used',
      ENDPOINT 'http://127.0.0.1:8080',
      AWS_REGION 'us-east-2'
 );

You should see it print a short table saying Success = true. Then we attach the unity catalog to DuckDB.

ATTACH 'unity' AS unity (TYPE UC_CATALOG);

Now we are ready to query. Try the following:

SHOW ALL TABLES;
SELECT * FROM unity.default.numbers;

You should see the tables listed and the contents of the numbers table printed. To quit DuckDB, press Ctrl+D (if your platform supports it), press Ctrl+C, or use the .exit command in the DuckDB shell.
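
The DuckDB steps above can also be run non-interactively. The sketch below collects the same statements into a hypothetical function, `uc_duckdb_script`, that prints them so they can be piped into the `duckdb` CLI, which reads SQL from standard input. It assumes the UC server is still running at http://127.0.0.1:8080 and that `duckdb` (version 1.0 or later) is on your PATH.

```shell
# Print the DuckDB statements from this section as one script.
uc_duckdb_script() {
  cat <<'SQL'
INSTALL uc_catalog FROM core_nightly;
LOAD uc_catalog;
INSTALL delta;
LOAD delta;
CREATE SECRET (
      TYPE UC,
      TOKEN 'not-used',
      ENDPOINT 'http://127.0.0.1:8080',
      AWS_REGION 'us-east-2'
 );
ATTACH 'unity' AS unity (TYPE UC_CATALOG);
SHOW ALL TABLES;
SELECT * FROM unity.default.numbers;
SQL
}

# Example:
#   uc_duckdb_script | duckdb
```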

CLI tutorial

You can interact with a Unity Catalog server to create and manage catalogs, schemas and tables, operate on volumes and functions from the CLI, and much more. See the CLI usage for more details.

APIs and Compatibility

- OpenAPI specification: The Unity Catalog REST API is documented here.
- Compatibility and stability: The APIs are currently evolving and should not be assumed to be stable.
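
To illustrate calling the REST API directly, the sketch below builds the URL for listing the tables in a schema. Treat the details as assumptions: the `/api/2.1/unity-catalog/tables` path and the `catalog_name` and `schema_name` query parameters reflect my reading of the API specification and should be confirmed against api/all.yaml, especially since the APIs are still evolving. `uc_tables_url` is a hypothetical helper name.

```shell
# Build the URL for listing tables in a schema.
# ASSUMPTION: the /api/2.1/unity-catalog/tables path and the catalog_name /
# schema_name query parameters; confirm them against api/all.yaml.
# usage: uc_tables_url <base_url> <catalog> <schema>
uc_tables_url() {
  printf '%s/api/2.1/unity-catalog/tables?catalog_name=%s&schema_name=%s\n' "$1" "$2" "$3"
}

# Example (requires a running server and curl):
#   curl -s "$(uc_tables_url http://127.0.0.1:8080 unity default)"
```

The helper does no URL encoding, so it is only suitable for simple names such as `unity` and `default`.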

Deployment

To create a tarball that can be used to deploy the UC server or run the CLI, run the following:
```
build/sbt createTarball
```
This will create a tarball in the target directory. See the full deployment guide for more details.

Compiling and testing

Install JDK 17 by whatever mechanism is appropriate for your system, and set that version to be the default Java version (e.g. via the JAVA_HOME environment variable).
To compile all the code without running tests, run the following:
```
build/sbt clean compile
```
To compile and execute tests, run the following:
```
build/sbt clean test
```
To execute tests with coverage, run the following:
```
build/sbt jacoco
```
To update the API specification, edit api/all.yaml and then run the following:
```
build/sbt generate
```
This will regenerate the OpenAPI data models in the UC server and data models + APIs in the client SDK.
To format the code, run the following:
```
build/sbt javafmtAll
```

Setting up IDE

IntelliJ is the recommended IDE to use when developing Unity Catalog. The steps below outline how to add the project to IntelliJ:

1. Clone Unity Catalog into a local folder, such as ~/unitycatalog.
2. Select File > New Project > Project from Existing Sources... and select ~/unitycatalog.
3. Under Import project from external model, select sbt. Click Next.
4. Click Finish.

Java code adheres to the Google style, which is verified via build/sbt javafmtCheckAll during builds. In order to automatically fix Java code style issues, please use build/sbt javafmtAll.
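
One way to combine the style check with the test run before pushing changes is sketched below. `pre_push_checks` and the `SBT` override variable are hypothetical names introduced for this example. The sbt tasks themselves (`javafmtCheckAll`, `clean test`) are the ones named in this README.

```shell
# Run the format check and the tests, stopping at the first failure.
# SBT is a hypothetical override; it defaults to build/sbt in the repo root.
pre_push_checks() {
  sbt_cmd=${SBT:-build/sbt}
  "$sbt_cmd" javafmtCheckAll && "$sbt_cmd" clean test
}

# Example (from the repository root):
#   pre_push_checks || echo "Fix style with build/sbt javafmtAll, or fix failing tests"
```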

Configuring Code Formatter for Eclipse/IntelliJ

Follow the instructions for Eclipse or IntelliJ to install the google-java-format plugin (note the required manual actions for IntelliJ).

Using more recent JDKs

The build script checks for a lower bound on the JDK, but the current SBT version imposes an upper bound. Please check the JDK compatibility documentation for more information.

Serving the documentation with mkdocs

Create a virtual environment:

# Create virtual environment
python -m venv uc_docs_venv

# Activate virtual environment (Linux/macOS)
source uc_docs_venv/bin/activate

# Activate virtual environment (Windows)
uc_docs_venv\Scripts\activate

Install the required dependencies:

pip install -r requirements-docs.txt

Then serve the docs with

mkdocs serve
