[Announcement] Major Update - The future of SocketDB #79

@TimoBechtel

The future of SocketDB

TL;DR: It's not dead. I'm working on major API changes.

I am currently running a survey on API design that will influence the development of the new API: https://awesomedx.com


SocketDB will undergo drastic API changes in the next major version. It will probably jump to version 10 (and skip v9) to make this even clearer.

Reasoning

The new API will address several major shortcomings of the current one:

Limitations of the Server API

The client-side API offers flexibility and full type safety, while the server-side API lacks this safety and imposes more restrictions. Data manipulation involves repetitive code. For instance, to modify a specific path, you must first retrieve the data from that path, update it, and then save the entire data structure to the store. This process is not only inefficient but also prone to errors.
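
To illustrate, updating a single field currently means reading the whole structure and writing it back. A rough sketch of that pattern, where getData and saveData are hypothetical stand-ins rather than the actual server API:

// hypothetical stand-ins for the current server-side store access,
// only meant to illustrate the read-modify-write round trip
declare function getData(path: string): Promise<Record<string, unknown>>;
declare function saveData(path: string, data: Record<string, unknown>): Promise<void>;
declare const sessionId: string;

const data = await getData(`sessions/${sessionId}`);
// change one field, but write the whole structure back
await saveData(`sessions/${sessionId}`, { ...data, name: 'new name' });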

The plugin system is useful for extending SocketDB, but it lacks type safety and requires manual integration with the server lifecycle. Implementing custom server-side logic, such as data validation, is complicated and error-prone, especially because data is encapsulated in the Node object.

Furthermore, the current design doesn't support server-side logic like events or triggers, making the implementation of complex logic, particularly authorization for specific data paths, more difficult.

Lack of Persistence Support

The current design of SocketDB does not provide any support for data persistence. While the plugin system can be used to implement persistence, it requires a lot of boilerplate code and is not very efficient.

Limits of the Data Structure and Update Mechanism

The hierarchical data structure used in SocketDB is not suitable for all applications. For example, if you want to store a list of users, you need to create a separate node and generate an ID for each user manually.
Also, there is no differentiation between creating a new node and updating an existing one. This is error-prone: another user might have already deleted the node you are trying to update, which currently results in a partial update.
Wrapping data nodes in a Node object also adds unnecessary overhead and makes it difficult to access the data directly, while adding metadata to the node does not provide any real benefit. Removing the Node object would simplify the data structure, reduce network and storage overhead, and make it easier to access the data directly.
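
Roughly, the difference looks like this (shapes simplified; not the exact Node layout):

// simplified illustration of the wrapped representation used today
const wrapped = {
	value: {
		sessions: {
			value: {
				abc123: {
					value: { name: { value: 'demo' } },
				},
			},
		},
	},
};

// versus the plain data the new design would store and send directly
const plain = {
	sessions: {
		abc123: { name: 'demo' },
	},
};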

Proposed Solution

The following proposal aims to address the major limitations identified above and introduces new features, with a specific focus on data validation, server-side logic, and persistence.

Customizing Types

The current implementation of SocketDB allows overriding a few type definitions, like the Session Context (which could be used for authentication).
To customize this type, you currently have to create a new type definition and override the default one using type augmentation:

interface CustomContext {
	id: string;
	role: 'admin' | 'user';
}

declare module '@socketdb/server' {
	// override the default SessionContext type
	interface SessionContext extends CustomContext {}
}

This is a bit verbose and is more like a workaround than a proper solution. To improve this, we allow the user to initialize a custom SocketDB instance with custom type definitions:

const app = init().withContext<CustomContext>(); // extends the default Context type

Now we have a type-safe SocketDB instance that can be used in both the client and server-side code.
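
Under the hood, this can be modeled as a generic builder that carries the context type through the chain. A minimal sketch of the idea, where App and init are illustrative rather than the actual implementation:

// a minimal sketch of a typed builder; `App` and `init` are illustrative
interface App<Context extends object> {
	// returns the same app, typed with a narrowed context
	withContext<C extends Context>(): App<C>;
	// ...further builder methods carry `Context` along, so hooks,
	// procedures, and middlewares all see the correct `ctx` type
}

declare function init(): App<object>;

const typedApp = init().withContext<CustomContext>();
// `typedApp` is an App<CustomContext>; no module augmentation needed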

Data Validation

SocketDB should offer first-class support for data validation: the user defines a schema for the data, and it is validated on both the client and the server.
Building on a validation library like Zod makes this easier and also enables automatic type inference.

SocketDB then provides methods to create a schema.

const schema = app.schema({
	appName: z.string(),
	sessions: app.collection({
		name: z.string(),
		// it also supports custom validation, so we can use any validation library
		access: validator<'public' | 'private'>((value) => {
			if (value !== 'public' && value !== 'private') {
				throw new Error('Invalid access type');
			}
		}),
		users: app.collection(
			z.object({
				username: z.string(),
			})
		),
	}),
});
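
For reference, the validator escape hatch used above could be shaped roughly like this; this is an assumption about the proposed helper, not a confirmed signature:

// a rough sketch of the `validator` helper; an assumption, not the actual API
type Validator<T> = { parse: (value: unknown) => T };

function validator<T>(check: (value: unknown) => void): Validator<T> {
	return {
		parse(value) {
			check(value); // throws on invalid input
			return value as T;
		},
	};
}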

Separating the schema into collections has a few benefits:

  • It allows the user to separate the schema into multiple files, making it easier to manage.
    e.g.

    // users.ts
    export const users = app.collection({
    	username: z.string(),
    });
    
    // schema.ts
    const schema = app.schema({
    	sessions: app.collection({
    		name: z.string(),
    		users: users,
    	}),
    });
  • It allows the user to define custom logic for each collection, giving more granular control over the data. Because we use the customized app instance, we have access to the correctly typed context as well as this collection's data schema.
    e.g.

    const schema = app.schema({
    	sessions: app.collection({
    		name: z.string(),
    		// we use the user collection with custom middleware
    		users: users.use({
    			create: async ({ data, ctx }) => {
    				if (ctx.user.role !== 'admin') {
    					throw new SocketDBError({
    						code: 'unauthorized',
    						message: 'Only admins can create users',
    					});
    				}
    			},
    		}),
    	}),
    });

Persistence Support

As noted above, implementing persistence with the current plugin system requires a lot of boilerplate code and is inefficient.
Instead, the new SocketDB API provides a persistence adapter: a simple interface through which the user implements custom logic for fetching and writing data. SocketDB handles throttling and batching of writes, as well as re-fetching data when it becomes stale.

A staleTime option controls when data is considered stale and automatically re-synced, which allows picking up external changes (made outside of SocketDB).

To persist data, you can now add a persistence adapter to individual collections:

const users = collection(
	z.object({
		username: z.string(),
	})
).withPersistenceAdapter({
	staleTime: 1000 * 60 * 5, // how long to wait until the data is considered stale
	writeInterval: 1000 * 60, // how often to write to the db (throttling)

	// allows the user to initialize the store,
	// e.g. if the database is not too large, we can preload everything
	init: async () => {
		return await prisma.user.findMany();
	},

	// these functions implement the fetching logic, when the data becomes stale or is not yet loaded
	fetchAll: async () => {
		return await prisma.user.findMany();
	},
	fetchMany: async ({ ids }) => {
		return await prisma.user.findMany({
			where: {
				socketdbId: {
					in: ids,
				},
			},
		});
	},

	// these functions implement the writing logic
	// writes are batched and throttled to reduce the number of writes
	createMany: async ({ items }) => {
		await prisma.user.createMany({
			data: items.map((item) => ({
				...item,
				socketdbId: item.id,
			})),
		});
	},
	updateMany: async ({ items }) => {
		// updating logic...
	},
	deleteMany: async ({ ids }) => {
		// deleting logic...
	},
});

This may seem verbose, but it allows for a lot of flexibility and customization. Also, by adding a persistence adapter to individual collections, we have full type safety and can use the correct data schema for each collection. This also allows us to use different persistence adapters for different collections.

In the example above, using Prisma as the database ORM, we could add a Prisma adapter as an abstraction layer to reduce the amount of boilerplate code:

const users = collection(
	z.object({
		username: z.string(),
	})
).withPersistenceAdapter(prismaAdapter(prisma.user));
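
A sketch of what such an adapter factory could look like, built against the interface from the previous example; prismaAdapter here is a hypothetical helper, not an existing package:

// a sketch of a generic adapter factory; `prismaAdapter` is hypothetical
type PrismaLikeModel<T> = {
	findMany(args?: unknown): Promise<T[]>;
	createMany(args: { data: unknown[] }): Promise<unknown>;
	deleteMany(args: { where: unknown }): Promise<unknown>;
};

function prismaAdapter<T extends { id: string }>(model: PrismaLikeModel<T>) {
	return {
		staleTime: 1000 * 60 * 5,
		writeInterval: 1000 * 60,
		init: () => model.findMany(),
		fetchAll: () => model.findMany(),
		fetchMany: ({ ids }: { ids: string[] }) =>
			model.findMany({ where: { socketdbId: { in: ids } } }),
		createMany: ({ items }: { items: T[] }) =>
			model.createMany({
				data: items.map((item) => ({ ...item, socketdbId: item.id })),
			}),
		deleteMany: ({ ids }: { ids: string[] }) =>
			model.deleteMany({ where: { socketdbId: { in: ids } } }),
		// updateMany omitted for brevity
	};
}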

Updated API for Data Manipulation

Using a schema with clearly defined collections allows us to create a more straightforward API for data manipulation. Instead of having to use the set method for both creating and updating data, we can now separate these two operations into create and update methods. This makes it safer to use and allows more granular control over the data.

Using a JavaScript Proxy, we can also remove the need for the get method when querying data. This enables dot notation for accessing data, which is less verbose (see the sketch after the benefits list below).

For example, instead of this:

// current way of manipulating data
db.get('sessions').get(sessionId).get('users').get(generateId()).set({
	username: 'test',
});

We can now do this:

db.sessions.get(sessionId).users.create({
	username: 'test',
});

This has the following benefits:

  • It hides the ID generation logic, as we now know whether we are creating or updating data. This allows us to validate the ID on the server side and prevents overwriting existing data by accident.
  • It allows us to attach different methods to collections and data nodes, making it more expressive and easier to use.
  • It removes boilerplate code, as we no longer have to use the get method to access data.
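
To make the Proxy idea concrete, here is a minimal sketch of the mechanism (not SocketDB's actual implementation): each property access appends a path segment until a terminal method resolves it.

// a minimal sketch of the Proxy mechanism; not SocketDB's implementation
function pathProxy(path: string[] = []): any {
	return new Proxy({} as Record<string, unknown>, {
		get(_target, prop) {
			// terminal method: resolve the accumulated path
			if (prop === 'toPath') return () => path.join('/');
			// any other access extends the path
			return pathProxy([...path, String(prop)]);
		},
	});
}

// pathProxy().sessions.users.toPath() === 'sessions/users'
// a real implementation would expose create/update/delete instead of toPath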

Client Usage Example

Instead of passing the complete schema object to the client, we just export the schema type. This prevents exposing server code to the client.

// schema.ts
export type Schema = typeof schema;

The client can then use this type to provide full type safety and auto-completion.

const client = createWSClient<Schema>();

client.db.sessions.create({
	name: 'test',
	access: 'public',
	users: [
		// creates an item in the nested users collection as well
		{
			username: 'BlazeRunner39',
		},
	],
});

// if the session does not exist, nothing will be created
client.db.sessions.get(sessionId).users.create({
	username: 'SwiftStorm23',
});

client.db.sessions.each((session) => {
	session.users.each((user) => {
		user.username.on((username) => {
			// logs the username for each user when it changes or when a new user is added
			console.log(username);
		});
	});
});

client.db.sessions.get(sessionId).users.get(userId).delete();

// this will throw a type error, because we cannot delete fields of a collection item
client.db.sessions.get(sessionId).name.delete();

// this however will work:
client.db.sessions.get(sessionId).delete();

Server Procedures

In addition to data validation, SocketDB should also provide first-class support for server-side logic. This means that the user should be able to define procedures on the server-side and call them from the client-side.

This API is heavily inspired by tRPC, a library that provides a similar builder-based API for RPC calls.

Defining Procedures

To create a new procedure, we can use the procedure function that is provided by the data schema we created earlier. Because this is a method of the schema, we have the correct type definitions for the context and data schema.

You can abstract the procedure function to make it easier to use:

const publicProcedure = schema.procedure;

Then we can build procedures using a builder pattern that lets us add typed input (e.g. via a Zod schema) and attach hooks at any point in the chain. This keeps procedures flexible, type-safe, and easy to compose.

const procedures = app.procedures({
	/**
	 * Adds a new user to the session with the least amount of users
	 */
	addUser: publicProcedure
		.input(
			z.object({
				username: z.string(),
			})
		)
		.mutation(async ({ input, ctx, api }) => {
			const sessions = await api.sessions.fetchAll();
			const smallestSession = sessions.reduce((smallest, session) => {
				if (session.data.users.length < smallest.data.users.length) {
					return session;
				}
				return smallest;
			}, sessions[0]);

			const user = await api.sessions
				.get(smallestSession.id)
				.users.create({ username: input.username });

			return user;
		}),
	// procedures can also be nested
	utils: app.procedures({
		getNumberOfUsers: publicProcedure.query(async ({ api }) => {
			const sessions = await api.sessions.fetchAll();
			return sessions.reduce((total, session) => {
				return total + session.data.users.length;
			}, 0);
		}),
	}),
});

Note: Because we store data in-memory, accessing the data is very fast. So in this example, we don't need to worry about performance too much. With a very large dataset, this might not be the best solution, though.

Middleware

Because we use a builder pattern, it is easy to compose procedures and create abstractions. For example, we can create a middleware function that checks if the user is authenticated and then use it in multiple procedures:

const authenticatedProcedure = publicProcedure.use(({ ctx }) => {
	if (!ctx.user) {
		throw new SocketDBError({
			code: 'unauthorized',
			message: 'You need to be authenticated to perform this action',
		});
	}
});

We can even add input to a middleware function, allowing us to create reusable procedures:

const sessionProcedure = authenticatedProcedure
	.input(
		z.object({
			sessionId: z.string(),
		})
	)
	.use(async ({ ctx, input, db }) => {
		const session = await db.sessions.get(input.sessionId).fetch();
		if (!session) {
			throw new SocketDBError({
				code: 'not-found',
				message: 'Session not found',
			});
		}

		// add the session to the context
		return {
			...ctx,
			session: {
				id: input.sessionId,
				...session,
			},
		};
	});

// Now when we use the procedure later, we can be sure that the session exists
// and we automatically have access to it in the context, with the correct type.
const procedures = app.procedures({
	createUser: sessionProcedure
		.input(
			z.object({
				username: z.string(),
			})
		)
		.mutation(async ({ input, ctx, db }) => {
			const user = await db.sessions
				// the session was validated by the middleware and is available in the context
				.get(ctx.session.id)
				.users.create({ username: input.username });

			return user;
		}),
});

// this will give us type safety in the client-side code later, e.g.
await client.createUser({ sessionId: id, username: 'test' });

Calling Procedures

To call a procedure from the client-side, we can use the withProcedures method that is provided by the client. This method accepts the procedures type and returns a client instance with the procedures type.

We only export the procedures type to prevent exposing server code to the client.

// procedures.ts
export type ServerProcedures = typeof procedures;

In the client-side code, we pass the procedures type to the client:

const client = createWSClient<Schema>().withProcedures<ServerProcedures>();

Alternatively, if we want to share the procedures between the client and server to add support for optimistic updates, we can also pass the complete procedures object to the client instead of just the type. This will then run the procedures on the client-side as well.

const client = createWSClient<Schema>()
	.withProcedures<ServerProcedures>() // only calls procedures on the server-side
	.withProcedures(sharedProcedures); // runs procedures on the client-side as well

Now we can call the procedures from the client-side:

// mutations
const { id } = await client.addUser({ username: 'test' });
await client.createUser({ sessionId: sessionId, username: 'test' });

// query, updates automatically when the data changes
client.utils.getNumberOfUsers().on((numberOfUsers) => {
	console.log(numberOfUsers);
});

Batching & Merging

One of the main goals of SocketDB is to make data synchronization efficient by reducing both the number of messages sent and the amount of data they carry. Remote procedure calls are collected and sent in batches to reduce the number of messages. This can be improved even further by merging multiple calls of the same procedure into one.
Since this is not always what the user wants, SocketDB provides a builder method to opt into this merging for specific procedures.

Let's say we want to update a username using a remote procedure.

const updateUsername = sessionProcedure
	.input(
		z.object({
			userId: z.string(),
			username: z.string(),
		})
	)
	.mutation(async ({ input, api }) => {
		await api.sessions.get(input.sessionId).users.get(input.userId).update({
			username: input.username,
		});
	});

If we call this procedure multiple times, it will send multiple messages to the server. However, we do not need to send multiple events, as we only need to send the last update. To solve this, we can use the batchedInput method to allow merging input parameters.

const updateUsername = sessionProcedure
	.input(
		z.object({
			userId: z.string(),
		})
	)
	.batchedInput(
		z.object({
			username: z.string(),
		})
	)
	.mutation(async ({ input, api }) => {
		await api.sessions.get(input.sessionId).users.get(input.userId).update({
			username: input.username,
		});
	});

// this can then be called like this (sessionId comes from sessionProcedure's input):
client.updateUsername({ sessionId, userId: '1' }, { username: 'test' });

When this procedure is called multiple times, it will merge all input parameters marked as batchedInput into a single object, reducing the number of messages sent. However, if any other non-batched input parameter differs, it will send a separate message for that.

To make this a bit more clear, let's look at an example:

// the following three calls will be merged into a single call,
// as all non-batched input (sessionId and userId) is identical
client.updateUsername({ sessionId, userId: '1' }, { username: 'A' });
client.updateUsername({ sessionId, userId: '1' }, { username: 'B' });
client.updateUsername({ sessionId, userId: '1' }, { username: 'C' });

// the following results in a separate call, as the userId differs
client.updateUsername({ sessionId, userId: '2' }, { username: 'D' });

// the following are merged again, as they share the same userId
client.updateUsername({ sessionId, userId: '3' }, { username: 'E' });
client.updateUsername({ sessionId, userId: '3' }, { username: 'F' });

// so all of the above calls send only the following events:
// updateUsername, { sessionId, userId: '1', username: 'C' }
// updateUsername, { sessionId, userId: '2', username: 'D' }
// updateUsername, { sessionId, userId: '3', username: 'F' }
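
Internally, this merging could work along the following lines; a sketch of the semantics described above, not the actual client code:

// calls are keyed by their non-batched input; batched input is shallow-merged
type QueuedCall = {
	input: Record<string, unknown>; // non-batched input
	batched: Record<string, unknown>; // batched input
};

function mergeCalls(calls: QueuedCall[]): QueuedCall[] {
	const queue = new Map<string, QueuedCall>();
	for (const call of calls) {
		// identical non-batched input maps to the same slot
		const key = JSON.stringify(call.input);
		const existing = queue.get(key);
		queue.set(key, {
			input: call.input,
			// later batched values overwrite earlier ones
			batched: { ...existing?.batched, ...call.batched },
		});
	}
	// insertion order is preserved, so unrelated calls stay in order
	return [...queue.values()];
}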

So, what's next?

While the proposed API is a significant improvement, implementing it and integrating it into the existing SocketDB library still requires a lot of work and will take some time. I think it is worth it, though, as it will make SocketDB much more powerful and easier to use.

I am currently doing a survey on API design that will influence the development of the new API.
If you're interested: https://awesomedx.com
