codeLab is a local chat app built around a FastAPI server and a small CLI client. The server loads google/gemma-4-E4B-it, keeps per-chat history on disk, and exposes a simple HTTP API. The client connects to that server, authenticates with a local password, and provides a text-based workflow for creating, resuming, and reviewing chats.
Project files:
- `codelabServer.py` - FastAPI server, model loading, chat persistence, and API endpoints.
- `codelab.py` - CLI client for starting and continuing chats.
- `codelabServer.bat` - Windows launcher for the server with default runtime settings.
- `codelab.bat` - Windows launcher for the client.
- `chats/` - Saved chat JSON files.
Requirements:
- Windows with an NVIDIA GPU; the current server configuration expects one.
- A Python environment with the project dependencies installed.
- Access to the Hugging Face model `google/gemma-4-E4B-it`.
- A `.venv` folder in the repository root if you want to use the included batch files as-is.
The repository includes a pinned `requirements.txt` generated from the current environment.
The scripts import the following key packages: `fastapi`, `uvicorn`, `requests`, `torch`, `transformers`, `bitsandbytes`, and `pydantic`.
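Before the first launch, it can help to confirm that these packages can be found in the environment you plan to use. The following is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper that is not part of codeLab, and it assumes each package's import name matches the names listed above.

```python
import importlib.util

# Import names of the key packages (assumed to match their pip names).
KEY_PACKAGES = ["fastapi", "uvicorn", "requests", "torch",
                "transformers", "bitsandbytes", "pydantic"]


def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    # For a top-level name, find_spec returns None when the package is not
    # installed, and it does not import the package (so torch is not loaded).
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(KEY_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All key packages are installed.")
```

Run it with the same interpreter the launchers use (for example, the one in `.venv`) so the result reflects that environment.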
- Create and activate a virtual environment if you do not already have one.
- Install the required packages from requirements.txt or install them manually.
- Make sure the model can be downloaded from Hugging Face on first run.
Example installation command:
```
pip install -r requirements.txt
```

If PyTorch needs a CUDA-specific build, follow the official PyTorch install instructions for your GPU and CUDA version.
Start the server first, then launch the client.
```
python codelabServer.py
```

On Windows, you can also run `codelabServer.bat`, which uses the repository's `.venv` interpreter and sets the default environment variables.
Then start the client:

```
python codelab.py
```

Or use `codelab.bat` on Windows.
The repository does not install itself on your system PATH by default. The included batch files use absolute paths to the local virtual environment, so they work when you run them from this folder.
If you want to launch codeLab from any terminal window, add the repository root to your user PATH and restart the terminal session. On Windows PowerShell, one way to do that is:
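```powershell
$repo = "C:\path\to\codeLab"
$userPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$userPath;$repo", "User")
```

Read the existing user-level PATH rather than `$env:Path`, which also contains the machine-wide entries and would copy them into your user PATH. After you restart the terminal, the batch files can be called by name (for example, `codelab.bat`) from any directory, or you can create your own wrapper command.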
When the client starts, it checks the server, asks for the local password, and then shows the available commands.
Supported commands at the selection prompt:
- `new` - create a new chat.
- `list` - show chats for the current working directory.
- `<chat-name>` - open an existing chat by name.
- a number - choose a chat from the displayed list.
- `exit` or `quit` - close the client.
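The selection prompt mixes fixed commands, chat names, and list numbers. The sketch below shows one way to tell them apart; it is an illustration, not the client's actual code. `parse_selection` and its action names are hypothetical, and the sketch assumes that displayed numbers start at 1 and that fixed commands take priority over chat names.

```python
def parse_selection(raw, listed_chats):
    """Map one line typed at the selection prompt to an (action, argument) pair.

    listed_chats holds the chat names most recently shown by `list`.
    """
    choice = raw.strip()
    if choice in ("exit", "quit"):
        return ("exit", None)
    if choice == "new":
        return ("new", None)
    if choice == "list":
        return ("list", None)
    if choice.isdecimal():
        index = int(choice) - 1  # displayed numbers are assumed to start at 1
        if 0 <= index < len(listed_chats):
            return ("open", listed_chats[index])
        return ("error", f"No chat numbered {choice}.")
    if choice:
        return ("open", choice)  # anything else is treated as a chat name
    return ("error", "Please enter a command.")
```

For example, `parse_selection("2", ["demo", "notes"])` returns `("open", "notes")`. Using `isdecimal()` rather than `isdigit()` avoids passing characters such as superscript digits to `int()`, which would raise an error.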
Inside a chat:
- Type a message to send it to the model.
- `history` shows the full saved conversation.
- `new-chat` switches to a new chat.
- `exit` or `quit` ends the session.
Chats are scoped to the directory you launch the client from. The client stores that scope in a hidden .codelab folder in the working directory and prefixes chat IDs with the current folder name.
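The example requests later in this README use the chat ID `codeLab__demo`, which suggests a `<folder>__<name>` pattern. Assuming that pattern, scoping by launch directory can be sketched as below. `scoped_chat_id` and `chats_in_scope` are hypothetical helpers, and the sketch covers only the name prefix, not the `.codelab` folder the client also maintains.

```python
from pathlib import Path

SEPARATOR = "__"  # assumed from the example chat ID "codeLab__demo"


def folder_name(cwd):
    """Name of the launch folder; resolve() turns "." into a real folder name."""
    return Path(cwd).resolve().name


def scoped_chat_id(cwd, name):
    """Build a chat ID scoped to the folder the client was launched from."""
    return f"{folder_name(cwd)}{SEPARATOR}{name}"


def chats_in_scope(chat_ids, cwd):
    """Keep only the chat IDs that belong to the given working directory."""
    prefix = folder_name(cwd) + SEPARATOR
    return [chat_id for chat_id in chat_ids if chat_id.startswith(prefix)]
```

Because the prefix is only the folder name, two different folders with the same name would share one scope under this sketch.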
The server listens on http://127.0.0.1:9000 and exposes these endpoints:
- `GET /` - health/status information, including model, device, and runtime config.
- `POST /new-chat` - create a new chat.
- `POST /chat` - send a prompt and receive a generated response.
- `GET /chats` - list saved chats.
- `GET /chats/{chat_id}` - fetch one chat.
- `DELETE /chats/{chat_id}` - delete one chat.
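For scripting against the server without the CLI, a small client can call these endpoints directly. The sketch below uses the standard library's `urllib.request` instead of `requests` so it needs no extra packages; `build_request`, `chat_path`, and `call` are hypothetical helpers. It sends no credentials, so if your server version checks the local password on API requests, you would need to add whatever field or header it expects. Response shapes are not documented here, so `call` simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the codeLab endpoints."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def chat_path(chat_id):
    """Path for the per-chat endpoints, with the chat ID URL-encoded."""
    return "/chats/" + urllib.parse.quote(chat_id, safe="")


def call(method, path, payload=None, timeout=120):
    """Send a request and decode the JSON response body.

    The 120-second timeout is an arbitrary choice; generation on slow
    hardware may need longer.
    """
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("GET", "/")` returns the status information, and `call("POST", "/chat", {...})` sends a message using the same fields as the example request below.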
Example request to create a chat:

```json
{
  "chat_id": "codeLab__demo",
  "title": "demo"
}
```

Example request to send a message:

```json
{
  "chat_id": "codeLab__demo",
  "text": "Hello",
  "max_tokens": 256,
  "temperature": 0.7
}
```

Chats are stored as JSON files in `chats/` next to `codelabServer.py`. Each saved chat includes:
- `id`
- `title`
- `created_at`
- `updated_at`
- `messages`
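In Python, a saved chat record maps naturally onto a dataclass. This is a sketch under assumptions: the field names come from this README, but the timestamp format (ISO 8601 in UTC) and the message shape (`{"role": ..., "content": ...}`) are guesses, and `Chat` is a hypothetical class rather than the server's own model.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def now_iso():
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Chat:
    id: str
    title: str
    created_at: str = field(default_factory=now_iso)
    updated_at: str = field(default_factory=now_iso)
    # Each message is assumed to look like {"role": ..., "content": ...}.
    messages: list = field(default_factory=list)

    def add_message(self, role, content):
        """Append a message and bump the update timestamp."""
        self.messages.append({"role": role, "content": content})
        self.updated_at = now_iso()

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))
```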
The server writes files atomically to reduce the risk of losing data during interrupted saves. The repository's .gitignore excludes generated chat files, local virtual environments, and app runtime folders so your working tree stays clean.
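A common way to get atomic saves is to write to a temporary file in the same directory, flush it to disk, and then swap it into place with `os.replace`. The sketch below shows that pattern; `write_json_atomic` is a hypothetical helper, and this is not necessarily how `codelabServer.py` implements it.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Write data as JSON to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so both are on the
    # same file system, which os.replace requires.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before swapping
        os.replace(tmp_path, path)  # replaces the target in a single rename
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies mid-write, the old chat file is still intact, and at worst a stray `.tmp` file is left behind.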
The server and client read a few environment variables that control runtime behavior:
- `CODELAB_DEFAULT_MAX_TOKENS` - default token limit for generation.
- `CODELAB_MAX_HISTORY_MESSAGES` - number of recent messages passed into the model.
- `CODELAB_MAX_ALLOWED_TOKENS` - hard cap for generation length.
- `CODELAB_GPU_VRAM_LIMIT_GB` - optional GPU memory budget for model loading.
- `CODELAB_MODEL_DTYPE` - model precision, usually `float16` or `bfloat16`.
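Reading these variables mostly comes down to parsing strings with fallbacks and clamping the requested length to the hard cap. The sketch below assumes that behavior; `RuntimeConfig`, `load_config`, `effective_max_tokens`, and every fallback number in it are hypothetical, since the actual defaults are set in the server code and the Windows launchers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical fallbacks used only when a variable is unset or empty.
FALLBACKS = {
    "CODELAB_DEFAULT_MAX_TOKENS": "512",
    "CODELAB_MAX_HISTORY_MESSAGES": "20",
    "CODELAB_MAX_ALLOWED_TOKENS": "2048",
    "CODELAB_MODEL_DTYPE": "float16",
}


@dataclass
class RuntimeConfig:
    default_max_tokens: int
    max_history_messages: int
    max_allowed_tokens: int
    gpu_vram_limit_gb: Optional[float]  # None means no explicit VRAM budget
    model_dtype: str


def load_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Build a RuntimeConfig from environment variables (os.environ by default)."""
    env = os.environ if env is None else env

    def setting(name: str) -> str:
        # An empty string counts as unset and falls back to the default.
        return env.get(name) or FALLBACKS[name]

    vram = env.get("CODELAB_GPU_VRAM_LIMIT_GB")
    return RuntimeConfig(
        default_max_tokens=int(setting("CODELAB_DEFAULT_MAX_TOKENS")),
        max_history_messages=int(setting("CODELAB_MAX_HISTORY_MESSAGES")),
        max_allowed_tokens=int(setting("CODELAB_MAX_ALLOWED_TOKENS")),
        gpu_vram_limit_gb=float(vram) if vram else None,
        model_dtype=setting("CODELAB_MODEL_DTYPE"),
    )


def effective_max_tokens(requested: Optional[int], config: RuntimeConfig) -> int:
    """Fall back to the default limit and never exceed the hard cap."""
    tokens = config.default_max_tokens if requested is None else requested
    return max(1, min(tokens, config.max_allowed_tokens))
```

Passing an explicit mapping to `load_config` keeps the parsing easy to check without touching the real environment.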
The Windows launchers set conservative defaults for these values. The client also uses a fixed password of 1234 in the current code.
- If the client says the server is not running, start `codelabServer.py` first.
- If generation times out, use a shorter prompt or lower the token limit.
- If the model hits GPU memory limits, lower `CODELAB_DEFAULT_MAX_TOKENS` or set `CODELAB_GPU_VRAM_LIMIT_GB`.
- If chats do not appear where you expect, remember that the client scopes them to the directory it was launched from.
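- The server loads `google/gemma-4-E4B-it` at startup, so startup can take a while, especially on the first run when the model is downloaded.
- Before generation, chat history is truncated to the most recent `CODELAB_MAX_HISTORY_MESSAGES` messages to keep prompts short and responses fast; the saved chat still holds the full conversation.
- The current implementation assumes a local, trusted environment and does not provide multi-user authentication.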
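Truncating to the most recent `CODELAB_MAX_HISTORY_MESSAGES` messages is a simple slice with one wrinkle: the cut can land on an assistant reply, and many chat templates expect the conversation to start with a user turn. The sketch below drops any leading non-user messages after slicing. `truncate_history` is hypothetical, and the `role` values `"user"` and `"assistant"` are an assumed message shape.

```python
def truncate_history(messages, max_messages):
    """Return the recent messages to send to the model.

    Only the copy passed to the model shrinks; the saved history is untouched.
    """
    if max_messages <= 0:
        # Guard needed because messages[-0:] would return the whole list.
        return []
    recent = messages[-max_messages:]
    # Skip leading assistant turns so the prompt starts with a user message.
    start = 0
    while start < len(recent) and recent[start].get("role") != "user":
        start += 1
    return recent[start:]
```

This can return slightly fewer messages than the limit, which trades a little context for a prompt the template accepts.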