The simplest possible starting point — call an LLM and have a multi-turn conversation.
- How to use the OpenAI SDK with an OpenAI-compatible provider (InfiniAI)
- How to maintain multi-turn conversation history via a messages list
- How to configure system prompts
User Input ──> messages list ──> LLM API ──> Assistant Response
                     ▲                              │
                     └──────────────────────────────┘
                            (append to history)
The key data structure is the messages list — an ordered sequence of role-tagged messages (system, user, assistant) that represents the full conversation context sent to the LLM on every call.
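As a minimal sketch of this data structure, the helpers below build a messages list with a configurable system prompt and append role-tagged turns. The names `new_conversation` and `add_message` are hypothetical and not taken from `chatbot.py`; the role check is an added convenience covering only the three roles named above.

```python
# Roles named in the text above; this sketch deliberately accepts only these.
VALID_ROLES = {"system", "user", "assistant"}


def new_conversation(system_prompt="You are a helpful assistant."):
    """Start a messages list whose first entry is the system prompt."""
    return [{"role": "system", "content": system_prompt}]


def add_message(messages, role, content):
    """Append one role-tagged message and return the same list."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    messages.append({"role": role, "content": content})
    return messages


# Configure the system prompt once, then record turns in order.
history = new_conversation("You answer in one short sentence.")
add_message(history, "user", "What is Python?")
add_message(history, "assistant", "Python is a programming language.")
print([m["role"] for m in history])  # ['system', 'user', 'assistant']
```

Changing the assistant's behavior is then a matter of passing a different string to `new_conversation`; the rest of the list is untouched.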
    pip install -r requirements.txt
    export INFINI_API_KEY="your-api-key-here"

The default base URL points to InfiniAI (https://cloud.infini-ai.com/maas/v1). Override with:

    export INFINI_BASE_URL="https://your-provider.com/v1"

List available models:

    python chatbot.py --list-models

Start chatting (default model: deepseek-v3):

    python chatbot.py

Use a specific model:

    python chatbot.py --model qwen3-32b

Every LLM API call is stateless: the model retains no memory between calls. Multi-turn conversation works by sending the entire conversation history each time:
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"},
        {"role": "assistant", "content": "Python is a programming language..."},
        {"role": "user", "content": "How do I install it?"},  # new turn
    ]

This is the fundamental pattern that multi-turn LLM applications build on. In later steps, we'll need to manage this list carefully as conversations grow long.
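The pattern above can be sketched as one function per turn: append the user message, send the whole list, append the reply. Here `chat_turn` and `fake_complete` are hypothetical names, and `fake_complete` stands in for the real API call so the example runs offline. With the OpenAI SDK, the stand-in would presumably be replaced by a call to `client.chat.completions.create(model=..., messages=messages)` followed by reading `response.choices[0].message.content`.

```python
def chat_turn(messages, user_input, complete):
    """Run one turn: record the user message, call the model with the
    full history, then record and return the assistant reply."""
    messages.append({"role": "user", "content": user_input})
    reply = complete(messages)  # the backend sees the entire list every time
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages):
    """Offline stand-in for an LLM call (an assumption, not a real API).
    It keeps no state of its own and only reports what it was sent."""
    return f"I received {len(messages)} messages."


messages = [{"role": "system", "content": "You are a helpful assistant."}]
first = chat_turn(messages, "What is Python?", fake_complete)
second = chat_turn(messages, "How do I install it?", fake_complete)
print(first)   # I received 2 messages.
print(second)  # I received 4 messages.
```

The stand-in only knows about earlier turns because they are resent in the list, which is the same reason a real model can follow up on "it" in the second question.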
The OpenAI SDK can talk to any provider that implements the same API format. We just change base_url:
    from openai import OpenAI

    client = OpenAI(api_key="...", base_url="https://cloud.infini-ai.com/maas/v1")

This means our code works with OpenAI, InfiniAI, Ollama, vLLM, and other providers that expose an OpenAI-compatible endpoint.
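A minimal sketch of how the environment variables from the setup steps might feed the client. `resolve_settings` and `make_client` are hypothetical helper names; how `chatbot.py` actually reads its configuration is not shown here. The `openai` import is deferred so the settings logic can be exercised without the package installed.

```python
import os

DEFAULT_BASE_URL = "https://cloud.infini-ai.com/maas/v1"


def resolve_settings(env=None):
    """Read the API key and base URL, falling back to the InfiniAI default."""
    env = os.environ if env is None else env
    api_key = env.get("INFINI_API_KEY")
    if not api_key:
        raise RuntimeError("INFINI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("INFINI_BASE_URL") or DEFAULT_BASE_URL,
    }


def make_client(env=None):
    """Build an OpenAI SDK client pointed at the configured provider."""
    from openai import OpenAI  # requires the `openai` package

    return OpenAI(**resolve_settings(env))


# A plain dict stands in for the process environment in this demo.
settings = resolve_settings({"INFINI_API_KEY": "your-api-key-here"})
print(settings["base_url"])  # https://cloud.infini-ai.com/maas/v1
```

Setting `INFINI_BASE_URL` to another OpenAI-compatible endpoint switches providers without touching the rest of the code.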
In Step 02, we'll add token counting and context window management — essential infrastructure before we start building agent capabilities.