@@ -7,70 +7,88 @@ WebLLM Javascript SDK
    :local:
    :depth: 2

-`WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is an MLC chat web runtime
-that allows you to build chat applications directly in the browser, leveraging
-`WebGPU <https://www.w3.org/TR/webgpu/>`_ and providing users a natural layer of abstraction.
+`WebLLM <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ is a high-performance in-browser LLM
+inference engine, aiming to be the backend of AI-powered web applications and agents.

-Try out the Prebuilt Webpage
-----------------------------
+It provides a specialized runtime for the web backend of MLCEngine, leverages
+`WebGPU <https://www.w3.org/TR/webgpu/>`_ for local acceleration, offers an OpenAI-compatible
+API, and provides built-in support for web workers to separate heavy computation from the UI flow.
+
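+As a quick illustration of the OpenAI-compatible API, here is a minimal sketch
+(the model name and prompt are placeholders; weights are fetched on first use):
+
+.. code:: typescript
+
+  import * as webllm from "@mlc-ai/web-llm";
+
+  // Create an engine backed by a prebuilt model record.
+  const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC");
+
+  // The chat.completions API mirrors OpenAI's client interface.
+  const reply = await engine.chat.completions.create({
+    messages: [{ role: "user", content: "What is WebGPU?" }],
+  });
+  console.log(reply.choices[0].message.content);
+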
+Please check out the `WebLLM repo <https://github.com/mlc-ai/web-llm>`__ for how to use WebLLM to
+build web applications in JavaScript/TypeScript. Here we only provide a high-level overview and
+discuss how to use MLC-LLM to compile your own model to run with WebLLM.

-To get started, you can try out `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__.
+Getting Started
+---------------

-A WebGPU-compatible browser and a local GPU are needed to run WebLLM.
+To get started, try out `WebLLM Chat <https://chat.webllm.ai/>`__, which provides a great example
+of integrating WebLLM into a full web application.
+
+A WebGPU-compatible browser is needed to run WebLLM-powered web applications.
 You can download the latest Google Chrome and use `WebGPU Report <https://webgpureport.org/>`__
 to verify the functionality of WebGPU on your browser.

+WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_ and is
+also delivered via CDN. Try a simple chatbot in
+`this JSFiddle example <https://jsfiddle.net/neetnestor/4nmgvsa2/>`__ without any setup.
+
+You can also check out the `existing examples <https://github.com/mlc-ai/web-llm/tree/main/examples>`__
+for more advanced usage of WebLLM, such as JSON mode, streaming, and more.
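+
+Streaming, for instance, follows the OpenAI convention. A minimal sketch, reusing the
+``engine`` created in the sketch above (the prompt is a placeholder):
+
+.. code:: typescript
+
+  // Request a streaming completion; chunks arrive as an async iterable of deltas.
+  const chunks = await engine.chat.completions.create({
+    messages: [{ role: "user", content: "Tell me a short story." }],
+    stream: true,
+  });
+  for await (const chunk of chunks) {
+    // Each chunk carries an incremental piece of the assistant reply.
+    console.log(chunk.choices[0]?.delta?.content ?? "");
+  }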

-Use WebLLM NPM Package
-----------------------
+Model Records in WebLLM
+-----------------------

-WebLLM is available as an `npm package <https://www.npmjs.com/package/@mlc-ai/web-llm>`_.
-The source code is available in `the WebLLM repo <https://github.com/mlc-ai/web-llm>`_,
-where you can make your own modifications and build from source.
+Each model in `WebLLM Chat <https://chat.webllm.ai>`__ is registered as an instance of
+``ModelRecord`` and can be accessed at
+`webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.
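+
+To see which prebuilt models are available at runtime, a small sketch (the exact list
+depends on your WebLLM version):
+
+.. code:: typescript
+
+  import * as webllm from "@mlc-ai/web-llm";
+
+  // Print the model_id of every prebuilt model record bundled with WebLLM.
+  console.log(webllm.prebuiltAppConfig.model_list.map((record) => record.model_id));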

-Note that the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__ above
-is powered by the WebLLM npm package, specifically with the code in
-the `simple-chat <https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat>`__ example.
+Looking at the most straightforward example `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
+there are two ways to run a model.

-Each of the model in the `WebLLM prebuilt webpage <https://webllm.mlc.ai/#chat-demo>`__
-is registered as an instance of ``ModelRecord``. Looking at the most straightforward example
-`get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__,
-we see the code snippet:
+One can either run a prebuilt model by simply passing its ``model_id`` when creating the engine:

 .. code:: typescript

-  const myAppConfig: AppConfig = {
+  const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+  const engine = await webllm.CreateMLCEngine(selectedModel);
+
+Or one can specify their own model to run by creating a model record:
+
+.. code:: typescript
+
+  const appConfig: webllm.AppConfig = {
     model_list: [
       {
-        "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/resolve/main/",
-        "local_id": "Llama-2-7b-chat-hf-q4f32_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f32_1-ctx4k_cs1k-webgpu.wasm",
-      },
-      {
-        "model_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/resolve/main/",
-        "local_id": "Mistral-7B-Instruct-v0.2-q4f16_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-        "required_features": ["shader-f16"],
+        model: "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC",
+        model_id: "Llama-3-8B-Instruct-q4f32_1-MLC",
+        model_lib:
+          webllm.modelLibURLPrefix +
+          webllm.modelVersion +
+          "/Llama-3-8B-Instruct-q4f32_1-ctx4k_cs1k-webgpu.wasm",
       },
       // Add your own models here...
-    ]
-  }
-  const selectedModel = "Llama-2-7b-chat-hf-q4f32_1"
-  // const selectedModel = "Mistral-7B-Instruct-v0.1-q4f16_1"
-  await chat.reload(selectedModel, undefined, myAppConfig);
+    ],
+  };
+  const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
+  const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+    selectedModel,
+    { appConfig: appConfig },
+  );

-Just like any other platforms, to run a model with on WebLLM, you need:
+Looking at the code above, we find that, just like on any other platform supported by MLC-LLM,
+to run a model on WebLLM, you need:

-1. **Model weights** converted to MLC format (e.g. `Llama-2-7b-hf-q4f32_1-MLC
-   <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC/tree/main>`_.): downloaded through ``model_url``
-2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__): downloaded through ``model_lib_url``.
+1. **Model weights** converted to MLC format (e.g. `Llama-3-8B-Instruct-q4f32_1-MLC
+   <https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC/tree/main>`_): downloaded through the URL ``ModelRecord.model``
+2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs/tree/main/web-llm-models>`__): downloaded through the URL ``ModelRecord.model_lib``.
+
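+Since both artifacts are fetched over the network on first use, you may want to surface
+loading progress in your UI. A minimal sketch using the ``initProgressCallback`` engine
+option (the model name is a placeholder):
+
+.. code:: typescript
+
+  const engine = await webllm.CreateMLCEngine(
+    "Llama-3-8B-Instruct-q4f32_1-MLC",
+    {
+      // Called repeatedly while weights and the wasm library are fetched and loaded.
+      initProgressCallback: (report) => console.log(report.text),
+    },
+  );
+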
+In the sections below, we walk you through two examples of how to add your own model besides the ones in
+`webllm.prebuiltAppConfig.model_list <https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293>`__.
+Before proceeding, please verify your installation of ``mlc_llm`` and ``tvm``.

 Verify Installation for Adding Models
 -------------------------------------

-In sections below, we walk you through two examples of adding models to WebLLM. Before proceeding,
-please verify installation of ``mlc_llm`` and ``tvm``:
-
 **Step 1. Verify mlc_llm**

 We use the python package ``mlc_llm`` to compile models. This can be installed by
@@ -106,7 +124,7 @@ In cases where the model you are adding is simply a variant of an existing
 model, we only need to convert weights and reuse existing model library. For instance:

 - Adding ``OpenMistral`` when MLC supports ``Mistral``
-- Adding ``Llama2-uncensored`` when MLC supports ``Llama2``
+- Adding a ``Llama3`` fine-tuned on a domain-specific task when MLC supports ``Llama3``


 In this section, we walk you through adding ``WizardMath-7B-V1.1-q4f16_1`` to the
@@ -150,23 +168,9 @@ See :ref:`compile-command-specification` for specification of ``gen_config``.
     --quantization q4f16_1 --conv-template wizard_coder_or_math \
     -o dist/WizardMath-7B-V1.1-q4f16_1-MLC/

-For the ``conv-template``, `conversation_template.py <https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/conversation_template.py>`__
-contains a full list of conversation templates that MLC provides.
-
-If the model you are adding requires a new conversation template, you would need to add your own.
-Follow `this PR <https://github.com/mlc-ai/mlc-llm/pull/2163>`__ as an example. Besides, you also need to add the new template to ``/path/to/web-llm/src/conversation.ts``.
-We look up the template to use with the ``conv_template`` field in ``mlc-chat-config.json``.
-
-For more details, please see :ref:`configure-mlc-chat-json`.
-
-.. note::
-
-   If you added your conversation template in ``src/conversation.ts``, you need to build WebLLM
-   from source following the instruction in
-   `the WebLLM repo's README <https://github.com/mlc-ai/web-llm?tab=readme-ov-file#build-webllm-package-from-source>`_.
-
-   Alternatively, you could use the ``"custom"`` conversation template so that you can pass in
-   your own ``ConvTemplateConfig`` in runtime without having to build the package from source.
+For the ``conv-template``, `conversation_template.py <https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/conversation_template>`__
+contains a full list of conversation templates that MLC provides. You can also manually modify
+the ``mlc-chat-config.json`` to add your customized conversation template.

 **Step 3. Upload weights to HF**

@@ -192,26 +196,30 @@ Finally, we modify the code snippet for
 `get-started <https://github.com/mlc-ai/web-llm/blob/main/examples/get-started/src/get_started.ts>`__
 pasted above.

-We simply specify the Huggingface link as ``model_url``, while reusing the ``model_lib_url`` for
-``Mistral-7B``. Note that we need the suffix to be ``/resolve/main/``.
+We simply specify the Huggingface link as ``model``, while reusing the ``model_lib`` for
+``Mistral-7B``.

 .. code:: typescript

-  const myAppConfig: AppConfig = {
+  const appConfig: webllm.AppConfig = {
     model_list: [
-      // Other records here omitted...
       {
-        // Substitute model_url with the one you created `my-huggingface-account/my-wizardMath-weight-huggingface-repo`
-        "model_url": "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC/resolve/main/",
-        "local_id": "WizardMath-7B-V1.1-q4f16_1",
-        "model_lib_url": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-q4f16_1-sw4k_cs1k-webgpu.wasm",
-        "required_features": ["shader-f16"],
+        model: "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC",
+        model_id: "WizardMath-7B-V1.1-q4f16_1-MLC",
+        model_lib:
+          webllm.modelLibURLPrefix +
+          webllm.modelVersion +
+          "/Mistral-7B-Instruct-v0.3-q4f16_1-ctx4k_cs1k-webgpu.wasm",
       },
-    ]
-  }
+      // Add your own models here...
+    ],
+  };

   const selectedModel = "WizardMath-7B-V1.1-q4f16_1-MLC";
-  await chat.reload(selectedModel, undefined, myAppConfig);
+  const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+    selectedModel,
+    { appConfig: appConfig },
+  );

 Now, running the ``get-started`` example will use the ``WizardMath`` model you just added.
 See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
@@ -223,9 +231,9 @@ Bring Your Own Model Library

 A model library is specified by:

-- The model architecture (e.g. ``llama-2``, ``gpt-neox``)
+- The model architecture (e.g. ``llama-3``, ``gpt-neox``, ``phi-3``)
 - Quantization (e.g. ``q4f16_1``, ``q0f32``); see the feature check after this list for ``f16`` requirements
-- Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning
+- Metadata (e.g. ``context_window_size``, ``sliding_window_size``, ``prefill-chunk-size``), which affects memory planning (currently only ``prefill-chunk-size`` affects the compiled model)
 - Platform (e.g. ``cuda``, ``webgpu``, ``iOS``)

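+In particular, ``q4f16`` model libraries require the ``shader-f16`` WebGPU feature (this is
+why some model records carry ``required_features: ["shader-f16"]``). A minimal sketch of a
+runtime check (assumes WebGPU types are available):
+
+.. code:: typescript
+
+  // Ask the browser for a GPU adapter and verify 16-bit shader support,
+  // which f16-quantized model libraries need at runtime.
+  const adapter = await navigator.gpu.requestAdapter();
+  if (!adapter?.features.has("shader-f16")) {
+    console.warn("shader-f16 unavailable; prefer a q4f32 model instead.");
+  }
+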
 In cases where the model you want to run is not compatible with the provided MLC
@@ -288,9 +296,8 @@ All these knobs are specified in ``mlc-chat-config.json`` generated by ``gen_con
     --device webgpu -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm

 .. note::
-   When compiling larger models like ``Llama-2-7B``, you may want to add ``--prefill_chunk_size 1024`` or
-   lower ``context_window_size`` to decrease memory usage. Otherwise, during runtime,
-   you may run into issues like:
+   When compiling larger models like ``Llama-3-8B``, you may want to add ``--prefill_chunk_size 1024``
+   to decrease memory usage. Otherwise, during runtime, you may run into issues like:

    .. code:: text

@@ -344,17 +351,20 @@ Finally, we are able to run the model we added in WebLLM's `get-started <https:/
     model_list: [
       // Other records here omitted...
       {
-        "model_url": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
-        "local_id": "RedPajama-INCITE-Instruct-3B-v1",
-        "model_lib_url": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
+        "model": "https://huggingface.co/my-hf-account/my-redpajama3b-weight-huggingface-repo/resolve/main/",
+        "model_id": "RedPajama-INCITE-Instruct-3B-v1",
+        "model_lib": "https://raw.githubusercontent.com/my-gh-account/my-repo/main/RedPajama-INCITE-Chat-3B-v1-q4f16_1-webgpu.wasm",
         "required_features": ["shader-f16"],
       },
     ]
   }

-  const selectedModel = "RedPajama-INCITE-Instruct-3B-v1"
-  await chat.reload(selectedModel, undefined, myAppConfig);
+  const selectedModel = "RedPajama-INCITE-Instruct-3B-v1";
+  const engine: webllm.MLCEngineInterface = await webllm.CreateMLCEngine(
+    selectedModel,
+    { appConfig: appConfig },
+  );

 Now, running the ``get-started`` example will use the ``RedPajama`` model you just added.
 See `get-started's README <https://github.com/mlc-ai/web-llm/tree/main/examples/get-started#webllm-get-started-app>`__
-on how to run it.
+on how to run it.