- axelera.runtime documentation
- Objects
- Types
- Exceptions
- exception axelera.runtime.ConnectionError
- exception axelera.runtime.DeviceInUse
- exception axelera.runtime.IncompatibleDevice
- exception axelera.runtime.InternalError
- exception axelera.runtime.InvalidArgument
- exception axelera.runtime.InvalidConfiguration
- exception axelera.runtime.NotImplemented
- exception axelera.runtime.Pending
- exception axelera.runtime.UnknownError
Abstract base class for all objects in the runtime.
All objects created by the runtime are owned by the Context that created them. When the Context is released, all objects created by it are also released.
Reference to the Context that owns this object.
Release the object and all its children.
Bases: Object
The Context object is the root object of the runtime.
A Context is used to create and manage all other objects in the runtime. Normally you would create it in a with statement, like this:
    import axelera.runtime as axr

    with axr.Context() as context:
        devices = context.list_devices()
Alternatively, you can call release() to release the context and all its children.
Returns True if the configuration setting is complete, False if it is pending.
For example, to change a configuration on two devices:
    import time

    res0 = context.configure_device(device0, clock_profile=1000)
    res1 = context.configure_device(device1, clock_profile=1000)
    # Poll until both devices report that the setting has been applied.
    while not res0 or not res1:
        time.sleep(0.05)
        res0 = context.device_ready(device0)
        res1 = context.device_ready(device1)
Valid properties are:
| Property | Default | Description |
|---|---|---|
| clock_profile | 800 | Device clock profile in MHz |
| clock_profile_core_0-3 | 800 | Per-core clock profile in MHz |
| mvm_utilisation_core_0-3 | 100 | Per-core MVM utilisation as percentage |
- Return type:
bool
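The per-core property names in the table above appear to use "0-3" as shorthand for a suffix range. Assuming that reading (it is not confirmed here), and assuming the per-core names are passed as keyword arguments in the same way as clock_profile, a small hypothetical helper could build the four settings:

```python
def per_core_properties(prefix: str, value: int, cores: int = 4) -> dict:
    """Expand e.g. 'mvm_utilisation_core' into one entry per core.

    Assumption: 'name_0-3' in the table means name_0, name_1, name_2, name_3.
    """
    return {f"{prefix}_{core}": value for core in range(cores)}


props = per_core_properties("mvm_utilisation_core", 50)
print(props)

# Hypothetical use, mirroring the clock_profile example:
#   done = context.configure_device(device0, **props)
```

per_core_properties is not part of the runtime; it only saves typing the four names by hand.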
Connect to one or more sub-devices.
This reserves the sub-devices so that other processes cannot use them.
The returned Connection object is used to load models and run them on the sub-devices.
- Return type:
Connection
Returns True if the configuration setting is complete, False if it is pending.
- Return type:
bool
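The polling loop shown for configure_device has no upper bound on how long it waits. As a minimal sketch, a generic helper can poll any zero-argument callable with a timeout. The name wait_until and its parameters are illustrative and not part of the runtime:

```python
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Call ready() repeatedly until it returns True or the timeout expires.

    Returns True if ready() returned True, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use with the runtime:
#   if not wait_until(lambda: context.device_ready(device0)):
#       raise TimeoutError("device0 configuration still pending")
```

time.monotonic() is used rather than time.time() so that changes to the system clock do not affect the deadline.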
List all devices on the system.
- Return type:
list[DeviceInfo]
Load a model from a file.
The returned model can be loaded onto multiple Connection objects using Connection.load_model_instance().
- Return type:
Model
Read all available configuration properties of the device.
- Return type:
dict[str,str]
Release the context and all its children.
This function is called automatically when Context is used as a context manager.
Bases: Object
A connection to one or more sub-devices.
Load a model onto the sub-devices.
Valid kwargs are:
| Property | Default | Description |
|---|---|---|
| aipu_cores | 0 | Amount of L2 resources to allocate for the model, set to batch size |
| num_sub_devices | 0 | Number of sub-devices to use, set to batch size of the model |
| input_dmabuf | 0 | True if the input arguments are dmabuf file descriptors |
| device_profiling | 0 | True to enable device profiling |
| host_profiling | 0 | True to enable host profiling |
| output_dmabuf | 0 | True if the output arguments are dmabuf file descriptors |
| double_buffer | 0 | True to enable double buffering |
| elf_in_ddr | 1 | True if the model was compiled with elf_in_ddr as True. |
- Return type:
ModelInstance
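Whether the runtime rejects misspelled kwargs is not stated here. As a sketch under that uncertainty, a hypothetical helper can check keyword names against the table above before they are passed on to Connection.load_model_instance():

```python
# Names copied from the kwargs table above.
KNOWN_INSTANCE_KWARGS = frozenset({
    "aipu_cores", "num_sub_devices", "input_dmabuf", "device_profiling",
    "host_profiling", "output_dmabuf", "double_buffer", "elf_in_ddr",
})


def checked_instance_kwargs(**kwargs):
    """Return kwargs unchanged, or raise ValueError if a name is not in the table."""
    unknown = sorted(set(kwargs) - KNOWN_INSTANCE_KWARGS)
    if unknown:
        raise ValueError(f"unknown load_model_instance kwargs: {unknown}")
    return kwargs


options = checked_instance_kwargs(num_sub_devices=1, double_buffer=True)

# Hypothetical use:
#   instance = connection.load_model_instance(model, **options)
```

checked_instance_kwargs is illustrative only; it validates names, not values.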
Bases: Object
A model object that can be loaded onto a Connection object.
- preamble_graph: Relative path to a preamble ONNX file containing the initial nodes of the model. These nodes are removed by the compiler and executed on the host.
- postamble_graph: Relative path to a postamble ONNX file containing the final nodes of the model. These nodes are removed by the compiler and executed on the host.
- input_tensor_layout: Always NHWC in this version.
Return information about the input tensors to the model.
- Return type:
list[TensorInfo]
Return information about the output tensors of the model.
- Return type:
list[TensorInfo]
Bases: Object
A model instance that has been loaded onto a Connection object.
Run the model instance.
If the model instance was created with input_dmabuf=True then inputs must be a list of file descriptors. Otherwise it must be a list of numpy arrays.
If the model instance was created with output_dmabuf=True then outputs must be a list of file descriptors. Otherwise it must be a list of numpy arrays.
On failure, an exception is raised.
- Return type:
None
class BoardType(enum.Enum):
    alpha_pcie = 0
    alpha_m2 = 1
    pcie = 2
    m2 = 3
    devboard = 4
    sbc = 5
    unknown = 6
The result of enumerating the available Axelera devices.
DeviceInfo is also used to select a device, for example when configuring it or reading its configuration.
- name: str
  The name of the device. For example 'metis-0:3:0'.
- subdevice_count: int
  The number of subdevices on the device, for metis this is 4.
- max_memory: int
  The maximum memory available on the device. Note in the current implementation this field is not populated and will always be 0.
- in_use: bool
  The number of subdevices in use. Note in the current implementation this field is not populated and will always be 0.
- in_use_by: str
  The username and process id of the user(s) using the device, comma separated. Note in the current implementation this field is not populated and will always be 0.
- board_type: BoardType
  The board type of the device.
- firmware_version: str
  The firmware version of the device, for example v1.1.0-rc5-2-g1234567.
- board_revision: int
  The board revision of the device.
- flashed_firmware_version: str
  The version of the firmware stored in flash memory of the device.
- board_controller_firmware_version: str
  The board controller firmware version on the device.
- board_controller_board_type: str
  The board controller board type.
Information about a tensor input/output.
This includes quantization and padding information if manifest.json was found in the model.
For example, to quantize and pad a tensor:
>>> input = TensorInfo((1, 230, 240, 3), padding=[(0, 0), (3, 3), (3, 13), (0, 1)])
>>> src = np.zeros(input.unpadded_shape, dtype=np.float32)
>>> quant = np.round((src / input.scale) + input.zero_point).clip(-128, 127).astype(np.int8)
>>> padded = np.pad(quant, input.padding, constant_values=input.zero_point)
To depad and dequantize a tensor:
>>> output = TensorInfo((1, 1, 1, 1024), padding=[(0, 0), (0, 0), (0, 0), (0, 24)])
>>> out = np.zeros(output.shape, dtype=np.int8)
>>> depadded = out[tuple(slice(b, -e if e else None) for b, e in output.padding)]
>>> dequant = (depadded.astype(np.float32) - output.zero_point) * output.scale
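The `-e if e else None` term in the depadding slice matters because `-0` equals `0`, so a plain `slice(b, -e)` with no end padding would select nothing. A small sketch on a plain Python list shows the difference for one axis (depad_slices is a hypothetical helper name, not part of the runtime):

```python
def depad_slices(padding):
    """Build one slice per axis that strips (begin, end) padding.

    An end padding of 0 maps to None (slice to the end of the axis),
    because slice(b, -0) is slice(b, 0) and would be empty.
    """
    return tuple(slice(b, -e if e else None) for b, e in padding)


row = list(range(10))

# 2 elements of padding at the start, 3 at the end.
(trim,) = depad_slices([(2, 3)])
print(row[trim])            # [2, 3, 4, 5, 6]

# No padding at all: the whole axis is kept.
(keep,) = depad_slices([(0, 0)])
print(row[keep] == row)     # True

# The naive form drops everything when the end padding is 0.
print(row[slice(0, -0)])    # []
```

With a numpy array, the full tuple returned by depad_slices can be used directly as the index, one slice per axis.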
- shape: tuple[int, ...]
  The shape of the tensor.
- dtype: np.dtype = np.int8
  The data type of the tensor.
- name: str = ''
  The name of the tensor.
- padding: list[tuple[int, int]]
  Amount of padding on the tensor, as a list of [(start0, end0), (start1, end1), ...] (see numpy.pad).
- scale: float = 1.0
  Scale for quantization/dequantization.
- zero_point: int = 0
  Zero-point for quantization/dequantization.
Properties
- size: int
  The size of the tensor in bytes.
- unpadded_shape: tuple[int, ...]
  The shape of the tensor without padding.
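How the two properties relate to shape and padding can be sketched with plain Python. This assumes, as the examples above suggest, that shape is the padded shape and that size is computed from it. The function names are illustrative and are not the runtime's implementation:

```python
import math


def unpadded_shape(shape, padding):
    """Subtract the (start, end) padding from each dimension of shape."""
    return tuple(dim - start - end for dim, (start, end) in zip(shape, padding))


def size_in_bytes(shape, itemsize=1):
    """Number of elements times bytes per element (1 for int8)."""
    return math.prod(shape) * itemsize


# Values taken from the TensorInfo examples above.
in_shape = (1, 230, 240, 3)
in_padding = [(0, 0), (3, 3), (3, 13), (0, 1)]
print(unpadded_shape(in_shape, in_padding))   # (1, 224, 224, 2)
print(size_in_bytes(in_shape))                # 165600

out_shape = (1, 1, 1, 1024)
out_padding = [(0, 0), (0, 0), (0, 0), (0, 24)]
print(unpadded_shape(out_shape, out_padding)) # (1, 1, 1, 1000)
```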
