DTB2.0 Python Interface Documentation
1 basic software info
os : CentOS 7.6
kernel version : 3.10.0-957.el7.x86_64
Python version : 3.7.7
devtools version : 7.3.1
hpcx version : 2.4.1
rocm version : dtk22.10.1
2 basic hardware info
CPU : Hygon C86 7185 32-core Processor
DRAM : 128GB
DCU : 4 * Z100 (1319MHz, 16GB)
Server Type : X785
3 environment preparation
DTB1.2 adopts a server-client architecture in which messages are defined in proto files. The server responds over gRPC to service requests that the client sends encapsulated in the protobuf protocol, as illustrated in Figure 1. The server is launched on the compute nodes via dist_simulator.sh, and each server process executes the dist_simulator executable. The process that communicates with the client runs on rank 0, while the remaining processes run on the other ranks; communication and data transfer between rank 0 and the other ranks go through MPI. Rank 0 occupies one node exclusively, while the remaining ranks run four per node, each bound to one DCU and sharing the node's 200 Gbps of IB bandwidth equally, so each process gets 50 Gbps. The launch scripts are provided below:
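For illustration only, here is a small sketch of the process layout described above; the node/DCU mapping is an assumption drawn from this paragraph, not from dist_simulator.sh itself.

```python
# Sketch of the rank layout described above (assumed, not taken from dist_simulator.sh):
# rank 0 has a node to itself; every other rank is packed four per node,
# bound to one DCU, and gets an equal share of the node's 200 Gbps IB link.
NODE_IB_GBPS = 200
RANKS_PER_NODE = 4

def rank_layout(rank: int) -> dict:
    if rank == 0:
        return {"node": 0, "dcu": None, "ib_gbps": NODE_IB_GBPS}
    return {
        "node": 1 + (rank - 1) // RANKS_PER_NODE,   # compute node index
        "dcu": (rank - 1) % RANKS_PER_NODE,         # local DCU bound to this rank
        "ib_gbps": NODE_IB_GBPS // RANKS_PER_NODE,  # 200 Gbps / 4 = 50 Gbps per process
    }

print(rank_layout(0))   # rank 0: exclusive node, full IB link
print(rank_layout(5))   # e.g. node 2, DCU 0, 50 Gbps
```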
3.1 launch server
tbd
3.2 launch client
tbd
4 Client Python Interface
In DTB2.0, the client-side Python interface is implemented through the BlockWrapper class in cuda/python/dist_blockwrapper_pytorch.py. This interface can be categorized into two types based on their functionality: operation interfaces and data retrieval interfaces.
4.1 operation interface
__init__
The constructor interface for the BlockWrapper class is defined as follows; it initializes a new instance of the BlockWrapper class.
address: The IP address and port number of the server, e.g., "10.11.2.10:50051".
path: The absolute path where the address table (e.g., block_0.npz) is stored.
delta_t: The biological time interval between two consecutive simulations, defaulting to 1 ms.
route_path: The absolute path where the routing table is stored, defaulting to None, which indicates peer-to-peer transmission.
print_stat: Option to output statistical items for each iteration, defaulting to False.
force_rebase: In the assimilation program, whether to forcefully perform non-accumulative sorting on population IDs from multiple address tables (re-sorting from 0), defaulting to False.
allow_rebase: In the assimilation program, whether to allow non-accumulative sorting, defaulting to True.
overlap: In the assimilation program, the number of populations into which a single voxel in the address table can be split, defaulting to 2.
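A minimal construction sketch, assuming the repository root is on PYTHONPATH; the address and path values are placeholders, and the keyword arguments simply restate the documented defaults.

```python
from cuda.python.dist_blockwrapper_pytorch import BlockWrapper

# Placeholder server address and table path; substitute your own deployment.
block = BlockWrapper(
    address="10.11.2.10:50051",                    # gRPC server, "ip:port"
    path="/public/home/user/tables/block_0.npz",   # hypothetical address-table location
    delta_t=1.0,                                   # 1 ms of biological time per iteration (default)
    route_path=None,                               # None -> peer-to-peer transmission (default)
    print_stat=False,                              # per-iteration statistics off (default)
)
```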
run
Network simulation execution interface, defined as follows:
The send strategy defaults to STRATEGY_SEND_PAIRWISE and can also be set to STRATEGY_SEND_SEQUENTIAL or STRATEGY_SEND_RANDOM.
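A hypothetical call sketch; the actual run() signature lives in dist_blockwrapper_pytorch.py and may take additional arguments (for example, which statistics to return and the send strategy mentioned above).

```python
# Assumed call form: run() is given the number of 1 ms iterations to simulate.
steps = 800          # simulate 800 ms of biological time
block.run(steps)
```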
update_property
Update property interface, defined as follows:
bid: The ID of the population to be updated, defaulting to None, meaning all populations are updated.
mul_property_by_subblk
Multiplies the attribute parameters in a population by a constant; in the DA (data assimilation) process, it is used to update parameters that are sampled from a distribution. Defined as follows:
property_hyper_parameter: The parameter to be updated.
accumulate: In the assimilation program, divide the previous parameter by the parameter passed before multiplying it, making the currently passed parameter the value set for this property; defaults to True.
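A hedged sketch of the DA-style update; the tensor layout and the exact argument list of mul_property_by_subblk are assumptions for illustration only.

```python
import torch

# Hypothetical layout: one row per (population, property) pair to scale,
# columns = [population_id, property_index, multiplicative factor].
hp = torch.tensor([[2, 10, 1.05],
                   [3, 10, 0.98]], device="cuda:0")

# With accumulate=True the previously applied factor is divided out first,
# so the factor passed here becomes the effective value for this property.
block.mul_property_by_subblk(hp, accumulate=True)
```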
assign_property_by_subblk
Assign property parameters in the population, defined as follows:
gamma_property_by_subblk
Selects a property of a given population and updates it with values generated from a gamma distribution parameterized by alpha and beta, defined as follows:
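The gamma_property_by_subblk signature is not shown in this document; the snippet below only illustrates what drawing property values from a gamma distribution parameterized by alpha and beta means, using torch.

```python
import torch

alpha, beta = 5.0, 5.0                      # shape (concentration) and rate
gamma = torch.distributions.Gamma(alpha, beta)
samples = gamma.sample((1000,))             # candidate property values
print(samples.mean())                       # ≈ alpha / beta = 1.0
```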
set_samples
Set sampling neurons, defined as follows:
set_state_rule
load_state_from_file
Load checkpoint state from file:
update_ou_background_stimuli
update_ttype_ca_stimuli
g_t: Specific constant for each population; for some populations (such as STN and GPe) it takes a constant of 0.06 nS, while for other populations it takes 0.
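A sketch of building the per-population g_t vector described above; which population IDs correspond to STN/GPe is model-specific, and the call form of update_ttype_ca_stimuli is an assumption.

```python
import torch

subblk_ids = block.subblk_id                         # documented attribute: population IDs
g_t = torch.zeros(subblk_ids.shape, dtype=torch.float32, device=subblk_ids.device)

# Hypothetical ID set; replace with the real STN/GPe population IDs of your model.
stn_gpe_ids = torch.tensor([101, 102], device=subblk_ids.device)
is_stn_or_gpe = torch.isin(subblk_ids, stn_gpe_ids)
g_t[is_stn_or_gpe] = 0.06                            # 0.06 nS for STN/GPe, 0 elsewhere

block.update_ttype_ca_stimuli(g_t)                   # assumed call form
```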
check_sample_conn_weight
set_samples_by_specifying_popu_idx
4.2 data retrieval interface
total_neurons
Returns the total number of neurons, such as:
tensor(6826335882, device='cuda:0').
block_id
Returns the sequence of DCU card numbers, such as
[1, 2, 3, …, 2000].
subblk_id
Gets the sequence of population numbers, such as:
tensor([2, 3, 4, ..., 227017, 227026, 227027], device='cuda:0').
total_subblks
Gets the total number of populations.
neurons_per_subblk
Sequence of the number of neurons for each population, such as
tensor([90906, 25639, 96272, ..., 17933, 62897, 15725], device='cuda:0').
neurons_per_block
Number of neurons per DCU card (sequence).
last_time_stat
Statistics of each card in the previous iteration, including 34 indicators.
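A quick inspection sketch using the block instance from the constructor example; the printed values are only illustrative.

```python
print(block.total_neurons)        # e.g. tensor(6826335882, device='cuda:0')
print(block.block_id)             # DCU card numbers, e.g. [1, 2, 3, ..., 2000]
print(block.subblk_id)            # population IDs
print(block.total_subblks)        # total number of populations
print(block.neurons_per_subblk)   # neurons in each population
print(block.neurons_per_block)    # neurons on each DCU card
stats = block.last_time_stat      # 34 per-card indicators from the last iteration
```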
5 test
To verify whether neuron spikes are emitted correctly in the network, i.e., whether the membrane potentials computed by the GPU are accurate, the same computation can be reproduced on the CPU and compared against the GPU results. However, this is not feasible at large scale because serial CPU computation takes too long. In DTB1.2, a sampling network composed of a number of sampled neurons is therefore taken from the full network; this sampling network participates in the simulation of the entire network, while a sampling network of the same scale and with the same parameters is simulated on the CPU for comparison. If the two agree, the membrane potential calculation is considered accurate.
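A minimal sketch of the comparison step, assuming the sampled membrane-potential traces of the DCU run and the CPU reference run have been saved as tensors of the same shape (the file names are hypothetical).

```python
import torch

v_gpu = torch.load("v_sample_gpu.pt")   # [iterations, n_sample_neurons], from the DCU simulation
v_cpu = torch.load("v_sample_cpu.pt")   # same shape, computed serially on the CPU

max_err = (v_gpu - v_cpu).abs().max()
print(f"max abs error: {max_err.item():.3e}")

# If the traces agree within tolerance, the membrane-potential computation is
# considered accurate.
assert torch.allclose(v_gpu, v_cpu, atol=1e-5)
```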
Network Simulation
The computational time required to simulate 1 ms of biological time (i.e., one iteration) is referred to as the slowdown ratio, which is an important performance indicator for DTB. The test results are as follows:

| Scale | Firing Rate | Communication Method | Slowdown Ratio |
| --- | --- | --- | --- |
| 7.5 Billion | 14 Hz | p2p | 50 |
| 7.5 Billion | 14 Hz | p2p | 65 |
| 15 Billion | 14 Hz | p2p | 63 |
| 15 Billion | 14 Hz | p2p | 80 |
| 86 Billion | 7 Hz | p2p | 65 |
| 86 Billion | 15 Hz | p2p | 79 |
| 86 Billion | 30 Hz | p2p | 119 |
| 100 Billion | 15 Hz | p2p | 63 |

The slower speed at the 86-billion scale compared with the 100-billion scale is mainly due to the poor state of the cluster's IB network during testing.
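A timing sketch for measuring the slowdown ratio on your own runs; as before, the run() call form is an assumption.

```python
import time

steps = 100                          # 100 iterations of 1 ms biological time
t0 = time.time()
block.run(steps)                     # assumed call form, as above
wall_ms = (time.time() - t0) * 1000.0

slowdown_ratio = wall_ms / steps     # wall-clock ms per 1 ms of biological time
print(f"slowdown ratio ≈ {slowdown_ratio:.1f}")
```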