My organization is using rules_itest heavily now. We use `autoassign_ports` to randomize port assignments. I'm worried that the time-of-check to time-of-use (TOCTOU) race around ports will cause problems when we run many integration tests in parallel. We hit a similar issue with testcontainers much sooner than we expected, because port collisions are an instance of the birthday paradox.
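To put rough numbers on the birthday-paradox point, here is a small sketch. It assumes every in-flight port (picked but not yet bound) is drawn uniformly and independently from the 10000 to 49151 range used in the function below. Real OS port allocation is not uniform, so treat this as an illustration only. Under that assumption, the collision probability passes 50% at roughly 230 simultaneously in-flight ports.

```python
def collision_probability(n_ports: int, range_size: int) -> float:
    """Probability that at least two of n_ports picks land on the same port.

    Assumes each pick is uniform and independent over range_size ports
    (the classic birthday problem).
    """
    if n_ports > range_size:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(n_ports):
        # The i-th pick must avoid the i ports already taken.
        p_all_distinct *= (range_size - i) / range_size
    return 1.0 - p_all_distinct


# Assumed range: 10000..49151 inclusive.
RANGE_SIZE = 49151 - 10000 + 1

if __name__ == "__main__":
    for n in (10, 50, 100, 250, 500):
        p = collision_probability(n, RANGE_SIZE)
        print(f"{n:4d} ports in flight -> P(collision) = {p:.3f}")
```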
We've adopted a system that uses the filesystem to reserve ports. Atomic lock-file creation gives us some guarantee of transactionality. Here is our function for doing so:
```python
import os
from typing import Optional

# find_listening_port(port_range, host, default) is a separate helper of ours
# (not shown) that picks a candidate port in port_range.


def find_and_reserve_listening_port(
    port_range: tuple[int, int] = (10000, 49151),
    host: str = "",
    default: Optional[int] = None,
    lock_dir: str = "/tmp/ports",
) -> int:
    """Find and reserve an available listening port in the given range.

    Reservations are tracked through a file in the lock_dir directory.

    Args:
        port_range: A tuple of two integers representing the range of port numbers.
        host: The hostname or IP address to test the port on.
        default: The default port number to try first.
        lock_dir: The directory to store port reservations.

    Returns:
        The reserved port number.

    Example:
        >>> port = find_and_reserve_listening_port()
        >>> ...
        >>> release_listening_port(port)
    """
    # exist_ok avoids a check-then-create race when many tests start at once.
    os.makedirs(lock_dir, exist_ok=True)
    for attempt in range(100):
        # Only suggest the default on the first attempt. Otherwise a default
        # that is already reserved could be returned (and rejected) every time.
        port = find_listening_port(port_range, host, default if attempt == 0 else None)
        lock_file = os.path.join(lock_dir, str(port))
        try:
            # This is the UNIX pattern for creating a lockfile.
            # O_CREAT | O_EXCL makes creation atomic: the file is created if it
            # doesn't exist, and the call fails with FileExistsError if it does,
            # so exactly one of several racing processes wins the reservation.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_RDWR)
            os.close(fd)
            return port
        except FileExistsError:
            # Another process already reserved this port; try a different one.
            continue
    raise RuntimeError("Failed to find and reserve a listening port after 100 tries")
```
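The docstring's example calls `release_listening_port`, which isn't included above. The sketch below shows what it might look like. It assumes a reservation is nothing more than the lock file's existence. `try_reserve_port` is a hypothetical name for the single-port `O_EXCL` step, factored out of the loop so the pair can be exercised without `find_listening_port`.

```python
import contextlib
import os


def try_reserve_port(port: int, lock_dir: str = "/tmp/ports") -> bool:
    """Hypothetical helper: atomically claim `port` by creating its lock file.

    Returns True if this call created the file (reservation acquired) and
    False if some process already holds the reservation.
    """
    os.makedirs(lock_dir, exist_ok=True)
    try:
        fd = os.open(os.path.join(lock_dir, str(port)), os.O_CREAT | os.O_EXCL | os.O_RDWR)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_listening_port(port: int, lock_dir: str = "/tmp/ports") -> None:
    """Release a reservation by deleting its lock file.

    A missing file is ignored, so releasing twice is harmless.
    """
    with contextlib.suppress(FileNotFoundError):
        os.remove(os.path.join(lock_dir, str(port)))
```

In a test this would typically be wrapped in `try`/`finally`, so the lock file is removed even if the service fails to start.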
I am wondering if this pattern could be useful in rules_itest. I was thinking of two approaches:
1. `itest_service` gets parameters `port_lock_directory` and `port_lock_prefix` and uses this filesystem structure for port reservations.
2. `itest_service` gets a parameter `port_reservation_exe`, which would be a Bazel runnable target that takes the arguments `--reserve_n_ports=N` and `--release N` and returns data as needed in its output.
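To make approach 2 more concrete, here is a rough sketch of what a `port_reservation_exe` might look like. It reuses the lock-file scheme and adds a per-service filename prefix (the `port_lock_prefix` idea from approach 1). Every interface detail is an assumption for illustration, not an existing rules_itest API: the `--lock_dir` and `--prefix` flags, the default values, printing one reserved port per line on stdout, and `--release` being repeatable.

```python
import argparse
import contextlib
import os
import random
import socket
import sys
from typing import Iterable, List, Optional, Tuple

# Hypothetical defaults mirroring the proposed port_lock_directory /
# port_lock_prefix attributes.
DEFAULT_LOCK_DIR = "/tmp/ports"
DEFAULT_PREFIX = "itest-"


def _lock_path(lock_dir: str, prefix: str, port: int) -> str:
    return os.path.join(lock_dir, f"{prefix}{port}")


def _can_bind(port: int) -> bool:
    """Cheap pre-check that nothing currently holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("", port))
        except OSError:
            return False
    return True


def release_ports(ports: Iterable[int], lock_dir: str, prefix: str) -> None:
    for port in ports:
        with contextlib.suppress(FileNotFoundError):
            os.remove(_lock_path(lock_dir, prefix, port))


def reserve_ports(
    n: int,
    lock_dir: str,
    prefix: str,
    port_range: Tuple[int, int] = (10000, 49151),
    max_tries: int = 1000,
) -> List[int]:
    os.makedirs(lock_dir, exist_ok=True)
    reserved: List[int] = []
    tries = 0
    while len(reserved) < n:
        if tries == max_tries:
            release_ports(reserved, lock_dir, prefix)  # don't leak partial reservations
            raise RuntimeError(f"could only reserve {len(reserved)} of {n} ports")
        tries += 1
        port = random.randint(*port_range)  # inclusive on both ends
        if not _can_bind(port):
            continue
        try:
            # Atomic claim: fails if any process (including this one) holds it.
            fd = os.open(_lock_path(lock_dir, prefix, port), os.O_CREAT | os.O_EXCL | os.O_RDWR)
        except FileExistsError:
            continue
        os.close(fd)
        reserved.append(port)
    return reserved


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Reserve or release ports via lock files.")
    parser.add_argument("--reserve_n_ports", type=int, default=0)
    parser.add_argument("--release", type=int, action="append", metavar="PORT",
                        help="Port to release; may be repeated.")
    parser.add_argument("--lock_dir", default=DEFAULT_LOCK_DIR)
    parser.add_argument("--prefix", default=DEFAULT_PREFIX)
    args = parser.parse_args(argv)

    release_ports(args.release or [], args.lock_dir, args.prefix)
    for port in reserve_ports(args.reserve_n_ports, args.lock_dir, args.prefix):
        print(port)  # one reserved port per line on stdout
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

With this shape, `--reserve_n_ports=2` would print two ports and `--release 12345` would delete `itest-12345`. The rule could then parse stdout and pass the ports to the service the same way `autoassign_ports` does today.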
I understand rules_itest supports `SO_REUSEPORT`, which is a good solution. However, it doesn't work for us in at least some cases where we start Docker containers for services we don't control. For example, I poked at Redpanda, and its published container can't be convinced to open a port in a compatible way.
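The `SO_REUSEPORT` limitation can be reproduced in a few lines. The general mechanism is that a harness keeps a socket bound to the port it hands out. A service can then bind the same port only if it also sets `SO_REUSEPORT`. The sketch below demonstrates Linux semantics. The holder/service roles are illustrative and do not reproduce rules_itest's actual implementation. It shows why a service that binds normally, like an unconfigurable container image, is refused.

```python
import errno
import socket
from typing import Tuple


def _reuseport_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to take effect.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    return s


def demo_so_reuseport(host: str = "127.0.0.1") -> Tuple[bool, bool]:
    """Return (cooperating_bind_ok, plain_bind_ok) while a holder keeps the port."""
    # Find a currently free port (a tiny race here is acceptable for a demo).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]

    holder = _reuseport_socket()  # plays the role of the test harness
    holder.bind((host, port))
    try:
        # A service that also sets SO_REUSEPORT can bind the held port.
        service = _reuseport_socket()
        try:
            service.bind((host, port))
            cooperating_ok = True
        except OSError:
            cooperating_ok = False
        finally:
            service.close()

        # A service that binds without SO_REUSEPORT is refused while the
        # holder keeps the port.
        plain = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            plain.bind((host, port))
            plain_ok = True
        except OSError as e:
            if e.errno != errno.EADDRINUSE:
                raise
            plain_ok = False
        finally:
            plain.close()
    finally:
        holder.close()
    return cooperating_ok, plain_ok


if __name__ == "__main__":
    print(demo_so_reuseport())
```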
I'd love some feedback on these ideas. With some guidance I could contribute a PR.