NVIDIA
diff --git a/GRPC_ARCHITECTURE.md b/GRPC_ARCHITECTURE.md
Lines changed: 23 additions & 37 deletions
diff --git a/SERVER_ARCHITECTURE.md b/SERVER_ARCHITECTURE.md
Lines changed: 6 additions & 4 deletions
diff --git a/cpp/cuopt_grpc_server.cpp b/cpp/cuopt_grpc_server.cpp
Lines changed: 0 additions & 27 deletions
@@ -6,8 +6,9 @@ The cuOpt remote execution system uses gRPC for client-server communication. The
 supports arbitrarily large optimization problems (multi-GB) through a chunked array transfer
 protocol that uses only unary (request-response) RPCs — no bidirectional streaming.
 
-All serialization uses protocol buffers generated by `protoc` and `grpc_cpp_plugin` —
-no custom serialization logic is implemented.
+All client-server serialization uses protocol buffers generated by `protoc` and
+`grpc_cpp_plugin`. The internal server-to-worker pipe uses protobuf for metadata
+headers and raw byte transfer for bulk array data (see Security Notes).
 
 ## Directory Layout
 
@@ -30,7 +31,7 @@ cpp/src/grpc/
     ├── grpc_service_impl.cpp       # CuOptRemoteServiceImpl — all RPC handlers
     ├── grpc_server_types.hpp       # Shared types, globals, forward declarations
     ├── grpc_field_element_size.hpp # ArrayFieldId → element byte size (codegen target)
-    ├── grpc_pipe_serialization.hpp # Pipe blob serialize/deserialize (varint-delimited proto)
+    ├── grpc_pipe_serialization.hpp # Pipe I/O: protobuf headers + raw byte arrays (request/result)
     ├── grpc_incumbent_proto.hpp    # Incumbent proto build/parse (codegen target)
     ├── grpc_worker.cpp             # worker_process(), incumbent callback, store_simple_result
     ├── grpc_worker_infra.cpp       # Pipes, spawn, wait_for_workers, mark_worker_jobs_failed
@@ -192,34 +193,15 @@ The client handles size-based routing transparently:
 - Server cleans up job state on client disconnect during upload
 - Automatic reconnection is NOT built-in (caller should retry)
 
-## Completion Strategies
+## Completion Strategy
 
-The client supports two strategies for waiting until a job completes:
+The `solve_lp` and `solve_mip` methods poll `CheckStatus` every `poll_interval_ms`
+until the job reaches a terminal state (COMPLETED/FAILED/CANCELLED) or `timeout_seconds`
+is exceeded. During polling, MIP incumbent callbacks are invoked on the main thread.
 
-### Polling (default)
-
-```cpp
-config.use_wait = false;  // default
-```
-
-- Main thread polls `CheckStatus` every `poll_interval_ms`
-- Detects completion when status changes to COMPLETED/FAILED/CANCELLED
-- Allows timeout detection (max_polls = timeout_seconds / poll_interval_ms)
-- Compatible with all server configurations
-
-### Wait RPC
-
-```cpp
-config.use_wait = true;
-```
-
-- Main thread makes single blocking `WaitForCompletion` call
-- More efficient (no repeated RPCs)
-- Server blocks until job completes, then returns final status
-- Result must still be fetched separately via `GetResult` or chunked download
-
-Both strategies support concurrent log streaming and incumbent callbacks — these
-run in background threads independent of the main completion check.
+The `WaitForCompletion` RPC is available as a public async API primitive for callers
+managing jobs directly, but it is not used by the convenience `solve_*` methods because
+polling provides timeout protection and enables incumbent callbacks.
 
 ## Client API (`grpc_client_t`)
 
@@ -230,7 +212,6 @@ struct grpc_client_config_t {
   std::string server_address = "localhost:8765";
   int poll_interval_ms       = 1000;
   int timeout_seconds        = 3600;  // Max wait for job completion (1 hour)
-  bool use_wait              = false; // Use WaitForCompletion instead of polling
   bool stream_logs           = false; // Stream solver logs from server
 
   // Callbacks
@@ -339,20 +320,25 @@ config.tls_client_key = read_file("client.key");
 
 | Configuration | Default | Notes |
 |---------------|---------|-------|
-| Server `--max-message-mb` | 256 MiB | Per-message limit |
-| Client `max_message_bytes` | 256 MiB | Should match server |
+| Server `--max-message-mb` | 256 MiB | Per-message limit (also `--max-message-bytes` for exact byte values) |
+| Server clamping | [4 MiB, ~2 GiB] | Enforced at startup to stay within protobuf's serialization limit |
+| Client `max_message_bytes` | 256 MiB | Clamped to [4 MiB, ~2 GiB] at construction |
 | Chunk size | 16 MiB | Payload per `SendArrayChunk`/`GetResultChunk` |
 | Chunked threshold | 75% of max_message_bytes | Problems above this use chunked upload (e.g. 192 MiB when max is 256 MiB) |
 
 Chunked transfer allows unlimited total payload size; only individual
-chunks must fit within the per-message limit.
+chunks must fit within the per-message limit. Neither client nor server
+allows "unlimited" message size — both clamp to the protobuf 2 GiB ceiling.
 
 ## Security Notes
 
-1. **No Custom Serialization**: All message parsing uses protobuf-generated code
-2. **Standard gRPC Security**: HTTP/2 framing, flow control, standard status codes
-3. **TLS Support**: Optional encryption with mutual authentication
-4. **Input Validation**: Server validates all incoming messages before processing
+1. **gRPC Layer**: All client-server message parsing uses protobuf-generated code
+2. **Internal Pipe**: The server-to-worker pipe uses protobuf for metadata headers
+   and length-prefixed raw `read()`/`write()` for bulk array data. This pipe is
+   internal to the server process (main → forked worker) and not exposed to clients.
+3. **Standard gRPC Security**: HTTP/2 framing, flow control, standard status codes
+4. **TLS Support**: Optional encryption with mutual authentication
+5. **Input Validation**: Server validates all incoming gRPC messages before processing
 
 ## Data Flow Summary
 
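The polling behaviour described under "Completion Strategy" in `GRPC_ARCHITECTURE.md` can be sketched roughly as below. This is an illustration only, not the cuOpt client code: `JobState`, `is_terminal`, `poll_until_terminal`, and the `check_status` callable are hypothetical stand-ins for the `CheckStatus` RPC. Only `poll_interval_ms`, `timeout_seconds`, and the three terminal states come from the text above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical job states; the terminal ones mirror COMPLETED/FAILED/CANCELLED.
enum class JobState { QUEUED, RUNNING, COMPLETED, FAILED, CANCELLED };

inline bool is_terminal(JobState s)
{
  return s == JobState::COMPLETED || s == JobState::FAILED || s == JobState::CANCELLED;
}

// Calls check_status every poll_interval_ms until it reports a terminal state
// or timeout_seconds has elapsed. Returns the terminal state, or std::nullopt
// if the deadline passed first.
inline std::optional<JobState> poll_until_terminal(const std::function<JobState()>& check_status,
                                                   int poll_interval_ms,
                                                   int timeout_seconds)
{
  assert(poll_interval_ms > 0 && timeout_seconds >= 0);
  using clock         = std::chrono::steady_clock;
  const auto deadline = clock::now() + std::chrono::seconds(timeout_seconds);
  while (true) {
    const JobState s = check_status();
    if (is_terminal(s)) { return s; }
    if (clock::now() >= deadline) { return std::nullopt; }
    // A real client would also drain incumbent callbacks here, on this thread.
    std::this_thread::sleep_for(std::chrono::milliseconds(poll_interval_ms));
  }
}
```

A single blocking wait call cannot enforce a client-side deadline this way, which matches the stated reason the convenience `solve_*` methods poll instead of using `WaitForCompletion`.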
 
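The message-size rules in the `GRPC_ARCHITECTURE.md` table (clamping to [4 MiB, ~2 GiB] and the 75% chunked threshold) amount to a small piece of arithmetic. The sketch below assumes the upper bound is `INT32_MAX` bytes, since the document only says "~2 GiB". The function and constant names are hypothetical and are not taken from the cuOpt sources.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kMiB             = 1024LL * 1024LL;
constexpr std::int64_t kMinMessageBytes = 4 * kMiB;         // 4 MiB floor
constexpr std::int64_t kMaxMessageBytes = 2048 * kMiB - 1;  // assumed "~2 GiB" ceiling

// Clamp a requested per-message limit into [4 MiB, ~2 GiB].
inline std::int64_t clamp_max_message_bytes(std::int64_t requested)
{
  if (requested < kMinMessageBytes) { return kMinMessageBytes; }
  if (requested > kMaxMessageBytes) { return kMaxMessageBytes; }
  return requested;
}

// Chunked threshold: 75% of the per-message limit.
inline std::int64_t chunked_threshold_bytes(std::int64_t max_message_bytes)
{
  assert(max_message_bytes > 0);
  return max_message_bytes / 4 * 3;
}

// Problems strictly above the threshold take the chunked upload path.
inline bool use_chunked_upload(std::int64_t problem_bytes, std::int64_t max_message_bytes)
{
  return problem_bytes > chunked_threshold_bytes(max_message_bytes);
}
```

With the default 256 MiB limit, `chunked_threshold_bytes` yields 192 MiB, matching the example in the table.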
@@ -85,7 +85,7 @@ All paths below are under `cpp/src/grpc/server/`.
 | `grpc_service_impl.cpp` | `CuOptRemoteServiceImpl`: all 14 RPC handlers (SubmitJob, CheckStatus, GetResult, chunked upload/download, StreamLogs, GetIncumbents, CancelJob, DeleteResult, WaitForCompletion, Status probe). Uses mappers and job_management to enqueue jobs and trigger pipe I/O. |
 | `grpc_server_types.hpp` | Shared structs (e.g. `JobQueueEntry`, `ResultQueueEntry`, `ServerConfig`, `JobInfo`), enums, globals (atomics, mutexes, condition variables), and forward declarations used across server .cpp files. |
 | `grpc_field_element_size.hpp` | Maps `cuopt::remote::ArrayFieldId` to element byte size; used by pipe deserialization and chunked logic. |
-| `grpc_pipe_serialization.hpp` | Serialize/deserialize result blobs (ChunkedResultHeader + array chunks) and chunked request blobs (ChunkedProblemHeader + chunks) and SubmitJobRequest for pipe transfer. |
+| `grpc_pipe_serialization.hpp` | Streaming pipe I/O: write/read individual length-prefixed protobuf messages (ChunkedProblemHeader, ChunkedResultHeader, ArrayChunk) directly to/from pipe fds. Avoids large intermediate buffers. Also serializes SubmitJobRequest for unary pipe transfer. |
 | `grpc_incumbent_proto.hpp` | Build `Incumbent` proto from (job_id, objective, assignment) and parse it back; used by worker when pushing incumbents and by main when reading from the incumbent pipe. |
 | `grpc_worker.cpp` | `worker_process(worker_index)`: loop over job queue, receive job data via pipe (unary or chunked), call solver, send result (and optionally incumbents) back. Contains `IncumbentPipeCallback` and `store_simple_result`. |
 | `grpc_worker_infra.cpp` | Pipe creation/teardown, `spawn_worker` / `spawn_workers`, `wait_for_workers`, `mark_worker_jobs_failed`, `cleanup_shared_memory`. |
@@ -97,9 +97,11 @@ All paths below are under `cpp/src/grpc/server/`.
 For large problems uploaded via chunked gRPC RPCs:
 
 1. Server holds chunked upload state in memory (`ChunkedUploadState`: header + array chunks per `upload_id`).
-2. When `FinishChunkedUpload` is called, the server serializes header and chunks (varint-delimited) and sends them to a worker via the job's pipe.
-3. Worker deserializes, runs the solver, and writes result (and optionally incumbents) back via pipes.
-4. Main process result-retrieval thread reads the result pipe and stores the result for `GetResult` or chunked download.
+2. When `FinishChunkedUpload` is called, the header and chunks are stored in `pending_chunked_data`. The data dispatch thread streams them directly to the worker pipe as individual length-prefixed protobuf messages — no intermediate blob is created.
+3. Worker reads the streamed messages from the pipe, reassembles arrays, runs the solver, and writes the result (and optionally incumbents) back via pipes using the same streaming format.
+4. Main process result-retrieval thread reads the streamed result messages from the pipe and stores the result for `GetResult` or chunked download.
+
+This streaming approach avoids creating a single large buffer, eliminating the 2 GiB protobuf serialization limit for pipe transfers and reducing peak memory usage. Each individual protobuf message (max 64 MiB) is serialized with standard `SerializeToArray`/`ParseFromArray`.
 
 No disk spooling: chunked data is kept in memory in the main process until forwarded to the worker.
 
 
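The streaming pipe format described in `SERVER_ARCHITECTURE.md` (individual length-prefixed messages written directly to pipe fds) might look roughly like the sketch below. It makes several assumptions. The length prefix is taken to be an 8-byte integer in host byte order, which is plausible because both ends run on the same machine, but the document does not specify the prefix width. A `std::string` of bytes stands in for a protobuf message already serialized with `SerializeToArray`. The helper names `write_all`, `read_all`, `write_message`, and `read_message` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unistd.h>

// Per-message upper bound, from the document ("max 64 MiB").
constexpr std::uint64_t kMaxPipeMessageBytes = 64ULL * 1024 * 1024;

// Write exactly n bytes, retrying on short writes and EINTR.
inline bool write_all(int fd, const void* buf, std::size_t n)
{
  const char* p = static_cast<const char*>(buf);
  while (n > 0) {
    const ssize_t w = ::write(fd, p, n);
    if (w < 0) {
      if (errno == EINTR) { continue; }
      return false;
    }
    p += w;
    n -= static_cast<std::size_t>(w);
  }
  return true;
}

// Read exactly n bytes; fails on EOF or error before n bytes arrive.
inline bool read_all(int fd, void* buf, std::size_t n)
{
  char* p = static_cast<char*>(buf);
  while (n > 0) {
    const ssize_t r = ::read(fd, p, n);
    if (r == 0) { return false; }  // EOF in the middle of a frame
    if (r < 0) {
      if (errno == EINTR) { continue; }
      return false;
    }
    p += r;
    n -= static_cast<std::size_t>(r);
  }
  return true;
}

// Frame layout (assumed): 8-byte length, then the payload bytes.
inline bool write_message(int fd, const std::string& payload)
{
  assert(payload.size() <= kMaxPipeMessageBytes);
  const std::uint64_t len = payload.size();
  return write_all(fd, &len, sizeof(len)) && write_all(fd, payload.data(), payload.size());
}

inline bool read_message(int fd, std::string& payload)
{
  std::uint64_t len = 0;
  if (!read_all(fd, &len, sizeof(len))) { return false; }
  if (len > kMaxPipeMessageBytes) { return false; }  // reject corrupt or oversized frames
  payload.resize(static_cast<std::size_t>(len));
  return len == 0 || read_all(fd, &payload[0], payload.size());
}
```

Because each frame is bounded and read independently, the receiver never needs a buffer larger than one message, which is the property the document credits for removing the 2 GiB limit on pipe transfers.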
@@ -75,33 +75,6 @@ using grpc::StatusCode;
 using namespace cuopt::linear_programming;
 // Note: NOT using "using namespace cuopt::remote" to avoid JobStatus enum conflict
 
-// =============================================================================
-// Data Integrity - Simple Hash for Transfer Verification
-// =============================================================================
-
-/**
- * @brief Compute FNV-1a 64-bit hash for data integrity verification.
- * Same algorithm as client - allows comparison of upload/download hashes.
- */
-inline uint64_t compute_data_hash(const uint8_t* data, size_t size)
-{
-  constexpr uint64_t FNV_OFFSET_BASIS = 14695981039346656037ULL;
-  constexpr uint64_t FNV_PRIME        = 1099511628211ULL;
-  uint64_t hash                       = FNV_OFFSET_BASIS;
-  for (size_t i = 0; i < size; ++i) {
-    hash ^= static_cast<uint64_t>(data[i]);
-    hash *= FNV_PRIME;
-  }
-  return hash;
-}
-
-inline std::string hash_to_hex(uint64_t hash)
-{
-  std::ostringstream oss;
-  oss << std::hex << std::setfill('0') << std::setw(16) << hash;
-  return oss.str();
-}
-
 // ============================================================================
 // Pipe IPC result serialization.
 // Uses standard protobuf varint-delimited format (SerializeDelimitedToCodedStream).