Performance Benchmark
paglets examples perf demonstrates a pure mobile-agent fan-out pattern. It does
not use a resident service and it is not started from launch config. The
benchmark code is carried by the mobile agent class itself.
The core agent is PerformanceBenchmarkAgent, importable from paglets.examples.performance.
The CLI creates one parent benchmark agent on the entry host. The parent clones
children to online same-version mesh hosts. Each child runs benchmarks locally
and sends one result back to the parent. The CLI polls the parent with drain
until all hosts have replied. Parent result bookkeeping uses the paglet state
lock, but the actual benchmark work and remote calls happen outside that lock.
The public parent protocol uses typed operations, while MeshFanoutMixin
handles the repeated parent/child clone bookkeeping.
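The locking discipline above can be sketched with plain threads standing in for remote children. This is a minimal illustration, not the paglets implementation: `ParentState`, `record`, `drain`, and `run_fanout` are hypothetical names invented for this sketch, and the real agent relies on the paglet state lock and MeshFanoutMixin rather than on `threading`.

```python
import threading
import time


class ParentState:
    """Hypothetical stand-in for the parent's result bookkeeping."""

    def __init__(self, expected_hosts):
        self._lock = threading.Lock()
        self._pending = set(expected_hosts)
        self._results = {}

    def record(self, host, result):
        # Only the bookkeeping update happens under the lock.
        with self._lock:
            self._pending.discard(host)
            self._results[host] = result

    def drain(self):
        # Return a snapshot so callers never work while holding the lock.
        with self._lock:
            return dict(self._results), not self._pending


def child(parent, host):
    # The (simulated) benchmark runs outside the parent's lock.
    total = sum(range(10_000))
    parent.record(host, total)


def run_fanout(hosts, poll_interval=0.01, timeout=5.0):
    """Start one child per host, then poll drain() until all have replied."""
    parent = ParentState(hosts)
    threads = [threading.Thread(target=child, args=(parent, h)) for h in hosts]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    while True:
        _, done = parent.drain()
        if done or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    for t in threads:
        t.join()
    return parent.drain()[0]
```

Keeping the critical section down to a set and dictionary update means a slow benchmark on one host cannot delay another host's report.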
Benchmarks
The default run includes all categories:
| Category | Measurements |
|---|---|
| CPU single-core | Python integer loop, Python float loop, SHA-256 throughput. |
| CPU multi-core | Same kernels through worker processes. |
| Memory | Byte-buffer copy throughput and byte-buffer scan/checksum throughput. |
| Disk | Sequential write, fsync, sequential read, and small-file metadata rate. |
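A time-bounded kernel of the kind listed in the table might look like the sketch below. The function names, block size, and batch size are assumptions made for illustration; the actual paglets kernels may be structured differently.

```python
import hashlib
import time


def sha256_throughput(duration_seconds=1.0, block_size=1024 * 1024):
    """Hash a fixed block repeatedly for about duration_seconds; return bytes/s."""
    block = bytes(block_size)
    hasher = hashlib.sha256()
    hashed = 0
    start = time.perf_counter()
    deadline = start + duration_seconds
    while time.perf_counter() < deadline:
        hasher.update(block)
        hashed += block_size
    # Guard against a zero-length interval on coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return hashed / elapsed


def integer_loop_rate(duration_seconds=1.0, batch=10_000):
    """Count integer additions per second, checking the clock once per batch."""
    count = 0
    acc = 0
    start = time.perf_counter()
    deadline = start + duration_seconds
    while time.perf_counter() < deadline:
        for i in range(batch):
            acc += i
        count += batch
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count / elapsed
```

Checking the clock once per batch rather than once per addition keeps the timing overhead from dominating the measurement.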
Disk benchmarks are intentionally bounded:
- only writable real volumes are selected by default;
- when a mountpoint is not directly writable, the benchmark also tries
  per-user writable directories such as ~/.paglets/benchmarks and the OS temp
  directory on that same volume;
- special, pseudo, read-only, duplicate, missing, and unwritable volumes are skipped;
- each tested volume gets a temporary benchmark directory;
- temporary files are cleaned up afterward;
- a volume is skipped if free space is less than twice the requested test size.
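The selection rules in the list above can be approximated with the standard library. The sketch below is an assumption-laden illustration: `check_disk_target` and `with_benchmark_dir` are hypothetical helpers, the skip reasons are made-up labels, and detection of special, pseudo, or duplicate volumes is left out.

```python
import os
import shutil
import tempfile


def check_disk_target(path, test_size_bytes):
    """Return (ok, reason) for a candidate benchmark directory."""
    if not os.path.isdir(path):
        return False, "missing"
    if not os.access(path, os.W_OK):
        return False, "unwritable"
    # Require at least twice the requested test size in free space.
    free = shutil.disk_usage(path).free
    if free < 2 * test_size_bytes:
        return False, "insufficient free space"
    return True, "ok"


def with_benchmark_dir(path, work):
    """Run work(tmpdir) inside a temporary directory that is removed afterward."""
    with tempfile.TemporaryDirectory(prefix="bench-", dir=path) as tmpdir:
        return work(tmpdir)
```

Using `tempfile.TemporaryDirectory` ties cleanup to the end of the `with` block, so temporary files are removed even when the benchmark body raises.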
Normal text output hides skipped read-only, special, and duplicate targets. Use
--verbose or --debug when you want to inspect those skipped targets. JSON
output always includes the full skipped-target list.
These numbers are practical comparison data for a paglets mesh. They are not calibrated hardware certification results.
CLI Commands
Run all benchmark categories:
uv run paglets examples perf
Useful variations:
uv run paglets examples perf --json
uv run paglets examples perf --duration 2 --disk-size 256M
uv run paglets examples perf --path /data --path /scratch
uv run paglets examples perf --no-disk
uv run paglets examples perf --workers 4
uv run paglets examples perf --verbose
Example with two local hosts running in separate terminals:
uv run paglets host --name alpha --port 8765 --mesh-version dev
uv run paglets host --name beta --port 8766 --peer http://127.0.0.1:8765 --mesh-version dev
Across machines, use --bind-public 192.0.2.10 on each host instead of loopback.
Repeat explicit values such as --bind-public 192.0.2.10 only when the host
must listen on multiple specific interfaces.
Then run the benchmark from the repository checkout:
klukas@mac-studio paglets % uv run paglets examples perf
host   int/s  float/s  sha     multi-int/s  mem copy  disk wr  disk rd  err
alpha  17.2M  19.8M    2.1G/s  140.2M       30.6G/s   3.7G/s   16.0G/s  0
beta   17.2M  20.0M    2.2G/s  147.6M       31.0G/s   3.4G/s   15.7G/s  0
disks:
host   path                              size    write   read     metadata
alpha  /Users/klukas/.paglets/benchmark  128.0M  3.7G/s  16.0G/s  9130/s
beta   /Users/klukas/.paglets/benchmark  128.0M  3.4G/s  15.7G/s  9301/s
Important options:
| Option | Meaning |
|---|---|
| --duration | Seconds per CPU and memory kernel. Default: 1.0. |
| --disk-size | Temporary file size per tested volume. Default: 128M. |
| --workers | Multi-core worker count. Default: logical CPU count. |
| --path | Limit disk I/O to explicit paths. Can be repeated. |
| --no-cpu | Skip CPU tests. |
| --no-memory | Skip memory tests. |
| --no-disk | Skip disk I/O tests. |
| --lock-timeout | Seconds to wait for another local benchmark run to finish. |
| --verbose | Print skipped disk targets and cleanup diagnostics. |
| --debug | Same diagnostic output as --verbose. |
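Size values such as 128M or 256M need to be turned into byte counts somewhere. The sketch below shows one plausible way to do that. It is not taken from the paglets source: `parse_size` is a hypothetical helper, and the choice of binary (1024-based) multiples is an assumption, so the real CLI may accept different suffixes or use different multiples.

```python
def parse_size(text):
    """Parse '128M'-style sizes into bytes, assuming 1024-based suffixes."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    text = text.strip().upper()
    if not text:
        raise ValueError("empty size")
    factor = 1
    if text[-1] in units:
        factor = units[text[-1]]
        text = text[:-1]
    value = float(text)  # raises ValueError for malformed numbers
    if value < 0:
        raise ValueError("size must be non-negative")
    return int(value * factor)
```

Under that assumption, `parse_size("128M")` yields 134217728 bytes.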
Agent Flow
The benchmark agent uses cloning because benchmark work should run in parallel on different hosts:
- The CLI creates a parent PerformanceBenchmarkAgent on the entry host.
- The parent discovers online same-version mesh hosts.
- The parent clones a child to each host.
- Each child starts benchmark work in a background thread so clone arrival does not serialize the fan-out.
- Children on different hosts run in parallel.
- A host-local benchmark lock prevents two benchmark children on the same server from running expensive tests at the same time.
- Each child reports HostBenchmarkResult or an error to the parent.
- The parent wakes any drain call waiting for completion.
- The CLI returns a summary with results, errors, and non-fatal cleanup_errors.
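Two steps in the flow above, starting work in a background thread and waking a waiting drain call, can be sketched with `threading.Condition`. The `Collector` class and `start_child` function are hypothetical names for this illustration; in paglets the children are cloned agents on remote hosts, not local threads.

```python
import threading


class Collector:
    """Hypothetical parent: collects one report per host and wakes waiters."""

    def __init__(self, hosts):
        self._cond = threading.Condition()
        self._pending = set(hosts)
        self.results = {}
        self.errors = {}

    def report(self, host, result=None, error=None):
        with self._cond:
            if error is not None:
                self.errors[host] = error
            else:
                self.results[host] = result
            self._pending.discard(host)
            self._cond.notify_all()  # wake any drain() waiting for completion

    def drain(self, timeout):
        # Block until every host has reported or the timeout expires.
        with self._cond:
            complete = self._cond.wait_for(lambda: not self._pending, timeout)
            return {
                "complete": complete,
                "results": dict(self.results),
                "errors": dict(self.errors),
            }


def start_child(collector, host, benchmark):
    """Return immediately; the benchmark runs in a background thread."""

    def run():
        try:
            collector.report(host, result=benchmark(host))
        except Exception as exc:  # report failures instead of losing them
            collector.report(host, error=repr(exc))

    thread = threading.Thread(target=run, name=f"bench-{host}", daemon=True)
    thread.start()
    return thread
```

Because `start_child` returns before the benchmark finishes, the loop that launches children is never held up by a slow host.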
The lock has two layers: a process-local threading.Lock for threads inside one
benchmark child, and a best-effort OS file lock in the system temp directory.
The OS file lock is the important cross-process guard in the process-isolated
runtime; it serializes benchmark paglets started by the same user on the same
machine while still allowing different physical hosts to work in parallel.
Programmatic Use
The request and reply dataclasses are importable:
from paglets.examples.performance import (
    PERFORMANCE_COLLECT,
    BenchmarkRequest,
    PerformanceBenchmarkAgent,
    PerformanceCollectRequest,
)
from paglets.patterns.operations import OperationClient
from paglets.serialization.codec import dataclass_to_wire

# Runs inside a paglet method, where self.context is available.
proxy = self.context.create_paglet(PerformanceBenchmarkAgent)
client = OperationClient(proxy)
summary = client.call(
    PERFORMANCE_COLLECT,
    PerformanceCollectRequest(
        request=dataclass_to_wire(
            BenchmarkRequest(duration_seconds=0.5, disk_size_bytes=64 * 1024 * 1024)
        ),
        timeout=120.0,
    ),
)
Most applications should use the paglets examples perf CLI unless they need to
embed benchmark collection into another paglet workflow.