Architecture and Core Mechanics

image-syncer is designed to be highly concurrent and memory-efficient. It copies image manifests and blobs directly between registries over the network, without writing them to local disk and without depending on a local Docker daemon.

Technical Workflow Pipeline

The following diagram illustrates the lifecycle of a synchronization task, showing how configuration rules are converted into concurrent execution steps:

[ User Input / Configuration Files ]
                  │
                  ▼
             [ RuleTask ]
                  │ (Resolves source tags / applies regex patterns)
                  ▼
             [ URLTask ]
                  │ (Downloads source manifests, checks target, filters platform)
                  ├─────────────────────────────────────────┐
                  ▼                                         ▼
             [ BlobTask 1 ] ... [ BlobTask N ]      [ ManifestTask ]
                  │ (Checks existence at target)            │
                  │ (Streams layer payload on-the-fly)      │
                  ▼                                         │
       [ Ants Goroutine Pool ]                              │
                  │                                         ▼
                  └──────────────────────────────► [ Atomic Counter ]
                                                            │ (Triggers manifest write)
                                                            ▼
                                                   [ Target Registry ]

The Goroutine Concurrency Engine

At the core of image-syncer is a structured task execution pipeline whose tasks run on a goroutine pool from the panjf2000/ants library. The pool caps the number of live goroutines and reuses them across tasks, which avoids the overhead of spawning a fresh goroutine for every task during large-scale copies.

Worker Allocation (--proc): This parameter sets the maximum number of concurrent workers in the pool. Rather than opening an unbounded number of concurrent network connections, each worker pulls one task at a time from a shared list and runs it, so at most --proc tasks are in flight at once.
Thread-Safe Concurrency Primitives:
- concurrent.List: Wraps Go's standard container/list in a mutual exclusion lock (sync.Mutex), enabling workers to safely retrieve (PopFront), update, and insert tasks.
- concurrent.Counter: Uses Go's sync/atomic package to track how many of a manifest's sub-tasks are still outstanding, safely coordinating parent and child tasks.
- concurrent.ImageList: A thread-safe structure used to compile and export the list of successfully synchronized images when execution finishes.
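The pool-and-list arrangement above can be sketched with the standard library alone. This is a minimal stand-in, not image-syncer's code: it does not use ants (a fixed number of plain goroutines plays that role, with `proc` standing in for --proc), and the `Task`, `Scheduler`, and `demo` names are hypothetical. Each task may return follow-up tasks, mirroring how a RuleTask fans out into URLTasks; the mutex-guarded container/list echoes concurrent.List, and a sync.Cond lets idle workers sleep until new work arrives or everything has finished.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
	"sync/atomic"
)

// Task is a hypothetical unit of work: running it may yield follow-up tasks.
// This is not image-syncer's real task interface.
type Task func() []Task

// Scheduler guards a container/list of pending tasks with a mutex (in the
// spirit of concurrent.List) and uses a condition variable for idle workers.
type Scheduler struct {
	mu      sync.Mutex
	cond    *sync.Cond
	items   *list.List
	pending int // tasks queued or currently running
}

func NewScheduler() *Scheduler {
	s := &Scheduler{items: list.New()}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Push appends a task to the back of the shared list and wakes one worker.
func (s *Scheduler) Push(t Task) {
	s.mu.Lock()
	s.items.PushBack(t)
	s.pending++
	s.mu.Unlock()
	s.cond.Signal()
}

// popFront blocks until a task is available and removes it from the front;
// it returns nil once nothing is queued or running.
func (s *Scheduler) popFront() Task {
	s.mu.Lock()
	defer s.mu.Unlock()
	for s.items.Len() == 0 && s.pending > 0 {
		s.cond.Wait()
	}
	if s.items.Len() == 0 {
		return nil
	}
	return s.items.Remove(s.items.Front()).(Task)
}

// finish marks one task complete and wakes every worker when all work is done.
func (s *Scheduler) finish() {
	s.mu.Lock()
	s.pending--
	idle := s.pending == 0
	s.mu.Unlock()
	if idle {
		s.cond.Broadcast()
	}
}

// Run starts proc workers and returns when all tasks, including spawned ones, finish.
func (s *Scheduler) Run(proc int) {
	var wg sync.WaitGroup
	for i := 0; i < proc; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := s.popFront(); t != nil; t = s.popFront() {
				for _, next := range t() {
					s.Push(next) // enqueue children before marking the parent done
				}
				s.finish()
			}
		}()
	}
	wg.Wait()
}

// demo runs a task tree where each task spawns fanout children, depth levels
// deep, and returns how many tasks executed.
func demo(proc, fanout, depth int) int64 {
	var ran int64
	var mk func(level int) Task
	mk = func(level int) Task {
		return func() []Task {
			atomic.AddInt64(&ran, 1)
			if level == depth {
				return nil
			}
			children := make([]Task, fanout)
			for i := range children {
				children[i] = mk(level + 1)
			}
			return children
		}
	}
	s := NewScheduler()
	s.Push(mk(0))
	s.Run(proc)
	return atomic.LoadInt64(&ran)
}

func main() {
	fmt.Println(demo(5, 3, 3)) // 1 + 3 + 9 + 27 = 40
}
```

Follow-up tasks are pushed before their parent is marked finished, so the pending count cannot drop to zero while work is still being generated.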

Task Execution Stages

The synchronization lifecycle is broken down into four distinct task stages:

1. RuleTask (Tag Evaluation and Discovery)

This stage parses the user's configuration. If a rule does not specify a tag (e.g., quay.io/coreos/etcd), image-syncer queries the source registry's tag-listing API (GET /v2/<name>/tags/list) to enumerate every tag in the repository. If a regex filter is specified, only matching tags are kept. The rule is thereby resolved into a series of explicit source-to-destination pairs, and each pair is scheduled as a URLTask.
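A minimal sketch of the rule-expansion step, assuming the tag list has already been fetched from the registry's `/v2/<name>/tags/list` endpoint. `Pair`, `tagsListURL`, and `expandRule` are hypothetical names, and whether image-syncer anchors its tag regexes is not assumed here: the example anchors explicitly with `^...$`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pair is a hypothetical explicit source/destination reference produced from a rule.
type Pair struct{ Src, Dst string }

// tagsListURL builds the Registry HTTP API V2 tag-listing endpoint.
// (The catalog endpoint, /v2/_catalog, lists repositories rather than tags.)
func tagsListURL(registry, repo string) string {
	return fmt.Sprintf("https://%s/v2/%s/tags/list", registry, repo)
}

// expandRule keeps the tags matching pattern (all tags when pattern is empty)
// and pairs each one with the same tag on the destination repository.
func expandRule(src, dst string, tags []string, pattern string) ([]Pair, error) {
	var re *regexp.Regexp
	if pattern != "" {
		var err error
		if re, err = regexp.Compile(pattern); err != nil {
			return nil, err
		}
	}
	var pairs []Pair
	for _, tag := range tags {
		if re != nil && !re.MatchString(tag) {
			continue
		}
		pairs = append(pairs, Pair{Src: src + ":" + tag, Dst: dst + ":" + tag})
	}
	return pairs, nil
}

func main() {
	fmt.Println(tagsListURL("quay.io", "coreos/etcd"))
	// Sample values; in practice these come from the endpoint above.
	tags := []string{"v3.5.0", "v3.5.1", "v3.4.9", "latest"}
	pairs, err := expandRule("quay.io/coreos/etcd", "registry.example.com/coreos/etcd", tags, `^v3\.5\.\d+$`)
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Println(p.Src, "->", p.Dst)
	}
}
```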

2. URLTask (Manifest Resolution and Platform Filtering)

Each URLTask handles a single source/destination pair. It first retrieves the source image's manifest. By default, it then queries the destination registry to check whether a manifest with the same digest is already present; if so, the task exits early and no image data is transferred. If the --force flag is set, this check is skipped and the transfer proceeds regardless.
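The early-exit check compares content digests. A manifest's digest is the SHA-256 of its exact raw bytes, so the comparison must use the bytes as served rather than a re-encoded copy. The sketch assumes the destination digest was already obtained (for example, from the Docker-Content-Digest response header); `manifestDigest` and `needsSync` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// manifestDigest computes the content digest of the raw manifest bytes:
// "sha256:" followed by the hex-encoded SHA-256.
func manifestDigest(raw []byte) string {
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// needsSync is a hypothetical helper mirroring the early-exit check.
// dstDigest is "" when the destination has no manifest for the tag.
func needsSync(srcRaw []byte, dstDigest string, force bool) bool {
	if force {
		return true // --force: skip the comparison entirely
	}
	return dstDigest != manifestDigest(srcRaw)
}

func main() {
	src := []byte(`{"schemaVersion":2}`) // placeholder manifest body
	d := manifestDigest(src)
	fmt.Println(needsSync(src, "", false)) // true: destination has nothing
	fmt.Println(needsSync(src, d, false))  // false: identical digest, exit early
	fmt.Println(needsSync(src, d, true))   // true: --force
}
```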

For multi-architecture manifests, the tool applies any OS/Architecture filters configured via CLI flags, then splits the remaining transfer into BlobTask and ManifestTask steps.

3. BlobTask (Direct Layer Streaming)

Each image is composed of a configuration JSON blob and one or more layer blobs. A BlobTask handles a single layer blob. Before starting a download stream, the tool asks the target registry whether the blob can be reused (TryReusingBlob). If the target already contains a blob with the same digest and size, the download is skipped.

If the blob is missing from the destination, the tool opens a download stream from the source registry and pipes it directly into the target's upload endpoint. Data passes through in-memory buffers only; layers are never unpacked or written to the host's filesystem. Once the blob is present on the target, whether reused or freshly uploaded, the tool decrements the associated concurrent.Counter.

4. ManifestTask (Target Registration)

To maintain image integrity, a manifest can only be pushed after every blob it references exists on the target; registries typically reject a manifest that references blobs they do not have. The ManifestTask therefore does not run until its associated concurrent.Counter reaches zero, indicating that all layer transfers are complete. The manifest JSON is then pushed to the target registry, registering the image tag.
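The gating can be sketched with a single atomic countdown. `Counter` and `syncImage` are hypothetical and only model the role concurrent.Counter plays: the counter starts at the number of blobs, each finished blob decrements it, and the one decrement that reaches zero is the signal to push the manifest. Because `atomic.AddInt64` returns the new value, exactly one goroutine observes zero, so the manifest is pushed once and never early.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counter is a minimal countdown built on sync/atomic; the API is hypothetical.
type Counter struct{ n int64 }

func NewCounter(n int) *Counter { return &Counter{n: int64(n)} }

// Decrease decrements the counter and reports whether this call reached zero.
func (c *Counter) Decrease() bool { return atomic.AddInt64(&c.n, -1) == 0 }

// syncImage simulates one goroutine per blob. The goroutine whose Decrease
// returns true stands in for the ManifestTask push. It returns how many
// pushes happened and how many blobs had finished when the push ran.
func syncImage(blobs []string) (pushes, blobsDoneAtPush int64) {
	if len(blobs) == 0 {
		return 1, 0 // nothing to wait for (hypothetical edge case)
	}
	counter := NewCounter(len(blobs))
	var done int64
	var wg sync.WaitGroup
	for _, digest := range blobs {
		wg.Add(1)
		go func(digest string) {
			defer wg.Done()
			_ = digest // a real BlobTask would copy this blob here
			atomic.AddInt64(&done, 1)
			if counter.Decrease() {
				atomic.AddInt64(&pushes, 1)
				atomic.StoreInt64(&blobsDoneAtPush, atomic.LoadInt64(&done))
			}
		}(digest)
	}
	wg.Wait()
	return pushes, blobsDoneAtPush
}

func main() {
	pushes, doneAtPush := syncImage([]string{"sha256:a", "sha256:b", "sha256:c"})
	fmt.Println(pushes, doneAtPush) // 1 3
}
```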

For multi-architecture images, the sub-manifests are written first, followed by the top-level parent index.
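Writing every sub-manifest before the index that references it is a post-order traversal of the manifest tree. In this sketch `Manifest` is a hypothetical type, and its `Name` field is a placeholder label standing in for a digest.

```go
package main

import "fmt"

// Manifest is a hypothetical node: an image manifest has no children, while a
// manifest list or OCI index references its per-platform sub-manifests.
type Manifest struct {
	Name     string // placeholder label; real manifests are addressed by digest
	Children []*Manifest
}

// pushOrder lists manifests so that every referenced manifest precedes the
// manifest that references it (a post-order traversal).
func pushOrder(m *Manifest) []string {
	var out []string
	for _, c := range m.Children {
		out = append(out, pushOrder(c)...)
	}
	return append(out, m.Name)
}

func main() {
	index := &Manifest{Name: "index", Children: []*Manifest{
		{Name: "linux/amd64"},
		{Name: "linux/arm64"},
	}}
	fmt.Println(pushOrder(index)) // [linux/amd64 linux/arm64 index]
}
```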

Manifest Formats and Media Types

image-syncer uses libraries from github.com/containers/image/v5 to handle several manifest formats and media types:

- Docker Schema V2 (Single and List): Supports application/vnd.docker.distribution.manifest.v2+json and application/vnd.docker.distribution.manifest.list.v2+json.
- OCI Manifests & Indices: Supports application/vnd.oci.image.manifest.v1+json and application/vnd.oci.image.index.v1+json.
- Docker Schema V1 (Deprecated): Supported for older images, with limitations. Schema V1 is a legacy format (it has no separate image configuration blob and embeds image history directly in the manifest), and some modern target registries (such as Harbor or cloud-native registry services) reject these uploads. image-syncer does not automatically convert Schema V1 manifests to Schema V2.
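The media type strings above are what a client matches on when deciding how to treat a manifest. The classifier below is a hypothetical sketch: containers/image defines its own constants and helpers for these types, and the local constant names here are not its API.

```go
package main

import "fmt"

// Media type strings for the formats listed above (local names, not library constants).
const (
	dockerSchema2    = "application/vnd.docker.distribution.manifest.v2+json"
	dockerList       = "application/vnd.docker.distribution.manifest.list.v2+json"
	ociManifest      = "application/vnd.oci.image.manifest.v1+json"
	ociIndex         = "application/vnd.oci.image.index.v1+json"
	dockerSchema1    = "application/vnd.docker.distribution.manifest.v1+json"
	dockerSchema1JWS = "application/vnd.docker.distribution.manifest.v1+prettyjws"
)

// kind returns "index" for multi-platform lists, "image" for single-platform
// manifests, "schema1" for the legacy format, and "unknown" otherwise.
func kind(mediaType string) string {
	switch mediaType {
	case dockerList, ociIndex:
		return "index"
	case dockerSchema2, ociManifest:
		return "image"
	case dockerSchema1, dockerSchema1JWS:
		return "schema1"
	default:
		return "unknown"
	}
}

func main() {
	for _, mt := range []string{ociIndex, dockerSchema2, dockerSchema1JWS, "text/plain"} {
		fmt.Println(kind(mt), mt)
	}
}
```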

Platform-Based Filtering Engine

Multi-architecture manifest lists and OCI indexes contain sub-manifests for various operating systems and architectures. Transferring every platform variant of an image can consume significant network bandwidth and storage.

You can limit which architectures are transferred by using the --os and --arch flags. During the URLTask stage, image-syncer inspects the root manifest structure:

1. It parses the platform entries in the source manifest list or OCI index.
2. It filters out sub-manifest entries that do not match the specified OS/architecture constraints (e.g., keeping only linux/amd64).
3. It schedules BlobTask operations only for the layers of the matching sub-manifests.
4. It rewrites the parent manifest list JSON to reference only the synchronized architectures, then pushes this updated index to the destination registry. Because its content has changed, the destination index has a different digest from the source index.
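The filtering steps can be sketched with encoding/json. The types cover only the OCI index / Docker manifest list fields the sketch needs, `filterIndex` and `matches` are hypothetical names, and entries without a `platform` field are dropped whenever a filter is active. Because the index is re-serialized, fields the structs do not model (such as annotations) are lost, and the resulting digest differs from the source index.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

type Platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Variant      string `json:"variant,omitempty"`
}

type Descriptor struct {
	MediaType string    `json:"mediaType"`
	Digest    string    `json:"digest"`
	Size      int64     `json:"size"`
	Platform  *Platform `json:"platform,omitempty"`
}

// Index models only the fields this sketch needs.
type Index struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType,omitempty"`
	Manifests     []Descriptor `json:"manifests"`
}

// allowed reports whether v passes a filter list; an empty list allows everything.
func allowed(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// matches drops platform-less entries unless no filter is set.
func matches(p *Platform, oses, arches []string) bool {
	if p == nil {
		return len(oses) == 0 && len(arches) == 0
	}
	return allowed(oses, p.OS) && allowed(arches, p.Architecture)
}

func digestOf(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// filterIndex returns the rewritten index plus the sub-manifests still to sync.
func filterIndex(raw []byte, oses, arches []string) ([]byte, []Descriptor, error) {
	var idx Index
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, nil, err
	}
	kept := make([]Descriptor, 0, len(idx.Manifests))
	for _, d := range idx.Manifests {
		if matches(d.Platform, oses, arches) {
			kept = append(kept, d)
		}
	}
	if len(kept) == 0 {
		return nil, nil, errors.New("no sub-manifest matches the platform filter")
	}
	idx.Manifests = kept
	out, err := json.Marshal(idx)
	return out, kept, err
}

func main() {
	// Placeholder digests ("sha256:aaa") stand in for real ones.
	raw := []byte(`{"schemaVersion":2,"manifests":[
{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:aaa","size":100,"platform":{"architecture":"amd64","os":"linux"}},
{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:bbb","size":100,"platform":{"architecture":"arm64","os":"linux","variant":"v8"}}]}`)
	out, kept, err := filterIndex(raw, []string{"linux"}, []string{"amd64"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(kept), kept[0].Digest)      // 1 sha256:aaa
	fmt.Println(digestOf(out) != digestOf(raw)) // true: the rewritten index has a new digest
}
```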

For details on setting these concurrency and platform-filtering options, see the Command-Line Interface Reference.