comicbox.process

source module comicbox.process

Parallel processing for large-scale comic metadata reading.

Classes

  • ReadResult Result of reading metadata from a single comic archive.

Functions

  • iter_process_files Yield (path, (ReadResult, exception_or_None)) as each file completes.

  • process_files Process multiple comic files in parallel via ProcessPoolExecutor.

  • aread_metadata Read metadata from a single comic file in a thread executor.

source class ReadResult()

Bases : TypedDict

Result of reading metadata from a single comic archive.

The envelope fields (metadata_mtime, page_count, file_type) are populated cheaply on every successful read and are the source of truth for archive-level state tracking. tags carries the parsed metadata payload and is None when extraction was skipped — either because the embedded metadata mtime hadn’t advanced past old_mtime or because the caller passed full_metadata=False.

Distinguishing “skip” from “extracted-but-empty” is the contract that lets callers preserve existing tag links when the archive’s tags haven’t changed; an empty {} would mean “extracted, no tags found”, which would force the caller to clear those links.
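The skip-versus-empty contract can be sketched from the caller's side as follows. `ReadResultSketch` and `update_tag_links` are hypothetical stand-ins, not part of comicbox; only the four field names come from the description above, and the field types are assumptions.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, TypedDict


class ReadResultSketch(TypedDict):
    """Hypothetical mirror of ReadResult; the field types are assumptions."""

    metadata_mtime: datetime | None
    page_count: int | None
    file_type: str | None
    tags: dict[str, Any] | None


def update_tag_links(
    existing: dict[str, Any], result: ReadResultSketch
) -> dict[str, Any]:
    """Return the tag links a caller should keep after a read."""
    tags = result["tags"]
    if tags is None:
        # Extraction was skipped: keep whatever links already exist.
        return existing
    # Extraction ran: its output wins, even when it is an empty dict,
    # which means "no tags found" and clears the existing links.
    return dict(tags)
```

The explicit `is None` check matters here: a truthiness test such as `if not tags` would treat `{}` the same as a skip and leave stale links in place.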

source iter_process_files(paths: Iterable[Path | str], config: ComicboxSettings | Mapping | None = None, logger: Any = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML, max_workers: int | None = None, old_mtime_map: Mapping[str, datetime.datetime] | None = None, worker_log_config: Mapping | None = None, *, full_metadata: bool = True) -> Generator[tuple[Path, tuple[ReadResult, BaseException | None]], None, None]

Yield (path, (ReadResult, exception_or_None)) as each file completes.

All per-path failures — submit-time, worker-raised, or pool-broken — are delivered as the second element of the tuple rather than raised, so a single bad path cannot abort the whole run. On failure the ReadResult is the empty sentinel (all fields None); inspect the exception, not the result, to detect failure.
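A consumer of this generator might separate successes from failures along these lines. This is a minimal sketch: `split_results` is a hypothetical helper that relies only on the tuple shape documented above, so it accepts any iterable of that shape.

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import Path
from typing import Any


def split_results(
    results: Iterable[tuple[Path, tuple[Any, BaseException | None]]],
) -> tuple[dict[Path, Any], dict[Path, BaseException]]:
    """Partition (path, (result, exc)) pairs by whether an exception is set."""
    succeeded: dict[Path, Any] = {}
    failed: dict[Path, BaseException] = {}
    for path, (result, exc) in results:
        if exc is not None:
            # Failure is signalled by the exception, not by the result,
            # which is the all-None sentinel in this case.
            failed[path] = exc
        else:
            succeeded[path] = result
    return succeeded, failed
```

With comicbox installed, the argument would presumably be `iter_process_files(paths)`; because failures arrive as values, the loop keeps running after a bad path.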

worker_log_config: optional dict of {“level”, “format”, “sink”} used to re-initialize loguru inside each worker so subprocess log output matches the caller’s format. The dict must be picklable; pass sink as “stdout”/“stderr”/path string rather than a file object.
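The picklability requirement can be checked before the config is handed to the pool. The sketch below uses illustrative values for the three keys (the level and format strings are assumptions, not defaults taken from comicbox), and `is_picklable` is a hypothetical helper.

```python
import os
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if obj survives pickle.dumps, as worker arguments must."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# Illustrative config: the sink is named by a string, so the dict pickles.
worker_log_config = {
    "level": "INFO",
    "format": "{time} | {level} | {message}",
    "sink": "stderr",
}
good_ok = is_picklable(worker_log_config)

# An open file object as the sink cannot be pickled.
with open(os.devnull, "w") as handle:
    bad_ok = is_picklable({**worker_log_config, "sink": handle})
```

Here `good_ok` is true and `bad_ok` is false, which is why the sink is passed by name or path rather than as an open handle.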

source process_files(paths: Iterable[Path | str], config: ComicboxSettings | Mapping | None = None, logger: Any = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML, max_workers: int | None = None, worker_log_config: Mapping | None = None) -> dict[Path, tuple[ReadResult, BaseException | None]]

Process multiple comic files in parallel via ProcessPoolExecutor.

source async aread_metadata(path: Path | str, config: ComicboxSettings | Mapping | None = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML) -> ReadResult

Read metadata from a single comic file in a thread executor.
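The general pattern of running a blocking read in a thread executor can be sketched as below. This is not comicbox's actual implementation: `blocking_read`, `aread_sketch`, and `read_many` are hypothetical names, and the returned dict is a placeholder for a real result.

```python
import asyncio
from pathlib import Path


def blocking_read(path: Path) -> dict:
    """Hypothetical stand-in for a blocking metadata read."""
    return {"path": str(path), "tags": None}


async def aread_sketch(path: Path) -> dict:
    """Run the blocking read in the loop's default thread executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_read, path)


async def read_many(paths: list) -> list:
    """Read several files concurrently; results keep the input order."""
    return await asyncio.gather(*(aread_sketch(Path(p)) for p in paths))
```

A caller would drive this with `asyncio.run(read_many(["a.cbz", "b.cbz"]))`. Offloading to a thread keeps the event loop responsive while archive I/O is in progress.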