comicbox.process
module comicbox.process
Parallel processing for large-scale comic metadata reading.
Classes
- ReadResult: Result of reading metadata from a single comic archive.
Functions
- iter_process_files: Yield (path, (ReadResult, exception_or_None)) as each file completes.
- process_files: Process multiple comic files in parallel via ProcessPoolExecutor.
- aread_metadata: Read metadata from a single comic file in a thread executor.
class ReadResult()
Bases: TypedDict
Result of reading metadata from a single comic archive.
The envelope fields (metadata_mtime, page_count, file_type) are populated cheaply on every successful read and are the source of truth for archive-level state tracking. The tags field carries the parsed metadata payload and is None when extraction was skipped, either because the embedded metadata mtime had not advanced past old_mtime or because the caller passed full_metadata=False.
Distinguishing “skipped” (tags is None) from “extracted but empty” (tags is {}) is the contract that lets callers preserve existing tag links when the archive’s tags have not changed. An empty {} means “extracted, no tags found”, which would force the caller to clear those links.
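To make the None-versus-{} contract concrete, here is a minimal sketch of caller-side logic. ReadResultSketch and plan_tag_update are hypothetical names, not part of comicbox, and the field types (a datetime mtime, an int page count, a str file type, a dict of tags) are assumptions inferred from the field names above.

```python
from __future__ import annotations

import datetime
from typing import Any, TypedDict


class ReadResultSketch(TypedDict):
    """Hypothetical mirror of ReadResult; the field types are assumptions."""

    metadata_mtime: datetime.datetime | None
    page_count: int | None
    file_type: str | None
    tags: dict[str, Any] | None


def plan_tag_update(result: ReadResultSketch) -> str:
    """Decide what a caller should do with its existing tag links."""
    tags = result["tags"]
    if tags is None:
        # Extraction skipped (mtime not newer, or full_metadata=False): keep links.
        return "keep"
    if not tags:
        # Extracted, but the archive carries no tags: clear links.
        return "clear"
    # Extracted with content: replace links with the new tags.
    return "replace"
```

Checking `is None` rather than truthiness first is what keeps the two cases apart, since both None and {} are falsy.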
iter_process_files(paths: Iterable[Path | str], config: ComicboxSettings | Mapping | None = None, logger: Any = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML, max_workers: int | None = None, old_mtime_map: Mapping[str, datetime.datetime] | None = None, worker_log_config: Mapping | None = None, *, full_metadata: bool = True) → Generator[tuple[Path, tuple[ReadResult, BaseException | None]], None, None]
Yield (path, (ReadResult, exception_or_None)) as each file completes.
All per-path failures (raised at submit time, raised inside the worker, or caused by a broken process pool) are delivered as the second element of the tuple rather than raised, so a single bad path cannot abort the whole run. On failure the ReadResult is the empty sentinel (all fields None), so inspect the exception, not the result, to detect failure.
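The delivery rule above can be sketched with the standard library's concurrent.futures. This illustrates the pattern under stated assumptions and is not comicbox's implementation: iter_outcomes, read_one, and the plain-dict EMPTY_RESULT sentinel are hypothetical, and the executor is passed in so the same code runs against a ProcessPoolExecutor (where a broken pool surfaces as BrokenProcessPool) or a ThreadPoolExecutor for quick testing.

```python
from __future__ import annotations

from collections.abc import Callable, Generator, Iterable
from concurrent.futures import Executor, Future, as_completed
from pathlib import Path
from typing import Any

# Hypothetical stand-in for the "empty sentinel": every field is None.
EMPTY_RESULT: dict[str, Any] = {
    "metadata_mtime": None,
    "page_count": None,
    "file_type": None,
    "tags": None,
}


def iter_outcomes(
    executor: Executor,
    read_one: Callable[[Path], dict[str, Any]],
    paths: Iterable[Path | str],
) -> Generator[tuple[Path, tuple[dict[str, Any], BaseException | None]], None, None]:
    """Yield (path, (result, exc)) in completion order; never raise per-path errors."""
    futures: dict[Future, Path] = {}
    for raw in paths:
        path = Path(raw)
        try:
            futures[executor.submit(read_one, path)] = path
        except Exception as exc:  # submit-time failure, e.g. pool already broken
            yield path, (dict(EMPTY_RESULT), exc)
    for future in as_completed(futures):
        path = futures[future]
        exc = future.exception()  # worker-raised error or BrokenProcessPool
        if exc is None:
            yield path, (future.result(), None)
        else:
            yield path, (dict(EMPTY_RESULT), exc)
```

With a ProcessPoolExecutor, read_one must be a module-level function so it can be pickled to the workers. A consumer checks exc first and only trusts the result when exc is None.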
worker_log_config: optional dict with the keys “level”, “format”, and “sink”, used to re-initialize loguru inside each worker so that subprocess log output matches the caller’s format. The dict must be picklable, so pass sink as “stdout”, “stderr”, or a path string rather than a file object.
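A sketch of building a picklable worker_log_config and resolving its sink back to a target inside a worker. resolve_sink and is_picklable are hypothetical helpers and the format string is only an example; the resolved value is the kind of sink loguru's logger.add(sink, level=..., format=...) accepts, but how comicbox wires this up internally is not shown here.

```python
from __future__ import annotations

import pickle
import sys
from typing import Any, TextIO

# Only plain strings, so the dict survives pickling to a worker process.
worker_log_config = {
    "level": "INFO",
    "format": "{time:HH:mm:ss} | {level} | {message}",
    "sink": "stderr",
}


def resolve_sink(sink: str) -> TextIO | str:
    """Hypothetical worker-side helper: turn the sink string into a target."""
    if sink == "stdout":
        return sys.stdout
    if sink == "stderr":
        return sys.stderr
    return sink  # anything else is treated as a log file path


def is_picklable(obj: Any) -> bool:
    """Return True if obj can be sent to a worker process via pickle."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True
```

An open file object fails the pickle check, which is why the sink travels as a string and is resolved on the worker side.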
process_files(paths: Iterable[Path | str], config: ComicboxSettings | Mapping | None = None, logger: Any = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML, max_workers: int | None = None, worker_log_config: Mapping | None = None) → dict[Path, tuple[ReadResult, BaseException | None]]
Process multiple comic files in parallel via ProcessPoolExecutor.
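Assuming the returned dict follows the same (ReadResult, exception_or_None) convention documented for iter_process_files, as the matching return type suggests, a caller might split it into successes and failures like this. split_outcomes is a hypothetical helper, and results are modeled as plain dicts.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def split_outcomes(
    results: dict[Path, tuple[dict[str, Any], BaseException | None]],
) -> tuple[dict[Path, dict[str, Any]], dict[Path, BaseException]]:
    """Partition process_files-style output into successes and failures."""
    ok: dict[Path, dict[str, Any]] = {}
    failed: dict[Path, BaseException] = {}
    for path, (result, exc) in results.items():
        if exc is None:
            ok[path] = result
        else:
            # Decide on the exception, not the result: failures carry the empty sentinel.
            failed[path] = exc
    return ok, failed
```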
async aread_metadata(path: Path | str, config: ComicboxSettings | Mapping | None = None, fmt: MetadataFormats = MetadataFormats.COMICBOX_YAML) → ReadResult
Read metadata from a single comic file in a thread executor.
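Reading “in a thread executor” means handing the blocking read to a worker thread so the event loop stays free. The sketch below shows that general asyncio pattern with loop.run_in_executor; read_metadata_blocking, aread, and aread_many are hypothetical stand-ins rather than comicbox functions, and the reader returns placeholder values instead of opening an archive.

```python
from __future__ import annotations

import asyncio
import functools
from pathlib import Path
from typing import Any


def read_metadata_blocking(path: Path, *, with_tags: bool = True) -> dict[str, Any]:
    """Hypothetical synchronous reader standing in for real archive I/O."""
    return {
        "metadata_mtime": None,
        "page_count": 0,
        "file_type": path.suffix.lstrip(".") or None,
        "tags": {} if with_tags else None,
    }


async def aread(path: Path | str, *, with_tags: bool = True) -> dict[str, Any]:
    """Run the blocking reader on the event loop's default thread pool."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional args, so bind keywords with partial.
    call = functools.partial(read_metadata_blocking, Path(path), with_tags=with_tags)
    return await loop.run_in_executor(None, call)


async def aread_many(paths: list[Path | str]) -> list[dict[str, Any]]:
    """Read several files concurrently; results keep the input order."""
    return await asyncio.gather(*(aread(p) for p in paths))
```

asyncio.to_thread (Python 3.9+) is an equivalent shorthand that also accepts keyword arguments directly.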