Note

    0002_capture-archive-storage-design

    Document Metadata

    • title: 0002 - Capture Archive And Storage Design
    • description: Design daily Chronicle frame and written-context preservation without keeping the full archive locally.
    • status: completed
    • lastUpdated: "2026-06-04 14:27 ET (America/New_York)"
    • owner: Product/Engineering
    • priority: high
    • projectType: child
    • parentProject: 0001_public-site-context-layer-setup
    • programTrack: capture-archive

    0002 - Capture Archive And Storage Design

    Goals

    • Preserve Chronicle screen recordings every day, before the rolling buffer under $TMPDIR drops them.
    • Preserve written context alongside visual context: Chronicle summaries, memory registry files, raw memory notes, and rollout summaries.
    • Avoid retaining the full image archive on the local machine.
    • Keep raw captures private while enabling a later redacted public dataset.
    • Separate image/blob storage from metadata/search indexing.

    Recommended Architecture

    • Rolling local spool: keep only a short local buffer, such as 1-3 days, for recovery and processing.
    • Private object storage: upload raw daily/hourly bundles to Backblaze B2, Cloudflare R2, S3, or Google Cloud Storage.
    • Written context archive: store ~/.codex/memories excluding its internal .git, so images stay linked to summaries, memory registry entries, and rollout summaries.
    • Derived public storage: store only reviewed redacted derivatives in a separate public bucket or CDN path.
    • Disposable thumbnails: regenerate thumbnails from the active local spool or redacted derivatives; do not treat thumbnails as canonical.

    Implemented v1 Storage

    • Provider: AWS S3.
    • Bucket: chronicle-visualizer-raw-private-528049652889-us-east-1.
    • Region/profile: us-east-1, AWS profile default.
    • AWS account guardrail: uploads refuse accounts other than 528049652889.
    • Client encryption: GPG symmetric AES256 before upload.
    • Local passphrase: ~/.config/chronicle-visualizer/archive-passphrase, 0600.
    • Local emergency root: /Users/maggielerman/ChronicleArchiveEmergency.
    • Retention: 72 hours for local raw/delta artifacts after successful upload receipts; logs and receipts default to 30 days.
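
    The account guardrail and client-side encryption step above can be sketched as follows. This is a minimal illustration in Python, not the project's Node scripts: the function names are hypothetical, the account ID is the one recorded in this note, and the caller is assumed to obtain the active account ID from STS GetCallerIdentity. The gpg command line is only built here, not executed.

```python
from pathlib import Path

EXPECTED_ACCOUNT = "528049652889"  # account ID recorded in this note


def assert_expected_account(actual_account: str, expected: str = EXPECTED_ACCOUNT) -> None:
    """Refuse to continue when credentials resolve to a different AWS account.

    Assumption: actual_account comes from STS GetCallerIdentity in a real run.
    """
    if actual_account != expected:
        raise PermissionError(
            f"refusing upload: account {actual_account} is not {expected}"
        )


def gpg_encrypt_argv(source: Path, passphrase_file: Path) -> list[str]:
    """Build a gpg command line for symmetric AES256 encryption of one file."""
    return [
        "gpg", "--batch", "--yes",
        "--pinentry-mode", "loopback",          # allow non-interactive passphrase input
        "--passphrase-file", str(passphrase_file),
        "--symmetric", "--cipher-algo", "AES256",
        "--output", f"{source}.gpg",            # ciphertext lands next to the source
        str(source),
    ]
```

    Running the guardrail before building any upload command keeps a misconfigured AWS profile from writing to the wrong account.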

    Storage Guidance

    Google Drive is acceptable as a personal backup target, but it is not ideal as the primary archive for high-volume daily screenshot files. Prefer object storage for the raw archive, with daily or hourly bundles to avoid millions of tiny files.

    Private git repositories are also technically possible, but they are not recommended for this project. Raw captures would accumulate as large binary history that is hard to purge completely, easy to push into public history by accident later, and awkward to host without Git LFS. Use a separate private archive target instead.

    Suggested layout:

    chronicle-raw-private/
      frames/YYYY/MM/DD/display-1/hour-16.tar.zst
      ocr/YYYY/MM/DD/display-1/segment.ocr.jsonl.zst
      written-context/YYYY/MM/DD/written-context.tar.zst
    
    chronicle-public-redacted/
      frames/YYYY/MM/DD/display-1/frame-2026-06-04T16-19-00Z.jpg
      manifests/YYYY/MM/DD/public-manifest.json
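
    A small key builder makes the suggested layout concrete. This is a sketch under assumptions: the function names are hypothetical, and it assumes the date and hour segments are in UTC, which matches the Z suffix in the public frame name above but is not stated elsewhere in this note.

```python
from datetime import datetime, timezone


def frame_bundle_key(captured_at: datetime, display: int) -> str:
    """Hourly raw-frame bundle key, relative to the private bucket root."""
    t = captured_at.astimezone(timezone.utc)  # naive datetimes are treated as local time
    return f"frames/{t:%Y/%m/%d}/display-{display}/hour-{t:%H}.tar.zst"


def written_context_key(captured_at: datetime) -> str:
    """Daily written-context bundle key, relative to the private bucket root."""
    t = captured_at.astimezone(timezone.utc)
    return f"written-context/{t:%Y/%m/%d}/written-context.tar.zst"


def public_frame_key(captured_at: datetime, display: int) -> str:
    """Redacted single-frame key, relative to the public bucket root."""
    t = captured_at.astimezone(timezone.utc)
    return (
        f"frames/{t:%Y/%m/%d}/display-{display}/"
        f"frame-{t:%Y-%m-%dT%H-%M-%SZ}.jpg"
    )
```

    Keeping the bucket name out of these helpers means the same relative key can never be resolved against both the private and the public root by accident.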
    

    Scope

    In Scope

    • Daily capture strategy.
    • Private raw archive model.
    • Public derivative boundary.
    • Metadata/index design.
    • Retention and deletion considerations.

    Out Of Scope

    • Publishing any raw captures.
    • Public redaction workflow implementation.
    • Long-term cold-storage lifecycle tuning, deferred until several days of real archive volume exist.
    • Metadata database implementation beyond upload receipts and manifests.

    Success Criteria

    • A first archival script can copy or bundle frames from $TMPDIR/chronicle/screen_recording/ before rollover.
    • Raw archive paths never overlap with public derivative paths.
    • Written-context snapshots are archived with the same cadence as frames.
    • The metadata index can answer queries by date, project, source, written-context snapshot, redaction status, and publication status.
    • Local disk use can be bounded by a configurable retention window.
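
    The metadata-index criterion could be met by a small relational table kept separate from blob storage. The sketch below uses SQLite from Python's standard library; the table name, column names, and status values are all hypothetical, since this note does not define an index schema.

```python
import sqlite3

# Hypothetical schema: one row per frame, blobs stay in object storage.
SCHEMA = """
CREATE TABLE frames (
    frame_key TEXT PRIMARY KEY,
    captured_at TEXT NOT NULL,            -- ISO 8601 UTC timestamp
    project TEXT,
    source TEXT,
    written_context_snapshot TEXT,
    redaction_status TEXT NOT NULL DEFAULT 'unreviewed',
    publication_status TEXT NOT NULL DEFAULT 'private'
);
CREATE INDEX frames_by_date ON frames (captured_at);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create a fresh index database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def publishable_frames(conn: sqlite3.Connection, day: str) -> list[str]:
    """Frame keys captured on `day` (YYYY-MM-DD) that are approved but still private."""
    rows = conn.execute(
        "SELECT frame_key FROM frames "
        "WHERE substr(captured_at, 1, 10) = ? "
        "AND redaction_status = 'approved' "
        "AND publication_status = 'private' "
        "ORDER BY captured_at",
        (day,),
    )
    return [row[0] for row in rows]
```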

    Checkpoint Log

    Checkpoint 01 - 2026-06-04 12:42 ET (America/New_York)

    Completed Since Prior Checkpoint

    • Documented object-storage-first approach for large daily capture preservation.
    • Rejected database-as-blob-store as the primary image storage model.
    • Preserved database/index usage for metadata and search.

    Next Checkpoint Targets

    • Choose provider shortlist and cost model.
    • Prototype a dry-run archive command that bundles one day/hour without uploading.
    • Define retention policy for local spool and private raw bucket.

    Checkpoint 02 - 2026-06-04 13:14 ET (America/New_York)

    Completed Since Prior Checkpoint

    • Confirmed the rolling frame buffer had already dropped the 6-7am ET screenshots.
    • Added scripts/archive/chronicle-snapshot.mjs for compressed emergency raw snapshots.
    • Added scripts/archive/chronicle-incremental.mjs for recurring preservation of new/changed Chronicle files.
    • Added scripts/archive/install-launch-agent.mjs and installed LaunchAgent com.maggielerman.chronicle-visualizer.archive.
    • Preserved the current buffer to /Users/maggielerman/ChronicleArchiveEmergency.

    Evidence

    • Compressed snapshot: /Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-12-05Z-screen-recording.tar.zst
    • Snapshot manifest: /Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-12-05Z-screen-recording.manifest.json
    • Incremental latest state: /Users/maggielerman/ChronicleArchiveEmergency/latest-incremental.json
    • LaunchAgent plist: /Users/maggielerman/Library/LaunchAgents/com.maggielerman.chronicle-visualizer.archive.plist
    • Runbook: DOCS/development/archive-runbook.md

    Current State

    • The Google Drive CloudStorage path timed out during archive directory creation, so it is not the active archive destination.
    • Emergency local archive is active and currently scheduled every 10 minutes.
    • The emergency archive should be moved to durable private object storage as soon as a provider is selected.

    Next Checkpoint Targets

    • Choose remote archive destination and wire the incremental archive to it.
    • Add retention cleanup for local emergency archive after remote sync is verified.
    • Add a restore/verify command that lists bundle contents and validates manifest counts.
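
    The restore/verify target could start from a check like the sketch below. It assumes a hypothetical manifest shape (a `files` list whose entries carry a `path`), because the real manifest format is not shown in this note. It reads an uncompressed or gzip tar with Python's standard tarfile module, so a .tar.zst bundle would need to be decompressed first.

```python
import io
import tarfile


def verify_bundle(tar_bytes: bytes, manifest: dict) -> dict:
    """Compare regular files in a tar archive with the paths a manifest lists."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        bundled = {member.name for member in tar.getmembers() if member.isfile()}
    expected = {entry["path"] for entry in manifest["files"]}
    return {
        "bundled_count": len(bundled),
        "manifest_count": len(expected),
        "missing": sorted(expected - bundled),      # listed but not in the bundle
        "unexpected": sorted(bundled - expected),   # in the bundle but not listed
        "ok": bundled == expected,
    }
```

    Reporting both directions of mismatch separates a truncated bundle from a stale manifest.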

    Checkpoint 03 - 2026-06-04 13:22 ET (America/New_York)

    Completed Since Prior Checkpoint

    • Expanded archive scripts from screen frames plus Chronicle resources to screen frames plus full written context under ~/.codex/memories.
    • Preserved 376 written-context files, excluding .git, via incremental archive.
    • Created a compressed written-context bundle alongside the fresh screen-recording bundle.

    Evidence

    • Written-context bundle: /Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-written-context.tar.zst
    • Written-context manifest: /Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-written-context.manifest.json
    • Screen bundle from same snapshot: /Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-screen-recording.tar.zst
    • Incremental written context root: /Users/maggielerman/ChronicleArchiveEmergency/incremental/written-context/

    Next Checkpoint Targets

    • Add restore verification for both screen and written-context bundles.
    • Define a storage manifest that links frames to written context snapshots by timestamp.
    • Select remote storage and sync the emergency archive off-machine.
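
    One way to link frames to written-context snapshots by timestamp is to pick the newest snapshot taken at or before each frame. The sketch below is illustrative only and assumes every timestamp uses the same fixed-width UTC form seen in the bundle names in this log (for example 2026-06-04T17-22-26Z), so that string order matches time order. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def latest_snapshot_at_or_before(snapshot_times: list[str], frame_time: str) -> Optional[str]:
    """Return the newest snapshot timestamp <= frame_time, or None if there is none.

    Assumes snapshot_times is sorted ascending.
    """
    i = bisect_right(snapshot_times, frame_time)
    return snapshot_times[i - 1] if i else None
```

    A frame captured before the first snapshot gets no link rather than a misleading one from a later snapshot.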

    Checkpoint 04 - 2026-06-04 14:05 ET (America/New_York)

    Completed Since Prior Checkpoint

    • Added S3 setup and verification tooling with account guardrail for AWS account 528049652889.
    • Created bucket chronicle-visualizer-raw-private-528049652889-us-east-1 in us-east-1.
    • Applied S3 block-public-access, versioning, and default SSE AES256.
    • Added client-side GPG AES256 encryption before upload for bundles and manifests.
    • Generated local passphrase file at ~/.config/chronicle-visualizer/archive-passphrase with 0600 permissions.
    • Extended the 10-minute LaunchAgent flow so incremental runs copy locally, bundle copied files, encrypt the delta, upload to S3, and write receipts.
    • Added remote verify, restore dry-run, and local prune dry-run tooling.
    • Corrected written-context snapshot bundling to archive from the manifest file list so ~/.codex/memories/.git is excluded from written-context bundles.
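
    The .git exclusion fix in the last item can be illustrated by building the file list first and bundling only what the list names. This is a sketch of the idea in Python with a hypothetical function name; the actual logic lives in the scripts/archive tooling, which is not reproduced here.

```python
from pathlib import Path


def written_context_file_list(root: Path) -> list[str]:
    """List files under root as relative POSIX paths, skipping any .git directory."""
    files = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # drop the repository internals at any depth
        if path.is_file():
            files.append(rel.as_posix())
    return files
```

    Because the bundle is created from this list rather than from the directory itself, the manifest and the archive cannot disagree about what was excluded.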

    Evidence

    • S3 setup receipt: /Users/maggielerman/ChronicleArchiveEmergency/latest-s3-setup.json
    • Latest incremental state: /Users/maggielerman/ChronicleArchiveEmergency/latest-incremental.json
    • Latest snapshot upload receipt: /Users/maggielerman/ChronicleArchiveEmergency/latest-snapshot-upload.json
    • Corrected written-context object: raw/snapshots/2026/06/04/2026-06-04T18-03-41Z-written-context.tar.zst.gpg
    • Screen snapshot object: raw/snapshots/2026/06/04/2026-06-04T17-59-36Z-screen-recording.tar.zst.gpg
    • Restore dry-run confirmed MEMORY.md, memory_summary.md, rollout summaries, Chronicle resources, and .git exclusion.
    • LaunchAgent: /Users/maggielerman/Library/LaunchAgents/com.maggielerman.chronicle-visualizer.archive.plist

    Current State

    • Durable private S3 storage is active.
    • Scheduled 10-minute incremental uploads are active.
    • Remote verify passes with no missing referenced keys.
    • Prune dry-run shows no local files eligible yet because the 72-hour retention window has not elapsed.
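
    The prune dry-run rule can be sketched as a pure function. The sketch assumes the 72-hour window is measured from the time of the upload receipt, which this note implies but does not state outright; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RAW_RETENTION = timedelta(hours=72)  # local raw/delta retention from this note


def prune_candidates(
    local_paths: list[str],
    receipt_times: dict[str, datetime],
    now: datetime,
) -> list[str]:
    """Return local paths with an upload receipt at least RAW_RETENTION old.

    Paths without a receipt are never eligible, whatever their age.
    """
    return sorted(
        path
        for path in local_paths
        if path in receipt_times and now - receipt_times[path] >= RAW_RETENTION
    )
```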

    Next Checkpoint Targets

    • Add lifecycle/cold-storage policy after several successful daily uploads.
    • Add redaction review pipeline before any public dataset or public visual timeline.
    • Add a metadata index that links frames, OCR sidecars, written-context snapshots, inferred project tags, and redaction/publication status.

    Checkpoint 05 - 2026-06-04 14:27 ET (America/New_York)

    Completed Since Prior Checkpoint

    • Confirmed AWS storage setup against the live bucket.
    • Confirmed the bucket resolves to AWS account 528049652889.
    • Confirmed S3 public access block, versioning, and default SSE AES256 remain enabled.
    • Ran a fresh incremental archive pass at 2026-06-04T18:12:19Z.
    • Confirmed the latest incremental delta uploaded 36 new/changed screen files and 1 new written-context Chronicle memory summary.
    • Confirmed remote verification passes with no missing referenced keys.
    • Confirmed the 10-minute LaunchAgent is active with last exit code 0.
    • Accepted the v1 storage setup as complete.

    Evidence

    • Latest delta object: raw/deltas/2026/06/04/2026-06-04T18-12-19Z-delta.tar.zst.gpg
    • Latest delta receipt: receipts/2026/06/04/2026-06-04T18-12-19Z-upload-receipt.json
    • Full screen snapshot object: raw/snapshots/2026/06/04/2026-06-04T17-59-36Z-screen-recording.tar.zst.gpg
    • Corrected written-context snapshot object: raw/snapshots/2026/06/04/2026-06-04T18-03-41Z-written-context.tar.zst.gpg
    • Current remote verification: 12 encrypted raw objects, 6 receipts, 0 missing referenced keys.
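
    The "missing referenced keys" figure above is a set difference between the keys that receipts reference and the keys the bucket listing returns. A minimal sketch follows; the receipt field name `uploadedKeys` is an assumption, and the real receipt schema is not shown in this note.

```python
def missing_referenced_keys(receipts: list[dict], remote_keys: set[str]) -> list[str]:
    """Return object keys that receipts reference but the bucket listing lacks."""
    referenced = {
        key
        for receipt in receipts
        for key in receipt["uploadedKeys"]  # hypothetical field name
    }
    return sorted(referenced - remote_keys)
```

    An empty result corresponds to the "0 missing referenced keys" state reported in this checkpoint.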

    Completion State

    • Project 0002 is complete for v1 durable private storage.
    • Follow-up lifecycle/cost controls move to backlog project 0005.
    • Public redaction and publication safety remain tracked by project 0003.

    Risks

    • Object storage costs can grow quickly if every frame is retained at full resolution.
    • Daily bundles improve sync performance, but retrieving a single frame then requires an index and an extraction path.
    • Cloud backup does not solve publication safety; redaction remains a separate gate.
    • The local emergency archive can grow quickly if remote uploads stall, because local files are pruned only after successful upload receipts.

    Open Questions

    • None for v1 durable private storage completion.

    MAGGIE TODO

    • None for this completed project.
