0002_capture-archive-storage-design
Document Metadata
- title: 0002 - Capture Archive And Storage Design
- description: Design daily Chronicle frame and written-context preservation without keeping the full archive locally.
- status: completed
- lastUpdated: "2026-06-04 14:27 ET (America/New_York)"
- owner: Product/Engineering
- priority: high
- projectType: child
- parentProject: 0001_public-site-context-layer-setup
- programTrack: capture-archive
0002 - Capture Archive And Storage Design
Goals
- Preserve Chronicle screen recordings every day before `$TMPDIR` rolls off.
- Preserve written context alongside visual context: Chronicle summaries, memory registry files, raw memory notes, and rollout summaries.
- Avoid retaining the full image archive on the local machine.
- Keep raw captures private while enabling a later redacted public dataset.
- Separate image/blob storage from metadata/search indexing.
Recommended Architecture
- Rolling local spool: keep only a short local buffer, such as 1-3 days, for recovery and processing.
- Private object storage: upload raw daily/hourly bundles to Backblaze B2, Cloudflare R2, S3, or Google Cloud Storage.
- Written context archive: store `~/.codex/memories` excluding its internal `.git`, so images stay linked to summaries, memory registry entries, and rollout summaries.
- Derived public storage: store only reviewed redacted derivatives in a separate public bucket or CDN path.
- Disposable thumbnails: regenerate thumbnails from the active local spool or redacted derivatives; do not treat thumbnails as canonical.
Implemented v1 Storage
- Provider: AWS S3.
- Bucket: `chronicle-visualizer-raw-private-528049652889-us-east-1`.
- Region/profile: `us-east-1`, AWS profile `default`.
- AWS account guardrail: uploads refuse accounts other than `528049652889`.
- Client encryption: GPG symmetric AES256 before upload.
- Local passphrase: `~/.config/chronicle-visualizer/archive-passphrase`, `0600`.
- Local emergency root: `/Users/maggielerman/ChronicleArchiveEmergency`.
- Retention: 72 hours for local raw/delta artifacts after successful upload receipts; logs and receipts default to 30 days.
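The account guardrail can be pictured as a check that runs before any upload. The sketch below is a hypothetical illustration, not the project's code: `assertExpectedAccount` and its argument names are assumptions, and the caller is assumed to have already resolved the active account ID (for example through AWS STS `GetCallerIdentity`).

```javascript
// Hypothetical guardrail: refuse to proceed unless the resolved AWS account
// matches the single account this archive is allowed to write to.
const EXPECTED_ACCOUNT_ID = "528049652889"; // account named in the v1 storage settings

function assertExpectedAccount(actualAccountId, expected = EXPECTED_ACCOUNT_ID) {
  // Compare as strings so a numeric-looking ID is never coerced or truncated.
  if (String(actualAccountId) !== expected) {
    throw new Error(
      `Refusing upload: account ${actualAccountId} is not ${expected}`
    );
  }
  return true;
}
```

Failing closed with a thrown error means a misconfigured profile stops the run before anything is bundled or uploaded.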
Storage Guidance
Google Drive is acceptable as a personal backup target, but it is not ideal as the primary archive for high-volume daily screenshot files. Prefer object storage for the raw archive, with daily or hourly bundles to avoid millions of tiny files.
Private git repositories are also technically possible, but they are not recommended for this project: raw captures would create large binary history that is hard to purge completely, easy to accidentally push into public history later, and awkward to host without Git LFS. Use a separate private archive target instead.
Suggested layout:
chronicle-raw-private/
  frames/YYYY/MM/DD/display-1/hour-16.tar.zst
  ocr/YYYY/MM/DD/display-1/segment.ocr.jsonl.zst
  written-context/YYYY/MM/DD/written-context.tar.zst
chronicle-public-redacted/
  frames/YYYY/MM/DD/display-1/frame-2026-06-04T16-19-00Z.jpg
  manifests/YYYY/MM/DD/public-manifest.json
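The suggested layout maps naturally onto a small key-building helper. The following is a minimal sketch under stated assumptions: `frameBundleKey` and `pad2` are hypothetical names, and the date parts are taken in UTC, which matches the `Z`-suffixed frame name in the layout but is not confirmed as the project's rule.

```javascript
// Hypothetical helper: build the private frame-bundle key for one display-hour,
// following the pattern frames/YYYY/MM/DD/display-N/hour-HH.tar.zst.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function frameBundleKey(date, display) {
  // Assumption: use UTC fields so the key does not depend on the local time zone.
  const yyyy = date.getUTCFullYear();
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const dd = pad2(date.getUTCDate());
  const hh = pad2(date.getUTCHours());
  return `frames/${yyyy}/${mm}/${dd}/display-${display}/hour-${hh}.tar.zst`;
}

console.log(frameBundleKey(new Date("2026-06-04T16:19:00Z"), 1));
// frames/2026/06/04/display-1/hour-16.tar.zst
```

Deriving keys from one function keeps hourly bundles sortable by prefix and avoids hand-assembled paths drifting between scripts.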
Scope
In Scope
- Daily capture strategy.
- Private raw archive model.
- Public derivative boundary.
- Metadata/index design.
- Retention and deletion considerations.
Out Of Scope
- Publishing any raw captures.
- Public redaction workflow implementation.
- Long-term cold-storage lifecycle tuning after several days of real archive volume.
- Metadata database implementation beyond upload receipts and manifests.
Success Criteria
- A first archival script can copy or bundle frames from `$TMPDIR/chronicle/screen_recording/` before rollover.
- Raw archive paths never overlap with public derivative paths.
- Written-context snapshots are archived with the same cadence as frames.
- Metadata index can answer date, project, source, written-context snapshot, redaction status, and publication status queries.
- Local disk use can be bounded by a configurable retention window.
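The "never overlap" criterion can be checked mechanically. This is a sketch only, assuming raw and public locations are expressed as slash-separated prefixes; `normalizePrefix`, `prefixesOverlap`, and `assertDisjoint` are hypothetical names, not functions from the project's scripts.

```javascript
// Hypothetical check: two storage prefixes overlap when they are equal or one
// is nested inside the other.
function normalizePrefix(prefix) {
  // Strip leading/trailing slashes, then add exactly one trailing slash so
  // that "raw" does not falsely match "raw-public". An empty prefix means the
  // storage root, which contains everything.
  const trimmed = prefix.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "" : trimmed + "/";
}

function prefixesOverlap(a, b) {
  const na = normalizePrefix(a);
  const nb = normalizePrefix(b);
  return na.startsWith(nb) || nb.startsWith(na);
}

function assertDisjoint(rawPrefix, publicPrefix) {
  if (prefixesOverlap(rawPrefix, publicPrefix)) {
    throw new Error(
      `Raw prefix "${rawPrefix}" overlaps public prefix "${publicPrefix}"`
    );
  }
}
```

Running such a check at configuration load time, rather than at upload time, would surface a bad prefix before any raw file is written.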
Checkpoint Log
Checkpoint 01 - 2026-06-04 12:42 ET (America/New_York)
Completed Since Prior Checkpoint
- Documented object-storage-first approach for large daily capture preservation.
- Rejected database-as-blob-store as the primary image storage model.
- Preserved database/index usage for metadata and search.
Next Checkpoint Targets
- Choose provider shortlist and cost model.
- Prototype a dry-run archive command that bundles one day/hour without uploading.
- Define retention policy for local spool and private raw bucket.
Checkpoint 02 - 2026-06-04 13:14 ET (America/New_York)
Completed Since Prior Checkpoint
- Confirmed the rolling frame buffer had already dropped the 6-7am ET screenshots.
- Added `scripts/archive/chronicle-snapshot.mjs` for compressed emergency raw snapshots.
- Added `scripts/archive/chronicle-incremental.mjs` for recurring preservation of new/changed Chronicle files.
- Added `scripts/archive/install-launch-agent.mjs` and installed LaunchAgent `com.maggielerman.chronicle-visualizer.archive`.
- Preserved the current buffer to `/Users/maggielerman/ChronicleArchiveEmergency`.
Evidence
- Compressed snapshot: `/Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-12-05Z-screen-recording.tar.zst`
- Snapshot manifest: `/Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-12-05Z-screen-recording.manifest.json`
- Incremental latest state: `/Users/maggielerman/ChronicleArchiveEmergency/latest-incremental.json`
- LaunchAgent plist: `/Users/maggielerman/Library/LaunchAgents/com.maggielerman.chronicle-visualizer.archive.plist`
- Runbook: `DOCS/development/archive-runbook.md`
Current State
- The Google Drive CloudStorage path timed out during archive directory creation, so it is not the active archive destination.
- Emergency local archive is active and currently scheduled every 10 minutes.
- The emergency archive should be moved to durable private object storage as soon as a provider is selected.
Next Checkpoint Targets
- Choose remote archive destination and wire the incremental archive to it.
- Add retention cleanup for local emergency archive after remote sync is verified.
- Add a restore/verify command that lists bundle contents and validates manifest counts.
Checkpoint 03 - 2026-06-04 13:22 ET (America/New_York)
Completed Since Prior Checkpoint
- Expanded archive scripts from screen frames plus Chronicle resources to screen frames plus full written context under `~/.codex/memories`.
- Preserved 376 written-context files, excluding `.git`, via incremental archive.
- Created a compressed `written-context` bundle alongside the fresh screen-recording bundle.
Evidence
- Written-context bundle: `/Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-written-context.tar.zst`
- Written-context manifest: `/Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-written-context.manifest.json`
- Screen bundle from same snapshot: `/Users/maggielerman/ChronicleArchiveEmergency/raw/2026/06/04/2026-06-04T17-22-26Z-screen-recording.tar.zst`
- Incremental written context root: `/Users/maggielerman/ChronicleArchiveEmergency/incremental/written-context/`
Next Checkpoint Targets
- Add restore verification for both screen and written-context bundles.
- Define a storage manifest that links frames to written context snapshots by timestamp.
- Select remote storage and sync the emergency archive off-machine.
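One way to link frames to written-context snapshots by timestamp is to pair each frame with the most recent snapshot taken at or before it. The document leaves the manifest definition as a target, so the rule and the name `latestSnapshotAtOrBefore` below are assumptions made for illustration.

```javascript
// Hypothetical linker: for a frame timestamp, pick the most recent
// written-context snapshot taken at or before that frame.
function latestSnapshotAtOrBefore(frameIso, snapshotIsos) {
  const frameMs = Date.parse(frameIso);
  let best = null;
  let bestMs = -Infinity;
  for (const iso of snapshotIsos) {
    const ms = Date.parse(iso);
    // Skip unparseable timestamps and snapshots taken after the frame.
    if (Number.isNaN(ms) || ms > frameMs) continue;
    if (ms > bestMs) {
      best = iso;
      bestMs = ms;
    }
  }
  return best; // null when no snapshot precedes the frame
}
```

Returning `null` instead of the nearest later snapshot keeps the link honest: a frame is never associated with context that did not yet exist when it was captured.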
Checkpoint 04 - 2026-06-04 14:05 ET (America/New_York)
Completed Since Prior Checkpoint
- Added S3 setup and verification tooling with account guardrail for AWS account `528049652889`.
- Created bucket `chronicle-visualizer-raw-private-528049652889-us-east-1` in `us-east-1`.
- Applied S3 block-public-access, versioning, and default SSE AES256.
- Added client-side GPG AES256 encryption before upload for bundles and manifests.
- Generated local passphrase file at `~/.config/chronicle-visualizer/archive-passphrase` with `0600` permissions.
- Extended the 10-minute LaunchAgent flow so incremental runs copy locally, bundle copied files, encrypt the delta, upload to S3, and write receipts.
- Added remote verify, restore dry-run, and local prune dry-run tooling.
- Corrected written-context snapshot bundling to archive from the manifest file list so `~/.codex/memories/.git` is excluded from written-context bundles.
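Bundling from an explicit file list, rather than from the directory root, is what keeps `.git` out of the bundle. A minimal sketch of such a filter follows; `excludeGitEntries` is a hypothetical name, and the paths are assumed to be relative and slash-separated.

```javascript
// Hypothetical filter: keep only manifest entries that are not inside a .git
// directory, so the bundle is built from an explicit, reviewed file list.
function excludeGitEntries(relativePaths) {
  return relativePaths.filter((p) => {
    // Match whole path segments so files such as ".gitignore" are kept.
    const segments = p.split("/");
    return !segments.includes(".git");
  });
}
```

Matching on path segments instead of a substring avoids dropping unrelated files whose names merely begin with `.git`.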
Evidence
- S3 setup receipt: `/Users/maggielerman/ChronicleArchiveEmergency/latest-s3-setup.json`
- Latest incremental state: `/Users/maggielerman/ChronicleArchiveEmergency/latest-incremental.json`
- Latest snapshot upload receipt: `/Users/maggielerman/ChronicleArchiveEmergency/latest-snapshot-upload.json`
- Corrected written-context object: `raw/snapshots/2026/06/04/2026-06-04T18-03-41Z-written-context.tar.zst.gpg`
- Screen snapshot object: `raw/snapshots/2026/06/04/2026-06-04T17-59-36Z-screen-recording.tar.zst.gpg`
- Restore dry-run confirmed `MEMORY.md`, `memory_summary.md`, rollout summaries, Chronicle resources, and `.git` exclusion.
- LaunchAgent: `/Users/maggielerman/Library/LaunchAgents/com.maggielerman.chronicle-visualizer.archive.plist`
Current State
- Durable private S3 storage is active.
- Scheduled 10-minute incremental uploads are active.
- Remote verify passes with no missing referenced keys.
- Prune dry-run shows no local files eligible yet because the 72-hour retention window has not elapsed.
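The prune rule behind that dry-run result can be sketched as a pure predicate. This is an illustration under assumptions: `isPrunable` and the `uploadedAt` field are hypothetical, and the only facts taken from this document are the 72-hour window and the requirement of a successful upload receipt.

```javascript
// Hypothetical prune rule: a local artifact is eligible for deletion only when
// it has an upload receipt and the retention window has passed since upload.
const RETENTION_HOURS = 72; // v1 retention for local raw/delta artifacts

function isPrunable(artifact, nowMs, retentionHours = RETENTION_HOURS) {
  // No receipt timestamp means the upload was never confirmed: keep the file.
  if (!artifact.uploadedAt) return false;
  const uploadedMs = Date.parse(artifact.uploadedAt);
  if (Number.isNaN(uploadedMs)) return false;
  const ageHours = (nowMs - uploadedMs) / (60 * 60 * 1000);
  return ageHours >= retentionHours;
}
```

Passing the current time in as an argument makes the rule easy to exercise in a dry run without touching the clock or the filesystem.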
Next Checkpoint Targets
- Add lifecycle/cold-storage policy after several successful daily uploads.
- Add redaction review pipeline before any public dataset or public visual timeline.
- Add a metadata index that links frames, OCR sidecars, written-context snapshots, inferred project tags, and redaction/publication status.
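A metadata index of this kind could start as a flat list of records filtered by exact field matches. Everything in the sketch below is hypothetical: the field names, the status values, and the sample records are made up for illustration and do not describe an implemented schema.

```javascript
// Hypothetical query helper: a record matches when every field named in
// `criteria` is strictly equal to the record's value for that field.
function queryIndex(records, criteria) {
  return records.filter((record) =>
    Object.entries(criteria).every(([field, value]) => record[field] === value)
  );
}

// Made-up sample records, for illustration only.
const sampleIndex = [
  {
    frameKey: "example-frame-a",
    date: "2026-06-04",
    project: "example-project",
    source: "display-1",
    redactionStatus: "unreviewed",
    publicationStatus: "private",
  },
  {
    frameKey: "example-frame-b",
    date: "2026-06-04",
    project: "example-project",
    source: "display-1",
    redactionStatus: "approved",
    publicationStatus: "published",
  },
];

const published = queryIndex(sampleIndex, {
  date: "2026-06-04",
  publicationStatus: "published",
});
console.log(published.map((r) => r.frameKey)); // [ 'example-frame-b' ]
```

Keeping redaction and publication status as separate fields lets a query distinguish "approved but not yet published" from "published".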
Checkpoint 05 - 2026-06-04 14:27 ET (America/New_York)
Completed Since Prior Checkpoint
- Confirmed AWS storage setup against the live bucket.
- Confirmed the bucket resolves to AWS account `528049652889`.
- Confirmed S3 public access block, versioning, and default SSE AES256 remain enabled.
- Ran a fresh incremental archive pass at `2026-06-04T18:12:19Z`.
- Confirmed the latest incremental delta uploaded 36 new/changed screen files and 1 new written-context Chronicle memory summary.
- Confirmed remote verification passes with no missing referenced keys.
- Confirmed the 10-minute LaunchAgent is active with last exit code `0`.
- Accepted the v1 storage setup as complete.
Evidence
- Latest delta object: `raw/deltas/2026/06/04/2026-06-04T18-12-19Z-delta.tar.zst.gpg`
- Latest delta receipt: `receipts/2026/06/04/2026-06-04T18-12-19Z-upload-receipt.json`
- Full screen snapshot object: `raw/snapshots/2026/06/04/2026-06-04T17-59-36Z-screen-recording.tar.zst.gpg`
- Corrected written-context snapshot object: `raw/snapshots/2026/06/04/2026-06-04T18-03-41Z-written-context.tar.zst.gpg`
- Current remote verification: 12 encrypted raw objects, 6 receipts, 0 missing referenced keys.
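The "missing referenced keys" count can be understood as a set difference between the keys that receipts reference and the keys actually listed in the bucket. The sketch below assumes each receipt carries an `objectKeys` array; that field name and `findMissingKeys` are hypothetical, not the project's receipt format.

```javascript
// Hypothetical verifier: report every object key that a receipt references
// but that is absent from the remote bucket listing.
function findMissingKeys(receipts, remoteKeys) {
  const present = new Set(remoteKeys);
  const missing = new Set();
  for (const receipt of receipts) {
    // Treat a receipt without objectKeys as referencing nothing.
    for (const key of receipt.objectKeys ?? []) {
      if (!present.has(key)) missing.add(key);
    }
  }
  return [...missing].sort();
}
```

An empty result corresponds to a passing verification; any returned key names an object that should be re-uploaded before local copies are pruned.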
Completion State
- Project `0002` is complete for v1 durable private storage.
- Follow-up lifecycle/cost controls move to backlog project `0005`.
- Public redaction and publication safety remain tracked by project `0003`.
Risks
- Object storage costs can grow quickly if every frame is retained at full resolution.
- Daily bundles improve sync performance but make single-frame retrieval require an index and extraction path.
- Cloud backup does not solve publication safety; redaction remains a separate gate.
- Local emergency archive can grow quickly if remote storage is not connected soon.
Open Questions
- None for v1 durable private storage completion.
MAGGIE TODO
- None for this completed project.
Provenance
- Source file: `DOCS/PROJECTS/completed/0002_capture-archive-storage-design.md`
- Source URL: https://github.com/maggielerman/chronicle-visualizer/blob/main/DOCS/PROJECTS/completed/0002_capture-archive-storage-design.md