3035_blog-editorial-ai-and-voice-program
Dates
- Created
- Not recorded
- Last updated
- Not recorded
Document Metadata
- title: 3035 - Blog Editorial AI and Voice Program
- description: New umbrella program for editorial AI workflow, WhatsApp corpus ingestion, voice calibration, and governed blog production
- status: active
- lastUpdated: "2026-03-25 11:18 ET (America/New_York)"
- owner: Product/Engineering
3035 - Blog Editorial AI and Voice Program
Goals
- Build an AI-assisted editorial system that can source ideas, manage a content calendar, generate briefs/outlines/drafts, and support blog operations without flattening the brand voice.
- Use WhatsApp exports as a high-signal research corpus for community language, concerns, provider/product recommendations, and visual culture while protecting member privacy.
- Add review/authorship controls so sensitive or personal-story posts always require human signoff and can selectively use Maggie's named byline.
Why This Program Exists
- The current blog platform is already live and strong on publishing UX, feeds, SEO, and admin editing, but it is still a single-author Markdown workflow without:
- source-ingestion pipelines
- editorial-intelligence tooling
- selective byline controls
- signoff gates for sensitive content
- structured voice/tone governance
- The next body of work is large enough to justify its own umbrella program rather than squeezing it into the remaining 3001 stabilization backlog.
Dependency-Ordered Project Sequence
- 3036_editorial-platform-architecture-and-repo-boundaries.md (completed)
- 3037_corpus-governance-redaction-and-privacy-policy.md (completed)
- 3038_whatsapp-intake-and-multimodal-normalization.md (active)
- 3039_editorial-intelligence-and-retrieval-layer.md (active)
- 3040_content-calendar-brief-and-draft-pipeline.md (completed)
- 3041_review-signoff-and-byline-controls.md (active)
- 3042_voice-calibration-and-model-routing.md (backlog)
Why This Order
- 3036 comes first because we need stable whole-repo system boundaries, usable repo organization, and editorial-platform architecture before choosing jobs, storage, schemas, or approval surfaces.
- 3037 comes second because corpus-governance and privacy rules need to constrain ingestion design before large-scale data processing begins.
- 3038 then builds the normalized corpus pipeline once the boundaries and redaction rules are explicit.
- 3039 depends on the normalized corpus and turns it into retrieval-ready editorial intelligence.
- 3040 depends on retrieval and annotations so the content calendar, briefs, and drafts are grounded rather than generic.
- 3041 then integrates approval, byline, and publish-safety controls into the existing blog workflow.
- 3042 comes last so model-tiering and any fine-tune decision are informed by the real tasks, corpus shape, and approval workflow instead of guesswork.
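The ordering rationale above can be read as a small dependency graph. The sketch below is illustrative only: the `dependsOn` map and the `executionOrder` helper are hypothetical names, and the edges are an interpretation of the prose (for example, that 3038 waits on both 3036 and 3037), not a schema taken from the repo.

```typescript
// Hypothetical dependency map: each project ID lists the projects it waits on.
// The edges paraphrase the ordering rationale; they are an assumption.
const dependsOn: Record<string, string[]> = {
  "3036": [],
  "3037": ["3036"],
  "3038": ["3036", "3037"],
  "3039": ["3038"],
  "3040": ["3039"],
  "3041": ["3040"],
  "3042": ["3041"],
};

// Kahn-style ordering: repeatedly emit every project whose dependencies are done.
function executionOrder(graph: Record<string, string[]>): string[] {
  const done: string[] = [];
  const pending = new Set(Object.keys(graph));
  while (pending.size > 0) {
    const ready = Array.from(pending)
      .filter((id) => (graph[id] ?? []).every((dep) => done.includes(dep)))
      .sort();
    if (ready.length === 0) {
      // Nothing can proceed: there is a cycle or a dependency missing from the map.
      throw new Error(`Unresolvable dependencies among: ${Array.from(pending).join(", ")}`);
    }
    for (const id of ready) {
      done.push(id);
      pending.delete(id);
    }
  }
  return done;
}

console.log(executionOrder(dependsOn).join(" -> "));
```

A check like this could run in CI so that a reordered or newly added project doc fails fast when its declared dependencies no longer resolve.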
Execution Status
- Current state: active
- Phase: umbrella execution with 3036 completed after the repo-wide architecture, organization, governance-definition, and safe reorganization lane
- Current downstream state:
- 3037 is completed after collaborative walkthrough/sign-off locked the governing corpus/privacy boundary for downstream work
- 3038 remains active, but only as a conditional source-layer follow-up lane now that the phase-1 corpus contract and persisted pilot import are stable
- 3039 remains active and is back on the critical path because weak-signal WhatsApp topic retrieval is currently the main blocker on useful brief/draft quality, even though live model-backed brief synthesis is now verified
- 3040 is now completed after collaborative walkthrough/sign-off approved the phase-1 candidate -> calendar -> brief -> draft delivery, live run-key closeout reporting, and verified ready_for_3041_handoff evidence on the persisted pilot run
- 3041 remains active, but the current highest-value work is now upstream quality improvement rather than more admin workflow expansion
- Blocked by: none
Planning Inputs
- Existing repo baseline:
- DOCS/features/blog.md
- completed blog streams 3031, 3032, and 3033
- External planning input reviewed at kickoff:
- March 21, 2026 exported ChatGPT planning transcript on WhatsApp chat AI training
- Core planning direction carried forward from that intake:
- retrieval first
- redaction before downstream AI use
- multimodal corpus support (text + media + screenshots + voice notes)
- optional fine-tuning only after runtime tasks and output shape are clear
Non-Negotiable Operating Decisions
- Keep the raw WhatsApp exports in a restricted raw vault and do all downstream AI work from a redacted derivative corpus.
- Preserve provider, clinic, hospital, product, medication, and service recommendations as knowledge-layer data unless the surrounding context would re-identify a member.
- Treat this as AI-assisted redaction plus human review for high-risk items, not as fully autonomous anonymization.
- Default blog authorship remains generic/organizational unless a post is explicitly approved for Maggie's named byline.
- Any post with strong personal-story framing, intimate health detail, direct first-person lived-experience claims, or near-direct source quotation must require human signoff before publishing.
- Do not start with raw-corpus fine-tuning; first reach a working retrieval-grounded editorial pipeline and only then decide whether a curated fine-tune is worth the extra cost/maintenance.
Scope
In Scope
- WhatsApp export intake architecture for large text + media corpora.
- AI-assisted redaction, pseudonymization, OCR, transcription, media annotation, and sensitivity scoring.
- Structured editorial research corpus design:
- thread/chunk records
- topic and pain-point labels
- phrase banks
- quote candidates
- provider/product recommendation extraction
- visual motif and media tags
- Retrieval-powered editorial workflows:
- article-idea sourcing
- content-calendar planning
- brief generation
- outline generation
- draft generation with internal source traceability
- Voice/tone governance using redacted source material and human-approved exemplars.
- Review, approval, authorship, and signoff workflow for blog publishing.
- Stack/model evaluation for cost-quality tradeoffs.
Out of Scope
- Direct training on raw, unredacted personal chat logs.
- Fully autonomous publishing with no human review.
- Impersonation of named community members or preserving identifiable private speech as a style target.
- Replacing the current blog platform before extending it.
Success Criteria
- A single approved architecture exists for raw-vault storage, redaction pipeline, structured corpus, retrieval layer, and editorial workflow orchestration.
- One pilot WhatsApp corpus can be ingested end to end into a redacted, searchable editorial dataset.
- The system can reliably generate:
- grounded blog ideas
- content calendar candidates
- article briefs
- first-draft posts with internal source traceability
- Sensitive posts are always routed through explicit human review/signoff.
- Byline policy is implemented at the data/workflow level:
- generic byline default
- Maggie byline only on approved posts
- The program reaches a clear model-routing recommendation instead of defaulting to the most expensive model for every step.
Recommended Architecture Direction
Storage And Data Layers
- Raw vault:
- original WhatsApp exports and media stored outside normal app flows with restricted access
- Redacted working corpus:
- structured text/media records derived from the raw vault
- Editorial intelligence layer:
- thread summaries
- topic clusters
- recommendation entities
- quote candidates
- voice markers
- content-angle suggestions
- Retrieval layer:
- searchable chunks plus editorial annotations for grounded drafting
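One way to picture the boundary between the redacted working corpus and the retrieval layer is as a pair of record types. This is a minimal sketch under stated assumptions: `RedactedRecord`, `RetrievalChunk`, `toChunks`, and every field name are hypothetical, and the fixed word-count chunking stands in for whatever chunking strategy the program eventually adopts.

```typescript
// Hypothetical record shapes; field names are illustrative, not the repo's schema.
interface RedactedRecord {
  id: string;          // ID within the redacted working corpus
  rawVaultRef: string; // opaque pointer back to the restricted raw vault
  text: string;        // text that has already been redacted
  topics: string[];    // editorial annotations attached upstream
}

interface RetrievalChunk {
  chunkId: string;
  sourceRecordId: string; // keeps drafts traceable to a redacted record
  text: string;
  topics: string[];
}

// Split a redacted record into fixed-size word chunks that carry their source ID.
// The raw-vault pointer is deliberately not copied into the retrieval layer.
function toChunks(record: RedactedRecord, maxWords = 50): RetrievalChunk[] {
  if (maxWords < 1) {
    throw new Error("maxWords must be at least 1");
  }
  const words = record.text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: RetrievalChunk[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push({
      chunkId: `${record.id}#${chunks.length}`,
      sourceRecordId: record.id,
      text: words.slice(i, i + maxWords).join(" "),
      topics: record.topics,
    });
  }
  return chunks;
}
```

Carrying `sourceRecordId` on every chunk is what would let a generated draft cite its internal sources, while leaving `rawVaultRef` out keeps drafting code from reaching past the redacted layer.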
Stack Recommendation
- Start repo-native instead of jumping straight to n8n.
- Recommended first stack:
- TypeScript batch scripts and server-side app actions in this repo
- Neon/Postgres for metadata, workflow state, approvals, and editorial objects
- private object storage for raw and redacted media artifacts
- retrieval index added only after the normalized corpus schema is stable
- n8n can still be useful later for notifications, inbox routing, calendar syncing, or cross-tool approvals, but it should not be the first core processing engine for parsing/redaction/governance logic.
- If orchestration complexity grows beyond simple repo-native jobs, evaluate a job/workflow layer after schema + review rules are stable rather than before.
Model Strategy Recommendation
- Use a tiered model mix, not one model everywhere.
- Lower-cost models are appropriate for:
- first-pass parsing
- candidate extraction
- metadata normalization
- coarse classification
- Higher-end models are appropriate for:
- multimodal understanding on difficult media
- subtle voice/tone analysis
- sensitive-subject classification
- high-quality brief and draft generation
- approval-ready synthesis
- Do not assume expensive models are required for every ingestion step; reserve them for stages where quality materially changes editorial output.
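The tiering above could be expressed as a plain lookup table, so that routing is reviewable in one place. The sketch below is an assumption-laden illustration: the `Task` names simply mirror the bullets, `routeModel` is a hypothetical helper, and the tier labels are placeholders rather than specific models or vendors.

```typescript
type ModelTier = "lower-cost" | "higher-end";

// Task names mirror the bullets above; they are not identifiers from the repo.
type Task =
  | "first-pass-parsing"
  | "candidate-extraction"
  | "metadata-normalization"
  | "coarse-classification"
  | "difficult-media-understanding"
  | "voice-tone-analysis"
  | "sensitive-subject-classification"
  | "brief-and-draft-generation"
  | "approval-ready-synthesis";

// Single routing table: adding a Task without a tier is a compile-time error.
const TIER_BY_TASK: Record<Task, ModelTier> = {
  "first-pass-parsing": "lower-cost",
  "candidate-extraction": "lower-cost",
  "metadata-normalization": "lower-cost",
  "coarse-classification": "lower-cost",
  "difficult-media-understanding": "higher-end",
  "voice-tone-analysis": "higher-end",
  "sensitive-subject-classification": "higher-end",
  "brief-and-draft-generation": "higher-end",
  "approval-ready-synthesis": "higher-end",
};

function routeModel(task: Task): ModelTier {
  return TIER_BY_TASK[task];
}
```

Keeping the mapping as data rather than branching logic means a later benchmark result changes one table entry, not the pipeline code.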
Workflow Guardrails
Default Authorship Policy
- Most posts stay on the generic site/organization byline.
- Named Maggie byline is reserved for posts explicitly approved as personal/editorial voice pieces.
Mandatory Signoff Categories
- Personal-story or memoir-style posts
- Posts that read as Maggie's own lived experience
- Posts covering fertility, pregnancy loss, mental health, relationship trauma, or other intimate/identity-heavy topics in a first-person or advisory voice
- Posts using direct or near-direct quotes from source conversations
- Posts whose recommendation strength or emotional framing could be interpreted as personal endorsement
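The authorship default and the mandatory signoff categories above amount to a small publish gate. The following is a hedged sketch, not the blog's actual workflow code: `DraftFlags`, `Draft`, `requiresSignoff`, `canPublish`, and `resolveByline` are hypothetical names, and `"Site Team"` is a placeholder for whatever generic byline the site uses.

```typescript
// Hypothetical flags, one per mandatory signoff category listed above.
interface DraftFlags {
  personalStory: boolean;
  readsAsMaggieLivedExperience: boolean;
  intimateTopicInFirstPersonOrAdvisoryVoice: boolean;
  usesDirectOrNearDirectQuotes: boolean;
  couldReadAsPersonalEndorsement: boolean;
}

interface Draft {
  flags: DraftFlags;
  signedOffBy?: string;          // human reviewer, when signoff has happened
  maggieBylineApproved: boolean; // explicit approval for the named byline
}

// Any single category is enough to require human signoff.
function requiresSignoff(flags: DraftFlags): boolean {
  return (
    flags.personalStory ||
    flags.readsAsMaggieLivedExperience ||
    flags.intimateTopicInFirstPersonOrAdvisoryVoice ||
    flags.usesDirectOrNearDirectQuotes ||
    flags.couldReadAsPersonalEndorsement
  );
}

// A flagged draft is publishable only once a human has signed off.
function canPublish(draft: Draft): boolean {
  return !requiresSignoff(draft.flags) || draft.signedOffBy !== undefined;
}

// Generic byline by default; the named byline only with explicit approval.
function resolveByline(draft: Draft): string {
  return draft.maggieBylineApproved ? "Maggie" : "Site Team";
}
```

Modeling the categories as explicit boolean fields, rather than a free-form tag list, makes it harder for a new category to be added without the gate being updated alongside it.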
Source-Material Policy
- Provider/product/clinic recommendations may be preserved when identity-safe.
- Direct identifiers and quasi-identifiers for community members must be redacted, generalized, or withheld.
- High-risk media must receive human review before it can inform editorial outputs.
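To make the source-material policy concrete, here is a deliberately simple first-pass redaction sketch. It assumes a hypothetical roster of member names is available from the restricted side, replaces those names and phone-like numbers with placeholders, and leaves provider or clinic names untouched because they are not on the roster. The `redact` function, the `PHONE` pattern, and the placeholder formats are all illustrative assumptions. Pattern matching like this will miss quasi-identifiers, which is why the policy above still calls for human review.

```typescript
interface RedactionResult {
  text: string;
  replacements: number; // how many identifier-like spans were replaced
}

// Rough phone-like pattern: optional "+", then at least nine digit/separator characters.
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Escape regex metacharacters so a member name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// First-pass redaction: phone-like numbers, then roster names (case-insensitive).
function redact(text: string, memberNames: string[]): RedactionResult {
  let replacements = 0;
  let out = text.replace(PHONE, () => {
    replacements += 1;
    return "[phone]";
  });
  memberNames.forEach((name, index) => {
    const pattern = new RegExp(`\\b${escapeRegExp(name)}\\b`, "gi");
    out = out.replace(pattern, () => {
      replacements += 1;
      return `[member-${index + 1}]`; // stable placeholder per roster position
    });
  });
  return { text: out, replacements };
}

// Fictional example input; "Dr. Lee" and "City Clinic" are preserved.
const example = redact(
  "Call Dana at +1 (555) 123-4567 about Dr. Lee at City Clinic",
  ["Dana"],
);
console.log(example.text);
```

The replacement count gives a cheap signal for sensitivity scoring, for instance routing records with many replacements to the human-review queue first.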
Proposed Workstreams
- Corpus Governance And Privacy
- data handling rules
- redaction standards
- approval boundaries
- source-permission posture
- WhatsApp Intake And Normalization
- parsing exports
- media manifests
- OCR/transcription pipeline
- structured record schema
- Editorial Intelligence Layer
- topic extraction
- quote bank
- provider/product recommendation maps
- voice and phrase bank
- Content Operations Layer
- idea backlog
- article brief generation
- calendar planning
- freshness/reuse controls
- Drafting, Review, And Authorship Controls
- sensitive-content routing
- signoff states
- named-byline policy
- publish-safe approval flow
- Model And Automation Optimization
- cost/quality benchmarks
- model routing
- optional fine-tune decision
- optional automation tooling expansion
Milestones
- Milestone 1: Program architecture, governance rules, and stack decision locked.
- Milestone 2: Pilot WhatsApp corpus ingested into a redacted multimodal research dataset.
- Milestone 3: Retrieval-grounded editorial tools produce blog ideas, briefs, and calendar candidates.
- Milestone 4: Draft-generation and signoff workflow integrated with the blog publishing system.
- Milestone 5: Byline controls, sensitive-content gates, and model-routing recommendation finalized.
Dependencies
- Builds on the existing blog platform documented in DOCS/features/blog.md.
- Needs a small carry-forward maintenance lane from 3008/3013 so governance and walkthrough debt do not accumulate while this program becomes the lead focus.
Risks
- Privacy and trust risk if redaction is treated as fully solved by AI without human review.
- Voice quality risk if the system overfits to private conversati
...[truncated for intake]
Provenance
- Source file: DOCS/PROJECTS/active/3035_blog-editorial-ai-and-voice-program.md
- Source URL: https://github.com/maggielerman/smc-directory/blob/main/DOCS/PROJECTS/active/3035_blog-editorial-ai-and-voice-program.md