# Local-First Assistant - Current Plan (Mar 2026)

## Product Direction

Build a personal AI assistant that is:

- Local-first by default (local models/services whenever practical)
- Config-driven (`config/config.yaml` as the source of truth)
- Integration-based (new capabilities added as tools/integrations, not hardcoded hacks)
- Single-binary friendly for core runtime (`go run .` / `go build`)

Primary UX goals:

- Fast daily utility (calendar, memory, briefings, concise chat)
- Collaborative planning (shared scratchpad for short/long-term goals)
- Progressive multimodal expansion (audio + images)

---

## Current State (Shipped)

### Core runtime

- Go HTTP server with:
  - `POST /ask`
  - `POST /ask/stream` (SSE)
  - `GET /api/status`
- Single Ollama-compatible LLM client (`llm/llm.go`)
- Agent loop with tool-calling rounds, telemetry, and context budget controls
- Config validation and feature gating in `config/config.go`
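
To illustrate the streaming endpoint, here is a minimal sketch of how a `POST /ask/stream` handler could frame output as Server-Sent Events. It is not the actual handler: the event names `token` and `done`, the `formatSSE` helper, and the `source` callback that stands in for the agent loop are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// formatSSE renders one Server-Sent Events frame. Multi-line payloads are
// split so that every line carries its own "data:" prefix, and a blank line
// terminates the frame.
func formatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		fmt.Fprintf(&b, "event: %s\n", event)
	}
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n")
	return b.String()
}

// streamHandler is a hypothetical stand-in for the /ask/stream handler.
// source represents whatever produces tokens (assumed: the agent loop).
func streamHandler(source func(r *http.Request) <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		tokens := source(r)
		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case tok, open := <-tokens:
				if !open {
					fmt.Fprint(w, formatSSE("done", ""))
					flusher.Flush()
					return
				}
				fmt.Fprint(w, formatSSE("token", tok))
				flusher.Flush() // push each token immediately
			}
		}
	}
}
```

Flushing after every frame is what makes the response feel live; without it the standard library may buffer several tokens before anything reaches the client.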

### Enabled capability stack (from current config)

- Weather tools
- News tools
- Calendar tools
- Memory tools
- Git log tools
- Code tools present but disabled by config
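
As a rough sketch of config-driven feature gating, the capability list above could map onto typed config sections like the following. The struct layout, the `LLMEndpoint` requirement, and the tool group names are assumptions for illustration; the real schema lives in `config/config.go` and `config/config.yaml`.

```go
package main

import "errors"

// Feature is a hypothetical per-capability section of config/config.yaml.
type Feature struct {
	Enabled bool
}

// Config lists the capabilities named in this plan. Field names and the
// LLMEndpoint requirement are assumptions, not the real schema.
type Config struct {
	LLMEndpoint string
	Weather     Feature
	News        Feature
	Calendar    Feature
	Memory      Feature
	GitLog      Feature
	Code        Feature
}

// Validate rejects configs the runtime could not start with.
func (c Config) Validate() error {
	if c.LLMEndpoint == "" {
		return errors.New("config: llm endpoint is required")
	}
	return nil
}

// EnabledTools returns the tool groups to register with the agent, in a
// fixed order so the tool list is stable across runs.
func (c Config) EnabledTools() []string {
	all := []struct {
		name    string
		feature Feature
	}{
		{"weather", c.Weather},
		{"news", c.News},
		{"calendar", c.Calendar},
		{"memory", c.Memory},
		{"git_log", c.GitLog},
		{"code", c.Code},
	}
	var out []string
	for _, e := range all {
		if e.feature.Enabled {
			out = append(out, e.name)
		}
	}
	return out
}
```

With `Code.Enabled` left false, the code tools stay compiled in but are never offered to the agent, which matches the "present but disabled" state above.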

### Persistence and data model

- SQLite store in `memory/`
- Active tables:
  - `memories`
  - `calendar_items` (date-only scheduling via `due_at`; undated items supported)
  - `calendar_pending` (approval queue)
  - `calendar_meta` (`revision` for refresh/polling)
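
One way to model date-only scheduling with optional undated items in Go is a nullable due date. This is a sketch under assumptions: the `CalendarItem` field names and the `YYYY-MM-DD` storage format are hypothetical, and only `due_at` comes from the table description above.

```go
package main

import "time"

// dateLayout is the assumed storage format for due_at (date only, no time).
const dateLayout = "2006-01-02"

// CalendarItem is a hypothetical typed view of a calendar_items row.
type CalendarItem struct {
	ID    int64
	Title string
	DueAt *time.Time // nil means the item is undated
	Done  bool
}

// Undated reports whether the item has no due date.
func (c CalendarItem) Undated() bool { return c.DueAt == nil }

// parseDueAt converts a stored due_at value into a date. An empty string
// maps to nil (undated); anything that is not a plain date is rejected.
func parseDueAt(s string) (*time.Time, error) {
	if s == "" {
		return nil, nil
	}
	t, err := time.Parse(dateLayout, s)
	if err != nil {
		return nil, err
	}
	return &t, nil
}

// formatDueAt is the inverse of parseDueAt.
func formatDueAt(t *time.Time) string {
	if t == nil {
		return ""
	}
	return t.Format(dateLayout)
}
```

Keeping the "undated" case as `nil` rather than a sentinel date avoids accidentally sorting undated items into a real day.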

Removed legacy scope:

- Old task toolchain and task table were removed to keep the codebase slim.

---

## Calendar + Pending Workflow (Shipped)

### API surface

- `GET /api/calendar/items`
- `GET /api/calendar/items/{id}`
- `POST /api/calendar/items`
- `PUT /api/calendar/items/{id}`
- `DELETE /api/calendar/items/{id}`
- `GET /api/calendar/pending`
- `PUT /api/calendar/pending/{id}` (kept for compatibility)
- `POST /api/calendar/pending/{id}/confirm`
- `POST /api/calendar/pending/{id}/reject`

### Tooling

- `calendar_list_range`
- `calendar_propose_change` (proposal queue only, user confirms in UI)
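
The proposal-queue behavior can be sketched as a small in-memory model: the agent only ever enqueues a change, and nothing reaches the calendar until the user confirms it. All type and method names here are hypothetical, and bumping the revision on every mutation is an assumption about how `calendar_meta` is used.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNoSuchPending is returned when a pending id is unknown or already resolved.
var ErrNoSuchPending = errors.New("calendar: pending change not found")

// Change is a hypothetical proposed calendar entry; DueAt "" means undated.
type Change struct {
	Title string
	DueAt string
}

// Calendar is an in-memory stand-in for the SQLite-backed store.
type Calendar struct {
	mu       sync.Mutex
	nextID   int64
	pending  map[int64]Change
	items    []Change
	revision int64
}

func NewCalendar() *Calendar {
	return &Calendar{pending: make(map[int64]Change)}
}

// Propose is what calendar_propose_change would call: it queues the change
// and never touches items directly.
func (c *Calendar) Propose(ch Change) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nextID++
	c.pending[c.nextID] = ch
	c.revision++
	return c.nextID
}

// Confirm applies a pending change (the user approved it in the UI).
func (c *Calendar) Confirm(id int64) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch, ok := c.pending[id]
	if !ok {
		return ErrNoSuchPending
	}
	delete(c.pending, id)
	c.items = append(c.items, ch)
	c.revision++
	return nil
}

// Reject discards a pending change without applying it.
func (c *Calendar) Reject(id int64) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.pending[id]; !ok {
		return ErrNoSuchPending
	}
	delete(c.pending, id)
	c.revision++
	return nil
}

// Counts returns how many items and pending changes exist.
func (c *Calendar) Counts() (items, pending int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.items), len(c.pending)
}
```

In this model `Confirm` and `Reject` would back the `/confirm` and `/reject` endpoints, while the tool itself has no code path that writes an item.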

### Behavior highlights

- Date-only scheduling with optional undated items
- Range normalization for date/datetime inputs in list queries
- Pending approvals rendered in the UI below the composer
- Day detail includes:
  - separate undated section
  - 14-day rolling agenda (today first)
  - scrollable list area
- Done items remain visible with status pill styling
- Completing an undated item sets its due date to the selected day (falling back to today)

---

## Integration Architecture (Target Shape)

Treat every major feature as a pluggable integration:

- Config-controlled enable/disable
- Health-checkable
- Exposed to the agent through stable tools
- Local endpoint first, optional remote fallback

Suggested structure:

```text
integrations/
  scratchpad/
  stt/
  tts/
  vision/
  image_gen/
```

Each integration should define:

- Config schema
- Runtime client (local service/API)
- Minimal server endpoints (if UI uploads/playback required)
- Tool wrappers for agent use

---

## Next Major Build: Shared Scratchpad (P1)

### Why first

- Highest leverage for collaborative planning and long/short-term goals
- Creates durable shared context between user and model
- Makes future audio/vision workflows more coherent

### MVP scope

- SQLite table(s): scratchpads + optional revisions
- API:
  - `GET /api/scratchpad`
  - `PUT /api/scratchpad`
  - optional `GET /api/scratchpad/history`
- Tools:
  - `scratchpad_read`
  - `scratchpad_update`
- UI panel with sections:
  - Now
  - Next
  - Blockers
  - Notes
- Revision counter + polling pattern similar to calendar
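
One way the revision counter could make shared edits safe is optimistic concurrency: a writer submits the revision it last read, and the update is refused if someone else has written since. The sketch below is an in-memory illustration with hypothetical names; the four fields mirror the UI sections listed above, and the real store would sit on SQLite.

```go
package main

import (
	"errors"
	"sync"
)

// ErrStaleRevision signals that the writer edited an outdated copy.
var ErrStaleRevision = errors.New("scratchpad: stale revision")

// Scratchpad mirrors the four UI sections plus a revision counter.
type Scratchpad struct {
	Now      string
	Next     string
	Blockers string
	Notes    string
	Revision int64
}

// ScratchpadStore is an in-memory stand-in; its zero value is ready to use.
type ScratchpadStore struct {
	mu  sync.Mutex
	cur Scratchpad
}

// Read returns a copy of the current scratchpad (scratchpad_read).
func (s *ScratchpadStore) Read() Scratchpad {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.cur
}

// Update replaces the scratchpad if next.Revision matches the stored
// revision, then bumps the counter (scratchpad_update). On conflict it
// returns the current state so the caller can merge and retry.
func (s *ScratchpadStore) Update(next Scratchpad) (Scratchpad, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if next.Revision != s.cur.Revision {
		return s.cur, ErrStaleRevision
	}
	next.Revision++
	s.cur = next
	return s.cur, nil
}

// ChangedSince lets a polling client decide whether to refetch.
func (s *ScratchpadStore) ChangedSince(rev int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.cur.Revision != rev
}
```

The same check works for both writers: the UI sends the revision with `PUT /api/scratchpad`, and the agent's tool call carries the revision from its last read, so neither side silently overwrites the other.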

### Acceptance criteria

- User and model can both update a shared artifact safely
- Changes appear quickly in UI
- Agent can reference scratchpad reliably in subsequent turns

---

## Multimodal Roadmap

### P2: Audio (STT + TTS), local-first

Goal: hands-free interaction and spoken responses.

- STT integration options:
  - `faster-whisper` or `whisper.cpp` service
- TTS integration options:
  - Piper (default local)
- API candidates:
  - `POST /api/audio/transcribe`
  - `POST /api/audio/speak`
- Tool candidates:
  - `transcribe_audio`
  - `speak_text`
- UI:
  - push-to-talk
  - replay assistant audio

Acceptance:

- Reliable transcription for short prompts
- Low-latency local speech synthesis

### P3: Vision + Image Generation

Goal: image-aware assistant and local image creation pipeline.

- Vision (analyze images):
  - upload image -> parse/describe -> optional memory/scratchpad entry
- Image generation:
  - local SD/Flux backend via integration endpoint
- API candidates:
  - `POST /api/images/analyze`
  - `POST /api/images/generate`
- Tool candidates:
  - `analyze_image`
  - `generate_image`
- UI:
  - upload + preview
  - generation gallery/history

Acceptance:

- Agent can reason over user-provided images
- Agent can generate and return local image artifacts

---

## Engineering Principles

- Keep files small, explicit, and testable.
- Prefer typed structs over loose maps in hot paths.
- Keep DB ownership centralized in `memory/`.
- Keep tool descriptions precise to reduce model drift.
- Add capability only when config + UI + tool + telemetry are all wired.
- Preserve local-first defaults; remote services are strictly opt-in.

---

## Immediate Execution Plan

1. Implement scratchpad storage + API + UI + tools.
2. Add scratchpad-aware prompt guidance and tool usage rules.
3. Add regression tests around calendar range semantics and pending flows.
4. Add integration scaffold for audio (STT/TTS) with one local provider each.
5. Add vision/image generation integrations behind config flags.

---

## Success Criteria (Near-Term)

- Calendar and pending flows remain stable and predictable.
- Scratchpad becomes the default workspace for ongoing goals.
- Audio loop works locally with acceptable latency.
- Image workflows are usable without cloud dependency.
- The project stays slim enough to understand and iterate on quickly.