# たしか (Tashika) Agent Setup

Public self-service setup guide for AI agents that issue and configure Tashika accounts. The current domain and legacy WAYO_* configuration names remain unchanged for compatibility.

- API base URL: https://voice.2-38.com
- Public accuracy demo: https://voice.2-38.com/demo
- Human login: https://voice.2-38.com/login
- Setup doc: https://voice.2-38.com/docs/agent-setup

The brand name is たしか (Tashika). The current URL, the account ID prefix (`wayo_acct_...`), and the `WAYO_*` environment variable names are retained as compatibility identifiers until a later migration.

Tashika accepts an audio file, runs it through a 3-stage transcription and correction flow, stores the original audio in private R2 storage for authenticated accounts, and binds the saved job to the generated account ID. Account issuance is self-service: any AI agent can call POST /v1/accounts and receive a normal user account ID. Normal user accounts do not require WAYO_API_TOKEN.

## Roles

- Requester / user: asks any AI agent to create a Tashika account, receives one wayo_acct_... account ID, and logs in at https://voice.2-38.com/login.
- Client/configuring AI agent: calls POST /v1/accounts directly, stores the generated account ID locally as WAYO_ACCOUNT_ID if API uploads are needed, and never needs WAYO_API_TOKEN. These legacy environment variable names are unchanged in this phase.
- Admin/operator: keeps WAYO_API_TOKEN private. The admin token is still required only for creating admin accounts or listing accounts.

## Copy-paste prompt

Use this prompt with any AI agent; anyone's AI agent can issue an account ID by following this procedure.

Japanese version:

~~~text
https://voice.2-38.com/docs/agent-setup を読んで、たしか のアカウントIDを発行してください。
POST /v1/accounts は通常ユーザー作成なら公開エンドポイントです。WAYO_API_TOKEN は不要です。
発行後は生成された wayo_acct_... を WAYO_ACCOUNT_ID として保存し、/v1/me と /login で使えることを確認してください。
管理者アカウント作成やアカウント一覧取得だけは WAYO_API_TOKEN が必要です。通常セットアップでは要求しないでください。
~~~

English version:

~~~text
Read https://voice.2-38.com/docs/agent-setup and issue a Tashika account ID.
POST /v1/accounts is public for normal user accounts. No WAYO_API_TOKEN is required.
After issuance, store the generated wayo_acct_... value as WAYO_ACCOUNT_ID and verify it with /v1/me and /login.
Only admin account creation and account listing require WAYO_API_TOKEN; do not ask for it for normal setup.
~~~

## Self-service account issuance

Set the public base URL:

~~~bash
export WAYO_VOICE_API_URL="https://voice.2-38.com"
~~~

Issue one normal user account ID:

~~~bash
curl -sS -A "tashika-agent-setup/1.0" \
  -H "Content-Type: application/json" \
  -d '{"display_name":"Requester"}' \
  "$WAYO_VOICE_API_URL/v1/accounts"
~~~

The response includes the plaintext account_id exactly once. Save that value privately. The Worker stores only a hash, so the plaintext account ID cannot be recovered later; if it is lost, issue a new one.

For abuse control, public account issuance is rate-limited per client network. If you receive rate_limited, wait or ask the operator to issue an account.
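
The issue-and-save step can be scripted so that the ID is never echoed to the terminal. The following is a minimal sketch, not part of the Tashika tooling: the function name issue_tashika_account, the default env-file path, and the assumed character set of the ID (letters, digits, underscore, hyphen) are all hypothetical. It searches the response for the documented wayo_acct_ prefix rather than assuming a particular JSON layout.

```bash
#!/usr/bin/env bash
# Sketch: issue one normal user account and store the ID in a private file.
# Assumptions (not from the Tashika docs): the env-file path and the
# characters allowed after the wayo_acct_ prefix.

issue_tashika_account() {
  local env_file="${1:-$HOME/.config/tashika/env}"
  local base_url="${WAYO_VOICE_API_URL:-https://voice.2-38.com}"
  local response account_id

  response="$(curl -sS -A "tashika-agent-setup/1.0" \
    -H "Content-Type: application/json" \
    -d '{"display_name":"Requester"}' \
    "$base_url/v1/accounts")" || return 1

  # Take the first token that carries the documented wayo_acct_ prefix.
  account_id="$(printf '%s' "$response" \
    | grep -oE 'wayo_acct_[A-Za-z0-9_-]+' | head -n 1)"
  if [ -z "$account_id" ]; then
    echo "no account ID in response (rate_limited or another error?)" >&2
    return 1
  fi

  # Write the secret with owner-only permissions; never print the ID itself.
  mkdir -p "$(dirname "$env_file")"
  ( umask 077 && printf 'WAYO_ACCOUNT_ID=%s\n' "$account_id" > "$env_file" )
  chmod 600 "$env_file"
  echo "account ID saved to $env_file"
}
```

Calling `issue_tashika_account` with no argument writes to the assumed default path; pass a different path as the first argument to match the local secret store.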

## Admin/operator issuance

The admin token is still private. Use it only when intentionally creating an admin account or listing accounts:

~~~bash
curl -sS -A "tashika-agent-setup/1.0" \
  -H "Authorization: Bearer $WAYO_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"display_name":"Operator","role":"admin"}' \
  "$WAYO_VOICE_API_URL/v1/accounts"
~~~

Do not send WAYO_API_TOKEN to requesters or external configuring agents.

## Requester login

The requester opens the login page and pastes the returned account ID:

~~~text
https://voice.2-38.com/login
~~~

The same account ID also works as API bearer auth when the requester wants their own voice-input agent to upload audio.

## Optional client/API integration

After the requester receives the generated account ID, their local agent may store it as:

~~~bash
export WAYO_VOICE_API_URL="https://voice.2-38.com"
export WAYO_ACCOUNT_ID="wayo_acct_<generated-by-public-endpoint>"
~~~

Authenticate with the account ID as a bearer token. Send a normal User-Agent header on every request; without one, some Python or CLI clients can be blocked by Cloudflare (error 1010) before the request reaches the Worker.

Health check, no auth required:

~~~bash
curl -fsS -A "wayo-agent-setup/1.0" "$WAYO_VOICE_API_URL/health"
~~~

Account check:

~~~bash
curl -fsS -A "wayo-agent-setup/1.0" -H "Authorization: Bearer $WAYO_ACCOUNT_ID" "$WAYO_VOICE_API_URL/v1/me"
~~~

Upload audio. style_mode is optional and defaults to preserve, which keeps the speaker's wording and tone as far as possible and removes only fillers and stumbles. Use formal or casual only when the caller explicitly wants a different final tone:

~~~bash
curl -sS -A "tashika-agent-setup/1.0" \
  -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
  -F "file=@voice.wav" \
  -F "client_reference_id=agent-setup-smoke" \
  -F "style_mode=preserve" \
  "$WAYO_VOICE_API_URL/v1/audio/transcriptions"
~~~

For agent voice pipelines, include recent conversation context when available so the final stage can keep project spellings and romanized names stable:

~~~bash
curl -sS -A "tashika-agent-setup/1.0" \
  -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
  -F "file=@voice.wav" \
  -F "client_reference_id=agent-context-smoke" \
  -F "transcript_context=Recent conversation: Tashika; if speech sounds like たしか, prefer the service name Tashika." \
  "$WAYO_VOICE_API_URL/v1/audio/transcriptions"
~~~

The optional transcript_context field is passed only to the style-cleanup prompt. The full context is neither returned in API responses nor saved with job metadata. conversation_history and context are accepted aliases for transcript_context.

Accepted style modes are preserve, formal, and casual. Raw-body uploads can set the mode with the X-Wayo-Style-Mode header or the ?style_mode=... query parameter.
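
Because only three style modes are accepted, a caller can reject a bad value locally before spending an upload. This is a minimal sketch: the function names is_valid_style_mode and upload_with_style are hypothetical, and the upload simply reuses the multipart form fields from the upload command above.

```bash
# Sketch: validate style_mode locally, then upload with the multipart form.
# Function names are illustrative, not part of the Tashika tooling.

is_valid_style_mode() {
  case "$1" in
    preserve|formal|casual) return 0 ;;
    *) return 1 ;;
  esac
}

upload_with_style() {
  local audio_path="$1"
  local style_mode="${2:-preserve}"   # preserve is the documented default

  if ! is_valid_style_mode "$style_mode"; then
    echo "unsupported style_mode: $style_mode (use preserve, formal, or casual)" >&2
    return 2
  fi

  curl -sS -A "tashika-agent-setup/1.0" \
    -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
    -F "file=@$audio_path" \
    -F "style_mode=$style_mode" \
    "$WAYO_VOICE_API_URL/v1/audio/transcriptions"
}
```

For example, `upload_with_style voice.wav formal` uploads with the formal tone, while an unknown mode returns exit code 2 without contacting the API.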

## Accountless accuracy demo

For quick evaluation without issuing an account, open:

~~~text
https://voice.2-38.com/demo
~~~

The demo posts to public POST /v1/demo/transcriptions. It is intentionally smaller than the authenticated archive API: default 10MB upload cap, default 10 requests/day per client network, no original-audio archive, and no job listing. Responses still show raw_text, corrected_text, final text, edits, style_mode, and style_postprocess so evaluators can compare the 3-stage flow.
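
Since the demo enforces an upload cap, a caller can check the file size before posting. The sketch below rests on several assumptions: it reads the default 10MB cap conservatively as 10,000,000 bytes (the server may count differently or use a non-default cap), it assumes the demo accepts the same multipart file field as the authenticated endpoint, and the function names are hypothetical.

```bash
# Sketch: refuse files above the demo's default cap before uploading.
DEMO_MAX_BYTES=10000000   # assumed conservative reading of "10MB"

fits_demo_cap() {
  local size
  [ -f "$1" ] || return 1
  size="$(wc -c < "$1" | tr -d '[:space:]')"
  [ "$size" -le "$DEMO_MAX_BYTES" ]
}

demo_transcribe() {
  local audio_path="$1"
  if ! fits_demo_cap "$audio_path"; then
    echo "$audio_path is missing or exceeds the demo upload cap" >&2
    return 1
  fi
  # The field name "file" is assumed to match /v1/audio/transcriptions.
  curl -sS -A "tashika-agent-setup/1.0" \
    -F "file=@$audio_path" \
    "${WAYO_VOICE_API_URL:-https://voice.2-38.com}/v1/demo/transcriptions"
}
```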

An authenticated upload to POST /v1/audio/transcriptions (not the demo endpoint) returns:

- id: saved job id
- status: completed, queued, processing, or error
- raw_text: stage 1 transcript
- corrected_text: stage 2 corrected transcript
- text: stage 3 final transcript
- style_mode: preserve, formal, or casual
- audio_key: private R2 object key for the original audio

List visible jobs:

~~~bash
curl -fsS -A "wayo-agent-setup/1.0" -H "Authorization: Bearer $WAYO_ACCOUNT_ID" "$WAYO_VOICE_API_URL/v1/jobs"
~~~

Download the original audio for one job:

~~~bash
curl -fL -A "wayo-agent-setup/1.0" -H "Authorization: Bearer $WAYO_ACCOUNT_ID" -o voice.wav "$WAYO_VOICE_API_URL/v1/jobs/<job_id>/audio"
~~~

## Hermes Agent integration

Goal: make gateway voice messages use Tashika as the STT backend so the transcript and original audio appear under the issued account in /app.

Recommended behavior for the configuring AI agent:

1. If no account ID exists yet, call POST /v1/accounts directly. Do not ask for WAYO_API_TOKEN.
2. Store the returned wayo_acct_... value as WAYO_ACCOUNT_ID in the local secret store or environment file.
3. Inspect the existing Hermes STT config instead of guessing.
4. Add or implement a command STT provider that accepts an input audio path and output transcript path.
5. The command should POST the audio to /v1/audio/transcriptions with Authorization: Bearer $WAYO_ACCOUNT_ID.
6. Write the returned text field to the requested output transcript path.
7. Keep the OpenAI/local fallback only if the operator wants one. If a fallback is used, report clearly that fallback jobs will not appear in /app unless they are imported separately.
8. Restart the gateway after sending the final setup report, not before.
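
Steps 4 to 6 could be implemented as a small command wrapper like the one below. This is a sketch under assumptions, not the Hermes or Tashika API: the function name tashika_stt and its two-argument calling convention are hypothetical (check how the installed Hermes version passes paths), and python3 is assumed to be available for JSON parsing; jq would work equally well.

```bash
#!/usr/bin/env bash
# Sketch of a command STT provider: tashika_stt <input-audio> <output-transcript>

tashika_stt() {
  local audio_path="$1" output_path="$2"
  local response text

  if [ -z "${WAYO_ACCOUNT_ID:-}" ]; then
    echo "WAYO_ACCOUNT_ID is not set" >&2
    return 1
  fi

  response="$(curl -sS -A "tashika-agent-setup/1.0" \
    -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
    -F "file=@$audio_path" \
    -F "style_mode=preserve" \
    "${WAYO_VOICE_API_URL:-https://voice.2-38.com}/v1/audio/transcriptions")" || return 1

  # Extract the stage 3 "text" field; prints nothing if it is absent.
  text="$(printf '%s' "$response" | python3 -c '
import json, sys
try:
    value = json.loads(sys.stdin.buffer.read()).get("text")
except (ValueError, AttributeError):
    value = None
if isinstance(value, str):
    sys.stdout.buffer.write(value.encode("utf-8"))
')"

  if [ -z "$text" ]; then
    echo "no final text in response (job may still be processing)" >&2
    return 1
  fi
  printf '%s\n' "$text" > "$output_path"
}
```

The wrapper fails rather than writing an empty transcript, so the gateway can fall back or report the error instead of silently dropping the message.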

If this repository is available on the target machine, the existing wrapper can be used:

~~~bash
cd ~/apps/wayo-voice
uv sync --extra dev
export WAYO_VOICE_API_URL="https://voice.2-38.com"
export WAYO_ACCOUNT_ID="wayo_acct_<generated-by-public-endpoint>"
uv run python -m wayo.hermes_stt \
  --audio voice.wav \
  --output /tmp/wayo-transcript.txt \
  --asr-provider wayo-api \
  --fallback-asr-provider hermes-openai \
  --fallback-model gpt-4o-transcribe \
  --style-mode preserve \
  --style-provider none
~~~

For Hermes config, set stt.provider to a command provider that runs the wrapper above, using Hermes placeholders for input and output paths. Verify the exact command syntax against the installed Hermes version before editing config.

## Acceptance criteria

A setup is complete only when all of these pass:

- A fresh account ID was issued with public POST /v1/accounts.
- WAYO_API_TOKEN was not requested or shared for normal setup.
- GET /health returns ok=true.
- GET /v1/me with WAYO_ACCOUNT_ID returns actor.type=account.
- Browser login at https://voice.2-38.com/login works with the same account ID.
- If API upload is configured, a small audio upload returns a job id with status completed, or a 202 response while the job is still processing.
- If API upload is configured, GET /v1/jobs shows the new job for this account.
- No credential value was printed to logs, committed to git, or pasted into a public channel.
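
The API-level criteria above can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the grep patterns assume the responses are JSON containing "ok": true and an actor object with "type": "account", which is inferred from the criteria rather than from a published schema. A JSON-aware tool such as jq would be a stricter choice.

```bash
# Sketch: check /health and /v1/me without printing the account ID.

check_health() {
  curl -fsS -A "tashika-agent-setup/1.0" "$WAYO_VOICE_API_URL/health" \
    | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

check_account() {
  curl -fsS -A "tashika-agent-setup/1.0" \
    -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
    "$WAYO_VOICE_API_URL/v1/me" \
    | grep -Eq '"type"[[:space:]]*:[[:space:]]*"account"'
}

run_acceptance_checks() {
  local failed=0
  if check_health; then echo "PASS /health"; else echo "FAIL /health"; failed=1; fi
  if check_account; then echo "PASS /v1/me"; else echo "FAIL /v1/me"; failed=1; fi
  return "$failed"
}
```

Browser login and the credential-hygiene criterion still need a manual check; the script covers only the two unauthenticated and authenticated GET requests.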

## Common blockers

- 403 on POST /v1/accounts with role=admin: admin account creation intentionally requires WAYO_API_TOKEN. Use the public user account flow instead.
- 429: public account issuance is rate-limited per client network; retry later or ask the operator.
- 401: missing or wrong WAYO_ACCOUNT_ID.
- 403 on job/audio: account disabled or not allowed for the requested job.
- Cloudflare 1010: retry with a normal User-Agent header.
- 202: transcription is still processing; poll /v1/jobs/<job_id>.
- Local fallback worked but /app has no job: the audio did not go through this API path.
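
For the 202 case, a polling loop along these lines may help. This is a sketch under assumptions: the function name wait_for_job and its defaults are hypothetical, and it assumes GET /v1/jobs/<job_id> returns JSON with the same status field (completed, queued, processing, or error) as the upload response.

```bash
# Sketch: poll a job until it leaves the queued/processing states.

wait_for_job() {
  local job_id="$1"
  local max_attempts="${2:-30}"
  local delay_seconds="${3:-2}"
  local attempt body status

  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    body="$(curl -fsS -A "tashika-agent-setup/1.0" \
      -H "Authorization: Bearer $WAYO_ACCOUNT_ID" \
      "$WAYO_VOICE_API_URL/v1/jobs/$job_id")" || return 1

    # Read the first "status" value; assumes a plain lowercase string.
    status="$(printf '%s' "$body" \
      | grep -oE '"status"[[:space:]]*:[[:space:]]*"[a-z_]+"' \
      | head -n 1 | sed 's/.*"\([a-z_]*\)"$/\1/')"

    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      error)     echo "job $job_id failed" >&2; return 1 ;;
      *)         sleep "$delay_seconds" ;;
    esac
  done

  echo "job $job_id still not complete after $max_attempts attempts" >&2
  return 1
}
```

On success the function prints the final job JSON, so the caller can read the text field from it; on error or timeout it returns a non-zero status.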