Initial commit: InkFlow, EPUB to local audiobook (MLX/Kokoro)

2026-06-21 00:10:11 +02:00
commit d3bb91394b
71 changed files with 8138 additions and 0 deletions

23
.gitignore vendored Normal file

@@ -0,0 +1,23 @@
# Python
.venv/
__pycache__/
*.pyc
*.egg-info/
.pytest_cache/
# InkFlow: generated artifacts and outputs
data/
output/
# Node
node_modules/
# Audio samples (large, not versioned)
samples/
# Models / HF caches (in case they are downloaded locally)
.cache/
models/
# OS
.DS_Store

10
.idea/.gitignore generated vendored Normal file

@@ -0,0 +1,10 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Ignored default folder with query files
/queries/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml

105
README.md Normal file

@@ -0,0 +1,105 @@
# InkFlow
Turns an **EPUB** into an **audiobook**, 100% locally on a Mac (Apple Silicon / MLX),
with open-source models. Output: **1 folder per book, 1 MP3 per chapter**
(ID3 tags + cover), laid out like a conventional audiobook.
- **Text analysis**: Gemma via `mlx-lm` (narration/dialogue segmentation,
  speaker attribution, cast extraction, pronunciations).
- **Speech synthesis**: pluggable backend:
  - **Kokoro**: fast, preset voices → previews / single narrator.
  - **Qwen3-TTS**: quality + cloning from a reference audio clip → final render, per-character casting.
- **Language**: optimized for French (multilingual to follow).
## Prerequisites
- macOS on Apple Silicon (arm64), Python ≥ 3.11
- `ffmpeg` and `espeak-ng`:
```bash
brew install ffmpeg espeak-ng
```
## Installation
```bash
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e backend                   # installs inkflow + dependencies
python backend/scripts/setup_models.py   # checks the environment + downloads the MLX models
```
> Kokoro in French requires `espeak-ng`; InkFlow locates
> `libespeak-ng.dylib` automatically (otherwise, export `PHONEMIZER_ESPEAK_LIBRARY`).
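
The library lookup described in the note could look roughly like the sketch below. It is not InkFlow's actual code: the candidate paths are assumed Homebrew defaults, and `find_espeak_library` / `ensure_espeak_env` are hypothetical helpers. Only the `PHONEMIZER_ESPEAK_LIBRARY` variable comes from this README.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Assumed Homebrew install locations (Apple Silicon first, then Intel).
DEFAULT_CANDIDATES = (
    "/opt/homebrew/lib/libespeak-ng.dylib",
    "/usr/local/lib/libespeak-ng.dylib",
)


def find_espeak_library(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return an explicit override if set, else the first candidate that exists."""
    override = os.environ.get("PHONEMIZER_ESPEAK_LIBRARY")
    if override:
        return override
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def ensure_espeak_env(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Export the variable when a library is found; return the path used, if any."""
    path = find_espeak_library(candidates)
    if path:
        os.environ.setdefault("PHONEMIZER_ESPEAK_LIBRARY", path)
    return path
```

Calling `ensure_espeak_env()` before the TTS backend is imported would leave a user-provided value untouched and only fill in the variable when it is missing.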
## Usage (CLI)
```bash
# 1. Parse the EPUB -> data/<slug>/book.json + chapters/chNN.json
inkflow parse "samples/Colère de Tiamat, La - James S.A. Corey.epub"
# 2. Analyze (Gemma) -> analysis/chNN.json + cast.json
inkflow analyze la-colere-de-tiamat --chapter 5   # a single chapter
inkflow analyze la-colere-de-tiamat               # all chapters
# 3. Synthesize a chapter -> output/<book>/NN-....mp3
inkflow render la-colere-de-tiamat 5 --backend kokoro            # fast
inkflow render la-colere-de-tiamat 5 --backend qwen3 --no-mono   # quality + multi-voice (M3)
# Info
inkflow info la-colere-de-tiamat
```
(Without the `-e` install, run from `backend/` via `python -m inkflow.cli …`.)
## Web interface
```bash
# 1. Build the frontend (once)
cd frontend && npm install && npm run build && cd ..
# 2. Start the app (API + UI served on the same port)
inkflow serve   # http://127.0.0.1:8000
```
The UI supports: drag-and-drop EPUB import, real-time tracking of the pipeline steps
(WebSocket), cast editing (character → voice, with preview), pronunciation
dictionary editing, engine selection (Kokoro/Qwen3), and chapter rendering
with an audio player + download.
For frontend development with hot reload:
```bash
inkflow serve                # backend on :8000
cd frontend && npm run dev   # UI on :5173 (API/WS proxied to :8000)
```
## Architecture
```
backend/inkflow/
  epub/parser.py               EPUB -> book.json + per-chapter text
  analysis/gemma.py            mlx-lm wrapper (Gemma)
  analysis/segmenter.py        narration/dialogue + speakers + casting
  analysis/pronunciation.py
  tts/base.py                  TTSBackend interface + VoiceSpec
  tts/kokoro.py  tts/qwen3.py  tts/factory.py
  audio/postprocess.py         concat + normalization + MP3 (ffmpeg) + cover
  pipeline/render.py           (segments + voices) -> MP3
  store/artifacts.py           JSON persistence (resumable)
data/<slug>/                   intermediate artifacts (json, wav, cover)
output/<book>/                 final MP3s (1 per chapter)
voicebank/                     reference clips for cloning (M3)
```
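
The `tts/base.py` and `tts/factory.py` entries suggest an interface plus a factory. A minimal sketch of that pattern follows. Apart from the names `TTSBackend` and `VoiceSpec`, everything here (the fields, the `synthesize` method, `SilentBackend`, `make_backend`) is a hypothetical illustration, not InkFlow's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical voice description: a preset name or a reference clip."""
    name: str
    reference_audio: Optional[str] = None  # path to a cloning sample, if any


class TTSBackend(Protocol):
    """Anything that can turn text into audio bytes for a given voice."""

    def synthesize(self, text: str, voice: VoiceSpec) -> bytes: ...


class SilentBackend:
    """Stand-in engine that returns empty audio; a real one would call a TTS model."""

    def synthesize(self, text: str, voice: VoiceSpec) -> bytes:
        return b""


# Registry of engine constructors, keyed by the name passed on the command line.
_REGISTRY: Dict[str, Callable[[], TTSBackend]] = {"silent": SilentBackend}


def make_backend(name: str) -> TTSBackend:
    """Instantiate a registered backend or fail with a clear error."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

With this shape, the render pipeline only depends on `TTSBackend`, so adding an engine means registering one more constructor.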
## Progress
- [x] **M1**: EPUB parsing, Gemma analysis (segments + casting), CLI.
- [x] **M2**: End-to-end TTS (Kokoro/Qwen3), single narrator → tagged MP3 + cover.
- [x] **M3**: Multi-voice: voice bank + automatic character → voice casting (Qwen3 cloning).
- [x] **M4**: Web interface (FastAPI + WebSocket + React): progress tracking, casting/pronunciation editors, previews.
- [x] **M5**: Resumable state (reconciliation with existing artifacts), batch runs via UI/CLI.
### Notes on the engines
- **Kokoro**: ~30 s/chapter, voices distinguished by timbre (fast render, drafts).
- **Qwen3-TTS**: clones voice-bank voices per character, higher quality,
  markedly slower, so reserved for the final render. Every render is **resumable**
  chapter by chapter (re-running does not regenerate MP3s already produced).
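
Chapter-level resumption as described above can be sketched as follows. This is an illustration under assumptions, not InkFlow's implementation: the `NN.mp3` file naming, `render_missing`, and the `render_one` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def render_missing(
    chapters: Iterable[int],
    out_dir: Path,
    render_one: Callable[[int, Path], None],
) -> List[int]:
    """Render only the chapters whose MP3 is absent; return the indices rendered."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered: List[int] = []
    for index in chapters:
        target = out_dir / f"{index:02d}.mp3"  # assumed naming scheme
        if target.exists() and target.stat().st_size > 0:
            continue  # already produced by a previous run
        # Write to a temporary name first so an interrupted run never leaves
        # a truncated file that would later be mistaken for a finished chapter.
        tmp = target.with_name(target.name + ".part")
        render_one(index, tmp)
        tmp.replace(target)
        rendered.append(index)
    return rendered
```

Running the function twice in a row renders nothing the second time; deleting one MP3 causes only that chapter to be rendered again.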


@@ -0,0 +1,123 @@
"""mlx-lm wrapper around Gemma for text analysis.

Loads the model lazily (only once per process) and exposes generation
helpers, including a tolerant `generate_json` that extracts the first valid
JSON object/array from the model output.
"""
from __future__ import annotations
import json
import re
from functools import lru_cache
from typing import Any, Optional
from ..settings import get_settings
# Bounds of a JSON block in a potentially chatty response.
_JSON_SPAN_RE = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


@lru_cache(maxsize=2)
def _load(model_id: str):
    # Lazy import: avoids loading mlx until an analysis is actually run.
    from mlx_lm import load
    return load(model_id)


class Gemma:
    """Small facade around mlx-lm for driving Gemma."""

    def __init__(self, model_id: Optional[str] = None):
        self.model_id = model_id or get_settings().gemma_model
        self._model = None
        self._tokenizer = None

    def _ensure_loaded(self) -> None:
        if self._model is None:
            self._model, self._tokenizer = _load(self.model_id)

    def generate(
        self,
        prompt: str,
        *,
        system: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
    ) -> str:
        """Generate a text response from a prompt (chat template).

        `max_tokens`/`temperature` fall back to the current settings when omitted.
        """
        self._ensure_loaded()
        settings = get_settings()
        if max_tokens is None:
            max_tokens = settings.gemma_max_tokens
        if temperature is None:
            temperature = settings.gemma_temperature
        from mlx_lm import generate
        from mlx_lm.sample_utils import make_sampler
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})
        formatted = self._tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=False
        )
        sampler = make_sampler(temp=temperature)
        return generate(
            self._model,
            self._tokenizer,
            prompt=formatted,
            max_tokens=max_tokens,
            sampler=sampler,
            verbose=False,
        )

    def generate_json(
        self,
        prompt: str,
        *,
        system: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        retries: int = 1,
    ) -> Any:
        """Generate, then parse, a JSON value. Retries when parsing fails.

        `max_tokens`/`temperature` fall back to the current settings when omitted.
        """
        last_err: Optional[Exception] = None
        for attempt in range(retries + 1):
            # Retries run at temperature 0.0 to get a more predictable output.
            raw = self.generate(
                prompt, system=system, max_tokens=max_tokens,
                temperature=temperature if attempt == 0 else 0.0,
            )
            try:
                return _extract_json(raw)
            except Exception as exc:  # noqa: BLE001
                last_err = exc
        raise ValueError(
            f"Invalid JSON response after {retries + 1} attempts: {last_err}"
        ) from last_err


def _extract_json(text: str) -> Any:
    """Extract the first JSON object/array from a free-form model response.

    Tolerates stray text before/after (including a second block) thanks to
    raw_decode, which stops at the first complete JSON value.
    """
    text = text.strip()
    fence = _FENCE_RE.search(text)
    if fence:
        text = fence.group(1).strip()
    decoder = json.JSONDecoder()
    # Look for the first start of a JSON structure and decode from there.
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                obj, _ = decoder.raw_decode(text[i:])
                return obj
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON found in the response")
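
The `raw_decode` scan used by `_extract_json` can be tried in isolation. The sketch below is a standalone reduction for illustration (the name `first_json` is hypothetical and the code-fence handling is left out); it only relies on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any


def first_json(text: str) -> Any:
    """Return the first decodable JSON object/array found in `text`."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                # raw_decode parses one complete value and ignores what follows.
                obj, _end = decoder.raw_decode(text[i:])
                return obj
            except json.JSONDecodeError:
                continue  # false start (e.g. a stray brace): keep scanning
    raise ValueError("no JSON found")


chatty = 'Sure! Here is the result: [{"i": 0, "speaker": "Holden"}] Hope it helps {"x": 1}'
print(first_json(chatty))
```

A malformed opening such as `{broken` is skipped rather than aborting the scan, so a later valid block is still found.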


@@ -0,0 +1,59 @@
"""Pronunciation dictionary: applying entries + proposing candidates.

Applying an entry is a plain surface rewrite of the text (guided spelling)
before synthesis. Candidates (proper nouns, sci-fi terms) can be proposed
by Gemma and then validated by the user in the UI.
"""
from __future__ import annotations
import re
from typing import Iterable
from ..models import Pronunciation, PronunciationEntry
from ..settings import get_settings
from .gemma import Gemma


def apply_pronunciation(text: str, pron: Pronunciation) -> str:
    """Replace each enabled term with its phonetic spelling (whole word only)."""
    for entry in pron.entries:
        if not entry.enabled or not entry.term:
            continue
        pattern = re.compile(rf"\b{re.escape(entry.term)}\b")
        text = pattern.sub(entry.replacement, text)
    return text


# The system prompt is editable in the settings (settings.prompt_pronunciation).
def propose_pronunciations(text: str, gemma: Gemma, *, max_chars: int = 16000) -> list[PronunciationEntry]:
    """Propose pronunciation candidates for the user to validate."""
    sample = text[:max_chars]
    prompt = (
        "Repere dans cet extrait les mots a risque de mauvaise prononciation par "
        "une voix de synthese francaise. Pour chacun, propose une graphie "
        "phonetique francaise (replacement) qui guide la prononciation.\n\n"
        f"EXTRAIT:\n{sample}\n\n"
        'Reponds par un tableau JSON: '
        '[{"term":"Tiamat","replacement":"Tia-matt","note":"nom propre"}]'
    )
    result = gemma.generate_json(prompt, system=get_settings().prompt_pronunciation)
    entries: list[PronunciationEntry] = []
    for item in result:
        # Keep only well-formed items that carry both a term and a replacement.
        if isinstance(item, dict) and item.get("term") and item.get("replacement"):
            entries.append(PronunciationEntry(
                term=str(item["term"]).strip(),
                replacement=str(item["replacement"]).strip(),
                note=item.get("note"),
            ))
    return entries


def merge_pronunciations(
    existing: Pronunciation, new: Iterable[PronunciationEntry]
) -> Pronunciation:
    # Existing entries win: a new candidate is added only for an unseen term
    # (case-insensitive comparison).
    by_term = {e.term.lower(): e for e in existing.entries}
    for e in new:
        by_term.setdefault(e.term.lower(), e)
    return Pronunciation(entries=list(by_term.values()))
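
The whole-word rewrite performed by `apply_pronunciation` can be illustrated on its own. The sketch below is a standalone approximation: `Entry` and `apply_entries` are hypothetical stand-ins for the project's `PronunciationEntry` and `Pronunciation` models, which are not shown in this commit view.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Entry:
    """Hypothetical stand-in for a pronunciation entry."""
    term: str
    replacement: str
    enabled: bool = True


def apply_entries(text: str, entries: Iterable[Entry]) -> str:
    """Rewrite each enabled term, matching whole words only."""
    for entry in entries:
        if not entry.enabled or not entry.term:
            continue
        pattern = re.compile(rf"\b{re.escape(entry.term)}\b")
        # A function replacement keeps backslashes in the spelling literal.
        text = pattern.sub(lambda _m, rep=entry.replacement: rep, text)
    return text


print(apply_entries("Tiamat et les Tiamats", [Entry("Tiamat", "Tia-matt")]))
```

The `\b` anchors leave "Tiamats" untouched, and a disabled entry is skipped entirely.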


@@ -0,0 +1,622 @@
"""Narration/dialogue segmentation + speaker attribution + casting.

Hybrid approach:
1. Deterministic pre-segmentation at paragraph level (French punctuation
   rules: a paragraph starting with an em dash, U+2014, is a dialogue line).
2. Gemma assigns a speaker to each dialogue line, in a single call per chapter
   (numbered list + context), and extracts the cast (characters + attributes).

Fine-grained splitting of incises (speech tags such as "..., dit-il") is left
to a later pass; in v1 the whole line is spoken by the character's voice.
"""
from __future__ import annotations
import re
from typing import Optional
from ..models import (
Cast,
Chapter,
ChapterAnalysis,
ChapterText,
Character,
Incise,
Segment,
SegmentType,
)
from ..settings import get_settings
from .gemma import Gemma
# A dialogue paragraph starts with an em dash (U+2014) or a horizontal bar (U+2015).
_DIALOGUE_LEAD_RE = re.compile(r"^\s*[—―]\s*")
# --- Incise detection (French verb-subject inversion) --------------------------
# An incise is a piece of narration inserted in a dialogue line ("..., dit-il.").
# tu/nous/vous are excluded (imperatives "Donne-le-moi", "Crois-tu ?") to limit
# false positives. See `detect_incises` below for the two passes
# (verb-pronoun inversion + nominal "lanca Drummer", cast-aware).
_INCISE_PRON = r"(?:il|elle|on|ils|elles|je)"
# Speech verb, possibly reflexive ("s'ecria", "s'exclama").
_INCISE_VERB = r"(?:[A-Za-zÀ-ÿ]+['])?[A-Za-zÀ-ÿ]{2,}"


def segment_chapter_text(ct: ChapterText) -> list[Segment]:
    """Split a chapter into narration/dialogue segments (rules only)."""
    segments: list[Segment] = []
    for para in ct.paragraphs:
        if _DIALOGUE_LEAD_RE.match(para):
            # Strip the leading dash; the speaker is resolved later.
            text = _DIALOGUE_LEAD_RE.sub("", para).strip()
            segments.append(Segment(
                type=SegmentType.DIALOGUE, text=text, speaker="?"))
        else:
            segments.append(Segment(
                type=SegmentType.NARRATION, text=para, speaker="narrateur"))
    return segments


# --- Speaker attribution (Gemma) ----------------------------------------------
# The system prompt is editable in the settings (settings.prompt_speakers).
_UNKNOWN = {"", "?", "inconnu", "narrateur"}
_CTX_CHARS = 160  # truncation of the narrative context before/after
_CHUNK_MAX_DIALOGUES = 30  # dialogue lines per call (model reliability)


def attribute_speakers(
    segments: list[Segment],
    gemma: Gemma,
    *,
    characters: Optional[list[Character]] = None,
    pov: Optional[str] = None,
) -> dict[int, str]:
    """Set `speaker` on every dialogue segment (mutated in place).

    Gives the model the enriched canonical character list (name, gender,
    description) and, for each line, the narrative context BEFORE and AFTER
    (the attribution tag often follows the line: "Bonjour. dit Marie.").
    Returns a map {segment_index: confidence} ("high"/"medium"/"low"), kept
    in memory (not persisted) to drive the retroactive second pass.
    A line whose returned name is outside the supplied list is kept but
    marked "low" so that it gets re-examined.
    """
    dialogues = [(i, s) for i, s in enumerate(segments)
                 if s.type is SegmentType.DIALOGUE]
    if not dialogues:
        return {}
    # Lines already resolved (seeded by an incise): shown as fixed context,
    # never re-submitted to the model. If everything is resolved, nothing to do.
    locked = {i for i, s in dialogues if _is_resolved(s.speaker)}
    if len(locked) == len(dialogues):
        return {i: "high" for i, _ in dialogues}
    hint = _speakers_hint(characters, pov)
    valid = {c.name.strip().lower() for c in (characters or [])}
    confidence: dict[int, str] = {}
    for chunk in _chunk_dialogues(dialogues, segments, hint):
        prompt = (
            "Voici les repliques de dialogue d'un extrait, numerotees, avec la "
            "narration qui precede et qui suit chaque replique. Les repliques "
            "deja attribuees affichent (locuteur: X) : ne les modifie pas, "
            "sers-t'en comme contexte (alternance des tours). Pour les AUTRES, "
            "indique le personnage qui parle (recopie son nom depuis la liste "
            "fournie ; 'inconnu' si vraiment indeterminable) et ta confiance "
            "(high/medium/low)."
            f"{hint}\n\n" + "\n".join(line for _, line in chunk) +
            '\n\nReponds par un tableau JSON: '
            '[{"i": 0, "speaker": "Holden", "confidence": "high"}, ...]'
        )
        result = gemma.generate_json(prompt, system=get_settings().prompt_speakers)
        by_i: dict[int, dict] = {item["i"]: item for item in result
                                 if isinstance(item, dict) and "i" in item}
        for j, (seg_idx, _line) in enumerate(chunk):
            if seg_idx in locked:  # seed kept as-is
                confidence[seg_idx] = "high"
                continue
            seg = segments[seg_idx]
            item = by_i.get(j) or {}
            speaker = (str(item.get("speaker") or "inconnu").strip()
                       or "inconnu")
            conf = str(item.get("confidence") or "low").strip().lower()
            if conf not in {"high", "medium", "low"}:
                conf = "low"
            # Name outside the known list -> keep the name but re-examine it.
            if (valid and speaker.lower() not in _UNKNOWN
                    and speaker.lower() not in valid):
                conf = "low"
            seg.speaker = speaker
            confidence[seg_idx] = conf
    return confidence


def _speakers_hint(characters: Optional[list[Character]], pov: Optional[str]) -> str:
    hint = ""
    if characters:
        lines = []
        for c in characters:
            attrs = c.gender or ""
            # Separate the description from the name/attributes.
            desc = f": {c.description}" if c.description else ""
            lines.append(f"- {c.name}" + (f" ({attrs})" if attrs else "") + desc)
        hint += "\nPersonnages du chapitre:\n" + "\n".join(lines)
    if pov:
        hint += f"\nLe point de vue de ce chapitre est: {pov}."
    return hint


def _is_resolved(speaker: str) -> bool:
    """True if the line already has a reliable speaker (incise seed, etc.)."""
    return (speaker or "").strip().lower() not in _UNKNOWN


def _dialogue_line(n: int, segments: list[Segment], idx: int) -> str:
    seg = segments[idx]
    # Line already resolved (e.g. seeded by an incise) -> shown as fixed context.
    if _is_resolved(seg.speaker):
        return f"[{n}] (locuteur: {seg.speaker}) REPLIQUE: {seg.text!r}"
    before = _adjacent_narration(segments, idx, -1)
    after = _adjacent_narration(segments, idx, +1)
    parts = [f"[{n}]"]
    if before:
        parts.append(f"(avant: {before!r})")
    parts.append(f"REPLIQUE: {seg.text!r}")
    if after:
        parts.append(f"(apres: {after!r})")
    return " ".join(parts)


def _adjacent_narration(segments: list[Segment], idx: int, direction: int) -> str:
    """Text of the immediately adjacent narration (attribution tag)."""
    j = idx + direction
    if 0 <= j < len(segments) and segments[j].type is SegmentType.NARRATION:
        return segments[j].text[:_CTX_CHARS]
    return ""


def _chunk_dialogues(
    dialogues: list[tuple[int, Segment]],
    segments: list[Segment],
    hint: str,
) -> list[list[tuple[int, str]]]:
    """Split the dialogue lines into batches that fit under `_MAX_PROMPT_CHARS`.

    Each batch is a list of (segment_index, rendered_line); the line is
    numbered locally (0..k) for the prompt, while the segment index is used to
    map the answer back. A batch also holds at most `_CHUNK_MAX_DIALOGUES`
    lines. Avoids abrupt truncation on long chapters.
    """
    budget = _MAX_PROMPT_CHARS - len(hint) - 400  # margin for the instructions
    chunks: list[list[tuple[int, str]]] = []
    current: list[tuple[int, str]] = []
    size = 0
    for idx, _seg in dialogues:
        line = _dialogue_line(len(current), segments, idx)
        if current and (size + len(line) > budget
                        or len(current) >= _CHUNK_MAX_DIALOGUES):
            chunks.append(current)
            current = []
            size = 0
            # Re-render the line with local number 0 for the new batch.
            line = _dialogue_line(0, segments, idx)
        current.append((idx, line))
        size += len(line) + 1
    if current:
        chunks.append(current)
    return chunks


# --- Retroactive pass: re-resolving undetermined lines -------------------------
# The system prompt is editable (settings.prompt_speakers_refine).
def _refine_unknown_speakers(
    segments: list[Segment],
    gemma: Gemma,
    *,
    characters: Optional[list[Character]] = None,
    confidence: dict[int, str],
) -> None:
    """Second pass: re-resolve lines left undetermined or with low confidence.

    Each doubtful line is presented with its neighbouring dialogue lines that
    are ALREADY identified (turn alternation) and its narrative context, to
    exploit information coming from the *following* lines. Mutates in place;
    no Gemma call is made if nothing is doubtful.
    """
    dialogues = [(i, s) for i, s in enumerate(segments)
                 if s.type is SegmentType.DIALOGUE]
    if not dialogues:
        return
    pos = {seg_idx: n for n, (seg_idx, _s) in enumerate(dialogues)}
    doubtful = [seg_idx for seg_idx, _s in dialogues
                if segments[seg_idx].speaker.strip().lower() in _UNKNOWN
                or confidence.get(seg_idx) == "low"]
    if not doubtful:
        return
    hint = _speakers_hint(characters, pov=None)
    lines = []
    for j, seg_idx in enumerate(doubtful):
        n = pos[seg_idx]
        ctx = []
        if n > 0:
            prev_idx = dialogues[n - 1][0]
            ctx.append(f"replique precedente (dite par "
                       f"{segments[prev_idx].speaker}): "
                       f"{segments[prev_idx].text[:_CTX_CHARS]!r}")
        before = _adjacent_narration(segments, seg_idx, -1)
        if before:
            ctx.append(f"narration avant: {before!r}")
        after = _adjacent_narration(segments, seg_idx, +1)
        if after:
            ctx.append(f"narration apres: {after!r}")
        if n < len(dialogues) - 1:
            next_idx = dialogues[n + 1][0]
            ctx.append(f"replique suivante (dite par "
                       f"{segments[next_idx].speaker}): "
                       f"{segments[next_idx].text[:_CTX_CHARS]!r}")
        ctx_str = (" [" + " ; ".join(ctx) + "]") if ctx else ""
        lines.append(f"[{j}]{ctx_str} REPLIQUE: {segments[seg_idx].text!r}")
    prompt = (
        "Repliques au locuteur indetermine. Pour chacune, en t'appuyant sur les "
        "repliques voisines DEJA attribuees (alternance des tours) et le "
        "contexte, indique qui parle (recopie le nom depuis la liste ; "
        "'inconnu' si toujours indeterminable)."
        f"{hint}\n\n" + "\n".join(lines) +
        '\n\nReponds par un tableau JSON: [{"i": 0, "speaker": "Holden"}, ...]'
    )
    result = gemma.generate_json(_truncate(prompt),
                                 system=get_settings().prompt_speakers_refine)
    by_i = {item["i"]: item.get("speaker") for item in result
            if isinstance(item, dict) and "i" in item}
    for j, seg_idx in enumerate(doubtful):
        new = (str(by_i.get(j) or "").strip())
        # Only overwrite when the model returned an actual name.
        if new and new.lower() not in _UNKNOWN:
            segments[seg_idx].speaker = new


# --- Cast extraction (Gemma) --------------------------------------------------
# The system prompt is editable in the settings (settings.prompt_characters).
def extract_characters(text: str, gemma: Gemma) -> list[Character]:
    """Extract the characters and their attributes (gender, age) from a text."""
    prompt = (
        "A partir de l'extrait suivant, liste les personnages qui parlent ou "
        "sont nommes. Pour chacun, donne: name (nom court canonique), gender "
        "(male/female/unknown), age (child/young/adult/old/unknown), et une "
        "courte description. Ignore les figurants sans nom.\n\n"
        f"EXTRAIT:\n{_truncate(text)}\n\n"
        'Reponds par un tableau JSON: '
        '[{"name":"Holden","gender":"male","age":"adult","description":"..."}]'
    )
    result = gemma.generate_json(prompt, system=get_settings().prompt_characters)
    characters: list[Character] = []
    for item in result:
        if not isinstance(item, dict) or not item.get("name"):
            continue
        characters.append(Character(
            name=str(item["name"]).strip(),
            gender=_norm(item.get("gender")),
            age=_norm(item.get("age")),
            description=(item.get("description") or None),
        ))
    return characters


def merge_characters(existing: list[Character], new: list[Character]) -> list[Character]:
    """Merge two character lists by name (case-insensitive)."""
    by_key = {c.name.lower(): c for c in existing}
    for c in new:
        key = c.name.lower()
        if key in by_key:
            # Known character: only fill in the attributes still missing.
            cur = by_key[key]
            cur.gender = cur.gender or c.gender
            cur.age = cur.age or c.age
            cur.description = cur.description or c.description
        else:
            by_key[key] = c
    return list(by_key.values())


def _norm(value) -> Optional[str]:
    """Normalize an attribute; empty values and "unknown" become None."""
    if not value:
        return None
    v = str(value).strip().lower()
    return v if v and v != "unknown" else None


# --- Helpers -------------------------------------------------------------------
# Context guard (in characters) to stay within a reasonable window.
_MAX_PROMPT_CHARS = 24000


def _truncate(text: str) -> str:
    return text if len(text) <= _MAX_PROMPT_CHARS else text[:_MAX_PROMPT_CHARS]


# --- Incise detection (deterministic, cast-aware) ------------------------------
# Incises are annotated as bounds (offsets) on the persisted dialogue line
# (non-destructive); at render time they are spoken by the narrator's voice. Two
# complementary passes:
#   1. verb-pronoun inversion ("dit-il", "coupa-t-elle");
#   2. nominal: speech verb + known subject (cast name OR role noun,
#      e.g. "compatit Holden", "lanca Drummer", "informa le soldat").
# The nominal pass relies on the character list, which keeps false positives low
# and makes it possible to extract the explicit speaker (seeding the attribution).
# Optional object pronoun before the verb ("lui demanda un garde").
_CLITIC = r"(?:lui|leur|nous|vous|me|te|se|y|en|[mts]['])"
# Conjugated forms of speech verbs (3rd person, simple past / present /
# imperfect). Curated list: missing an incise is preferred to inventing one.
_SPEECH_VERBS = {
"dit", "disait", "redit", "répondit", "repondit", "répond", "repond",
"répondait", "repondait", "demanda", "demandait", "demande", "interrogea",
"questionna", "ecria", "écria", "exclama", "enquit", "lança", "lanca",
"lançait", "lance", "murmura", "chuchota", "souffla", "soupira", "ajouta",
"ajoute", "reprit", "poursuivit", "poursuit", "continua", "enchaîna",
"enchaina", "fit", "faisait", "remarqua", "observa", "nota", "déclara",
"declara", "affirma", "assura", "rétorqua", "retorqua", "répliqua",
"repliqua", "riposta", "objecta", "protesta", "insista", "renchérit",
"rencherit", "acquiesça", "acquiesca", "admit", "avoua", "convint",
"concéda", "conceda", "rectifia", "corrigea", "précisa", "precisa",
"expliqua", "raconta", "annonça", "annonca", "proclama", "ordonna",
"commanda", "supplia", "implora", "gémit", "gemit", "grogna", "ronchonna",
"maugréa", "maugrea", "marmonna", "glissa", "lâcha", "lacha", "coupa",
"interrompit", "conclut", "compléta", "completa", "suggéra", "suggera",
"proposa", "promit", "jura", "menaça", "menaca", "ironisa", "plaisanta",
"railla", "cria", "hurla", "tonna", "gronda", "rugit", "susurra",
"compatit", "salua", "appela", "héla", "hela", "interpella", "balbutia",
"bredouilla", "bafouilla", "gloussa", "ricana", "siffla", "tempêta",
"tempeta", "rétorque", "lâche", "informa", "renseigna", "indiqua",
"rappela", "avertit", "prévint", "prevint", "intima", "rétorquait",
"lançait", "questionnait", "reconnut", "constata", "répéta", "repeta",
}
# Role nouns that can be the subject of an incise ("informa le soldat").
_ROLE_NOUNS = {
"garde", "soldat", "sentinelle", "gardien", "prêtre", "pretre", "homme",
"femme", "fille", "garçon", "garcon", "vieille", "vieillard", "capitaine",
"lieutenant", "sergent", "général", "general", "amiral", "officier", "voix",
"inconnu", "inconnue", "étranger", "etranger", "enfant", "serviteur",
"servante", "messager", "domestique", "médecin", "medecin",
}
# Stop words ignored when indexing the tokens of a character name.
_NAME_STOP = {
"le", "la", "les", "un", "une", "de", "du", "des", "monsieur", "madame",
"mademoiselle", "m", "mme", "mlle", "mr", "dr", "docteur", "saint", "sainte",
}
# Punctuation marks that end the spoken part: if the incise follows one of them,
# the whole rest of the line is narration (the speech is over). After a plain
# comma, on the other hand, the dialogue resumes after the incise.
# The empty string stands for "start of the line".
_SENTENCE_FINAL = {"", ".", "!", "?", "…"}


def _incise_end(text: str, close_end: int, lead: str) -> int:
    """Effective end of the incise: the end of the line if the speech was
    already closed on the left (`.`/`!`/`?`/`…` or start), else the closing mark."""
    return len(text) if lead in _SENTENCE_FINAL else close_end


# Pass 1: verb-(t-)pronoun inversion, anchored on a punctuation mark to the left
# (comma, period, ?, !, …) or on the start of the line.
_INVERSION_RE = re.compile(
    r"(?P<lead>[,.!?…]|^)\s*"
    r"(?P<inc>" + _INCISE_VERB + r"-(?:t-)?" + _INCISE_PRON +
    r"(?:\s+[^.!?…»\",;]*?)?)"  # optional complements ("dit-il en souriant")
    r"(?P<close>[.!?…,])",  # closing mark: strong punctuation OR comma
    re.IGNORECASE,
)


def _inversion_spans(text: str) -> list[tuple[int, int]]:
    return [(m.start("inc"), _incise_end(text, m.end("close"), m.group("lead")))
            for m in _INVERSION_RE.finditer(text)]


def _name_token_index(names) -> dict[str, str]:
    """Index token -> canonical name (distinctive tokens only).

    A token shared by several characters is ambiguous and is discarded.
    """
    idx: dict[str, str] = {}
    ambiguous: set[str] = set()
    for name in names or ():
        for tok in re.split(r"[^\wÀ-ÿ]+", name):
            t = tok.lower()
            if len(t) < 2 or t in _NAME_STOP:
                continue
            if t in idx and idx[t] != name:
                ambiguous.add(t)
            else:
                idx[t] = name
    for t in ambiguous:
        idx.pop(t, None)
    return idx


# Proper noun: capitalized initial (case-sensitive pattern).
_PROPER = r"[A-ZÀ-Ÿ][\wÀ-ÿ’'\-]+"
_REJECT = object()  # the candidate is not a subject -> not an incise


def _classify_subject(subj: str, idx: dict[str, str]):
    """Speaker carried by the subject of a nominal incise.

    - known character -> canonical name;
    - unknown proper noun (capitalized) -> surface name (seeded anyway: the
      text names it, regardless of how reliable the extraction was);
    - generic role noun ("le soldat") -> None (real incise, no seed);
    - any other word -> _REJECT (not an incise).
    """
    low = subj.lower()
    if low in idx:
        return idx[low]
    if low in _ROLE_NOUNS:
        return None
    if subj[:1].isupper() and len(low) >= 2 and low not in _NAME_STOP:
        return subj.strip("'")
    return _REJECT


def _nominal_matches(text: str, names) -> list[tuple[int, int, Optional[str]]]:
    """Pass 2: (start, end, speaker) for each nominal incise.

    A nominal incise = speech verb + subject (cast name, proper noun, or role
    noun). A proper-noun subject is seeded even when absent from the cast.
    """
    idx = _name_token_index(names)
    literals = sorted(set(idx) | _ROLE_NOUNS, key=len, reverse=True)
    lit_alt = "|".join(re.escape(s) for s in literals)
    # Subject: known name/role (case-insensitive) OR proper noun (capitalized,
    # case-sensitive so that a determiner "un"/"le" is not swallowed). No global
    # IGNORECASE.
    subj_alt = (f"(?i:{lit_alt})|{_PROPER}") if lit_alt else _PROPER
    verbs = "|".join(re.escape(v) for v in sorted(_SPEECH_VERBS, key=len, reverse=True))
    pat = re.compile(
        r"(?P<lead>[,.!?…]|^)\s*"
        r"(?P<inc>(?:(?i:" + _CLITIC + r")\s+)?"
        r"(?i:" + verbs + r")\b"
        r"[^.!?…»\",;]{0,40}?\b"
        r"(?P<subj>" + subj_alt + r")\b"
        r"[^.!?…»\",;]*?)"
        r"(?P<close>[.!?…,])",
    )
    out: list[tuple[int, int, Optional[str]]] = []
    for m in pat.finditer(text):
        spk = _classify_subject(m.group("subj"), idx)
        if spk is _REJECT:
            continue
        out.append((m.start("inc"),
                    _incise_end(text, m.end("close"), m.group("lead")), spk))
    return out


def _merge_spans(spans: list[tuple[int, int]]) -> list[Incise]:
    """Sort and merge (without overlap) a list of bounds -> Incise."""
    out: list[Incise] = []
    last_end = -1
    for s, e in sorted(set(spans)):
        if s < last_end:  # overlap -> keep the first one seen
            continue
        out.append(Incise(start=s, end=e))
        last_end = e
    return out


def detect_incises(text: str, *, names=None) -> list[Incise]:
    """Bounds of the incises in a dialogue line (inversion + cast-aware nominal)."""
    spans = _inversion_spans(text)
    spans += [(s, e) for s, e, _ in _nominal_matches(text, names or set())]
    return _merge_spans(spans)


def incise_speaker(text: str, incise: Incise, names) -> Optional[str]:
    """Explicit speaker carried by a nominal incise ("compatit Holden")."""
    for s, e, spk in _nominal_matches(text, names):
        if s == incise.start and e == incise.end:
            return spk
    return None


def iter_incise_pieces(
    text: str, incises: list[Incise]
) -> list[tuple[bool, str]]:
    """Split `text` into pieces (is_incise, sub_text) using the bounds.

    Used at render time: dialogue pieces -> character's voice, incise pieces ->
    narrator's voice. The text is preserved modulo leading/trailing whitespace.
    """
    pieces: list[tuple[bool, str]] = []
    cursor = 0
    for inc in sorted(incises, key=lambda i: i.start):
        if inc.start < cursor:  # overlap guard
            continue
        before = text[cursor:inc.start]
        if before.strip():
            pieces.append((False, before.strip()))
        body = text[inc.start:inc.end]
        if body.strip():
            pieces.append((True, body.strip()))
        cursor = inc.end
    tail = text[cursor:]
    if tail.strip():
        pieces.append((False, tail.strip()))
    return pieces
def analyze_chapter(
    chapter: Chapter,
    ct: ChapterText,
    gemma: Gemma,
    *,
    book_chars: Optional[list[Character]] = None,
    dedup_gemma: Optional[Gemma] = None,
) -> tuple[ChapterAnalysis, list[Character]]:
    """Full analysis of one chapter.

    Sequence: segmentation -> character extraction -> reconciliation
    (dedup against the book's cumulative cast) -> annotation of dialogue tags
    (incises) + seeding of the explicit speaker -> LLM attribution of the
    remaining lines -> retroactive pass. Lines of dialogue are persisted whole
    (dialogue tags are stored as spans).

    `book_chars`: cumulative cast of the book (canonical characters already known).
    `dedup_gemma`: if provided, settles ambiguous dedup cases.

    Returns (analysis, updated cumulative cast); the second element is the whole
    reconciled cast of the book, ready to be persisted as is.
    """
    from ..casting.dedup import reconcile_characters

    segments = segment_chapter_text(ct)
    full_text = "\n".join(ct.paragraphs)
    found = extract_characters(full_text, gemma)

    # Dedup BEFORE attribution: the model will receive canonical names.
    chars, name_map = reconcile_characters(book_chars or [], found, dedup_gemma)

    # Canonical list restricted to this chapter (detected characters + POV).
    chapter_canon = {(name_map.get(c.name.strip().lower()) or c.name).strip().lower()
                     for c in found}
    chapter_chars = [c for c in chars if c.name.strip().lower() in chapter_canon]
    if chapter.pov:
        pv = chapter.pov.strip().lower()
        for c in chars:
            if (c not in chapter_chars and
                    (pv in c.name.lower()
                     or any(pv in a.lower() for a in c.aliases))):
                chapter_chars.append(c)

    # Deterministic annotation of dialogue tags (spans only, non-destructive) +
    # seeding: a tag that names a character fixes the speaker with certainty
    # BEFORE the LLM call (covers the cases the small model misses).
    names = {c.name for c in chars}
    for seg in segments:
        if seg.type is not SegmentType.DIALOGUE:
            continue
        seg.incises = detect_incises(seg.text, names=names)
        for inc in seg.incises:
            spk = incise_speaker(seg.text, inc, names)
            if spk:
                seg.speaker = spk
                break

    conf = attribute_speakers(segments, gemma, characters=chapter_chars,
                              pov=chapter.pov)
    if get_settings().retro_pass_use_gemma:
        _refine_unknown_speakers(segments, gemma, characters=chapter_chars,
                                 confidence=conf)

    # Absorb leftover speakers (not in the list) as aliases (heuristic only).
    chars, _ = reconcile_characters(
        chars, [], None, speaker_names=[s.speaker for s in segments])

    # Lines of dialogue are persisted whole; dialogue tags stay as spans
    # (rendered in the narrator's voice). No more splitting at analysis time.
    analysis = ChapterAnalysis(index=chapter.index, title=ct.title,
                               segments=segments)
    return analysis, chars
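
The chapter-scoped cast built in `analyze_chapter` can be illustrated in isolation. The sketch below is a simplified stand-in that works on plain name strings rather than `Character` objects; `chapter_cast` and the sample data are hypothetical and not part of InkFlow.

```python
def chapter_cast(canonical_names: list[str], detected: list[str],
                 name_map: dict[str, str]) -> list[str]:
    """Keep the canonical names that this chapter's detected forms resolve to."""
    # Resolve each detected surface form to its canonical name when the map
    # knows it, otherwise keep the surface form itself.
    wanted = {
        (name_map.get(d.strip().lower()) or d).strip().lower()
        for d in detected
    }
    # Preserve the order of the book-wide cast.
    return [n for n in canonical_names if n.strip().lower() in wanted]


# Hypothetical data: "Holden" was reconciled to "James Holden" earlier.
book_cast = ["James Holden", "Elvi Okoye", "Alex"]
name_map = {"holden": "James Holden"}
in_chapter = chapter_cast(book_cast, ["Holden", "Alex"], name_map)
```

Passing only the chapter's characters to the attribution step presumably keeps the prompt short and limits the candidate speakers to those who actually appear.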


295
backend/inkflow/api/app.py Normal file

@@ -0,0 +1,295 @@
"""FastAPI application: drives the pipeline and serves the UI.

All heavy routes (analysis, casting, rendering) are *queued* in the
orchestrator and return immediately; progress arrives over WebSocket.
Quick operations (voice preview) run in a thread pool.
"""
from __future__ import annotations
import asyncio
import io
from pathlib import Path
from typing import Optional
import soundfile as sf
from fastapi import FastAPI, HTTPException, UploadFile, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, Response
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from ..config import DATA_DIR, book_data_dir, book_output_dir, ensure_dirs
from ..epub.parser import load_book, load_chapter_text, parse_epub
from ..models import Cast, ChapterAnalysis, Pronunciation
from ..pipeline.orchestrator import load_state, orchestrator
from ..settings import Settings, get_settings, save_settings
from ..store import artifacts
from ..util import slugify
from .ws import manager
app = FastAPI(title="InkFlow API")
app.add_middleware(
CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"],
)
@app.on_event("startup")
async def _startup() -> None:
ensure_dirs()
manager.bind_loop(asyncio.get_running_loop())
orchestrator.set_broadcaster(manager.broadcast_threadsafe)
# --- Helpers -----------------------------------------------------------------
def _list_book_slugs() -> list[str]:
if not DATA_DIR.exists():
return []
return sorted(p.parent.name for p in DATA_DIR.glob("*/book.json"))
def _book_summary(slug: str) -> dict:
book = load_book(slug)
state = load_state(slug)
rendered = sum(1 for r in state.render.values() if r.mp3)
return {
"slug": slug,
"title": book.title,
"author": book.author,
"chapters": len(book.render_chapters),
"rendered": rendered,
"cover": f"/api/books/{slug}/cover" if book.cover_file else None,
}
# --- Bibliotheque / upload ---------------------------------------------------
@app.get("/api/books")
def list_books() -> list[dict]:
return [_book_summary(s) for s in _list_book_slugs()]
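@app.post("/api/books")
async def upload_book(file: UploadFile) -> dict:
    ensure_dirs()
    uploads = DATA_DIR / "_uploads"
    uploads.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component of the client-supplied filename so the
    # upload cannot be written outside the uploads directory.
    name = Path(file.filename or "").name
    if name in ("", ".", ".."):
        name = "livre.epub"
    dest = uploads / name
    dest.write_bytes(await file.read())
    book = await asyncio.to_thread(parse_epub, dest)
    # Initialize the state.
    load_state(book.slug)
    return {"slug": book.slug, "title": book.title}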
@app.get("/api/books/{slug}")
def get_book(slug: str) -> dict:
_require(slug)
book = load_book(slug)
return {"book": book.model_dump(mode="json"),
"state": load_state(slug).model_dump(mode="json")}
@app.get("/api/books/{slug}/cover")
def get_cover(slug: str):
    _require(slug)  # 404 for an unknown book, like the other book routes
    book = load_book(slug)
    if not book.cover_file:
        raise HTTPException(404, "pas de couverture")
    return FileResponse(str(book_data_dir(slug) / book.cover_file))
@app.get("/api/books/{slug}/chapters/{index}")
def get_chapter(slug: str, index: int) -> dict:
_require(slug)
book = load_book(slug)
ch = next((c for c in book.chapters if c.index == index), None)
if ch is None:
raise HTTPException(404, "chapitre inconnu")
out: dict = {"chapter": ch.model_dump(mode="json")}
apath = artifacts.analysis_path(slug, index)
if apath.exists():
out["analysis"] = artifacts.load_analysis(slug, index).model_dump(mode="json")
elif ch.text_file:
out["text"] = load_chapter_text(slug, ch).model_dump(mode="json")
return out
@app.put("/api/books/{slug}/chapters/{index}/analysis")
def put_analysis(slug: str, index: int, analysis: ChapterAnalysis) -> dict:
_require(slug)
if analysis.index != index:
raise HTTPException(400, "index incoherent")
artifacts.save_analysis(slug, analysis)
return {"saved": True}
# --- Etapes (enfilees) -------------------------------------------------------
class ChaptersBody(BaseModel):
chapters: Optional[list[int]] = None
@app.post("/api/books/{slug}/analyze")
def analyze(slug: str, body: ChaptersBody) -> dict:
_require(slug)
orchestrator.run_analyze(slug, body.chapters)
return {"queued": True}
@app.post("/api/books/{slug}/pronounce")
def pronounce(slug: str) -> dict:
_require(slug)
orchestrator.run_pronounce(slug)
return {"queued": True}
@app.post("/api/books/{slug}/cast/auto")
def cast_auto(slug: str) -> dict:
_require(slug)
orchestrator.run_cast(slug)
return {"queued": True}
@app.post("/api/books/{slug}/cast/analyze")
def cast_analyze(slug: str, body: ChaptersBody) -> dict:
"""(Re)analyse le casting d'un/des chapitre(s) avec reconciliation."""
_require(slug)
orchestrator.run_cast_analyze(slug, body.chapters)
return {"queued": True}
@app.post("/api/books/{slug}/cast/dedup")
def cast_dedup(slug: str) -> dict:
"""Deduplique le casting existant (variantes de noms -> aliases)."""
_require(slug)
orchestrator.run_dedup_cast(slug)
return {"queued": True}
class RenderBody(BaseModel):
chapters: list[int]
backend: Optional[str] = None
mono: bool = False
@app.post("/api/books/{slug}/render")
def render(slug: str, body: RenderBody) -> dict:
_require(slug)
orchestrator.run_render(slug, body.chapters, backend=body.backend, mono=body.mono)
return {"queued": True}
# --- Casting / prononciation (lecture-ecriture directe) ----------------------
@app.get("/api/books/{slug}/cast")
def get_cast(slug: str) -> dict:
from ..casting.voicebank import load_voicebank
_require(slug)
return {"cast": artifacts.load_cast(slug).model_dump(mode="json"),
"voicebank": load_voicebank().model_dump(mode="json")}
@app.put("/api/books/{slug}/cast")
def put_cast(slug: str, cast: Cast) -> dict:
_require(slug)
artifacts.save_cast(slug, cast)
return {"saved": True}
@app.get("/api/books/{slug}/pronunciation")
def get_pron(slug: str) -> dict:
_require(slug)
return artifacts.load_pronunciation(slug).model_dump(mode="json")
@app.put("/api/books/{slug}/pronunciation")
def put_pron(slug: str, pron: Pronunciation) -> dict:
_require(slug)
artifacts.save_pronunciation(slug, pron)
return {"saved": True}
# --- Reglages techniques globaux ---------------------------------------------
@app.get("/api/settings")
def read_settings() -> dict:
return get_settings().model_dump(mode="json")
@app.put("/api/settings")
def write_settings(settings: Settings) -> dict:
save_settings(settings)
return {"saved": True}
# --- Voicebank + preview -----------------------------------------------------
@app.get("/api/voicebank")
def get_voicebank() -> dict:
from ..casting.voicebank import load_voicebank
return load_voicebank().model_dump(mode="json")
class PreviewBody(BaseModel):
voice_id: str
text: str = "Bonjour, voici un aperçu de cette voix pour votre livre audio."
@app.post("/api/voicebank/preview")
async def preview_voice(body: PreviewBody):
from ..casting.voicebank import load_voicebank
from ..tts.base import VoiceSpec
entry = load_voicebank().by_id(body.voice_id)
if entry is None:
raise HTTPException(404, "voix inconnue")
def _synth() -> bytes:
from ..tts.factory import get_backend
backend = get_backend("kokoro")
audio, sr = backend.synthesize(body.text, VoiceSpec(preset=entry.kokoro_voice))
buf = io.BytesIO()
sf.write(buf, audio, sr, format="WAV")
return buf.getvalue()
data = await asyncio.to_thread(_synth)
return Response(content=data, media_type="audio/wav")
@app.get("/api/books/{slug}/audio/{index}")
def get_audio(slug: str, index: int):
    _require(slug)  # 404 for an unknown book, like the other book routes
    state = load_state(slug)
    rs = state.render.get(index)
    if not rs or not rs.mp3:
        raise HTTPException(404, "audio non genere")
    path = book_output_dir(load_book(slug).title) / rs.mp3
    if not path.exists():
        raise HTTPException(404, "fichier introuvable")
    return FileResponse(str(path), media_type="audio/mpeg", filename=rs.mp3)
# --- WebSocket ---------------------------------------------------------------
@app.websocket("/ws/{slug}")
async def ws_endpoint(ws: WebSocket, slug: str) -> None:
    await manager.connect(slug, ws)
    try:
        # Send the current state on connect.
        await ws.send_json({"type": "state", "state": load_state(slug).model_dump(mode="json")})
        while True:
            await ws.receive_text()  # keeps the connection open
    except WebSocketDisconnect:
        manager.disconnect(slug, ws)
    except Exception:  # noqa: BLE001
        manager.disconnect(slug, ws)
def _require(slug: str) -> None:
if not (book_data_dir(slug) / "book.json").exists():
raise HTTPException(404, "livre inconnu")
# --- Service du frontend build (si present) ----------------------------------
_FRONTEND_DIST = Path(__file__).resolve().parents[2].parent / "frontend" / "dist"
if _FRONTEND_DIST.exists():
app.mount("/", StaticFiles(directory=str(_FRONTEND_DIST), html=True), name="ui")
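
The module docstring of `app.py` describes heavy routes as queued in the orchestrator and returning immediately. The orchestrator itself is not shown in this part of the commit, so the following is only a minimal sketch of that pattern under stated assumptions: `MiniOrchestrator`, `submit` and `wait` are hypothetical names, and a single daemon worker thread is assumed.

```python
import queue
import threading


class MiniOrchestrator:
    """Hypothetical stand-in: one worker thread draining a job queue."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self.done: list[str] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name: str) -> dict:
        # Enqueue and return at once, as a route handler would.
        self._jobs.put(name)
        return {"queued": True}

    def _run(self) -> None:
        while True:
            name = self._jobs.get()
            self.done.append(name)  # the real work would happen here
            self._jobs.task_done()

    def wait(self) -> None:
        # Block until every queued job has been processed.
        self._jobs.join()


orch = MiniOrchestrator()
reply = orch.submit("analyze")
orch.wait()
```

With a single worker, jobs run one at a time in submission order, which would suit a pipeline whose models cannot usefully run concurrently on one machine.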

47
backend/inkflow/api/ws.py Normal file

@@ -0,0 +1,47 @@
"""WebSocket connection manager with thread-safe broadcasting.

The orchestrator runs in a worker thread; it calls `broadcast_threadsafe`,
which reschedules the send on the API's asyncio loop.
"""
from __future__ import annotations

import asyncio
from collections import defaultdict
from typing import Optional

from fastapi import WebSocket


class ConnectionManager:
    def __init__(self) -> None:
        self.active: dict[str, set[WebSocket]] = defaultdict(set)
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def bind_loop(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    async def connect(self, slug: str, ws: WebSocket) -> None:
        await ws.accept()
        self.active[slug].add(ws)

    def disconnect(self, slug: str, ws: WebSocket) -> None:
        self.active[slug].discard(ws)

    def broadcast_threadsafe(self, slug: str, data: dict) -> None:
        """Callable from any thread (orchestrator worker)."""
        if self._loop is None:
            return
        self._loop.call_soon_threadsafe(self._dispatch, slug, data)

    def _dispatch(self, slug: str, data: dict) -> None:
        for ws in list(self.active.get(slug, ())):
            asyncio.create_task(self._safe_send(slug, ws, data))

    async def _safe_send(self, slug: str, ws: WebSocket, data: dict) -> None:
        try:
            await ws.send_json({"type": "state", "state": data})
        except Exception:  # noqa: BLE001 (connection closed)
            self.disconnect(slug, ws)


manager = ConnectionManager()
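
The hand-off from a worker thread to the event loop, which `broadcast_threadsafe` relies on, can be tried without FastAPI. This is a minimal sketch using only the standard library; `run_demo`, `on_message` and the message text are illustrative names, not part of InkFlow.

```python
import asyncio
import threading


def run_demo() -> list[str]:
    received: list[str] = []

    async def main() -> None:
        loop = asyncio.get_running_loop()
        done = asyncio.Event()

        def on_message(msg: str) -> None:
            # Executed on the event-loop thread, so touching loop state is safe.
            received.append(msg)
            done.set()

        def worker() -> None:
            # Executed on a plain thread: schedule the callback on the loop
            # instead of calling it directly.
            loop.call_soon_threadsafe(on_message, "progress: 50%")

        t = threading.Thread(target=worker)
        t.start()
        await asyncio.wait_for(done.wait(), timeout=2.0)
        t.join()

    asyncio.run(main())
    return received


messages = run_demo()
```

`call_soon_threadsafe` is the documented way to schedule a callback from another thread; calling loop methods directly from the worker would not be thread-safe.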


@@ -0,0 +1,125 @@
"""Final audio assembly: concat -> normalization -> WAV -> tagged MP3.

No pydub (broken on Python 3.13): concat/normalization in numpy, MP3
encoding + cover via the ffmpeg CLI, tags via ffmpeg metadata.
"""
from __future__ import annotations
import shutil
import subprocess
from pathlib import Path
from typing import Optional
import numpy as np
import soundfile as sf
from ..settings import get_settings
def _resample(audio: np.ndarray, src_sr: int, dst_sr: int) -> np.ndarray:
    # Linear-interpolation resampling (no anti-aliasing filter).
    if src_sr == dst_sr or audio.size == 0:
        return audio
    duration = audio.size / src_sr
    n_dst = int(round(duration * dst_sr))
    x_src = np.linspace(0.0, duration, num=audio.size, endpoint=False)
    x_dst = np.linspace(0.0, duration, num=n_dst, endpoint=False)
    return np.interp(x_dst, x_src, audio).astype(np.float32)


def silence(seconds: float, sr: int) -> np.ndarray:
    return np.zeros(int(seconds * sr), dtype=np.float32)
def concat_segments(
    parts: list[tuple[np.ndarray, int]],
    *,
    target_sr: Optional[int] = None,
    gap_seconds: float = 0.35,
    intra_gap_seconds: float = 0.12,
    glued: Optional[list[bool]] = None,
) -> tuple[np.ndarray, int]:
    """Concatenate (audio, sr) segments with a silence between each.

    `glued[i] == True` (e.g. a dialogue tag and its line, from the same
    paragraph) inserts the short `intra_gap_seconds` silence before segment i
    instead of `gap_seconds`.
    """
    if target_sr is None:
        target_sr = get_settings().target_sample_rate
    gap = silence(gap_seconds, target_sr)
    intra_gap = silence(intra_gap_seconds, target_sr)
    buf: list[np.ndarray] = []
    first = True
    for i, (audio, sr) in enumerate(parts):
        if audio is None or audio.size == 0:
            continue
        if not first:
            use_intra = glued is not None and i < len(glued) and glued[i]
            buf.append(intra_gap if use_intra else gap)
        first = False
        buf.append(_resample(np.asarray(audio, dtype=np.float32), sr, target_sr))
    if not buf:
        return np.zeros(0, dtype=np.float32), target_sr
    return np.concatenate(buf), target_sr
def normalize_loudness(audio: np.ndarray, target_dbfs: Optional[float] = None) -> np.ndarray:
    """Normalize the RMS level to target_dbfs, with a clipping guard."""
    if audio.size == 0:
        return audio
    if target_dbfs is None:
        target_dbfs = get_settings().target_dbfs
    rms = float(np.sqrt(np.mean(audio.astype(np.float64) ** 2)))
    if rms < 1e-6:
        return audio
    current_dbfs = 20.0 * np.log10(rms)
    gain = 10.0 ** ((target_dbfs - current_dbfs) / 20.0)
    out = audio * gain
    peak = float(np.max(np.abs(out))) if out.size else 0.0
    if peak > 0.99:  # simple limiter to avoid clipping
        out *= 0.99 / peak
    return out.astype(np.float32)
def write_wav(path: str | Path, audio: np.ndarray, sr: int) -> Path:
path = Path(path)
path.parent.mkdir(parents=True, exist_ok=True)
sf.write(str(path), audio, sr)
return path
def encode_mp3(
wav_path: str | Path,
mp3_path: str | Path,
*,
bitrate: Optional[str] = None,
title: Optional[str] = None,
album: Optional[str] = None,
artist: Optional[str] = None,
track: Optional[int] = None,
cover_path: Optional[str | Path] = None,
) -> Path:
"""Encode un WAV en MP3 (ffmpeg) avec tags ID3 et cover optionnelle."""
if bitrate is None:
bitrate = get_settings().mp3_bitrate
if not shutil.which("ffmpeg"):
raise RuntimeError("ffmpeg introuvable — brew install ffmpeg")
wav_path, mp3_path = Path(wav_path), Path(mp3_path)
mp3_path.parent.mkdir(parents=True, exist_ok=True)
cmd = ["ffmpeg", "-y", "-i", str(wav_path)]
has_cover = cover_path and Path(cover_path).exists()
if has_cover:
cmd += ["-i", str(cover_path), "-map", "0:a", "-map", "1:v",
"-c:v", "mjpeg", "-disposition:v", "attached_pic"]
cmd += ["-c:a", "libmp3lame", "-b:a", bitrate]
meta = {"title": title, "album": album, "artist": artist}
if track is not None:
meta["track"] = str(track)
for key, val in meta.items():
if val:
cmd += ["-metadata", f"{key}={val}"]
cmd += ["-id3v2_version", "3", str(mp3_path)]
subprocess.run(cmd, check=True, capture_output=True)
return mp3_path
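
The gain computed in `normalize_loudness` follows from two formulas: the level in dBFS is `20 * log10(rms)`, and the linear gain needed to move from the current level to the target is `10 ** ((target - current) / 20)`. Here is a small worked example in plain Python (no numpy); the helper names and the sample signal are illustrative only.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of the samples in dBFS (full scale = 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def gain_to_target(samples: list[float], target_dbfs: float) -> float:
    """Linear gain that moves the RMS level to target_dbfs."""
    return 10.0 ** ((target_dbfs - rms_dbfs(samples)) / 20.0)


# A square wave of amplitude 0.1 has an RMS of 0.1, i.e. about -20 dBFS.
samples = [0.1, -0.1] * 50
level = rms_dbfs(samples)
# Raising it to -14 dBFS is a +6 dB change, a gain of roughly 2.
gain = gain_to_target(samples, -14.0)
```

Because the gain is applied to the whole chapter at once, relative dynamics between segments are preserved; only the overall level changes.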


@@ -0,0 +1,86 @@
"""Auto-casting: assigns a distinct voice to each character.

Deterministic strategy:
- Narrator: native French voice by default (`fr_f_siwis`, i.e. Kokoro voice
  `ff_siwis`), otherwise the first voice.
- Characters: voices of the same gender, distinct while any remain; beyond
  that, voices are reused, spread as evenly as possible. Unknown gender ->
  mixed pool. The order (sorted by name) guarantees reproducibility.

The user can override these choices in the UI.
"""
from __future__ import annotations
from collections import Counter
from typing import Optional
from ..models import Cast, Character, Voicebank
# Voix narrateur preferee (FR native).
PREFERRED_NARRATOR = "fr_f_siwis"
def _pick_pool(vb: Voicebank, gender: Optional[str], narrator_id: str) -> list[str]:
    """Candidate voices: the gender is STRICTLY preferred (even if voices are reused).

    Genders are only mixed when no voice of the right gender exists. The
    narrator is excluded while other options remain, to keep it distinct.
    """
    same = [e.id for e in vb.by_gender(gender)] if gender in ("male", "female") else []
    pool = same if same else [e.id for e in vb.entries]
    non_narrator = [vid for vid in pool if vid != narrator_id]
    return non_narrator or pool  # keep the narrator only if it is the sole option
def assign_voices(
characters: list[Character],
vb: Voicebank,
*,
narrator_voice_id: Optional[str] = None,
respect_existing: bool = False,
) -> Cast:
"""Renvoie un Cast avec narrateur + voix par personnage (mutation des chars).
`respect_existing=True` conserve les voix deja attribuees (overrides UI) ;
sinon tout est re-calcule (auto-casting frais).
"""
if not vb.entries:
return Cast(narrator_voice_id=narrator_voice_id, characters=characters)
narrator_id = narrator_voice_id or (
PREFERRED_NARRATOR if vb.by_id(PREFERRED_NARRATOR) else vb.entries[0].id)
usage: Counter[str] = Counter()
usage[narrator_id] += 1 # le narrateur compte deja
for ch in sorted(characters, key=lambda c: c.name.lower()):
if respect_existing and ch.voice_id and vb.by_id(ch.voice_id):
usage[ch.voice_id] += 1
continue # respecte une attribution existante (override utilisateur)
pool = _pick_pool(vb, ch.gender, narrator_id)
# Choisit la voix la moins utilisee du pool (donc une voix neuve d'abord).
best = min(pool, key=lambda vid: (usage[vid], pool.index(vid)))
ch.voice_id = best
usage[best] += 1
return Cast(narrator_voice_id=narrator_id, characters=characters)
def resolve_speaker_voice(
speaker: str, cast: Cast, vb: Voicebank
) -> Optional[str]:
"""Mappe un nom de locuteur (segment) vers un id de voix.
Matche d'abord par nom/alias exact (rapide), puis en dernier recours par
rapprochement heuristique de tokens (ex: un "Jim" qui n'aurait pas encore
ete absorbe comme alias de "James Holden").
"""
if speaker == "narrateur":
return cast.narrator_voice_id
low = speaker.lower()
for ch in cast.characters:
if ch.name.lower() == low or low in (a.lower() for a in ch.aliases):
return ch.voice_id
from .dedup import heuristic_match
match = heuristic_match(speaker, cast.characters)
if isinstance(match, Character):
return match.voice_id
return None # inconnu -> le rendu repliera sur le narrateur
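
The "least-used voice first" rule in `assign_voices` can be shown on its own. This sketch drops genders, the narrator and the `Voicebank` model; `assign_least_used` is a hypothetical helper, and the voice ids are taken from the seed list purely as sample data.

```python
from collections import Counter


def assign_least_used(names: list[str], pool: list[str]) -> dict[str, str]:
    """Give each name the least-used voice; ties go to the earliest in the pool."""
    usage: Counter[str] = Counter()
    out: dict[str, str] = {}
    # Sorting by lowercase name makes the result independent of input order.
    for name in sorted(names, key=str.lower):
        best = min(pool, key=lambda vid: (usage[vid], pool.index(vid)))
        out[name] = best
        usage[best] += 1
    return out


voices = assign_least_used(["Okoye", "alex", "Holden"], ["m_fenrir", "m_michael"])
```

With three characters and two voices, the first two each get a fresh voice and the third reuses the first one, so reuse is spread evenly across the pool.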


@@ -0,0 +1,345 @@
"""Cast reconciliation: deduplication of name variants.

Problem: the same character appears under several forms ("Holden",
"James Holden", "James", "Jim"). Without reconciliation, each form becomes a
separate character with its own voice -> inconsistent listening experience.

Hybrid strategy:
1. Heuristic (no LLM): exact match on name/alias, then token subset
   ("Holden" contained in "James Holden").
2. Gemma settles the ambiguous cases (several compatible candidates, or a
   non-obvious variant such as "Jim" <-> "James") using the descriptions.

Every variant encountered is kept as an `alias` of the canonical character;
the canonical name is the most complete form seen ("James Holden"). The
analysis artifacts (segments) are NOT modified: voice resolution at render
time relies on the aliases (`casting/assign.py`).
"""
from __future__ import annotations
import re
from typing import Optional
from ..models import Character
from ..settings import get_settings
# Sentinelles internes.
_AMBIGUOUS = object() # heuristique : plusieurs candidats -> on delegue a Gemma
_NEW = object() # decision Gemma : nouveau personnage
# Mots vides / titres a ignorer pour le rapprochement par tokens.
_STOPWORDS = {
"le", "la", "les", "un", "une", "de", "du", "des", "monsieur", "madame",
"mademoiselle", "m", "mme", "mlle", "mr", "dr", "docteur", "capitaine",
"lieutenant", "sergent", "general", "amiral", "the", "of",
}
_SPLIT_RE = re.compile(r"[^\wÀ-ÿ]+")
# Garde-fou de contexte (caracteres) pour le prompt Gemma.
_MAX_PROMPT_CHARS = 24000
def _norm(name: str) -> str:
return name.strip().lower()
def _tokens(name: str) -> set[str]:
"""Tokens significatifs d'un nom (minuscules, sans titres ni mots vides)."""
parts = [p for p in _SPLIT_RE.split(name.strip()) if p]
return {p.lower() for p in parts
if len(p) >= 2 and p.lower() not in _STOPWORDS}
def _completeness(name: str) -> tuple[int, int]:
"""Cle de tri du nom le plus "complet" : plus de tokens, puis plus long."""
return (len(_tokens(name)), len(name.strip()))
def _forms(c: Character) -> list[str]:
return [c.name, *c.aliases]
def _token_freq(characters: list[Character], extra: Optional[list[str]] = None):
"""Compte, pour chaque token, le nb de surfaces distinctes le contenant.
Sert a juger la distinctivite d'un token : "holden" present dans une seule
famille est sur a fusionner ; "alex" present dans plusieurs ne l'est pas.
"""
from collections import Counter
freq: Counter[str] = Counter()
surfaces = {_norm(f) for c in characters for f in _forms(c)}
surfaces |= {_norm(s) for s in (extra or [])}
for s in surfaces:
for t in _tokens(s):
freq[t] += 1
return freq
def heuristic_match(surface: str, characters: list[Character], tokfreq=None):
    """Match `surface` to a known character without an LLM (conservative).

    Returns the matching `Character`, `None` if there is none, or `_AMBIGUOUS`
    if the match is plausible but uncertain (decision left to Gemma).

    A token-subset link is considered SAFE only if the smaller side has >=2
    tokens, or if the shared tokens are globally distinctive (present in <=2
    surfaces). Otherwise the link is ambiguous (e.g. a common first name
    "Alex" shared by several characters).
    """
    s_norm = _norm(surface)
    for c in characters:
        if _norm(c.name) == s_norm or any(_norm(a) == s_norm for a in c.aliases):
            return c
    s_tok = _tokens(surface)
    if not s_tok:
        return None
    if tokfreq is None:
        tokfreq = _token_freq(characters, [surface])
    safe: list[Character] = []
    ambiguous = False
    for c in characters:
        linked = is_safe = False
        for form in _forms(c):
            f_tok = _tokens(form)
            if not f_tok or not (s_tok <= f_tok or f_tok <= s_tok):
                continue
            linked = True
            shared = s_tok & f_tok
            if min(len(s_tok), len(f_tok)) >= 2 or all(tokfreq[t] <= 2 for t in shared):
                is_safe = True
        if is_safe:
            safe.append(c)
        elif linked:
            ambiguous = True
    if len(safe) == 1 and not ambiguous:
        return safe[0]
    if safe or ambiguous:
        return _AMBIGUOUS
    return None
def canonical_of(a: str, b: str) -> str:
"""Forme canonique entre deux variantes : la plus complete."""
return a if _completeness(a) >= _completeness(b) else b
def _absorb(
target: Character,
name: str,
*,
gender: Optional[str] = None,
age: Optional[str] = None,
description: Optional[str] = None,
voice_id: Optional[str] = None,
) -> None:
"""Fusionne la variante `name` dans `target` (mutation en place).
Enrichit les attributs manquants, recalcule le nom canonique et range les
autres formes en aliases.
"""
target.gender = target.gender or gender
target.age = target.age or age
target.description = target.description or description
target.voice_id = target.voice_id or voice_id
forms: dict[str, str] = {} # norm -> graphie d'origine (1re vue conservee)
for f in [target.name, *target.aliases, name]:
f = (f or "").strip()
if f:
forms.setdefault(_norm(f), f)
canon = max(forms, key=lambda n: _completeness(forms[n]))
target.name = forms[canon]
target.aliases = sorted(v for k, v in forms.items() if k != canon)
def _item(c) -> dict:
"""Normalise un personnage ou un nom brut en entree de reconciliation."""
if isinstance(c, Character):
return {"name": c.name, "gender": c.gender, "age": c.age,
"description": c.description, "voice_id": c.voice_id}
return {"name": str(c), "gender": None, "age": None,
"description": None, "voice_id": None}
def _find(chars: list[Character], name: str) -> Optional[Character]:
n = _norm(name)
return next((c for c in chars
if _norm(c.name) == n or any(_norm(a) == n for a in c.aliases)),
None)
def _create(chars: list[Character], it: dict, name_map: dict[str, str]) -> None:
new = Character(name=it["name"].strip(), gender=it["gender"], age=it["age"],
description=it["description"], voice_id=it["voice_id"])
chars.append(new)
name_map[_norm(it["name"])] = new.name
def reconcile_characters(
    book_chars: list[Character],
    new_chars,
    gemma=None,
    *,
    speaker_names: Optional[list[str]] = None,
) -> tuple[list[Character], dict[str, str]]:
    """Reconcile new detections into the book's cast.

    `new_chars`: characters (`Character` objects) extracted from the chapter(s).
    `speaker_names`: raw speaker forms seen in the segments (absorbed as
    aliases so that voice resolution matches at render time).
    `gemma`: if provided, settles ambiguous cases; otherwise heuristic only.

    Returns (updated canonical list, map normalized_surface_name -> canonical).
    """
    chars = [c.model_copy(deep=True) for c in book_chars]
    name_map: dict[str, str] = {}
    items = [_item(c) for c in new_chars]
    seen = {_norm(it["name"]) for it in items}
    for sp in (speaker_names or []):
        n = _norm(sp or "")
        if n and n not in seen and n not in {"narrateur", "inconnu", "?"}:
            items.append(_item(sp))
            seen.add(n)

    # Global token frequency (existing + incoming) -> stable distinctiveness,
    # independent of processing order.
    tokfreq = _token_freq(chars, [it["name"] for it in items])
    pending: list[dict] = []
    for it in items:
        m = heuristic_match(it["name"], chars, tokfreq)
        if m is _AMBIGUOUS:
            pending.append(it)
        elif m is not None:
            _absorb(m, it["name"], gender=it["gender"], age=it["age"],
                    description=it["description"], voice_id=it["voice_id"])
            name_map[_norm(it["name"])] = m.name
        elif gemma is not None:
            pending.append(it)  # may be a non-obvious variant ("Jim")
        else:
            _create(chars, it, name_map)

    if pending and gemma is not None:
        decisions = _gemma_reconcile(chars, pending, gemma)
        for it in pending:
            canon = decisions.get(_norm(it["name"]))
            target = _find(chars, canon) if isinstance(canon, str) else None
            if target is None:  # Gemma says NEW/unknown: last heuristic attempt
                hm = heuristic_match(it["name"], chars, tokfreq)
                target = hm if isinstance(hm, Character) else None
            if target is not None:
                _absorb(target, it["name"], gender=it["gender"], age=it["age"],
                        description=it["description"], voice_id=it["voice_id"])
                name_map[_norm(it["name"])] = target.name
            else:
                _create(chars, it, name_map)
    elif pending:
        # Without Gemma: ambiguous cases are not guessed, they stay distinct.
        for it in pending:
            _create(chars, it, name_map)
    return chars, name_map
def dedup_cast(characters: list[Character], gemma=None) -> list[Character]:
    """Collapse duplicates in an existing cast (keeps the assigned voices).

    Two phases: (1) heuristic-only grouping (gemma=None) -> a reduced, safe
    list; (2) if `gemma` is provided, a Gemma grouping pass over the candidate
    names only (those sharing a token with another name), to merge the
    variants the heuristic leaves aside (e.g. "Okoye" -> "Elvi Okoye").
    """
    base, _ = reconcile_characters([], characters, gemma=None)
    if gemma is None:
        return base
    return _gemma_merge_pass(base, gemma)
def _gemma_merge_pass(base: list[Character], gemma) -> list[Character]:
"""Rattache via Gemma les formes courtes a un nom complet (ancre).
Tache volontairement contrainte (et plus fiable qu'un regroupement libre) :
une "forme courte" est un nom dont les tokens sont strictement inclus dans
ceux d'un autre (ex: "Okoye" vs "Elvi Okoye"). Gemma mappe chaque forme
courte vers le nom canonique EXACT d'une ancre, ou "NOUVEAU". Traite par
petits lots pour rester dans la zone de fiabilite du modele.
"""
shorts: list[Character] = []
anchors: list[Character] = []
for i, c in enumerate(base):
ts = _tokens(c.name)
if ts and any(j != i and ts < _tokens(d.name) for j, d in enumerate(base)):
shorts.append(c)
else:
anchors.append(c)
if not shorts:
return base
result = [a.model_copy(deep=True) for a in anchors]
leftovers: list[Character] = []
for start in range(0, len(shorts), 12):
chunk = shorts[start:start + 12]
decisions = _gemma_reconcile(result, [_item(s) for s in chunk], gemma)
for s in chunk:
canon = decisions.get(_norm(s.name))
tgt = _find(result, canon) if isinstance(canon, str) else None
if tgt is None:
hm = heuristic_match(s.name, result)
tgt = hm if isinstance(hm, Character) else None
# Garde-fou : ne pas fusionner deux genres connus opposes.
if tgt is not None and s.gender and tgt.gender and s.gender != tgt.gender:
tgt = None
if tgt is not None:
_absorb(tgt, s.name, gender=s.gender, age=s.age,
description=s.description, voice_id=s.voice_id)
for a in s.aliases:
_absorb(tgt, a)
else:
leftovers.append(s)
return result + leftovers
def _gemma_reconcile(
chars: list[Character], pending: list[dict], gemma
) -> dict[str, object]:
"""Un appel groupe : pour chaque nom en attente, son canonique ou _NEW."""
known = []
for c in chars:
al = f" (alias: {', '.join(c.aliases)})" if c.aliases else ""
desc = f"{c.description}" if c.description else ""
known.append(f"- {c.name}{al}{desc}")
new_lines = []
for n, it in enumerate(pending):
desc = f"{it['description']}" if it.get("description") else ""
new_lines.append(f"[{n}] {it['name']}{desc}")
prompt = (
"Personnages DEJA connus du livre :\n"
+ ("\n".join(known) if known else "(aucun)")
+ "\n\nNoms DETECTES a classer :\n" + "\n".join(new_lines)
+ "\n\nPour chaque nom detecte, indique s'il designe un personnage deja "
"connu (donne alors son nom canonique EXACT tel qu'ecrit ci-dessus) ou "
"s'il s'agit d'un nouveau personnage (\"NOUVEAU\"). Ne fusionne que si "
"c'est, avec certitude, la meme personne. EN CAS DE DOUTE, ou si "
"plusieurs personnages connus pourraient correspondre, reponds "
"\"NOUVEAU\". Ne rapproche jamais deux personnes differentes qui "
"partagent seulement un prenom ou un nom de famille.\n\n"
'Reponds par un tableau JSON: '
'[{"i":0,"canonical":"James Holden"},{"i":1,"canonical":"NOUVEAU"}]'
)
if len(prompt) > _MAX_PROMPT_CHARS:
prompt = prompt[:_MAX_PROMPT_CHARS]
result = gemma.generate_json(prompt, system=get_settings().prompt_dedup)
decisions: dict[str, object] = {}
for item in result:
if not isinstance(item, dict) or "i" not in item:
continue
n = item["i"]
canon = str(item.get("canonical") or "").strip()
if isinstance(n, int) and 0 <= n < len(pending) and canon:
decisions[_norm(pending[n]["name"])] = (
_NEW if canon.upper() == "NOUVEAU" else canon)
return decisions
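
The token-subset test at the heart of the heuristic can be reproduced in a few lines. This is a reduced sketch, assuming a shortened stopword list and ignoring the distinctiveness check; `tokens` and `subset_link` are illustrative names that mirror, but do not replace, `_tokens` and `heuristic_match`.

```python
import re

# Shortened, illustrative stopword list (the module's own list is longer).
STOPWORDS = {"le", "la", "de", "capitaine", "dr"}
SPLIT_RE = re.compile(r"[^\wÀ-ÿ]+")


def tokens(name: str) -> set[str]:
    """Lowercased significant tokens: at least 2 characters, no stopwords."""
    return {
        p.lower()
        for p in SPLIT_RE.split(name.strip())
        if len(p) >= 2 and p.lower() not in STOPWORDS
    }


def subset_link(a: str, b: str) -> bool:
    """True when one name's tokens are all contained in the other's."""
    ta, tb = tokens(a), tokens(b)
    return bool(ta and tb) and (ta <= tb or tb <= ta)


linked = subset_link("Holden", "Capitaine James Holden")
not_linked = subset_link("Jim", "James Holden")
```

"Holden" links to "James Holden" because its single token is contained in the longer name, while "Jim" shares no token with it; that second kind of variant is the one the module hands to Gemma.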


@@ -0,0 +1,91 @@
"""Voice bank: a self-contained set of varied voices (gender/age).

Each voice is backed by a Kokoro voice (identity + reference clip). The
reference clip is generated once by reading a standard French passage; it
serves as the timbre reference for Qwen3 cloning (final render). No external
resource needs to be sourced.

Engine resolution:
- Kokoro -> VoiceSpec(preset=kokoro_voice) (fast, preview / draft)
- Qwen3 -> VoiceSpec(ref_audio=clip, ref_text=…) (quality, cloning)
"""
from __future__ import annotations
from pathlib import Path
import soundfile as sf
from ..config import VOICEBANK_DIR
from ..models import VoiceEntry, Voicebank
from ..tts.base import VoiceSpec
# Passage de reference lu par chaque voix pour creer son clip de clonage.
REFERENCE_TEXT = (
"L'univers est toujours plus étrange qu'on ne le croit. "
"Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
)
# Default voice set (varied by gender). ff_siwis is the only native French voice;
# the others borrow an English timbre but read text phonemized as French.
SEED: list[VoiceEntry] = [
VoiceEntry(id="fr_f_siwis", kokoro_voice="ff_siwis", gender="female", age="adult", label="Siwis (FR)"),
VoiceEntry(id="f_bella", kokoro_voice="af_bella", gender="female", age="adult", label="Bella"),
VoiceEntry(id="f_heart", kokoro_voice="af_heart", gender="female", age="young", label="Heart"),
VoiceEntry(id="f_emma", kokoro_voice="bf_emma", gender="female", age="adult", label="Emma"),
VoiceEntry(id="f_nicole", kokoro_voice="af_nicole", gender="female", age="adult", label="Nicole"),
VoiceEntry(id="m_fenrir", kokoro_voice="am_fenrir", gender="male", age="adult", label="Fenrir"),
VoiceEntry(id="m_michael", kokoro_voice="am_michael", gender="male", age="adult", label="Michael"),
VoiceEntry(id="m_george", kokoro_voice="bm_george", gender="male", age="adult", label="George"),
VoiceEntry(id="m_lewis", kokoro_voice="bm_lewis", gender="male", age="adult", label="Lewis"),
VoiceEntry(id="m_eric", kokoro_voice="am_eric", gender="male", age="young", label="Eric"),
VoiceEntry(id="m_santa", kokoro_voice="am_santa", gender="male", age="old", label="Santa"),
]
def metadata_path() -> Path:
    return VOICEBANK_DIR / "metadata.json"


def clips_dir() -> Path:
    return VOICEBANK_DIR / "clips"


def load_voicebank() -> Voicebank:
    path = metadata_path()
    if path.exists():
        return Voicebank.model_validate_json(path.read_text(encoding="utf-8"))
    return Voicebank(entries=list(SEED))


def save_voicebank(vb: Voicebank) -> Path:
    VOICEBANK_DIR.mkdir(parents=True, exist_ok=True)
    metadata_path().write_text(vb.model_dump_json(indent=2), encoding="utf-8")
    return metadata_path()
def build_voicebank(*, regenerate: bool = False) -> Voicebank:
    """Generate the missing reference clips and write metadata.json."""
    from ..tts.kokoro import KokoroBackend

    clips_dir().mkdir(parents=True, exist_ok=True)
    backend = KokoroBackend()
    entries: list[VoiceEntry] = []
    for seed in SEED:
        clip_rel = f"clips/{seed.id}.wav"
        clip_abs = VOICEBANK_DIR / clip_rel
        # Synthesize only when the clip is missing, unless a rebuild is forced.
        if regenerate or not clip_abs.exists():
            audio, sr = backend.synthesize(REFERENCE_TEXT, VoiceSpec(preset=seed.kokoro_voice))
            sf.write(str(clip_abs), audio, sr)
        entry = seed.model_copy(update={"ref_audio": clip_rel, "ref_text": REFERENCE_TEXT})
        entries.append(entry)
    vb = Voicebank(entries=entries)
    save_voicebank(vb)
    return vb
def voice_spec_for(entry: VoiceEntry, engine: str, *, speed: float = 1.0) -> VoiceSpec:
    """Build the VoiceSpec suited to the target engine."""
    if engine == "qwen3" and entry.ref_audio:
        ref_abs = str(VOICEBANK_DIR / entry.ref_audio)
        return VoiceSpec(ref_audio=ref_abs, ref_text=entry.ref_text, speed=speed)
    # Any other engine, or a Qwen3 request without a clip, falls back to the preset.
    return VoiceSpec(preset=entry.kokoro_voice, speed=speed)
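
The engine resolution in `voice_spec_for` can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `VoiceSpec` and `VoiceEntry` are hypothetical dataclass stand-ins for the real classes, and the `voicebank` directory is an assumed location.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class VoiceSpec:
    """Hypothetical stand-in for inkflow.tts.base.VoiceSpec."""
    preset: Optional[str] = None
    ref_audio: Optional[str] = None
    ref_text: Optional[str] = None
    speed: float = 1.0


@dataclass
class VoiceEntry:
    """Hypothetical stand-in for inkflow.models.VoiceEntry."""
    id: str
    kokoro_voice: str
    ref_audio: Optional[str] = None
    ref_text: Optional[str] = None


VOICEBANK_DIR = Path("voicebank")  # assumed location, for illustration only


def voice_spec_for(entry: VoiceEntry, engine: str, *, speed: float = 1.0) -> VoiceSpec:
    # Qwen3 clones from the reference clip when one exists.
    if engine == "qwen3" and entry.ref_audio:
        return VoiceSpec(ref_audio=str(VOICEBANK_DIR / entry.ref_audio),
                         ref_text=entry.ref_text, speed=speed)
    # Otherwise the Kokoro preset is used.
    return VoiceSpec(preset=entry.kokoro_voice, speed=speed)


seeded = VoiceEntry(id="fr_f_siwis", kokoro_voice="ff_siwis")  # no clip yet
built = VoiceEntry(id="fr_f_siwis", kokoro_voice="ff_siwis",
                   ref_audio="clips/fr_f_siwis.wav", ref_text="reference passage")

print(voice_spec_for(seeded, "qwen3"))   # falls back to the preset
print(voice_spec_for(built, "qwen3"))    # uses the reference clip
print(voice_spec_for(built, "kokoro"))   # preset, clip ignored
```

One consequence worth noting: asking for `qwen3` on an entry whose clip has not been generated silently yields a preset-based spec, so callers that need cloning would have to build the voicebank first.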

239
backend/inkflow/cli.py Normal file

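@@ -0,0 +1,239 @@
"""InkFlow CLI (typer).
Commands:
- parse : EPUB -> book.json + chapters/chNN.json
- info : show the structure of an already parsed book
- serve : start the API and the web UI
- analyze : Gemma analysis of one (or all) chapter(s) -> analysis + cast
- pronounce : propose pronunciation candidates -> pronunciation.json
- cast : build the voicebank (if needed) and assign voices to the cast
- render : synthesize one chapter to MP3
"""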
from __future__ import annotations
from typing import Optional
import typer
from rich.console import Console
from rich.table import Table
from .config import ensure_dirs
from .epub.parser import load_book, load_chapter_text, parse_epub
from .models import Cast
from .store import artifacts
app = typer.Typer(add_completion=False, help="InkFlow : EPUB -> livre audio (local, MLX).")
console = Console()
@app.command()
def parse(epub_path: str, slug: Optional[str] = typer.Option(None, help="Slug interne (def: depuis le titre).")):
"""Parse an EPUB into a normalized structure."""
ensure_dirs()
book = parse_epub(epub_path, slug=slug)
console.print(f"[green]Parse:[/] {book.title} — slug=[cyan]{book.slug}[/]")
console.print(f" {len(book.chapters)} items, {len(book.render_chapters)} a rendre.")
_print_chapters(book)
@app.command()
def info(slug: str):
"""Show the structure of an already parsed book."""
_print_chapters(load_book(slug))
@app.command()
def serve(host: str = "127.0.0.1", port: int = 8000):
"""Start the API + web UI (serves frontend/dist if it has been built)."""
import uvicorn
ensure_dirs()
console.print(f"[green]InkFlow[/] sur http://{host}:{port}")
uvicorn.run("inkflow.api.app:app", host=host, port=port, log_level="info")
@app.command()
def analyze(
slug: str,
chapter: Optional[int] = typer.Option(None, help="Index de chapitre unique (def: tous)."),
limit: Optional[int] = typer.Option(None, help="Limiter au N premiers chapitres rendus."),
force: bool = typer.Option(False, help="Re-analyser meme si un artefact existe."),
):
"""Gemma analysis: narration/dialogue segments + speakers + cast."""
from .analysis.gemma import Gemma
from .analysis.segmenter import analyze_chapter
from .settings import get_settings
book = load_book(slug)
gemma = Gemma()
dedup_gemma = gemma if get_settings().dedup_use_gemma else None
cast = artifacts.load_cast(slug)
chars = list(cast.characters)
targets = [c for c in book.render_chapters]
if chapter is not None:
targets = [c for c in book.chapters if c.index == chapter]
elif limit:
targets = targets[:limit]
for ch in targets:
if not force and artifacts.analysis_path(slug, ch.index).exists():
console.print(f"[dim]ch{ch.index:02d} deja analyse — ignore.[/]")
continue
ct = load_chapter_text(slug, ch)
console.print(f"[blue]Analyse[/] ch{ch.index:02d} {ch.title} ({ct.word_count} mots)…")
try:
# Deduplication happens inside analyze_chapter: `chars` receives the
# reconciled cumulative cast.
analysis, chars = analyze_chapter(
ch, ct, gemma, book_chars=chars, dedup_gemma=dedup_gemma)
except Exception as exc: # noqa: BLE001 - one failed chapter must not stop the whole run
console.print(f" [yellow]! echec, chapitre ignore: {exc}[/]")
continue
artifacts.save_analysis(slug, analysis)
n_dlg = sum(1 for s in analysis.segments if s.type.value == "dialogue")
console.print(f" -> {len(analysis.segments)} segments ({n_dlg} repliques), "
f"{len(chars)} personnages cumules.")
cast = Cast(narrator_voice_id=cast.narrator_voice_id, characters=chars)
artifacts.save_cast(slug, cast)
console.print(f"[green]Casting[/] : {len(chars)} personnages -> cast.json")
@app.command()
def pronounce(
slug: str,
chapter: Optional[int] = typer.Option(None, help="Index de chapitre (def: 1er rendu)."),
):
"""Propose pronunciation candidates (Gemma) -> pronunciation.json."""
from .analysis.gemma import Gemma
from .analysis.pronunciation import merge_pronunciations, propose_pronunciations
book = load_book(slug)
ch = (next((c for c in book.chapters if c.index == chapter), None)
if chapter is not None else (book.render_chapters[0] if book.render_chapters else None))
if ch is None or not ch.text_file:
console.print("[red]Chapitre introuvable.[/]")
raise typer.Exit(1)
ct = load_chapter_text(slug, ch)
gemma = Gemma()
with console.status("Recherche des mots a risque…"):
new = propose_pronunciations("\n".join(ct.paragraphs), gemma)
pron = merge_pronunciations(artifacts.load_pronunciation(slug), new)
artifacts.save_pronunciation(slug, pron)
table = Table("terme", "prononciation", "note")
for e in pron.entries:
table.add_row(e.term, e.replacement, e.note or "")
console.print(table)
console.print(f"[green]{len(pron.entries)} entrees[/] -> pronunciation.json")
@app.command()
def cast(
slug: str,
rebuild_voicebank: bool = typer.Option(False, help="Regenere les clips de la voicebank."),
dedup: bool = typer.Option(False, help="Deduplique d'abord les variantes de noms (heuristique)."),
llm: bool = typer.Option(False, "--llm", help="Ajoute la passe Gemma a la dedup (moins sur)."),
):
"""Build the voicebank (if needed) and auto-assign voices to the cast."""
from .casting.assign import assign_voices
from .casting.voicebank import build_voicebank, load_voicebank
cast = artifacts.load_cast(slug)
if not cast.characters:
console.print("[yellow]Aucun personnage — lance d'abord `analyze`.[/]")
raise typer.Exit(1)
if dedup:
from .casting.dedup import dedup_cast
gemma = None
if llm:
from .analysis.gemma import Gemma
gemma = Gemma()
before = len(cast.characters)
with console.status("Deduplication du casting…"):
chars = dedup_cast(cast.characters, gemma)
cast = Cast(narrator_voice_id=cast.narrator_voice_id, characters=chars)
artifacts.save_cast(slug, cast)
console.print(f"[green]Dedup[/] : {before} -> {len(chars)} personnages.")
vb = load_voicebank()
if rebuild_voicebank or not vb.entries or not any(e.ref_audio for e in vb.entries):
with console.status("Generation des clips de la voicebank…"):
vb = build_voicebank(regenerate=rebuild_voicebank)
console.print(f"[green]Voicebank[/] : {len(vb.entries)} voix, clips generes.")
cast = assign_voices(cast.characters, vb, narrator_voice_id=cast.narrator_voice_id)
artifacts.save_cast(slug, cast)
table = Table("personnage", "genre", "voix")
table.add_row("[narrateur]", "", cast.narrator_voice_id or "")
for ch in cast.characters:
table.add_row(ch.name, ch.gender or "?", ch.voice_id or "")
console.print(table)
@app.command()
def render(
slug: str,
chapter: int = typer.Argument(..., help="Index du chapitre a synthetiser."),
backend: str = typer.Option("kokoro", help="Moteur TTS: kokoro | qwen3."),
mono: bool = typer.Option(True, help="Mono-narrateur (sinon multi-voix via cast)."),
max_paragraphs: Optional[int] = typer.Option(None, help="Limiter (test rapide)."),
):
"""Synthesize a chapter to MP3 in output/<book>/."""
from .pipeline.render import (
build_units_mono,
build_units_multi,
render_chapter_to_mp3,
)
from .tts.base import VoiceSpec
from .tts.factory import get_backend
book = load_book(slug)
ch = next((c for c in book.chapters if c.index == chapter), None)
if ch is None or not ch.text_file:
console.print(f"[red]Chapitre {chapter} introuvable ou non rendu.[/]")
raise typer.Exit(1)
ct = load_chapter_text(slug, ch)
if max_paragraphs:
ct.paragraphs = ct.paragraphs[:max_paragraphs]
tts = get_backend(backend)
pron = artifacts.load_pronunciation(slug)
if mono:
units = build_units_mono(ct, tts.default_voice())
else:
from .casting.voicebank import load_voicebank, voice_spec_for
from .pipeline.render import make_voice_resolver
analysis = artifacts.load_analysis(slug, chapter)
cast_data = artifacts.load_cast(slug)
vb = load_voicebank()
# Default narrator voice taken from the voicebank, if available.
narrator_entry = vb.by_id(cast_data.narrator_voice_id) if cast_data.narrator_voice_id else None
default_voice = (voice_spec_for(narrator_entry, backend)
if narrator_entry else tts.default_voice())
resolver = make_voice_resolver(cast_data, vb, backend)
units = build_units_multi(analysis, resolver, default_voice)
with console.status(f"Synthese de {len(units)} unites ({backend})…"):
def _p(done, total):
console.print(f" unite {done}/{total}", end="\r")
track = (book.render_chapters.index(ch) + 1) if ch in book.render_chapters else None
mp3 = render_chapter_to_mp3(book, ch, units, tts, pron=pron, track=track, progress=_p)
console.print(f"\n[green]MP3:[/] {mp3}")
def _print_chapters(book) -> None:
table = Table(show_header=True, header_style="bold")
for col in ("idx", "kind", "render", "pov", "mots", "sortie", "titre"):
table.add_column(col)
for c in book.chapters:
table.add_row(
str(c.index), c.kind.value, "" if c.render else "·",
c.pov or "", str(c.word_count), c.output_name or "",
c.title)
console.print(table)
if __name__ == "__main__":
app()
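
The way `analyze` picks its target chapters is easy to misread, so here is a minimal sketch of that selection logic on its own. `Chapter` is a hypothetical stand-in carrying only the two fields involved, and `select_targets` is an illustrative helper name that does not exist in the CLI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chapter:
    """Hypothetical stand-in for inkflow.models.Chapter (only the fields used here)."""
    index: int
    render: bool


def select_targets(chapters: list[Chapter], chapter: Optional[int] = None,
                   limit: Optional[int] = None) -> list[Chapter]:
    """Mirror the selection in `analyze`: an explicit chapter wins over a limit."""
    targets = [c for c in chapters if c.render]
    if chapter is not None:
        # An explicit index is looked up among ALL spine items, rendered or not.
        targets = [c for c in chapters if c.index == chapter]
    elif limit:
        targets = targets[:limit]
    return targets


book = [Chapter(0, False), Chapter(1, True), Chapter(2, True), Chapter(3, True)]
print([c.index for c in select_targets(book)])             # every rendered chapter
print([c.index for c in select_targets(book, limit=2)])    # first two rendered chapters
print([c.index for c in select_targets(book, chapter=0)])  # explicit index, even if not rendered
```

Since the parser only writes a text file for rendered items, selecting a non-rendered index this way would probably fail when the chapter text is loaded; a guard on `text_file`, as `pronounce` and `render` already have, may be worth considering.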

96
backend/inkflow/config.py Normal file

@@ -0,0 +1,96 @@
"""Central configuration for InkFlow.
All constants (paths, MLX model identifiers, default parameters) are gathered
here so they stay easy to override through environment variables.
"""
from __future__ import annotations
import os
from pathlib import Path
# --- Project roots -----------------------------------------------------------
# config.py lives in backend/inkflow/, so backend/ is parents[1] and the project
# root is one level above backend/.
BACKEND_DIR = Path(__file__).resolve().parents[1]
PROJECT_ROOT = BACKEND_DIR.parent


def _env_path(var: str, default: Path) -> Path:
    return Path(os.environ.get(var, default)).expanduser().resolve()


# Working data (per-book state: json, db, intermediate wav)
DATA_DIR = _env_path("INKFLOW_DATA_DIR", PROJECT_ROOT / "data")
# Final output (one folder per book, one mp3 per chapter)
OUTPUT_DIR = _env_path("INKFLOW_OUTPUT_DIR", PROJECT_ROOT / "output")
# Reference voice bank (clips + metadata.json)
VOICEBANK_DIR = _env_path("INKFLOW_VOICEBANK_DIR", PROJECT_ROOT / "voicebank")
# Bundled samples
SAMPLES_DIR = PROJECT_ROOT / "samples"
# --- MLX models (HuggingFace mlx-community) ----------------------------------
# Text analysis: Gemma via mlx-lm.
GEMMA_MODEL = os.environ.get(
    "INKFLOW_GEMMA_MODEL", "mlx-community/gemma-3-4b-it-4bit"
)
# TTS: Qwen3-TTS (final render, cloning) and Kokoro (fast preview).
QWEN3_TTS_MODEL = os.environ.get(
    "INKFLOW_QWEN3_MODEL", "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit"
)
KOKORO_MODEL = os.environ.get(
    "INKFLOW_KOKORO_MODEL", "mlx-community/Kokoro-82M-bf16"
)
# --- TTS parameters ----------------------------------------------------------
DEFAULT_LANGUAGE = os.environ.get("INKFLOW_LANGUAGE", "French")
# Kokoro (misaki) language code: 'f' = French.
KOKORO_LANG_CODE = os.environ.get("INKFLOW_KOKORO_LANG", "f")
# Default Kokoro voice for previews / fast single-narrator renders.
KOKORO_DEFAULT_VOICE = os.environ.get("INKFLOW_KOKORO_VOICE", "ff_siwis")
# Default Qwen3 voice (narrator) when no reference clip is provided.
QWEN3_DEFAULT_VOICE = os.environ.get("INKFLOW_QWEN3_VOICE", "Chelsie")
# Target sample rate for concatenation (Hz). Backends return their own sr;
# postprocess resamples when needed.
TARGET_SAMPLE_RATE = int(os.environ.get("INKFLOW_SAMPLE_RATE", "24000"))
# Final mp3 encoding.
MP3_BITRATE = os.environ.get("INKFLOW_MP3_BITRATE", "128k")
# Loudness normalization target in dBFS (a rough stand-in for LUFS, via pydub gain).
TARGET_DBFS = float(os.environ.get("INKFLOW_TARGET_DBFS", "-18.0"))


def book_data_dir(book_slug: str) -> Path:
    """Working directory for a book (intermediate artifacts)."""
    return DATA_DIR / book_slug


def book_output_dir(book_title: str) -> Path:
    """Final output directory for a book (one mp3 per chapter)."""
    return OUTPUT_DIR / book_title


def ensure_dirs() -> None:
    for d in (DATA_DIR, OUTPUT_DIR, VOICEBANK_DIR):
        d.mkdir(parents=True, exist_ok=True)


def setup_espeak() -> None:
    """Locate libespeak-ng for phonemizer (required by non-English Kokoro).

    phonemizer does not always find the library installed through brew, so
    PHONEMIZER_ESPEAK_LIBRARY is set explicitly unless it is already defined.
    """
    if os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"):
        return
    candidates = [
        "/opt/homebrew/lib/libespeak-ng.dylib",
        "/usr/local/lib/libespeak-ng.dylib",
        "/opt/homebrew/lib/libespeak-ng.1.dylib",
    ]
    # The first existing candidate wins; if none exists the variable stays unset.
    for path in candidates:
        if os.path.exists(path):
            os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = path
            return
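
The environment override used for the directory constants can be tried in isolation. The sketch below reuses the same expression as `_env_path` under the public name `env_path`; the `~/inkflow-data` value and the relative `data` default are illustrative assumptions only.

```python
import os
from pathlib import Path


def env_path(var: str, default: Path) -> Path:
    """Same resolution as `_env_path`: the env var if set, else the default."""
    return Path(os.environ.get(var, default)).expanduser().resolve()


default_dir = Path("data")

# Variable not set: the default is used and made absolute.
os.environ.pop("INKFLOW_DATA_DIR", None)
unset_result = env_path("INKFLOW_DATA_DIR", default_dir)

# Variable set: "~" is expanded, then the path is resolved.
os.environ["INKFLOW_DATA_DIR"] = "~/inkflow-data"
override_result = env_path("INKFLOW_DATA_DIR", default_dir)

print(unset_result)
print(override_result)
```

Because the module-level constants are computed when `config.py` is imported, an override has to be present in the environment before that import, for instance by exporting it in the shell that launches the CLI.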


@@ -0,0 +1,267 @@
"""EPUB parsing -> normalized book structure.
Strategy:
- ebooklib reads the archive (manifest + spine + ncx).
- The reading order comes from the spine.
- Titles come from the table of contents (ncx/nav), mapped by href.
- The text of each document is extracted with BeautifulSoup (paragraphs).
- Each item is classified as front / chapter / back, which decides whether it is read.
Outputs written to data/<slug>/:
- book.json : metadata + chapter list (Book model)
- chapters/chNN.json : normalized text per chapter (ChapterText model)
- cover.<ext> : extracted cover (if present)
"""
from __future__ import annotations
import re
import warnings
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urldefrag
import ebooklib
from bs4 import BeautifulSoup
from ebooklib import epub
# EPUB xhtml documents trigger a harmless bs4 warning; silence it.
try:
from bs4 import XMLParsedAsHTMLWarning
warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)
except ImportError: # pragma: no cover
pass
from ..config import book_data_dir
from ..models import Book, Chapter, ChapterKind, ChapterText
from ..util import safe_filename, slugify
# A chapter title starts with a number, PROLOGUE or EPILOGUE.
_CHAPTER_RE = re.compile(r"^\s*(\d+|prologue|[ée]pilogue)\b", re.IGNORECASE)
# Captures "<number> - <POV>" or just "<number>".
_TITLE_PARTS_RE = re.compile(r"^\s*([^-\n]+?)(?:\s*[-–—]\s*(.+))?\s*$")
# Word-count threshold for a back-matter item (acknowledgements...) to be read.
_BACK_MATTER_MIN_WORDS = 40
def _build_toc_titles(book: epub.EpubBook) -> dict[str, str]:
"""Map href (without fragment) -> title, flattening the ncx/nav toc."""
titles: dict[str, str] = {}
def walk(items) -> None:
for it in items:
if isinstance(it, tuple): # (Section, [children])
section, children = it
if isinstance(section, epub.Link):
_add(section)
walk(children)
elif isinstance(it, list):
walk(it)
elif isinstance(it, epub.Link):
_add(it)
def _add(link: epub.Link) -> None:
href = unquote(urldefrag(link.href)[0])
if href and href not in titles and link.title:
titles[href] = link.title.strip()
walk(book.toc)
return titles
def _extract_paragraphs(html: bytes) -> list[str]:
"""Extract the readable paragraphs of an xhtml document."""
soup = BeautifulSoup(html, "lxml")
# Drop non-narrative elements.
for tag in soup(["script", "style", "sup", "table"]):
tag.decompose()
paragraphs: list[str] = []
blocks = soup.find_all(["p", "h1", "h2", "h3", "h4", "blockquote", "li"])
if not blocks and soup.body:
blocks = [soup.body]
for block in blocks:
text = block.get_text(" ", strip=True)
text = re.sub(r"\s+", " ", text).strip()
if text:
paragraphs.append(text)
return paragraphs
def _parse_title(title: str) -> tuple[Optional[str], Optional[str]]:
"""Split a chapter title into (number, pov)."""
m = _TITLE_PARTS_RE.match(title)
if not m:
return None, None
number = (m.group(1) or "").strip() or None
pov = (m.group(2) or "").strip() or None
return number, pov
def _output_name(seq: int, kind: ChapterKind, number: Optional[str], title: str) -> str:
"""MP3 name modeled on the sample's format (NN-<label>.mp3)."""
prefix = f"{seq:02d}"
label: str
if kind is ChapterKind.CHAPTER and number:
low = number.lower()
if low == "prologue":
label = "Prologue"
elif low in ("epilogue", "épilogue"):
label = "Épilogue"
elif number.isdigit():
label = f"Chapitre {int(number)}"
else:
label = number.capitalize()
else:
label = title
if label.isupper(): # all-caps titles (e.g. "REMERCIEMENTS")
label = label.capitalize()
return safe_filename(f"{prefix}-{label}") + ".mp3"
def _classify(ordered: list[dict]) -> None:
"""Assign kind/render to each item (mutated in place).
front = before the first chapter (cover, title page...); never rendered
chapter = title matches the chapter pattern; rendered if it has any text
back = any other item from the first chapter onward (acknowledgements,
glossary...); rendered only with at least _BACK_MATTER_MIN_WORDS words
"""
chapter_idxs = [
i for i, it in enumerate(ordered)
if it["title"] and _CHAPTER_RE.match(it["title"])
]
first = chapter_idxs[0] if chapter_idxs else len(ordered)
last = chapter_idxs[-1] if chapter_idxs else -1
for i, it in enumerate(ordered):
is_chapter = bool(it["title"]) and bool(_CHAPTER_RE.match(it["title"]))
if is_chapter:
it["kind"] = ChapterKind.CHAPTER
it["render"] = it["word_count"] > 0
elif i < first:
it["kind"] = ChapterKind.FRONT
it["render"] = False
else: # non-chapter item at or after the first chapter: treated as back matter
it["kind"] = ChapterKind.BACK
it["render"] = it["word_count"] >= _BACK_MATTER_MIN_WORDS
def _extract_cover(book: epub.EpubBook, dest_dir: Path) -> Optional[str]:
cover_item = None
for item in book.get_items_of_type(ebooklib.ITEM_COVER):
cover_item = item
break
if cover_item is None: # fallback: first image item whose name contains "cover"
for item in book.get_items_of_type(ebooklib.ITEM_IMAGE):
if "cover" in item.get_name().lower():
cover_item = item
break
if cover_item is None:
return None
ext = Path(cover_item.get_name()).suffix or ".jpg"
dest = dest_dir / f"cover{ext}"
dest.write_bytes(cover_item.get_content())
return dest.name
def parse_epub(epub_path: str | Path, slug: Optional[str] = None) -> Book:
"""Parse an EPUB and write book.json + chapters/chNN.json into data/<slug>/."""
epub_path = Path(epub_path)
book_ml = epub.read_epub(str(epub_path), options={"ignore_ncx": False})
title = _meta(book_ml, "title") or epub_path.stem
author = _meta(book_ml, "creator")
description = _meta(book_ml, "description")
language = _meta(book_ml, "language") or "fr"
slug = slug or slugify(title)
data_dir = book_data_dir(slug)
chapters_dir = data_dir / "chapters"
chapters_dir.mkdir(parents=True, exist_ok=True)
toc_titles = _build_toc_titles(book_ml)
# Documents in spine order.
id_to_item = {it.get_id(): it for it in book_ml.get_items()}
ordered: list[dict] = []
for idref, _linear in book_ml.spine:
item = id_to_item.get(idref)
if item is None or item.get_type() != ebooklib.ITEM_DOCUMENT:
continue
href = unquote(item.get_name())
paragraphs = _extract_paragraphs(item.get_content())
title_txt = toc_titles.get(href, "")
ordered.append({
"item_id": idref,
"src": href,
"title": title_txt,
"paragraphs": paragraphs,
"word_count": sum(len(p.split()) for p in paragraphs),
})
_classify(ordered)
cover_file = _extract_cover(book_ml, data_dir)
chapters: list[Chapter] = []
seq = 0 # filename prefix counter, incremented for rendered items only
for index, it in enumerate(ordered):
number = pov = None
if it["kind"] is ChapterKind.CHAPTER:
number, pov = _parse_title(it["title"])
text_file = None
output_name = None
if it["render"]:
seq += 1
ct = ChapterText(index=index, title=it["title"] or it["src"],
paragraphs=it["paragraphs"])
text_file = f"chapters/ch{index:02d}.json"
(data_dir / text_file).write_text(
ct.model_dump_json(indent=2), encoding="utf-8")
output_name = _output_name(seq, it["kind"], number, it["title"] or "")
chapters.append(Chapter(
index=index,
item_id=it["item_id"],
src=it["src"],
title=it["title"] or it["src"],
kind=it["kind"],
render=it["render"],
number=number,
pov=pov,
word_count=it["word_count"],
text_file=text_file,
output_name=output_name,
))
book = Book(
slug=slug,
title=title,
author=author,
language=(language[:2] if language else "fr"),
description=description,
cover_file=cover_file,
chapters=chapters,
)
(data_dir / "book.json").write_text(
book.model_dump_json(indent=2), encoding="utf-8")
return book
def _meta(book: epub.EpubBook, name: str) -> Optional[str]:
values = book.get_metadata("DC", name)
if values:
return values[0][0]
return None
def load_book(slug: str) -> Book:
path = book_data_dir(slug) / "book.json"
return Book.model_validate_json(path.read_text(encoding="utf-8"))
def load_chapter_text(slug: str, chapter: Chapter) -> ChapterText:
path = book_data_dir(slug) / chapter.text_file
return ChapterText.model_validate_json(path.read_text(encoding="utf-8"))
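
To make the title handling and the front / chapter / back classification concrete, here is a small self-contained sketch. It reuses the two regular expressions and the 40-word threshold from the parser; the function names, the tuple-based item format, and the sample titles and word counts are illustrative assumptions, not the module's API.

```python
import re
from typing import Optional

CHAPTER_RE = re.compile(r"^\s*(\d+|prologue|[ée]pilogue)\b", re.IGNORECASE)
TITLE_PARTS_RE = re.compile(r"^\s*([^-\n]+?)(?:\s*[-–—]\s*(.+))?\s*$")
BACK_MATTER_MIN_WORDS = 40


def parse_title(title: str) -> tuple[Optional[str], Optional[str]]:
    """Split a toc title such as "1 - ELVI" into (number, pov)."""
    m = TITLE_PARTS_RE.match(title)
    if not m:
        return None, None
    number = (m.group(1) or "").strip() or None
    pov = (m.group(2) or "").strip() or None
    return number, pov


def classify(items: list[tuple[str, int]]) -> list[tuple[str, bool]]:
    """Map (toc title, word count) pairs in spine order to (kind, render)."""
    chapter_idxs = [i for i, (t, _) in enumerate(items) if t and CHAPTER_RE.match(t)]
    first = chapter_idxs[0] if chapter_idxs else len(items)
    result = []
    for i, (title, words) in enumerate(items):
        if title and CHAPTER_RE.match(title):
            result.append(("chapter", words > 0))
        elif i < first:
            result.append(("front", False))
        else:
            result.append(("back", words >= BACK_MATTER_MIN_WORDS))
    return result


spine = [("", 12), ("PROLOGUE", 900), ("1 - ELVI", 4200),
         ("REMERCIEMENTS", 150), ("Du même auteur", 20)]
print(parse_title("1 - ELVI"))
print(parse_title("PROLOGUE"))
print(classify(spine))
```

Note that an untitled or non-matching item located between two chapters also lands in the `back` branch, so it is read only if it reaches the word threshold.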

176
backend/inkflow/models.py Normal file

@@ -0,0 +1,176 @@
"""Data schemas shared across the whole pipeline (pydantic v2).
These models are serialized to JSON on disk (book.json, analysis/chNN.json,
cast.json, pronunciation.json) and form the contract between pipeline stages.
Each stage reads the previous stage's artifact and writes its own.
"""
from __future__ import annotations
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field
class ChapterKind(str, Enum):
    FRONT = "front"      # cover, title page, publisher notices (not read)
    CHAPTER = "chapter"  # prologue, numbered chapters, epilogue (read)
    BACK = "back"        # acknowledgements, glossary... (read if the text is significant)


class Chapter(BaseModel):
    index: int                         # order in the spine (0-based)
    item_id: str                       # idref from the OPF manifest
    src: str                           # internal xhtml path
    title: str                         # raw toc title, e.g. "1 - ELVI"
    kind: ChapterKind
    render: bool                       # should audio be synthesized?
    number: Optional[str] = None       # "1", "PROLOGUE", "EPILOGUE"...
    pov: Optional[str] = None          # point-of-view character, e.g. "ELVI"
    word_count: int = 0
    text_file: Optional[str] = None    # relative path of the text json (chapters/chNN.json)
    output_name: Optional[str] = None  # final mp3 name, e.g. "02-Chapitre 1.mp3"


class Book(BaseModel):
    slug: str                         # internal identifier (data folder)
    title: str
    author: Optional[str] = None
    language: str = "fr"
    description: Optional[str] = None
    cover_file: Optional[str] = None  # path of the cover extracted into data/<slug>/
    chapters: list[Chapter] = Field(default_factory=list)

    @property
    def render_chapters(self) -> list[Chapter]:
        return [c for c in self.chapters if c.render]


class ChapterText(BaseModel):
    """Normalized raw text of a chapter (parser output)."""
    index: int
    title: str
    paragraphs: list[str] = Field(default_factory=list)

    @property
    def word_count(self) -> int:
        return sum(len(p.split()) for p in self.paragraphs)


# --- Analysis (Gemma stage) --------------------------------------------------
class SegmentType(str, Enum):
    NARRATION = "narration"
    DIALOGUE = "dialogue"


class Incise(BaseModel):
    """Bounds of a narration aside (incise) embedded in a line of dialogue.

    Character offsets into `Segment.text`: the substring `text[start:end]` is
    narration (e.g. "dit-il", "lanca Drummer") to be voiced by the narrator at
    render time, without splitting the persisted line.
    """
    start: int  # inclusive offset
    end: int    # exclusive offset


class Segment(BaseModel):
    """Synthesis unit: a piece of text attributed to a speaker."""
    type: SegmentType
    text: str
    speaker: str = "narrateur"   # "narrateur" or a character name
    glued_to_prev: bool = False  # sub-segment from the same paragraph (incise)
                                 # -> shorter audio gap with the previous segment
    incises: list[Incise] = Field(default_factory=list)  # narrator spans INSIDE text


class ChapterAnalysis(BaseModel):
    index: int
    title: str
    segments: list[Segment] = Field(default_factory=list)


class Character(BaseModel):
    name: str                       # canonical name
    aliases: list[str] = Field(default_factory=list)
    gender: Optional[str] = None    # "male" | "female" | "unknown"
    age: Optional[str] = None       # "child" | "young" | "adult" | "old"
    description: Optional[str] = None
    voice_id: Optional[str] = None  # id in the voicebank (assigned at casting)


class Cast(BaseModel):
    narrator_voice_id: Optional[str] = None
    characters: list[Character] = Field(default_factory=list)


class VoiceEntry(BaseModel):
    """A voice in the bank, independent of the engine.

    `kokoro_voice` is the identity (direct Kokoro render + reference clip);
    `ref_audio`/`ref_text` are used for Qwen3 cloning (final render).
    """
    id: str                          # e.g. "fr_f_siwis"
    kokoro_voice: str                # e.g. "ff_siwis"
    gender: str = "unknown"          # male | female | unknown
    age: str = "adult"               # child | young | adult | old
    lang: str = "fr"
    label: Optional[str] = None      # human-readable label
    ref_audio: Optional[str] = None  # clip path (relative to voicebank/)
    ref_text: Optional[str] = None   # transcript of the clip


class Voicebank(BaseModel):
    entries: list[VoiceEntry] = Field(default_factory=list)

    def by_id(self, voice_id: str) -> Optional[VoiceEntry]:
        return next((e for e in self.entries if e.id == voice_id), None)

    def by_gender(self, gender: str) -> list[VoiceEntry]:
        return [e for e in self.entries if e.gender == gender]


class PronunciationEntry(BaseModel):
    term: str         # original spelling, e.g. "Tiamat"
    replacement: str  # guided phonetic spelling, e.g. "Tia-mat"
    note: Optional[str] = None
    enabled: bool = True


class Pronunciation(BaseModel):
    entries: list[PronunciationEntry] = Field(default_factory=list)


# --- Project state (orchestration / UI) --------------------------------------
class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


class ChapterRenderState(BaseModel):
    index: int
    status: StageStatus = StageStatus.PENDING
    progress: float = 0.0      # 0..1
    mp3: Optional[str] = None  # output file name
    backend: Optional[str] = None
    error: Optional[str] = None


class ProjectState(BaseModel):
    """Persistent state of a book, driven by the orchestrator and read by the UI."""
    slug: str
    title: str
    stages: dict[str, StageStatus] = Field(default_factory=dict)  # parse/analyze/cast/pronounce
    analyzed_chapters: list[int] = Field(default_factory=list)
    render: dict[int, ChapterRenderState] = Field(default_factory=dict)
    # Current job (for real-time display).
    active_stage: Optional[str] = None
    active_detail: Optional[str] = None
    active_progress: float = 0.0

    def stage(self, name: str) -> StageStatus:
        return self.stages.get(name, StageStatus.PENDING)
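
The `Incise` offsets are easiest to understand with a worked example. The sketch below shows one way a renderer might split a dialogue segment into speaker and narrator spans from those offsets. It uses plain dataclasses as hypothetical stand-ins for the pydantic models, and `split_voices` is an illustrative helper that assumes the incises do not overlap; the actual rendering code is not part of this section.

```python
from dataclasses import dataclass, field


@dataclass
class Incise:
    start: int  # inclusive offset into Segment.text
    end: int    # exclusive offset


@dataclass
class Segment:
    text: str
    speaker: str = "narrateur"
    incises: list[Incise] = field(default_factory=list)


def split_voices(seg: Segment) -> list[tuple[str, str]]:
    """Return (voice, chunk) pairs; incise spans go to the narrator."""
    parts: list[tuple[str, str]] = []
    cursor = 0
    for inc in sorted(seg.incises, key=lambda i: i.start):
        if inc.start > cursor:
            parts.append((seg.speaker, seg.text[cursor:inc.start]))
        parts.append(("narrateur", seg.text[inc.start:inc.end]))
        cursor = inc.end
    if cursor < len(seg.text):
        parts.append((seg.speaker, seg.text[cursor:]))
    return parts


text = "Viens, dit-il, on part."
start = text.index("dit-il")
segment = Segment(text=text, speaker="Drummer",
                  incises=[Incise(start, start + len("dit-il"))])
parts = split_voices(segment)
print(parts)
```

Keeping the offsets next to an unsplit `text` means the persisted line stays intact, and concatenating the chunks in order always gives back the original string.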


@@ -0,0 +1,364 @@
"""Orchestrator: runs the pipeline stages in the background, tracks state, and
broadcasts the full state to the UI on every change.
- A single worker thread runs jobs serially (one Mac = one MLX workload at a
time). Jobs are queued and return control to the API immediately.
- The state (ProjectState) is persisted in data/<slug>/state.json -> resumable.
- Broadcasting goes through a `broadcaster` injected by the API layer (to stay
independent of FastAPI). It receives (slug, state_dict).
"""
from __future__ import annotations
import queue
import threading
import traceback
from pathlib import Path
from typing import Callable, Optional
from ..config import book_data_dir, book_output_dir
from ..epub.parser import load_book, load_chapter_text
from ..models import ChapterRenderState, ProjectState, StageStatus
from ..store import artifacts
Broadcaster = Callable[[str, dict], None]
def state_path(slug: str) -> Path:
return book_data_dir(slug) / "state.json"
def load_state(slug: str) -> ProjectState:
path = state_path(slug)
if path.exists():
state = ProjectState.model_validate_json(path.read_text(encoding="utf-8"))
else:
book = load_book(slug)
state = ProjectState(slug=slug, title=book.title,
stages={"parse": StageStatus.DONE})
return _reconcile(slug, state)
def _reconcile(slug: str, state: ProjectState) -> ProjectState:
"""Align the state with the artifacts present on disk (robust resume).
Lets the UI reflect what has already been done, even through the CLI or after
a restart, without replaying the stages.
"""
book = load_book(slug)
state.stages.setdefault("parse", StageStatus.DONE)
# Analysis: chapters that already have an analysis artifact.
analyzed = [c.index for c in book.render_chapters
if artifacts.analysis_path(slug, c.index).exists()]
if analyzed:
for idx in analyzed:
if idx not in state.analyzed_chapters:
state.analyzed_chapters.append(idx)
if state.stage("analyze") == StageStatus.PENDING:
state.stages["analyze"] = (
StageStatus.DONE if len(analyzed) == len(book.render_chapters)
else StageStatus.RUNNING)
# Casting: at least one voice assigned.
cast = artifacts.load_cast(slug)
if cast.narrator_voice_id or any(c.voice_id for c in cast.characters):
state.stages.setdefault("cast", StageStatus.DONE)
# Pronunciation: at least one entry.
if artifacts.load_pronunciation(slug).entries:
state.stages.setdefault("pronounce", StageStatus.DONE)
# Render: mp3 files present in the output directory.
out_dir = book_output_dir(book.title)
for ch in book.render_chapters:
existing = state.render.get(ch.index)
if existing and existing.mp3:
continue
if ch.output_name and (out_dir / ch.output_name).exists():
state.render[ch.index] = ChapterRenderState(
index=ch.index, status=StageStatus.DONE, progress=1.0,
mp3=ch.output_name)
return state
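
The single-worker design described in the module docstring can be sketched with the standard library alone. `SerialRunner` below is a hypothetical, stripped-down illustration of the queue-plus-daemon-thread pattern (no state persistence, no broadcasting); the `join` helper exists only so the demo can wait for completion.

```python
import queue
import threading
import traceback
from typing import Callable, Optional


class SerialRunner:
    """Runs queued jobs one at a time on a lazily started daemon thread."""

    def __init__(self) -> None:
        self._q: "queue.Queue[tuple[str, Callable[[], None]]]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        self.busy_slug: Optional[str] = None

    def _loop(self) -> None:
        while True:
            slug, job = self._q.get()
            self.busy_slug = slug
            try:
                job()
            except Exception:  # a failing job must not kill the worker
                traceback.print_exc()
            finally:
                self.busy_slug = None
                self._q.task_done()

    def enqueue(self, slug: str, job: Callable[[], None]) -> None:
        # Returns immediately; the job runs later on the worker thread.
        if self._worker is None or not self._worker.is_alive():
            self._worker = threading.Thread(target=self._loop, daemon=True)
            self._worker.start()
        self._q.put((slug, job))

    def join(self) -> None:
        self._q.join()


order: list = []
runner = SerialRunner()
for n in range(3):
    runner.enqueue("demo-book", lambda n=n: order.append(n))
runner.enqueue("demo-book", lambda: 1 / 0)  # fails, traceback goes to stderr
runner.enqueue("demo-book", lambda: order.append("after-failure"))
runner.join()
print(order)
```

A single consumer keeps jobs in FIFO order and guarantees that only one heavy workload runs at a time, at the cost of later jobs waiting behind a long one.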
class Orchestrator:
def __init__(self) -> None:
self._q: "queue.Queue[tuple[str, Callable[[], None]]]" = queue.Queue()
self._worker: Optional[threading.Thread] = None
self._broadcaster: Optional[Broadcaster] = None
self._lock = threading.Lock()
self.busy_slug: Optional[str] = None
# --- infra ---------------------------------------------------------------
def set_broadcaster(self, fn: Broadcaster) -> None:
self._broadcaster = fn
def _ensure_worker(self) -> None:
if self._worker is None or not self._worker.is_alive():
self._worker = threading.Thread(target=self._loop, daemon=True)
self._worker.start()
def _loop(self) -> None:
while True:
slug, job = self._q.get()
self.busy_slug = slug
try:
job()
except Exception: # noqa: BLE001
traceback.print_exc()
finally:
self.busy_slug = None
self._q.task_done()
def _save_and_emit(self, state: ProjectState) -> None:
path = state_path(state.slug)
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(state.model_dump_json(indent=2), encoding="utf-8")
if self._broadcaster:
self._broadcaster(state.slug, state.model_dump(mode="json"))
def enqueue(self, slug: str, job: Callable[[], None]) -> None:
self._ensure_worker()
self._q.put((slug, job))
# --- stages --------------------------------------------------------------
def run_analyze(self, slug: str, chapter_indexes: Optional[list[int]] = None) -> None:
def job() -> None:
from ..analysis.gemma import Gemma
from ..analysis.segmenter import analyze_chapter
from ..models import Cast
from ..settings import get_settings
state = load_state(slug)
book = load_book(slug)
targets = [c for c in book.render_chapters
if chapter_indexes is None or c.index in chapter_indexes]
state.stages["analyze"] = StageStatus.RUNNING
state.active_stage = "analyze"
self._save_and_emit(state)
gemma = Gemma()
dedup_gemma = gemma if get_settings().dedup_use_gemma else None
cast = artifacts.load_cast(slug)
chars = list(cast.characters)
total = len(targets)
for i, ch in enumerate(targets):
state.active_detail = f"Analyse {ch.title}"
state.active_progress = i / max(total, 1)
self._save_and_emit(state)
ct = load_chapter_text(slug, ch)
try:
# Deduplication happens inside analyze_chapter: `chars` receives the
# reconciled cumulative cast.
analysis, chars = analyze_chapter(
ch, ct, gemma, book_chars=chars, dedup_gemma=dedup_gemma)
except Exception: # noqa: BLE001 - chapter skipped, carry on
traceback.print_exc()
continue
artifacts.save_analysis(slug, analysis)
if ch.index not in state.analyzed_chapters:
state.analyzed_chapters.append(ch.index)
self._save_and_emit(state)
artifacts.save_cast(slug, Cast(
narrator_voice_id=cast.narrator_voice_id, characters=chars))
state.stages["analyze"] = StageStatus.DONE
self._finish(state)
self.enqueue(slug, job)
def run_cast(self, slug: str) -> None:
def job() -> None:
from ..casting.assign import assign_voices
from ..casting.voicebank import build_voicebank, load_voicebank
state = load_state(slug)
state.stages["cast"] = StageStatus.RUNNING
state.active_stage = "cast"
state.active_detail = "Preparation de la voicebank"
self._save_and_emit(state)
vb = load_voicebank()
if not vb.entries or not any(e.ref_audio for e in vb.entries):
vb = build_voicebank()
cast = artifacts.load_cast(slug)
cast = assign_voices(cast.characters, vb,
narrator_voice_id=cast.narrator_voice_id)
artifacts.save_cast(slug, cast)
state.stages["cast"] = StageStatus.DONE
self._finish(state)
self.enqueue(slug, job)
def run_cast_analyze(self, slug: str, chapter_indexes: Optional[list[int]] = None) -> None:
"""(Re-)extract the characters of one or more chapters and reconcile them.
Lighter than `run_analyze`: it does not re-segment (existing analysis
artifacts are left untouched). Supports chapter-level casting while
keeping the cast consistent across the book (deduplication).
"""
def job() -> None:
from ..analysis.gemma import Gemma
from ..analysis.segmenter import extract_characters
from ..casting.dedup import reconcile_characters
from ..models import Cast
from ..settings import get_settings
state = load_state(slug)
book = load_book(slug)
targets = [c for c in book.render_chapters
if chapter_indexes is None or c.index in chapter_indexes]
state.active_stage = "cast"
self._save_and_emit(state)
gemma = Gemma()
dedup_gemma = gemma if get_settings().dedup_use_gemma else None
cast = artifacts.load_cast(slug)
chars = list(cast.characters)
total = len(targets)
for i, ch in enumerate(targets):
state.active_detail = f"Casting — {ch.title}"
state.active_progress = i / max(total, 1)
self._save_and_emit(state)
ct = load_chapter_text(slug, ch)
try:
found = extract_characters("\n".join(ct.paragraphs), gemma)
speakers: list[str] = []
if artifacts.analysis_path(slug, ch.index).exists():
analysis = artifacts.load_analysis(slug, ch.index)
speakers = [s.speaker for s in analysis.segments]
chars, _ = reconcile_characters(
chars, found, dedup_gemma, speaker_names=speakers)
except Exception: # noqa: BLE001 - chapter skipped, carry on
traceback.print_exc()
continue
artifacts.save_cast(slug, Cast(
narrator_voice_id=cast.narrator_voice_id, characters=chars))
self._save_and_emit(state)
self._finish(state)
self.enqueue(slug, job)
def run_dedup_cast(self, slug: str) -> None:
"""Collapse duplicates in an already-built cast (Holden/James Holden...)."""
def job() -> None:
from ..analysis.gemma import Gemma
from ..casting.dedup import dedup_cast
from ..models import Cast
from ..settings import get_settings
state = load_state(slug)
state.active_stage = "cast"
state.active_detail = "Deduplication du casting"
self._save_and_emit(state)
cast = artifacts.load_cast(slug)
gemma = Gemma() if get_settings().dedup_use_gemma else None
chars = dedup_cast(cast.characters, gemma)
artifacts.save_cast(slug, Cast(
narrator_voice_id=cast.narrator_voice_id, characters=chars))
self._finish(state)
self.enqueue(slug, job)
def run_pronounce(self, slug: str) -> None:
def job() -> None:
from ..analysis.gemma import Gemma
from ..analysis.pronunciation import (
merge_pronunciations,
propose_pronunciations,
)
state = load_state(slug)
book = load_book(slug)
state.stages["pronounce"] = StageStatus.RUNNING
state.active_stage = "pronounce"
self._save_and_emit(state)
gemma = Gemma()
pron = artifacts.load_pronunciation(slug)
targets = book.render_chapters[:3] # sample: first three chapters only
for i, ch in enumerate(targets):
state.active_detail = f"Mots a risque — {ch.title}"
state.active_progress = i / max(len(targets), 1)
self._save_and_emit(state)
ct = load_chapter_text(slug, ch)
pron = merge_pronunciations(
pron, propose_pronunciations("\n".join(ct.paragraphs), gemma))
artifacts.save_pronunciation(slug, pron)
state.stages["pronounce"] = StageStatus.DONE
self._finish(state)
self.enqueue(slug, job)
def run_render(self, slug: str, chapter_indexes: list[int],
backend: Optional[str] = None, mono: bool = False) -> None:
from ..settings import get_settings
backend = backend or get_settings().default_backend
def job() -> None:
from ..casting.voicebank import load_voicebank, voice_spec_for
from ..pipeline.render import (
build_units_mono,
build_units_multi,
make_voice_resolver,
render_chapter_to_mp3,
)
from ..tts.factory import get_backend
state = load_state(slug)
book = load_book(slug)
state.stages["render"] = StageStatus.RUNNING
state.active_stage = "render"
self._save_and_emit(state)
tts = get_backend(backend)
pron = artifacts.load_pronunciation(slug)
cast = artifacts.load_cast(slug)
vb = load_voicebank()
render_list = [c for c in book.render_chapters if c.index in chapter_indexes]
for ch in render_list:
rs = state.render.get(ch.index) or ChapterRenderState(index=ch.index)
rs.status = StageStatus.RUNNING
rs.progress = 0.0
rs.backend = backend
state.render[ch.index] = rs
state.active_detail = f"Synthese — {ch.title}"
self._save_and_emit(state)
try:
ct = load_chapter_text(slug, ch)
if mono or ch.index not in state.analyzed_chapters:
units = build_units_mono(ct, tts.default_voice())
else:
analysis = artifacts.load_analysis(slug, ch.index)
narr = vb.by_id(cast.narrator_voice_id) if cast.narrator_voice_id else None
default_voice = (voice_spec_for(narr, backend)
if narr else tts.default_voice())
resolver = make_voice_resolver(cast, vb, backend)
units = build_units_multi(analysis, resolver, default_voice)
def _p(done: int, total: int, _rs=rs, _state=state) -> None:
_rs.progress = done / max(total, 1)
_state.active_progress = _rs.progress
self._save_and_emit(_state)
track = book.render_chapters.index(ch) + 1
mp3 = render_chapter_to_mp3(book, ch, units, tts, pron=pron,
track=track, progress=_p)
rs.status = StageStatus.DONE
rs.progress = 1.0
rs.mp3 = mp3.name
except Exception as exc: # noqa: BLE001
rs.status = StageStatus.ERROR
rs.error = str(exc)
self._save_and_emit(state)
state.stages["render"] = StageStatus.DONE
self._finish(state)
self.enqueue(slug, job)
def _finish(self, state: ProjectState) -> None:
state.active_stage = None
state.active_detail = None
state.active_progress = 0.0
self._save_and_emit(state)
# Singleton shared by the API.
orchestrator = Orchestrator()
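The worker model used by `Orchestrator` (one daemon thread draining a FIFO queue, with job failures recorded rather than propagated) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's class: `MiniOrchestrator`, `errors` and `wait` are hypothetical names, and state persistence and broadcasting are left out.

```python
import queue
import threading
from typing import Callable, Optional


class MiniOrchestrator:
    """Hypothetical, stripped-down stand-in for the Orchestrator above."""

    def __init__(self) -> None:
        self._q: "queue.Queue[tuple[str, Callable[[], None]]]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        self._lock = threading.Lock()
        self.errors: list[Exception] = []

    def _ensure_worker(self) -> None:
        # Start the worker lazily, at most once at a time.
        with self._lock:
            if self._worker is None or not self._worker.is_alive():
                self._worker = threading.Thread(target=self._loop, daemon=True)
                self._worker.start()

    def _loop(self) -> None:
        while True:
            _slug, job = self._q.get()
            try:
                job()
            except Exception as exc:  # a failing job must not kill the worker
                self.errors.append(exc)
            finally:
                self._q.task_done()

    def enqueue(self, slug: str, job: Callable[[], None]) -> None:
        self._ensure_worker()
        self._q.put((slug, job))

    def wait(self) -> None:
        # Block until every queued job has been processed.
        self._q.join()


results: list = []
orch = MiniOrchestrator()
for n in range(3):
    orch.enqueue("demo", lambda n=n: results.append(n))
orch.enqueue("demo", lambda: 1 / 0)  # raises ZeroDivisionError inside the worker
orch.enqueue("demo", lambda: results.append("after-error"))
orch.wait()
```

Because a single thread consumes the queue, jobs run strictly in submission order, which is what lets the heavy model stages share one process without overlapping.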


@@ -0,0 +1,158 @@
"""Audio rendering of a chapter: (segments + voices) -> WAV -> MP3.

A `RenderUnit` is a piece of text plus the voice to use for it. We build the
list of units (single narrator or multi-voice, depending on the cast),
synthesize each one, concatenate them with silences, normalize the loudness,
then encode to MP3.
"""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

from ..analysis.pronunciation import apply_pronunciation
from ..audio.postprocess import concat_segments, encode_mp3, normalize_loudness, write_wav
from ..config import book_data_dir, book_output_dir
from ..models import (
    Book,
    Chapter,
    ChapterAnalysis,
    ChapterText,
    Pronunciation,
    SegmentType,
)
from ..tts.base import TTSBackend, VoiceSpec

# Resolves a speaker name to a concrete voice, or None when no voice is
# available (the caller then falls back to a default voice).
VoiceResolver = Callable[[str], Optional[VoiceSpec]]


@dataclass
class RenderUnit:
    text: str
    voice: VoiceSpec
    speaker: str = "narrateur"
    glued_to_prev: bool = False  # dialogue tag (incise) -> shorter gap with the previous unit


def build_units_mono(ct: ChapterText, narrator: VoiceSpec) -> list[RenderUnit]:
    """Single narrator: every paragraph is read with the narrator's voice."""
    return [RenderUnit(text=p, voice=narrator) for p in ct.paragraphs if p.strip()]
def make_voice_resolver(cast, voicebank, engine: str) -> VoiceResolver:
    """Build a speaker -> VoiceSpec resolver from the cast and the voicebank.

    Falls back to the narrator's voice when the speaker has no assigned voice,
    and returns None when no voicebank entry can be found.
    """
    from ..casting.assign import resolve_speaker_voice
    from ..casting.voicebank import voice_spec_for

    def resolve(speaker: str) -> Optional[VoiceSpec]:
        vid = resolve_speaker_voice(speaker, cast, voicebank)
        if vid is None:
            vid = cast.narrator_voice_id
        entry = voicebank.by_id(vid) if vid else None
        if entry is None:
            return None  # the default voice will be used instead
        return voice_spec_for(entry, engine)

    return resolve


def build_units_multi(
    analysis: ChapterAnalysis,
    resolve: VoiceResolver,
    default_voice: VoiceSpec,
) -> list[RenderUnit]:
    """Multi-voice: narration -> narrator, dialogue -> the character's voice.

    Dialogue tags (incises) annotated on a line of dialogue, as bounds within
    the text, are split off here, at the last moment: the tag substring is read
    with the narrator's voice (`glued_to_prev` shortens the silence), the rest
    with the character's voice. Lines without a tag are rendered whole.
    """
    from ..analysis.segmenter import iter_incise_pieces

    narrator = resolve("narrateur") or default_voice
    units: list[RenderUnit] = []
    for seg in analysis.segments:
        if not seg.text.strip():
            continue
        if seg.type is SegmentType.NARRATION:
            units.append(RenderUnit(text=seg.text, voice=narrator,
                                    speaker="narrateur",
                                    glued_to_prev=seg.glued_to_prev))
            continue
        char_voice = resolve(seg.speaker) or default_voice
        if not seg.incises:
            units.append(RenderUnit(text=seg.text, voice=char_voice,
                                    speaker=seg.speaker,
                                    glued_to_prev=seg.glued_to_prev))
            continue
        for k, (is_incise, piece) in enumerate(
                iter_incise_pieces(seg.text, seg.incises)):
            # Only the first piece inherits the segment's own flag; later
            # pieces always stick to the piece before them.
            glued = seg.glued_to_prev if k == 0 else True
            if is_incise:
                units.append(RenderUnit(text=piece, voice=narrator,
                                        speaker="narrateur", glued_to_prev=glued))
            else:
                units.append(RenderUnit(text=piece, voice=char_voice,
                                        speaker=seg.speaker, glued_to_prev=glued))
    return units
def render_units(
    units: list[RenderUnit],
    backend: TTSBackend,
    *,
    pron: Optional[Pronunciation] = None,
    progress: Optional[Callable[[int, int], None]] = None,
) -> tuple[list, int]:
    """Synthesize every unit and return (list of (audio, sample_rate), n_units)."""
    parts = []
    total = len(units)
    for i, unit in enumerate(units):
        text = apply_pronunciation(unit.text, pron) if pron else unit.text
        audio, sr = backend.synthesize(text, unit.voice)
        parts.append((audio, sr))
        if progress:
            progress(i + 1, total)
    return parts, total


def render_chapter_to_mp3(
    book: Book,
    chapter: Chapter,
    units: list[RenderUnit],
    backend: TTSBackend,
    *,
    pron: Optional[Pronunciation] = None,
    track: Optional[int] = None,
    progress: Optional[Callable[[int, int], None]] = None,
) -> Path:
    """Full pipeline for one chapter -> output/<book>/NN-...mp3."""
    parts, _ = render_units(units, backend, pron=pron, progress=progress)
    # parts is aligned 1:1 with units -> pass the dialogue-tag markers along.
    audio, sr = concat_segments(parts, glued=[u.glued_to_prev for u in units])
    audio = normalize_loudness(audio)

    # Intermediate WAV goes to data/, final MP3 to output/.
    wav_path = book_data_dir(book.slug) / "audio" / f"ch{chapter.index:02d}.wav"
    write_wav(wav_path, audio, sr)

    out_dir = book_output_dir(book.title)
    mp3_path = out_dir / (chapter.output_name or f"ch{chapter.index:02d}.mp3")

    cover = None
    if book.cover_file:
        candidate = book_data_dir(book.slug) / book.cover_file
        cover = candidate if candidate.exists() else None

    encode_mp3(
        wav_path, mp3_path,
        title=chapter.title, album=book.title, artist=book.author,
        track=track, cover_path=cover,
    )
    return mp3_path

170
backend/inkflow/settings.py Normal file

@@ -0,0 +1,170 @@
"""Technical settings editable at runtime (global to the app).

Unlike `config.py` (constants frozen at import time, overridable only through
environment variables at startup), this module exposes a `Settings` object
that is *persisted* in `data/settings.json` and editable from the UI.

Default values mirror those of `config.py`. Pipeline code calls
`get_settings()` at execution time; saving invalidates the model caches (TTS
backends, Gemma loading) so that new identifiers/parameters take effect
without a restart.
"""
from __future__ import annotations

import threading
from typing import Optional

from pydantic import BaseModel, Field

from . import config

# --- Default system prompts (canonical source) -------------------------------
# These strings drive the Gemma tasks. The user can edit them.
DEFAULT_PROMPT_SPEAKERS = (
"Tu es un assistant d'analyse litteraire. Tu identifies QUI prononce chaque "
"replique de dialogue dans un extrait de roman en francais. Une liste des "
"personnages du chapitre t'est fournie : choisis le locuteur dans cette "
"liste en recopiant son nom EXACTEMENT. Appuie-toi sur la narration qui "
"PRECEDE et qui SUIT chaque replique (incise d'attribution type 'dit "
"Marie'), sur les vocatifs (le personnage a qui l'on s'adresse) et sur "
"l'alternance des tours de parole. Mets 'inconnu' si tu n'es pas sur. Tu "
"reponds UNIQUEMENT en JSON valide, sans texte autour."
)
DEFAULT_PROMPT_SPEAKERS_REFINE = (
"Tu es un assistant d'analyse litteraire. On te donne des repliques dont le "
"locuteur est reste indetermine, avec le locuteur DEJA identifie des "
"repliques voisines. Deduis qui parle en exploitant l'alternance des tours "
"de parole et le contexte narratif autour. Choisis le nom dans la liste des "
"personnages fournie, en le recopiant exactement, ou 'inconnu' si vraiment "
"indeterminable. Tu reponds UNIQUEMENT en JSON valide, sans texte autour."
)
DEFAULT_PROMPT_CHARACTERS = (
"Tu es un assistant d'analyse litteraire. Tu extrais la liste des "
"personnages d'un extrait de roman et leurs attributs vocaux. Tu reponds "
"UNIQUEMENT en JSON valide."
)
DEFAULT_PROMPT_PRONUNCIATION = (
"Tu es un assistant de preparation de livre audio en francais. Tu reperes "
"les mots dont la prononciation par un synthetiseur vocal francais risque "
"d'etre incorrecte (noms propres etrangers, termes de science-fiction, "
"acronymes). Tu reponds UNIQUEMENT en JSON valide."
)
DEFAULT_PROMPT_INCISES = (
"Tu es un assistant d'analyse litteraire. Tu reperes les INCISES de "
"narration inserees dans une replique de dialogue (ex: 'dit Mamie', "
"'repondit le capitaine'). Tu reponds UNIQUEMENT en JSON valide, sans "
"texte autour."
)
DEFAULT_PROMPT_DEDUP = (
"Tu es un assistant d'analyse litteraire. Tu rapproches les differentes "
"facons de nommer un meme personnage (nom complet, prenom, surnom, "
"diminutif) pour eviter les doublons dans le casting d'un livre audio. Tu "
"ne fusionnes deux noms que si c'est, avec certitude, la meme personne. Tu "
"reponds UNIQUEMENT en JSON valide, sans texte autour."
)
class Settings(BaseModel):
    """Global technical settings, persisted in data/settings.json."""

    # --- MLX models (HuggingFace identifiers) ---
    gemma_model: str = config.GEMMA_MODEL
    qwen3_model: str = config.QWEN3_TTS_MODEL
    kokoro_model: str = config.KOKORO_MODEL

    # --- Gemma generation ---
    gemma_temperature: float = Field(0.1, ge=0.0, le=2.0)
    gemma_max_tokens: int = Field(2048, ge=64, le=8192)

    # --- System prompts (analysis) ---
    prompt_speakers: str = DEFAULT_PROMPT_SPEAKERS
    prompt_speakers_refine: str = DEFAULT_PROMPT_SPEAKERS_REFINE
    prompt_characters: str = DEFAULT_PROMPT_CHARACTERS
    prompt_pronunciation: str = DEFAULT_PROMPT_PRONUNCIATION
    prompt_incises: str = DEFAULT_PROMPT_INCISES  # DEPRECATED (deterministic detection)
    prompt_dedup: str = DEFAULT_PROMPT_DEDUP

    # --- Incises (dialogue tags) ---
    # DEPRECATED: incise detection is now deterministic and cast-aware
    # (analysis.segmenter.detect_incises), with no Gemma fallback. The field
    # is kept so that existing settings.json files still load without error.
    split_incises_use_gemma: bool = True

    # --- Retroactive attribution (2nd pass over undetermined lines) ---
    # After the 1st pass, a targeted 2nd pass re-resolves the lines still
    # marked 'inconnu' (or low-confidence) using the already identified
    # neighbours. Triggered only if doubts remain -> zero cost otherwise.
    retro_pass_use_gemma: bool = True

    # --- Cast deduplication ---
    # Heuristic (safe, deterministic) by default. The Gemma pass additionally
    # links non-obvious variants (diminutives, titles) but, with a small local
    # model, produces wrong merges -> opt-in.
    dedup_use_gemma: bool = False

    # --- TTS ---
    default_backend: str = "kokoro"
    language: str = config.DEFAULT_LANGUAGE
    kokoro_lang_code: str = config.KOKORO_LANG_CODE
    kokoro_default_voice: str = config.KOKORO_DEFAULT_VOICE
    qwen3_default_voice: str = config.QWEN3_DEFAULT_VOICE

    # --- Audio (final encoding) ---
    target_sample_rate: int = Field(config.TARGET_SAMPLE_RATE, ge=8000, le=48000)
    mp3_bitrate: str = config.MP3_BITRATE
    target_dbfs: float = Field(config.TARGET_DBFS, ge=-40.0, le=0.0)
_LOCK = threading.Lock()
_cache: Optional[Settings] = None


def settings_path():
    return config.DATA_DIR / "settings.json"


def get_settings() -> Settings:
    """Return the current settings (loaded from disk only once)."""
    global _cache
    with _LOCK:
        if _cache is None:
            path = settings_path()
            if path.exists():
                try:
                    _cache = Settings.model_validate_json(
                        path.read_text(encoding="utf-8"))
                except Exception:  # noqa: BLE001 - corrupted file -> defaults
                    _cache = Settings()
            else:
                _cache = Settings()
        return _cache


def save_settings(settings: Settings) -> Settings:
    """Persist the settings and invalidate the model caches."""
    global _cache
    with _LOCK:
        _cache = settings
        path = settings_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(settings.model_dump_json(indent=2), encoding="utf-8")
    _invalidate_model_caches()
    return settings


def _invalidate_model_caches() -> None:
    """Force models to reload after an identifier/parameter change.

    `get_backend` is cached by backend *name*, not by model id; without a
    purge, a changed id would be ignored. The same goes for Gemma loading.
    """
    try:
        from .tts.factory import get_backend
        get_backend.cache_clear()
    except Exception:  # noqa: BLE001
        pass
    try:
        from .analysis.gemma import _load
        _load.cache_clear()
    except Exception:  # noqa: BLE001
        pass
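The reason `_invalidate_model_caches` has to purge the `lru_cache` can be shown with a small stand-alone sketch. It assumes a simplified factory that returns a tuple instead of a backend object; `current_model_id` is a hypothetical stand-in for the persisted settings.

```python
from functools import lru_cache

# Hypothetical stand-in for the persisted settings (backend name -> model id).
current_model_id = {"kokoro": "model-a"}


@lru_cache(maxsize=4)
def get_backend(name: str) -> tuple[str, str]:
    # The real factory builds a backend object; a tuple keeps the sketch small.
    return (name, current_model_id[name])


first = get_backend("kokoro")
current_model_id["kokoro"] = "model-b"   # the user edits the model id
stale = get_backend("kokoro")            # cache key is the name, so the old value comes back
get_backend.cache_clear()                # what the invalidation step does
fresh = get_backend("kokoro")            # rebuilt with the new model id
```

Clearing the whole cache is coarse, but with only a handful of backends it is simpler than keying the cache on every setting that affects model loading.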


@@ -0,0 +1,63 @@
"""Read/write the pipeline artifacts under data/<slug>/.

Each stage writes a JSON file that later stages read back. This is also what
makes the pipeline resumable: an already existing artifact can be detected.
"""
from __future__ import annotations

from pathlib import Path

from ..config import book_data_dir
from ..models import Cast, ChapterAnalysis, Pronunciation


def analysis_path(slug: str, chapter_index: int) -> Path:
    return book_data_dir(slug) / "analysis" / f"ch{chapter_index:02d}.json"


def cast_path(slug: str) -> Path:
    return book_data_dir(slug) / "cast.json"


def pronunciation_path(slug: str) -> Path:
    return book_data_dir(slug) / "pronunciation.json"


def save_analysis(slug: str, analysis: ChapterAnalysis) -> Path:
    path = analysis_path(slug, analysis.index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(analysis.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_analysis(slug: str, chapter_index: int) -> ChapterAnalysis:
    path = analysis_path(slug, chapter_index)
    return ChapterAnalysis.model_validate_json(path.read_text(encoding="utf-8"))


def save_cast(slug: str, cast: Cast) -> Path:
    path = cast_path(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(cast.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_cast(slug: str) -> Cast:
    # A missing file is not an error: start from an empty cast.
    path = cast_path(slug)
    if not path.exists():
        return Cast()
    return Cast.model_validate_json(path.read_text(encoding="utf-8"))


def save_pronunciation(slug: str, pron: Pronunciation) -> Path:
    path = pronunciation_path(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(pron.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_pronunciation(slug: str) -> Pronunciation:
    # A missing file is not an error: start from an empty dictionary.
    path = pronunciation_path(slug)
    if not path.exists():
        return Pronunciation()
    return Pronunciation.model_validate_json(path.read_text(encoding="utf-8"))


@@ -0,0 +1,48 @@
"""Abstraction over the TTS engines (pluggable backend).

Two implementations: Kokoro (fast, preset voices -> previews) and Qwen3-TTS
(quality + cloning from a reference audio clip -> final render). Both return
mono float32 audio plus a sample rate.
"""
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class VoiceSpec:
    """Describes the voice to use for a synthesis call.

    - `preset`: name of a preset voice (Kokoro: "ff_siwis"; Qwen3: "Chelsie").
    - `ref_audio` / `ref_text`: reference clip for voice cloning (Qwen3).
    """

    preset: Optional[str] = None
    ref_audio: Optional[str] = None
    ref_text: Optional[str] = None
    speed: float = 1.0


class TTSBackend(ABC):
    """Interface shared by all TTS engines."""

    name: str = "base"

    @abstractmethod
    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        """Synthesize `text` and return (mono float32 audio, sample_rate)."""

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec()


def to_mono_float32(audio) -> np.ndarray:
    """Normalize a model output (mx.array / np / list) to mono float32."""
    arr = np.asarray(audio, dtype=np.float32)
    if arr.ndim > 1:
        # (channels, n) or (n, channels) -> average over the channel axis,
        # taken to be the shorter of the two.
        arr = arr.mean(axis=0) if arr.shape[0] < arr.shape[-1] else arr.mean(axis=-1)
    return np.ascontiguousarray(arr.reshape(-1))


@@ -0,0 +1,62 @@
"""Split text into synthesis-friendly chunks.

TTS models (Kokoro in particular) truncate texts that are too long. We
therefore split on sentence boundaries while enforcing a maximum chunk length.
"""
from __future__ import annotations

import re

# Sentence end: strong punctuation followed by whitespace, or a line break.
_SENTENCE_END_RE = re.compile(r"(?<=[.!?…])\s+|\n+")
# For very long sentences, also break on commas, semicolons and colons.
_SOFT_BREAK_RE = re.compile(r"(?<=[,;:])\s+")

DEFAULT_MAX_CHARS = 350


def split_sentences(text: str) -> list[str]:
    parts = [p.strip() for p in _SENTENCE_END_RE.split(text)]
    return [p for p in parts if p]


def _split_long(sentence: str, max_chars: int) -> list[str]:
    """Split an over-long sentence on soft breaks, then with a hard window."""
    if len(sentence) <= max_chars:
        return [sentence]
    out: list[str] = []
    buf = ""
    for piece in _SOFT_BREAK_RE.split(sentence):
        cand = f"{buf} {piece}".strip()
        if len(cand) <= max_chars:
            buf = cand
        else:
            if buf:
                out.append(buf)
            if len(piece) <= max_chars:
                buf = piece
            else:  # word/segment longer than the window: hard cut
                for i in range(0, len(piece), max_chars):
                    out.append(piece[i:i + max_chars])
                buf = ""
    if buf:
        out.append(buf)
    return out


def chunk_text(text: str, max_chars: int = DEFAULT_MAX_CHARS) -> list[str]:
    """Group sentences into chunks of at most max_chars.

    A sentence is only cut when it exceeds max_chars on its own.
    """
    chunks: list[str] = []
    buf = ""
    for sentence in split_sentences(text):
        for part in _split_long(sentence, max_chars):
            cand = f"{buf} {part}".strip()
            if len(cand) <= max_chars:
                buf = cand
            else:
                if buf:
                    chunks.append(buf)
                buf = part
    if buf:
        chunks.append(buf)
    return chunks
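The greedy packing done by `chunk_text` is easier to follow on a tiny input. The sketch below is a simplified restatement, assuming no sentence exceeds the limit (so the `_split_long` step is skipped); `pack` is a hypothetical name used only for this illustration.

```python
import re

# Same sentence-boundary pattern as in the module above.
SENTENCE_END_RE = re.compile(r"(?<=[.!?…])\s+|\n+")


def split_sentences(text: str) -> list[str]:
    return [p.strip() for p in SENTENCE_END_RE.split(text) if p.strip()]


def pack(sentences: list[str], max_chars: int) -> list[str]:
    """Greedily append sentences to the current chunk while it still fits."""
    chunks: list[str] = []
    buf = ""
    for sentence in sentences:
        cand = f"{buf} {sentence}".strip()
        if len(cand) <= max_chars:
            buf = cand
        else:
            if buf:
                chunks.append(buf)
            buf = sentence
    if buf:
        chunks.append(buf)
    return chunks


text = "Il pleuvait. Marie ouvrit la porte. Personne ne repondit.\nElle referma."
chunks = pack(split_sentences(text), max_chars=40)
# chunks == ["Il pleuvait. Marie ouvrit la porte.",
#            "Personne ne repondit. Elle referma."]
```

The first two sentences total 35 characters and fit together; adding the third would reach 57, so a new chunk starts there.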


@@ -0,0 +1,20 @@
"""Select the TTS backend by name (pluggable)."""
from __future__ import annotations

from functools import lru_cache

from .base import TTSBackend

BACKENDS = ("kokoro", "qwen3")


@lru_cache(maxsize=4)
def get_backend(name: str = "kokoro") -> TTSBackend:
    # Backends are imported lazily so that only the requested engine is loaded.
    name = name.lower()
    if name == "kokoro":
        from .kokoro import KokoroBackend
        return KokoroBackend()
    if name == "qwen3":
        from .qwen3 import Qwen3Backend
        return Qwen3Backend()
    raise ValueError(f"Backend TTS inconnu: {name!r} (dispo: {', '.join(BACKENDS)})")


@@ -0,0 +1,93 @@
"""Kokoro backend (fast, preset voices), ideal for previews.

Kokoro truncates long texts: we synthesize chunk by chunk (sentence-based
splitting) and then concatenate. French goes through espeak-ng via phonemizer.
"""
from __future__ import annotations

import logging

import numpy as np

from ..config import setup_espeak
from ..settings import get_settings
from .base import TTSBackend, VoiceSpec, to_mono_float32
from .chunk import chunk_text

logger = logging.getLogger(__name__)

# The MLX port of Kokoro has an intermittent alignment bug (mx.random.normal
# in the harmonic generator) that raises a broadcast_shapes error on some
# random draws. Because it is random, a plain retry is usually enough; as a
# last resort the chunk is split in two.
_KOKORO_RETRIES = 8


class KokoroBackend(TTSBackend):
    name = "kokoro"

    def __init__(self, model_id: str | None = None, lang_code: str | None = None):
        setup_espeak()
        settings = get_settings()
        self.model_id = model_id or settings.kokoro_model
        self.lang_code = lang_code or settings.kokoro_lang_code
        self._model = None
        self._sample_rate = 24000

    def _ensure_loaded(self) -> None:
        if self._model is None:
            from mlx_audio.tts.utils import load_model
            self._model = load_model(self.model_id)

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec(preset=get_settings().kokoro_default_voice)

    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        self._ensure_loaded()
        preset = voice.preset or get_settings().kokoro_default_voice
        pieces: list[np.ndarray] = []
        for chunk in chunk_text(text):
            pieces.extend(self._gen_resilient(chunk, preset, voice.speed))
        if not pieces:
            return np.zeros(0, dtype=np.float32), self._sample_rate
        return np.concatenate(pieces), self._sample_rate

    def _gen_once(self, text: str, preset: str, speed: float) -> list[np.ndarray]:
        out: list[np.ndarray] = []
        for result in self._model.generate(
            text=text, voice=preset, speed=speed, lang_code=self.lang_code,
        ):
            self._sample_rate = getattr(result, "sample_rate", self._sample_rate)
            out.append(to_mono_float32(result.audio))
        return out

    def _gen_resilient(self, text: str, preset: str, speed: float,
                       depth: int = 0) -> list[np.ndarray]:
        """Generate a chunk with retries, then fall back to re-splitting."""
        for _ in range(_KOKORO_RETRIES):
            try:
                return self._gen_once(text, preset, speed)
            except Exception:  # noqa: BLE001 - intermittent vocoder bug
                continue
        # Still failing: split in two and retry each half.
        if depth < 3 and len(text) > 40:
            mid = _split_point(text)
            left = self._gen_resilient(text[:mid].strip(), preset, speed, depth + 1)
            right = self._gen_resilient(text[mid:].strip(), preset, speed, depth + 1)
            return left + right
        logger.warning("Kokoro: morceau abandonne apres echecs: %r", text[:60])
        return []


def _split_point(text: str) -> int:
    """Cut point closest to the middle (preferably on a space)."""
    mid = len(text) // 2
    left = text.rfind(" ", 0, mid)
    right = text.find(" ", mid)
    if left == -1 and right == -1:
        return mid
    if left == -1:
        return right
    if right == -1:
        return left
    return left if (mid - left) <= (right - mid) else right
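The "retry, then bisect" fallback of `_gen_resilient` can be exercised without a model. The sketch below assumes a deterministic stand-in generator: `fake_generate` and `generate_resilient` are hypothetical names, and the stand-in fails on length rather than at random, so the bisection path is always taken.

```python
def split_point(text: str) -> int:
    """Cut point closest to the middle, preferably on a space."""
    mid = len(text) // 2
    left = text.rfind(" ", 0, mid)
    right = text.find(" ", mid)
    if left == -1 and right == -1:
        return mid
    if left == -1:
        return right
    if right == -1:
        return left
    return left if (mid - left) <= (right - mid) else right


def fake_generate(text: str) -> str:
    # Hypothetical stand-in for the model: rejects inputs over 50 characters.
    if len(text) > 50:
        raise RuntimeError("simulated failure")
    return text.upper()


def generate_resilient(text: str, retries: int = 3, depth: int = 0) -> list[str]:
    for _ in range(retries):
        try:
            return [fake_generate(text)]
        except RuntimeError:
            continue
    # Every retry failed: bisect and try each half, up to three levels deep.
    if depth < 3 and len(text) > 40:
        mid = split_point(text)
        return (generate_resilient(text[:mid].strip(), retries, depth + 1)
                + generate_resilient(text[mid:].strip(), retries, depth + 1))
    return []  # give up on this piece


text = "un deux trois quatre cinq six sept huit neuf dix onze douze treize quatorze"
out = generate_resilient(text)
```

Here the 75-character input fails three times, is cut at the space nearest the middle, and both halves then succeed on the first attempt.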


@@ -0,0 +1,58 @@
"""Qwen3-TTS backend (quality + cloning from a reference clip), for the final render.

Two modes:
- preset voice: `voice` (e.g. "Chelsie") + `language` ("French").
- cloning: `ref_audio` (+ `ref_text`, the transcript of the clip) to imitate a
  voice from the voicebank, assigned to a character.
"""
from __future__ import annotations

import numpy as np

from ..settings import get_settings
from .base import TTSBackend, VoiceSpec, to_mono_float32
from .chunk import chunk_text

# Qwen3 tolerates longer sequences than Kokoro, but we still cap them.
_QWEN_MAX_CHARS = 500


class Qwen3Backend(TTSBackend):
    name = "qwen3"

    def __init__(self, model_id: str | None = None, language: str | None = None):
        settings = get_settings()
        self.model_id = model_id or settings.qwen3_model
        self.language = language or settings.language
        self._model = None
        self._sample_rate = 24000

    def _ensure_loaded(self) -> None:
        if self._model is None:
            from mlx_audio.tts.utils import load_model
            self._model = load_model(self.model_id)

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec(preset=get_settings().qwen3_default_voice)

    def _gen_kwargs(self, voice: VoiceSpec) -> dict:
        kwargs: dict = {"language": self.language, "speed": voice.speed}
        if voice.ref_audio:  # cloning mode
            kwargs["ref_audio"] = voice.ref_audio
            if voice.ref_text:
                kwargs["ref_text"] = voice.ref_text
        else:  # preset voice mode
            kwargs["voice"] = voice.preset or get_settings().qwen3_default_voice
        return kwargs

    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        self._ensure_loaded()
        kwargs = self._gen_kwargs(voice)
        pieces: list[np.ndarray] = []
        for chunk in chunk_text(text, max_chars=_QWEN_MAX_CHARS):
            for result in self._model.generate(text=chunk, **kwargs):
                self._sample_rate = getattr(result, "sample_rate", self._sample_rate)
                pieces.append(to_mono_float32(result.audio))
        if not pieces:
            return np.zeros(0, dtype=np.float32), self._sample_rate
        return np.concatenate(pieces), self._sample_rate

22
backend/inkflow/util.py Normal file

@@ -0,0 +1,22 @@
"""Small shared utilities (slugs, safe file names)."""
from __future__ import annotations

import re
import unicodedata

_SLUG_STRIP = re.compile(r"[^a-z0-9]+")
_FS_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def slugify(text: str) -> str:
    """Lowercase ASCII slug, used for internal folder identifiers."""
    norm = unicodedata.normalize("NFKD", text)
    norm = norm.encode("ascii", "ignore").decode("ascii").lower()
    return _SLUG_STRIP.sub("-", norm).strip("-") or "livre"


def safe_filename(name: str) -> str:
    """Clean up a file name while keeping accents (user-facing output)."""
    name = _FS_UNSAFE.sub("", name).strip()
    name = re.sub(r"\s+", " ", name)
    return name or "sans-titre"
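The difference between the two helpers shows up on an accented title. The sketch below restates both functions so that it runs on its own; the sample title is only an illustration.

```python
import re
import unicodedata

_SLUG_STRIP = re.compile(r"[^a-z0-9]+")
_FS_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def slugify(text: str) -> str:
    # NFKD splits accented letters into base letter + combining mark; the
    # ASCII encode step then drops the marks.
    norm = unicodedata.normalize("NFKD", text)
    norm = norm.encode("ascii", "ignore").decode("ascii").lower()
    return _SLUG_STRIP.sub("-", norm).strip("-") or "livre"


def safe_filename(name: str) -> str:
    # Only characters that are unsafe on common file systems are removed.
    name = _FS_UNSAFE.sub("", name).strip()
    name = re.sub(r"\s+", " ", name)
    return name or "sans-titre"


title = "L'Éveil du Léviathan : tome 1"
slug = slugify(title)            # "l-eveil-du-leviathan-tome-1"
filename = safe_filename(title)  # "L'Éveil du Léviathan tome 1"
```

The slug is meant for internal folder names, so it is aggressively normalized; the file name is shown to the user, so it keeps accents and capitalization.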

40
backend/pyproject.toml Normal file

@@ -0,0 +1,40 @@
[project]
name = "inkflow"
version = "0.1.0"
description = "EPUB -> livre audio, 100% local sur Mac (MLX). Analyse Gemma + TTS Qwen3/Kokoro."
requires-python = ">=3.11"
dependencies = [
    # MLX (Apple Silicon)
    "mlx",
    "mlx-lm",
    "mlx-audio",
    "misaki",            # phonemizer for Kokoro (French included)
    # EPUB parsing
    "ebooklib",
    "beautifulsoup4",
    "lxml",
    # Audio
    "soundfile",         # WAV read/write
    "numpy",             # audio concatenation + normalization
    "mutagen",           # ID3 tags + cover (MP3 encoding goes through the ffmpeg CLI)
    # Web API
    "fastapi",
    "uvicorn[standard]",
    "websockets",
    "python-multipart",  # file uploads
    # Misc
    "pydantic>=2",
    "rich",              # readable CLI logs
    "typer",             # CLI
]
[project.scripts]
inkflow = "inkflow.cli:app"
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["."]
include = ["inkflow*"]


@@ -0,0 +1,87 @@
#!/usr/bin/env python
"""Check the InkFlow environment and pre-download the MLX models.

Usage:
    python scripts/setup_models.py          # check everything, then download
    python scripts/setup_models.py --check  # check without downloading

System requirements: Apple Silicon, Python >= 3.11, ffmpeg (brew install ffmpeg).
"""
from __future__ import annotations

import argparse
import platform
import shutil
import sys
from pathlib import Path

# Make `inkflow` importable when the script is run directly from backend/.
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from inkflow.config import (  # noqa: E402
    GEMMA_MODEL,
    KOKORO_MODEL,
    QWEN3_TTS_MODEL,
    ensure_dirs,
)


def check_env() -> bool:
    """Print one status line per requirement; return True if every hard requirement is met."""
    ok = True
    print(f"• Plateforme : {platform.platform()} ({platform.machine()})")
    if platform.machine() != "arm64":
        # Warning only: a non-arm64 machine does not fail the check.
        print(" ! Attendu arm64 (Apple Silicon) — MLX ne sera pas optimal.")
    print(f"• Python : {sys.version.split()[0]}")
    if sys.version_info < (3, 11):
        print(" ! Python >= 3.11 requis.")
        ok = False
    for mod in ("mlx", "mlx_lm", "mlx_audio", "ebooklib", "bs4",
                "soundfile", "mutagen", "fastapi"):
        try:
            __import__(mod)
            print(f"• import {mod:12s}: OK")
        except Exception as exc:  # noqa: BLE001
            print(f"• import {mod:12s}: ECHEC ({exc})")
            ok = False
    ff = shutil.which("ffmpeg")
    print(f"• ffmpeg : {ff or 'INTROUVABLE — brew install ffmpeg'}")
    ok = ok and bool(ff)
    return ok


def download_lm(model_id: str) -> None:
    """Load the language model once so that its weights are fetched."""
    from mlx_lm import load

    print(f" -> LM {model_id}")
    load(model_id)


def download_tts(model_id: str) -> None:
    """Load a TTS model once so that its weights are fetched."""
    from mlx_audio.tts.utils import load_model

    print(f" -> TTS {model_id}")
    load_model(model_id)


def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--check", action="store_true", help="verifier sans telecharger")
    args = ap.parse_args()

    ensure_dirs()
    print("== Verification de l'environnement ==")
    env_ok = check_env()
    if args.check:
        return 0 if env_ok else 1
    if not env_ok:
        print("\nEnvironnement incomplet — corrige les points ci-dessus avant de continuer.")
        return 1

    print("\n== Telechargement des modeles (peut etre long la 1re fois) ==")
    download_lm(GEMMA_MODEL)
    download_tts(KOKORO_MODEL)
    download_tts(QWEN3_TTS_MODEL)
    print("\nTout est pret.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
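
The README lists both `ffmpeg` and `espeak-ng` as prerequisites, while `check_env` above probes Python imports and `ffmpeg`. The two kinds of probe can be sketched as small reusable helpers. The names `missing_tools` and `missing_modules` are hypothetical, and `importlib.util.find_spec` is swapped in for the script's `__import__` call, so this only checks that a module can be located, not that it imports cleanly.

```python
import importlib.util
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the executables from `tools` that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be located (nothing is imported)."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


print("missing tools:", missing_tools(["ffmpeg", "espeak-ng"]))
print("missing modules:", missing_modules(["mlx", "soundfile"]))
```

Importing, as the script does, is the stricter test because it also surfaces broken native libraries. Locating the spec is faster and has no side effects, which may be preferable for a quick `--check` pass.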


@@ -0,0 +1,204 @@
"""Tests for the deterministic detection of incises (reporting clauses).

`detect_incises` / `incise_speaker` / `iter_incise_pieces` are pure functions and
can be tested without Gemma. Two passes: verb-pronoun inversion ("dit-il") and a
cast-aware nominal pass ("compatit Holden", "informa le soldat").
"""
from __future__ import annotations

from inkflow.analysis.segmenter import (
    detect_incises,
    incise_speaker,
    iter_incise_pieces,
)

NAMES = {"Holden", "Kajri", "Camina Drummer"}


def _pieces(text: str, names=NAMES) -> list[tuple[bool, str]]:
    return iter_incise_pieces(text, detect_incises(text, names=names))


# --- Inversion pass (verb-pronoun) -------------------------------------------

def test_inversion_au_milieu():
    assert _pieces("James Holden, coupa-t-elle. Je sais qui vous êtes.") == [
        (False, "James Holden,"),
        (True, "coupa-t-elle."),
        (False, "Je sais qui vous êtes."),
    ]


def test_inversion_en_fin():
    assert _pieces("C'est fini, dit-elle.") == [
        (False, "C'est fini,"),
        (True, "dit-elle."),
    ]


def test_inversion_reflechi_exclamation():
    assert _pieces("Viens ici, s'écria-t-il !") == [
        (False, "Viens ici,"),
        (True, "s'écria-t-il !"),
    ]


def test_inversion_fermee_par_virgule():
    assert _pieces("Pars, répondit-elle, et ne reviens pas.") == [
        (False, "Pars,"),
        (True, "répondit-elle,"),
        (False, "et ne reviens pas."),
    ]


def test_inversion_complements_apres_pronom():
    assert _pieces("Trop tard, murmura-t-il en souriant. Partons.") == [
        (False, "Trop tard,"),
        (True, "murmura-t-il en souriant."),
        (False, "Partons."),
    ]


def test_double_inversion():
    assert _pieces("Stop, dit-il. Non, reprit-elle.") == [
        (False, "Stop,"),
        (True, "dit-il."),
        (False, "Non,"),
        (True, "reprit-elle."),
    ]


# --- Incise after the end of speech: the rest of the line is narration -------

def test_incise_apres_fin_de_phrase_va_jusqu_au_bout():
    # After "…" the speech is closed: "dit-il ... d'importance." is narration.
    text = ("Dans une minute, oui. Je voudrais juste… dit-il avec un geste vague, "
            "comme si tout cela n'avait plus d'importance.")
    assert _pieces(text) == [
        (False, "Dans une minute, oui. Je voudrais juste…"),
        (True, "dit-il avec un geste vague, comme si tout cela n'avait plus "
               "d'importance."),
    ]


def test_incise_apres_virgule_reprend_le_dialogue():
    # After a plain comma, the dialogue resumes (contrast with the test above).
    assert _pieces("Pars, répondit-elle, et ne reviens pas.") == [
        (False, "Pars,"),
        (True, "répondit-elle,"),
        (False, "et ne reviens pas."),
    ]


def test_incise_nominale_apres_point_interrogation_va_au_bout():
    # Note: the incise here is an inversion ("demanda-t-il"), despite the test name.
    text = "Vraiment ? demanda-t-il en se levant. Il s'éloigna."
    assert _pieces(text) == [
        (False, "Vraiment ?"),
        (True, "demanda-t-il en se levant. Il s'éloigna."),
    ]


# --- Nominal pass (verb + known subject) -------------------------------------

def test_nominale_nom_propre():
    assert _pieces("Toutes mes condoléances, compatit Holden.") == [
        (False, "Toutes mes condoléances,"),
        (True, "compatit Holden."),
    ]


def test_nominale_alias_apres_ponctuation_forte():
    # "?" as the left delimiter + subject = alias of a known character.
    assert _pieces("Flippant, cet enfoiré, hein ? lança Drummer.") == [
        (False, "Flippant, cet enfoiré, hein ?"),
        (True, "lança Drummer."),
    ]


def test_nominale_clitic_et_nom_de_role():
    assert _pieces("Vous venez, monsieur ? lui demanda un garde.") == [
        (False, "Vous venez, monsieur ?"),
        (True, "lui demanda un garde."),
    ]


# --- incise_speaker: seeding the explicit speaker ----------------------------

def test_seed_speaker_nom_propre():
    text = "Toutes mes condoléances, compatit Holden."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) == "Holden"


def test_seed_speaker_alias_vers_canonique():
    text = "Hein ? lança Drummer."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) == "Camina Drummer"


def test_seed_speaker_role_non_nomme_est_none():
    # A role noun ("un garde") is not a cast character -> no seed.
    text = "Vous venez ? lui demanda un garde."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) is None


def test_seed_speaker_inversion_est_none():
    text = "C'est fini, dit-elle."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) is None


def test_seed_nom_propre_absent_du_casting():
    # The name is written in the incise -> seeded even if cast extraction missed it.
    text = "Bonjour, lança Drummer."
    inc = detect_incises(text, names=set())[0]
    assert incise_speaker(text, inc, set()) == "Drummer"
    assert _pieces(text, names=set()) == [
        (False, "Bonjour,"),
        (True, "lança Drummer."),
    ]


# --- False positives that must NOT be detected -------------------------------

def test_vocatif_adresse_pas_incise():
    # The character is being addressed (vocative), not an incise (no speech verb).
    text = "Vous n'avez pas l'air en mesure de rendre service, capitaine Holden."
    assert detect_incises(text, names=NAMES) == []


def test_imperatif_sans_incise():
    assert detect_incises("Donne-le-moi.", names=NAMES) == []


def test_pronom_tu_exclu():
    assert detect_incises("Crois-tu ?", names=NAMES) == []


def test_replique_simple_sans_incise():
    assert detect_incises("Bonjour à tous.", names=NAMES) == []


def test_sans_noms_inversion_seule():
    # With no cast provided, the inversion pass still works.
    assert _pieces("C'est fini, dit-elle.", names=set()) == [
        (False, "C'est fini,"),
        (True, "dit-elle."),
    ]


# --- Invariants --------------------------------------------------------------

def test_texte_preserve_modulo_espaces():
    text = "James Holden, coupa-t-elle. Je sais qui vous êtes."
    joined = "".join(p for _, p in _pieces(text))
    assert joined.replace(" ", "") == text.replace(" ", "")


def test_bornes_non_chevauchantes_et_triees():
    text = "Stop, dit-il. Non, reprit-elle."
    incs = detect_incises(text, names=NAMES)
    assert all(incs[i].end <= incs[i + 1].start for i in range(len(incs) - 1))
    for inc in incs:
        assert 0 <= inc.start < inc.end <= len(text)
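
The segmenter itself is not part of this file, so the following is only a rough sketch of what an inversion pass consistent with the simpler tests above might look like. Everything in it (`Incise`, `_INVERSION`, `detect_inversions`, `split_pieces`) is hypothetical. It deliberately ignores the nominal pass, the "…" delimiter, and the rule that an incise following sentence-final punctuation runs to the end of the line, and it would also flag interrogative inversions such as "viendra-t-il ?" after a comma.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Incise:
    start: int  # inclusive offset in the line
    end: int    # exclusive offset in the line


# An incise must follow ", ", "? " or "! " and look like verb-(t-)pronoun.
# "tu" and object pronouns ("le", "moi") are deliberately absent from the list.
_INVERSION = re.compile(
    r"(?<=[,?!] )"
    r"(?:[sm]')?\w+-(?:t-)?(?:ils|elles|il|elle|on|je)\b"
    r"[^.,!?]*[.,!?]?"
)


def detect_inversions(text: str) -> list[Incise]:
    """Return the spans of verb-pronoun incises found in `text`."""
    return [Incise(m.start(), m.end()) for m in _INVERSION.finditer(text)]


def split_pieces(text: str, incises: list[Incise]) -> list[tuple[bool, str]]:
    """Cut `text` into (is_incise, piece) tuples, trimming surrounding spaces."""
    pieces: list[tuple[bool, str]] = []
    cursor = 0
    for inc in incises:
        before = text[cursor:inc.start].strip()
        if before:
            pieces.append((False, before))
        pieces.append((True, text[inc.start:inc.end].strip()))
        cursor = inc.end
    tail = text[cursor:].strip()
    if tail:
        pieces.append((False, tail))
    return pieces


line = "Pars, répondit-elle, et ne reviens pas."
print(split_pieces(line, detect_inversions(line)))
```

Because `re.finditer` yields non-overlapping matches in left-to-right order, the spans returned by this sketch satisfy the sorted, non-overlapping invariant that the last test above checks.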

40
frontend/dist/assets/index-CMUl6Yfl.js vendored Normal file

File diff suppressed because one or more lines are too long


13
frontend/dist/index.html vendored Normal file

@@ -0,0 +1,13 @@
<!doctype html>
<html lang="fr">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>InkFlow — EPUB → Livre audio</title>
<script type="module" crossorigin src="/assets/index-CMUl6Yfl.js"></script>
<link rel="stylesheet" crossorigin href="/assets/index-DlPmWkkU.css">
</head>
<body>
<div id="root"></div>
</body>
</html>

12
frontend/index.html Normal file

@@ -0,0 +1,12 @@
<!doctype html>
<html lang="fr">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>InkFlow — EPUB → Livre audio</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.jsx"></script>
</body>
</html>

2767
frontend/package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

22
frontend/package.json Normal file

@@ -0,0 +1,22 @@
{
"name": "inkflow-frontend",
"private": true,
"version": "0.1.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"react": "^18.3.1",
"react-dom": "^18.3.1"
},
"devDependencies": {
"@vitejs/plugin-react": "^4.3.4",
"autoprefixer": "^10.4.20",
"postcss": "^8.4.49",
"tailwindcss": "^3.4.17",
"vite": "^6.0.7"
}
}


@@ -0,0 +1,6 @@
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
};


@@ -0,0 +1,245 @@
import React, { useEffect, useMemo, useState } from "react";
import { api } from "./api.js";
import { Spinner } from "./ui.jsx";
const NARRATOR = "narrateur";
let _seq = 0;
const nextId = () => ++_seq;
export default function AnalysisEditor({ slug, book, state }) {
// Analyzed chapters (the entries of analyzed_chapters, kept in book order).
const analyzed = useMemo(() => {
const set = new Set(state.analyzed_chapters || []);
return book.chapters.filter((c) => set.has(c.index));
}, [book, state.analyzed_chapters]);
const [index, setIndex] = useState(() => analyzed[0]?.index ?? null);
const [analysis, setAnalysis] = useState(null); // { index, title, segments:[{_id,type,text,speaker}] }
const [names, setNames] = useState([]); // character names for the datalist
const [loading, setLoading] = useState(false);
const [saved, setSaved] = useState(false);
// Last text selection inside a dialogue line (used by "mark as incise").
const [sel, setSel] = useState({ id: null, start: 0, end: 0 });
// Display filters (they do not affect what gets saved).
const [query, setQuery] = useState("");
const [typeFilter, setTypeFilter] = useState("all");
const [speakerFilter, setSpeakerFilter] = useState("all");
// If the list of analyzed chapters changes and the current index is gone, fall back to the first one.
useEffect(() => {
if (index == null || !analyzed.some((c) => c.index === index)) {
setIndex(analyzed[0]?.index ?? null);
}
}, [analyzed]); // eslint-disable-line react-hooks/exhaustive-deps
// Cast character names (loaded once per book).
useEffect(() => {
api.getCast(slug)
.then((d) => setNames((d.cast?.characters || []).map((c) => c.name)))
.catch(() => setNames([]));
}, [slug]);
// Load the analysis of the selected chapter.
useEffect(() => {
if (index == null) { setAnalysis(null); return; }
setLoading(true);
setSaved(false);
api.getChapter(slug, index).then((d) => {
if (d.analysis) {
setAnalysis({
index: d.analysis.index,
title: d.analysis.title,
segments: (d.analysis.segments || []).map((s) => ({ ...s, _id: nextId() })),
});
} else {
setAnalysis({ index, title: d.chapter?.title || "", segments: null });
}
}).finally(() => setLoading(false));
}, [slug, index]);
const speakerOptions = useMemo(() => {
const set = new Set([NARRATOR, ...names]);
(analysis?.segments || []).forEach((s) => s.speaker && set.add(s.speaker));
return [...set];
}, [names, analysis]);
if (!analyzed.length)
return <p className="text-ink-muted">Lancez d'abord l'<b>Analyse</b> sur un chapitre.</p>;
const touch = (segments) => { setAnalysis((a) => ({ ...a, segments })); setSaved(false); };
const setSeg = (id, patch) =>
touch(analysis.segments.map((s) => {
if (s._id !== id) return s;
const next = { ...s, ...patch };
if (next.type === "narration") { next.speaker = NARRATOR; next.incises = []; }
// Text edited: drop the incises that are now out of bounds.
if (patch.text !== undefined) {
const len = next.text.length;
next.incises = (next.incises || []).filter(
(inc) => inc.start < inc.end && inc.end <= len);
}
return next;
}));
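// Mark the [start, end) span of a dialogue line as an incise (narrator voice).
const addIncise = (id, start, end) =>
touch(analysis.segments.map((s) => {
if (s._id !== id) return s;
// Keep spans sorted and non-overlapping. Each span is compared with the last
// span actually kept, so a rejected span can never evict a later one.
const incises = [...(s.incises || []), { start, end }]
.sort((a, b) => a.start - b.start)
.reduce((kept, inc) => {
const last = kept[kept.length - 1];
if (!last || inc.start >= last.end) kept.push(inc);
return kept;
}, []);
return { ...s, incises };
}));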
const removeIncise = (id, i) =>
touch(analysis.segments.map((s) =>
s._id !== id ? s : { ...s, incises: (s.incises || []).filter((_, k) => k !== i) }));
const removeSeg = (id) => touch(analysis.segments.filter((s) => s._id !== id));
const insertAfter = (id) => {
const segs = analysis.segments;
const pos = id == null ? segs.length : segs.findIndex((s) => s._id === id) + 1;
const next = [...segs];
next.splice(pos, 0, { _id: nextId(), type: "narration", text: "", speaker: NARRATOR });
touch(next);
};
const save = async () => {
const payload = {
index: analysis.index,
title: analysis.title,
segments: analysis.segments.map(({ _id, ...s }) => s),
};
await api.putAnalysis(slug, analysis.index, payload);
setSaved(true);
};
const segments = analysis?.segments;
const visible = (segments || []).filter((s) => {
if (typeFilter !== "all" && s.type !== typeFilter) return false;
if (speakerFilter !== "all" && s.speaker !== speakerFilter) return false;
if (query && !s.text.toLowerCase().includes(query.toLowerCase())) return false;
return true;
});
const dialogueCount = (segments || []).filter((s) => s.type === "dialogue").length;
return (
<div className="space-y-4">
<datalist id="speaker-list">
{speakerOptions.map((n) => <option key={n} value={n} />)}
</datalist>
{/* Control bar */}
<div className="card flex flex-wrap items-center gap-3 p-3">
<label className="text-sm text-ink-muted">Chapitre</label>
<select className="input" value={index ?? ""}
onChange={(e) => setIndex(Number(e.target.value))}>
{analyzed.map((c) => (
<option key={c.index} value={c.index}>{c.index} {c.title}</option>
))}
</select>
{segments && (
<span className="text-xs text-ink-muted">
{segments.length} segments · {dialogueCount} dialogues
</span>
)}
<button className="btn-primary ml-auto" disabled={!segments} onClick={save}>
{saved ? "✓ enregistré" : "Enregistrer"}
</button>
</div>
{loading && <p className="text-ink-muted"><Spinner /> chargement de l'analyse…</p>}
{!loading && segments === null && (
<p className="text-ink-muted">Ce chapitre n'a pas encore d'analyse. Lancez l'<b>Analyse</b>.</p>
)}
{!loading && segments && (
<>
{/* Display filters */}
<div className="card flex flex-wrap items-center gap-3 p-3">
<input className="input flex-1 min-w-[12rem]" placeholder="Rechercher dans le texte…"
value={query} onChange={(e) => setQuery(e.target.value)} />
<select className="input" value={typeFilter} onChange={(e) => setTypeFilter(e.target.value)}>
<option value="all">tous types</option>
<option value="narration">narration</option>
<option value="dialogue">dialogue</option>
</select>
<select className="input" value={speakerFilter} onChange={(e) => setSpeakerFilter(e.target.value)}>
<option value="all">tous locuteurs</option>
{speakerOptions.map((n) => <option key={n} value={n}>{n}</option>)}
</select>
{visible.length !== segments.length && (
<span className="text-xs text-ink-muted">{visible.length} affichés</span>
)}
</div>
<div className="card divide-y divide-ink-edge">
{visible.map((s) => {
const canMark = s.type === "dialogue"
&& sel.id === s._id && sel.end > sel.start;
const incises = s.incises || [];
return (
<div key={s._id} className="px-4 py-2.5">
<div className="flex items-start gap-3">
<select className="input w-28 shrink-0" value={s.type}
onChange={(e) => setSeg(s._id, { type: e.target.value })}>
<option value="narration">narration</option>
<option value="dialogue">dialogue</option>
</select>
<textarea className="input flex-1 min-h-[2.5rem] resize-y font-serif text-sm"
rows={Math.min(6, Math.ceil((s.text.length || 1) / 80))}
value={s.text}
onSelect={(e) => s.type === "dialogue" && setSel({
id: s._id, start: e.target.selectionStart, end: e.target.selectionEnd })}
onChange={(e) => setSeg(s._id, { text: e.target.value })} />
<input className="input w-40 shrink-0" list="speaker-list"
placeholder="locuteur"
value={s.speaker} disabled={s.type === "narration"}
onChange={(e) => setSeg(s._id, { speaker: e.target.value })} />
<div className="flex shrink-0 gap-1">
<button className="btn-ghost" title="Insérer après"
onClick={() => insertAfter(s._id)}>+</button>
<button className="btn-ghost" title="Supprimer"
onClick={() => removeSeg(s._id)}></button>
</div>
</div>
{/* Incises: portions of the dialogue line read by the narrator */}
{s.type === "dialogue" && (incises.length > 0 || canMark) && (
<div className="mt-1.5 ml-[7.75rem] flex flex-wrap items-center gap-1.5">
<span className="text-[11px] uppercase tracking-wide text-ink-muted">incises</span>
{incises.map((inc, i) => (
<span key={i}
className="inline-flex items-center gap-1 rounded bg-ink-edge/40 px-1.5 py-0.5 text-xs"
title="Lu par la voix du narrateur">
<span className="text-ink-muted">🎙</span>
<span className="font-serif">{s.text.slice(inc.start, inc.end)}</span>
<button className="text-ink-muted hover:text-ink"
title="Retirer l'incise"
onClick={() => removeIncise(s._id, i)}></button>
</span>
))}
{canMark && (
<button className="btn-ghost text-xs"
onClick={() => { addIncise(s._id, sel.start, sel.end);
setSel({ id: null, start: 0, end: 0 }); }}>
+ marquer la sélection
</button>
)}
</div>
)}
</div>
); })}
<div className="px-4 py-2.5">
<button className="btn-ghost" onClick={() => insertAfter(null)}>+ ajouter un segment</button>
</div>
</div>
</>
)}
</div>
);
}
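
The editor above stores incises as half-open `[start, end)` offsets into a dialogue line, aims to keep them sorted and non-overlapping, and discards any that fall outside the text after an edit. The same bookkeeping is sketched here in Python, for instance to validate a saved payload on the server side. The names `Span`, `add_span` and `clamp_spans` are hypothetical and not part of the InkFlow code shown in this diff.

```python
Span = tuple[int, int]  # half-open [start, end) offsets into the line


def add_span(spans: list[Span], new: Span) -> list[Span]:
    """Insert `new`, keep spans sorted, and drop any span overlapping one already kept."""
    kept: list[Span] = []
    for start, end in sorted([*spans, new]):
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


def clamp_spans(spans: list[Span], text_len: int) -> list[Span]:
    """Drop empty spans and spans that extend past the end of the text."""
    return [(s, e) for s, e in spans if s < e <= text_len]


print(add_span([(0, 10), (12, 20)], (5, 15)))
print(clamp_spans([(0, 4), (6, 30)], text_len=10))
```

Comparing each candidate with the last span actually kept, rather than with its neighbor in the sorted list, is what prevents a rejected span from knocking out a valid one that follows it.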

44
frontend/src/App.jsx Normal file

@@ -0,0 +1,44 @@
import React, { useState } from "react";
import Library from "./Library.jsx";
import BookView from "./BookView.jsx";
import Settings from "./Settings.jsx";
export default function App() {
// Lets a book be opened directly via #slug (deep link).
const [slug, setSlug] = useState(
() => (location.hash ? decodeURIComponent(location.hash.slice(1)) : null)
);
const [showSettings, setShowSettings] = useState(false);
const goHome = () => { setShowSettings(false); setSlug(null); };
return (
<div className="min-h-screen bg-ink-bg text-ink-text">
<header className="border-b border-ink-edge">
<div className="mx-auto flex max-w-6xl items-center gap-3 px-6 py-4">
<button onClick={goHome} className="flex items-center gap-2">
<span className="text-2xl">🖋</span>
<span className="font-serif text-xl tracking-wide">
Ink<span className="text-ink-accent">Flow</span>
</span>
</button>
<span className="ml-2 hidden text-sm text-ink-muted sm:inline">
EPUB → livre audio · local · MLX
</span>
<button onClick={() => setShowSettings(true)} title="Réglages techniques"
className="ml-auto text-xl text-ink-muted hover:text-ink-text"></button>
</div>
</header>
<main className="mx-auto max-w-6xl px-6 py-8">
{showSettings ? (
<Settings onBack={goHome} />
) : slug ? (
<BookView slug={slug} onBack={() => setSlug(null)} />
) : (
<Library onOpen={setSlug} />
)}
</main>
</div>
);
}

99
frontend/src/BookView.jsx Normal file

@@ -0,0 +1,99 @@
import React, { useEffect, useState } from "react";
import { api, subscribeState } from "./api.js";
import { StatusChip, ProgressBar, Spinner } from "./ui.jsx";
import Chapters from "./Chapters.jsx";
import AnalysisEditor from "./AnalysisEditor.jsx";
import CastEditor from "./CastEditor.jsx";
import PronunciationEditor from "./PronunciationEditor.jsx";
const STAGES = [
{ key: "analyze", label: "Analyse", action: (s) => api.analyze(s), hint: "Découpe le texte, détecte les locuteurs et le casting." },
{ key: "cast", label: "Casting", action: (s) => api.castAuto(s), hint: "Attribue une voix à chaque personnage." },
{ key: "pronounce", label: "Prononciations", action: (s) => api.pronounce(s), hint: "Repère les mots à risque de mauvaise prononciation." },
];
export default function BookView({ slug, onBack }) {
const [data, setData] = useState(null);
const [state, setState] = useState(null);
const [tab, setTab] = useState("chapters");
useEffect(() => {
api.getBook(slug).then((d) => { setData(d); setState(d.state); });
const unsub = subscribeState(slug, setState);
return unsub;
}, [slug]);
if (!data) return <p className="text-ink-muted"><Spinner /> chargement</p>;
const { book } = data;
const st = state || data.state;
const busy = !!st.active_stage;
return (
<div className="space-y-6">
<button onClick={onBack} className="text-sm text-ink-muted hover:text-ink-text"> Bibliothèque</button>
<div className="flex gap-5">
{book.cover_file && (
<img src={api.coverUrl(slug)} alt="" className="h-44 rounded-md border border-ink-edge object-cover" />
)}
<div className="flex-1">
<h1 className="font-serif text-2xl">{book.title}</h1>
<p className="text-ink-muted">{book.author}</p>
<p className="mt-1 text-sm text-ink-muted">{book.chapters.filter((c) => c.render).length} chapitres à narrer</p>
{busy && (
<div className="mt-4 max-w-md space-y-1">
<div className="flex justify-between text-xs text-ink-accent">
<span>{st.active_detail || st.active_stage}</span>
<span>{Math.round((st.active_progress || 0) * 100)}%</span>
</div>
<ProgressBar value={st.active_progress} />
</div>
)}
</div>
</div>
{/* Pipeline */}
<div className="grid grid-cols-1 gap-3 sm:grid-cols-3">
{STAGES.map((stage) => {
const status = st.stages?.[stage.key] || "pending";
return (
<div key={stage.key} className="card p-4">
<div className="flex items-center justify-between">
<span className="font-medium">{stage.label}</span>
<StatusChip status={status} />
</div>
<p className="mt-1 text-xs text-ink-muted">{stage.hint}</p>
<button className="btn-ghost mt-3" disabled={busy}
onClick={() => stage.action(slug)}>
{status === "done" ? "Relancer" : "Lancer"}
</button>
</div>
);
})}
</div>
{/* Tabs */}
<div className="flex gap-1 border-b border-ink-edge">
{[
["chapters", "Chapitres"],
["analysis", "Analyse"],
["cast", "Casting"],
["pron", "Prononciation"],
].map(([key, label]) => (
<button key={key} onClick={() => setTab(key)}
className={`px-4 py-2 text-sm ${tab === key
? "border-b-2 border-ink-accent text-ink-text"
: "text-ink-muted hover:text-ink-text"}`}>
{label}
</button>
))}
</div>
{tab === "chapters" && <Chapters slug={slug} book={book} state={st} busy={busy} />}
{tab === "analysis" && <AnalysisEditor slug={slug} book={book} state={st} />}
{tab === "cast" && <CastEditor slug={slug} busy={busy} />}
{tab === "pron" && <PronunciationEditor slug={slug} />}
</div>
);
}

119
frontend/src/CastEditor.jsx Normal file

@@ -0,0 +1,119 @@
import React, { useEffect, useState } from "react";
import { api } from "./api.js";
import { Spinner } from "./ui.jsx";
function VoiceSelect({ voices, value, onChange }) {
return (
<select className="input" value={value || ""} onChange={(e) => onChange(e.target.value)}>
<option value=""> aucune </option>
{voices.map((v) => (
<option key={v.id} value={v.id}>
{v.label || v.id} ({v.gender === "male" ? "H" : v.gender === "female" ? "F" : "?"})
</option>
))}
</select>
);
}
export default function CastEditor({ slug, busy }) {
const [cast, setCast] = useState(null);
const [voices, setVoices] = useState([]);
const [saved, setSaved] = useState(false);
const [playing, setPlaying] = useState(null);
const [msg, setMsg] = useState(null);
const dedupPending = React.useRef(false);
const reload = () =>
api.getCast(slug).then((d) => { setCast(d.cast); setVoices(d.voicebank.entries); });
useEffect(() => { reload(); }, [slug]);
// Reload the cast when a background job (dedup / per-chapter casting) finishes.
useEffect(() => {
if (busy) return;
reload().then(() => {
if (dedupPending.current) {
dedupPending.current = false;
api.getCast(slug).then((d) =>
setMsg(`✓ déduplication terminée — ${d.cast.characters.length} personnages`));
}
});
}, [busy]);
const dedup = async () => {
setMsg(null);
try {
dedupPending.current = true;
await api.castDedup(slug);
setMsg("Déduplication lancée…");
} catch (e) {
dedupPending.current = false;
setMsg("Échec : " + e + " (le serveur backend est-il à jour ? redémarre-le)");
}
};
if (!cast) return <p className="text-ink-muted"><Spinner /> chargement du casting</p>;
if (!cast.characters.length)
return <p className="text-ink-muted">Lancez d'abord l'<b>Analyse</b> puis le <b>Casting</b>.</p>;
const update = (patch) => { setCast({ ...cast, ...patch }); setSaved(false); };
const setChar = (name, voiceId) =>
update({ characters: cast.characters.map((c) => c.name === name ? { ...c, voice_id: voiceId } : c) });
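const preview = async (voiceId) => {
if (!voiceId) return;
setPlaying(voiceId);
try {
const url = await api.previewVoice(voiceId, "Bonjour, voici un aperçu de cette voix.");
const a = new Audio(url);
a.onended = () => setPlaying(null);
a.onerror = () => setPlaying(null);
// play() returns a promise; awaiting it lets the catch below reset the state if playback is refused.
await a.play();
} catch { setPlaying(null); }
};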
const save = async () => { await api.putCast(slug, cast); setSaved(true); };
return (
<div className="space-y-4">
<div className="card flex items-center gap-3 p-3">
<span className="text-sm text-ink-muted">Narrateur</span>
<VoiceSelect voices={voices} value={cast.narrator_voice_id}
onChange={(v) => update({ narrator_voice_id: v })} />
<button className="btn-ghost" onClick={() => preview(cast.narrator_voice_id)}>
{playing === cast.narrator_voice_id ? "♪" : "▶"} écouter
</button>
<button className="btn-ghost ml-auto" disabled={busy}
title="Fusionne les variantes d'un même personnage (Holden / James Holden / James)"
onClick={dedup}>
{busy ? "…" : "Dédupliquer"}
</button>
<button className="btn-primary" onClick={save}>
{saved ? "✓ enregistré" : "Enregistrer"}
</button>
</div>
{msg && <p className="px-1 text-sm text-ink-muted">{msg}</p>}
<div className="card divide-y divide-ink-edge">
{cast.characters.map((c) => (
<div key={c.name} className="flex items-center gap-3 px-4 py-2.5">
<div className="flex-1 min-w-0">
<p className="truncate font-serif text-sm">{c.name}</p>
{c.aliases?.length > 0 && (
<p className="truncate text-xs text-ink-muted">alias : {c.aliases.join(", ")}</p>
)}
{c.description && <p className="truncate text-xs text-ink-muted">{c.description}</p>}
</div>
<span className="chip bg-ink-edge text-ink-muted">
{c.gender === "male" ? "homme" : c.gender === "female" ? "femme" : "?"}
</span>
<VoiceSelect voices={voices} value={c.voice_id}
onChange={(v) => setChar(c.name, v)} />
<button className="btn-ghost" onClick={() => preview(c.voice_id)}>
{playing === c.voice_id ? "♪" : "▶"}
</button>
</div>
))}
</div>
</div>
);
}

98
frontend/src/Chapters.jsx Normal file

@@ -0,0 +1,98 @@
import React, { useEffect, useState } from "react";
import { api } from "./api.js";
import { StatusChip, ProgressBar } from "./ui.jsx";
export default function Chapters({ slug, book, state, busy }) {
const chapters = book.chapters.filter((c) => c.render);
const [backend, setBackend] = useState("kokoro");
const [mono, setMono] = useState(false);
const [selected, setSelected] = useState(() => new Set());
// Initialize the engine from the default backend in the settings.
useEffect(() => {
api.getSettings().then((s) => s?.default_backend && setBackend(s.default_backend)).catch(() => {});
}, []);
const toggle = (idx) => {
const next = new Set(selected);
next.has(idx) ? next.delete(idx) : next.add(idx);
setSelected(next);
};
const renderChapters = (indexes) => {
if (!indexes.length) return;
api.render(slug, indexes, backend, mono);
};
return (
<div className="space-y-4">
<div className="card flex flex-wrap items-center gap-3 p-3">
<label className="text-sm text-ink-muted">Moteur</label>
<select className="input" value={backend} onChange={(e) => setBackend(e.target.value)}>
<option value="kokoro">Kokoro (rapide)</option>
<option value="qwen3">Qwen3 (qualité + clonage)</option>
</select>
<label className="flex items-center gap-2 text-sm text-ink-muted">
<input type="checkbox" checked={mono} onChange={(e) => setMono(e.target.checked)} />
mono-narrateur
</label>
<div className="ml-auto flex gap-2">
<button className="btn-ghost" disabled={busy || !selected.size}
onClick={() => renderChapters([...selected])}>
Rendre la sélection ({selected.size})
</button>
<button className="btn-primary" disabled={busy}
onClick={() => renderChapters(chapters.map((c) => c.index))}>
Rendre tout
</button>
</div>
</div>
<div className="card divide-y divide-ink-edge">
{chapters.map((c) => {
const rs = state.render?.[c.index] || state.render?.[String(c.index)] || {};
const analyzed = (state.analyzed_chapters || []).includes(c.index);
return (
<div key={c.index} className="flex items-center gap-3 px-4 py-2.5">
<input type="checkbox" checked={selected.has(c.index)}
onChange={() => toggle(c.index)} />
<div className="w-9 text-center text-xs text-ink-muted">{c.index}</div>
<div className="flex-1 min-w-0">
<p className="truncate font-serif text-sm">{c.title}</p>
<div className="mt-0.5 flex items-center gap-2 text-xs text-ink-muted">
<span>{c.word_count} mots</span>
{c.pov && <span className="chip bg-ink-edge text-ink-muted">{c.pov}</span>}
{analyzed && <span className="text-emerald-400">analysé</span>}
</div>
{rs.status === "running" && (
<div className="mt-1.5 max-w-xs"><ProgressBar value={rs.progress} /></div>
)}
</div>
{rs.status && <StatusChip status={rs.status} />}
{rs.mp3 && (
<>
<audio controls src={api.audioUrl(slug, c.index)} className="h-8" />
<a className="btn-ghost" href={api.audioUrl(slug, c.index)} download></a>
</>
)}
{!busy && (
<>
<button className="btn-ghost" title={analyzed ? "Ré-analyser ce chapitre" : "Analyser ce chapitre"}
onClick={() => api.analyze(slug, [c.index])}>
{analyzed ? "Ré-analyser" : "Analyser"}
</button>
<button className="btn-ghost" title="Ré-analyser le casting de ce chapitre (sans re-segmenter)"
onClick={() => api.castAnalyze(slug, [c.index])}>
Casting
</button>
<button className="btn-ghost" title="Rendre ce chapitre"
onClick={() => renderChapters([c.index])}></button>
</>
)}
</div>
);
})}
</div>
</div>
);
}
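
The chapter list above reads render state both by numeric index and by its string form. A likely reason, assuming the backend serializes book state as JSON (the serialization code is not shown in this diff), is that JSON object keys are always strings, so integer chapter indexes come back as strings after a round trip. A small Python illustration, with a made-up `render_state` value:

```python
import json

# Hypothetical render state keyed by integer chapter index.
render_state = {3: {"status": "done", "mp3": "03.mp3"}}

# JSON object keys are strings: the int key 3 is serialized as "3".
round_tripped = json.loads(json.dumps(render_state))

print(list(round_tripped))                       # ['3']
print(3 in round_tripped, "3" in round_tripped)  # False True
```

On the JavaScript side, property access coerces a numeric key to a string, so `state.render[3]` and `state.render["3"]` read the same property of a plain object. Python code that reloads the state has to convert keys back to `int` explicitly.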

80
frontend/src/Library.jsx Normal file

@@ -0,0 +1,80 @@
import React, { useEffect, useRef, useState } from "react";
import { api } from "./api.js";
import { Spinner } from "./ui.jsx";
export default function Library({ onOpen }) {
const [books, setBooks] = useState(null);
const [uploading, setUploading] = useState(false);
const [error, setError] = useState(null);
const fileRef = useRef();
const refresh = () => api.listBooks().then(setBooks).catch((e) => setError(String(e)));
useEffect(() => { refresh(); }, []);
const upload = async (file) => {
if (!file) return;
setUploading(true);
setError(null);
try {
const { slug } = await api.uploadBook(file);
await refresh();
onOpen(slug);
} catch (e) {
setError("Échec de l'import : " + e);
} finally {
setUploading(false);
}
};
return (
<div className="space-y-8">
<section
onDragOver={(e) => e.preventDefault()}
onDrop={(e) => { e.preventDefault(); upload(e.dataTransfer.files[0]); }}
className="card flex flex-col items-center justify-center gap-3 border-dashed py-12 text-center"
>
<div className="text-4xl">📖</div>
<p className="font-serif text-lg">Déposez un fichier EPUB</p>
<p className="text-sm text-ink-muted">ou</p>
<button className="btn-primary" disabled={uploading}
onClick={() => fileRef.current?.click()}>
{uploading ? <Spinner /> : null}
{uploading ? "Import en cours…" : "Choisir un fichier"}
</button>
<input ref={fileRef} type="file" accept=".epub" className="hidden"
onChange={(e) => upload(e.target.files[0])} />
</section>
{error && <p className="text-sm text-red-400">{error}</p>}
<section>
<h2 className="mb-3 font-serif text-lg text-ink-muted">Bibliothèque</h2>
{books === null ? (
<p className="text-ink-muted"><Spinner /> chargement</p>
) : books.length === 0 ? (
<p className="text-ink-muted">Aucun livre pour l'instant.</p>
) : (
<div className="grid grid-cols-2 gap-4 sm:grid-cols-3 lg:grid-cols-4">
{books.map((b) => (
<button key={b.slug} onClick={() => onOpen(b.slug)}
className="card group overflow-hidden text-left transition-transform hover:-translate-y-1">
<div className="aspect-[2/3] w-full bg-ink-edge">
{b.cover && (
<img src={b.cover} alt="" className="h-full w-full object-cover" />
)}
</div>
<div className="p-3">
<p className="line-clamp-2 font-serif text-sm">{b.title}</p>
<p className="mt-1 text-xs text-ink-muted">{b.author}</p>
<p className="mt-2 text-xs text-ink-accent">
{b.rendered}/{b.chapters} chapitres rendus
</p>
</div>
</button>
))}
</div>
)}
</section>
</div>
);
}
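
The `accept=".epub"` attribute only filters the file picker; a file dropped onto the drop zone bypasses it and is sent to `api.uploadBook` as-is. A minimal sketch of a client-side check that `upload` could run first, assuming a hypothetical helper named `looksLikeEpub` (not part of this commit) and assuming the backend still validates the upload on its own:

```javascript
// Hypothetical helper, not part of the commit: a cheap pre-upload check.
// It accepts a File-like object and looks at the name and the MIME type.
function looksLikeEpub(file) {
  if (!file || typeof file.name !== "string") return false;
  // Extension check, case-insensitive ("Book.EPUB" is accepted).
  const byName = file.name.toLowerCase().endsWith(".epub");
  // Registered media type for EPUB; browsers do not always fill `type`.
  const byType = file.type === "application/epub+zip";
  return byName || byType;
}
```

In `upload`, a guard such as `if (!looksLikeEpub(file)) { setError("…"); return; }` would then reject other files before any network call.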

@@ -0,0 +1,59 @@
import React, { useEffect, useState } from "react";
import { api } from "./api.js";
import { Spinner } from "./ui.jsx";
export default function PronunciationEditor({ slug }) {
const [entries, setEntries] = useState(null);
const [saved, setSaved] = useState(false);
const [error, setError] = useState(null);
useEffect(() => {
api.getPron(slug)
.then((d) => setEntries(d.entries || []))
.catch((e) => setError(String(e)));
}, [slug]);
// Nothing to edit until the entries are loaded: show the load error or a spinner.
if (entries === null) {
return error
? <p className="text-sm text-red-400">{error}</p>
: <p className="text-ink-muted"><Spinner /> chargement</p>;
}
const dirty = () => setSaved(false);
const setRow = (i, patch) => {
setEntries(entries.map((e, j) => (j === i ? { ...e, ...patch } : e)));
dirty();
};
const add = () => { setEntries([...entries, { term: "", replacement: "", enabled: true }]); dirty(); };
const remove = (i) => { setEntries(entries.filter((_, j) => j !== i)); dirty(); };
const save = async () => {
setError(null);
try {
// Rows without a term are dropped before saving.
await api.putPron(slug, { entries: entries.filter((e) => e.term) });
setSaved(true);
} catch (e) {
setError("Échec de l'enregistrement : " + e);
}
};
return (
<div className="space-y-4">
<div className="flex items-center gap-3">
<p className="text-sm text-ink-muted">
Corrigez la graphie des mots mal prononcés. La colonne « prononciation » remplace le terme avant la synthèse.
</p>
<button className="btn-ghost ml-auto" onClick={add}>+ ajouter</button>
<button className="btn-primary" onClick={save}>{saved ? "✓ enregistré" : "Enregistrer"}</button>
</div>
{error && <p className="text-sm text-red-400">{error}</p>}
{entries.length === 0 ? (
<p className="text-ink-muted">Aucune entrée. Lancez l'étape <b>Prononciations</b> ou ajoutez-en.</p>
) : (
<div className="card divide-y divide-ink-edge">
<div className="grid grid-cols-[1fr_1fr_auto_auto] gap-3 px-4 py-2 text-xs uppercase text-ink-muted">
<span>Terme</span><span>Prononciation</span><span>Actif</span><span></span>
</div>
{entries.map((e, i) => (
<div key={i} className="grid grid-cols-[1fr_1fr_auto_auto] items-center gap-3 px-4 py-2">
<input className="input" value={e.term}
onChange={(ev) => setRow(i, { term: ev.target.value })} />
<input className="input" value={e.replacement}
onChange={(ev) => setRow(i, { replacement: ev.target.value })} />
<input type="checkbox" checked={e.enabled !== false}
onChange={(ev) => setRow(i, { enabled: ev.target.checked })} />
<button className="text-ink-muted hover:text-red-400" aria-label="Supprimer" onClick={() => remove(i)}>✕</button>
</div>
))}
</div>
)}
</div>
);
}
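
The editor's help text says the "prononciation" column replaces the term before synthesis. That substitution happens in the backend, which is not shown in this file; the following is only a sketch of how such a pass could work, written in JavaScript for consistency with the frontend. The names `escapeRegExp` and `applyPronunciations` are hypothetical, and the matching rules (whole words only, longest term first, case-sensitive) are assumptions rather than the project's documented behavior.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: apply { term, replacement, enabled } entries to a text.
function applyPronunciations(text, entries) {
  // Keep enabled entries with a non-empty term; try longer terms first so
  // that "James Holden" wins over "Holden" at the same position.
  const active = entries
    .filter((e) => e.term && e.enabled !== false)
    .sort((a, b) => b.term.length - a.term.length);
  if (active.length === 0) return text;

  const byTerm = new Map(active.map((e) => [e.term, e.replacement]));
  const alternation = active.map((e) => escapeRegExp(e.term)).join("|");
  // Unicode-aware word boundaries: \b does not handle accented letters,
  // so letters and digits on either side are excluded explicitly.
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])",
    "gu"
  );
  // A single pass avoids re-replacing text produced by an earlier replacement.
  return text.replace(pattern, (match) => byTerm.get(match) ?? match);
}
```

Doing everything in one `replace` call, rather than one call per entry, keeps a replacement from being rewritten by a later entry.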

142
frontend/src/Settings.jsx Normal file
@@ -0,0 +1,142 @@
import React, { useEffect, useState } from "react";
import { api } from "./api.js";
import { Spinner } from "./ui.jsx";
// Declarative description of the fields, grouped by section.
const SECTIONS = [
{
title: "Modèles (identifiants MLX / HuggingFace)",
hint: "Changer un identifiant recharge un autre modèle (peut déclencher un téléchargement au prochain usage).",
fields: [
{ key: "gemma_model", label: "Gemma (analyse)", type: "text" },
{ key: "qwen3_model", label: "Qwen3-TTS (rendu)", type: "text" },
{ key: "kokoro_model", label: "Kokoro (preview)", type: "text" },
],
},
{
title: "Génération Gemma",
hint: "Paramètres d'échantillonnage de l'analyse (locuteurs, personnages, prononciations).",
fields: [
{ key: "gemma_temperature", label: "Température", type: "number", step: 0.05, min: 0, max: 2 },
{ key: "gemma_max_tokens", label: "Max tokens", type: "number", step: 1, min: 64, max: 8192 },
],
},
{
title: "Prompts système (analyse)",
hint: "Instructions envoyées à Gemma avant chaque tâche. Le modèle doit répondre en JSON.",
fields: [
{ key: "prompt_speakers", label: "Attribution des locuteurs", type: "textarea" },
{ key: "prompt_characters", label: "Extraction des personnages", type: "textarea" },
{ key: "prompt_pronunciation", label: "Mots à risque (prononciation)", type: "textarea" },
],
},
{
title: "Casting (déduplication)",
hint: "Le rapprochement des variantes de noms (Holden / James Holden / James) est heuristique et sûr. La passe Gemma ajoute les variantes non évidentes (diminutifs, titres) mais, avec un petit modèle local, produit des fusions erronées.",
fields: [
{ key: "dedup_use_gemma", label: "Affiner la déduplication avec Gemma (moins sûr)", type: "checkbox" },
],
},
{
title: "TTS (voix par défaut)",
hint: "Backend et voix utilisés par défaut pour le rendu et les replis.",
fields: [
{ key: "default_backend", label: "Backend par défaut", type: "select",
options: [["kokoro", "Kokoro (rapide)"], ["qwen3", "Qwen3 (qualité + clonage)"]] },
{ key: "language", label: "Langue (Qwen3)", type: "text" },
{ key: "kokoro_lang_code", label: "Code langue Kokoro", type: "text" },
{ key: "kokoro_default_voice", label: "Voix Kokoro par défaut", type: "text" },
{ key: "qwen3_default_voice", label: "Voix Qwen3 par défaut", type: "text" },
],
},
{
title: "Audio (encodage final)",
hint: "Appliqué à la concaténation et à l'export MP3.",
fields: [
{ key: "target_sample_rate", label: "Sample rate (Hz)", type: "number", step: 1000, min: 8000, max: 48000 },
{ key: "mp3_bitrate", label: "Bitrate MP3", type: "text" },
{ key: "target_dbfs", label: "Normalisation (dBFS)", type: "number", step: 0.5, min: -40, max: 0 },
],
},
];
function Field({ field, value, onChange }) {
const common = "input w-full";
if (field.type === "checkbox")
return <input type="checkbox" className="h-4 w-4"
checked={!!value} onChange={(e) => onChange(e.target.checked)} />;
if (field.type === "textarea")
return <textarea className={`${common} min-h-[5rem] resize-y text-sm`} rows={4}
value={value ?? ""} onChange={(e) => onChange(e.target.value)} />;
if (field.type === "select")
return <select className={common} value={value ?? ""} onChange={(e) => onChange(e.target.value)}>
{field.options.map(([v, lbl]) => <option key={v} value={v}>{lbl}</option>)}
</select>;
if (field.type === "number")
return <input className={common} type="number"
step={field.step} min={field.min} max={field.max}
value={value ?? ""} onChange={(e) => onChange(e.target.value === "" ? "" : Number(e.target.value))} />;
return <input className={common} type="text"
value={value ?? ""} onChange={(e) => onChange(e.target.value)} />;
}
export default function Settings({ onBack }) {
const [settings, setSettings] = useState(null);
const [saved, setSaved] = useState(false);
const [error, setError] = useState(null);
useEffect(() => {
api.getSettings().then(setSettings).catch((e) => setError(String(e)));
}, []);
// Only bail out when loading failed; a save failure is shown inline so edits are not lost.
if (!settings) {
return error
? <p className="text-sm text-red-400">{error}</p>
: <p className="text-ink-muted"><Spinner /> chargement des réglages</p>;
}
const set = (key, val) => { setSettings({ ...settings, [key]: val }); setSaved(false); };
const save = async () => {
setError(null);
try { await api.putSettings(settings); setSaved(true); }
catch (e) { setError("Échec de l'enregistrement : " + e); }
};
return (
<div className="space-y-6">
<div className="flex items-center gap-3">
<button onClick={onBack} className="text-sm text-ink-muted hover:text-ink-text">← Bibliothèque</button>
<h1 className="font-serif text-2xl">Réglages techniques</h1>
<button className="btn-primary ml-auto" onClick={save}>
{saved ? "✓ enregistré" : "Enregistrer"}
</button>
</div>
{error && <p className="text-sm text-red-400">{error}</p>}
<p className="text-sm text-ink-muted">
Réglages globaux appliqués à toute l'app. Les changements de modèle prennent effet au
prochain lancement d'analyse ou de rendu.
</p>
{SECTIONS.map((sec) => (
<section key={sec.title} className="card p-4 space-y-3">
<div>
<h2 className="font-medium">{sec.title}</h2>
{sec.hint && <p className="text-xs text-ink-muted">{sec.hint}</p>}
</div>
<div className="grid gap-3">
{sec.fields.map((f) => (
<label key={f.key} className="grid gap-1">
<span className="text-sm text-ink-muted">{f.label}</span>
<Field field={f} value={settings[f.key]} onChange={(v) => set(f.key, v)} />
</label>
))}
</div>
</section>
))}
<div className="flex justify-end">
<button className="btn-primary" onClick={save}>
{saved ? "✓ enregistré" : "Enregistrer"}
</button>
</div>
</div>
);
}
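
The `min`, `max` and `step` values in `SECTIONS` are passed to `<input type="number">`, where they only constrain the browser's stepper and validity state: a typed value outside the range still reaches `onChange` and ends up in `settings`. A minimal sketch of a normalization pass that `save` could apply before calling `api.putSettings`, assuming a hypothetical helper `normalizeSettings` and assuming the backend falls back to its default when a key is omitted (neither is confirmed by this commit):

```javascript
// Hypothetical helper: clamp numeric settings to the range declared in the
// field descriptions and drop numeric values that are empty or not a number.
function normalizeSettings(settings, sections) {
  const out = { ...settings };
  for (const section of sections) {
    for (const field of section.fields) {
      if (field.type !== "number") continue;
      const raw = out[field.key];
      const n = raw === "" || raw === null || raw === undefined ? NaN : Number(raw);
      if (Number.isNaN(n)) {
        // Assumption: an omitted key lets the backend keep its default.
        delete out[field.key];
        continue;
      }
      const min = field.min ?? -Infinity;
      const max = field.max ?? Infinity;
      out[field.key] = Math.min(max, Math.max(min, n));
    }
  }
  return out;
}
```

Usage would be `api.putSettings(normalizeSettings(settings, SECTIONS))`; non-numeric fields pass through unchanged.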

64
frontend/src/api.js Normal file
@@ -0,0 +1,64 @@
// InkFlow API client: fetch wrappers + WebSocket subscription to a book's state.
async function j(url, opts) {
const res = await fetch(url, opts);
if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
const ct = res.headers.get("content-type") || "";
return ct.includes("application/json") ? res.json() : res;
}
const json = (method, body) => ({
method,
headers: { "Content-Type": "application/json" },
// Serialize any defined body, including falsy values such as 0 or false.
body: body === undefined ? undefined : JSON.stringify(body),
});
export const api = {
listBooks: () => j("/api/books"),
uploadBook: (file) => {
const fd = new FormData();
fd.append("file", file);
return j("/api/books", { method: "POST", body: fd });
},
getBook: (slug) => j(`/api/books/${slug}`),
getChapter: (slug, idx) => j(`/api/books/${slug}/chapters/${idx}`),
putAnalysis: (slug, idx, analysis) =>
j(`/api/books/${slug}/chapters/${idx}/analysis`, json("PUT", analysis)),
analyze: (slug, chapters) => j(`/api/books/${slug}/analyze`, json("POST", { chapters })),
pronounce: (slug) => j(`/api/books/${slug}/pronounce`, json("POST")),
castAuto: (slug) => j(`/api/books/${slug}/cast/auto`, json("POST")),
castAnalyze: (slug, chapters) =>
j(`/api/books/${slug}/cast/analyze`, json("POST", { chapters })),
castDedup: (slug) => j(`/api/books/${slug}/cast/dedup`, json("POST")),
render: (slug, chapters, backend, mono) =>
j(`/api/books/${slug}/render`, json("POST", { chapters, backend, mono })),
getCast: (slug) => j(`/api/books/${slug}/cast`),
putCast: (slug, cast) => j(`/api/books/${slug}/cast`, json("PUT", cast)),
getPron: (slug) => j(`/api/books/${slug}/pronunciation`),
putPron: (slug, pron) => j(`/api/books/${slug}/pronunciation`, json("PUT", pron)),
getSettings: () => j("/api/settings"),
putSettings: (settings) => j("/api/settings", json("PUT", settings)),
audioUrl: (slug, idx) => `/api/books/${slug}/audio/${idx}`,
coverUrl: (slug) => `/api/books/${slug}/cover`,
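previewVoice: async (voiceId, text) => {
const res = await fetch("/api/voicebank/preview", json("POST", { voice_id: voiceId, text }));
if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
// The caller owns the returned object URL and should release it with URL.revokeObjectURL.
return URL.createObjectURL(await res.blob());
},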
};
// Real-time subscription to a book's state. Reconnects automatically.
export function subscribeState(slug, onState) {
let ws, timer, closed = false;
const connect = () => {
if (closed) return;
const proto = location.protocol === "https:" ? "wss" : "ws";
ws = new WebSocket(`${proto}://${location.host}/ws/${slug}`);
ws.onmessage = (e) => {
let msg;
try { msg = JSON.parse(e.data); } catch { return; } // ignore malformed frames
if (msg?.type === "state") onState(msg.state);
};
ws.onclose = () => { if (!closed) timer = setTimeout(connect, 1500); };
};
connect();
// Unsubscribe: also cancel a pending reconnect so no socket is opened after teardown.
return () => { closed = true; clearTimeout(timer); ws && ws.close(); };
}
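
`subscribeState` retries at a fixed 1500 ms interval. If the backend stays down for a long time, a capped exponential backoff is a common alternative; the sketch below is not what the commit does, and the helper name `backoffDelay` and its `baseMs`/`maxMs` defaults are assumptions chosen for illustration.

```javascript
// Hypothetical helper: delay before reconnect attempt number `attempt`
// (0 for the first retry). Doubles each time, capped at `maxMs`.
function backoffDelay(attempt, { baseMs = 1500, maxMs = 30000 } = {}) {
  const exponent = Math.max(0, attempt);
  return Math.min(maxMs, baseMs * 2 ** exponent);
}
```

A caller would keep an attempt counter, pass `backoffDelay(attempt++)` to `setTimeout`, and reset the counter to 0 once a connection opens. Random jitter is often added on top when many clients reconnect at once.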

37
frontend/src/index.css Normal file
@@ -0,0 +1,37 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
:root {
color-scheme: dark;
}
body {
margin: 0;
background: #14110f;
color: #ede4d8;
font-family: system-ui, -apple-system, "Segoe UI", sans-serif;
}
@layer components {
.btn {
@apply inline-flex items-center gap-2 rounded-md px-3 py-1.5 text-sm font-medium
transition-colors disabled:opacity-40 disabled:cursor-not-allowed;
}
.btn-primary {
@apply btn bg-ink-accent text-ink-bg hover:bg-ink-accent2;
}
.btn-ghost {
@apply btn border border-ink-edge text-ink-text hover:bg-ink-edge;
}
.card {
@apply rounded-lg border border-ink-edge bg-ink-panel;
}
.chip {
@apply inline-flex items-center rounded-full px-2 py-0.5 text-xs font-medium;
}
.input {
@apply rounded-md border border-ink-edge bg-ink-bg px-2 py-1 text-sm
text-ink-text outline-none focus:border-ink-accent;
}
}

6
frontend/src/main.jsx Normal file
@@ -0,0 +1,6 @@
import React from "react";
import { createRoot } from "react-dom/client";
import App from "./App.jsx";
import "./index.css";
createRoot(document.getElementById("root")).render(<App />);

35
frontend/src/ui.jsx Normal file
@@ -0,0 +1,35 @@
// Small shared widgets.
import React from "react";
const STATUS_STYLE = {
done: "bg-emerald-900/50 text-emerald-300",
running: "bg-ink-accent/20 text-ink-accent",
error: "bg-red-900/50 text-red-300",
pending: "bg-ink-edge text-ink-muted",
};
const STATUS_LABEL = { done: "terminé", running: "en cours", error: "erreur", pending: "en attente" };
export function StatusChip({ status }) {
return (
<span className={`chip ${STATUS_STYLE[status] || STATUS_STYLE.pending}`}>
{STATUS_LABEL[status] || status}
</span>
);
}
export function ProgressBar({ value }) {
// Clamp to [0, 1] so an out-of-range value cannot overflow the track.
const pct = Math.round(Math.min(1, Math.max(0, value || 0)) * 100);
return (
<div className="h-1.5 w-full overflow-hidden rounded-full bg-ink-edge">
<div
className="h-full bg-ink-accent transition-all duration-300"
style={{ width: `${pct}%` }}
/>
</div>
);
}
export function Spinner() {
return (
<span className="inline-block h-3.5 w-3.5 animate-spin rounded-full border-2 border-ink-accent border-t-transparent" />
);
}

@@ -0,0 +1,23 @@
/** @type {import('tailwindcss').Config} */
export default {
content: ["./index.html", "./src/**/*.{js,jsx}"],
theme: {
extend: {
colors: {
ink: {
bg: "#14110f",
panel: "#1d1916",
edge: "#2c2622",
muted: "#9a8c7d",
text: "#ede4d8",
accent: "#d9a441",
accent2: "#b9763f",
},
},
fontFamily: {
serif: ["Georgia", "Cambria", "serif"],
},
},
},
plugins: [],
};

14
frontend/vite.config.js Normal file
@@ -0,0 +1,14 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
// In dev, the UI runs on 5173 and proxies the API/WS to the backend (8000).
export default defineConfig({
plugins: [react()],
server: {
port: 5173,
proxy: {
"/api": { target: "http://127.0.0.1:8000", changeOrigin: true },
"/ws": { target: "ws://127.0.0.1:8000", ws: true },
},
},
});

BIN
voicebank/clips/f_bella.wav Normal file

Binary file not shown.

BIN
voicebank/clips/f_emma.wav Normal file

Binary file not shown.

BIN
voicebank/clips/f_heart.wav Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

BIN
voicebank/clips/m_eric.wav Normal file

Binary file not shown.

Binary file not shown.

Binary file not shown.

BIN
voicebank/clips/m_lewis.wav Normal file

Binary file not shown.

Binary file not shown.

BIN
voicebank/clips/m_santa.wav Normal file

Binary file not shown.

114
voicebank/metadata.json Normal file
@@ -0,0 +1,114 @@
{
"entries": [
{
"id": "fr_f_siwis",
"kokoro_voice": "ff_siwis",
"gender": "female",
"age": "adult",
"lang": "fr",
"label": "Siwis (FR)",
"ref_audio": "clips/fr_f_siwis.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "f_bella",
"kokoro_voice": "af_bella",
"gender": "female",
"age": "adult",
"lang": "fr",
"label": "Bella",
"ref_audio": "clips/f_bella.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "f_heart",
"kokoro_voice": "af_heart",
"gender": "female",
"age": "young",
"lang": "fr",
"label": "Heart",
"ref_audio": "clips/f_heart.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "f_emma",
"kokoro_voice": "bf_emma",
"gender": "female",
"age": "adult",
"lang": "fr",
"label": "Emma",
"ref_audio": "clips/f_emma.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "f_nicole",
"kokoro_voice": "af_nicole",
"gender": "female",
"age": "adult",
"lang": "fr",
"label": "Nicole",
"ref_audio": "clips/f_nicole.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_fenrir",
"kokoro_voice": "am_fenrir",
"gender": "male",
"age": "adult",
"lang": "fr",
"label": "Fenrir",
"ref_audio": "clips/m_fenrir.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_michael",
"kokoro_voice": "am_michael",
"gender": "male",
"age": "adult",
"lang": "fr",
"label": "Michael",
"ref_audio": "clips/m_michael.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_george",
"kokoro_voice": "bm_george",
"gender": "male",
"age": "adult",
"lang": "fr",
"label": "George",
"ref_audio": "clips/m_george.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_lewis",
"kokoro_voice": "bm_lewis",
"gender": "male",
"age": "adult",
"lang": "fr",
"label": "Lewis",
"ref_audio": "clips/m_lewis.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_eric",
"kokoro_voice": "am_eric",
"gender": "male",
"age": "young",
"lang": "fr",
"label": "Eric",
"ref_audio": "clips/m_eric.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
},
{
"id": "m_santa",
"kokoro_voice": "am_santa",
"gender": "male",
"age": "old",
"lang": "fr",
"label": "Santa",
"ref_audio": "clips/m_santa.wav",
"ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
}
]
}
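
Each voicebank entry carries `gender` and `age` fields alongside its Kokoro voice and reference clip. How the backend's automatic casting uses them is not shown in this commit; the following is only a sketch of one plausible selection rule, with a hypothetical function name `pickVoice` and an assumed scoring (gender match weighs more than age match, first entry wins ties).

```javascript
// Hypothetical sketch: choose the voicebank entry that best matches a
// character's gender and age. Returns null for an empty voicebank.
function pickVoice(entries, { gender, age } = {}) {
  let best = null;
  let bestScore = -1;
  for (const entry of entries) {
    // Assumed weights: gender match = 2, age match = 1.
    const score =
      (gender && entry.gender === gender ? 2 : 0) +
      (age && entry.age === age ? 1 : 0);
    // Strict comparison keeps the earliest entry on ties.
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

With the metadata above, `pickVoice(metadata.entries, { gender: "male", age: "old" })` would select the `m_santa` entry under these assumed weights.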