Initial commit: InkFlow, EPUB to local audiobook (MLX/Kokoro)
23
.gitignore
vendored
Normal file
@@ -0,0 +1,23 @@
# Python
.venv/
__pycache__/
*.pyc
*.egg-info/
.pytest_cache/

# InkFlow: generated artifacts and outputs
data/
output/

# Node
node_modules/

# Audio samples (large, not versioned)
samples/

# Models / HF caches (in case they are downloaded locally)
.cache/
models/

# OS
.DS_Store
10
.idea/.gitignore
generated
vendored
Normal file
@@ -0,0 +1,10 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Ignored default folder with query files
/queries/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
105
README.md
Normal file
@@ -0,0 +1,105 @@
# InkFlow

Turns an **EPUB** into an **audiobook**, 100% locally on a Mac (Apple Silicon / MLX),
with open-source models. Output: **one folder per book, one MP3 per chapter**
(ID3 tags + cover), laid out like a conventional audiobook.

- **Text analysis**: Gemma via `mlx-lm` (narration/dialogue segmentation,
  speaker attribution, cast extraction, pronunciations).
- **Speech synthesis**: pluggable backend:
  - **Kokoro**: fast, preset voices → previews / single narrator.
  - **Qwen3-TTS**: quality + cloning from reference audio → final render, one voice per character.
- **Language**: optimized for French (multilingual to follow).

## Prerequisites

- macOS on Apple Silicon (arm64), Python ≥ 3.11
- `ffmpeg` and `espeak-ng`:
  ```bash
  brew install ffmpeg espeak-ng
  ```

## Installation

```bash
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e backend                      # installs inkflow + dependencies
python backend/scripts/setup_models.py      # checks the env + downloads the MLX models
```

> Kokoro in French requires `espeak-ng`; InkFlow locates
> `libespeak-ng.dylib` automatically (otherwise, export `PHONEMIZER_ESPEAK_LIBRARY`).

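If automatic detection fails, the variable can also be set from Python before the TTS stack is imported. A minimal sketch, assuming the usual Homebrew library locations; `find_espeak_library`, `export_espeak_library`, and the candidate paths are illustrative and are not InkFlow's actual lookup logic:

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Assumed Homebrew locations (Apple Silicon first, then Intel); adjust as needed.
DEFAULT_CANDIDATES = (
    "/opt/homebrew/lib/libespeak-ng.dylib",
    "/usr/local/lib/libespeak-ng.dylib",
)


def find_espeak_library(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate path that exists on disk, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def export_espeak_library(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Set PHONEMIZER_ESPEAK_LIBRARY unless the user already exported it."""
    current = os.environ.get("PHONEMIZER_ESPEAK_LIBRARY")
    if current:
        return current  # an explicit user setting always wins
    found = find_espeak_library(candidates)
    if found is not None:
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = found
    return found
```

Calling `export_espeak_library()` early in a script leaves any value the user already exported untouched.
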
## Usage (CLI)

```bash
# 1. Parse the EPUB -> data/<slug>/book.json + chapters/chNN.json
inkflow parse "samples/Colère de Tiamat, La - James S.A. Corey.epub"

# 2. Analyze (Gemma) -> analysis/chNN.json + cast.json
inkflow analyze la-colere-de-tiamat --chapter 5     # a single chapter
inkflow analyze la-colere-de-tiamat                 # all chapters

# 3. Synthesize a chapter -> output/<book>/NN-....mp3
inkflow render la-colere-de-tiamat 5 --backend kokoro            # fast
inkflow render la-colere-de-tiamat 5 --backend qwen3 --no-mono   # quality + multi-voice (M3)

# Info
inkflow info la-colere-de-tiamat
```

(Without the `-e` install, run from `backend/` via `python -m inkflow.cli …`.)

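The three CLI steps can also be chained from a script. A minimal sketch that only builds the argument lists for the commands shown above; `pipeline_commands` and `run_pipeline` are hypothetical helpers, and the slug must be supplied by the caller, since how InkFlow derives it from the EPUB name is not shown here:

```python
import subprocess
from typing import List


def pipeline_commands(epub_path: str, slug: str, chapter: int,
                      backend: str = "kokoro") -> List[List[str]]:
    """Build the parse -> analyze -> render commands for one chapter."""
    return [
        ["inkflow", "parse", epub_path],
        ["inkflow", "analyze", slug, "--chapter", str(chapter)],
        ["inkflow", "render", slug, str(chapter), "--backend", backend],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = pipeline_commands("samples/book.epub", "la-colere-de-tiamat", 5)
```

Passing argument lists rather than a shell string avoids quoting problems with EPUB file names that contain spaces or commas.
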
## Web interface

```bash
# 1. Build the frontend (once)
cd frontend && npm install && npm run build && cd ..

# 2. Start the app (API + UI served on the same port)
inkflow serve            # http://127.0.0.1:8000
```

The UI provides: drag-and-drop EPUB import, real-time tracking of each step
(WebSocket), cast editing (character → voice, with preview), editing of the
pronunciation dictionary, engine selection (Kokoro/Qwen3), and chapter
rendering with an audio player + download.

For frontend development with hot reload:
```bash
inkflow serve                  # backend on :8000
cd frontend && npm run dev     # UI on :5173 (proxies API/WS to :8000)
```

## Architecture

```
backend/inkflow/
  epub/parser.py            EPUB -> book.json + per-chapter text
  analysis/gemma.py         mlx-lm wrapper (Gemma)
  analysis/segmenter.py     narration/dialogue + speakers + cast
  analysis/pronunciation.py
  tts/base.py               TTSBackend interface + VoiceSpec
  tts/kokoro.py  tts/qwen3.py  tts/factory.py
  audio/postprocess.py      concat + normalization + MP3 (ffmpeg) + cover
  pipeline/render.py        (segments + voices) -> MP3
  store/artifacts.py        JSON persistence (resumable)
data/<slug>/                intermediate artifacts (json, wav, cover)
output/<book>/              final MP3s (1 per chapter)
voicebank/                  reference clips for cloning (M3)
```

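The tree lists a `TTSBackend` interface, a `VoiceSpec`, and a factory, but their definitions are not part of this README. As a purely illustrative sketch of what a pluggable backend layout could look like (every field, every method name, and the `SilenceBackend` class below are assumptions, not InkFlow's actual API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical voice description: a preset name or a reference clip."""
    name: str
    reference_audio: Optional[str] = None  # path to a clip, for cloning backends


class TTSBackend(Protocol):
    """Hypothetical interface: text + voice in, mono samples out."""
    sample_rate: int

    def synthesize(self, text: str, voice: VoiceSpec) -> list[float]: ...


class SilenceBackend:
    """Stand-in backend that emits 1/20 s of silence per character."""
    sample_rate = 24_000

    def synthesize(self, text: str, voice: VoiceSpec) -> list[float]:
        samples_per_char = self.sample_rate // 20
        return [0.0] * (len(text) * samples_per_char)


# Name -> constructor; a real factory would register Kokoro and Qwen3 here.
_REGISTRY: Dict[str, Callable[[], TTSBackend]] = {"silence": SilenceBackend}


def make_backend(name: str) -> TTSBackend:
    """Look up a backend by name; unknown names raise a clear error."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

With a structural `Protocol`, a render pipeline can depend only on `synthesize`, so a fast preview engine can be swapped for a cloning engine without touching the caller.
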
## Status

- [x] **M1**: EPUB parsing, Gemma analysis (segments + cast), CLI.
- [x] **M2**: end-to-end TTS (Kokoro/Qwen3), single narrator → tagged MP3 + cover.
- [x] **M3**: multi-voice: voice bank + automatic character → voice casting (Qwen3 cloning).
- [x] **M4**: web interface (FastAPI + WebSocket + React): progress tracking, cast/pronunciation editors, previews.
- [x] **M5**: resumable state (reconciliation with existing artifacts), batch runs via UI/CLI.

### Note on the engines
- **Kokoro**: ~30 s/chapter, voices distinguished by timbre (fast rendering, drafts).
- **Qwen3-TTS**: clones the voice-bank voices per character, higher quality,
  much slower, so reserved for the final render. Every render is **resumed** chapter
  by chapter (re-running does not redo MP3s that already exist).
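The chapter-by-chapter resume behaviour amounts to skipping chapters whose MP3 already exists. A minimal sketch, assuming the `NN-` chapter prefix used for output file names; `pending_chapters` is a hypothetical helper, not InkFlow's actual reconciliation code:

```python
from pathlib import Path
from typing import Iterable, List


def pending_chapters(output_dir: Path, chapters: Iterable[int]) -> List[int]:
    """Return the chapter numbers that have no non-empty `NN-*.mp3` yet."""
    pending = []
    for number in chapters:
        matches = output_dir.glob(f"{number:02d}-*.mp3")
        # An empty file is treated as an interrupted render and redone.
        if not any(path.stat().st_size > 0 for path in matches):
            pending.append(number)
    return pending
```

Checking the file size as well as its existence is a design choice of this sketch: it keeps a render that crashed right after creating the file from being counted as done.
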
0
backend/inkflow/__init__.py
Normal file
0
backend/inkflow/analysis/__init__.py
Normal file
123
backend/inkflow/analysis/gemma.py
Normal file
@@ -0,0 +1,123 @@
"""mlx-lm wrapper around Gemma for text analysis.

Loads the model lazily (once per process) and exposes generation helpers,
including a tolerant `generate_json` that extracts the first valid JSON
object/array from the model output.
"""
from __future__ import annotations

import json
import re
from functools import lru_cache
from typing import Any, Optional

from ..settings import get_settings

# Bounds of a JSON block in a potentially chatty response.
_JSON_SPAN_RE = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


@lru_cache(maxsize=2)
def _load(model_id: str):
    # Lazy import: avoids loading mlx until an analysis is actually run.
    from mlx_lm import load
    return load(model_id)


class Gemma:
    """Small facade around mlx-lm for driving Gemma."""

    def __init__(self, model_id: Optional[str] = None):
        self.model_id = model_id or get_settings().gemma_model
        self._model = None
        self._tokenizer = None

    def _ensure_loaded(self) -> None:
        if self._model is None:
            self._model, self._tokenizer = _load(self.model_id)

    def generate(
        self,
        prompt: str,
        *,
        system: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
    ) -> str:
        """Generate a text response from a prompt (chat template).

        `max_tokens`/`temperature` left unset -> values from the current settings.
        """
        self._ensure_loaded()
        settings = get_settings()
        if max_tokens is None:
            max_tokens = settings.gemma_max_tokens
        if temperature is None:
            temperature = settings.gemma_temperature
        from mlx_lm import generate
        from mlx_lm.sample_utils import make_sampler

        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})
        formatted = self._tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=False
        )
        sampler = make_sampler(temp=temperature)
        return generate(
            self._model,
            self._tokenizer,
            prompt=formatted,
            max_tokens=max_tokens,
            sampler=sampler,
            verbose=False,
        )

    def generate_json(
        self,
        prompt: str,
        *,
        system: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        retries: int = 1,
    ) -> Any:
        """Generate, then parse JSON. Retries when parsing fails.

        `max_tokens`/`temperature` left unset -> values from the current settings.
        Retries are run at temperature 0.0.
        """
        last_err: Optional[Exception] = None
        for attempt in range(retries + 1):
            raw = self.generate(
                prompt, system=system, max_tokens=max_tokens,
                temperature=temperature if attempt == 0 else 0.0,
            )
            try:
                return _extract_json(raw)
            except Exception as exc:  # noqa: BLE001
                last_err = exc
        # Chain the last parsing error so the traceback shows the root cause.
        raise ValueError(
            f"Reponse JSON invalide apres {retries + 1} essais: {last_err}"
        ) from last_err


def _extract_json(text: str) -> Any:
    """Extract the first JSON object/array from a free-form model response.

    Tolerates stray text before/after (including a second block) thanks to
    raw_decode, which stops at the first complete JSON value.
    """
    text = text.strip()
    fence = _FENCE_RE.search(text)
    if fence:
        text = fence.group(1).strip()
    decoder = json.JSONDecoder()
    # Look for the first start of a JSON structure and decode from there.
    for i, ch in enumerate(text):
        if ch in "[{":
            try:
                obj, _ = decoder.raw_decode(text[i:])
                return obj
            except json.JSONDecodeError:
                continue
    raise ValueError("aucun JSON trouve dans la reponse")
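To see why the scan in `_extract_json` tolerates chatty output, here is a minimal standalone sketch of the same `raw_decode` technique. `first_json` is an illustrative copy without the fence handling, not an additional API of the module:

```python
import json
from typing import Any


def first_json(text: str) -> Any:
    """Return the first JSON object/array found in free-form text."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "[{":
            try:
                # raw_decode stops at the end of the first complete value,
                # so trailing prose or a second block is simply ignored.
                value, _end = decoder.raw_decode(text[start:])
                return value
            except json.JSONDecodeError:
                continue  # e.g. "[see below]" is not JSON: try the next bracket
    raise ValueError("no JSON found")


reply = 'Sure [see below]: {"i": 0, "speaker": "Holden"} Hope this helps! [1, 2]'
print(first_json(reply))  # {'i': 0, 'speaker': 'Holden'}
```

The bracket in `[see below]` fails to decode and is skipped, the object decodes next, and the trailing `[1, 2]` is never reached.
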
59
backend/inkflow/analysis/pronunciation.py
Normal file
@@ -0,0 +1,59 @@
"""Pronunciation dictionary: applying entries and proposing candidates.

Applying the dictionary is a plain surface rewrite of the text (guided
spelling) before synthesis. Candidates (proper nouns, sci-fi terms) can be
proposed by Gemma and then validated by the user in the UI.
"""
from __future__ import annotations

import re
from typing import Iterable

from ..models import Pronunciation, PronunciationEntry
from ..settings import get_settings
from .gemma import Gemma


def apply_pronunciation(text: str, pron: Pronunciation) -> str:
    """Replace each enabled term with its phonetic spelling (whole word)."""
    for entry in pron.entries:
        if not entry.enabled or not entry.term:
            continue
        pattern = re.compile(rf"\b{re.escape(entry.term)}\b")
        # Callable replacement: the spelling is inserted literally, so a
        # backslash in it is never read as a regex escape or group reference.
        text = pattern.sub(lambda _m, repl=entry.replacement: repl, text)
    return text


# The system prompt is editable in the settings (settings.prompt_pronunciation).


def propose_pronunciations(text: str, gemma: Gemma, *, max_chars: int = 16000) -> list[PronunciationEntry]:
    """Propose pronunciation candidates for the user to validate."""
    sample = text[:max_chars]
    prompt = (
        "Repere dans cet extrait les mots a risque de mauvaise prononciation par "
        "une voix de synthese francaise. Pour chacun, propose une graphie "
        "phonetique francaise (replacement) qui guide la prononciation.\n\n"
        f"EXTRAIT:\n{sample}\n\n"
        'Reponds par un tableau JSON: '
        '[{"term":"Tiamat","replacement":"Tia-matt","note":"nom propre"}]'
    )
    result = gemma.generate_json(prompt, system=get_settings().prompt_pronunciation)
    entries: list[PronunciationEntry] = []
    for item in result:
        if isinstance(item, dict) and item.get("term") and item.get("replacement"):
            entries.append(PronunciationEntry(
                term=str(item["term"]).strip(),
                replacement=str(item["replacement"]).strip(),
                note=item.get("note"),
            ))
    return entries


def merge_pronunciations(
    existing: Pronunciation, new: Iterable[PronunciationEntry]
) -> Pronunciation:
    """Merge `new` into `existing`; entries already present win (case-insensitive)."""
    by_term = {e.term.lower(): e for e in existing.entries}
    for e in new:
        by_term.setdefault(e.term.lower(), e)
    return Pronunciation(entries=list(by_term.values()))
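A minimal standalone sketch of the whole-word rewrite and of the merge rule, using a hypothetical `Entry` dataclass in place of `PronunciationEntry` (whose real definition lives in `..models` and is not shown in this diff):

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Entry:
    """Hypothetical stand-in for PronunciationEntry."""
    term: str
    replacement: str
    enabled: bool = True


def apply_entries(text: str, entries: Iterable[Entry]) -> str:
    """Rewrite each enabled term, matching whole words only."""
    for entry in entries:
        if not entry.enabled or not entry.term:
            continue
        pattern = re.compile(rf"\b{re.escape(entry.term)}\b")
        # Function replacement: the spelling is inserted verbatim.
        text = pattern.sub(lambda _m, repl=entry.replacement: repl, text)
    return text


def merge_entries(existing: Iterable[Entry], new: Iterable[Entry]) -> List[Entry]:
    """Existing entries win over new ones with the same term (case-insensitive)."""
    by_term = {e.term.lower(): e for e in existing}
    for e in new:
        by_term.setdefault(e.term.lower(), e)
    return list(by_term.values())
```

Because of the `\b` anchors, `Tiamat` is rewritten but `Tiamats` is left alone. Matching is case-sensitive, while merging compares terms case-insensitively.
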
622
backend/inkflow/analysis/segmenter.py
Normal file
@@ -0,0 +1,622 @@
"""Narration/dialogue segmentation + speaker attribution + casting.

Hybrid approach:
  1. Deterministic pre-segmentation at paragraph level (French punctuation
     rules: a paragraph starting with an em dash "—" is a line of dialogue).
  2. Gemma assigns a speaker to each line of dialogue, one call per batch of
     lines (numbered list + context), and extracts the cast (characters +
     attributes).

Fine-grained splitting of incises (reporting clauses such as "..., dit-il") is
left to a later pass; in v1 the whole line is carried by the character's voice.
"""
from __future__ import annotations

import re
from typing import Optional

from ..models import (
    Cast,
    Chapter,
    ChapterAnalysis,
    ChapterText,
    Character,
    Incise,
    Segment,
    SegmentType,
)
from ..settings import get_settings
from .gemma import Gemma

# A dialogue paragraph starts with an em dash (U+2014) or a horizontal bar (U+2015).
_DIALOGUE_LEAD_RE = re.compile(r"^\s*[—―]\s*")

# --- Incise detection (French verb-subject inversion) -------------------------
# An incise is a narration group inserted in a line of dialogue ("..., dit-il.").
# tu/nous/vous are excluded (imperatives "Donne-le-moi", "Crois-tu ?") to limit
# false positives. See `detect_incises` below for the two passes
# (verb-pronoun inversion + nominal "lanca Drummer", cast-aware).
_INCISE_PRON = r"(?:il|elle|on|ils|elles|je)"
# Speech verb, possibly reflexive ("s'ecria", "s'exclama").
_INCISE_VERB = r"(?:[A-Za-zÀ-ÿ]+['’])?[A-Za-zÀ-ÿ]{2,}"


def segment_chapter_text(ct: ChapterText) -> list[Segment]:
    """Split a chapter into narration/dialogue segments (rules only)."""
    segments: list[Segment] = []
    for para in ct.paragraphs:
        if _DIALOGUE_LEAD_RE.match(para):
            text = _DIALOGUE_LEAD_RE.sub("", para).strip()
            segments.append(Segment(
                type=SegmentType.DIALOGUE, text=text, speaker="?"))
        else:
            segments.append(Segment(
                type=SegmentType.NARRATION, text=para, speaker="narrateur"))
    return segments


# --- Speaker attribution (Gemma) ----------------------------------------------
# The system prompt is editable in the settings (settings.prompt_speakers).


_UNKNOWN = {"", "?", "inconnu", "narrateur"}
_CTX_CHARS = 160  # truncation of the narrative context before/after
_CHUNK_MAX_DIALOGUES = 30  # dialogue lines per call (model reliability)


def attribute_speakers(
    segments: list[Segment],
    gemma: Gemma,
    *,
    characters: Optional[list[Character]] = None,
    pov: Optional[str] = None,
) -> dict[int, str]:
    """Fill in `speaker` for each dialogue segment (mutates in place).

    Gives the model the enriched canonical character list (name, gender,
    description) and, for each line, the narrative context BEFORE and AFTER
    (the attribution incise often comes after: "— Bonjour. dit Marie.").

    Returns a map {segment_index: confidence} ("high"/"medium"/"low"),
    kept in memory (not persisted) to drive the retroactive second pass.
    A line whose returned name is outside the supplied list is kept but
    marked "low" so that it gets re-examined.
    """
    dialogues = [(i, s) for i, s in enumerate(segments)
                 if s.type is SegmentType.DIALOGUE]
    if not dialogues:
        return {}

    # Lines already resolved (seeded by an incise): shown as fixed context,
    # never re-asked of the model. If everything is resolved, nothing to do.
    locked = {i for i, s in dialogues if _is_resolved(s.speaker)}
    if len(locked) == len(dialogues):
        return {i: "high" for i, _ in dialogues}

    hint = _speakers_hint(characters, pov)
    valid = {c.name.strip().lower() for c in (characters or [])}
    confidence: dict[int, str] = {}

    for chunk in _chunk_dialogues(dialogues, segments, hint):
        prompt = (
            "Voici les repliques de dialogue d'un extrait, numerotees, avec la "
            "narration qui precede et qui suit chaque replique. Les repliques "
            "deja attribuees affichent (locuteur: X) : ne les modifie pas, "
            "sers-t'en comme contexte (alternance des tours). Pour les AUTRES, "
            "indique le personnage qui parle (recopie son nom depuis la liste "
            "fournie ; 'inconnu' si vraiment indeterminable) et ta confiance "
            "(high/medium/low)."
            f"{hint}\n\n" + "\n".join(line for _, line in chunk) +
            '\n\nReponds par un tableau JSON: '
            '[{"i": 0, "speaker": "Holden", "confidence": "high"}, ...]'
        )
        result = gemma.generate_json(prompt, system=get_settings().prompt_speakers)
        by_i: dict[int, dict] = {item["i"]: item for item in result
                                 if isinstance(item, dict) and "i" in item}
        for j, (seg_idx, _line) in enumerate(chunk):
            if seg_idx in locked:  # seed kept as-is
                confidence[seg_idx] = "high"
                continue
            seg = segments[seg_idx]
            item = by_i.get(j) or {}
            speaker = (str(item.get("speaker") or "inconnu").strip()
                       or "inconnu")
            conf = str(item.get("confidence") or "low").strip().lower()
            if conf not in {"high", "medium", "low"}:
                conf = "low"
            # Name outside the known list -> keep the name but re-judge it.
            if (valid and speaker.lower() not in _UNKNOWN
                    and speaker.lower() not in valid):
                conf = "low"
            seg.speaker = speaker
            confidence[seg_idx] = conf
    return confidence


def _speakers_hint(characters: Optional[list[Character]], pov: Optional[str]) -> str:
    hint = ""
    if characters:
        lines = []
        for c in characters:
            attrs = c.gender or ""
            desc = f" — {c.description}" if c.description else ""
            lines.append(f"- {c.name}" + (f" ({attrs})" if attrs else "") + desc)
        hint += "\nPersonnages du chapitre:\n" + "\n".join(lines)
    if pov:
        hint += f"\nLe point de vue de ce chapitre est: {pov}."
    return hint


def _is_resolved(speaker: str) -> bool:
    """True if the line already has a reliable speaker (incise seed, etc.)."""
    return (speaker or "").strip().lower() not in _UNKNOWN


def _dialogue_line(n: int, segments: list[Segment], idx: int) -> str:
    seg = segments[idx]
    # Line already resolved (e.g. seeded by an incise) -> shown as fixed context.
    if _is_resolved(seg.speaker):
        return f"[{n}] (locuteur: {seg.speaker}) REPLIQUE: {seg.text!r}"
    before = _adjacent_narration(segments, idx, -1)
    after = _adjacent_narration(segments, idx, +1)
    parts = [f"[{n}]"]
    if before:
        parts.append(f"(avant: {before!r})")
    parts.append(f"REPLIQUE: {seg.text!r}")
    if after:
        parts.append(f"(apres: {after!r})")
    return " ".join(parts)


def _adjacent_narration(segments: list[Segment], idx: int, direction: int) -> str:
    """Text of the immediately adjacent narration (attribution incise)."""
    j = idx + direction
    if 0 <= j < len(segments) and segments[j].type is SegmentType.NARRATION:
        return segments[j].text[:_CTX_CHARS]
    return ""


def _chunk_dialogues(
    dialogues: list[tuple[int, Segment]],
    segments: list[Segment],
    hint: str,
) -> list[list[tuple[int, str]]]:
    """Split the dialogue lines into batches that fit under `_MAX_PROMPT_CHARS`.

    Each batch is a list of (segment_index, rendered_line); the line is
    numbered locally (0..k) for the prompt, and the segment index is used to
    map results back. Avoids hard truncation on long chapters.
    """
    budget = _MAX_PROMPT_CHARS - len(hint) - 400  # margin for the instructions
    chunks: list[list[tuple[int, str]]] = []
    current: list[tuple[int, str]] = []
    size = 0
    for idx, _seg in dialogues:
        line = _dialogue_line(len(current), segments, idx)
        if current and (size + len(line) > budget
                        or len(current) >= _CHUNK_MAX_DIALOGUES):
            chunks.append(current)
            current = []
            size = 0
            # Re-render with local number 0: this line opens the next batch.
            line = _dialogue_line(0, segments, idx)
        current.append((idx, line))
        size += len(line) + 1
    if current:
        chunks.append(current)
    return chunks


# --- Retroactive pass: re-resolving undetermined lines ------------------------
# The system prompt is editable (settings.prompt_speakers_refine).


def _refine_unknown_speakers(
    segments: list[Segment],
    gemma: Gemma,
    *,
    characters: Optional[list[Character]] = None,
    confidence: dict[int, str],
) -> None:
    """Second pass: re-resolve lines left undetermined or low-confidence.

    Each doubtful line is presented with its neighbouring dialogue lines that
    are ALREADY identified (turn alternation) and its narrative context, to
    exploit information coming from the *following* lines. Mutates in place;
    no Gemma call is made if nothing is doubtful.
    """
    dialogues = [(i, s) for i, s in enumerate(segments)
                 if s.type is SegmentType.DIALOGUE]
    if not dialogues:
        return
    pos = {seg_idx: n for n, (seg_idx, _s) in enumerate(dialogues)}
    # `_is_resolved` also copes with a missing (None) speaker.
    doubtful = [seg_idx for seg_idx, _s in dialogues
                if not _is_resolved(segments[seg_idx].speaker)
                or confidence.get(seg_idx) == "low"]
    if not doubtful:
        return

    hint = _speakers_hint(characters, pov=None)
    lines = []
    for j, seg_idx in enumerate(doubtful):
        n = pos[seg_idx]
        ctx = []
        if n > 0:
            prev_idx = dialogues[n - 1][0]
            ctx.append(f"replique precedente (dite par "
                       f"{segments[prev_idx].speaker}): "
                       f"{segments[prev_idx].text[:_CTX_CHARS]!r}")
        before = _adjacent_narration(segments, seg_idx, -1)
        if before:
            ctx.append(f"narration avant: {before!r}")
        after = _adjacent_narration(segments, seg_idx, +1)
        if after:
            ctx.append(f"narration apres: {after!r}")
        if n < len(dialogues) - 1:
            next_idx = dialogues[n + 1][0]
            ctx.append(f"replique suivante (dite par "
                       f"{segments[next_idx].speaker}): "
                       f"{segments[next_idx].text[:_CTX_CHARS]!r}")
        ctx_str = (" [" + " ; ".join(ctx) + "]") if ctx else ""
        lines.append(f"[{j}]{ctx_str} REPLIQUE: {segments[seg_idx].text!r}")

    prompt = (
        "Repliques au locuteur indetermine. Pour chacune, en t'appuyant sur les "
        "repliques voisines DEJA attribuees (alternance des tours) et le "
        "contexte, indique qui parle (recopie le nom depuis la liste ; "
        "'inconnu' si toujours indeterminable)."
        f"{hint}\n\n" + "\n".join(lines) +
        '\n\nReponds par un tableau JSON: [{"i": 0, "speaker": "Holden"}, ...]'
    )
    result = gemma.generate_json(_truncate(prompt),
                                 system=get_settings().prompt_speakers_refine)
    by_i = {item["i"]: item.get("speaker") for item in result
            if isinstance(item, dict) and "i" in item}
    for j, seg_idx in enumerate(doubtful):
        new = (str(by_i.get(j) or "").strip())
        if new and new.lower() not in _UNKNOWN:
            segments[seg_idx].speaker = new


# --- Cast extraction (Gemma) --------------------------------------------------
# The system prompt is editable in the settings (settings.prompt_characters).


def extract_characters(text: str, gemma: Gemma) -> list[Character]:
    """Extract the characters and their attributes (gender, age) from a text."""
    prompt = (
        "A partir de l'extrait suivant, liste les personnages qui parlent ou "
        "sont nommes. Pour chacun, donne: name (nom court canonique), gender "
        "(male/female/unknown), age (child/young/adult/old/unknown), et une "
        "courte description. Ignore les figurants sans nom.\n\n"
        f"EXTRAIT:\n{_truncate(text)}\n\n"
        'Reponds par un tableau JSON: '
        '[{"name":"Holden","gender":"male","age":"adult","description":"..."}]'
    )
    result = gemma.generate_json(prompt, system=get_settings().prompt_characters)
    characters: list[Character] = []
    for item in result:
        if not isinstance(item, dict) or not item.get("name"):
            continue
        characters.append(Character(
            name=str(item["name"]).strip(),
            gender=_norm(item.get("gender")),
            age=_norm(item.get("age")),
            description=(item.get("description") or None),
        ))
    return characters


def merge_characters(existing: list[Character], new: list[Character]) -> list[Character]:
    """Merge two character lists by name (case-insensitive).

    Existing attributes win; a new entry only fills in what is missing.
    """
    by_key = {c.name.lower(): c for c in existing}
    for c in new:
        key = c.name.lower()
        if key in by_key:
            cur = by_key[key]
            cur.gender = cur.gender or c.gender
            cur.age = cur.age or c.age
            cur.description = cur.description or c.description
        else:
            by_key[key] = c
    return list(by_key.values())


def _norm(value) -> Optional[str]:
    # Normalize an attribute: empty or "unknown" becomes None.
    if not value:
        return None
    v = str(value).strip().lower()
    return v if v and v != "unknown" else None


# --- Helpers -----------------------------------------------------------------

# Context guard (in characters) to stay within a reasonable window.
_MAX_PROMPT_CHARS = 24000


def _truncate(text: str) -> str:
    return text if len(text) <= _MAX_PROMPT_CHARS else text[:_MAX_PROMPT_CHARS]


# --- Incise detection (deterministic, cast-aware) -----------------------------
# Incises are annotated as bounds (offsets) on the persisted line of dialogue
# (non-destructive); rendering has them spoken by the narrator's voice. Two
# complementary passes:
#   1. verb-pronoun inversion ("dit-il", "coupa-t-elle");
#   2. nominal: speech verb + known subject (cast name OR role noun,
#      e.g. "compatit Holden", "lanca Drummer", "informa le soldat").
# The nominal pass relies on the character list -> few false positives, and it
# lets the explicit speaker be extracted (seeding the attribution).

# Optional object pronoun before the verb ("lui demanda un garde").
_CLITIC = r"(?:lui|leur|nous|vous|me|te|se|y|en|[mts]['’])"

# Conjugated forms of speech verbs (3rd person, passe simple / present /
# imperfect). Curated list: missing an incise is preferred to inventing one.
_SPEECH_VERBS = {
    "dit", "disait", "redit", "répondit", "repondit", "répond", "repond",
    "répondait", "repondait", "demanda", "demandait", "demande", "interrogea",
    "questionna", "ecria", "écria", "exclama", "enquit", "lança", "lanca",
    "lançait", "lance", "murmura", "chuchota", "souffla", "soupira", "ajouta",
    "ajoute", "reprit", "poursuivit", "poursuit", "continua", "enchaîna",
    "enchaina", "fit", "faisait", "remarqua", "observa", "nota", "déclara",
    "declara", "affirma", "assura", "rétorqua", "retorqua", "répliqua",
    "repliqua", "riposta", "objecta", "protesta", "insista", "renchérit",
    "rencherit", "acquiesça", "acquiesca", "admit", "avoua", "convint",
    "concéda", "conceda", "rectifia", "corrigea", "précisa", "precisa",
    "expliqua", "raconta", "annonça", "annonca", "proclama", "ordonna",
    "commanda", "supplia", "implora", "gémit", "gemit", "grogna", "ronchonna",
    "maugréa", "maugrea", "marmonna", "glissa", "lâcha", "lacha", "coupa",
    "interrompit", "conclut", "compléta", "completa", "suggéra", "suggera",
    "proposa", "promit", "jura", "menaça", "menaca", "ironisa", "plaisanta",
    "railla", "cria", "hurla", "tonna", "gronda", "rugit", "susurra",
    "compatit", "salua", "appela", "héla", "hela", "interpella", "balbutia",
    "bredouilla", "bafouilla", "gloussa", "ricana", "siffla", "tempêta",
    "tempeta", "rétorque", "lâche", "informa", "renseigna", "indiqua",
    "rappela", "avertit", "prévint", "prevint", "intima", "rétorquait",
    "lançait", "questionnait", "reconnut", "constata", "répéta", "repeta",
}

# Role nouns that can be the subject of an incise ("informa le soldat").
_ROLE_NOUNS = {
    "garde", "soldat", "sentinelle", "gardien", "prêtre", "pretre", "homme",
    "femme", "fille", "garçon", "garcon", "vieille", "vieillard", "capitaine",
    "lieutenant", "sergent", "général", "general", "amiral", "officier", "voix",
    "inconnu", "inconnue", "étranger", "etranger", "enfant", "serviteur",
    "servante", "messager", "domestique", "médecin", "medecin",
}

# Stop words ignored when indexing the tokens of a character name.
_NAME_STOP = {
    "le", "la", "les", "un", "une", "de", "du", "des", "monsieur", "madame",
    "mademoiselle", "m", "mme", "mlle", "mr", "dr", "docteur", "saint", "sainte",
}

# Punctuation that ends the spoken part: if the incise follows it, the whole
# rest of the line is narration (the speech is over). After a plain comma, on
# the other hand, the dialogue resumes after the incise.
_SENTENCE_FINAL = {"", ".", "!", "?", "…"}


def _incise_end(text: str, close_end: int, lead: str) -> int:
    """Effective end of the incise: up to the end of the line if the speech was
    already closed on the left (`.`/`!`/`?`/`…` or start), else the closing bound."""
|
||||||
|
return len(text) if lead in _SENTENCE_FINAL else close_end
|
||||||
|
|
||||||
|
|
||||||
|
# Passe 1 : inversion verbe-(t-)pronom, ancree sur une ponctuation a gauche
|
||||||
|
# (virgule, point, ?, !, …) ou le debut de la replique.
|
||||||
|
_INVERSION_RE = re.compile(
|
||||||
|
r"(?P<lead>[,.!?…]|^)\s*"
|
||||||
|
r"(?P<inc>" + _INCISE_VERB + r"-(?:t-)?" + _INCISE_PRON +
|
||||||
|
r"(?:\s+[^.!?…»\",;]*?)?)" # complements eventuels ("dit-il en souriant")
|
||||||
|
r"(?P<close>[.!?…,])", # cloture : ponctuation forte OU virgule
|
||||||
|
re.IGNORECASE,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _inversion_spans(text: str) -> list[tuple[int, int]]:
|
||||||
|
return [(m.start("inc"), _incise_end(text, m.end("close"), m.group("lead")))
|
||||||
|
for m in _INVERSION_RE.finditer(text)]
|
||||||
|
|
||||||
|
|
||||||
|
def _name_token_index(names) -> dict[str, str]:
|
||||||
|
"""Index token -> nom canonique (tokens distinctifs uniquement).
|
||||||
|
|
||||||
|
Un token partage par plusieurs personnages est ambigu et ecarte.
|
||||||
|
"""
|
||||||
|
idx: dict[str, str] = {}
|
||||||
|
ambiguous: set[str] = set()
|
||||||
|
for name in names or ():
|
||||||
|
for tok in re.split(r"[^\wÀ-ÿ]+", name):
|
||||||
|
t = tok.lower()
|
||||||
|
if len(t) < 2 or t in _NAME_STOP:
|
||||||
|
continue
|
||||||
|
if t in idx and idx[t] != name:
|
||||||
|
ambiguous.add(t)
|
||||||
|
else:
|
||||||
|
idx[t] = name
|
||||||
|
for t in ambiguous:
|
||||||
|
idx.pop(t, None)
|
||||||
|
return idx
|
||||||
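The token index can be tried on a tiny cast to see how shared tokens are dropped. The snippet below is a minimal, self-contained sketch: `build_token_index` and `STOP_WORDS` are hypothetical stand-ins for `_name_token_index` and `_NAME_STOP`, and the cast names are made up for illustration.

```python
import re

# Hypothetical, shortened stand-in for _NAME_STOP.
STOP_WORDS = {"le", "la", "de", "du", "monsieur", "madame", "dr", "docteur"}


def build_token_index(names):
    """Map each distinctive lowercase token to the full character name."""
    index = {}
    ambiguous = set()
    for name in names or ():
        for token in re.split(r"[^\w]+", name):
            t = token.lower()
            if len(t) < 2 or t in STOP_WORDS:
                continue  # too short, or a title/determiner
            if t in index and index[t] != name:
                ambiguous.add(t)  # token shared by two different characters
            else:
                index[t] = name
    for t in ambiguous:
        index.pop(t, None)
    return index


# Illustrative cast: two characters share the token "okoye".
cast = ["James Holden", "Camina Drummer", "Docteur Elvi Okoye", "Fayez Okoye"]
index = build_token_index(cast)
print(index.get("holden"))   # full name of the only character with that token
print(index.get("okoye"))    # None: ambiguous tokens are removed
print("docteur" in index)    # False: stop words are never indexed
```

Dropping ambiguous tokens trades recall for precision: a surname shared by two characters no longer seeds a speaker, which matches the stated preference for missing an incise rather than inventing one.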


# Proper noun: capitalized initial (case-sensitive pattern). Note: the À-Ÿ
# range also spans lowercase accented letters; _classify_subject re-checks the
# initial with isupper().
_PROPER = r"[A-ZÀ-Ÿ][\wÀ-ÿ’'\-]+"
_REJECT = object()  # the subject is not a valid one -> not an incise


def _classify_subject(subj: str, idx: dict[str, str]):
    """Speaker carried by the subject of a nominal incise.

    - known character -> canonical name;
    - unknown proper noun (capitalized) -> surface name (seeded anyway: the
      text names them, regardless of how reliable the extraction was);
    - generic role noun ("le soldat") -> None (a real incise, no seed);
    - any other word -> _REJECT (not an incise).
    """
    low = subj.lower()
    if low in idx:
        return idx[low]
    if low in _ROLE_NOUNS:
        return None
    if subj[:1].isupper() and len(low) >= 2 and low not in _NAME_STOP:
        return subj.strip("’'")
    return _REJECT


def _nominal_matches(text: str, names) -> list[tuple[int, int, Optional[str]]]:
    """Pass 2: (start, end, speaker) for each nominal incise.

    A nominal incise = speech verb + subject (cast name, proper noun, or role
    noun). A proper-noun subject is seeded even when absent from the cast.
    """
    idx = _name_token_index(names)
    literals = sorted(set(idx) | _ROLE_NOUNS, key=len, reverse=True)
    lit_alt = "|".join(re.escape(s) for s in literals)
    # Subject: known name/role (case-insensitive) OR proper noun (capitalized,
    # case-sensitive so it does not swallow a determiner such as "un"/"le").
    # No global IGNORECASE.
    subj_alt = (f"(?i:{lit_alt})|{_PROPER}") if lit_alt else _PROPER
    verbs = "|".join(re.escape(v) for v in sorted(_SPEECH_VERBS, key=len, reverse=True))
    pat = re.compile(
        r"(?P<lead>[,.!?…]|^)\s*"
        # \s* (not \s+): an elided clitic ("s'ecria") has no space after it,
        # so requiring whitespace would never match those forms.
        r"(?P<inc>(?:(?i:" + _CLITIC + r")\s*)?"
        r"(?i:" + verbs + r")\b"
        r"[^.!?…»\",;]{0,40}?\b"
        r"(?P<subj>" + subj_alt + r")\b"
        r"[^.!?…»\",;]*?)"
        r"(?P<close>[.!?…,])",
    )
    out: list[tuple[int, int, Optional[str]]] = []
    for m in pat.finditer(text):
        spk = _classify_subject(m.group("subj"), idx)
        if spk is _REJECT:
            continue
        out.append((m.start("inc"),
                    _incise_end(text, m.end("close"), m.group("lead")), spk))
    return out
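The shape of the nominal pattern is easier to see on a reduced version. The following is a simplified sketch, not the project's pattern: `VERBS`, `NOMINAL_RE` and `find_nominal` are hypothetical names, the verb list is cut down to four entries, and only capitalized subjects are handled (no cast lookup, no role nouns, no clitics).

```python
import re

# Hypothetical, much shorter stand-in for _SPEECH_VERBS.
VERBS = ["répondit", "demanda", "lança", "dit"]
VERB_ALT = "|".join(re.escape(v) for v in sorted(VERBS, key=len, reverse=True))

NOMINAL_RE = re.compile(
    r"(?P<lead>[,.!?…]|^)\s*"            # punctuation (or start) before the incise
    r"(?P<inc>(?i:" + VERB_ALT + r")\b"  # speech verb, case-insensitive
    r"\s+(?P<subj>[A-Z][\w'-]+)"         # capitalized subject, case-sensitive
    r"[^.!?…,;]*?)"                      # optional complements
    r"(?P<close>[.!?…,])"                # closing punctuation
)


def find_nominal(text):
    """Return (incise_text, subject) pairs found in one dialogue line."""
    return [(m.group("inc"), m.group("subj")) for m in NOMINAL_RE.finditer(text)]


print(find_nominal("Je ne sais pas, répondit Holden."))
print(find_nominal("Attention ! lança Drummer en se retournant."))
print(find_nominal("Il dit la vérité."))  # no leading punctuation, no subject
```

Anchoring on the punctuation to the left is what keeps an ordinary sentence such as the third one from being read as a reporting clause.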


def _merge_spans(spans: list[tuple[int, int]]) -> list[Incise]:
    """Sort and merge (no overlaps) a list of bounds -> Incise."""
    out: list[Incise] = []
    last_end = -1
    for s, e in sorted(set(spans)):
        if s < last_end:  # overlap -> keep the first one seen
            continue
        out.append(Incise(start=s, end=e))
        last_end = e
    return out


def detect_incises(text: str, *, names=None) -> list[Incise]:
    """Incise spans within a dialogue line (inversion + cast-aware nominal pass)."""
    spans = _inversion_spans(text)
    spans += [(s, e) for s, e, _ in _nominal_matches(text, names or set())]
    return _merge_spans(spans)


def incise_speaker(text: str, incise: Incise, names) -> Optional[str]:
    """Explicit speaker carried by a nominal incise ("compatit Holden")."""
    for s, e, spk in _nominal_matches(text, names):
        if s == incise.start and e == incise.end:
            return spk
    return None


def iter_incise_pieces(
    text: str, incises: list[Incise]
) -> list[tuple[bool, str]]:
    """Split `text` into (is_incise, sub_text) pieces using the span bounds.

    Used at render time: dialogue pieces -> character voice, incise pieces ->
    narrator voice. The text is preserved up to leading/trailing whitespace.
    """
    pieces: list[tuple[bool, str]] = []
    cursor = 0
    for inc in sorted(incises, key=lambda i: i.start):
        if inc.start < cursor:  # overlap safeguard
            continue
        before = text[cursor:inc.start]
        if before.strip():
            pieces.append((False, before.strip()))
        body = text[inc.start:inc.end]
        if body.strip():
            pieces.append((True, body.strip()))
        cursor = inc.end
    tail = text[cursor:]
    if tail.strip():
        pieces.append((False, tail.strip()))
    return pieces
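To make the render-time split concrete, here is a small sketch that cuts one dialogue line around a hand-placed span. It assumes a plain dataclass as a stand-in for the project's `Incise` model, and `split_pieces` is a hypothetical re-statement of the same cursor walk, so it runs without the rest of the package.

```python
from dataclasses import dataclass


@dataclass
class Incise:
    """Hypothetical stand-in: character offsets of a reporting clause."""
    start: int
    end: int


def split_pieces(text, incises):
    """Cut `text` into (is_incise, piece) tuples, in reading order."""
    pieces = []
    cursor = 0
    for inc in sorted(incises, key=lambda i: i.start):
        if inc.start < cursor:  # overlapping span: skip it
            continue
        before = text[cursor:inc.start].strip()
        if before:
            pieces.append((False, before))
        body = text[inc.start:inc.end].strip()
        if body:
            pieces.append((True, body))
        cursor = inc.end
    tail = text[cursor:].strip()
    if tail:
        pieces.append((False, tail))
    return pieces


text = "Non, dit-il, pas ce soir."
start = text.index("dit-il")
end = text.index(",", start) + 1  # the closing comma belongs to the incise
pieces = split_pieces(text, [Incise(start, end)])
for is_incise, piece in pieces:
    print("narrator " if is_incise else "character", repr(piece))
```

Because the incise followed a comma, the speech resumes afterwards and the tail goes back to the character's voice; only the middle piece is read by the narrator.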


def analyze_chapter(
    chapter: Chapter,
    ct: ChapterText,
    gemma: Gemma,
    *,
    book_chars: Optional[list[Character]] = None,
    dedup_gemma: Optional[Gemma] = None,
) -> tuple[ChapterAnalysis, list[Character]]:
    """Full analysis of one chapter.

    Sequence: segmentation -> character extraction -> reconciliation (dedup
    against the book's cumulative cast) -> incise annotation + seeding of the
    explicit speaker -> LLM attribution of the remaining dialogue lines ->
    retroactive pass. Dialogue lines are persisted whole (incises = spans).

    `book_chars`: the book's cumulative cast (canonical characters already known).
    `dedup_gemma`: if provided, settles ambiguous dedup cases.

    Returns (analysis, updated cumulative cast); the second element is the
    book's entire reconciled cast, ready to be persisted as is.
    """
    from ..casting.dedup import reconcile_characters

    segments = segment_chapter_text(ct)
    full_text = "\n".join(ct.paragraphs)
    found = extract_characters(full_text, gemma)

    # Dedup BEFORE attribution: the model will receive canonical names.
    chars, name_map = reconcile_characters(book_chars or [], found, dedup_gemma)

    # Canonical list restricted to this chapter (detected characters + POV).
    chapter_canon = {(name_map.get(c.name.strip().lower()) or c.name).strip().lower()
                     for c in found}
    chapter_chars = [c for c in chars if c.name.strip().lower() in chapter_canon]
    if chapter.pov:
        pv = chapter.pov.strip().lower()
        for c in chars:
            if (c not in chapter_chars and
                    (pv in c.name.lower()
                     or any(pv in a.lower() for a in c.aliases))):
                chapter_chars.append(c)

    # Deterministic incise annotation (spans, non-destructive) + seeding: a
    # nominal incise that names a character fixes the speaker with certainty
    # BEFORE the LLM call (covers cases the small model misses).
    names = {c.name for c in chars}
    for seg in segments:
        if seg.type is not SegmentType.DIALOGUE:
            continue
        seg.incises = detect_incises(seg.text, names=names)
        for inc in seg.incises:
            spk = incise_speaker(seg.text, inc, names)
            if spk:
                seg.speaker = spk
                break

    conf = attribute_speakers(segments, gemma, characters=chapter_chars,
                              pov=chapter.pov)
    if get_settings().retro_pass_use_gemma:
        _refine_unknown_speakers(segments, gemma, characters=chapter_chars,
                                 confidence=conf)

    # Absorb leftover speakers (not in the list) as aliases (heuristic only).
    chars, _ = reconcile_characters(
        chars, [], None, speaker_names=[s.speaker for s in segments])

    # Dialogue lines are persisted whole; incises remain spans (rendering:
    # narrator voice). No more fragmentation at analysis time.
    analysis = ChapterAnalysis(index=chapter.index, title=ct.title,
                               segments=segments)
    return analysis, chars
0
backend/inkflow/api/__init__.py
Normal file
295
backend/inkflow/api/app.py
Normal file
@@ -0,0 +1,295 @@
"""FastAPI application: drives the pipeline and serves the UI.

All heavy routes (analysis, casting, rendering) are *queued* in the
orchestrator and return immediately; progress arrives over WebSocket. Quick
operations (voice preview) run in a thread pool.
"""
from __future__ import annotations

import asyncio
import io
from pathlib import Path
from typing import Optional

import soundfile as sf
from fastapi import FastAPI, HTTPException, UploadFile, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, Response
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

from ..config import DATA_DIR, book_data_dir, book_output_dir, ensure_dirs
from ..epub.parser import load_book, load_chapter_text, parse_epub
from ..models import Cast, ChapterAnalysis, Pronunciation
from ..pipeline.orchestrator import load_state, orchestrator
from ..settings import Settings, get_settings, save_settings
from ..store import artifacts
from ..util import slugify
from .ws import manager

app = FastAPI(title="InkFlow API")
app.add_middleware(
    CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"],
)


@app.on_event("startup")
async def _startup() -> None:
    ensure_dirs()
    manager.bind_loop(asyncio.get_running_loop())
    orchestrator.set_broadcaster(manager.broadcast_threadsafe)


# --- Helpers -----------------------------------------------------------------

def _list_book_slugs() -> list[str]:
    if not DATA_DIR.exists():
        return []
    return sorted(p.parent.name for p in DATA_DIR.glob("*/book.json"))


def _book_summary(slug: str) -> dict:
    book = load_book(slug)
    state = load_state(slug)
    rendered = sum(1 for r in state.render.values() if r.mp3)
    return {
        "slug": slug,
        "title": book.title,
        "author": book.author,
        "chapters": len(book.render_chapters),
        "rendered": rendered,
        "cover": f"/api/books/{slug}/cover" if book.cover_file else None,
    }


# --- Library / upload --------------------------------------------------------

@app.get("/api/books")
def list_books() -> list[dict]:
    return [_book_summary(s) for s in _list_book_slugs()]


@app.post("/api/books")
async def upload_book(file: UploadFile) -> dict:
    ensure_dirs()
    uploads = DATA_DIR / "_uploads"
    uploads.mkdir(parents=True, exist_ok=True)
    # Keep only the base name so a crafted filename ("../../x") cannot escape
    # the uploads directory.
    dest = uploads / (Path(file.filename or "").name or "livre.epub")
    dest.write_bytes(await file.read())
    book = await asyncio.to_thread(parse_epub, dest)
    # Initialize the state.
    load_state(book.slug)
    return {"slug": book.slug, "title": book.title}


@app.get("/api/books/{slug}")
def get_book(slug: str) -> dict:
    _require(slug)
    book = load_book(slug)
    return {"book": book.model_dump(mode="json"),
            "state": load_state(slug).model_dump(mode="json")}


@app.get("/api/books/{slug}/cover")
def get_cover(slug: str):
    _require(slug)  # unknown slug -> 404, like the other book routes
    book = load_book(slug)
    if not book.cover_file:
        raise HTTPException(404, "pas de couverture")
    return FileResponse(str(book_data_dir(slug) / book.cover_file))


@app.get("/api/books/{slug}/chapters/{index}")
def get_chapter(slug: str, index: int) -> dict:
    _require(slug)
    book = load_book(slug)
    ch = next((c for c in book.chapters if c.index == index), None)
    if ch is None:
        raise HTTPException(404, "chapitre inconnu")
    out: dict = {"chapter": ch.model_dump(mode="json")}
    apath = artifacts.analysis_path(slug, index)
    if apath.exists():
        out["analysis"] = artifacts.load_analysis(slug, index).model_dump(mode="json")
    elif ch.text_file:
        out["text"] = load_chapter_text(slug, ch).model_dump(mode="json")
    return out


@app.put("/api/books/{slug}/chapters/{index}/analysis")
def put_analysis(slug: str, index: int, analysis: ChapterAnalysis) -> dict:
    _require(slug)
    if analysis.index != index:
        raise HTTPException(400, "index incoherent")
    artifacts.save_analysis(slug, analysis)
    return {"saved": True}


# --- Steps (queued) ----------------------------------------------------------

class ChaptersBody(BaseModel):
    chapters: Optional[list[int]] = None


@app.post("/api/books/{slug}/analyze")
def analyze(slug: str, body: ChaptersBody) -> dict:
    _require(slug)
    orchestrator.run_analyze(slug, body.chapters)
    return {"queued": True}


@app.post("/api/books/{slug}/pronounce")
def pronounce(slug: str) -> dict:
    _require(slug)
    orchestrator.run_pronounce(slug)
    return {"queued": True}


@app.post("/api/books/{slug}/cast/auto")
def cast_auto(slug: str) -> dict:
    _require(slug)
    orchestrator.run_cast(slug)
    return {"queued": True}


@app.post("/api/books/{slug}/cast/analyze")
def cast_analyze(slug: str, body: ChaptersBody) -> dict:
    """(Re)analyze the cast of one or more chapters, with reconciliation."""
    _require(slug)
    orchestrator.run_cast_analyze(slug, body.chapters)
    return {"queued": True}


@app.post("/api/books/{slug}/cast/dedup")
def cast_dedup(slug: str) -> dict:
    """Deduplicate the existing cast (name variants -> aliases)."""
    _require(slug)
    orchestrator.run_dedup_cast(slug)
    return {"queued": True}


class RenderBody(BaseModel):
    chapters: list[int]
    backend: Optional[str] = None
    mono: bool = False


@app.post("/api/books/{slug}/render")
def render(slug: str, body: RenderBody) -> dict:
    _require(slug)
    orchestrator.run_render(slug, body.chapters, backend=body.backend, mono=body.mono)
    return {"queued": True}


# --- Casting / pronunciation (direct read/write) -----------------------------

@app.get("/api/books/{slug}/cast")
def get_cast(slug: str) -> dict:
    from ..casting.voicebank import load_voicebank
    _require(slug)
    return {"cast": artifacts.load_cast(slug).model_dump(mode="json"),
            "voicebank": load_voicebank().model_dump(mode="json")}


@app.put("/api/books/{slug}/cast")
def put_cast(slug: str, cast: Cast) -> dict:
    _require(slug)
    artifacts.save_cast(slug, cast)
    return {"saved": True}


@app.get("/api/books/{slug}/pronunciation")
def get_pron(slug: str) -> dict:
    _require(slug)
    return artifacts.load_pronunciation(slug).model_dump(mode="json")


@app.put("/api/books/{slug}/pronunciation")
def put_pron(slug: str, pron: Pronunciation) -> dict:
    _require(slug)
    artifacts.save_pronunciation(slug, pron)
    return {"saved": True}


# --- Global technical settings -----------------------------------------------

@app.get("/api/settings")
def read_settings() -> dict:
    return get_settings().model_dump(mode="json")


@app.put("/api/settings")
def write_settings(settings: Settings) -> dict:
    save_settings(settings)
    return {"saved": True}


# --- Voicebank + preview -----------------------------------------------------

@app.get("/api/voicebank")
def get_voicebank() -> dict:
    from ..casting.voicebank import load_voicebank
    return load_voicebank().model_dump(mode="json")


class PreviewBody(BaseModel):
    voice_id: str
    text: str = "Bonjour, voici un aperçu de cette voix pour votre livre audio."


@app.post("/api/voicebank/preview")
async def preview_voice(body: PreviewBody):
    from ..casting.voicebank import load_voicebank
    from ..tts.base import VoiceSpec

    entry = load_voicebank().by_id(body.voice_id)
    if entry is None:
        raise HTTPException(404, "voix inconnue")

    def _synth() -> bytes:
        from ..tts.factory import get_backend
        backend = get_backend("kokoro")
        audio, sr = backend.synthesize(body.text, VoiceSpec(preset=entry.kokoro_voice))
        buf = io.BytesIO()
        sf.write(buf, audio, sr, format="WAV")
        return buf.getvalue()

    # Synthesis is blocking: run it off the event loop.
    data = await asyncio.to_thread(_synth)
    return Response(content=data, media_type="audio/wav")


@app.get("/api/books/{slug}/audio/{index}")
def get_audio(slug: str, index: int):
    _require(slug)  # unknown slug -> 404, like the other book routes
    state = load_state(slug)
    rs = state.render.get(index)
    if not rs or not rs.mp3:
        raise HTTPException(404, "audio non genere")
    path = book_output_dir(load_book(slug).title) / rs.mp3
    if not path.exists():
        raise HTTPException(404, "fichier introuvable")
    return FileResponse(str(path), media_type="audio/mpeg", filename=rs.mp3)


# --- WebSocket ---------------------------------------------------------------

@app.websocket("/ws/{slug}")
async def ws_endpoint(ws: WebSocket, slug: str) -> None:
    await manager.connect(slug, ws)
    try:
        # Send the current state as soon as the client connects.
        await ws.send_json({"type": "state", "state": load_state(slug).model_dump(mode="json")})
        while True:
            await ws.receive_text()  # keeps the connection open
    except WebSocketDisconnect:
        manager.disconnect(slug, ws)
    except Exception:  # noqa: BLE001
        manager.disconnect(slug, ws)


def _require(slug: str) -> None:
    if not (book_data_dir(slug) / "book.json").exists():
        raise HTTPException(404, "livre inconnu")


# --- Serving the built frontend (if present) ---------------------------------
_FRONTEND_DIST = Path(__file__).resolve().parents[2].parent / "frontend" / "dist"
if _FRONTEND_DIST.exists():
    app.mount("/", StaticFiles(directory=str(_FRONTEND_DIST), html=True), name="ui")
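The module docstring notes that quick blocking operations run in a thread pool. The sketch below illustrates that pattern with `asyncio.to_thread` outside of FastAPI: `slow_synthesis` is a hypothetical stand-in for a blocking TTS call, and `heartbeat` only exists to show that the event loop keeps running meanwhile.

```python
import asyncio
import time


def slow_synthesis(text: str) -> bytes:
    """Hypothetical blocking call (stand-in for a TTS backend)."""
    time.sleep(0.2)
    return text.encode("utf-8")


async def heartbeat(ticks: list) -> None:
    """Cooperative task that only progresses if the loop is not blocked."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.05)


async def main():
    ticks: list = []
    # The blocking function runs in a worker thread; the coroutine awaits it.
    data, _ = await asyncio.gather(
        asyncio.to_thread(slow_synthesis, "Bonjour"),
        heartbeat(ticks),
    )
    return data, ticks


data, ticks = asyncio.run(main())
print(data, ticks)
```

Calling `slow_synthesis` directly inside a coroutine would stall every other request and WebSocket send for the duration of the call, which is why the preview route hands it to a thread.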
47
backend/inkflow/api/ws.py
Normal file
@@ -0,0 +1,47 @@
"""WebSocket connection manager with thread-safe broadcasting.

The orchestrator runs in a worker thread; it calls `broadcast_threadsafe`,
which reschedules the send on the API's asyncio loop.
"""
from __future__ import annotations

import asyncio
from collections import defaultdict
from typing import Optional

from fastapi import WebSocket


class ConnectionManager:
    def __init__(self) -> None:
        self.active: dict[str, set[WebSocket]] = defaultdict(set)
        self._loop: Optional[asyncio.AbstractEventLoop] = None
        # Strong references to in-flight send tasks: the event loop only keeps
        # weak references, so an unreferenced task may be garbage-collected
        # before it finishes.
        self._tasks: set[asyncio.Task] = set()

    def bind_loop(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    async def connect(self, slug: str, ws: WebSocket) -> None:
        await ws.accept()
        self.active[slug].add(ws)

    def disconnect(self, slug: str, ws: WebSocket) -> None:
        self.active[slug].discard(ws)

    def broadcast_threadsafe(self, slug: str, data: dict) -> None:
        """Callable from any thread (orchestrator worker)."""
        if self._loop is None:
            return
        self._loop.call_soon_threadsafe(self._dispatch, slug, data)

    def _dispatch(self, slug: str, data: dict) -> None:
        for ws in list(self.active.get(slug, ())):
            task = asyncio.create_task(self._safe_send(slug, ws, data))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _safe_send(self, slug: str, ws: WebSocket, data: dict) -> None:
        try:
            await ws.send_json({"type": "state", "state": data})
        except Exception:  # noqa: BLE001 - connection closed
            self.disconnect(slug, ws)


manager = ConnectionManager()
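The hand-off from the worker thread to the event loop can be reproduced with the standard library alone. This is a minimal sketch of the `call_soon_threadsafe` pattern, not the project's manager: `on_loop`, `worker` and the message text are made up for the example.

```python
import asyncio
import threading


async def main() -> list:
    loop = asyncio.get_running_loop()
    received: list = []
    done = asyncio.Event()

    def on_loop(message: str) -> None:
        # Runs on the event-loop thread, so loop-owned state is safe to touch.
        received.append(message)
        done.set()

    def worker() -> None:
        # Runs in a plain thread: the only loop API it may call is
        # call_soon_threadsafe, which schedules the callback on the loop.
        loop.call_soon_threadsafe(on_loop, "chapter 1 rendered")

    thread = threading.Thread(target=worker)
    thread.start()
    await asyncio.wait_for(done.wait(), timeout=5)
    thread.join()
    return received


result = asyncio.run(main())
print(result)
```

The worker never awaits anything and never touches the sockets itself; it only posts a callback, which keeps all WebSocket I/O on the single thread that owns the loop.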
0
backend/inkflow/audio/__init__.py
Normal file
125
backend/inkflow/audio/postprocess.py
Normal file
@@ -0,0 +1,125 @@
"""Final audio assembly: concat -> normalization -> WAV -> tagged MP3.

No pydub (broken on Python 3.13): concat/normalization in numpy, MP3 encoding
+ cover via the ffmpeg CLI, tags via ffmpeg metadata.
"""
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Optional

import numpy as np
import soundfile as sf

from ..settings import get_settings


def _resample(audio: np.ndarray, src_sr: int, dst_sr: int) -> np.ndarray:
    # Linear-interpolation resampling of a mono signal.
    if src_sr == dst_sr or audio.size == 0:
        return audio
    duration = audio.size / src_sr
    n_dst = int(round(duration * dst_sr))
    x_src = np.linspace(0.0, duration, num=audio.size, endpoint=False)
    x_dst = np.linspace(0.0, duration, num=n_dst, endpoint=False)
    return np.interp(x_dst, x_src, audio).astype(np.float32)


def silence(seconds: float, sr: int) -> np.ndarray:
    return np.zeros(int(seconds * sr), dtype=np.float32)


def concat_segments(
    parts: list[tuple[np.ndarray, int]],
    *,
    target_sr: Optional[int] = None,
    gap_seconds: float = 0.35,
    intra_gap_seconds: float = 0.12,
    glued: Optional[list[bool]] = None,
) -> tuple[np.ndarray, int]:
    """Concatenate (audio, sr) segments with a silence between each.

    `glued[i] == True` (e.g. an incise and its dialogue line, coming from the
    same paragraph) inserts a short `intra_gap_seconds` silence instead of
    `gap_seconds`.
    """
    if target_sr is None:
        target_sr = get_settings().target_sample_rate
    gap = silence(gap_seconds, target_sr)
    intra_gap = silence(intra_gap_seconds, target_sr)
    buf: list[np.ndarray] = []
    first = True
    for i, (audio, sr) in enumerate(parts):
        if audio is None or audio.size == 0:
            continue
        if not first:
            use_intra = glued is not None and i < len(glued) and glued[i]
            buf.append(intra_gap if use_intra else gap)
        first = False
        buf.append(_resample(np.asarray(audio, dtype=np.float32), sr, target_sr))
    if not buf:
        return np.zeros(0, dtype=np.float32), target_sr
    return np.concatenate(buf), target_sr


def normalize_loudness(audio: np.ndarray, target_dbfs: Optional[float] = None) -> np.ndarray:
    """Normalize the RMS level to target_dbfs, with a guard against clipping."""
    if audio.size == 0:
        return audio
    if target_dbfs is None:
        target_dbfs = get_settings().target_dbfs
    rms = float(np.sqrt(np.mean(audio.astype(np.float64) ** 2)))
    if rms < 1e-6:
        return audio
    current_dbfs = 20.0 * np.log10(rms)
    gain = 10.0 ** ((target_dbfs - current_dbfs) / 20.0)
    out = audio * gain
    peak = float(np.max(np.abs(out))) if out.size else 0.0
    if peak > 0.99:  # simple limiter to avoid clipping
        out *= 0.99 / peak
    return out.astype(np.float32)
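The gain computation is a two-line formula: the current level is 20·log10(RMS) in dBFS, and the linear gain is 10^((target − current)/20). The sketch below works it through on four samples. It is written with the standard library only so that it runs without NumPy, and `rms` and `normalize` are hypothetical helper names, not the module's functions.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize(samples, target_dbfs):
    """Scale samples so their RMS sits at target_dbfs, capping the peak at 0.99."""
    if not samples:
        return []
    level = rms(samples)
    if level < 1e-6:  # near-silence: leave untouched
        return list(samples)
    current_dbfs = 20.0 * math.log10(level)
    gain = 10.0 ** ((target_dbfs - current_dbfs) / 20.0)
    out = [s * gain for s in samples]
    peak = max(abs(s) for s in out)
    if peak > 0.99:  # simple peak guard: scale everything back down
        out = [s * 0.99 / peak for s in out]
    return out


square = [0.1, -0.1, 0.1, -0.1]            # RMS 0.1, i.e. -20 dBFS
quiet = normalize(square, target_dbfs=-23.0)
loud = normalize(square, target_dbfs=0.0)   # would peak at 1.0 without the guard
print(round(20.0 * math.log10(rms(quiet)), 2))
print(round(max(abs(s) for s in loud), 2))
```

Note that when the peak guard kicks in, the result ends up below the requested RMS target: the guard takes priority over the loudness goal.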


def write_wav(path: str | Path, audio: np.ndarray, sr: int) -> Path:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    sf.write(str(path), audio, sr)
    return path


def encode_mp3(
    wav_path: str | Path,
    mp3_path: str | Path,
    *,
    bitrate: Optional[str] = None,
    title: Optional[str] = None,
    album: Optional[str] = None,
    artist: Optional[str] = None,
    track: Optional[int] = None,
    cover_path: Optional[str | Path] = None,
) -> Path:
    """Encode a WAV to MP3 (ffmpeg) with ID3 tags and an optional cover."""
    if bitrate is None:
        bitrate = get_settings().mp3_bitrate
    if not shutil.which("ffmpeg"):
        raise RuntimeError("ffmpeg introuvable — brew install ffmpeg")
    wav_path, mp3_path = Path(wav_path), Path(mp3_path)
    mp3_path.parent.mkdir(parents=True, exist_ok=True)

    cmd = ["ffmpeg", "-y", "-i", str(wav_path)]
    has_cover = cover_path and Path(cover_path).exists()
    if has_cover:
        # Second input = cover image, attached as the front picture.
        cmd += ["-i", str(cover_path), "-map", "0:a", "-map", "1:v",
                "-c:v", "mjpeg", "-disposition:v", "attached_pic"]
    cmd += ["-c:a", "libmp3lame", "-b:a", bitrate]

    meta = {"title": title, "album": album, "artist": artist}
    if track is not None:
        meta["track"] = str(track)
    for key, val in meta.items():
        if val:
            cmd += ["-metadata", f"{key}={val}"]
    cmd += ["-id3v2_version", "3", str(mp3_path)]

    subprocess.run(cmd, check=True, capture_output=True)
    return mp3_path
0
backend/inkflow/casting/__init__.py
Normal file
86
backend/inkflow/casting/assign.py
Normal file
@@ -0,0 +1,86 @@
"""Auto-casting : attribue une voix distincte a chaque personnage.

Strategie deterministe :
  - Narrateur : voix FR native par defaut (id `fr_f_siwis`, voix Kokoro
    `ff_siwis`), sinon premiere voix de la banque.
  - Personnages : voix du meme genre, distinctes tant qu'il en reste ; au-dela on
    recycle en repartissant le plus equitablement possible. Genre inconnu -> pool
    mixte. L'ordre (tri par nom) garantit la reproductibilite.
L'utilisateur pourra surcharger ces choix dans l'UI.
"""
from __future__ import annotations

from collections import Counter
from typing import Optional

from ..models import Cast, Character, Voicebank

# Voix narrateur preferee (FR native).
PREFERRED_NARRATOR = "fr_f_siwis"


def _pick_pool(vb: Voicebank, gender: Optional[str], narrator_id: str) -> list[str]:
    """Voix candidates : on privilegie STRICTEMENT le genre (quitte a reutiliser).

    On ne croise le genre que si aucune voix du bon genre n'existe. Le narrateur
    est exclu tant qu'il reste d'autres options, pour le distinguer.
    """
    same = [e.id for e in vb.by_gender(gender)] if gender in ("male", "female") else []
    pool = same if same else [e.id for e in vb.entries]
    non_narrator = [vid for vid in pool if vid != narrator_id]
    return non_narrator or pool  # garde le narrateur seulement s'il est seul


def assign_voices(
    characters: list[Character],
    vb: Voicebank,
    *,
    narrator_voice_id: Optional[str] = None,
    respect_existing: bool = False,
) -> Cast:
    """Renvoie un Cast avec narrateur + voix par personnage (mutation des chars).

    `respect_existing=True` conserve les voix deja attribuees (overrides UI) ;
    sinon tout est re-calcule (auto-casting frais).
    """
    if not vb.entries:
        return Cast(narrator_voice_id=narrator_voice_id, characters=characters)

    narrator_id = narrator_voice_id or (
        PREFERRED_NARRATOR if vb.by_id(PREFERRED_NARRATOR) else vb.entries[0].id)

    usage: Counter[str] = Counter()
    usage[narrator_id] += 1  # le narrateur compte deja

    for ch in sorted(characters, key=lambda c: c.name.lower()):
        if respect_existing and ch.voice_id and vb.by_id(ch.voice_id):
            usage[ch.voice_id] += 1
            continue  # respecte une attribution existante (override utilisateur)
        pool = _pick_pool(vb, ch.gender, narrator_id)
        # Choisit la voix la moins utilisee du pool (donc une voix neuve d'abord).
        best = min(pool, key=lambda vid: (usage[vid], pool.index(vid)))
        ch.voice_id = best
        usage[best] += 1

    return Cast(narrator_voice_id=narrator_id, characters=characters)


def resolve_speaker_voice(
    speaker: str, cast: Cast, vb: Voicebank
) -> Optional[str]:
    """Mappe un nom de locuteur (segment) vers un id de voix.

    Matche d'abord par nom/alias exact (rapide), puis en dernier recours par
    rapprochement heuristique de tokens (ex: un "Jim" qui n'aurait pas encore
    ete absorbe comme alias de "James Holden").
    """
    if speaker == "narrateur":
        return cast.narrator_voice_id
    low = speaker.lower()
    for ch in cast.characters:
        if ch.name.lower() == low or low in (a.lower() for a in ch.aliases):
            return ch.voice_id
    from .dedup import heuristic_match
    match = heuristic_match(speaker, cast.characters)
    if isinstance(match, Character):
        return match.voice_id
    return None  # inconnu -> le rendu repliera sur le narrateur
345
backend/inkflow/casting/dedup.py
Normal file
@@ -0,0 +1,345 @@
"""Reconciliation du casting : deduplication des variantes de noms.

Probleme : un meme personnage apparait sous plusieurs formes ("Holden",
"James Holden", "James", "Jim"). Sans reconciliation, chaque forme devient un
personnage distinct avec sa propre voix -> incoherence a l'ecoute.

Strategie hybride :
  1. Heuristique (sans LLM) : match exact sur nom/alias, puis sous-ensemble de
     tokens ("Holden" contenu dans "James Holden").
  2. Gemma tranche les cas ambigus (plusieurs candidats compatibles, ou variante
     non evidente type "Jim" <-> "James") a l'aide des descriptions.

Chaque variante rencontree est conservee comme `alias` du personnage canonique ;
le nom canonique est la forme la plus complete vue ("James Holden"). Les
artefacts d'analyse (segments) ne sont PAS modifies : la resolution de voix au
rendu s'appuie sur les aliases (`casting/assign.py`).
"""
from __future__ import annotations

import re
from typing import Optional

from ..models import Character
from ..settings import get_settings

# Sentinelles internes.
_AMBIGUOUS = object()  # heuristique : plusieurs candidats -> on delegue a Gemma
_NEW = object()  # decision Gemma : nouveau personnage

# Mots vides / titres a ignorer pour le rapprochement par tokens.
_STOPWORDS = {
    "le", "la", "les", "un", "une", "de", "du", "des", "monsieur", "madame",
    "mademoiselle", "m", "mme", "mlle", "mr", "dr", "docteur", "capitaine",
    "lieutenant", "sergent", "general", "amiral", "the", "of",
}
_SPLIT_RE = re.compile(r"[^\wÀ-ÿ]+")

# Garde-fou de contexte (caracteres) pour le prompt Gemma.
_MAX_PROMPT_CHARS = 24000


def _norm(name: str) -> str:
    return name.strip().lower()


def _tokens(name: str) -> set[str]:
    """Tokens significatifs d'un nom (minuscules, sans titres ni mots vides)."""
    parts = [p for p in _SPLIT_RE.split(name.strip()) if p]
    return {p.lower() for p in parts
            if len(p) >= 2 and p.lower() not in _STOPWORDS}


def _completeness(name: str) -> tuple[int, int]:
    """Cle de tri du nom le plus "complet" : plus de tokens, puis plus long."""
    return (len(_tokens(name)), len(name.strip()))


def _forms(c: Character) -> list[str]:
    return [c.name, *c.aliases]


def _token_freq(characters: list[Character], extra: Optional[list[str]] = None):
    """Compte, pour chaque token, le nb de surfaces distinctes le contenant.

    Sert a juger la distinctivite d'un token : "holden" present dans une seule
    famille est sur a fusionner ; "alex" present dans plusieurs ne l'est pas.
    """
    from collections import Counter
    freq: Counter[str] = Counter()
    surfaces = {_norm(f) for c in characters for f in _forms(c)}
    surfaces |= {_norm(s) for s in (extra or [])}
    for s in surfaces:
        for t in _tokens(s):
            freq[t] += 1
    return freq


def heuristic_match(surface: str, characters: list[Character], tokfreq=None):
    """Rapproche `surface` d'un personnage connu sans LLM (conservateur).

    Renvoie le `Character` correspondant, `None` si aucun, ou `_AMBIGUOUS` si le
    rapprochement est plausible mais incertain (decision laissee a Gemma).

    Un lien par sous-ensemble de tokens n'est considere SUR que si le plus petit
    cote a >=2 tokens, ou si les tokens partages sont globalement distinctifs
    (presents dans <=2 surfaces). Sinon le lien est ambigu (ex: un prenom
    courant "Alex" partage par plusieurs personnages).
    """
    s_norm = _norm(surface)
    for c in characters:
        if _norm(c.name) == s_norm or any(_norm(a) == s_norm for a in c.aliases):
            return c
    s_tok = _tokens(surface)
    if not s_tok:
        return None
    if tokfreq is None:
        tokfreq = _token_freq(characters, [surface])

    safe: list[Character] = []
    ambiguous = False
    for c in characters:
        linked = is_safe = False
        for form in _forms(c):
            f_tok = _tokens(form)
            if not f_tok or not (s_tok <= f_tok or f_tok <= s_tok):
                continue
            linked = True
            shared = s_tok & f_tok
            if min(len(s_tok), len(f_tok)) >= 2 or all(tokfreq[t] <= 2 for t in shared):
                is_safe = True
        if is_safe:
            safe.append(c)
        elif linked:
            ambiguous = True
    if len(safe) == 1 and not ambiguous:
        return safe[0]
    if safe or ambiguous:
        return _AMBIGUOUS
    return None


def canonical_of(a: str, b: str) -> str:
    """Forme canonique entre deux variantes : la plus complete."""
    return a if _completeness(a) >= _completeness(b) else b


def _absorb(
    target: Character,
    name: str,
    *,
    gender: Optional[str] = None,
    age: Optional[str] = None,
    description: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> None:
    """Fusionne la variante `name` dans `target` (mutation en place).

    Enrichit les attributs manquants, recalcule le nom canonique et range les
    autres formes en aliases.
    """
    target.gender = target.gender or gender
    target.age = target.age or age
    target.description = target.description or description
    target.voice_id = target.voice_id or voice_id

    forms: dict[str, str] = {}  # norm -> graphie d'origine (1re vue conservee)
    for f in [target.name, *target.aliases, name]:
        f = (f or "").strip()
        if f:
            forms.setdefault(_norm(f), f)
    canon = max(forms, key=lambda n: _completeness(forms[n]))
    target.name = forms[canon]
    target.aliases = sorted(v for k, v in forms.items() if k != canon)


def _item(c) -> dict:
    """Normalise un personnage ou un nom brut en entree de reconciliation."""
    if isinstance(c, Character):
        return {"name": c.name, "gender": c.gender, "age": c.age,
                "description": c.description, "voice_id": c.voice_id}
    return {"name": str(c), "gender": None, "age": None,
            "description": None, "voice_id": None}


def _find(chars: list[Character], name: str) -> Optional[Character]:
    n = _norm(name)
    return next((c for c in chars
                 if _norm(c.name) == n or any(_norm(a) == n for a in c.aliases)),
                None)


def _create(chars: list[Character], it: dict, name_map: dict[str, str]) -> None:
    new = Character(name=it["name"].strip(), gender=it["gender"], age=it["age"],
                    description=it["description"], voice_id=it["voice_id"])
    chars.append(new)
    name_map[_norm(it["name"])] = new.name


def reconcile_characters(
    book_chars: list[Character],
    new_chars,
    gemma=None,
    *,
    speaker_names: Optional[list[str]] = None,
) -> tuple[list[Character], dict[str, str]]:
    """Reconcilie de nouvelles detections dans le casting du livre.

    `new_chars` : personnages extraits (objets `Character`) du/des chapitre(s).
    `speaker_names` : formes de locuteur brutes vues dans les segments (absorbees
    comme aliases pour que la resolution de voix matche au rendu).
    `gemma` : si fourni, tranche les cas ambigus ; sinon heuristique seule.

    Renvoie (liste canonique mise a jour, map nom_surface_normalise -> canonique).
    """
    chars = [c.model_copy(deep=True) for c in book_chars]
    name_map: dict[str, str] = {}

    items = [_item(c) for c in new_chars]
    seen = {_norm(it["name"]) for it in items}
    for sp in (speaker_names or []):
        n = _norm(sp or "")
        if n and n not in seen and n not in {"narrateur", "inconnu", "?"}:
            items.append(_item(sp))
            seen.add(n)

    # Fréquence globale des tokens (base + entrants) -> distinctivite stable,
    # independante de l'ordre de traitement.
    tokfreq = _token_freq(chars, [it["name"] for it in items])

    pending: list[dict] = []
    for it in items:
        m = heuristic_match(it["name"], chars, tokfreq)
        if m is _AMBIGUOUS:
            pending.append(it)
        elif m is not None:
            _absorb(m, it["name"], gender=it["gender"], age=it["age"],
                    description=it["description"], voice_id=it["voice_id"])
            name_map[_norm(it["name"])] = m.name
        elif gemma is not None:
            pending.append(it)  # peut etre une variante non evidente ("Jim")
        else:
            _create(chars, it, name_map)

    if pending and gemma is not None:
        decisions = _gemma_reconcile(chars, pending, gemma)
        for it in pending:
            canon = decisions.get(_norm(it["name"]))
            target = _find(chars, canon) if isinstance(canon, str) else None
            if target is None:  # Gemma dit NOUVEAU/inconnu : ultime essai heuristique
                hm = heuristic_match(it["name"], chars, tokfreq)
                target = hm if isinstance(hm, Character) else None
            if target is not None:
                _absorb(target, it["name"], gender=it["gender"], age=it["age"],
                        description=it["description"], voice_id=it["voice_id"])
                name_map[_norm(it["name"])] = target.name
            else:
                _create(chars, it, name_map)
    elif pending:
        # Sans Gemma : on ne devine pas les cas ambigus, on les garde distincts.
        for it in pending:
            _create(chars, it, name_map)

    return chars, name_map


def dedup_cast(characters: list[Character], gemma=None) -> list[Character]:
    """Replie les doublons d'un casting existant (conserve les voix attribuees).

    Deux phases : (1) regroupement heuristique sur (gemma=None) -> liste reduite
    et sure ; (2) si `gemma` fourni, passe de regroupement Gemma sur les seuls
    noms candidats (partageant un token avec un autre), pour fusionner les
    variantes que l'heuristique laisse de cote (ex: "Okoye" -> "Elvi Okoye").
    """
    base, _ = reconcile_characters([], characters, gemma=None)
    if gemma is None:
        return base
    return _gemma_merge_pass(base, gemma)


def _gemma_merge_pass(base: list[Character], gemma) -> list[Character]:
    """Rattache via Gemma les formes courtes a un nom complet (ancre).

    Tache volontairement contrainte (et plus fiable qu'un regroupement libre) :
    une "forme courte" est un nom dont les tokens sont strictement inclus dans
    ceux d'un autre (ex: "Okoye" vs "Elvi Okoye"). Gemma mappe chaque forme
    courte vers le nom canonique EXACT d'une ancre, ou "NOUVEAU". Traite par
    petits lots pour rester dans la zone de fiabilite du modele.
    """
    shorts: list[Character] = []
    anchors: list[Character] = []
    for i, c in enumerate(base):
        ts = _tokens(c.name)
        if ts and any(j != i and ts < _tokens(d.name) for j, d in enumerate(base)):
            shorts.append(c)
        else:
            anchors.append(c)
    if not shorts:
        return base

    result = [a.model_copy(deep=True) for a in anchors]
    leftovers: list[Character] = []
    for start in range(0, len(shorts), 12):
        chunk = shorts[start:start + 12]
        decisions = _gemma_reconcile(result, [_item(s) for s in chunk], gemma)
        for s in chunk:
            canon = decisions.get(_norm(s.name))
            tgt = _find(result, canon) if isinstance(canon, str) else None
            if tgt is None:
                hm = heuristic_match(s.name, result)
                tgt = hm if isinstance(hm, Character) else None
            # Garde-fou : ne pas fusionner deux genres connus opposes.
            if tgt is not None and s.gender and tgt.gender and s.gender != tgt.gender:
                tgt = None
            if tgt is not None:
                _absorb(tgt, s.name, gender=s.gender, age=s.age,
                        description=s.description, voice_id=s.voice_id)
                for a in s.aliases:
                    _absorb(tgt, a)
            else:
                leftovers.append(s)
    return result + leftovers


def _gemma_reconcile(
    chars: list[Character], pending: list[dict], gemma
) -> dict[str, object]:
    """Un appel groupe : pour chaque nom en attente, son canonique ou _NEW."""
    known = []
    for c in chars:
        al = f" (alias: {', '.join(c.aliases)})" if c.aliases else ""
        desc = f" — {c.description}" if c.description else ""
        known.append(f"- {c.name}{al}{desc}")
    new_lines = []
    for n, it in enumerate(pending):
        desc = f" — {it['description']}" if it.get("description") else ""
        new_lines.append(f"[{n}] {it['name']}{desc}")

    prompt = (
        "Personnages DEJA connus du livre :\n"
        + ("\n".join(known) if known else "(aucun)")
        + "\n\nNoms DETECTES a classer :\n" + "\n".join(new_lines)
        + "\n\nPour chaque nom detecte, indique s'il designe un personnage deja "
        "connu (donne alors son nom canonique EXACT tel qu'ecrit ci-dessus) ou "
        "s'il s'agit d'un nouveau personnage (\"NOUVEAU\"). Ne fusionne que si "
        "c'est, avec certitude, la meme personne. EN CAS DE DOUTE, ou si "
        "plusieurs personnages connus pourraient correspondre, reponds "
        "\"NOUVEAU\". Ne rapproche jamais deux personnes differentes qui "
        "partagent seulement un prenom ou un nom de famille.\n\n"
        'Reponds par un tableau JSON: '
        '[{"i":0,"canonical":"James Holden"},{"i":1,"canonical":"NOUVEAU"}]'
    )
    if len(prompt) > _MAX_PROMPT_CHARS:
        prompt = prompt[:_MAX_PROMPT_CHARS]
    result = gemma.generate_json(prompt, system=get_settings().prompt_dedup)

    decisions: dict[str, object] = {}
    for item in result:
        if not isinstance(item, dict) or "i" not in item:
            continue
        n = item["i"]
        canon = str(item.get("canonical") or "").strip()
        if isinstance(n, int) and 0 <= n < len(pending) and canon:
            decisions[_norm(pending[n]["name"])] = (
                _NEW if canon.upper() == "NOUVEAU" else canon)
    return decisions
91
backend/inkflow/casting/voicebank.py
Normal file
@@ -0,0 +1,91 @@
"""Banque de voix : un jeu de voix variees (genre/age) auto-suffisant.

Chaque voix s'appuie sur une voix Kokoro (identite + clip de reference). Le clip
de reference est genere une fois en lisant un passage francais standard ; il sert
de reference de timbre pour le clonage Qwen3 (rendu final). Aucune ressource
externe a sourcer.

Resolution moteur :
  - Kokoro -> VoiceSpec(preset=kokoro_voice) (rapide, preview / draft)
  - Qwen3 -> VoiceSpec(ref_audio=clip, ref_text=…) (qualite, clonage)
"""
from __future__ import annotations

from pathlib import Path

import soundfile as sf

from ..config import VOICEBANK_DIR
from ..models import VoiceEntry, Voicebank
from ..tts.base import VoiceSpec

# Passage de reference lu par chaque voix pour creer son clip de clonage.
REFERENCE_TEXT = (
    "L'univers est toujours plus étrange qu'on ne le croit. "
    "Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
)

# Jeu de voix par defaut (varie en genre). ff_siwis est la seule voix FR native ;
# les autres empruntent un timbre anglais mais lisent un texte phonemise en FR.
SEED: list[VoiceEntry] = [
    VoiceEntry(id="fr_f_siwis", kokoro_voice="ff_siwis", gender="female", age="adult", label="Siwis (FR)"),
    VoiceEntry(id="f_bella", kokoro_voice="af_bella", gender="female", age="adult", label="Bella"),
    VoiceEntry(id="f_heart", kokoro_voice="af_heart", gender="female", age="young", label="Heart"),
    VoiceEntry(id="f_emma", kokoro_voice="bf_emma", gender="female", age="adult", label="Emma"),
    VoiceEntry(id="f_nicole", kokoro_voice="af_nicole", gender="female", age="adult", label="Nicole"),
    VoiceEntry(id="m_fenrir", kokoro_voice="am_fenrir", gender="male", age="adult", label="Fenrir"),
    VoiceEntry(id="m_michael", kokoro_voice="am_michael", gender="male", age="adult", label="Michael"),
    VoiceEntry(id="m_george", kokoro_voice="bm_george", gender="male", age="adult", label="George"),
    VoiceEntry(id="m_lewis", kokoro_voice="bm_lewis", gender="male", age="adult", label="Lewis"),
    VoiceEntry(id="m_eric", kokoro_voice="am_eric", gender="male", age="young", label="Eric"),
    VoiceEntry(id="m_santa", kokoro_voice="am_santa", gender="male", age="old", label="Santa"),
]


def metadata_path() -> Path:
    return VOICEBANK_DIR / "metadata.json"


def clips_dir() -> Path:
    return VOICEBANK_DIR / "clips"


def load_voicebank() -> Voicebank:
    path = metadata_path()
    if path.exists():
        return Voicebank.model_validate_json(path.read_text(encoding="utf-8"))
    return Voicebank(entries=list(SEED))


def save_voicebank(vb: Voicebank) -> Path:
    VOICEBANK_DIR.mkdir(parents=True, exist_ok=True)
    metadata_path().write_text(vb.model_dump_json(indent=2), encoding="utf-8")
    return metadata_path()


def build_voicebank(*, regenerate: bool = False) -> Voicebank:
    """Genere les clips de reference manquants et ecrit metadata.json."""
    from ..tts.kokoro import KokoroBackend

    clips_dir().mkdir(parents=True, exist_ok=True)
    backend = KokoroBackend()
    entries: list[VoiceEntry] = []
    for seed in SEED:
        clip_rel = f"clips/{seed.id}.wav"
        clip_abs = VOICEBANK_DIR / clip_rel
        if regenerate or not clip_abs.exists():
            audio, sr = backend.synthesize(REFERENCE_TEXT, VoiceSpec(preset=seed.kokoro_voice))
            sf.write(str(clip_abs), audio, sr)
        entry = seed.model_copy(update={"ref_audio": clip_rel, "ref_text": REFERENCE_TEXT})
        entries.append(entry)
    vb = Voicebank(entries=entries)
    save_voicebank(vb)
    return vb


def voice_spec_for(entry: VoiceEntry, engine: str, *, speed: float = 1.0) -> VoiceSpec:
    """Construit la VoiceSpec adaptee au moteur cible."""
    if engine == "qwen3" and entry.ref_audio:
        ref_abs = str(VOICEBANK_DIR / entry.ref_audio)
        return VoiceSpec(ref_audio=ref_abs, ref_text=entry.ref_text, speed=speed)
    return VoiceSpec(preset=entry.kokoro_voice, speed=speed)
239
backend/inkflow/cli.py
Normal file
@@ -0,0 +1,239 @@
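"""CLI InkFlow (typer).

Commandes :
  - parse     : EPUB -> book.json + chapters/chNN.json
  - info      : affiche la structure d'un livre deja parse
  - serve     : lance l'API + l'UI web (sert frontend/dist si build)
  - analyze   : analyse Gemma d'un (ou de tous les) chapitre(s) -> analysis + cast
  - pronounce : propose des candidats de prononciation -> pronunciation.json
  - cast      : construit la voicebank (si besoin) et auto-assigne les voix
  - render    : synthetise un chapitre en MP3 dans output/<livre>/
"""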
from __future__ import annotations

from typing import Optional

import typer
from rich.console import Console
from rich.table import Table

from .config import ensure_dirs
from .epub.parser import load_book, load_chapter_text, parse_epub
from .models import Cast
from .store import artifacts

app = typer.Typer(add_completion=False, help="InkFlow : EPUB -> livre audio (local, MLX).")
console = Console()


@app.command()
def parse(epub_path: str, slug: Optional[str] = typer.Option(None, help="Slug interne (def: depuis le titre).")):
    """Parse un EPUB en structure normalisee."""
    ensure_dirs()
    book = parse_epub(epub_path, slug=slug)
    console.print(f"[green]Parse:[/] {book.title} — slug=[cyan]{book.slug}[/]")
    console.print(f" {len(book.chapters)} items, {len(book.render_chapters)} a rendre.")
    _print_chapters(book)


@app.command()
def info(slug: str):
    """Affiche la structure d'un livre deja parse."""
    _print_chapters(load_book(slug))


@app.command()
def serve(host: str = "127.0.0.1", port: int = 8000):
    """Lance l'API + l'UI web (sert frontend/dist si build)."""
    import uvicorn
    ensure_dirs()
    console.print(f"[green]InkFlow[/] sur http://{host}:{port}")
    uvicorn.run("inkflow.api.app:app", host=host, port=port, log_level="info")


@app.command()
def analyze(
    slug: str,
    chapter: Optional[int] = typer.Option(None, help="Index de chapitre unique (def: tous)."),
    limit: Optional[int] = typer.Option(None, help="Limiter au N premiers chapitres rendus."),
    force: bool = typer.Option(False, help="Re-analyser meme si un artefact existe."),
):
    """Analyse Gemma : segments narration/dialogue + locuteurs + casting."""
    from .analysis.gemma import Gemma
    from .analysis.segmenter import analyze_chapter
    from .settings import get_settings

    book = load_book(slug)
    gemma = Gemma()
    dedup_gemma = gemma if get_settings().dedup_use_gemma else None
    cast = artifacts.load_cast(slug)
    chars = list(cast.characters)

    targets = [c for c in book.render_chapters]
    if chapter is not None:
        targets = [c for c in book.chapters if c.index == chapter]
    elif limit:
        targets = targets[:limit]

    for ch in targets:
        if not force and artifacts.analysis_path(slug, ch.index).exists():
            console.print(f"[dim]ch{ch.index:02d} deja analyse — ignore.[/]")
            continue
        ct = load_chapter_text(slug, ch)
        console.print(f"[blue]Analyse[/] ch{ch.index:02d} — {ch.title} ({ct.word_count} mots)…")
        try:
            # La dedup est faite dans analyze_chapter : `chars` recoit le cast
            # cumule reconcilie.
            analysis, chars = analyze_chapter(
                ch, ct, gemma, book_chars=chars, dedup_gemma=dedup_gemma)
        except Exception as exc:  # noqa: BLE001 — un chapitre ne doit pas tout stopper
            console.print(f" [yellow]! echec, chapitre ignore: {exc}[/]")
            continue
        artifacts.save_analysis(slug, analysis)
        n_dlg = sum(1 for s in analysis.segments if s.type.value == "dialogue")
        console.print(f" -> {len(analysis.segments)} segments ({n_dlg} repliques), "
                      f"{len(chars)} personnages cumules.")

    cast = Cast(narrator_voice_id=cast.narrator_voice_id, characters=chars)
    artifacts.save_cast(slug, cast)
    console.print(f"[green]Casting[/] : {len(chars)} personnages -> cast.json")


@app.command()
def pronounce(
    slug: str,
    chapter: Optional[int] = typer.Option(None, help="Index de chapitre (def: 1er rendu)."),
):
    """Propose des candidats de prononciation (Gemma) -> pronunciation.json."""
    from .analysis.gemma import Gemma
    from .analysis.pronunciation import merge_pronunciations, propose_pronunciations

    book = load_book(slug)
    ch = (next((c for c in book.chapters if c.index == chapter), None)
          if chapter is not None else (book.render_chapters[0] if book.render_chapters else None))
    if ch is None or not ch.text_file:
        console.print("[red]Chapitre introuvable.[/]")
        raise typer.Exit(1)

    ct = load_chapter_text(slug, ch)
    gemma = Gemma()
    with console.status("Recherche des mots a risque…"):
        new = propose_pronunciations("\n".join(ct.paragraphs), gemma)
    pron = merge_pronunciations(artifacts.load_pronunciation(slug), new)
    artifacts.save_pronunciation(slug, pron)

    table = Table("terme", "prononciation", "note")
    for e in pron.entries:
        table.add_row(e.term, e.replacement, e.note or "")
    console.print(table)
    console.print(f"[green]{len(pron.entries)} entrees[/] -> pronunciation.json")


@app.command()
def cast(
    slug: str,
    rebuild_voicebank: bool = typer.Option(False, help="Regenere les clips de la voicebank."),
    dedup: bool = typer.Option(False, help="Deduplique d'abord les variantes de noms (heuristique)."),
    llm: bool = typer.Option(False, "--llm", help="Ajoute la passe Gemma a la dedup (moins sur)."),
):
    """Construit la voicebank (si besoin) et auto-assigne les voix au casting."""
    from .casting.assign import assign_voices
    from .casting.voicebank import build_voicebank, load_voicebank

    cast = artifacts.load_cast(slug)
    if not cast.characters:
        console.print("[yellow]Aucun personnage — lance d'abord `analyze`.[/]")
        raise typer.Exit(1)

    if dedup:
        from .casting.dedup import dedup_cast
        from .models import Cast
        gemma = None
        if llm:
            from .analysis.gemma import Gemma
            gemma = Gemma()
        before = len(cast.characters)
        with console.status("Deduplication du casting…"):
            chars = dedup_cast(cast.characters, gemma)
        cast = Cast(narrator_voice_id=cast.narrator_voice_id, characters=chars)
        artifacts.save_cast(slug, cast)
        console.print(f"[green]Dedup[/] : {before} -> {len(chars)} personnages.")

    vb = load_voicebank()
    if rebuild_voicebank or not vb.entries or not any(e.ref_audio for e in vb.entries):
        with console.status("Generation des clips de la voicebank…"):
            vb = build_voicebank(regenerate=rebuild_voicebank)
        console.print(f"[green]Voicebank[/] : {len(vb.entries)} voix, clips generes.")

    cast = assign_voices(cast.characters, vb, narrator_voice_id=cast.narrator_voice_id)
    artifacts.save_cast(slug, cast)

    table = Table("personnage", "genre", "voix")
    # Crochet echappe : sinon rich lit "[narrateur]" comme une balise de markup.
    table.add_row(r"\[narrateur]", "", cast.narrator_voice_id or "")
    for ch in cast.characters:
        table.add_row(ch.name, ch.gender or "?", ch.voice_id or "")
    console.print(table)


|
@app.command()
|
||||||
|
def render(
|
||||||
|
slug: str,
|
||||||
|
chapter: int = typer.Argument(..., help="Index du chapitre a synthetiser."),
|
||||||
|
backend: str = typer.Option("kokoro", help="Moteur TTS: kokoro | qwen3."),
|
||||||
|
mono: bool = typer.Option(True, help="Mono-narrateur (sinon multi-voix via cast)."),
|
||||||
|
max_paragraphs: Optional[int] = typer.Option(None, help="Limiter (test rapide)."),
|
||||||
|
):
|
||||||
|
"""Synthetise un chapitre en MP3 dans output/<livre>/."""
|
||||||
|
from .pipeline.render import (
|
||||||
|
build_units_mono,
|
||||||
|
build_units_multi,
|
||||||
|
render_chapter_to_mp3,
|
||||||
|
)
|
||||||
|
from .tts.base import VoiceSpec
|
||||||
|
from .tts.factory import get_backend
|
||||||
|
|
||||||
|
book = load_book(slug)
|
||||||
|
ch = next((c for c in book.chapters if c.index == chapter), None)
|
||||||
|
if ch is None or not ch.text_file:
|
||||||
|
console.print(f"[red]Chapitre {chapter} introuvable ou non rendu.[/]")
|
||||||
|
raise typer.Exit(1)
|
||||||
|
|
||||||
|
ct = load_chapter_text(slug, ch)
|
||||||
|
if max_paragraphs:
|
||||||
|
ct.paragraphs = ct.paragraphs[:max_paragraphs]
|
||||||
|
tts = get_backend(backend)
|
||||||
|
pron = artifacts.load_pronunciation(slug)
|
||||||
|
|
||||||
|
if mono:
|
||||||
|
units = build_units_mono(ct, tts.default_voice())
|
||||||
|
else:
|
||||||
|
from .casting.voicebank import load_voicebank, voice_spec_for
|
||||||
|
from .pipeline.render import make_voice_resolver
|
||||||
|
|
||||||
|
analysis = artifacts.load_analysis(slug, chapter)
|
||||||
|
cast_data = artifacts.load_cast(slug)
|
||||||
|
vb = load_voicebank()
|
||||||
|
# Voix narrateur par defaut depuis la voicebank si disponible.
|
||||||
|
narrator_entry = vb.by_id(cast_data.narrator_voice_id) if cast_data.narrator_voice_id else None
|
||||||
|
default_voice = (voice_spec_for(narrator_entry, backend)
|
||||||
|
if narrator_entry else tts.default_voice())
|
||||||
|
resolver = make_voice_resolver(cast_data, vb, backend)
|
||||||
|
units = build_units_multi(analysis, resolver, default_voice)
|
||||||
|
|
||||||
|
with console.status(f"Synthese de {len(units)} unites ({backend})…"):
|
||||||
|
def _p(done, total):
|
||||||
|
console.print(f" unite {done}/{total}", end="\r")
|
||||||
|
track = (book.render_chapters.index(ch) + 1) if ch in book.render_chapters else None
|
||||||
|
mp3 = render_chapter_to_mp3(book, ch, units, tts, pron=pron, track=track, progress=_p)
|
||||||
|
console.print(f"\n[green]MP3:[/] {mp3}")
|
||||||
|
|
||||||
|
|
||||||
|
def _print_chapters(book) -> None:
|
||||||
|
table = Table(show_header=True, header_style="bold")
|
||||||
|
for col in ("idx", "kind", "render", "pov", "mots", "sortie", "titre"):
|
||||||
|
table.add_column(col)
|
||||||
|
for c in book.chapters:
|
||||||
|
table.add_row(
|
||||||
|
str(c.index), c.kind.value, "✓" if c.render else "·",
|
||||||
|
c.pov or "", str(c.word_count), c.output_name or "",
|
||||||
|
c.title)
|
||||||
|
console.print(table)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
app()
|
||||||
96
backend/inkflow/config.py
Normal file
@@ -0,0 +1,96 @@
|
|||||||
|
"""Central configuration for InkFlow.
|
||||||
|
|
||||||
|
The constants (paths, MLX model identifiers, default parameters) are gathered
|
||||||
|
here so that most of them can easily be overridden through environment
|
||||||
|
variables.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# --- Racines du projet -------------------------------------------------------
|
||||||
|
# config.py lives in backend/inkflow/: parents[1] is backend/, and the project
|
||||||
|
# root is one level above backend/.
|
||||||
|
BACKEND_DIR = Path(__file__).resolve().parents[1]
|
||||||
|
PROJECT_ROOT = BACKEND_DIR.parent
|
||||||
|
|
||||||
|
|
||||||
|
def _env_path(var: str, default: Path) -> Path:
|
||||||
|
return Path(os.environ.get(var, default)).expanduser().resolve()
|
||||||
|
|
||||||
|
|
||||||
|
# Donnees de travail (etat par livre : json, db, wav intermediaires)
|
||||||
|
DATA_DIR = _env_path("INKFLOW_DATA_DIR", PROJECT_ROOT / "data")
|
||||||
|
# Sortie finale (1 dossier par livre, 1 mp3 par chapitre)
|
||||||
|
OUTPUT_DIR = _env_path("INKFLOW_OUTPUT_DIR", PROJECT_ROOT / "output")
|
||||||
|
# Banque de voix de reference (clips + metadata.json)
|
||||||
|
VOICEBANK_DIR = _env_path("INKFLOW_VOICEBANK_DIR", PROJECT_ROOT / "voicebank")
|
||||||
|
# Echantillons fournis
|
||||||
|
SAMPLES_DIR = PROJECT_ROOT / "samples"
|
||||||
|
|
||||||
|
# --- Modeles MLX (HuggingFace mlx-community) ---------------------------------
|
||||||
|
# Analyse de texte : Gemma via mlx-lm.
|
||||||
|
GEMMA_MODEL = os.environ.get(
|
||||||
|
"INKFLOW_GEMMA_MODEL", "mlx-community/gemma-3-4b-it-4bit"
|
||||||
|
)
|
||||||
|
|
||||||
|
# TTS : Qwen3-TTS (rendu final, clonage) et Kokoro (preview rapide).
|
||||||
|
QWEN3_TTS_MODEL = os.environ.get(
|
||||||
|
"INKFLOW_QWEN3_MODEL", "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit"
|
||||||
|
)
|
||||||
|
KOKORO_MODEL = os.environ.get(
|
||||||
|
"INKFLOW_KOKORO_MODEL", "mlx-community/Kokoro-82M-bf16"
|
||||||
|
)
|
||||||
|
|
||||||
|
# --- Parametres TTS ----------------------------------------------------------
|
||||||
|
DEFAULT_LANGUAGE = os.environ.get("INKFLOW_LANGUAGE", "French")
|
||||||
|
# Code langue Kokoro (misaki) : 'f' = francais.
|
||||||
|
KOKORO_LANG_CODE = os.environ.get("INKFLOW_KOKORO_LANG", "f")
|
||||||
|
# Voix Kokoro par defaut pour les previews / mono-narrateur rapide.
|
||||||
|
KOKORO_DEFAULT_VOICE = os.environ.get("INKFLOW_KOKORO_VOICE", "ff_siwis")
|
||||||
|
# Voix Qwen3 par defaut (narrateur) si aucun clip de reference fourni.
|
||||||
|
QWEN3_DEFAULT_VOICE = os.environ.get("INKFLOW_QWEN3_VOICE", "Chelsie")
|
||||||
|
|
||||||
|
# Frequence d'echantillonnage cible pour la concatenation (Hz). Les backends
|
||||||
|
# renvoient leur propre sr ; postprocess reechantillonne au besoin.
|
||||||
|
TARGET_SAMPLE_RATE = int(os.environ.get("INKFLOW_SAMPLE_RATE", "24000"))
|
||||||
|
|
||||||
|
# Encodage mp3 final.
|
||||||
|
MP3_BITRATE = os.environ.get("INKFLOW_MP3_BITRATE", "128k")
|
||||||
|
# Loudness normalization target in dBFS (rough stand-in for LUFS, applied as a pydub gain).
|
||||||
|
TARGET_DBFS = float(os.environ.get("INKFLOW_TARGET_DBFS", "-18.0"))
|
||||||
|
|
||||||
|
|
||||||
|
def book_data_dir(book_slug: str) -> Path:
|
||||||
|
"""Dossier de travail pour un livre (artefacts intermediaires)."""
|
||||||
|
return DATA_DIR / book_slug
|
||||||
|
|
||||||
|
|
||||||
|
def book_output_dir(book_title: str) -> Path:
|
||||||
|
"""Dossier de sortie final pour un livre (mp3 par chapitre)."""
|
||||||
|
return OUTPUT_DIR / book_title
|
||||||
|
|
||||||
|
|
||||||
|
def ensure_dirs() -> None:
|
||||||
|
for d in (DATA_DIR, OUTPUT_DIR, VOICEBANK_DIR):
|
||||||
|
d.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
def setup_espeak() -> None:
|
||||||
|
"""Localise libespeak-ng pour phonemizer (requis par Kokoro non-anglais).
|
||||||
|
|
||||||
|
phonemizer ne trouve pas toujours la lib installee via brew ; on pointe
|
||||||
|
explicitement PHONEMIZER_ESPEAK_LIBRARY si la variable n'est pas deja fixee.
|
||||||
|
"""
|
||||||
|
if os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"):
|
||||||
|
return
|
||||||
|
candidates = [
|
||||||
|
"/opt/homebrew/lib/libespeak-ng.dylib",
|
||||||
|
"/usr/local/lib/libespeak-ng.dylib",
|
||||||
|
"/opt/homebrew/lib/libespeak-ng.1.dylib",
|
||||||
|
]
|
||||||
|
for path in candidates:
|
||||||
|
if os.path.exists(path):
|
||||||
|
os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = path
|
||||||
|
return
|
||||||
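The override mechanism of `_env_path` can be illustrated in isolation. The sketch below copies its one-line body into a standalone function (named `env_path` here, an illustrative name) and uses made-up paths; it is not part of the commit. Note that `config.py` evaluates its constants at import time, so a variable such as `INKFLOW_DATA_DIR` has to be set before `inkflow.config` is first imported.

```python
import os
from pathlib import Path


def env_path(var: str, default: Path) -> Path:
    # Same logic as _env_path in config.py: the environment variable wins,
    # "~" is expanded, and the result is made absolute.
    return Path(os.environ.get(var, default)).expanduser().resolve()


# Hypothetical default, only for this demonstration.
default = Path("/tmp/inkflow-demo/data")

# Without the variable, the default is returned (resolved).
os.environ.pop("INKFLOW_DATA_DIR", None)
unset_result = env_path("INKFLOW_DATA_DIR", default)

# With the variable set, the override is used and "~" is expanded.
os.environ["INKFLOW_DATA_DIR"] = "~/audiobooks/data"
override_result = env_path("INKFLOW_DATA_DIR", default)

print(unset_result)
print(override_result)
```

Resolving to an absolute path up front means later code can join and compare paths without depending on the process's working directory.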
0
backend/inkflow/epub/__init__.py
Normal file
267
backend/inkflow/epub/parser.py
Normal file
@@ -0,0 +1,267 @@
|
|||||||
|
"""EPUB parsing -> normalized book structure.
|
||||||
|
|
||||||
|
Strategy:
|
||||||
|
- ebooklib reads the archive (manifest + spine + ncx).
|
||||||
|
- The reading order comes from the spine.
|
||||||
|
- Titles come from the table of contents (ncx/nav), mapped by href.
|
||||||
|
- The text of each document is extracted with BeautifulSoup (paragraphs).
|
||||||
|
- Each item is classified as front / chapter / back, which decides whether it is read aloud.
|
||||||
|
|
||||||
|
Outputs written to data/<slug>/:
|
||||||
|
- book.json : metadata + chapter list (Book model)
|
||||||
|
- chapters/chNN.json : normalized text per chapter (ChapterText model)
|
||||||
|
- cover.<ext> : extracted cover (if present)
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import re
|
||||||
|
import warnings
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import unquote, urldefrag
|
||||||
|
|
||||||
|
import ebooklib
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from ebooklib import epub
|
||||||
|
|
||||||
|
# Les xhtml d'epub declenchent un avertissement bs4 inoffensif ; on le tait.
|
||||||
|
try:
|
||||||
|
from bs4 import XMLParsedAsHTMLWarning
|
||||||
|
warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)
|
||||||
|
except ImportError: # pragma: no cover
|
||||||
|
pass
|
||||||
|
|
||||||
|
from ..config import book_data_dir
|
||||||
|
from ..models import Book, Chapter, ChapterKind, ChapterText
|
||||||
|
from ..util import safe_filename, slugify
|
||||||
|
|
||||||
|
# Un titre de chapitre commence par un numero, PROLOGUE ou EPILOGUE.
|
||||||
|
_CHAPTER_RE = re.compile(r"^\s*(\d+|prologue|[ée]pilogue)\b", re.IGNORECASE)
|
||||||
|
# Capture "<numero> - <POV>" ou juste "<numero>".
|
||||||
|
_TITLE_PARTS_RE = re.compile(r"^\s*([^-\n]+?)(?:\s*[-–—]\s*(.+))?\s*$")
|
||||||
|
|
||||||
|
# Seuil de mots pour qu'un element de back matter (remerciements...) soit lu.
|
||||||
|
_BACK_MATTER_MIN_WORDS = 40
|
||||||
|
|
||||||
|
|
||||||
|
def _build_toc_titles(book: epub.EpubBook) -> dict[str, str]:
|
||||||
|
"""Mappe href (sans fragment) -> titre, en aplatissant la toc ncx/nav."""
|
||||||
|
titles: dict[str, str] = {}
|
||||||
|
|
||||||
|
def walk(items) -> None:
|
||||||
|
for it in items:
|
||||||
|
if isinstance(it, tuple): # (Section, [children])
|
||||||
|
section, children = it
|
||||||
|
if isinstance(section, epub.Link):
|
||||||
|
_add(section)
|
||||||
|
walk(children)
|
||||||
|
elif isinstance(it, list):
|
||||||
|
walk(it)
|
||||||
|
elif isinstance(it, epub.Link):
|
||||||
|
_add(it)
|
||||||
|
|
||||||
|
def _add(link: epub.Link) -> None:
|
||||||
|
href = unquote(urldefrag(link.href)[0])
|
||||||
|
if href and href not in titles and link.title:
|
||||||
|
titles[href] = link.title.strip()
|
||||||
|
|
||||||
|
walk(book.toc)
|
||||||
|
return titles
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_paragraphs(html: bytes) -> list[str]:
|
||||||
|
"""Extrait les paragraphes lisibles d'un document xhtml."""
|
||||||
|
soup = BeautifulSoup(html, "lxml")
|
||||||
|
# Retire les elements non narratifs.
|
||||||
|
for tag in soup(["script", "style", "sup", "table"]):
|
||||||
|
tag.decompose()
|
||||||
|
|
||||||
|
paragraphs: list[str] = []
|
||||||
|
blocks = soup.find_all(["p", "h1", "h2", "h3", "h4", "blockquote", "li"])
|
||||||
|
if not blocks and soup.body:
|
||||||
|
blocks = [soup.body]
|
||||||
|
|
||||||
|
for block in blocks:
|
||||||
|
text = block.get_text(" ", strip=True)
|
||||||
|
text = re.sub(r"\s+", " ", text).strip()
|
||||||
|
if text:
|
||||||
|
paragraphs.append(text)
|
||||||
|
return paragraphs
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_title(title: str) -> tuple[Optional[str], Optional[str]]:
|
||||||
|
"""Decoupe un titre de chapitre en (numero, pov)."""
|
||||||
|
m = _TITLE_PARTS_RE.match(title)
|
||||||
|
if not m:
|
||||||
|
return None, None
|
||||||
|
number = (m.group(1) or "").strip() or None
|
||||||
|
pov = (m.group(2) or "").strip() or None
|
||||||
|
return number, pov
|
||||||
|
|
||||||
|
|
||||||
|
def _output_name(seq: int, kind: ChapterKind, number: Optional[str], title: str) -> str:
|
||||||
|
"""Nom de mp3 calque sur le format du sample (NN-<libelle>.mp3)."""
|
||||||
|
prefix = f"{seq:02d}"
|
||||||
|
label: str
|
||||||
|
if kind is ChapterKind.CHAPTER and number:
|
||||||
|
low = number.lower()
|
||||||
|
if low == "prologue":
|
||||||
|
label = "Prologue"
|
||||||
|
elif low in ("epilogue", "épilogue"):
|
||||||
|
label = "Épilogue"
|
||||||
|
elif number.isdigit():
|
||||||
|
label = f"Chapitre {int(number)}"
|
||||||
|
else:
|
||||||
|
label = number.capitalize()
|
||||||
|
else:
|
||||||
|
label = title
|
||||||
|
if label.isupper(): # titres tout-majuscule (ex "REMERCIEMENTS")
|
||||||
|
label = label.capitalize()
|
||||||
|
return safe_filename(f"{prefix}-{label}") + ".mp3"
|
||||||
|
|
||||||
|
|
||||||
|
def _classify(ordered: list[dict]) -> None:
|
||||||
|
"""Affecte kind/render a chaque item (mutation en place).
|
||||||
|
|
||||||
|
front = avant le premier chapitre numerote (couverture, page de titre...)
|
||||||
|
chapter = correspond au motif de titre de chapitre
|
||||||
|
back = apres le dernier chapitre (remerciements, glossaire...)
|
||||||
|
"""
|
||||||
|
chapter_idxs = [
|
||||||
|
i for i, it in enumerate(ordered)
|
||||||
|
if it["title"] and _CHAPTER_RE.match(it["title"])
|
||||||
|
]
|
||||||
|
first = chapter_idxs[0] if chapter_idxs else len(ordered)
|
||||||
|
last = chapter_idxs[-1] if chapter_idxs else -1
|
||||||
|
|
||||||
|
for i, it in enumerate(ordered):
|
||||||
|
is_chapter = bool(it["title"]) and bool(_CHAPTER_RE.match(it["title"]))
|
||||||
|
if is_chapter:
|
||||||
|
it["kind"] = ChapterKind.CHAPTER
|
||||||
|
it["render"] = it["word_count"] > 0
|
||||||
|
elif i < first:
|
||||||
|
it["kind"] = ChapterKind.FRONT
|
||||||
|
it["render"] = False
|
||||||
|
else: # i >= first and not a chapter: treated as back matter (even between chapters)
|
||||||
|
it["kind"] = ChapterKind.BACK
|
||||||
|
it["render"] = it["word_count"] >= _BACK_MATTER_MIN_WORDS
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_cover(book: epub.EpubBook, dest_dir: Path) -> Optional[str]:
|
||||||
|
cover_item = None
|
||||||
|
for item in book.get_items_of_type(ebooklib.ITEM_COVER):
|
||||||
|
cover_item = item
|
||||||
|
break
|
||||||
|
if cover_item is None: # fallback : item nomme "cover"
|
||||||
|
for item in book.get_items_of_type(ebooklib.ITEM_IMAGE):
|
||||||
|
if "cover" in item.get_name().lower():
|
||||||
|
cover_item = item
|
||||||
|
break
|
||||||
|
if cover_item is None:
|
||||||
|
return None
|
||||||
|
ext = Path(cover_item.get_name()).suffix or ".jpg"
|
||||||
|
dest = dest_dir / f"cover{ext}"
|
||||||
|
dest.write_bytes(cover_item.get_content())
|
||||||
|
return dest.name
|
||||||
|
|
||||||
|
|
||||||
|
def parse_epub(epub_path: str | Path, slug: Optional[str] = None) -> Book:
|
||||||
|
"""Parse un EPUB et ecrit book.json + chapters/chNN.json dans data/<slug>/."""
|
||||||
|
epub_path = Path(epub_path)
|
||||||
|
book_ml = epub.read_epub(str(epub_path), options={"ignore_ncx": False})
|
||||||
|
|
||||||
|
title = _meta(book_ml, "title") or epub_path.stem
|
||||||
|
author = _meta(book_ml, "creator")
|
||||||
|
description = _meta(book_ml, "description")
|
||||||
|
language = _meta(book_ml, "language") or "fr"
|
||||||
|
slug = slug or slugify(title)
|
||||||
|
|
||||||
|
data_dir = book_data_dir(slug)
|
||||||
|
chapters_dir = data_dir / "chapters"
|
||||||
|
chapters_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
toc_titles = _build_toc_titles(book_ml)
|
||||||
|
|
||||||
|
# Documents dans l'ordre du spine.
|
||||||
|
id_to_item = {it.get_id(): it for it in book_ml.get_items()}
|
||||||
|
ordered: list[dict] = []
|
||||||
|
for idref, _linear in book_ml.spine:
|
||||||
|
item = id_to_item.get(idref)
|
||||||
|
if item is None or item.get_type() != ebooklib.ITEM_DOCUMENT:
|
||||||
|
continue
|
||||||
|
href = unquote(item.get_name())
|
||||||
|
paragraphs = _extract_paragraphs(item.get_content())
|
||||||
|
title_txt = toc_titles.get(href, "")
|
||||||
|
ordered.append({
|
||||||
|
"item_id": idref,
|
||||||
|
"src": href,
|
||||||
|
"title": title_txt,
|
||||||
|
"paragraphs": paragraphs,
|
||||||
|
"word_count": sum(len(p.split()) for p in paragraphs),
|
||||||
|
})
|
||||||
|
|
||||||
|
_classify(ordered)
|
||||||
|
|
||||||
|
cover_file = _extract_cover(book_ml, data_dir)
|
||||||
|
|
||||||
|
chapters: list[Chapter] = []
|
||||||
|
seq = 0 # compteur de prefixe sur les seuls chapitres rendus
|
||||||
|
for index, it in enumerate(ordered):
|
||||||
|
number = pov = None
|
||||||
|
if it["kind"] is ChapterKind.CHAPTER:
|
||||||
|
number, pov = _parse_title(it["title"])
|
||||||
|
|
||||||
|
text_file = None
|
||||||
|
output_name = None
|
||||||
|
if it["render"]:
|
||||||
|
seq += 1
|
||||||
|
ct = ChapterText(index=index, title=it["title"] or it["src"],
|
||||||
|
paragraphs=it["paragraphs"])
|
||||||
|
text_file = f"chapters/ch{index:02d}.json"
|
||||||
|
(data_dir / text_file).write_text(
|
||||||
|
ct.model_dump_json(indent=2), encoding="utf-8")
|
||||||
|
output_name = _output_name(seq, it["kind"], number, it["title"] or "")
|
||||||
|
|
||||||
|
chapters.append(Chapter(
|
||||||
|
index=index,
|
||||||
|
item_id=it["item_id"],
|
||||||
|
src=it["src"],
|
||||||
|
title=it["title"] or it["src"],
|
||||||
|
kind=it["kind"],
|
||||||
|
render=it["render"],
|
||||||
|
number=number,
|
||||||
|
pov=pov,
|
||||||
|
word_count=it["word_count"],
|
||||||
|
text_file=text_file,
|
||||||
|
output_name=output_name,
|
||||||
|
))
|
||||||
|
|
||||||
|
book = Book(
|
||||||
|
slug=slug,
|
||||||
|
title=title,
|
||||||
|
author=author,
|
||||||
|
language=(language[:2] if language else "fr"),
|
||||||
|
description=description,
|
||||||
|
cover_file=cover_file,
|
||||||
|
chapters=chapters,
|
||||||
|
)
|
||||||
|
(data_dir / "book.json").write_text(
|
||||||
|
book.model_dump_json(indent=2), encoding="utf-8")
|
||||||
|
return book
|
||||||
|
|
||||||
|
|
||||||
|
def _meta(book: epub.EpubBook, name: str) -> Optional[str]:
|
||||||
|
values = book.get_metadata("DC", name)
|
||||||
|
if values:
|
||||||
|
return values[0][0]
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def load_book(slug: str) -> Book:
|
||||||
|
path = book_data_dir(slug) / "book.json"
|
||||||
|
return Book.model_validate_json(path.read_text(encoding="utf-8"))
|
||||||
|
|
||||||
|
|
||||||
|
def load_chapter_text(slug: str, chapter: Chapter) -> ChapterText:
|
||||||
|
path = book_data_dir(slug) / chapter.text_file
|
||||||
|
return ChapterText.model_validate_json(path.read_text(encoding="utf-8"))
|
||||||
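The two title regexes in `parser.py` drive both classification and POV extraction, so a quick standalone check of their behavior may be useful. The sketch below copies the patterns verbatim from the file; the helper names (`is_chapter`, `parse_title`) and the sample titles are illustrative assumptions, not taken from a real book.

```python
import re
from typing import Optional

# Patterns copied from parser.py.
CHAPTER_RE = re.compile(r"^\s*(\d+|prologue|[ée]pilogue)\b", re.IGNORECASE)
TITLE_PARTS_RE = re.compile(r"^\s*([^-\n]+?)(?:\s*[-–—]\s*(.+))?\s*$")


def is_chapter(title: str) -> bool:
    # Mirrors the test used in _classify: non-empty title matching the pattern.
    return bool(title) and bool(CHAPTER_RE.match(title))


def parse_title(title: str) -> tuple[Optional[str], Optional[str]]:
    # Mirrors _parse_title: split "<number> - <POV>" into its two parts.
    m = TITLE_PARTS_RE.match(title)
    if not m:
        return None, None
    number = (m.group(1) or "").strip() or None
    pov = (m.group(2) or "").strip() or None
    return number, pov


# Illustrative titles (assumed, in the "1 - ELVI" style mentioned in models.py).
for title in ["PROLOGUE", "1 - ELVI", "12 – HOLDEN", "Épilogue", "REMERCIEMENTS"]:
    print(repr(title), is_chapter(title), parse_title(title))
```

Because the first group is lazy and excludes `-`, a title with a hyphen or dash separator is split at the first one; a title without a separator ends up entirely in the number slot.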
176
backend/inkflow/models.py
Normal file
@@ -0,0 +1,176 @@
|
|||||||
|
"""Data schemas shared across the whole pipeline (pydantic v2).
|
||||||
|
|
||||||
|
These models are serialized to JSON on disk (book.json, analysis/chNN.json,
|
||||||
|
cast.json, pronunciation.json) and form the contract between pipeline
|
||||||
|
stages. Each stage reads the previous stage's artifact and writes its own.
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from enum import Enum
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from pydantic import BaseModel, Field
|
||||||
|
|
||||||
|
|
||||||
|
class ChapterKind(str, Enum):
|
||||||
|
FRONT = "front" # couverture, page de titre, mentions editeur (non lu)
|
||||||
|
CHAPTER = "chapter" # prologue, chapitres numerotes, epilogue (lu)
|
||||||
|
BACK = "back" # remerciements, glossaire... (lu si texte significatif)
|
||||||
|
|
||||||
|
|
||||||
|
class Chapter(BaseModel):
|
||||||
|
index: int # ordre dans le spine (0-based)
|
||||||
|
item_id: str # idref du manifest opf
|
||||||
|
src: str # chemin interne xhtml
|
||||||
|
title: str # titre toc brut, ex "1 - ELVI"
|
||||||
|
kind: ChapterKind
|
||||||
|
render: bool # doit-on synthetiser l'audio ?
|
||||||
|
number: Optional[str] = None # "1", "PROLOGUE", "EPILOGUE"...
|
||||||
|
pov: Optional[str] = None # personnage point de vue, ex "ELVI"
|
||||||
|
word_count: int = 0
|
||||||
|
text_file: Optional[str] = None # chemin relatif du json de texte (chapters/chNN.json)
|
||||||
|
output_name: Optional[str] = None # nom du mp3 final, ex "02-Chapitre 1.mp3"
|
||||||
|
|
||||||
|
|
||||||
|
class Book(BaseModel):
|
||||||
|
slug: str # identifiant interne (dossier data)
|
||||||
|
title: str
|
||||||
|
author: Optional[str] = None
|
||||||
|
language: str = "fr"
|
||||||
|
description: Optional[str] = None
|
||||||
|
cover_file: Optional[str] = None # chemin du cover extrait dans data/<slug>/
|
||||||
|
chapters: list[Chapter] = Field(default_factory=list)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def render_chapters(self) -> list[Chapter]:
|
||||||
|
return [c for c in self.chapters if c.render]
|
||||||
|
|
||||||
|
|
||||||
|
class ChapterText(BaseModel):
|
||||||
|
"""Texte brut normalise d'un chapitre (sortie du parser)."""
|
||||||
|
index: int
|
||||||
|
title: str
|
||||||
|
paragraphs: list[str] = Field(default_factory=list)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def word_count(self) -> int:
|
||||||
|
return sum(len(p.split()) for p in self.paragraphs)
|
||||||
|
|
||||||
|
|
||||||
|
# --- Analyse (etape Gemma) ---------------------------------------------------
|
||||||
|
|
||||||
|
class SegmentType(str, Enum):
|
||||||
|
NARRATION = "narration"
|
||||||
|
DIALOGUE = "dialogue"
|
||||||
|
|
||||||
|
|
||||||
|
class Incise(BaseModel):
|
||||||
|
"""Borne d'une incise de narration inseree dans une replique de dialogue.
|
||||||
|
|
||||||
|
Offsets (caracteres) dans `Segment.text` : la sous-chaine `text[start:end]`
|
||||||
|
est de la narration (ex: "dit-il", "lanca Drummer") a porter par la voix du
|
||||||
|
narrateur au rendu, sans fragmenter la replique persistee.
|
||||||
|
"""
|
||||||
|
start: int # offset inclus
|
||||||
|
end: int # offset exclu
|
||||||
|
|
||||||
|
|
||||||
|
class Segment(BaseModel):
|
||||||
|
"""Unite de synthese : un bout de texte attribue a un locuteur."""
|
||||||
|
type: SegmentType
|
||||||
|
text: str
|
||||||
|
speaker: str = "narrateur" # "narrateur" ou nom de personnage
|
||||||
|
glued_to_prev: bool = False # sous-segment issu du meme paragraphe (incise)
|
||||||
|
# -> gap audio reduit avec le segment precedent
|
||||||
|
incises: list[Incise] = Field(default_factory=list) # spans narrateur DANS text
|
||||||
|
|
||||||
|
|
||||||
|
class ChapterAnalysis(BaseModel):
|
||||||
|
index: int
|
||||||
|
title: str
|
||||||
|
segments: list[Segment] = Field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
class Character(BaseModel):
|
||||||
|
name: str # nom canonique
|
||||||
|
aliases: list[str] = Field(default_factory=list)
|
||||||
|
gender: Optional[str] = None # "male" | "female" | "unknown"
|
||||||
|
age: Optional[str] = None # "child" | "young" | "adult" | "old"
|
||||||
|
description: Optional[str] = None
|
||||||
|
voice_id: Optional[str] = None # id dans la voicebank (assigne au casting)
|
||||||
|
|
||||||
|
|
||||||
|
class Cast(BaseModel):
|
||||||
|
narrator_voice_id: Optional[str] = None
|
||||||
|
characters: list[Character] = Field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
class VoiceEntry(BaseModel):
|
||||||
|
"""Une voix de la banque, agnostique du moteur.
|
||||||
|
|
||||||
|
`kokoro_voice` est l'identite (rendu Kokoro direct + clip de reference) ;
|
||||||
|
`ref_audio`/`ref_text` servent au clonage Qwen3 (rendu final).
|
||||||
|
"""
|
||||||
|
id: str # ex "fr_f_siwis"
|
||||||
|
kokoro_voice: str # ex "ff_siwis"
|
||||||
|
gender: str = "unknown" # male | female | unknown
|
||||||
|
age: str = "adult" # child | young | adult | old
|
||||||
|
lang: str = "fr"
|
||||||
|
label: Optional[str] = None # libelle lisible
|
||||||
|
ref_audio: Optional[str] = None # chemin du clip (relatif a voicebank/)
|
||||||
|
ref_text: Optional[str] = None # transcription du clip
|
||||||
|
|
||||||
|
|
||||||
|
class Voicebank(BaseModel):
|
||||||
|
entries: list[VoiceEntry] = Field(default_factory=list)
|
||||||
|
|
||||||
|
def by_id(self, voice_id: str) -> Optional[VoiceEntry]:
|
||||||
|
return next((e for e in self.entries if e.id == voice_id), None)
|
||||||
|
|
||||||
|
def by_gender(self, gender: str) -> list[VoiceEntry]:
|
||||||
|
return [e for e in self.entries if e.gender == gender]
|
||||||
|
|
||||||
|
|
||||||
|
class PronunciationEntry(BaseModel):
|
||||||
|
term: str # graphie d'origine, ex "Tiamat"
|
||||||
|
replacement: str # graphie phonetique guidee, ex "Tia-mat"
|
||||||
|
note: Optional[str] = None
|
||||||
|
enabled: bool = True
|
||||||
|
|
||||||
|
|
||||||
|
class Pronunciation(BaseModel):
|
||||||
|
entries: list[PronunciationEntry] = Field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
# --- Etat du projet (orchestration / UI) ------------------------------------
|
||||||
|
|
||||||
|
class StageStatus(str, Enum):
|
||||||
|
PENDING = "pending"
|
||||||
|
RUNNING = "running"
|
||||||
|
DONE = "done"
|
||||||
|
ERROR = "error"
|
||||||
|
|
||||||
|
|
||||||
|
class ChapterRenderState(BaseModel):
|
||||||
|
index: int
|
||||||
|
status: StageStatus = StageStatus.PENDING
|
||||||
|
progress: float = 0.0 # 0..1
|
||||||
|
mp3: Optional[str] = None # nom du fichier de sortie
|
||||||
|
backend: Optional[str] = None
|
||||||
|
error: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class ProjectState(BaseModel):
|
||||||
|
"""Etat persistant d'un livre, pilote par l'orchestrateur et lu par l'UI."""
|
||||||
|
slug: str
|
||||||
|
title: str
|
||||||
|
stages: dict[str, StageStatus] = Field(default_factory=dict) # parse/analyze/cast/pronounce
|
||||||
|
analyzed_chapters: list[int] = Field(default_factory=list)
|
||||||
|
render: dict[int, ChapterRenderState] = Field(default_factory=dict)
|
||||||
|
# Job courant (pour l'affichage temps reel).
|
||||||
|
active_stage: Optional[str] = None
|
||||||
|
active_detail: Optional[str] = None
|
||||||
|
active_progress: float = 0.0
|
||||||
|
|
||||||
|
def stage(self, name: str) -> StageStatus:
|
||||||
|
return self.stages.get(name, StageStatus.PENDING)
|
||||||
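The `Incise` offsets are easiest to understand with a worked example. The sketch below uses plain dataclasses as stand-ins for the pydantic models, and `split_by_incises` is a hypothetical helper written for this illustration (the actual rendering code is not shown in this part of the commit). It assumes the spans do not overlap and lie within `text`.

```python
from dataclasses import dataclass, field


@dataclass
class Incise:
    start: int  # inclusive offset into Segment.text
    end: int    # exclusive offset


@dataclass
class Segment:
    text: str
    speaker: str = "narrateur"
    incises: list[Incise] = field(default_factory=list)


def split_by_incises(seg: Segment, narrator: str = "narrateur") -> list[tuple[str, str]]:
    """Cut a dialogue line into (voice, text) pieces; incise spans go to the narrator."""
    pieces: list[tuple[str, str]] = []
    cursor = 0
    for inc in sorted(seg.incises, key=lambda i: i.start):
        if inc.start > cursor:
            pieces.append((seg.speaker, seg.text[cursor:inc.start]))
        pieces.append((narrator, seg.text[inc.start:inc.end]))
        cursor = inc.end
    if cursor < len(seg.text):
        pieces.append((seg.speaker, seg.text[cursor:]))
    return pieces


# Made-up line of dialogue with one speech tag ("dit-il").
text = "Viens, dit-il, on part."
start = text.index("dit-il")
seg = Segment(text=text, speaker="Holden", incises=[Incise(start, start + len("dit-il"))])
pieces = split_by_incises(seg)
print(pieces)
```

Storing offsets rather than splitting the segment keeps the persisted line intact, while still letting the renderer hand the speech tag to the narrator voice.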
0
backend/inkflow/pipeline/__init__.py
Normal file
364
backend/inkflow/pipeline/orchestrator.py
Normal file
@@ -0,0 +1,364 @@
|
|||||||
|
"""Orchestrator: runs the pipeline stages in the background, tracks the state
|
||||||
|
and broadcasts the full state to the UI on every change.
|
||||||
|
|
||||||
|
- A single worker thread runs the jobs serially (one Mac = one MLX workload at
|
||||||
|
a time). Jobs are queued and control returns to the API immediately.
|
||||||
|
- The state (ProjectState) is persisted in data/<slug>/state.json -> resumable.
|
||||||
|
- Broadcasting goes through a `broadcaster` injected by the API layer (to stay
|
||||||
|
independent of FastAPI). It receives (slug, state_dict).
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import queue
|
||||||
|
import threading
|
||||||
|
import traceback
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Callable, Optional
|
||||||
|
|
||||||
|
from ..config import book_data_dir, book_output_dir
|
||||||
|
from ..epub.parser import load_book, load_chapter_text
|
||||||
|
from ..models import ChapterRenderState, ProjectState, StageStatus
|
||||||
|
from ..store import artifacts
|
||||||
|
|
||||||
|
Broadcaster = Callable[[str, dict], None]
|
||||||
|
|
||||||
|
|
||||||
|
def state_path(slug: str) -> Path:
|
||||||
|
return book_data_dir(slug) / "state.json"
|
||||||
|
|
||||||
|
|
||||||
|
def load_state(slug: str) -> ProjectState:
|
||||||
|
path = state_path(slug)
|
||||||
|
if path.exists():
|
||||||
|
state = ProjectState.model_validate_json(path.read_text(encoding="utf-8"))
|
||||||
|
else:
|
||||||
|
book = load_book(slug)
|
||||||
|
state = ProjectState(slug=slug, title=book.title,
|
||||||
|
stages={"parse": StageStatus.DONE})
|
||||||
|
return _reconcile(slug, state)
|
||||||
|
|
||||||
|
|
||||||
|
def _reconcile(slug: str, state: ProjectState) -> ProjectState:
|
||||||
|
"""Aligne l'etat sur les artefacts presents sur disque (reprise robuste).
|
||||||
|
|
||||||
|
Permet a l'UI de refleter ce qui a deja ete fait, meme via la CLI ou apres
|
||||||
|
un redemarrage, sans rejouer les etapes.
|
||||||
|
"""
|
||||||
|
book = load_book(slug)
|
||||||
|
state.stages.setdefault("parse", StageStatus.DONE)
|
||||||
|
|
||||||
|
# Analyse : chapitres possedant un artefact d'analyse.
|
||||||
|
analyzed = [c.index for c in book.render_chapters
|
||||||
|
if artifacts.analysis_path(slug, c.index).exists()]
|
||||||
|
if analyzed:
|
||||||
|
for idx in analyzed:
|
||||||
|
if idx not in state.analyzed_chapters:
|
||||||
|
state.analyzed_chapters.append(idx)
|
||||||
|
if state.stage("analyze") == StageStatus.PENDING:
|
||||||
|
state.stages["analyze"] = (
|
||||||
|
StageStatus.DONE if len(analyzed) == len(book.render_chapters)
|
||||||
|
else StageStatus.RUNNING)
|
||||||
|
|
||||||
|
# Casting : au moins une voix attribuee.
|
||||||
|
cast = artifacts.load_cast(slug)
|
||||||
|
if cast.narrator_voice_id or any(c.voice_id for c in cast.characters):
|
||||||
|
state.stages.setdefault("cast", StageStatus.DONE)
|
||||||
|
|
||||||
|
# Prononciation : au moins une entree.
|
||||||
|
if artifacts.load_pronunciation(slug).entries:
|
||||||
|
state.stages.setdefault("pronounce", StageStatus.DONE)
|
||||||
|
|
||||||
|
# Rendu : mp3 presents en sortie.
|
||||||
|
out_dir = book_output_dir(book.title)
|
||||||
|
for ch in book.render_chapters:
|
||||||
|
existing = state.render.get(ch.index)
|
||||||
|
if existing and existing.mp3:
|
||||||
|
continue
|
||||||
|
if ch.output_name and (out_dir / ch.output_name).exists():
|
||||||
|
state.render[ch.index] = ChapterRenderState(
|
||||||
|
index=ch.index, status=StageStatus.DONE, progress=1.0,
|
||||||
|
mp3=ch.output_name)
|
||||||
|
return state
|
||||||
|
|
||||||
|
|
||||||
|
class Orchestrator:
    def __init__(self) -> None:
        self._q: "queue.Queue[tuple[str, Callable[[], None]]]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        self._broadcaster: Optional[Broadcaster] = None
        self._lock = threading.Lock()
        self.busy_slug: Optional[str] = None

    # --- infra ---------------------------------------------------------------
    def set_broadcaster(self, fn: Broadcaster) -> None:
        self._broadcaster = fn

    def _ensure_worker(self) -> None:
        # Guarded by the lock so that two concurrent enqueue() calls cannot
        # start two workers.
        with self._lock:
            if self._worker is None or not self._worker.is_alive():
                self._worker = threading.Thread(target=self._loop, daemon=True)
                self._worker.start()

    def _loop(self) -> None:
        while True:
            slug, job = self._q.get()
            self.busy_slug = slug
            try:
                job()
            except Exception:  # noqa: BLE001
                traceback.print_exc()
            finally:
                self.busy_slug = None
                self._q.task_done()

    def _save_and_emit(self, state: ProjectState) -> None:
        path = state_path(state.slug)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(state.model_dump_json(indent=2), encoding="utf-8")
        if self._broadcaster:
            self._broadcaster(state.slug, state.model_dump(mode="json"))

    def enqueue(self, slug: str, job: Callable[[], None]) -> None:
        self._ensure_worker()
        self._q.put((slug, job))

    # --- stages --------------------------------------------------------------
    def run_analyze(self, slug: str, chapter_indexes: Optional[list[int]] = None) -> None:
        def job() -> None:
            from ..analysis.gemma import Gemma
            from ..analysis.segmenter import analyze_chapter
            from ..models import Cast
            from ..settings import get_settings

            state = load_state(slug)
            book = load_book(slug)
            targets = [c for c in book.render_chapters
                       if chapter_indexes is None or c.index in chapter_indexes]
            state.stages["analyze"] = StageStatus.RUNNING
            state.active_stage = "analyze"
            self._save_and_emit(state)

            gemma = Gemma()
            dedup_gemma = gemma if get_settings().dedup_use_gemma else None
            cast = artifacts.load_cast(slug)
            chars = list(cast.characters)
            total = len(targets)
            for i, ch in enumerate(targets):
                state.active_detail = f"Analyse {ch.title}"
                state.active_progress = i / max(total, 1)
                self._save_and_emit(state)
                ct = load_chapter_text(slug, ch)
                try:
                    # Deduplication happens inside analyze_chapter: `chars`
                    # receives the reconciled cumulative cast.
                    analysis, chars = analyze_chapter(
                        ch, ct, gemma, book_chars=chars, dedup_gemma=dedup_gemma)
                except Exception:  # noqa: BLE001 - chapter skipped, carry on
                    traceback.print_exc()
                    continue
                artifacts.save_analysis(slug, analysis)
                if ch.index not in state.analyzed_chapters:
                    state.analyzed_chapters.append(ch.index)
                self._save_and_emit(state)

            artifacts.save_cast(slug, Cast(
                narrator_voice_id=cast.narrator_voice_id, characters=chars))
            state.stages["analyze"] = StageStatus.DONE
            self._finish(state)
        self.enqueue(slug, job)

    def run_cast(self, slug: str) -> None:
        def job() -> None:
            from ..casting.assign import assign_voices
            from ..casting.voicebank import build_voicebank, load_voicebank

            state = load_state(slug)
            state.stages["cast"] = StageStatus.RUNNING
            state.active_stage = "cast"
            state.active_detail = "Preparation de la voicebank"
            self._save_and_emit(state)

            vb = load_voicebank()
            if not vb.entries or not any(e.ref_audio for e in vb.entries):
                vb = build_voicebank()
            cast = artifacts.load_cast(slug)
            cast = assign_voices(cast.characters, vb,
                                 narrator_voice_id=cast.narrator_voice_id)
            artifacts.save_cast(slug, cast)
            state.stages["cast"] = StageStatus.DONE
            self._finish(state)
        self.enqueue(slug, job)

    def run_cast_analyze(self, slug: str, chapter_indexes: Optional[list[int]] = None) -> None:
        """(Re)extracts the characters of one or more chapters and reconciles them.

        Lighter than `run_analyze`: it does not re-segment (existing analysis
        artifacts are left intact). Supports chapter-level casting while
        keeping the book-wide cast consistent (deduplication).
        """
        def job() -> None:
            from ..analysis.gemma import Gemma
            from ..analysis.segmenter import extract_characters
            from ..casting.dedup import reconcile_characters
            from ..models import Cast
            from ..settings import get_settings

            state = load_state(slug)
            book = load_book(slug)
            targets = [c for c in book.render_chapters
                       if chapter_indexes is None or c.index in chapter_indexes]
            state.active_stage = "cast"
            self._save_and_emit(state)

            gemma = Gemma()
            dedup_gemma = gemma if get_settings().dedup_use_gemma else None
            cast = artifacts.load_cast(slug)
            chars = list(cast.characters)
            total = len(targets)
            for i, ch in enumerate(targets):
                state.active_detail = f"Casting — {ch.title}"
                state.active_progress = i / max(total, 1)
                self._save_and_emit(state)
                ct = load_chapter_text(slug, ch)
                try:
                    found = extract_characters("\n".join(ct.paragraphs), gemma)
                    speakers: list[str] = []
                    if artifacts.analysis_path(slug, ch.index).exists():
                        analysis = artifacts.load_analysis(slug, ch.index)
                        speakers = [s.speaker for s in analysis.segments]
                    chars, _ = reconcile_characters(
                        chars, found, dedup_gemma, speaker_names=speakers)
                except Exception:  # noqa: BLE001 - chapter skipped, carry on
                    traceback.print_exc()
                    continue
                artifacts.save_cast(slug, Cast(
                    narrator_voice_id=cast.narrator_voice_id, characters=chars))
                self._save_and_emit(state)
            self._finish(state)
        self.enqueue(slug, job)

    def run_dedup_cast(self, slug: str) -> None:
        """Merges duplicates in an already-built cast (Holden/James Holden...)."""
        def job() -> None:
            from ..analysis.gemma import Gemma
            from ..casting.dedup import dedup_cast
            from ..models import Cast
            from ..settings import get_settings

            state = load_state(slug)
            state.active_stage = "cast"
            state.active_detail = "Deduplication du casting"
            self._save_and_emit(state)

            cast = artifacts.load_cast(slug)
            gemma = Gemma() if get_settings().dedup_use_gemma else None
            chars = dedup_cast(cast.characters, gemma)
            artifacts.save_cast(slug, Cast(
                narrator_voice_id=cast.narrator_voice_id, characters=chars))
            self._finish(state)
        self.enqueue(slug, job)

    def run_pronounce(self, slug: str) -> None:
        def job() -> None:
            from ..analysis.gemma import Gemma
            from ..analysis.pronunciation import (
                merge_pronunciations,
                propose_pronunciations,
            )

            state = load_state(slug)
            book = load_book(slug)
            state.stages["pronounce"] = StageStatus.RUNNING
            state.active_stage = "pronounce"
            self._save_and_emit(state)

            gemma = Gemma()
            pron = artifacts.load_pronunciation(slug)
            targets = book.render_chapters[:3]  # sample of chapters
            for i, ch in enumerate(targets):
                state.active_detail = f"Mots a risque — {ch.title}"
                state.active_progress = i / max(len(targets), 1)
                self._save_and_emit(state)
                ct = load_chapter_text(slug, ch)
                pron = merge_pronunciations(
                    pron, propose_pronunciations("\n".join(ct.paragraphs), gemma))
            artifacts.save_pronunciation(slug, pron)
            state.stages["pronounce"] = StageStatus.DONE
            self._finish(state)
        self.enqueue(slug, job)

    def run_render(self, slug: str, chapter_indexes: list[int],
                   backend: Optional[str] = None, mono: bool = False) -> None:
        from ..settings import get_settings
        backend = backend or get_settings().default_backend

        def job() -> None:
            from ..casting.voicebank import load_voicebank, voice_spec_for
            from ..pipeline.render import (
                build_units_mono,
                build_units_multi,
                make_voice_resolver,
                render_chapter_to_mp3,
            )
            from ..tts.factory import get_backend

            state = load_state(slug)
            book = load_book(slug)
            state.stages["render"] = StageStatus.RUNNING
            state.active_stage = "render"
            self._save_and_emit(state)

            tts = get_backend(backend)
            pron = artifacts.load_pronunciation(slug)
            cast = artifacts.load_cast(slug)
            vb = load_voicebank()
            render_list = [c for c in book.render_chapters if c.index in chapter_indexes]

            for ch in render_list:
                rs = state.render.get(ch.index) or ChapterRenderState(index=ch.index)
                rs.status = StageStatus.RUNNING
                rs.progress = 0.0
                rs.backend = backend
                state.render[ch.index] = rs
                state.active_detail = f"Synthese — {ch.title}"
                self._save_and_emit(state)
                try:
                    ct = load_chapter_text(slug, ch)
                    if mono or ch.index not in state.analyzed_chapters:
                        units = build_units_mono(ct, tts.default_voice())
                    else:
                        analysis = artifacts.load_analysis(slug, ch.index)
                        narr = vb.by_id(cast.narrator_voice_id) if cast.narrator_voice_id else None
                        default_voice = (voice_spec_for(narr, backend)
                                         if narr else tts.default_voice())
                        resolver = make_voice_resolver(cast, vb, backend)
                        units = build_units_multi(analysis, resolver, default_voice)

                    def _p(done: int, total: int, _rs=rs, _state=state) -> None:
                        _rs.progress = done / max(total, 1)
                        _state.active_progress = _rs.progress
                        self._save_and_emit(_state)

                    track = book.render_chapters.index(ch) + 1
                    mp3 = render_chapter_to_mp3(book, ch, units, tts, pron=pron,
                                                track=track, progress=_p)
                    rs.status = StageStatus.DONE
                    rs.progress = 1.0
                    rs.mp3 = mp3.name
                except Exception as exc:  # noqa: BLE001
                    rs.status = StageStatus.ERROR
                    rs.error = str(exc)
                self._save_and_emit(state)

            state.stages["render"] = StageStatus.DONE
            self._finish(state)
        self.enqueue(slug, job)

    def _finish(self, state: ProjectState) -> None:
        state.active_stage = None
        state.active_detail = None
        state.active_progress = 0.0
        self._save_and_emit(state)


# Singleton shared by the API.
orchestrator = Orchestrator()
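The queueing pattern used by `Orchestrator` (one daemon worker draining a FIFO of `(slug, job)` pairs, so that heavy jobs never run concurrently) can be sketched in isolation. This is a minimal sketch using only the standard library, not the project's code: `MiniOrchestrator` and `wait_idle` are hypothetical names introduced for illustration.

```python
import queue
import threading
import traceback
from typing import Callable, Optional


class MiniOrchestrator:
    """Hypothetical, stripped-down version of the worker/queue pattern."""

    def __init__(self) -> None:
        self._q: "queue.Queue[tuple[str, Callable[[], None]]]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        self._lock = threading.Lock()
        self.busy_slug: Optional[str] = None

    def enqueue(self, slug: str, job: Callable[[], None]) -> None:
        with self._lock:  # start the worker lazily, at most once
            if self._worker is None or not self._worker.is_alive():
                self._worker = threading.Thread(target=self._loop, daemon=True)
                self._worker.start()
        self._q.put((slug, job))

    def _loop(self) -> None:
        while True:
            slug, job = self._q.get()
            self.busy_slug = slug
            try:
                job()
            except Exception:
                traceback.print_exc()  # a failing job must not kill the worker
            finally:
                self.busy_slug = None
                self._q.task_done()

    def wait_idle(self) -> None:
        """Block until every queued job has finished (illustration only)."""
        self._q.join()


done: list[str] = []
orch = MiniOrchestrator()
orch.enqueue("book-a", lambda: done.append("analyze"))
orch.enqueue("book-a", lambda: 1 / 0)  # error is logged, the queue survives
orch.enqueue("book-b", lambda: done.append("render"))
orch.wait_idle()
print(done)
```

Because a single thread consumes the queue, jobs run strictly in submission order, and the `try/finally` keeps `task_done()` balanced even when a job raises.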
158
backend/inkflow/pipeline/render.py
Normal file
@@ -0,0 +1,158 @@
"""Audio rendering of a chapter: (segments + voices) -> WAV -> MP3.

A `RenderUnit` is a piece of text plus the voice to use. We build the list of
units (single-narrator or multi-voice depending on the casting), synthesize
each one, concatenate them with silences, normalize, then encode to MP3.
"""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

from ..analysis.pronunciation import apply_pronunciation
from ..audio.postprocess import concat_segments, encode_mp3, normalize_loudness, write_wav
from ..config import book_data_dir, book_output_dir
from ..models import (
    Book,
    Chapter,
    ChapterAnalysis,
    ChapterText,
    Pronunciation,
    SegmentType,
)
from ..tts.base import TTSBackend, VoiceSpec

# Resolves a speaker name to a concrete voice (None when no voice is found).
VoiceResolver = Callable[[str], Optional[VoiceSpec]]


@dataclass
class RenderUnit:
    text: str
    voice: VoiceSpec
    speaker: str = "narrateur"
    glued_to_prev: bool = False  # incise -> shorter gap with the previous unit


def build_units_mono(ct: ChapterText, narrator: VoiceSpec) -> list[RenderUnit]:
    """Single narrator: every paragraph is read by the narrator's voice."""
    return [RenderUnit(text=p, voice=narrator) for p in ct.paragraphs if p.strip()]


def make_voice_resolver(cast, voicebank, engine: str) -> VoiceResolver:
    """Builds a speaker -> VoiceSpec resolver from the cast and the voicebank.

    Falls back to the narrator's voice when the speaker has no assigned voice.
    """
    from ..casting.assign import resolve_speaker_voice
    from ..casting.voicebank import voice_spec_for

    def resolve(speaker: str) -> Optional[VoiceSpec]:
        vid = resolve_speaker_voice(speaker, cast, voicebank)
        if vid is None:
            vid = cast.narrator_voice_id
        entry = voicebank.by_id(vid) if vid else None
        if entry is None:
            return None  # no usable entry: the caller substitutes a default voice
        return voice_spec_for(entry, engine)

    return resolve


def build_units_multi(
    analysis: ChapterAnalysis,
    resolve: VoiceResolver,
    default_voice: VoiceSpec,
) -> list[RenderUnit]:
    """Multi-voice: narration -> narrator, dialogue -> the character's voice.

    Incises (dialogue tags such as "dit Marie") annotated on a line, as bounds
    within its text, are split off here at the last moment: the incise
    substring is carried by the narrator's voice (`glued_to_prev` to shorten
    the silence), the rest by the character's voice. Lines without incises are
    rendered whole.
    """
    from ..analysis.segmenter import iter_incise_pieces

    narrator = resolve("narrateur") or default_voice
    units: list[RenderUnit] = []
    for seg in analysis.segments:
        if not seg.text.strip():
            continue
        if seg.type is SegmentType.NARRATION:
            units.append(RenderUnit(text=seg.text, voice=narrator,
                                    speaker="narrateur",
                                    glued_to_prev=seg.glued_to_prev))
            continue

        char_voice = resolve(seg.speaker) or default_voice
        if not seg.incises:
            units.append(RenderUnit(text=seg.text, voice=char_voice,
                                    speaker=seg.speaker,
                                    glued_to_prev=seg.glued_to_prev))
            continue

        for k, (is_incise, piece) in enumerate(
                iter_incise_pieces(seg.text, seg.incises)):
            glued = seg.glued_to_prev if k == 0 else True
            if is_incise:
                units.append(RenderUnit(text=piece, voice=narrator,
                                        speaker="narrateur", glued_to_prev=glued))
            else:
                units.append(RenderUnit(text=piece, voice=char_voice,
                                        speaker=seg.speaker, glued_to_prev=glued))
    return units


def render_units(
    units: list[RenderUnit],
    backend: TTSBackend,
    *,
    pron: Optional[Pronunciation] = None,
    progress: Optional[Callable[[int, int], None]] = None,
) -> tuple[list, int]:
    """Synthesizes every unit and returns (list of (audio, sr) pairs, n_units)."""
    parts = []
    total = len(units)
    for i, unit in enumerate(units):
        text = apply_pronunciation(unit.text, pron) if pron else unit.text
        audio, sr = backend.synthesize(text, unit.voice)
        parts.append((audio, sr))
        if progress:
            progress(i + 1, total)
    return parts, total


def render_chapter_to_mp3(
    book: Book,
    chapter: Chapter,
    units: list[RenderUnit],
    backend: TTSBackend,
    *,
    pron: Optional[Pronunciation] = None,
    track: Optional[int] = None,
    progress: Optional[Callable[[int, int], None]] = None,
) -> Path:
    """Full pipeline for one chapter -> output/<book>/NN-...mp3."""
    parts, _ = render_units(units, backend, pron=pron, progress=progress)
    # parts is aligned 1:1 with units, so the incise markers are passed through.
    audio, sr = concat_segments(parts, glued=[u.glued_to_prev for u in units])
    audio = normalize_loudness(audio)

    # Intermediate WAV goes in data/, final MP3 in output/.
    wav_path = book_data_dir(book.slug) / "audio" / f"ch{chapter.index:02d}.wav"
    write_wav(wav_path, audio, sr)

    out_dir = book_output_dir(book.title)
    mp3_path = out_dir / (chapter.output_name or f"ch{chapter.index:02d}.mp3")
    cover = None
    if book.cover_file:
        candidate = book_data_dir(book.slug) / book.cover_file
        cover = candidate if candidate.exists() else None

    encode_mp3(
        wav_path, mp3_path,
        title=chapter.title, album=book.title, artist=book.author,
        track=track, cover_path=cover,
    )
    return mp3_path
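How `build_units_multi` detaches incises from a line of dialogue can be illustrated with a small standalone sketch. It assumes that incises are stored as sorted, non-overlapping `(start, end)` character offsets; the real `iter_incise_pieces` lives in `analysis.segmenter` and is not shown here, so `split_incises` below is a hypothetical stand-in and the offset representation is an assumption.

```python
from typing import Iterator


def split_incises(text: str, spans: list[tuple[int, int]]) -> Iterator[tuple[bool, str]]:
    """Yield (is_incise, piece) pairs covering `text` in reading order.

    `spans` are assumed to be sorted, non-overlapping (start, end) offsets.
    """
    pos = 0
    for start, end in spans:
        before = text[pos:start].strip()
        if before:
            yield False, before  # spoken by the character
        inside = text[start:end].strip()
        if inside:
            yield True, inside  # spoken by the narrator
        pos = end
    tail = text[pos:].strip()
    if tail:
        yield False, tail


line = "Je reviens, dit Marie, attends-moi ici."
start = line.index("dit Marie")
spans = [(start, start + len("dit Marie,"))]

units = []
for k, (is_incise, piece) in enumerate(split_incises(line, spans)):
    speaker = "narrateur" if is_incise else "Marie"
    glued = k > 0  # every piece after the first sticks to the previous one
    units.append((speaker, piece, glued))

for unit in units:
    print(unit)
```

The first piece keeps the segment's own gap (here simplified to `False`), while later pieces are glued so that the switch between the character's voice and the narrator's voice does not introduce a full pause.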
170
backend/inkflow/settings.py
Normal file
@@ -0,0 +1,170 @@
"""Technical settings editable at runtime (global to the app).

Unlike `config.py` (fixed constants read at import, overridable only through
environment variables at startup), this module exposes a `Settings` object
*persisted* in `data/settings.json` and editable from the UI.

Defaults mirror those of `config.py`. Pipeline code calls `get_settings()` at
execution time; saving invalidates the model caches (TTS backends, Gemma
loading) so that new identifiers/parameters take effect without a restart.
"""
from __future__ import annotations

import threading
from typing import Optional

from pydantic import BaseModel, Field

from . import config

# --- Default system prompts (canonical source) -------------------------------
# These strings drive the Gemma tasks. The user can edit them.
DEFAULT_PROMPT_SPEAKERS = (
    "Tu es un assistant d'analyse litteraire. Tu identifies QUI prononce chaque "
    "replique de dialogue dans un extrait de roman en francais. Une liste des "
    "personnages du chapitre t'est fournie : choisis le locuteur dans cette "
    "liste en recopiant son nom EXACTEMENT. Appuie-toi sur la narration qui "
    "PRECEDE et qui SUIT chaque replique (incise d'attribution type 'dit "
    "Marie'), sur les vocatifs (le personnage a qui l'on s'adresse) et sur "
    "l'alternance des tours de parole. Mets 'inconnu' si tu n'es pas sur. Tu "
    "reponds UNIQUEMENT en JSON valide, sans texte autour."
)
DEFAULT_PROMPT_SPEAKERS_REFINE = (
    "Tu es un assistant d'analyse litteraire. On te donne des repliques dont le "
    "locuteur est reste indetermine, avec le locuteur DEJA identifie des "
    "repliques voisines. Deduis qui parle en exploitant l'alternance des tours "
    "de parole et le contexte narratif autour. Choisis le nom dans la liste des "
    "personnages fournie, en le recopiant exactement, ou 'inconnu' si vraiment "
    "indeterminable. Tu reponds UNIQUEMENT en JSON valide, sans texte autour."
)
DEFAULT_PROMPT_CHARACTERS = (
    "Tu es un assistant d'analyse litteraire. Tu extrais la liste des "
    "personnages d'un extrait de roman et leurs attributs vocaux. Tu reponds "
    "UNIQUEMENT en JSON valide."
)
DEFAULT_PROMPT_PRONUNCIATION = (
    "Tu es un assistant de preparation de livre audio en francais. Tu reperes "
    "les mots dont la prononciation par un synthetiseur vocal francais risque "
    "d'etre incorrecte (noms propres etrangers, termes de science-fiction, "
    "acronymes). Tu reponds UNIQUEMENT en JSON valide."
)
DEFAULT_PROMPT_INCISES = (
    "Tu es un assistant d'analyse litteraire. Tu reperes les INCISES de "
    "narration inserees dans une replique de dialogue (ex: 'dit Mamie', "
    "'repondit le capitaine'). Tu reponds UNIQUEMENT en JSON valide, sans "
    "texte autour."
)
DEFAULT_PROMPT_DEDUP = (
    "Tu es un assistant d'analyse litteraire. Tu rapproches les differentes "
    "facons de nommer un meme personnage (nom complet, prenom, surnom, "
    "diminutif) pour eviter les doublons dans le casting d'un livre audio. Tu "
    "ne fusionnes deux noms que si c'est, avec certitude, la meme personne. Tu "
    "reponds UNIQUEMENT en JSON valide, sans texte autour."
)


class Settings(BaseModel):
    """Global technical settings, persisted in data/settings.json."""

    # --- MLX models (HuggingFace identifiers) ---
    gemma_model: str = config.GEMMA_MODEL
    qwen3_model: str = config.QWEN3_TTS_MODEL
    kokoro_model: str = config.KOKORO_MODEL

    # --- Gemma generation ---
    gemma_temperature: float = Field(0.1, ge=0.0, le=2.0)
    gemma_max_tokens: int = Field(2048, ge=64, le=8192)

    # --- System prompts (analysis) ---
    prompt_speakers: str = DEFAULT_PROMPT_SPEAKERS
    prompt_speakers_refine: str = DEFAULT_PROMPT_SPEAKERS_REFINE
    prompt_characters: str = DEFAULT_PROMPT_CHARACTERS
    prompt_pronunciation: str = DEFAULT_PROMPT_PRONUNCIATION
    prompt_incises: str = DEFAULT_PROMPT_INCISES  # DEPRECATED (deterministic detection)
    prompt_dedup: str = DEFAULT_PROMPT_DEDUP

    # --- Incises ---
    # DEPRECATED: incise detection is now deterministic and cast-aware
    # (analysis.segmenter.detect_incises), with no Gemma fallback. Field kept
    # so that existing settings.json files still load without error.
    split_incises_use_gemma: bool = True

    # --- Retroactive attribution (2nd pass over undetermined lines) ---
    # After the 1st pass, a targeted 2nd pass re-resolves the lines still
    # marked 'inconnu' (or low-confidence) using the already-identified
    # neighbours. Triggered only if doubts remain -> zero cost otherwise.
    retro_pass_use_gemma: bool = True

    # --- Cast deduplication ---
    # Heuristic (safe, deterministic) by default. The Gemma pass additionally
    # links non-obvious variants (diminutives, titles) but, with a small local
    # model, produces wrong merges -> opt-in.
    dedup_use_gemma: bool = False

    # --- TTS ---
    default_backend: str = "kokoro"
    language: str = config.DEFAULT_LANGUAGE
    kokoro_lang_code: str = config.KOKORO_LANG_CODE
    kokoro_default_voice: str = config.KOKORO_DEFAULT_VOICE
    qwen3_default_voice: str = config.QWEN3_DEFAULT_VOICE

    # --- Audio (final encoding) ---
    target_sample_rate: int = Field(config.TARGET_SAMPLE_RATE, ge=8000, le=48000)
    mp3_bitrate: str = config.MP3_BITRATE
    target_dbfs: float = Field(config.TARGET_DBFS, ge=-40.0, le=0.0)


_LOCK = threading.Lock()
_cache: Optional[Settings] = None


def settings_path():
    return config.DATA_DIR / "settings.json"


def get_settings() -> Settings:
    """Returns the current settings (loaded from disk only once)."""
    global _cache
    with _LOCK:
        if _cache is None:
            path = settings_path()
            if path.exists():
                try:
                    _cache = Settings.model_validate_json(
                        path.read_text(encoding="utf-8"))
                except Exception:  # noqa: BLE001 - corrupt file -> defaults
                    _cache = Settings()
            else:
                _cache = Settings()
        return _cache


def save_settings(settings: Settings) -> Settings:
    """Persists the settings and invalidates the model caches."""
    global _cache
    with _LOCK:
        _cache = settings
        path = settings_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(settings.model_dump_json(indent=2), encoding="utf-8")
    _invalidate_model_caches()
    return settings


def _invalidate_model_caches() -> None:
    """Forces models to be reloaded after an identifier/parameter change.

    `get_backend` is cached by backend *name*, not by model id; without a
    purge, a change of id would be ignored. The same goes for Gemma loading.
    """
    try:
        from .tts.factory import get_backend
        get_backend.cache_clear()
    except Exception:  # noqa: BLE001
        pass
    try:
        from .analysis.gemma import _load
        _load.cache_clear()
    except Exception:  # noqa: BLE001
        pass
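The interplay between the lazily loaded settings cache and the `lru_cache`-backed model loaders can be shown with a standard-library sketch. It is an illustration under stated assumptions, not the module above: `MiniSettings`, `load_model`, `SETTINGS_FILE` and the `model-a`/`model-b` identifiers are hypothetical, and a dataclass with `json` stands in for the pydantic model.

```python
import json
import tempfile
import threading
from dataclasses import asdict, dataclass
from functools import lru_cache
from pathlib import Path
from typing import Optional


@dataclass
class MiniSettings:
    model_id: str = "model-a"  # placeholder identifier, not a real model


SETTINGS_FILE = Path(tempfile.mkdtemp()) / "settings.json"
_lock = threading.Lock()
_cache: Optional[MiniSettings] = None


def get_settings() -> MiniSettings:
    """Load from disk once; fall back to defaults if missing or corrupt."""
    global _cache
    with _lock:
        if _cache is None:
            try:
                raw = json.loads(SETTINGS_FILE.read_text(encoding="utf-8"))
                _cache = MiniSettings(**raw)
            except (OSError, ValueError, TypeError):
                _cache = MiniSettings()
        return _cache


@lru_cache(maxsize=4)
def load_model(name: str) -> str:
    # Cached by backend *name*: the model id is only read on the first call.
    return f"{name}:{get_settings().model_id}"


def save_settings(settings: MiniSettings) -> None:
    global _cache
    with _lock:
        _cache = settings
        SETTINGS_FILE.write_text(json.dumps(asdict(settings)), encoding="utf-8")
    load_model.cache_clear()  # otherwise the old model id would stick


first = load_model("tts")
save_settings(MiniSettings(model_id="model-b"))
second = load_model("tts")
print(first, second)
```

Without the `cache_clear()` call, the second `load_model("tts")` would return the entry cached under the same name and keep the old identifier.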
0
backend/inkflow/store/__init__.py
Normal file
63
backend/inkflow/store/artifacts.py
Normal file
@@ -0,0 +1,63 @@
"""Reading/writing pipeline artifacts under data/<slug>/.

Each stage writes a JSON file; later stages read them back. This is also what
makes the pipeline resumable: we can detect that an artifact already exists.
"""
from __future__ import annotations

from pathlib import Path

from ..config import book_data_dir
from ..models import Cast, ChapterAnalysis, Pronunciation


def analysis_path(slug: str, chapter_index: int) -> Path:
    return book_data_dir(slug) / "analysis" / f"ch{chapter_index:02d}.json"


def cast_path(slug: str) -> Path:
    return book_data_dir(slug) / "cast.json"


def pronunciation_path(slug: str) -> Path:
    return book_data_dir(slug) / "pronunciation.json"


def save_analysis(slug: str, analysis: ChapterAnalysis) -> Path:
    path = analysis_path(slug, analysis.index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(analysis.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_analysis(slug: str, chapter_index: int) -> ChapterAnalysis:
    path = analysis_path(slug, chapter_index)
    return ChapterAnalysis.model_validate_json(path.read_text(encoding="utf-8"))


def save_cast(slug: str, cast: Cast) -> Path:
    path = cast_path(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(cast.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_cast(slug: str) -> Cast:
    path = cast_path(slug)
    if not path.exists():
        return Cast()
    return Cast.model_validate_json(path.read_text(encoding="utf-8"))


def save_pronunciation(slug: str, pron: Pronunciation) -> Path:
    path = pronunciation_path(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(pron.model_dump_json(indent=2), encoding="utf-8")
    return path


def load_pronunciation(slug: str) -> Pronunciation:
    path = pronunciation_path(slug)
    if not path.exists():
        return Pronunciation()
    return Pronunciation.model_validate_json(path.read_text(encoding="utf-8"))
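The "resumable pipeline" idea stated in the module docstring (a stage can detect that its artifact already exists) can be sketched as follows. This is a hypothetical illustration using plain JSON dictionaries and a temporary directory: `analyze_if_missing`, `DATA_DIR` and the `calls` list are introduced here for the example and are not part of the module.

```python
import json
import tempfile
from pathlib import Path

DATA_DIR = Path(tempfile.mkdtemp())  # stand-in for the real data/ directory


def analysis_path(slug: str, chapter_index: int) -> Path:
    return DATA_DIR / slug / "analysis" / f"ch{chapter_index:02d}.json"


def analyze_if_missing(slug: str, chapter_index: int, calls: list[int]) -> dict:
    path = analysis_path(slug, chapter_index)
    if path.exists():  # artifact already on disk: reuse it
        return json.loads(path.read_text(encoding="utf-8"))
    calls.append(chapter_index)  # record that the expensive step really ran
    result = {"index": chapter_index, "segments": []}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result


calls: list[int] = []
for _ in range(2):  # the second pass simulates restarting the pipeline
    for idx in (1, 2):
        analyze_if_missing("my-book", idx, calls)
print(calls)
```

On the second pass nothing is recomputed, because both chapter files are found on disk.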
0
backend/inkflow/tts/__init__.py
Normal file
48
backend/inkflow/tts/base.py
Normal file
@@ -0,0 +1,48 @@
"""TTS engine abstraction (pluggable backend).

Two implementations: Kokoro (fast, preset voices -> previews) and Qwen3-TTS
(quality + cloning from reference audio -> final render). Both return mono
float32 audio plus a sample rate.
"""
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class VoiceSpec:
    """Describes the voice to use for a synthesis.

    - `preset`: name of a preset voice (Kokoro: "ff_siwis"; Qwen3: "Chelsie").
    - `ref_audio` / `ref_text`: reference clip for cloning (Qwen3).
    """
    preset: Optional[str] = None
    ref_audio: Optional[str] = None
    ref_text: Optional[str] = None
    speed: float = 1.0


class TTSBackend(ABC):
    """Common interface for all TTS engines."""

    name: str = "base"

    @abstractmethod
    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        """Synthesizes `text` and returns (mono float32 audio, sample_rate)."""

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec()


def to_mono_float32(audio) -> np.ndarray:
    """Normalizes a model output (mx.array / np / list) to mono float32."""
    arr = np.asarray(audio, dtype=np.float32)
    if arr.ndim > 1:
        # (channels, n) or (n, channels) -> average over the channel axis.
        arr = arr.mean(axis=0) if arr.shape[0] < arr.shape[-1] else arr.mean(axis=-1)
    return np.ascontiguousarray(arr.reshape(-1))
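To show what plugging in a new engine looks like, here is a minimal sketch of a backend that satisfies the same interface. It is a simplification under stated assumptions: samples are plain Python floats instead of a NumPy array so that the example needs only the standard library, `VoiceSpec` is reduced to two fields, and `ToneBackend` is a hypothetical stand-in engine that emits a tone rather than speech.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSpec:
    preset: Optional[str] = None
    speed: float = 1.0


class TTSBackend(ABC):
    name: str = "base"

    @abstractmethod
    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[list[float], int]:
        """Return (mono samples in [-1, 1], sample_rate)."""

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec()


class ToneBackend(TTSBackend):
    """Hypothetical engine: a 220 Hz tone lasting 50 ms per character."""

    name = "tone"
    sample_rate = 8000

    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[list[float], int]:
        seconds = 0.05 * len(text) / voice.speed
        n = int(round(seconds * self.sample_rate))
        step = 2 * math.pi * 220 / self.sample_rate
        return [0.2 * math.sin(step * i) for i in range(n)], self.sample_rate


backend: TTSBackend = ToneBackend()
audio, sr = backend.synthesize("Bonjour", backend.default_voice())
print(len(audio), sr)
```

Callers only depend on `synthesize` and `default_voice`, which is what lets the rendering code switch between engines by name.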
62
backend/inkflow/tts/chunk.py
Normal file
@@ -0,0 +1,62 @@
"""Splits text into synthesis-friendly chunks.

TTS models (Kokoro in particular) truncate texts that are too long. We
therefore split on sentence boundaries while respecting a maximum length per
chunk.
"""
from __future__ import annotations

import re

# End of sentence: strong punctuation followed by whitespace, or newlines.
_SENTENCE_END_RE = re.compile(r"(?<=[.!?…])\s+|\n+")
# For very long sentences, also split on commas, semicolons and colons.
_SOFT_BREAK_RE = re.compile(r"(?<=[,;:])\s+")

DEFAULT_MAX_CHARS = 350


def split_sentences(text: str) -> list[str]:
    parts = [p.strip() for p in _SENTENCE_END_RE.split(text)]
    return [p for p in parts if p]


def _split_long(sentence: str, max_chars: int) -> list[str]:
    """Splits an over-long sentence on soft breaks, then by a hard window."""
    if len(sentence) <= max_chars:
        return [sentence]
    out: list[str] = []
    buf = ""
    for piece in _SOFT_BREAK_RE.split(sentence):
        cand = f"{buf} {piece}".strip()
        if len(cand) <= max_chars:
            buf = cand
        else:
            if buf:
                out.append(buf)
            if len(piece) <= max_chars:
                buf = piece
            else:  # word/segment longer than the window: hard cut
                for i in range(0, len(piece), max_chars):
                    out.append(piece[i:i + max_chars])
                buf = ""
    if buf:
        out.append(buf)
    return out


def chunk_text(text: str, max_chars: int = DEFAULT_MAX_CHARS) -> list[str]:
|
||||||
|
"""Regroupe les phrases en morceaux <= max_chars, sans couper une phrase."""
|
||||||
|
chunks: list[str] = []
|
||||||
|
buf = ""
|
||||||
|
for sentence in split_sentences(text):
|
||||||
|
for part in _split_long(sentence, max_chars):
|
||||||
|
cand = f"{buf} {part}".strip()
|
||||||
|
if len(cand) <= max_chars:
|
||||||
|
buf = cand
|
||||||
|
else:
|
||||||
|
if buf:
|
||||||
|
chunks.append(buf)
|
||||||
|
buf = part
|
||||||
|
if buf:
|
||||||
|
chunks.append(buf)
|
||||||
|
return chunks
|
||||||
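The greedy packing done by `chunk_text` can be illustrated with a reduced, standalone sketch. It reuses the sentence-boundary pattern from chunk.py but, as a simplifying assumption, keeps over-long sentences whole instead of splitting them on soft breaks. The name `group_sentences` is hypothetical and not part of the package.

```python
import re

# Same sentence-boundary pattern as chunk.py: strong punctuation followed by
# whitespace, or one or more line breaks.
SENTENCE_END = re.compile(r"(?<=[.!?…])\s+|\n+")


def group_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Simplification: a single sentence longer than max_chars is kept whole
    here, whereas chunk.py splits it further.
    """
    chunks: list[str] = []
    buf = ""
    for sentence in (s.strip() for s in SENTENCE_END.split(text)):
        if not sentence:
            continue
        cand = f"{buf} {sentence}".strip()
        if len(cand) <= max_chars:
            buf = cand  # the sentence still fits in the current chunk
        else:
            if buf:
                chunks.append(buf)  # flush the full chunk
            buf = sentence  # start a new chunk with this sentence
    if buf:
        chunks.append(buf)
    return chunks


demo = "Il pleut. Elle sort ! Vraiment ? Oui."
chunks = group_sentences(demo, max_chars=20)
print(chunks)  # ['Il pleut.', 'Elle sort !', 'Vraiment ? Oui.']
```

Note that the French spacing before `!` and `?` does not trigger a split, because the lookbehind only matches whitespace that follows the punctuation mark.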
20
backend/inkflow/tts/factory.py
Normal file
@@ -0,0 +1,20 @@
"""Select the TTS backend by name (pluggable)."""
from __future__ import annotations

from functools import lru_cache

from .base import TTSBackend

BACKENDS = ("kokoro", "qwen3")


@lru_cache(maxsize=4)
def _build_backend(name: str) -> TTSBackend:
    if name == "kokoro":
        from .kokoro import KokoroBackend
        return KokoroBackend()
    if name == "qwen3":
        from .qwen3 import Qwen3Backend
        return Qwen3Backend()
    raise ValueError(f"Backend TTS inconnu: {name!r} (dispo: {', '.join(BACKENDS)})")


def get_backend(name: str = "kokoro") -> TTSBackend:
    # Normalize before the cache lookup: lru_cache keys on the exact call
    # arguments, so get_backend(), get_backend("kokoro") and
    # get_backend("Kokoro") would otherwise each build their own instance.
    return _build_backend(name.lower())
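One detail worth keeping in mind with a cached factory like this one: `functools.lru_cache` keys on the exact call arguments, so a call relying on the default value and a call passing the same value explicitly are cached separately, as are different spellings of the name. The sketch below, with hypothetical names `_build` and `get` and a plain `object()` standing in for a backend, shows one way to normalize before the lookup.

```python
from functools import lru_cache

created: list[str] = []  # records every (expensive) construction


@lru_cache(maxsize=4)
def _build(name: str) -> object:
    # Stand-in for loading a model: we only record that it happened.
    created.append(name)
    return object()


def get(name: str = "kokoro") -> object:
    # Normalize first so every spelling maps to the same cache key.
    return _build(name.lower())


a = get()
b = get("kokoro")
c = get("Kokoro")
print(a is b is c, created)  # True ['kokoro']
```

The wrapper costs one extra function call per lookup, which is negligible next to loading a model twice.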
93
backend/inkflow/tts/kokoro.py
Normal file
@@ -0,0 +1,93 @@
"""Kokoro backend (fast, preset voices), ideal for previews.

Kokoro truncates long texts, so we synthesize chunk by chunk (sentence-based
splitting) and then concatenate. French goes through espeak-ng via phonemizer.
"""
from __future__ import annotations

import logging

import numpy as np

from ..config import setup_espeak
from ..settings import get_settings
from .base import TTSBackend, VoiceSpec, to_mono_float32
from .chunk import chunk_text

logger = logging.getLogger(__name__)

# The MLX port of Kokoro has an intermittent alignment bug (mx.random.normal
# in the harmonic generator) that raises a broadcast_shapes error on some
# draws. Since it is random, a plain retry is usually enough; as a last
# resort we split the chunk in two.
_KOKORO_RETRIES = 8


class KokoroBackend(TTSBackend):
    name = "kokoro"

    def __init__(self, model_id: str | None = None, lang_code: str | None = None):
        setup_espeak()
        settings = get_settings()
        self.model_id = model_id or settings.kokoro_model
        self.lang_code = lang_code or settings.kokoro_lang_code
        self._model = None
        self._sample_rate = 24000

    def _ensure_loaded(self) -> None:
        if self._model is None:
            from mlx_audio.tts.utils import load_model
            self._model = load_model(self.model_id)

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec(preset=get_settings().kokoro_default_voice)

    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        self._ensure_loaded()
        preset = voice.preset or get_settings().kokoro_default_voice
        pieces: list[np.ndarray] = []
        for chunk in chunk_text(text):
            pieces.extend(self._gen_resilient(chunk, preset, voice.speed))
        if not pieces:
            return np.zeros(0, dtype=np.float32), self._sample_rate
        return np.concatenate(pieces), self._sample_rate

    def _gen_once(self, text: str, preset: str, speed: float) -> list[np.ndarray]:
        out: list[np.ndarray] = []
        for result in self._model.generate(
            text=text, voice=preset, speed=speed, lang_code=self.lang_code,
        ):
            self._sample_rate = getattr(result, "sample_rate", self._sample_rate)
            out.append(to_mono_float32(result.audio))
        return out

    def _gen_resilient(self, text: str, preset: str, speed: float,
                       depth: int = 0) -> list[np.ndarray]:
        """Generate one chunk with retries, then re-split as a fallback."""
        for _ in range(_KOKORO_RETRIES):
            try:
                return self._gen_once(text, preset, speed)
            except Exception:  # noqa: BLE001 (intermittent vocoder bug)
                continue
        # Still failing: split in two and retry each half.
        if depth < 3 and len(text) > 40:
            mid = _split_point(text)
            left = self._gen_resilient(text[:mid].strip(), preset, speed, depth + 1)
            right = self._gen_resilient(text[mid:].strip(), preset, speed, depth + 1)
            return left + right
        logger.warning("Kokoro: morceau abandonne apres echecs: %r", text[:60])
        return []


def _split_point(text: str) -> int:
    """Split point closest to the middle (preferably at a space)."""
    mid = len(text) // 2
    left = text.rfind(" ", 0, mid)
    right = text.find(" ", mid)
    if left == -1 and right == -1:
        return mid
    if left == -1:
        return right
    if right == -1:
        return left
    return left if (mid - left) <= (right - mid) else right
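The retry-then-split strategy used by the Kokoro backend can be tried in isolation. The sketch below is a simplified, model-free version: `fake_generate` is a hypothetical stand-in that fails deterministically on long inputs, and the thresholds (3 retries, 8 characters) are illustrative rather than the values used above.

```python
def split_point(text: str) -> int:
    """Index of the space closest to the middle, or the middle itself."""
    mid = len(text) // 2
    left = text.rfind(" ", 0, mid)
    right = text.find(" ", mid)
    candidates = [i for i in (left, right) if i != -1]
    if not candidates:
        return mid
    # On a tie, min() keeps the first candidate, i.e. the left one.
    return min(candidates, key=lambda i: abs(i - mid))


def generate_resilient(text, generate, retries=3, depth=0, max_depth=3):
    """Retry `generate`; if it keeps failing, split the text and recurse."""
    for _ in range(retries):
        try:
            return [generate(text)]
        except ValueError:
            continue
    if depth < max_depth and len(text) > 8:
        mid = split_point(text)
        left = generate_resilient(text[:mid].strip(), generate, retries, depth + 1, max_depth)
        right = generate_resilient(text[mid:].strip(), generate, retries, depth + 1, max_depth)
        return left + right
    return []  # give up on this piece


def fake_generate(text: str) -> str:
    # Hypothetical stand-in for a model call: fails on inputs over 12 chars.
    if len(text) > 12:
        raise ValueError("too long")
    return text.upper()


result = generate_resilient("un deux trois quatre", fake_generate)
print(result)  # ['UN DEUX', 'TROIS QUATRE']
```

Bounding the recursion depth matters: without it, a piece that can never succeed would be halved until it falls below the length threshold, which multiplies the number of failed attempts.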
58
backend/inkflow/tts/qwen3.py
Normal file
@@ -0,0 +1,58 @@
"""Qwen3-TTS backend (quality + cloning from a reference audio clip), used for the final render.

Two modes:
- preset voice: `voice` (e.g. "Chelsie") + `language` ("French").
- cloning: `ref_audio` (+ `ref_text`, the clip's transcription) to imitate a
  voice from the voicebank, assigned to a character.
"""
from __future__ import annotations

import numpy as np

from ..settings import get_settings
from .base import TTSBackend, VoiceSpec, to_mono_float32
from .chunk import chunk_text

# Qwen3 tolerates longer sequences than Kokoro, but we still cap the length.
_QWEN_MAX_CHARS = 500


class Qwen3Backend(TTSBackend):
    name = "qwen3"

    def __init__(self, model_id: str | None = None, language: str | None = None):
        settings = get_settings()
        self.model_id = model_id or settings.qwen3_model
        self.language = language or settings.language
        self._model = None
        self._sample_rate = 24000

    def _ensure_loaded(self) -> None:
        if self._model is None:
            from mlx_audio.tts.utils import load_model
            self._model = load_model(self.model_id)

    def default_voice(self) -> VoiceSpec:
        return VoiceSpec(preset=get_settings().qwen3_default_voice)

    def _gen_kwargs(self, voice: VoiceSpec) -> dict:
        kwargs: dict = {"language": self.language, "speed": voice.speed}
        if voice.ref_audio:  # cloning mode
            kwargs["ref_audio"] = voice.ref_audio
            if voice.ref_text:
                kwargs["ref_text"] = voice.ref_text
        else:  # preset voice mode
            kwargs["voice"] = voice.preset or get_settings().qwen3_default_voice
        return kwargs

    def synthesize(self, text: str, voice: VoiceSpec) -> tuple[np.ndarray, int]:
        self._ensure_loaded()
        kwargs = self._gen_kwargs(voice)
        pieces: list[np.ndarray] = []
        for chunk in chunk_text(text, max_chars=_QWEN_MAX_CHARS):
            for result in self._model.generate(text=chunk, **kwargs):
                self._sample_rate = getattr(result, "sample_rate", self._sample_rate)
                pieces.append(to_mono_float32(result.audio))
        if not pieces:
            return np.zeros(0, dtype=np.float32), self._sample_rate
        return np.concatenate(pieces), self._sample_rate
22
backend/inkflow/util.py
Normal file
@@ -0,0 +1,22 @@
"""Small shared utilities (slug, safe file names)."""
from __future__ import annotations

import re
import unicodedata

_SLUG_STRIP = re.compile(r"[^a-z0-9]+")
_FS_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def slugify(text: str) -> str:
    """Lowercase ASCII slug, used for internal folder identifiers."""
    norm = unicodedata.normalize("NFKD", text)
    norm = norm.encode("ascii", "ignore").decode("ascii").lower()
    return _SLUG_STRIP.sub("-", norm).strip("-") or "livre"


def safe_filename(name: str) -> str:
    """Sanitize a file name while keeping accents (user-facing output)."""
    name = _FS_UNSAFE.sub("", name).strip()
    name = re.sub(r"\s+", " ", name)
    return name or "sans-titre"
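A short usage sketch may help to see how the two helpers differ: `slugify` folds accents away and yields an ASCII identifier, while `safe_filename` only removes characters that file systems reject. The functions are copied here so the block runs on its own; the sample titles are made up for illustration.

```python
import re
import unicodedata

_SLUG_STRIP = re.compile(r"[^a-z0-9]+")
_FS_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def slugify(text: str) -> str:
    # NFKD splits "é" into "e" + combining accent; the ASCII encode drops the accent.
    norm = unicodedata.normalize("NFKD", text)
    norm = norm.encode("ascii", "ignore").decode("ascii").lower()
    return _SLUG_STRIP.sub("-", norm).strip("-") or "livre"


def safe_filename(name: str) -> str:
    # Remove characters that are unsafe in file names, then collapse whitespace.
    name = _FS_UNSAFE.sub("", name).strip()
    name = re.sub(r"\s+", " ", name)
    return name or "sans-titre"


print(slugify("Les Misérables : Tome 1"))        # les-miserables-tome-1
print(slugify("!!!"))                            # livre (fallback for an empty slug)
print(safe_filename('Chapitre 1 : "Début"?'))    # Chapitre 1 Début
```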
40
backend/pyproject.toml
Normal file
@@ -0,0 +1,40 @@
[project]
name = "inkflow"
version = "0.1.0"
description = "EPUB -> livre audio, 100% local sur Mac (MLX). Analyse Gemma + TTS Qwen3/Kokoro."
requires-python = ">=3.11"
dependencies = [
    # MLX (Apple Silicon)
    "mlx",
    "mlx-lm",
    "mlx-audio",
    "misaki",            # phonemizer for Kokoro (French included)
    # EPUB parsing
    "ebooklib",
    "beautifulsoup4",
    "lxml",
    # Audio
    "soundfile",         # WAV read/write
    "numpy",             # audio concatenation + normalization
    "mutagen",           # ID3 tags + cover (MP3 encoding via the ffmpeg CLI)
    # Web API
    "fastapi",
    "uvicorn[standard]",
    "websockets",
    "python-multipart",  # file uploads
    # Misc
    "pydantic>=2",
    "rich",              # readable CLI logs
    "typer",             # CLI
]

[project.scripts]
inkflow = "inkflow.cli:app"

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["."]
include = ["inkflow*"]
87
backend/scripts/setup_models.py
Normal file
@@ -0,0 +1,87 @@
#!/usr/bin/env python
"""Check the InkFlow environment and pre-download the MLX models.

Usage:
    python scripts/setup_models.py          # check everything + download
    python scripts/setup_models.py --check  # check without downloading

System requirements: Apple Silicon, Python >= 3.11, ffmpeg (brew install ffmpeg).
"""
from __future__ import annotations

import argparse
import platform
import shutil
import sys
from pathlib import Path

# Allows running the script directly from backend/.
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from inkflow.config import (  # noqa: E402
    GEMMA_MODEL,
    KOKORO_MODEL,
    QWEN3_TTS_MODEL,
    ensure_dirs,
)


def check_env() -> bool:
    ok = True
    print(f"• Plateforme : {platform.platform()} ({platform.machine()})")
    if platform.machine() != "arm64":
        # Warning only: the check does not fail on other architectures.
        print(" ! Attendu arm64 (Apple Silicon) — MLX ne sera pas optimal.")
    print(f"• Python : {sys.version.split()[0]}")
    if sys.version_info < (3, 11):
        print(" ! Python >= 3.11 requis.")
        ok = False

    for mod in ("mlx", "mlx_lm", "mlx_audio", "ebooklib", "bs4",
                "soundfile", "mutagen", "fastapi"):
        try:
            __import__(mod)
            print(f"• import {mod:12s}: OK")
        except Exception as exc:  # noqa: BLE001
            print(f"• import {mod:12s}: ECHEC ({exc})")
            ok = False

    ff = shutil.which("ffmpeg")
    print(f"• ffmpeg : {ff or 'INTROUVABLE — brew install ffmpeg'}")
    ok = ok and bool(ff)
    return ok


def download_lm(model_id: str) -> None:
    from mlx_lm import load
    print(f" -> LM {model_id}")
    load(model_id)


def download_tts(model_id: str) -> None:
    from mlx_audio.tts.utils import load_model
    print(f" -> TTS {model_id}")
    load_model(model_id)


def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--check", action="store_true", help="verifier sans telecharger")
    args = ap.parse_args()

    ensure_dirs()
    print("== Verification de l'environnement ==")
    env_ok = check_env()

    if args.check:
        return 0 if env_ok else 1
    if not env_ok:
        print("\nEnvironnement incomplet — corrige les points ci-dessus avant de continuer.")
        return 1

    print("\n== Telechargement des modeles (peut etre long la 1re fois) ==")
    download_lm(GEMMA_MODEL)
    download_tts(KOKORO_MODEL)
    download_tts(QWEN3_TTS_MODEL)
    print("\nTout est pret.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
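The environment check above imports each dependency to verify it, which also loads it. Where only presence matters, a lighter variant is possible with the standard library: `importlib.util.find_spec` locates a top-level module without executing it, and `shutil.which` looks up an executable on `PATH`. The sketch below is an alternative under that assumption, not the script's actual behavior; the helper names and the deliberately missing module and tool names are made up.

```python
import shutil
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules that cannot be located (nothing is imported)."""
    return [name for name in names if find_spec(name) is None]


def missing_tools(names: list[str]) -> list[str]:
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# "inkflow_no_such_module" and "inkflow-no-such-tool" are made-up names used
# only to show what a failed lookup looks like.
print(missing_modules(["json", "inkflow_no_such_module"]))
print(missing_tools(["inkflow-no-such-tool"]))
```

The trade-off is that a module that is installed but broken (for example a native extension that fails to load) passes this check, whereas the import-based check in the script catches it.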
204
backend/tests/test_incises.py
Normal file
@@ -0,0 +1,204 @@
"""Tests for the deterministic detection of incises (reporting clauses).

`detect_incises` / `incise_speaker` / `iter_incise_pieces` are pure and
testable without Gemma. Two passes: verb-pronoun inversion ("dit-il") and a
cast-aware nominal pass ("compatit Holden", "informa le soldat").
"""
from __future__ import annotations

from inkflow.analysis.segmenter import (
    detect_incises,
    incise_speaker,
    iter_incise_pieces,
)

NAMES = {"Holden", "Kajri", "Camina Drummer"}


def _pieces(text: str, names=NAMES) -> list[tuple[bool, str]]:
    return iter_incise_pieces(text, detect_incises(text, names=names))


# --- Inversion pass (verb-pronoun) ------------------------------------------

def test_inversion_au_milieu():
    assert _pieces("James Holden, coupa-t-elle. Je sais qui vous êtes.") == [
        (False, "James Holden,"),
        (True, "coupa-t-elle."),
        (False, "Je sais qui vous êtes."),
    ]


def test_inversion_en_fin():
    assert _pieces("C'est fini, dit-elle.") == [
        (False, "C'est fini,"),
        (True, "dit-elle."),
    ]


def test_inversion_reflechi_exclamation():
    assert _pieces("Viens ici, s'écria-t-il !") == [
        (False, "Viens ici,"),
        (True, "s'écria-t-il !"),
    ]


def test_inversion_fermee_par_virgule():
    assert _pieces("Pars, répondit-elle, et ne reviens pas.") == [
        (False, "Pars,"),
        (True, "répondit-elle,"),
        (False, "et ne reviens pas."),
    ]


def test_inversion_complements_apres_pronom():
    assert _pieces("Trop tard, murmura-t-il en souriant. Partons.") == [
        (False, "Trop tard,"),
        (True, "murmura-t-il en souriant."),
        (False, "Partons."),
    ]


def test_double_inversion():
    assert _pieces("Stop, dit-il. Non, reprit-elle.") == [
        (False, "Stop,"),
        (True, "dit-il."),
        (False, "Non,"),
        (True, "reprit-elle."),
    ]


# --- Incise after the end of speech: the rest of the line is narration ------

def test_incise_apres_fin_de_phrase_va_jusqu_au_bout():
    # After "…" the speech is closed: "dit-il ... d'importance." is narration.
    text = ("Dans une minute, oui. Je voudrais juste… dit-il avec un geste vague, "
            "comme si tout cela n'avait plus d'importance.")
    assert _pieces(text) == [
        (False, "Dans une minute, oui. Je voudrais juste…"),
        (True, "dit-il avec un geste vague, comme si tout cela n'avait plus "
               "d'importance."),
    ]


def test_incise_apres_virgule_reprend_le_dialogue():
    # After a plain comma, the dialogue resumes (contrast with the test above).
    assert _pieces("Pars, répondit-elle, et ne reviens pas.") == [
        (False, "Pars,"),
        (True, "répondit-elle,"),
        (False, "et ne reviens pas."),
    ]


def test_incise_nominale_apres_point_interrogation_va_au_bout():
    text = "Vraiment ? demanda-t-il en se levant. Il s'éloigna."
    assert _pieces(text) == [
        (False, "Vraiment ?"),
        (True, "demanda-t-il en se levant. Il s'éloigna."),
    ]


# --- Nominal pass (verb + known subject) ------------------------------------

def test_nominale_nom_propre():
    assert _pieces("Toutes mes condoléances, compatit Holden.") == [
        (False, "Toutes mes condoléances,"),
        (True, "compatit Holden."),
    ]


def test_nominale_alias_apres_ponctuation_forte():
    # "?" as the left delimiter + subject = alias of a known character.
    assert _pieces("Flippant, cet enfoiré, hein ? lança Drummer.") == [
        (False, "Flippant, cet enfoiré, hein ?"),
        (True, "lança Drummer."),
    ]


def test_nominale_clitic_et_nom_de_role():
    assert _pieces("Vous venez, monsieur ? lui demanda un garde.") == [
        (False, "Vous venez, monsieur ?"),
        (True, "lui demanda un garde."),
    ]


# --- incise_speaker: seeding the explicit speaker ---------------------------

def test_seed_speaker_nom_propre():
    text = "Toutes mes condoléances, compatit Holden."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) == "Holden"


def test_seed_speaker_alias_vers_canonique():
    text = "Hein ? lança Drummer."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) == "Camina Drummer"


def test_seed_speaker_role_non_nomme_est_none():
    # A role noun ("un garde") is not a cast character -> no seed.
    text = "Vous venez ? lui demanda un garde."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) is None


def test_seed_speaker_inversion_est_none():
    text = "C'est fini, dit-elle."
    inc = detect_incises(text, names=NAMES)[0]
    assert incise_speaker(text, inc, NAMES) is None


def test_seed_nom_propre_absent_du_casting():
    # The name is written in the incise -> seeded even if cast extraction missed it.
    text = "Bonjour, lança Drummer."
    inc = detect_incises(text, names=set())[0]
    assert incise_speaker(text, inc, set()) == "Drummer"
    assert _pieces(text, names=set()) == [
        (False, "Bonjour,"),
        (True, "lança Drummer."),
    ]


# --- False positives that must NOT be detected ------------------------------

def test_vocatif_adresse_pas_incise():
    # The character is being addressed; this is not an incise (no speech verb).
    text = "Vous n'avez pas l'air en mesure de rendre service, capitaine Holden."
    assert detect_incises(text, names=NAMES) == []


def test_imperatif_sans_incise():
    assert detect_incises("Donne-le-moi.", names=NAMES) == []


def test_pronom_tu_exclu():
    assert detect_incises("Crois-tu ?", names=NAMES) == []


def test_replique_simple_sans_incise():
    assert detect_incises("Bonjour à tous.", names=NAMES) == []


def test_sans_noms_inversion_seule():
    # Without a cast, the inversion pass still works.
    assert _pieces("C'est fini, dit-elle.", names=set()) == [
        (False, "C'est fini,"),
        (True, "dit-elle."),
    ]


# --- Invariants -------------------------------------------------------------

def test_texte_preserve_modulo_espaces():
    text = "James Holden, coupa-t-elle. Je sais qui vous êtes."
    joined = "".join(p for _, p in _pieces(text))
    assert joined.replace(" ", "") == text.replace(" ", "")


def test_bornes_non_chevauchantes_et_triees():
    text = "Stop, dit-il. Non, reprit-elle."
    incs = detect_incises(text, names=NAMES)
    assert all(incs[i].end <= incs[i + 1].start for i in range(len(incs) - 1))
    for inc in incs:
        assert 0 <= inc.start < inc.end <= len(text)
40
frontend/dist/assets/index-CMUl6Yfl.js
vendored
Normal file
File diff suppressed because one or more lines are too long
1
frontend/dist/assets/index-DlPmWkkU.css
vendored
Normal file
File diff suppressed because one or more lines are too long
13
frontend/dist/index.html
vendored
Normal file
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="fr">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>InkFlow — EPUB → Livre audio</title>
    <script type="module" crossorigin src="/assets/index-CMUl6Yfl.js"></script>
    <link rel="stylesheet" crossorigin href="/assets/index-DlPmWkkU.css">
  </head>
  <body>
    <div id="root"></div>
  </body>
</html>
12
frontend/index.html
Normal file
@@ -0,0 +1,12 @@
<!doctype html>
<html lang="fr">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>InkFlow — EPUB → Livre audio</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>
2767
frontend/package-lock.json
generated
Normal file
File diff suppressed because it is too large
Load Diff
22
frontend/package.json
Normal file
@@ -0,0 +1,22 @@
{
  "name": "inkflow-frontend",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview"
  },
  "dependencies": {
    "react": "^18.3.1",
    "react-dom": "^18.3.1"
  },
  "devDependencies": {
    "@vitejs/plugin-react": "^4.3.4",
    "autoprefixer": "^10.4.20",
    "postcss": "^8.4.49",
    "tailwindcss": "^3.4.17",
    "vite": "^6.0.7"
  }
}
6
frontend/postcss.config.js
Normal file
@@ -0,0 +1,6 @@
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
};
245
frontend/src/AnalysisEditor.jsx
Normal file
@@ -0,0 +1,245 @@
|
|||||||
|
import React, { useEffect, useMemo, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { Spinner } from "./ui.jsx";
|
||||||
|
|
||||||
|
const NARRATOR = "narrateur";
|
||||||
|
let _seq = 0;
|
||||||
|
const nextId = () => ++_seq;
|
||||||
|
|
||||||
|
export default function AnalysisEditor({ slug, book, state }) {
|
||||||
|
// Chapitres analysés (intersection ordre du livre x analyzed_chapters).
|
||||||
|
const analyzed = useMemo(() => {
|
||||||
|
const set = new Set(state.analyzed_chapters || []);
|
||||||
|
return book.chapters.filter((c) => set.has(c.index));
|
||||||
|
}, [book, state.analyzed_chapters]);
|
||||||
|
|
||||||
|
const [index, setIndex] = useState(() => analyzed[0]?.index ?? null);
|
||||||
|
const [analysis, setAnalysis] = useState(null); // { index, title, segments:[{_id,type,text,speaker}] }
|
||||||
|
const [names, setNames] = useState([]); // noms de personnages pour la datalist
|
||||||
|
const [loading, setLoading] = useState(false);
|
||||||
|
const [saved, setSaved] = useState(false);
|
||||||
|
// Derniere selection de texte dans une replique (pour "marquer comme incise").
|
||||||
|
const [sel, setSel] = useState({ id: null, start: 0, end: 0 });
|
||||||
|
|
||||||
|
// Filtres d'affichage (n'altèrent pas la sauvegarde).
|
||||||
|
const [query, setQuery] = useState("");
|
||||||
|
const [typeFilter, setTypeFilter] = useState("all");
|
||||||
|
const [speakerFilter, setSpeakerFilter] = useState("all");
|
||||||
|
|
||||||
|
// Si la liste des chapitres analysés change et que l'index courant disparaît.
|
||||||
|
useEffect(() => {
|
||||||
|
if (index == null || !analyzed.some((c) => c.index === index)) {
|
||||||
|
setIndex(analyzed[0]?.index ?? null);
|
||||||
|
}
|
||||||
|
}, [analyzed]); // eslint-disable-line react-hooks/exhaustive-deps
|
||||||
|
|
||||||
|
// Noms des personnages du casting (une fois).
|
||||||
|
useEffect(() => {
|
||||||
|
api.getCast(slug)
|
||||||
|
.then((d) => setNames((d.cast?.characters || []).map((c) => c.name)))
|
||||||
|
.catch(() => setNames([]));
|
||||||
|
}, [slug]);
|
||||||
|
|
||||||
|
// Chargement de l'analyse du chapitre sélectionné.
|
||||||
|
useEffect(() => {
|
||||||
|
if (index == null) { setAnalysis(null); return; }
|
||||||
|
setLoading(true);
|
||||||
|
setSaved(false);
|
||||||
|
api.getChapter(slug, index).then((d) => {
|
||||||
|
if (d.analysis) {
|
||||||
|
setAnalysis({
|
||||||
|
index: d.analysis.index,
|
||||||
|
title: d.analysis.title,
|
||||||
|
segments: (d.analysis.segments || []).map((s) => ({ ...s, _id: nextId() })),
|
||||||
|
});
|
||||||
|
} else {
|
||||||
|
setAnalysis({ index, title: d.chapter?.title || "", segments: null });
|
||||||
|
}
|
||||||
|
}).finally(() => setLoading(false));
|
||||||
|
}, [slug, index]);
|
||||||
|
|
||||||
|
const speakerOptions = useMemo(() => {
|
||||||
|
const set = new Set([NARRATOR, ...names]);
|
||||||
|
(analysis?.segments || []).forEach((s) => s.speaker && set.add(s.speaker));
|
||||||
|
return [...set];
|
||||||
|
}, [names, analysis]);
|
||||||
|
|
||||||
|
if (!analyzed.length)
|
||||||
|
return <p className="text-ink-muted">Lancez d'abord l'<b>Analyse</b> sur un chapitre.</p>;
|
||||||
|
|
||||||
|
const touch = (segments) => { setAnalysis((a) => ({ ...a, segments })); setSaved(false); };
|
||||||
|
|
||||||
|
const setSeg = (id, patch) =>
|
||||||
|
touch(analysis.segments.map((s) => {
|
||||||
|
if (s._id !== id) return s;
|
||||||
|
const next = { ...s, ...patch };
|
||||||
|
if (next.type === "narration") { next.speaker = NARRATOR; next.incises = []; }
|
||||||
|
// Edition du texte : on ecarte les incises devenues hors-bornes.
|
||||||
|
if (patch.text !== undefined) {
|
||||||
|
const len = next.text.length;
|
||||||
|
next.incises = (next.incises || []).filter(
|
||||||
|
(inc) => inc.start < inc.end && inc.end <= len);
|
||||||
|
}
|
||||||
|
return next;
|
||||||
|
}));
|
||||||
|
|
||||||
|
// Marque la portion [start,end) d'une replique comme incise (voix narrateur).
|
||||||
|
const addIncise = (id, start, end) =>
|
||||||
|
touch(analysis.segments.map((s) => {
|
||||||
|
if (s._id !== id) return s;
|
||||||
|
const incises = [...(s.incises || []), { start, end }]
|
||||||
|
.sort((a, b) => a.start - b.start)
|
||||||
|
.filter((inc, i, arr) => i === 0 || inc.start >= arr[i - 1].end);
|
||||||
|
return { ...s, incises };
|
||||||
|
}));
|
||||||
|
|
||||||
|
const removeIncise = (id, i) =>
|
||||||
|
touch(analysis.segments.map((s) =>
|
||||||
|
s._id !== id ? s : { ...s, incises: (s.incises || []).filter((_, k) => k !== i) }));
|
||||||
|
|
||||||
|
const removeSeg = (id) => touch(analysis.segments.filter((s) => s._id !== id));
|
||||||
|
|
||||||
|
const insertAfter = (id) => {
|
||||||
|
const segs = analysis.segments;
|
||||||
|
const pos = id == null ? segs.length : segs.findIndex((s) => s._id === id) + 1;
|
||||||
|
const next = [...segs];
|
||||||
|
next.splice(pos, 0, { _id: nextId(), type: "narration", text: "", speaker: NARRATOR });
|
||||||
|
touch(next);
|
||||||
|
};
|
||||||
|
|
||||||
|
const save = async () => {
|
||||||
|
const payload = {
|
||||||
|
index: analysis.index,
|
||||||
|
title: analysis.title,
|
||||||
|
segments: analysis.segments.map(({ _id, ...s }) => s),
|
||||||
|
};
|
||||||
|
await api.putAnalysis(slug, analysis.index, payload);
|
||||||
|
setSaved(true);
|
||||||
|
};
|
||||||
|
|
||||||
|
const segments = analysis?.segments;
|
||||||
|
const visible = (segments || []).filter((s) => {
|
||||||
|
if (typeFilter !== "all" && s.type !== typeFilter) return false;
|
||||||
|
if (speakerFilter !== "all" && s.speaker !== speakerFilter) return false;
|
||||||
|
if (query && !s.text.toLowerCase().includes(query.toLowerCase())) return false;
|
||||||
|
return true;
|
||||||
|
});
|
||||||
|
const dialogueCount = (segments || []).filter((s) => s.type === "dialogue").length;
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-4">
|
||||||
|
<datalist id="speaker-list">
|
||||||
|
{speakerOptions.map((n) => <option key={n} value={n} />)}
|
||||||
|
</datalist>
|
||||||
|
|
||||||
|
{/* Control bar */}
|
||||||
|
<div className="card flex flex-wrap items-center gap-3 p-3">
|
||||||
|
<label className="text-sm text-ink-muted">Chapitre</label>
|
||||||
|
<select className="input" value={index ?? ""}
|
||||||
|
onChange={(e) => setIndex(Number(e.target.value))}>
|
||||||
|
{analyzed.map((c) => (
|
||||||
|
<option key={c.index} value={c.index}>{c.index} — {c.title}</option>
|
||||||
|
))}
|
||||||
|
</select>
|
||||||
|
{segments && (
|
||||||
|
<span className="text-xs text-ink-muted">
|
||||||
|
{segments.length} segments · {dialogueCount} dialogues
|
||||||
|
</span>
|
||||||
|
)}
|
||||||
|
<button className="btn-primary ml-auto" disabled={!segments} onClick={save}>
|
||||||
|
{saved ? "✓ enregistré" : "Enregistrer"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{loading && <p className="text-ink-muted"><Spinner /> chargement de l'analyse…</p>}
|
||||||
|
|
||||||
|
{!loading && segments === null && (
|
||||||
|
<p className="text-ink-muted">Ce chapitre n'a pas encore d'analyse. Lancez l'<b>Analyse</b>.</p>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{!loading && segments && (
|
||||||
|
<>
|
||||||
|
{/* Display filters */}
|
||||||
|
<div className="card flex flex-wrap items-center gap-3 p-3">
|
||||||
|
<input className="input flex-1 min-w-[12rem]" placeholder="Rechercher dans le texte…"
|
||||||
|
value={query} onChange={(e) => setQuery(e.target.value)} />
|
||||||
|
<select className="input" value={typeFilter} onChange={(e) => setTypeFilter(e.target.value)}>
|
||||||
|
<option value="all">tous types</option>
|
||||||
|
<option value="narration">narration</option>
|
||||||
|
<option value="dialogue">dialogue</option>
|
||||||
|
</select>
|
||||||
|
<select className="input" value={speakerFilter} onChange={(e) => setSpeakerFilter(e.target.value)}>
|
||||||
|
<option value="all">tous locuteurs</option>
|
||||||
|
{speakerOptions.map((n) => <option key={n} value={n}>{n}</option>)}
|
||||||
|
</select>
|
||||||
|
{visible.length !== segments.length && (
|
||||||
|
<span className="text-xs text-ink-muted">{visible.length} affichés</span>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div className="card divide-y divide-ink-edge">
|
||||||
|
{visible.map((s) => {
|
||||||
|
const canMark = s.type === "dialogue"
|
||||||
|
&& sel.id === s._id && sel.end > sel.start;
|
||||||
|
const incises = s.incises || [];
|
||||||
|
return (
|
||||||
|
<div key={s._id} className="px-4 py-2.5">
|
||||||
|
<div className="flex items-start gap-3">
|
||||||
|
<select className="input w-28 shrink-0" value={s.type}
|
||||||
|
onChange={(e) => setSeg(s._id, { type: e.target.value })}>
|
||||||
|
<option value="narration">narration</option>
|
||||||
|
<option value="dialogue">dialogue</option>
|
||||||
|
</select>
|
||||||
|
<textarea className="input flex-1 min-h-[2.5rem] resize-y font-serif text-sm"
|
||||||
|
rows={Math.min(6, Math.ceil((s.text.length || 1) / 80))}
|
||||||
|
value={s.text}
|
||||||
|
onSelect={(e) => s.type === "dialogue" && setSel({
|
||||||
|
id: s._id, start: e.target.selectionStart, end: e.target.selectionEnd })}
|
||||||
|
onChange={(e) => setSeg(s._id, { text: e.target.value })} />
|
||||||
|
<input className="input w-40 shrink-0" list="speaker-list"
|
||||||
|
placeholder="locuteur"
|
||||||
|
value={s.speaker} disabled={s.type === "narration"}
|
||||||
|
onChange={(e) => setSeg(s._id, { speaker: e.target.value })} />
|
||||||
|
<div className="flex shrink-0 gap-1">
|
||||||
|
<button className="btn-ghost" title="Insérer après"
|
||||||
|
onClick={() => insertAfter(s._id)}>+</button>
|
||||||
|
<button className="btn-ghost" title="Supprimer"
|
||||||
|
onClick={() => removeSeg(s._id)}>✕</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Incises: portions of the line of dialogue read by the narrator */}
|
||||||
|
{s.type === "dialogue" && (incises.length > 0 || canMark) && (
|
||||||
|
<div className="mt-1.5 ml-[7.75rem] flex flex-wrap items-center gap-1.5">
|
||||||
|
<span className="text-[11px] uppercase tracking-wide text-ink-muted">incises</span>
|
||||||
|
{incises.map((inc, i) => (
|
||||||
|
<span key={i}
|
||||||
|
className="inline-flex items-center gap-1 rounded bg-ink-edge/40 px-1.5 py-0.5 text-xs"
|
||||||
|
title="Lu par la voix du narrateur">
|
||||||
|
<span className="text-ink-muted">🎙</span>
|
||||||
|
<span className="font-serif">{s.text.slice(inc.start, inc.end)}</span>
|
||||||
|
<button className="text-ink-muted hover:text-ink"
|
||||||
|
title="Retirer l'incise"
|
||||||
|
onClick={() => removeIncise(s._id, i)}>✕</button>
|
||||||
|
</span>
|
||||||
|
))}
|
||||||
|
{canMark && (
|
||||||
|
<button className="btn-ghost text-xs"
|
||||||
|
onClick={() => { addIncise(s._id, sel.start, sel.end);
|
||||||
|
setSel({ id: null, start: 0, end: 0 }); }}>
|
||||||
|
+ marquer la sélection
|
||||||
|
</button>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
); })}
|
||||||
|
<div className="px-4 py-2.5">
|
||||||
|
<button className="btn-ghost" onClick={() => insertAfter(null)}>+ ajouter un segment</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
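The incise handling in `AnalysisEditor.jsx` can be read as two small pure operations: keeping a sorted, non-overlapping list of `[start, end)` ranges, and cutting a line of dialogue into character-voice and narrator-voice parts. The sketch below is illustrative only. `addInciseRange` and `splitByIncises` are hypothetical names that do not appear in the commit, and the way the backend actually consumes `incises` is an assumption.

```javascript
// Hypothetical helpers, not part of the commit.

// Insert [start, end) into a list of incises; the result stays sorted and
// free of overlaps. A range that overlaps an already kept range is dropped.
function addInciseRange(incises, start, end) {
  if (!(end > start)) return incises.slice(); // ignore empty or inverted selections
  const sorted = [...incises, { start, end }].sort((a, b) => a.start - b.start);
  const kept = [];
  for (const inc of sorted) {
    const last = kept[kept.length - 1];
    // Compare against the last range actually kept, not the previous
    // element of the sorted array.
    if (!last || inc.start >= last.end) kept.push(inc);
  }
  return kept;
}

// Cut a line of dialogue into parts, tagging each with the voice that would
// read it. Assumes `incises` is sorted and non-overlapping.
function splitByIncises(text, incises) {
  const parts = [];
  let pos = 0;
  for (const { start, end } of incises) {
    if (start > pos) parts.push({ voice: "speaker", text: text.slice(pos, start) });
    parts.push({ voice: "narrator", text: text.slice(start, end) });
    pos = end;
  }
  if (pos < text.length) parts.push({ voice: "speaker", text: text.slice(pos) });
  return parts;
}
```

Keeping both helpers free of React state makes them easy to unit-test separately from the editor component.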
44
frontend/src/App.jsx
Normal file
@@ -0,0 +1,44 @@
|
|||||||
|
import React, { useState } from "react";
|
||||||
|
import Library from "./Library.jsx";
|
||||||
|
import BookView from "./BookView.jsx";
|
||||||
|
import Settings from "./Settings.jsx";
|
||||||
|
|
||||||
|
export default function App() {
|
||||||
|
// Allows opening a book directly via #slug (deep link).
|
||||||
|
const [slug, setSlug] = useState(
|
||||||
|
() => (location.hash ? decodeURIComponent(location.hash.slice(1)) : null)
|
||||||
|
);
|
||||||
|
const [showSettings, setShowSettings] = useState(false);
|
||||||
|
|
||||||
|
const goHome = () => { setShowSettings(false); setSlug(null); };
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="min-h-screen bg-ink-bg text-ink-text">
|
||||||
|
<header className="border-b border-ink-edge">
|
||||||
|
<div className="mx-auto flex max-w-6xl items-center gap-3 px-6 py-4">
|
||||||
|
<button onClick={goHome} className="flex items-center gap-2">
|
||||||
|
<span className="text-2xl">🖋️</span>
|
||||||
|
<span className="font-serif text-xl tracking-wide">
|
||||||
|
Ink<span className="text-ink-accent">Flow</span>
|
||||||
|
</span>
|
||||||
|
</button>
|
||||||
|
<span className="ml-2 hidden text-sm text-ink-muted sm:inline">
|
||||||
|
EPUB → livre audio · local · MLX
|
||||||
|
</span>
|
||||||
|
<button onClick={() => setShowSettings(true)} title="Réglages techniques"
|
||||||
|
className="ml-auto text-xl text-ink-muted hover:text-ink-text">⚙</button>
|
||||||
|
</div>
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<main className="mx-auto max-w-6xl px-6 py-8">
|
||||||
|
{showSettings ? (
|
||||||
|
<Settings onBack={goHome} />
|
||||||
|
) : slug ? (
|
||||||
|
<BookView slug={slug} onBack={() => setSlug(null)} />
|
||||||
|
) : (
|
||||||
|
<Library onOpen={setSlug} />
|
||||||
|
)}
|
||||||
|
</main>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
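`App.jsx` reads the initial slug from `location.hash` with `decodeURIComponent`, which throws a `URIError` on a malformed percent-escape. A minimal sketch of a more tolerant round trip follows. `slugFromHash` and `hashFromSlug` are hypothetical helpers, not part of the commit, and falling back to the raw text is only one possible policy.

```javascript
// Hypothetical helpers, not part of the commit.

// Read a slug from a URL hash such as "#mon%20livre".
function slugFromHash(hash) {
  if (!hash || hash === "#") return null;
  const raw = hash.slice(1);
  try {
    return decodeURIComponent(raw);
  } catch {
    // Malformed percent-escape: fall back to the raw text instead of crashing.
    return raw;
  }
}

// Build the hash for a slug; an empty or missing slug gives an empty hash.
function hashFromSlug(slug) {
  return slug ? "#" + encodeURIComponent(slug) : "";
}
```

In the component, this could be used as `useState(() => slugFromHash(location.hash))`, with `hashFromSlug` applied whenever a book is opened if the URL should stay shareable.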
99
frontend/src/BookView.jsx
Normal file
@@ -0,0 +1,99 @@
|
|||||||
|
import React, { useEffect, useState } from "react";
|
||||||
|
import { api, subscribeState } from "./api.js";
|
||||||
|
import { StatusChip, ProgressBar, Spinner } from "./ui.jsx";
|
||||||
|
import Chapters from "./Chapters.jsx";
|
||||||
|
import AnalysisEditor from "./AnalysisEditor.jsx";
|
||||||
|
import CastEditor from "./CastEditor.jsx";
|
||||||
|
import PronunciationEditor from "./PronunciationEditor.jsx";
|
||||||
|
|
||||||
|
const STAGES = [
|
||||||
|
{ key: "analyze", label: "Analyse", action: (s) => api.analyze(s), hint: "Découpe le texte, détecte les locuteurs et le casting." },
|
||||||
|
{ key: "cast", label: "Casting", action: (s) => api.castAuto(s), hint: "Attribue une voix à chaque personnage." },
|
||||||
|
{ key: "pronounce", label: "Prononciations", action: (s) => api.pronounce(s), hint: "Repère les mots à risque de mauvaise prononciation." },
|
||||||
|
];
|
||||||
|
|
||||||
|
export default function BookView({ slug, onBack }) {
|
||||||
|
const [data, setData] = useState(null);
|
||||||
|
const [state, setState] = useState(null);
|
||||||
|
const [tab, setTab] = useState("chapters");
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
api.getBook(slug).then((d) => { setData(d); setState(d.state); });
|
||||||
|
const unsub = subscribeState(slug, setState);
|
||||||
|
return unsub;
|
||||||
|
}, [slug]);
|
||||||
|
|
||||||
|
if (!data) return <p className="text-ink-muted"><Spinner /> chargement…</p>;
|
||||||
|
const { book } = data;
|
||||||
|
const st = state || data.state;
|
||||||
|
const busy = !!st.active_stage;
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-6">
|
||||||
|
<button onClick={onBack} className="text-sm text-ink-muted hover:text-ink-text">← Bibliothèque</button>
|
||||||
|
|
||||||
|
<div className="flex gap-5">
|
||||||
|
{book.cover_file && (
|
||||||
|
<img src={api.coverUrl(slug)} alt="" className="h-44 rounded-md border border-ink-edge object-cover" />
|
||||||
|
)}
|
||||||
|
<div className="flex-1">
|
||||||
|
<h1 className="font-serif text-2xl">{book.title}</h1>
|
||||||
|
<p className="text-ink-muted">{book.author}</p>
|
||||||
|
<p className="mt-1 text-sm text-ink-muted">{book.chapters.filter((c) => c.render).length} chapitres à narrer</p>
|
||||||
|
|
||||||
|
{busy && (
|
||||||
|
<div className="mt-4 max-w-md space-y-1">
|
||||||
|
<div className="flex justify-between text-xs text-ink-accent">
|
||||||
|
<span>{st.active_detail || st.active_stage}</span>
|
||||||
|
<span>{Math.round((st.active_progress || 0) * 100)}%</span>
|
||||||
|
</div>
|
||||||
|
<ProgressBar value={st.active_progress} />
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Pipeline */}
|
||||||
|
<div className="grid grid-cols-1 gap-3 sm:grid-cols-3">
|
||||||
|
{STAGES.map((stage) => {
|
||||||
|
const status = st.stages?.[stage.key] || "pending";
|
||||||
|
return (
|
||||||
|
<div key={stage.key} className="card p-4">
|
||||||
|
<div className="flex items-center justify-between">
|
||||||
|
<span className="font-medium">{stage.label}</span>
|
||||||
|
<StatusChip status={status} />
|
||||||
|
</div>
|
||||||
|
<p className="mt-1 text-xs text-ink-muted">{stage.hint}</p>
|
||||||
|
<button className="btn-ghost mt-3" disabled={busy}
|
||||||
|
onClick={() => stage.action(slug)}>
|
||||||
|
{status === "done" ? "Relancer" : "Lancer"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Tabs */}
|
||||||
|
<div className="flex gap-1 border-b border-ink-edge">
|
||||||
|
{[
|
||||||
|
["chapters", "Chapitres"],
|
||||||
|
["analysis", "Analyse"],
|
||||||
|
["cast", "Casting"],
|
||||||
|
["pron", "Prononciation"],
|
||||||
|
].map(([key, label]) => (
|
||||||
|
<button key={key} onClick={() => setTab(key)}
|
||||||
|
className={`px-4 py-2 text-sm ${tab === key
|
||||||
|
? "border-b-2 border-ink-accent text-ink-text"
|
||||||
|
: "text-ink-muted hover:text-ink-text"}`}>
|
||||||
|
{label}
|
||||||
|
</button>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{tab === "chapters" && <Chapters slug={slug} book={book} state={st} busy={busy} />}
|
||||||
|
{tab === "analysis" && <AnalysisEditor slug={slug} book={book} state={st} />}
|
||||||
|
{tab === "cast" && <CastEditor slug={slug} busy={busy} />}
|
||||||
|
{tab === "pron" && <PronunciationEditor slug={slug} />}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
119
frontend/src/CastEditor.jsx
Normal file
@@ -0,0 +1,119 @@
|
|||||||
|
import React, { useEffect, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { Spinner } from "./ui.jsx";
|
||||||
|
|
||||||
|
function VoiceSelect({ voices, value, onChange }) {
|
||||||
|
return (
|
||||||
|
<select className="input" value={value || ""} onChange={(e) => onChange(e.target.value)}>
|
||||||
|
<option value="">— aucune —</option>
|
||||||
|
{voices.map((v) => (
|
||||||
|
<option key={v.id} value={v.id}>
|
||||||
|
{v.label || v.id} ({v.gender === "male" ? "H" : v.gender === "female" ? "F" : "?"})
|
||||||
|
</option>
|
||||||
|
))}
|
||||||
|
</select>
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
export default function CastEditor({ slug, busy }) {
|
||||||
|
const [cast, setCast] = useState(null);
|
||||||
|
const [voices, setVoices] = useState([]);
|
||||||
|
const [saved, setSaved] = useState(false);
|
||||||
|
const [playing, setPlaying] = useState(null);
|
||||||
|
const [msg, setMsg] = useState(null);
|
||||||
|
const dedupPending = React.useRef(false);
|
||||||
|
|
||||||
|
const reload = () =>
|
||||||
|
api.getCast(slug).then((d) => { setCast(d.cast); setVoices(d.voicebank.entries); });
|
||||||
|
|
||||||
|
useEffect(() => { reload(); }, [slug]);
|
||||||
|
  // Reload the cast when a background job (dedup / chapter casting) finishes.
  // A single fetch updates the state and, after a dedup, the status message.
  useEffect(() => {
    if (busy) return;
    api.getCast(slug).then((d) => {
      setCast(d.cast);
      setVoices(d.voicebank.entries);
      if (dedupPending.current) {
        dedupPending.current = false;
        setMsg(`✓ déduplication terminée — ${d.cast.characters.length} personnages`);
      }
    });
  }, [busy]);
|
||||||
|
|
||||||
|
const dedup = async () => {
|
||||||
|
setMsg(null);
|
||||||
|
try {
|
||||||
|
dedupPending.current = true;
|
||||||
|
await api.castDedup(slug);
|
||||||
|
setMsg("Déduplication lancée…");
|
||||||
|
} catch (e) {
|
||||||
|
dedupPending.current = false;
|
||||||
|
setMsg("Échec : " + e + " (le serveur backend est-il à jour ? redémarre-le)");
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if (!cast) return <p className="text-ink-muted"><Spinner /> chargement du casting…</p>;
|
||||||
|
if (!cast.characters.length)
|
||||||
|
return <p className="text-ink-muted">Lancez d'abord l'<b>Analyse</b> puis le <b>Casting</b>.</p>;
|
||||||
|
|
||||||
|
const update = (patch) => { setCast({ ...cast, ...patch }); setSaved(false); };
|
||||||
|
const setChar = (name, voiceId) =>
|
||||||
|
update({ characters: cast.characters.map((c) => c.name === name ? { ...c, voice_id: voiceId } : c) });
|
||||||
|
|
||||||
|
  const preview = async (voiceId) => {
    if (!voiceId) return;
    setPlaying(voiceId);
    try {
      const url = await api.previewVoice(voiceId, "Bonjour, voici un aperçu de cette voix.");
      const a = new Audio(url);
      a.onended = () => setPlaying(null);
      a.onerror = () => setPlaying(null);
      // play() returns a promise; awaiting it lets the catch below reset the
      // indicator if playback is refused or the audio cannot be loaded.
      await a.play();
    } catch { setPlaying(null); }
  };
|
||||||
|
|
||||||
|
const save = async () => { await api.putCast(slug, cast); setSaved(true); };
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-4">
|
||||||
|
<div className="card flex items-center gap-3 p-3">
|
||||||
|
<span className="text-sm text-ink-muted">Narrateur</span>
|
||||||
|
<VoiceSelect voices={voices} value={cast.narrator_voice_id}
|
||||||
|
onChange={(v) => update({ narrator_voice_id: v })} />
|
||||||
|
<button className="btn-ghost" onClick={() => preview(cast.narrator_voice_id)}>
|
||||||
|
{playing === cast.narrator_voice_id ? "♪" : "▶"} écouter
|
||||||
|
</button>
|
||||||
|
<button className="btn-ghost ml-auto" disabled={busy}
|
||||||
|
title="Fusionne les variantes d'un même personnage (Holden / James Holden / James)"
|
||||||
|
onClick={dedup}>
|
||||||
|
{busy ? "…" : "Dédupliquer"}
|
||||||
|
</button>
|
||||||
|
<button className="btn-primary" onClick={save}>
|
||||||
|
{saved ? "✓ enregistré" : "Enregistrer"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{msg && <p className="px-1 text-sm text-ink-muted">{msg}</p>}
|
||||||
|
|
||||||
|
<div className="card divide-y divide-ink-edge">
|
||||||
|
{cast.characters.map((c) => (
|
||||||
|
<div key={c.name} className="flex items-center gap-3 px-4 py-2.5">
|
||||||
|
<div className="flex-1 min-w-0">
|
||||||
|
<p className="truncate font-serif text-sm">{c.name}</p>
|
||||||
|
{c.aliases?.length > 0 && (
|
||||||
|
<p className="truncate text-xs text-ink-muted">alias : {c.aliases.join(", ")}</p>
|
||||||
|
)}
|
||||||
|
{c.description && <p className="truncate text-xs text-ink-muted">{c.description}</p>}
|
||||||
|
</div>
|
||||||
|
<span className="chip bg-ink-edge text-ink-muted">
|
||||||
|
{c.gender === "male" ? "homme" : c.gender === "female" ? "femme" : "?"}
|
||||||
|
</span>
|
||||||
|
<VoiceSelect voices={voices} value={c.voice_id}
|
||||||
|
onChange={(v) => setChar(c.name, v)} />
|
||||||
|
<button className="btn-ghost" onClick={() => preview(c.voice_id)}>
|
||||||
|
{playing === c.voice_id ? "♪" : "▶"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
98
frontend/src/Chapters.jsx
Normal file
@@ -0,0 +1,98 @@
|
|||||||
|
import React, { useEffect, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { StatusChip, ProgressBar } from "./ui.jsx";
|
||||||
|
|
||||||
|
export default function Chapters({ slug, book, state, busy }) {
|
||||||
|
const chapters = book.chapters.filter((c) => c.render);
|
||||||
|
const [backend, setBackend] = useState("kokoro");
|
||||||
|
const [mono, setMono] = useState(false);
|
||||||
|
const [selected, setSelected] = useState(() => new Set());
|
||||||
|
|
||||||
|
// Initialise the engine from the default backend stored in the settings.
|
||||||
|
useEffect(() => {
|
||||||
|
api.getSettings().then((s) => s?.default_backend && setBackend(s.default_backend)).catch(() => {});
|
||||||
|
}, []);
|
||||||
|
|
||||||
|
const toggle = (idx) => {
|
||||||
|
const next = new Set(selected);
|
||||||
|
next.has(idx) ? next.delete(idx) : next.add(idx);
|
||||||
|
setSelected(next);
|
||||||
|
};
|
||||||
|
|
||||||
|
const renderChapters = (indexes) => {
|
||||||
|
if (!indexes.length) return;
|
||||||
|
api.render(slug, indexes, backend, mono);
|
||||||
|
};
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-4">
|
||||||
|
<div className="card flex flex-wrap items-center gap-3 p-3">
|
||||||
|
<label className="text-sm text-ink-muted">Moteur</label>
|
||||||
|
<select className="input" value={backend} onChange={(e) => setBackend(e.target.value)}>
|
||||||
|
<option value="kokoro">Kokoro (rapide)</option>
|
||||||
|
<option value="qwen3">Qwen3 (qualité + clonage)</option>
|
||||||
|
</select>
|
||||||
|
<label className="flex items-center gap-2 text-sm text-ink-muted">
|
||||||
|
<input type="checkbox" checked={mono} onChange={(e) => setMono(e.target.checked)} />
|
||||||
|
mono-narrateur
|
||||||
|
</label>
|
||||||
|
<div className="ml-auto flex gap-2">
|
||||||
|
<button className="btn-ghost" disabled={busy || !selected.size}
|
||||||
|
onClick={() => renderChapters([...selected])}>
|
||||||
|
Rendre la sélection ({selected.size})
|
||||||
|
</button>
|
||||||
|
<button className="btn-primary" disabled={busy}
|
||||||
|
onClick={() => renderChapters(chapters.map((c) => c.index))}>
|
||||||
|
Rendre tout
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div className="card divide-y divide-ink-edge">
|
||||||
|
{chapters.map((c) => {
|
||||||
|
const rs = state.render?.[c.index] || state.render?.[String(c.index)] || {};
|
||||||
|
const analyzed = (state.analyzed_chapters || []).includes(c.index);
|
||||||
|
return (
|
||||||
|
<div key={c.index} className="flex items-center gap-3 px-4 py-2.5">
|
||||||
|
<input type="checkbox" checked={selected.has(c.index)}
|
||||||
|
onChange={() => toggle(c.index)} />
|
||||||
|
<div className="w-9 text-center text-xs text-ink-muted">{c.index}</div>
|
||||||
|
<div className="flex-1 min-w-0">
|
||||||
|
<p className="truncate font-serif text-sm">{c.title}</p>
|
||||||
|
<div className="mt-0.5 flex items-center gap-2 text-xs text-ink-muted">
|
||||||
|
<span>{c.word_count} mots</span>
|
||||||
|
{c.pov && <span className="chip bg-ink-edge text-ink-muted">{c.pov}</span>}
|
||||||
|
{analyzed && <span className="text-emerald-400">analysé</span>}
|
||||||
|
</div>
|
||||||
|
{rs.status === "running" && (
|
||||||
|
<div className="mt-1.5 max-w-xs"><ProgressBar value={rs.progress} /></div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
{rs.status && <StatusChip status={rs.status} />}
|
||||||
|
{rs.mp3 && (
|
||||||
|
<>
|
||||||
|
<audio controls src={api.audioUrl(slug, c.index)} className="h-8" />
|
||||||
|
<a className="btn-ghost" href={api.audioUrl(slug, c.index)} download>↓</a>
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
{!busy && (
|
||||||
|
<>
|
||||||
|
<button className="btn-ghost" title={analyzed ? "Ré-analyser ce chapitre" : "Analyser ce chapitre"}
|
||||||
|
onClick={() => api.analyze(slug, [c.index])}>
|
||||||
|
{analyzed ? "Ré-analyser" : "Analyser"}
|
||||||
|
</button>
|
||||||
|
<button className="btn-ghost" title="Ré-analyser le casting de ce chapitre (sans re-segmenter)"
|
||||||
|
onClick={() => api.castAnalyze(slug, [c.index])}>
|
||||||
|
Casting
|
||||||
|
</button>
|
||||||
|
<button className="btn-ghost" title="Rendre ce chapitre"
|
||||||
|
onClick={() => renderChapters([c.index])}>▶</button>
|
||||||
|
</>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
80
frontend/src/Library.jsx
Normal file
@@ -0,0 +1,80 @@
|
|||||||
|
import React, { useEffect, useRef, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { Spinner } from "./ui.jsx";
|
||||||
|
|
||||||
|
export default function Library({ onOpen }) {
|
||||||
|
const [books, setBooks] = useState(null);
|
||||||
|
const [uploading, setUploading] = useState(false);
|
||||||
|
const [error, setError] = useState(null);
|
||||||
|
const fileRef = useRef();
|
||||||
|
|
||||||
|
const refresh = () => api.listBooks().then(setBooks).catch((e) => setError(String(e)));
|
||||||
|
useEffect(() => { refresh(); }, []);
|
||||||
|
|
||||||
|
const upload = async (file) => {
|
||||||
|
if (!file) return;
|
||||||
|
setUploading(true);
|
||||||
|
setError(null);
|
||||||
|
try {
|
||||||
|
const { slug } = await api.uploadBook(file);
|
||||||
|
await refresh();
|
||||||
|
onOpen(slug);
|
||||||
|
} catch (e) {
|
||||||
|
setError("Échec de l'import : " + e);
|
||||||
|
} finally {
|
||||||
|
setUploading(false);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-8">
|
||||||
|
<section
|
||||||
|
onDragOver={(e) => e.preventDefault()}
|
||||||
|
onDrop={(e) => { e.preventDefault(); upload(e.dataTransfer.files[0]); }}
|
||||||
|
className="card flex flex-col items-center justify-center gap-3 border-dashed py-12 text-center"
|
||||||
|
>
|
||||||
|
<div className="text-4xl">📖</div>
|
||||||
|
<p className="font-serif text-lg">Déposez un fichier EPUB</p>
|
||||||
|
<p className="text-sm text-ink-muted">ou</p>
|
||||||
|
<button className="btn-primary" disabled={uploading}
|
||||||
|
onClick={() => fileRef.current?.click()}>
|
||||||
|
{uploading ? <Spinner /> : null}
|
||||||
|
{uploading ? "Import en cours…" : "Choisir un fichier"}
|
||||||
|
</button>
|
||||||
|
<input ref={fileRef} type="file" accept=".epub" className="hidden"
|
||||||
|
onChange={(e) => upload(e.target.files[0])} />
|
||||||
|
</section>
|
||||||
|
|
||||||
|
{error && <p className="text-sm text-red-400">{error}</p>}
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h2 className="mb-3 font-serif text-lg text-ink-muted">Bibliothèque</h2>
|
||||||
|
{books === null ? (
|
||||||
|
<p className="text-ink-muted"><Spinner /> chargement…</p>
|
||||||
|
) : books.length === 0 ? (
|
||||||
|
<p className="text-ink-muted">Aucun livre pour l'instant.</p>
|
||||||
|
) : (
|
||||||
|
<div className="grid grid-cols-2 gap-4 sm:grid-cols-3 lg:grid-cols-4">
|
||||||
|
{books.map((b) => (
|
||||||
|
<button key={b.slug} onClick={() => onOpen(b.slug)}
|
||||||
|
className="card group overflow-hidden text-left transition-transform hover:-translate-y-1">
|
||||||
|
<div className="aspect-[2/3] w-full bg-ink-edge">
|
||||||
|
{b.cover && (
|
||||||
|
<img src={b.cover} alt="" className="h-full w-full object-cover" />
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
<div className="p-3">
|
||||||
|
<p className="line-clamp-2 font-serif text-sm">{b.title}</p>
|
||||||
|
<p className="mt-1 text-xs text-ink-muted">{b.author}</p>
|
||||||
|
<p className="mt-2 text-xs text-ink-accent">
|
||||||
|
{b.rendered}/{b.chapters} chapitres rendus
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
</button>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</section>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
59
frontend/src/PronunciationEditor.jsx
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
import React, { useEffect, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { Spinner } from "./ui.jsx";
|
||||||
|
|
||||||
|
export default function PronunciationEditor({ slug }) {
|
||||||
|
const [entries, setEntries] = useState(null);
|
||||||
|
const [saved, setSaved] = useState(false);
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
api.getPron(slug).then((d) => setEntries(d.entries || []));
|
||||||
|
}, [slug]);
|
||||||
|
|
||||||
|
if (entries === null) return <p className="text-ink-muted"><Spinner /> chargement…</p>;
|
||||||
|
|
||||||
|
const dirty = () => setSaved(false);
|
||||||
|
const setRow = (i, patch) => {
|
||||||
|
setEntries(entries.map((e, j) => (j === i ? { ...e, ...patch } : e)));
|
||||||
|
dirty();
|
||||||
|
};
|
||||||
|
const add = () => { setEntries([...entries, { term: "", replacement: "", enabled: true }]); dirty(); };
|
||||||
|
const remove = (i) => { setEntries(entries.filter((_, j) => j !== i)); dirty(); };
|
||||||
|
const save = async () => {
|
||||||
|
await api.putPron(slug, { entries: entries.filter((e) => e.term) });
|
||||||
|
setSaved(true);
|
||||||
|
};
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div className="space-y-4">
|
||||||
|
<div className="flex items-center gap-3">
|
||||||
|
<p className="text-sm text-ink-muted">
|
||||||
|
Corrigez la graphie des mots mal prononcés. La colonne « prononciation » remplace le terme avant la synthèse.
|
||||||
|
</p>
|
||||||
|
<button className="btn-ghost ml-auto" onClick={add}>+ ajouter</button>
|
||||||
|
<button className="btn-primary" onClick={save}>{saved ? "✓ enregistré" : "Enregistrer"}</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{entries.length === 0 ? (
|
||||||
|
<p className="text-ink-muted">Aucune entrée. Lancez l'étape <b>Prononciations</b> ou ajoutez-en.</p>
|
||||||
|
) : (
|
||||||
|
<div className="card divide-y divide-ink-edge">
|
||||||
|
<div className="grid grid-cols-[1fr_1fr_auto_auto] gap-3 px-4 py-2 text-xs uppercase text-ink-muted">
|
||||||
|
<span>Terme</span><span>Prononciation</span><span>Actif</span><span></span>
|
||||||
|
</div>
|
||||||
|
{entries.map((e, i) => (
|
||||||
|
<div key={i} className="grid grid-cols-[1fr_1fr_auto_auto] items-center gap-3 px-4 py-2">
|
||||||
|
<input className="input" value={e.term}
|
||||||
|
onChange={(ev) => setRow(i, { term: ev.target.value })} />
|
||||||
|
<input className="input" value={e.replacement}
|
||||||
|
onChange={(ev) => setRow(i, { replacement: ev.target.value })} />
|
||||||
|
<input type="checkbox" checked={e.enabled !== false}
|
||||||
|
onChange={(ev) => setRow(i, { enabled: ev.target.checked })} />
|
||||||
|
<button className="text-ink-muted hover:text-red-400" onClick={() => remove(i)}>✕</button>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
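The editor above stores `{ term, replacement, enabled }` rows and states that the replacement is substituted for the term before synthesis. How the backend performs that substitution is not shown in this file, so the following is only a sketch under stated assumptions: `applyPronunciations` and `escapeRegExp` are hypothetical names, matching is case-sensitive, and a term only matches when it is not glued to another letter or digit.

```javascript
// Hypothetical helpers, not part of the commit.

// Escape regex metacharacters so a term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every enabled term with its replacement in a single pass, so the
// output of one rule is never rewritten by another rule.
function applyPronunciations(text, entries) {
  const active = entries
    .filter((e) => e.term && e.enabled !== false)
    .sort((a, b) => b.term.length - a.term.length); // longest term wins
  if (active.length === 0) return text;

  const byTerm = new Map(active.map((e) => [e.term, e.replacement]));
  const alternation = active.map((e) => escapeRegExp(e.term)).join("|");
  // Unicode-aware boundaries: no letter or digit directly before or after.
  const re = new RegExp(
    `(?<![\\p{L}\\p{N}])(?:${alternation})(?![\\p{L}\\p{N}])`,
    "gu"
  );
  return text.replace(re, (match) => byTerm.get(match) ?? match);
}
```

The `enabled !== false` test mirrors the checkbox logic in the editor, where a row without an explicit `enabled` field counts as active.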
142
frontend/src/Settings.jsx
Normal file
@@ -0,0 +1,142 @@
|
|||||||
|
import React, { useEffect, useState } from "react";
|
||||||
|
import { api } from "./api.js";
|
||||||
|
import { Spinner } from "./ui.jsx";
|
||||||
|
|
||||||
|
// Declarative description of the fields, grouped by section.
|
||||||
|
const SECTIONS = [
|
||||||
|
{
|
||||||
|
title: "Modèles (identifiants MLX / HuggingFace)",
|
||||||
|
hint: "Changer un identifiant recharge un autre modèle (peut déclencher un téléchargement au prochain usage).",
|
||||||
|
fields: [
|
||||||
|
{ key: "gemma_model", label: "Gemma (analyse)", type: "text" },
|
||||||
|
{ key: "qwen3_model", label: "Qwen3-TTS (rendu)", type: "text" },
|
||||||
|
{ key: "kokoro_model", label: "Kokoro (preview)", type: "text" },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
title: "Génération Gemma",
|
||||||
|
hint: "Paramètres d'échantillonnage de l'analyse (locuteurs, personnages, prononciations).",
|
||||||
|
fields: [
|
||||||
|
{ key: "gemma_temperature", label: "Température", type: "number", step: 0.05, min: 0, max: 2 },
|
||||||
|
{ key: "gemma_max_tokens", label: "Max tokens", type: "number", step: 1, min: 64, max: 8192 },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
title: "Prompts système (analyse)",
|
||||||
|
hint: "Instructions envoyées à Gemma avant chaque tâche. Le modèle doit répondre en JSON.",
|
||||||
|
fields: [
|
||||||
|
{ key: "prompt_speakers", label: "Attribution des locuteurs", type: "textarea" },
|
||||||
|
{ key: "prompt_characters", label: "Extraction des personnages", type: "textarea" },
|
||||||
|
{ key: "prompt_pronunciation", label: "Mots à risque (prononciation)", type: "textarea" },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
title: "Casting (déduplication)",
|
||||||
|
hint: "Le rapprochement des variantes de noms (Holden / James Holden / James) est heuristique et sûr. La passe Gemma ajoute les variantes non évidentes (diminutifs, titres) mais, avec un petit modèle local, produit des fusions erronées.",
|
||||||
|
fields: [
|
||||||
|
{ key: "dedup_use_gemma", label: "Affiner la déduplication avec Gemma (moins sûr)", type: "checkbox" },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
title: "TTS (voix par défaut)",
|
||||||
|
hint: "Backend et voix utilisés par défaut pour le rendu et les replis.",
|
||||||
|
fields: [
|
||||||
|
{ key: "default_backend", label: "Backend par défaut", type: "select",
|
||||||
|
options: [["kokoro", "Kokoro (rapide)"], ["qwen3", "Qwen3 (qualité + clonage)"]] },
|
||||||
|
{ key: "language", label: "Langue (Qwen3)", type: "text" },
|
||||||
|
{ key: "kokoro_lang_code", label: "Code langue Kokoro", type: "text" },
|
||||||
|
{ key: "kokoro_default_voice", label: "Voix Kokoro par défaut", type: "text" },
|
||||||
|
{ key: "qwen3_default_voice", label: "Voix Qwen3 par défaut", type: "text" },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
title: "Audio (encodage final)",
|
||||||
|
hint: "Appliqué à la concaténation et à l'export MP3.",
|
||||||
|
fields: [
|
||||||
|
{ key: "target_sample_rate", label: "Sample rate (Hz)", type: "number", step: 1000, min: 8000, max: 48000 },
|
||||||
|
{ key: "mp3_bitrate", label: "Bitrate MP3", type: "text" },
|
||||||
|
{ key: "target_dbfs", label: "Normalisation (dBFS)", type: "number", step: 0.5, min: -40, max: 0 },
|
||||||
|
],
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
function Field({ field, value, onChange }) {
  const common = "input w-full";
  if (field.type === "checkbox")
    return <input type="checkbox" className="h-4 w-4"
      checked={!!value} onChange={(e) => onChange(e.target.checked)} />;
  if (field.type === "textarea")
    return <textarea className={`${common} min-h-[5rem] resize-y text-sm`} rows={4}
      value={value ?? ""} onChange={(e) => onChange(e.target.value)} />;
  if (field.type === "select")
    return <select className={common} value={value ?? ""} onChange={(e) => onChange(e.target.value)}>
      {field.options.map(([v, lbl]) => <option key={v} value={v}>{lbl}</option>)}
    </select>;
  if (field.type === "number")
    return <input className={common} type="number"
      step={field.step} min={field.min} max={field.max}
      value={value ?? ""} onChange={(e) => onChange(e.target.value === "" ? "" : Number(e.target.value))} />;
  return <input className={common} type="text"
    value={value ?? ""} onChange={(e) => onChange(e.target.value)} />;
}

export default function Settings({ onBack }) {
  const [settings, setSettings] = useState(null);
  const [saved, setSaved] = useState(false);
  const [error, setError] = useState(null);

  useEffect(() => {
    api.getSettings().then(setSettings).catch((e) => setError(String(e)));
  }, []);

  if (error) return <p className="text-sm text-red-400">{error}</p>;
  if (!settings) return <p className="text-ink-muted"><Spinner /> chargement des réglages…</p>;

  const set = (key, val) => { setSettings({ ...settings, [key]: val }); setSaved(false); };

  const save = async () => {
    setError(null);
    try { await api.putSettings(settings); setSaved(true); }
    catch (e) { setError("Échec de l'enregistrement : " + e); }
  };

  return (
    <div className="space-y-6">
      <div className="flex items-center gap-3">
        <button onClick={onBack} className="text-sm text-ink-muted hover:text-ink-text">← Bibliothèque</button>
        <h1 className="font-serif text-2xl">Réglages techniques</h1>
        <button className="btn-primary ml-auto" onClick={save}>
          {saved ? "✓ enregistré" : "Enregistrer"}
        </button>
      </div>

      <p className="text-sm text-ink-muted">
        Réglages globaux appliqués à toute l'app. Les changements de modèle prennent effet au
        prochain lancement d'analyse ou de rendu.
      </p>

      {SECTIONS.map((sec) => (
        <section key={sec.title} className="card p-4 space-y-3">
          <div>
            <h2 className="font-medium">{sec.title}</h2>
            {sec.hint && <p className="text-xs text-ink-muted">{sec.hint}</p>}
          </div>
          <div className="grid gap-3">
            {sec.fields.map((f) => (
              <label key={f.key} className="grid gap-1">
                <span className="text-sm text-ink-muted">{f.label}</span>
                <Field field={f} value={settings[f.key]} onChange={(v) => set(f.key, v)} />
              </label>
            ))}
          </div>
        </section>
      ))}

      <div className="flex justify-end">
        <button className="btn-primary" onClick={save}>
          {saved ? "✓ enregistré" : "Enregistrer"}
        </button>
      </div>
    </div>
  );
}
64
frontend/src/api.js
Normal file
@@ -0,0 +1,64 @@
// InkFlow API client: fetch wrappers + WebSocket subscription to book state.

async function j(url, opts) {
  const res = await fetch(url, opts);
  if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
  const ct = res.headers.get("content-type") || "";
  return ct.includes("application/json") ? res.json() : res;
}

const json = (method, body) => ({
  method,
  headers: { "Content-Type": "application/json" },
  body: body ? JSON.stringify(body) : undefined,
});

export const api = {
  listBooks: () => j("/api/books"),
  uploadBook: (file) => {
    const fd = new FormData();
    fd.append("file", file);
    return j("/api/books", { method: "POST", body: fd });
  },
  getBook: (slug) => j(`/api/books/${slug}`),
  getChapter: (slug, idx) => j(`/api/books/${slug}/chapters/${idx}`),
  putAnalysis: (slug, idx, analysis) =>
    j(`/api/books/${slug}/chapters/${idx}/analysis`, json("PUT", analysis)),
  analyze: (slug, chapters) => j(`/api/books/${slug}/analyze`, json("POST", { chapters })),
  pronounce: (slug) => j(`/api/books/${slug}/pronounce`, json("POST")),
  castAuto: (slug) => j(`/api/books/${slug}/cast/auto`, json("POST")),
  castAnalyze: (slug, chapters) =>
    j(`/api/books/${slug}/cast/analyze`, json("POST", { chapters })),
  castDedup: (slug) => j(`/api/books/${slug}/cast/dedup`, json("POST")),
  render: (slug, chapters, backend, mono) =>
    j(`/api/books/${slug}/render`, json("POST", { chapters, backend, mono })),
  getCast: (slug) => j(`/api/books/${slug}/cast`),
  putCast: (slug, cast) => j(`/api/books/${slug}/cast`, json("PUT", cast)),
  getPron: (slug) => j(`/api/books/${slug}/pronunciation`),
  putPron: (slug, pron) => j(`/api/books/${slug}/pronunciation`, json("PUT", pron)),
  getSettings: () => j("/api/settings"),
  putSettings: (settings) => j("/api/settings", json("PUT", settings)),
  audioUrl: (slug, idx) => `/api/books/${slug}/audio/${idx}`,
  coverUrl: (slug) => `/api/books/${slug}/cover`,
  previewVoice: async (voiceId, text) => {
    const res = await fetch("/api/voicebank/preview", json("POST", { voice_id: voiceId, text }));
    if (!res.ok) throw new Error("preview");
    return URL.createObjectURL(await res.blob());
  },
};

// Real-time subscription to a book's state. Reconnects automatically.
export function subscribeState(slug, onState) {
  let ws, closed = false;
  const connect = () => {
    const proto = location.protocol === "https:" ? "wss" : "ws";
    ws = new WebSocket(`${proto}://${location.host}/ws/${slug}`);
    ws.onmessage = (e) => {
      const msg = JSON.parse(e.data);
      if (msg.type === "state") onState(msg.state);
    };
    ws.onclose = () => { if (!closed) setTimeout(connect, 1500); };
  };
  connect();
  return () => { closed = true; ws && ws.close(); };
}
37
frontend/src/index.css
Normal file
@@ -0,0 +1,37 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

:root {
  color-scheme: dark;
}

body {
  margin: 0;
  background: #14110f;
  color: #ede4d8;
  font-family: system-ui, -apple-system, "Segoe UI", sans-serif;
}

@layer components {
  .btn {
    @apply inline-flex items-center gap-2 rounded-md px-3 py-1.5 text-sm font-medium
           transition-colors disabled:opacity-40 disabled:cursor-not-allowed;
  }
  .btn-primary {
    @apply btn bg-ink-accent text-ink-bg hover:bg-ink-accent2;
  }
  .btn-ghost {
    @apply btn border border-ink-edge text-ink-text hover:bg-ink-edge;
  }
  .card {
    @apply rounded-lg border border-ink-edge bg-ink-panel;
  }
  .chip {
    @apply inline-flex items-center rounded-full px-2 py-0.5 text-xs font-medium;
  }
  .input {
    @apply rounded-md border border-ink-edge bg-ink-bg px-2 py-1 text-sm
           text-ink-text outline-none focus:border-ink-accent;
  }
}
6
frontend/src/main.jsx
Normal file
@@ -0,0 +1,6 @@
import React from "react";
import { createRoot } from "react-dom/client";
import App from "./App.jsx";
import "./index.css";

createRoot(document.getElementById("root")).render(<App />);
35
frontend/src/ui.jsx
Normal file
@@ -0,0 +1,35 @@
// Small shared widgets.
import React from "react";

const STATUS_STYLE = {
  done: "bg-emerald-900/50 text-emerald-300",
  running: "bg-ink-accent/20 text-ink-accent",
  error: "bg-red-900/50 text-red-300",
  pending: "bg-ink-edge text-ink-muted",
};
const STATUS_LABEL = { done: "terminé", running: "en cours", error: "erreur", pending: "en attente" };

export function StatusChip({ status }) {
  return (
    <span className={`chip ${STATUS_STYLE[status] || STATUS_STYLE.pending}`}>
      {STATUS_LABEL[status] || status}
    </span>
  );
}

export function ProgressBar({ value }) {
  return (
    <div className="h-1.5 w-full overflow-hidden rounded-full bg-ink-edge">
      <div
        className="h-full bg-ink-accent transition-all duration-300"
        style={{ width: `${Math.round((value || 0) * 100)}%` }}
      />
    </div>
  );
}

export function Spinner() {
  return (
    <span className="inline-block h-3.5 w-3.5 animate-spin rounded-full border-2 border-ink-accent border-t-transparent" />
  );
}
23
frontend/tailwind.config.js
Normal file
@@ -0,0 +1,23 @@
/** @type {import('tailwindcss').Config} */
export default {
  content: ["./index.html", "./src/**/*.{js,jsx}"],
  theme: {
    extend: {
      colors: {
        ink: {
          bg: "#14110f",
          panel: "#1d1916",
          edge: "#2c2622",
          muted: "#9a8c7d",
          text: "#ede4d8",
          accent: "#d9a441",
          accent2: "#b9763f",
        },
      },
      fontFamily: {
        serif: ["Georgia", "Cambria", "serif"],
      },
    },
  },
  plugins: [],
};
14
frontend/vite.config.js
Normal file
@@ -0,0 +1,14 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

// In dev, the UI runs on port 5173 and proxies API/WS requests to the backend (8000).
export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    proxy: {
      "/api": { target: "http://127.0.0.1:8000", changeOrigin: true },
      "/ws": { target: "ws://127.0.0.1:8000", ws: true },
    },
  },
});
BIN
voicebank/clips/f_bella.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/f_emma.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/f_heart.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/f_nicole.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/fr_f_siwis.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_eric.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_fenrir.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_george.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_lewis.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_michael.wav
Normal file
Binary file not shown.
BIN
voicebank/clips/m_santa.wav
Normal file
Binary file not shown.
114
voicebank/metadata.json
Normal file
@@ -0,0 +1,114 @@
{
  "entries": [
    {
      "id": "fr_f_siwis",
      "kokoro_voice": "ff_siwis",
      "gender": "female",
      "age": "adult",
      "lang": "fr",
      "label": "Siwis (FR)",
      "ref_audio": "clips/fr_f_siwis.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "f_bella",
      "kokoro_voice": "af_bella",
      "gender": "female",
      "age": "adult",
      "lang": "fr",
      "label": "Bella",
      "ref_audio": "clips/f_bella.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "f_heart",
      "kokoro_voice": "af_heart",
      "gender": "female",
      "age": "young",
      "lang": "fr",
      "label": "Heart",
      "ref_audio": "clips/f_heart.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "f_emma",
      "kokoro_voice": "bf_emma",
      "gender": "female",
      "age": "adult",
      "lang": "fr",
      "label": "Emma",
      "ref_audio": "clips/f_emma.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "f_nicole",
      "kokoro_voice": "af_nicole",
      "gender": "female",
      "age": "adult",
      "lang": "fr",
      "label": "Nicole",
      "ref_audio": "clips/f_nicole.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_fenrir",
      "kokoro_voice": "am_fenrir",
      "gender": "male",
      "age": "adult",
      "lang": "fr",
      "label": "Fenrir",
      "ref_audio": "clips/m_fenrir.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_michael",
      "kokoro_voice": "am_michael",
      "gender": "male",
      "age": "adult",
      "lang": "fr",
      "label": "Michael",
      "ref_audio": "clips/m_michael.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_george",
      "kokoro_voice": "bm_george",
      "gender": "male",
      "age": "adult",
      "lang": "fr",
      "label": "George",
      "ref_audio": "clips/m_george.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_lewis",
      "kokoro_voice": "bm_lewis",
      "gender": "male",
      "age": "adult",
      "lang": "fr",
      "label": "Lewis",
      "ref_audio": "clips/m_lewis.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_eric",
      "kokoro_voice": "am_eric",
      "gender": "male",
      "age": "young",
      "lang": "fr",
      "label": "Eric",
      "ref_audio": "clips/m_eric.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    },
    {
      "id": "m_santa",
      "kokoro_voice": "am_santa",
      "gender": "male",
      "age": "old",
      "lang": "fr",
      "label": "Santa",
      "ref_audio": "clips/m_santa.wav",
      "ref_text": "L'univers est toujours plus étrange qu'on ne le croit. Chaque nouvelle merveille pose les bases d'une découverte plus éblouissante encore."
    }
  ]
}