Experiment Constructor

This page walks you through assembling a complete ASR experiment step by step -- from choosing your data to ready-to-run code you can copy and execute.

If you already know what you need, skip the explanations and jump straight to the interactive builder at the bottom.


Step 1. What data will you measure on?

Before picking models and metrics, you need to decide what data you will evaluate on.

Built-in datasets

plantain2asr ships with loaders for several Russian speech corpora. Each loader parses the corpus structure automatically and provides a uniform AudioSample interface.

Golos

An open-source corpus by Sber. ~1 200 hours of Russian speech. Two subsets:

  • crowd -- crowdsourced recordings (clean, diverse speakers)
  • farfield -- far-field microphone recordings (noisier, more realistic)

  • Size: ~1 200 h
  • Audio format: WAV / OGG
  • Download: github.com/sberdevices/golos
  • Loader: GolosDataset("data/golos")
  • Auto-download: yes (auto_download=True)
from plantain2asr import GolosDataset

ds = GolosDataset("data/golos")
crowd = ds.filter(lambda s: s.meta["subset"] == "crowd")

DaGRuS

A conversational Russian speech corpus with detailed annotations: laughter, noise, unclear words, fillers.

  • Size: ~60 h
  • Key feature: conversational speech, event annotations
  • Download: available on request from the corpus authors
  • Loader: DagrusDataset("data/dagrus")

Normalization for DaGRuS

Use DagrusNormalizer() -- it knows how to strip corpus-specific annotations ([laugh], [noise], {word*}) and normalize colloquial forms.

from plantain2asr import DagrusDataset, DagrusNormalizer

ds = DagrusDataset("data/dagrus")
norm = ds >> DagrusNormalizer()
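For intuition, the annotation stripping can be sketched in plain Python with two regular expressions. This is a rough approximation under assumed markup rules, not the library's actual implementation -- the exact behavior belongs to DagrusNormalizer():

```python
import re

def strip_dagrus_markup(text: str) -> str:
    """Sketch: drop event tags like [laugh] / [noise] and
    unwrap {word*} annotations, keeping the word itself."""
    text = re.sub(r"\[[^\]]+\]", " ", text)         # [laugh] -> removed
    text = re.sub(r"\{([^}*]+)\*?\}", r"\1", text)  # {word*} -> word
    return " ".join(text.split())                   # collapse whitespace

print(strip_dagrus_markup("[laugh] привет {мир*}"))  # -> "привет мир"
```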

RuDevices

A corpus of recordings from various devices (laptops, phones, smart speakers).

  • Loader: RuDevicesDataset("data/rudevices")
  • Key feature: different devices and recording conditions

from plantain2asr import RuDevicesDataset

ds = RuDevicesDataset("data/rudevices")

Using your own data

If your data is not covered by the built-in loaders, there are two paths.

Path 1: NeMo-format JSONL

If you have audio files and a JSONL manifest, use NeMoDataset:

{"audio_filepath": "audio/001.wav", "text": "hello world", "duration": 2.1}
{"audio_filepath": "audio/002.wav", "text": "how are you", "duration": 1.8}

from plantain2asr import NeMoDataset

ds = NeMoDataset(root_dir="data/my_corpus", manifest_path="data/my_corpus/manifest.jsonl")
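If you need to build such a manifest programmatically, a few lines of standard-library Python are enough (the file names and texts here are illustrative):

```python
import json

samples = [
    {"audio_filepath": "audio/001.wav", "text": "hello world", "duration": 2.1},
    {"audio_filepath": "audio/002.wav", "text": "how are you", "duration": 1.8},
]

# one JSON object per line, non-ASCII text written as-is
with open("manifest.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```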

Path 2: custom loader class

Subclass BaseASRDataset and return a list of AudioSample:

from plantain2asr.dataloaders.base import BaseASRDataset
from plantain2asr.dataloaders.types import AudioSample

class MyDataset(BaseASRDataset):
    def __init__(self, root_dir):
        super().__init__()
        self.name = "my-dataset"
        self._samples = [
            AudioSample(id="s1", audio_path=f"{root_dir}/001.wav", text="reference text"),
        ]

More details: Extending -> Custom Model


Step 2. Which metrics do you need?

Metrics quantify how well a model recognizes speech.

Core metrics

  • WER (Word Error Rate) -- fraction of erroneous words; counts insertions, deletions, and substitutions at the word level. The universal primary metric -- always include it.
  • CER (Character Error Rate) -- the same idea at the character level. Use when spelling accuracy matters, not just words.
  • MER (Match Error Rate) -- a normalized variant of WER that accounts for both string lengths. More stable on short utterances.
  • Accuracy -- 1 - MER, the fraction of correctly recognized content. Use when you want an intuitive "percent correct" number.
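To make WER concrete, here is a minimal self-contained computation -- plantain2asr computes this for you, so the sketch is for intuition only. WER is the word-level edit distance (insertions + deletions + substitutions) divided by the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # classic dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("мама мыла раму", "мама мыла рамы"))  # 1 substitution out of 3 words -> 0.333...
```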

Additional metrics

  • WIL -- Word Information Lost
  • WIP -- Word Information Preserved
  • IDR -- Insertion / Deletion Ratio
  • LengthRatio -- hypothesis length divided by reference length
  • BERTScore -- semantic similarity via BERT embeddings (requires the analysis extra)
  • POSAnalysis -- POS-tag error analysis (requires the analysis extra)

What should I choose?

Recommendation

For a first evaluation, use Metrics.composite() -- it computes WER, CER, MER, WIL, WIP, Accuracy, IDR, and LengthRatio in a single pass.

from plantain2asr import Metrics

norm >> Metrics.composite()

If you only need one metric:

norm >> Metrics.WER()

Step 3. Which models to compare?

plantain2asr supports several ASR model families. They all share the same interface: dataset >> Models.XXX().

Local models

  • GigaAM v3 -- large Sber model, e2e-RNNT architecture; best Russian quality. Device: CUDA / MPS / CPU. Extra: gigaam. Choose when quality matters and you have a GPU.
  • GigaAM v2 -- previous GigaAM generation. Device: CUDA / MPS / CPU. Extra: gigaam. For comparison with v3.
  • Whisper -- OpenAI model, large-v3; strong multilingual baseline. Device: CUDA / MPS / CPU. Extra: whisper. A universal baseline.
  • T-One -- T-Bank model on ONNX Runtime; fast inference. Device: CUDA / CPU. Extra: tone + the T-One source archive. Choose when speed matters.
  • Vosk -- lightweight offline model on Kaldi. Device: CPU only. Extra: vosk. Choose when you have no GPU and need offline recognition.
  • Canary -- NVIDIA NeMo Canary; heavy, requires a GPU. Device: CUDA. Extra: canary. For research comparisons.

Cloud models

  • SaluteSpeech -- Sber cloud API. Extra: none. For cloud-based recognition.

Installation

Each model requires its own set of dependencies. Install only what you need:

pip install plantain2asr[gigaam]
pip install plantain2asr[whisper]
pip install plantain2asr[vosk]
pip install plantain2asr[tone]
pip install "tone @ https://github.com/voicekit-team/T-one/archive/3c5b6c015038173840e62cea99e10cdb1c759116.tar.gz"

Or the full CPU/GPU stack at once:

pip install plantain2asr[asr-cpu]
pip install plantain2asr[asr-gpu]

Running models

from plantain2asr import Models

ds >> Models.GigaAM_v3()
ds >> Models.Whisper()
ds >> Models.Vosk(model_path="path/to/vosk-model")

Results are cached: re-running skips already processed samples.


Step 4. Text normalization

Before computing metrics, you need to bring references and hypotheses to a common form: remove punctuation, normalize case, handle corpus-specific markup.

  • SimpleNormalizer() -- lowercase, strip punctuation, ё -> е, collapse whitespace. Use for most corpora.
  • DagrusNormalizer() -- everything SimpleNormalizer does, plus stripping DaGRuS markup and normalizing colloquial forms. Use for the DaGRuS corpus.
  • No normalization -- metrics are computed on raw text. Only if your texts are already normalized.

from plantain2asr import SimpleNormalizer

norm = ds >> SimpleNormalizer()
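Roughly, the transformation SimpleNormalizer applies can be pictured like this -- an approximation for intuition, not the library's actual code:

```python
import re

def simple_normalize(text: str) -> str:
    text = text.lower().replace("ё", "е")  # lowercase, ё -> е
    text = re.sub(r"[^\w\s]", " ", text)   # strip punctuation
    return " ".join(text.split())          # collapse whitespace

print(simple_normalize("Ёлка, привет -- МИР!"))  # -> "елка привет мир"
```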

Step 5. Assemble the >> chain

Now that you have chosen data, models, normalizer, and metrics, assemble them into a pipeline using the >> operator:

from plantain2asr import GolosDataset, Models, SimpleNormalizer, Metrics

ds = GolosDataset("data/golos")

# step 1: run models
ds >> Models.GigaAM_v3()
ds >> Models.Whisper()

# step 2: normalize
norm = ds >> SimpleNormalizer()

# step 3: compute metrics
norm >> Metrics.composite()

# step 4: view results
df = norm.to_pandas()
print(df.groupby("model")[["WER", "CER"]].mean().sort_values("WER"))

Each >> creates a new results layer on top of the dataset. You can branch (.filter()), subsample (.take(n)), and recombine at any point.

Experiment convenience wrapper

If you don't need manual control, Experiment wraps the same >> steps:

from plantain2asr import Experiment, GolosDataset, Models, SimpleNormalizer

experiment = Experiment(
    dataset=GolosDataset("data/golos"),
    models=[Models.GigaAM_v3(), Models.Whisper()],
    normalizer=SimpleNormalizer(),
)

experiment.compare_on_corpus(metrics=["WER", "CER", "Accuracy"])

  • compare_on_corpus() -- model comparison with a metric table
  • prepare_thesis_tables() -- CSV tables for a thesis or paper
  • export_appendix_bundle() -- full package: tables + report + benchmark
  • benchmark_models() -- latency, throughput, and RTF measurements
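RTF (real-time factor) is processing time divided by audio duration, so RTF < 1 means faster than real time. A quick sketch of the arithmetic (the numbers are made up):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration; lower is faster."""
    return processing_seconds / audio_seconds

# e.g. 12 s of compute for a 60 s clip: 5x faster than real time
print(real_time_factor(12.0, 60.0))  # -> 0.2
```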

Interactive Builder

Pick your components below, and the builder will show you ready-to-use code, the install command, and a list of output artifacts.

1 What result do you need?
2 Which dataset?
3 Which models?
4 Normalizer
5 Metrics
6 Additional outputs



What's next?