fix: use wrong sample_rate #5

Merged
Moskize91 merged 3 commits into main from sample
Sep 25, 2025

Conversation

@Moskize91
Contributor

No description provided.

@coderabbitai

coderabbitai bot commented Sep 25, 2025

Summary by CodeRabbit

  • New Features
    • Automatic sample-rate alignment across all generated audio segments for consistent playback without manual configuration.
  • Bug Fixes
    • Resolves mismatched audio rates between segments that could cause playback artifacts.
  • Tests
    • Simplified Chapter TTS tests to run without external services.
    • Added end-to-end M4B generation tests, including multiple voice variations, executed when credentials are available.
  • Chores
    • Added a new dependency to support audio resampling.

Walkthrough

  • epub2speech/chapter_tts.py: Removed sample_rate from ChapterTTS.__init__; audio processing now infers the final sample rate from the first segment. Subsequent segments are resampled to the first segment's rate using scipy.signal.resample. audio_segments is collected as (array, rate) tuples, with typing hints added. Final output is written at the first segment's rate.
  • pyproject.toml: Added scipy (>=1.10.0,<2.0.0) dependency.
  • tests/test_chapter_tts.py: Removed Azure-backed integration test; kept and refactored CJK sentence-splitting using a MockTTSProtocol and new temp paths.
  • tests/test_m4b_generation.py: Added new integration tests for EPUB-to-M4B conversion via Azure TTS, including voice variants and file existence/size checks.
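
The alignment logic described in the chapter_tts.py bullet can be sketched as follows; the segment data here is synthetic and the variable names are illustrative, not the module's exact code:

```python
import numpy as np
from scipy.signal import resample

# Hypothetical segments at mixed rates: (samples, sample_rate) tuples
audio_segments = [
    (np.zeros(24000, dtype=np.float32), 24000),  # 1 s at 24 kHz
    (np.zeros(16000, dtype=np.float32), 16000),  # 1 s at 16 kHz
]

# The first segment's rate becomes the target for the whole chapter
_, first_sample_rate = audio_segments[0]

resampled_segments = []
for audio_data, sample_rate in audio_segments:
    if sample_rate != first_sample_rate:
        # Scale the length so the segment's duration is preserved
        new_len = int(len(audio_data) * first_sample_rate / sample_rate)
        audio_data = resample(audio_data, new_len)
    resampled_segments.append(audio_data)

final_audio = np.concatenate(resampled_segments)
print(len(final_audio), first_sample_rate)  # 48000 24000
```

Both one-second segments end up at 24 kHz, so the concatenated chapter is two seconds of audio at a single rate.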

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request has no description, so there is no author-provided context or summary of the changes to confirm their relevance or purpose. Please add a concise description of the pull request that outlines its objectives and summarizes the key changes, such as the removal of the sample_rate parameter and the introduction of dynamic resampling.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title follows the required <type>: <subject> format and references the sample_rate change, but it only partially summarizes the main fix since it does not clearly describe the removal of the sample_rate parameter and the new dynamic resampling logic.


@Moskize91 Moskize91 merged commit dd44e9e into main Sep 25, 2025
2 of 3 checks passed
@Moskize91 Moskize91 deleted the sample branch September 25, 2025 12:01

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
epub2speech/chapter_tts.py (2)

7-7: Prefer resample_poly for better quality and speed.

scipy.signal.resample (FFT-based) can introduce ringing and is computationally heavier. resample_poly generally yields better quality and performance for arbitrary rate ratios.

Apply this diff to switch to polyphase resampling:

-from scipy.signal import resample
+from scipy.signal import resample_poly

And replace the resampling call inside the loop:

-                        target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
-                        resampled_audio = resample(audio_data, target_len, axis=0)
+                        # polyphase resampling with integer factors
+                        from math import gcd
+                        g = gcd(first_sample_rate, sample_rate)
+                        up = first_sample_rate // g
+                        down = sample_rate // g
+                        resampled_audio = resample_poly(audio_data, up, down, axis=0)
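
As a quick sanity check of the polyphase approach suggested above (the rates here are examples, not values from the PR):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

first_sample_rate, sample_rate = 24000, 16000        # example rates
audio_data = np.zeros((16000, 2), dtype=np.float32)  # 1 s of stereo at 16 kHz

# Reduce the rate ratio to coprime integer up/down factors for resample_poly
g = gcd(first_sample_rate, sample_rate)
up, down = first_sample_rate // g, sample_rate // g  # 3, 2
resampled_audio = resample_poly(audio_data, up, down, axis=0)
print(resampled_audio.shape)  # (24000, 2)
```

16000 × 3 / 2 = 24000 samples, i.e. the same one second of audio at the target rate, with channel layout preserved via axis=0.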

54-56: Consistency: prefer builtin generics or typing generics uniformly.

The file mixes list[...] and List[...]. Consider standardizing on builtin generics (Python 3.10+).

tests/test_chapter_tts.py (2)

76-82: Assert output was created and encoded as expected.

Strengthen the test by verifying the WAV exists, is non-empty, and uses the first segment’s sample rate (24k).

Apply this diff:

             output_path = output_dir / f"mock_{test_case['name'].replace(' ', '_').lower()}.wav"
             chapter_tts.process_chapter(
                 text=test_case['text'],
                 output_path=output_path,
                 workspace_path=temp_dir,
                 voice="mock-voice"
             )
+            # Validate output
+            self.assertTrue(output_path.exists(), f"Output not created: {output_path}")
+            import soundfile as sf
+            data, sr = sf.read(output_path)
+            self.assertGreater(len(data), 0, "Output audio is empty")
+            self.assertEqual(sr, 24000, "Final output should use the first segment's sample rate")

63-67: Clean up temp artifacts in tests.

Use TemporaryDirectory or remove created files on success to avoid polluting the repo.
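
One way to do that, sketched with a stand-in for the real TTS output (the class and file names here are illustrative):

```python
import tempfile
import unittest
from pathlib import Path


class TempDirExample(unittest.TestCase):
    def test_output_is_cleaned_up(self):
        # TemporaryDirectory deletes its contents on exit, so nothing
        # is left behind in the repo even when the assertion fails.
        with tempfile.TemporaryDirectory() as tmp:
            output_path = Path(tmp) / "mock_output.wav"
            output_path.write_bytes(b"\x00")  # stand-in for real TTS output
            self.assertTrue(output_path.exists())


if __name__ == "__main__":
    unittest.main()
```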

tests/test_m4b_generation.py (2)

1-1: Remove shebang or make file executable.

Shebang in a non-executable test triggers EXE001. Easiest fix: remove it.

Apply this diff:

-#!/usr/bin/env python3

21-26: Skip messaging is fine; consider gating via env var too.

Optionally add an env flag (e.g., RUN_AZURE_TTS_TESTS) to skip quickly in CI without filesystem checks.
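
A minimal sketch of that gate; RUN_AZURE_TTS_TESTS is the flag name floated above and the test body is a placeholder, not the PR's actual test:

```python
import os
import unittest

RUN_AZURE_TTS_TESTS = os.environ.get("RUN_AZURE_TTS_TESTS") == "1"


@unittest.skipUnless(RUN_AZURE_TTS_TESTS, "set RUN_AZURE_TTS_TESTS=1 to run Azure TTS tests")
class M4BGenerationTests(unittest.TestCase):
    def test_basic_conversion(self):
        # The real test would call the EPUB-to-M4B conversion here.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

In CI without the flag, the whole class is reported as skipped immediately, with no filesystem or credential checks performed.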

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2daac05 and 397a110.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • epub2speech/chapter_tts.py (3 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_chapter_tts.py (1 hunks)
  • tests/test_m4b_generation.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_m4b_generation.py (3)
epub2speech/convertor.py (1)
  • convert_epub_to_m4b (121-139)
epub2speech/tts/azure_provider.py (1)
  • AzureTextToSpeech (10-71)
tests/utils/config.py (2)
  • TTSConfig (9-43)
  • get_azure_config (29-30)
🪛 Ruff (0.13.1)
tests/test_m4b_generation.py

1-1: Shebang is present but file is not executable

(EXE001)

🔇 Additional comments (2)
epub2speech/chapter_tts.py (1)

41-48: Signature change safe: no remaining sample_rate args
Verified no calls to ChapterTTS(...) include a sample_rate parameter in tests or production code.

pyproject.toml (1)

36-36: No change needed: existing SciPy constraints ensure prebuilt wheels for Python 3.12/3.13.
Pip will install the latest SciPy (<2.0.0)—SciPy 1.15.x provides wheels for both Python 3.12 and 3.13.

Comment on lines +78 to 91
if audio_segments:
_, first_sample_rate = audio_segments[0]
resampled_segments = []
for audio_data, sample_rate in audio_segments:
if sample_rate != first_sample_rate:
resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
resampled_audio = resample(audio_data, resampled_length)
resampled_segments.append(resampled_audio)
else:
resampled_segments.append(audio_data)

final_audio = np.concatenate(resampled_segments)
sf.write(output_path, final_audio, first_sample_rate)


⚠️ Potential issue | 🟠 Major

Channel mismatch can break concatenation; align channels before resampling/concat.

If some segments are mono and others stereo, np.concatenate will fail or produce inconsistent results. Align to the first segment’s channel count before resampling.

Apply this diff to make channel count consistent and specify resampling axis:

-            if audio_segments:
-                _, first_sample_rate = audio_segments[0]
-                resampled_segments = []
-                for audio_data, sample_rate in audio_segments:
-                    if sample_rate != first_sample_rate:
-                        resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
-                        resampled_audio = resample(audio_data, resampled_length)
-                        resampled_segments.append(resampled_audio)
-                    else:
-                        resampled_segments.append(audio_data)
-
-                final_audio = np.concatenate(resampled_segments)
-                sf.write(output_path, final_audio, first_sample_rate)
+            if audio_segments:
+                first_audio, first_sample_rate = audio_segments[0]
+                first_channels = 1 if first_audio.ndim == 1 else first_audio.shape[1]
+
+                resampled_segments: list[np.ndarray] = []
+                for audio_data, sample_rate in audio_segments:
+                    # Channel align to match the first segment
+                    if audio_data.ndim == 1 and first_channels > 1:
+                        audio_data = np.repeat(audio_data[:, None], first_channels, axis=1)
+                    elif audio_data.ndim == 2:
+                        ch = audio_data.shape[1]
+                        if ch != first_channels:
+                            if first_channels == 1:
+                                audio_data = audio_data.mean(axis=1)  # downmix to mono
+                            else:
+                                # upmix by duplicating the first channel
+                                audio_data = np.tile(audio_data[:, [0]], (1, first_channels))
+
+                    # Resample if needed
+                    if sample_rate != first_sample_rate:
+                        target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
+                        resampled_audio = resample(audio_data, target_len, axis=0)
+                        resampled_segments.append(resampled_audio.astype(np.float32, copy=False))
+                    else:
+                        resampled_segments.append(audio_data.astype(np.float32, copy=False))
+
+                final_audio = np.concatenate(resampled_segments, axis=0)
+                sf.write(output_path, final_audio, first_sample_rate)
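
A quick check of the upmix/downmix steps used in the diff above, on tiny synthetic arrays:

```python
import numpy as np

mono = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# Upmix mono -> stereo by duplicating the channel, as in the fix above
stereo = np.repeat(mono[:, None], 2, axis=1)
print(stereo.shape)  # (3, 2)

# Downmix stereo -> mono by averaging channels; duplicated channels
# average back to the original signal
downmixed = stereo.mean(axis=1)
print(np.allclose(downmixed, mono))  # True
```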

