Adding Text-to-Speech to Your Blog: Building an OpenAI TTS Pipeline with Smart Chunking and AWS S3



Intro

If you're reading this post, you probably want to add audio versions to your blog posts. Perhaps you've noticed more sites offering "listen to this article" features, or maybe you just want to make your content more accessible.

Whatever your reason, I'll show you exactly how I built a complete text-to-speech pipeline that automatically generates high-quality audio for every post on this blog—including the one you're reading right now.

This post assumes the following of you:

  • You have a Node.js-based blog or can integrate Node scripts into your build process
  • You have an OpenAI API key (for their TTS service)
  • You have an AWS account with S3 access
  • You're comfortable with basic command-line tools
  • You want professional-quality audio without manual recording

Alright, let's get to it.

The Architecture

Here's how the pipeline works end-to-end:

  1. 📄 Markdown post — the source .md file
  2. 📝 Text extraction — remove code, images, links
  3. 🧠 NLP processing — expand "I've" to "I have", etc.
  4. ✂️ Chunking logic — if the text exceeds 4,096 characters, split by paragraphs/sentences
  5. 🎵 OpenAI TTS API — generate an MP3 per chunk
  6. 🔗 FFmpeg concat — if there are multiple chunks, merge them into one file
  7. ☁️ S3 upload and manifest update
  8. ▶️ Audio player UI — React + HTML5

The beauty of this system is that it's fully automated. Write a post, run the build, and audio appears. No manual steps, no recording equipment, just code.
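At a high level, the orchestration is simple enough to sketch. Here's a minimal, hypothetical planner that decides which stages a given post needs; the stage names and threshold mirror the pipeline described above, but this is an illustration rather than the actual build script:

```javascript
// Sketch of a per-post pipeline plan (hypothetical helper, not the real script).
// Returns the ordered list of stages a post of the given length goes through.
function planPipeline(charCount, maxChars = 4096) {
  const stages = ['extract-text', 'nlp-normalize'];
  if (charCount > maxChars) {
    stages.push('chunk'); // split into <= maxChars pieces
  }
  stages.push('tts-generate');
  if (charCount > maxChars) {
    stages.push('ffmpeg-concat'); // merge the chunk MP3s back together
  }
  stages.push('s3-upload', 'manifest-update');
  return stages;
}

console.log(planPipeline(1200));  // short post: no chunking needed
console.log(planPipeline(43138)); // long post: chunk, then concatenate
```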

Text Processing: Making Markdown Sound Natural

The first challenge is that blog posts aren't written to be read aloud. They contain:

  • Code blocks that shouldn't be narrated
  • Abbreviations like "API" or "AWS"
  • Special formatting like $53k or "Dec 2021"
  • Emojis and special characters
  • Links and images

Here's how I handle text extraction using the Compromise NLP library (full source):

````javascript
const nlp = require('compromise');

// Extract and normalize text content from markdown
function extractTextFromMarkdown(markdown) {
  // Remove frontmatter
  let content = markdown.replace(/^---[\s\S]*?---\n/, '');

  // Remove all emojis and special Unicode characters
  content = content.replace(/[\u{1F300}-\u{1F9FF}]|[\u{1F600}-\u{1F64F}]|[\u{1F680}-\u{1F6FF}]|[\u{2600}-\u{26FF}]|[\u{2700}-\u{27BF}]|[\u{1F900}-\u{1F9FF}]|[\u{1F1E0}-\u{1F1FF}]/gu, '');

  // Remove code blocks entirely
  content = content.replace(/```[\s\S]*?```/g, '');

  // Handle inline code - keep the word or phrase, drop the backticks
  content = content.replace(/`([^`]+)`/g, '$1');

  // Extract link text, removing the URL
  content = content.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1');

  // Use compromise to process the text
  let doc = nlp(content);

  // Expand contractions
  doc.contractions().expand();

  // Process money values like "$53k" -> "53 thousand dollars"
  const moneyMatches = doc.match('$#Value');
  moneyMatches.forEach(m => {
    const text = m.text();
    if (text.match(/\$\d+k/i)) {
      const num = text.match(/\d+/)[0];
      m.replaceWith(`${num} thousand dollars`);
    }
  });

  // Write the NLP edits back into the plain-text content
  content = doc.text();

  // Spell out common abbreviations so the voice reads them letter by letter
  const abbreviations = {
    'API': 'A P I',
    'URL': 'U R L',
    'HTTP': 'H T T P',
    'HTTPS': 'H T T P S',
    'AWS': 'A W S',
    'GPU': 'G P U',
    // ... many more
  };
  for (const [abbr, spoken] of Object.entries(abbreviations)) {
    content = content.replace(new RegExp(`\\b${abbr}\\b`, 'g'), spoken);
  }

  return content.trim();
}
````

Example Processing Output

Here's what the normalization does to actual text:

Original:

```text
This year, I successfully paid off my private student loans by paying down the remaining $53k I had left.
I've been working on the API for NormConf using AWS.
```

Processed:

```text
This year, I successfully paid off my private student loans by paying down the remaining 53 thousand dollars I had left.
I have been working on the A P I for NormConf using A W S.
```

The difference is subtle but crucial for natural-sounding speech.
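If you'd rather not pull in an NLP library, small lookup tables get you most of the way for the common cases. This is a simplified sketch, not the Compromise-based code above, and the tables would need to grow well beyond what's shown here:

```javascript
// Minimal normalizer sketch using plain lookup tables (an assumed, simplified
// stand-in for the compromise-based pipeline).
const CONTRACTIONS = { "I've": 'I have', "don't": 'do not', "it's": 'it is' };
const ABBREVIATIONS = { API: 'A P I', AWS: 'A W S', URL: 'U R L' };

function normalizeForSpeech(text) {
  for (const [from, to] of Object.entries(CONTRACTIONS)) {
    text = text.split(from).join(to);
  }
  for (const [abbr, spoken] of Object.entries(ABBREVIATIONS)) {
    text = text.replace(new RegExp(`\\b${abbr}\\b`, 'g'), spoken);
  }
  // "$53k" -> "53 thousand dollars"
  text = text.replace(/\$(\d+)k\b/gi, '$1 thousand dollars');
  return text;
}

console.log(normalizeForSpeech("I've been working on the API, paying down $53k."));
```

A lookup table is brittle compared to real NLP (it misses possessives like "it's" vs. "its" context, for example), which is why the actual pipeline leans on Compromise.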

Chunking: Working Around OpenAI's 4096 Character Limit

OpenAI's TTS API has a hard limit of 4096 characters per request. For longer posts (like my student loans story at 43,138 characters), we need intelligent chunking (view on GitHub):

```javascript
const nlp = require('compromise');

function splitTextIntoChunks(text, maxChars) {
  if (text.length <= maxChars) {
    return [text];
  }

  const chunks = [];

  // First try to split by double newlines (paragraphs)
  const paragraphs = text.split(/\n\n+/);
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    const trimmedParagraph = paragraph.trim();
    if (!trimmedParagraph) continue;

    // If a single paragraph is too long, split by sentences
    if (trimmedParagraph.length > maxChars) {
      if (currentChunk.trim()) {
        chunks.push(currentChunk.trim());
        currentChunk = '';
      }

      // Use NLP to split by sentences
      const doc = nlp(trimmedParagraph);
      const sentences = doc.sentences().out('array');

      for (const sentence of sentences) {
        if ((currentChunk + ' ' + sentence).length > maxChars && currentChunk.length > 0) {
          chunks.push(currentChunk.trim());
          currentChunk = sentence;
        } else {
          currentChunk += (currentChunk ? ' ' : '') + sentence;
        }
      }
    } else {
      // Check if adding this paragraph would exceed the limit
      const separator = currentChunk ? '\n\n' : '';
      const combined = currentChunk + separator + trimmedParagraph;

      if (combined.length > maxChars && currentChunk.length > 0) {
        chunks.push(currentChunk.trim());
        currentChunk = trimmedParagraph;
      } else {
        currentChunk = combined;
      }
    }
  }

  // Don't drop the final chunk
  if (currentChunk.trim()) {
    chunks.push(currentChunk.trim());
  }

  return chunks;
}
```

This approach ensures we:

  1. Never break in the middle of a sentence
  2. Prefer paragraph boundaries when possible
  3. Handle edge cases like single paragraphs longer than 4096 chars
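Those invariants are easy to sanity-check against any chunker. Here's a simplified splitter (sentence-regex based, not the Compromise version above) along with the checks worth running on its output:

```javascript
// Simplified chunker sketch: split on sentence boundaries only (an assumed
// stand-in for the paragraph-then-sentence logic shown above).
function splitBySentences(text, maxChars) {
  const sentences = text.match(/[^.!?]+[.!?]+(\s|$)/g) || [text];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const s = sentence.trim();
    if ((current + ' ' + s).length > maxChars && current) {
      chunks.push(current);
      current = s;
    } else {
      current = current ? current + ' ' + s : s;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Invariants: every chunk fits the limit, and no text is lost.
const sample = 'One sentence here. Another one follows. And a third for good measure.';
const pieces = splitBySentences(sample, 40);
console.log(pieces);
pieces.forEach(c => { if (c.length > 40) throw new Error('chunk too long'); });
```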

Audio Generation and Concatenation

Once we have our chunks, we generate audio for each and use FFmpeg to concatenate them seamlessly:

```javascript
// Generate audio for each chunk
const chunkPaths = [];
for (let i = 0; i < chunks.length; i++) {
  const chunkPath = path.join(AUDIO_OUTPUT_DIR, `${filename}_chunk_${i}.mp3`);
  console.log(`  Generating chunk ${i + 1}/${chunks.length} (${chunks[i].length} chars)...`);

  await generateAudio(chunks[i], chunkPath);
  chunkPaths.push(chunkPath);
}

// Concatenate with FFmpeg
if (hasFfmpeg) {
  console.log(`  Concatenating ${chunks.length} chunks with ffmpeg...`);
  await concatenateAudioFiles(chunkPaths, audioPath);

  // Clean up chunk files
  for (const chunkPath of chunkPaths) {
    await fs.unlink(chunkPath);
  }
}
```

The FFmpeg concatenation ensures there are no gaps or glitches between chunks—the audio flows naturally as if it were generated in one piece.

Caching: Don't Regenerate Unchanged Content

To avoid unnecessary API calls and costs, I implement content-based caching:

```javascript
// Calculate hash of processed text
const contentHash = calculateHash(textContent);

// Check if audio already exists and content hasn't changed
if (!forceRegenerate && cache[filename] && cache[filename].hash === contentHash) {
  try {
    await fs.access(audioPath);
    console.log(`  ✓ Audio already exists and is up to date`);
    return { filename, audioFilename, status: 'cached' };
  } catch {
    console.log(`  Audio file missing, regenerating...`);
  }
}
```

The cache tracks:

  • Content hash (MD5 of processed text)
  • Generation timestamp
  • Character count
  • Number of chunks
  • Whether the file is complete (all chunks concatenated)

S3 Upload and Distribution

Once audio files are generated, they're uploaded to S3 for global distribution:

```javascript
// Upload to S3 with caching headers
const command = new PutObjectCommand({
  Bucket: BUCKET_NAME,
  Key: `audio/${audioFilename}`,
  Body: fileContent,
  ContentType: 'audio/mpeg',
  CacheControl: 'public, max-age=31536000', // Cache for 1 year
  Metadata: {
    'generated-by': 'blog-audio-generator',
    'source': 'openai-tts'
  }
});

await s3Client.send(command);
```

The upload script also generates a manifest file mapping post slugs to S3 URLs:

```json
{
  "2022_reflection": "https://tech-notes-blog.s3.us-west-2.amazonaws.com/audio/2022_reflection.mp3",
  "building_an_https_model_apI_for_cheap": "https://tech-notes-blog.s3.us-west-2.amazonaws.com/audio/building_an_https_model_apI_for_cheap.mp3",
  // ... more posts
}
```
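Generating that manifest is just a mapping from slug to URL. A sketch (the bucket name and region are taken from the example URLs above; the function itself is assumed, not the actual upload script):

```javascript
// Build a slug -> S3 URL manifest for a set of generated audio files.
function buildManifest(slugs, bucket = 'tech-notes-blog', region = 'us-west-2') {
  const manifest = {};
  for (const slug of slugs) {
    manifest[slug] = `https://${bucket}.s3.${region}.amazonaws.com/audio/${slug}.mp3`;
  }
  return manifest;
}

console.log(JSON.stringify(buildManifest(['2022_reflection']), null, 2));
```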

Frontend: The Audio Player Component

The React audio player provides a clean interface with all the controls readers expect (full component):

```jsx
const AudioPlayer = ({ audioUrl, title }) => {
  const [isPlaying, setIsPlaying] = useState(false);
  const [currentTime, setCurrentTime] = useState(0);
  const [duration, setDuration] = useState(0);
  const [playbackRate, setPlaybackRate] = useState(1);

  // ... audio event handlers

  return (
    <div className="audio-player">
      <div className="audio-player-header">
        <span className="audio-player-title">{title}</span>
      </div>

      <div className="audio-player-controls">
        <button onClick={togglePlayPause}>
          {isPlaying ? <PauseIcon /> : <PlayIcon />}
        </button>

        <div className="audio-player-time">
          {formatTime(currentTime)} / {formatTime(duration)}
        </div>

        <div className="audio-player-progress" onClick={handleProgressClick}>
          <div
            className="audio-player-progress-fill"
            style={{ width: `${progressPercentage}%` }}
          />
        </div>

        <button onClick={handleSpeedChange}>
          {playbackRate}x
        </button>
      </div>
    </div>
  );
};
```

Features include:

  • Play/pause toggle
  • Progress bar with seeking
  • Time display (current/total)
  • Playback speed control (1x, 1.25x, 1.5x, 1.75x, 2x)
  • Loading states and error handling
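The component above leans on a couple of small helpers. Here's one plausible implementation of `formatTime` plus a speed cycler matching the listed speeds; these are sketches, not the component's actual helpers:

```javascript
// Format seconds as M:SS for the player's time display.
function formatTime(seconds) {
  if (!Number.isFinite(seconds)) return '0:00';
  const mins = Math.floor(seconds / 60);
  const secs = Math.floor(seconds % 60);
  return `${mins}:${String(secs).padStart(2, '0')}`;
}

// Cycle through the supported playback speeds, wrapping back to 1x.
const SPEEDS = [1, 1.25, 1.5, 1.75, 2];
function nextSpeed(current) {
  return SPEEDS[(SPEEDS.indexOf(current) + 1) % SPEEDS.length];
}

console.log(formatTime(75)); // "1:15"
console.log(nextSpeed(2));   // wraps back to 1
```

The `Number.isFinite` guard matters in practice: an HTML5 `<audio>` element reports `NaN` duration before metadata loads.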

Results and Performance

The complete pipeline processes all 14 posts on this blog in about 15 minutes:

  • 11 posts required chunking (2-11 chunks each)
  • Total of 33 audio chunks generated
  • Longest post: 43,138 characters (11 chunks)
  • All audio seamlessly concatenated with FFmpeg
  • Zero manual intervention required

Cost Analysis

OpenAI TTS pricing:

  • tts-1-hd: $0.030 per 1,000 characters
  • Average blog post: ~10,000 characters = $0.30
  • Total for 14 posts: ~$4.20

AWS S3 costs:

  • Storage: ~100MB total = $0.0023/month
  • Bandwidth: Depends on traffic, but audio files are cached for 1 year
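The OpenAI numbers fall straight out of the per-character rate; here's a quick sanity check of the arithmetic above:

```javascript
// OpenAI tts-1-hd rate quoted above: $0.030 per 1,000 characters.
const RATE_PER_1K_CHARS = 0.03;

function ttsCost(charCount) {
  return (charCount / 1000) * RATE_PER_1K_CHARS;
}

console.log(ttsCost(10000));      // average ~10k-char post
console.log(ttsCost(10000) * 14); // all 14 posts at that average
console.log(ttsCost(43138));      // the longest post on the blog
```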

The Command Line Interface

Simple npm scripts make the whole process painless:

```bash
# Generate audio for all posts
npm run generate-audio

# Generate for a specific post
npm run generate-audio -- post-name

# Force regenerate (ignore cache)
npm run generate-audio -- post-name --force

# Upload to S3
npm run upload-audio

# Full pipeline
npm run process-audio
```

(Note the `--` separator: npm only forwards arguments to the underlying script after it.)
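Under the hood, each script just reads its arguments from `process.argv`. A sketch of the flag handling (the argument names mirror the commands above, but this is an assumed implementation, not the actual script):

```javascript
// Parse the arguments the generate-audio script accepts:
// an optional post name plus an optional --force flag.
function parseArgs(argv) {
  const force = argv.includes('--force');
  const post = argv.find(arg => !arg.startsWith('--')) || null;
  return { post, force };
}

// argv slice skips the node binary and script path
console.log(parseArgs(process.argv.slice(2)));
console.log(parseArgs(['post-name', '--force'])); // { post: 'post-name', force: true }
```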

Lessons Learned

  1. Text normalization is crucial - Raw markdown sounds terrible when read aloud
  2. Smart chunking matters - Breaking at sentence boundaries maintains flow
  3. Caching saves money - Content-based hashing prevents unnecessary regeneration
  4. FFmpeg is your friend - Seamless audio concatenation with one command
  5. S3 + CloudFront works great - Fast global delivery with minimal configuration

Try It Yourself

If you want to implement this for your own blog, you'll need:

  1. OpenAI API key (get one at platform.openai.com)
  2. AWS account with S3 bucket
  3. Node.js environment
  4. FFmpeg installed locally
  5. About an hour to set everything up

The complete implementation is running on this blog—in fact, you can listen to this very post by clicking the audio player at the top.

Source Code

All the code for this TTS pipeline is available on GitHub:

Happy listening!