A technical deep-dive into building a production-ready text-to-speech pipeline for blog posts using OpenAI's TTS API, smart text processing with NLP, automatic chunking for long content, and AWS S3 for scalable audio hosting.

If you're reading this post then you probably want to add audio versions to your blog posts. Perhaps you've noticed more sites offering "listen to this article" features, or maybe you just want to make your content more accessible.
Whatever your reason, I'll show you exactly how I built a complete text-to-speech pipeline that automatically generates high-quality audio for every post on this blog—including the one you're reading right now.
This post assumes you're comfortable with JavaScript and Node.js, and that you're not afraid of the command line.
Alright, let's get to it.
Here's how the pipeline works end-to-end:

1. Extract and normalize each post's text for speech
2. Split long posts into chunks under the TTS API's character limit
3. Generate audio for each chunk with OpenAI's TTS API
4. Concatenate the chunks into one MP3 with FFmpeg
5. Upload the audio to S3 and update a manifest of post slugs to URLs
6. Serve it through a React audio player on each post
The beauty of this system is that it's fully automated. Write a post, run the build, and audio appears. No manual steps, no recording equipment, just code.
The first challenge is that blog posts aren't written to be read aloud. They contain:

- Markdown syntax: links, code blocks, and inline code
- Emojis and special Unicode characters
- Contractions and abbreviations like API or AWS
- Shorthand like $53k or "Dec 2021" that a voice would mangle

Here's how I handle text extraction using the Compromise NLP library (full source):
```javascript
// Extract and normalize text content from markdown
function extractTextFromMarkdown(markdown) {
  // Remove frontmatter
  let content = markdown.replace(/^---[\s\S]*?---\n/, '');

  // Remove all emojis and special Unicode characters
  content = content.replace(/[\u{1F300}-\u{1F9FF}]|[\u{1F600}-\u{1F64F}]|[\u{1F680}-\u{1F6FF}]|[\u{2600}-\u{26FF}]|[\u{2700}-\u{27BF}]|[\u{1F900}-\u{1F9FF}]|[\u{1F1E0}-\u{1F1FF}]/gu, '');

  // Remove code blocks entirely
  content = content.replace(/```[\s\S]*?```/g, '');

  // Handle inline code - keep the word or phrase, drop the backticks
  content = content.replace(/`([^`]+)`/g, '$1');

  // Extract link text, removing the URL
  content = content.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1');

  // Use compromise to process the text
  let doc = nlp(content);

  // Expand contractions ("I've" -> "I have")
  doc.contractions().expand();

  // Process money values ("$53k" -> "53 thousand dollars")
  const moneyMatches = doc.match('$#Value');
  moneyMatches.forEach(m => {
    const text = m.text();
    if (text.match(/\$\d+k/i)) {
      const num = text.match(/\d+/)[0];
      m.replaceWith(`${num} thousand dollars`);
    }
  });

  // Pull the normalized text back out of the compromise document
  content = doc.text();

  // Spell out common abbreviations so the voice reads them letter by letter
  const abbreviations = {
    'API': 'A P I',
    'URL': 'U R L',
    'HTTP': 'H T T P',
    'HTTPS': 'H T T P S',
    'AWS': 'A W S',
    'GPU': 'G P U',
    // ... many more
  };
  for (const [abbr, spoken] of Object.entries(abbreviations)) {
    content = content.replace(new RegExp(`\\b${abbr}\\b`, 'g'), spoken);
  }

  return content.trim();
}
```
Here's what the normalization does to actual text:
Original:
```text
This year, I successfully paid off my private student loans by paying down the remaining $53k I had left.
I've been working on the API for NormConf using AWS.
```
Processed:
```text
This year, I successfully paid off my private student loans by paying down the remaining 53 thousand dollars I had left.
I have been working on the A P I for NormConf using A W S.
```
The difference is subtle but crucial for natural-sounding speech.
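To make two of those rules concrete, here is a dependency-free sketch of just the money-shorthand and abbreviation transformations (the real pipeline uses compromise as shown above; the function name here is mine):

```javascript
// Two normalization rules in isolation: money shorthand and
// spelled-out abbreviations (a sketch, without the compromise library)
function normalizeForSpeech(text) {
  // "$53k" -> "53 thousand dollars"
  let out = text.replace(/\$(\d+)k\b/gi, '$1 thousand dollars');

  // Spell abbreviations letter by letter so the voice doesn't guess
  const abbreviations = { API: 'A P I', AWS: 'A W S', URL: 'U R L' };
  for (const [abbr, spoken] of Object.entries(abbreviations)) {
    out = out.replace(new RegExp(`\\b${abbr}\\b`, 'g'), spoken);
  }
  return out;
}
```

The word boundaries (`\b`) matter: they keep `HTTP` from matching inside `HTTPS`, and stop replacements from firing mid-word.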
OpenAI's TTS API has a hard limit of 4096 characters per request. For longer posts (like my student loans story at 43,138 characters), we need intelligent chunking (view on GitHub):
```javascript
function splitTextIntoChunks(text, maxChars) {
  if (text.length <= maxChars) {
    return [text];
  }

  const chunks = [];

  // First try to split by double newlines (paragraphs)
  const paragraphs = text.split(/\n\n+/);
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    const trimmedParagraph = paragraph.trim();
    if (!trimmedParagraph) continue;

    // If a single paragraph is too long, split by sentences
    if (trimmedParagraph.length > maxChars) {
      if (currentChunk.trim()) {
        chunks.push(currentChunk.trim());
        currentChunk = '';
      }

      // Use NLP to split by sentences
      const doc = nlp(trimmedParagraph);
      const sentences = doc.sentences().out('array');

      for (const sentence of sentences) {
        if ((currentChunk + ' ' + sentence).length > maxChars && currentChunk.length > 0) {
          chunks.push(currentChunk.trim());
          currentChunk = sentence;
        } else {
          currentChunk += (currentChunk ? ' ' : '') + sentence;
        }
      }
    } else {
      // Check if adding this paragraph would exceed the limit
      const separator = currentChunk ? '\n\n' : '';
      const combined = currentChunk + separator + trimmedParagraph;

      if (combined.length > maxChars && currentChunk.length > 0) {
        chunks.push(currentChunk.trim());
        currentChunk = trimmedParagraph;
      } else {
        currentChunk = combined;
      }
    }
  }

  // Don't drop the final chunk
  if (currentChunk.trim()) {
    chunks.push(currentChunk.trim());
  }

  return chunks;
}
```
This approach ensures we split at paragraph boundaries when possible, fall back to sentence boundaries for oversized paragraphs, and never exceed the 4096-character limit mid-sentence.
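The sentence-level fallback needs compromise, but the core greedy packing is easy to see in isolation. A simplified, paragraph-only sketch (the helper name is mine):

```javascript
// Simplified sketch: greedy paragraph-level chunking only, with no
// sentence fallback, to illustrate the packing logic
function chunkByParagraph(text, maxChars) {
  const chunks = [];
  let current = '';
  for (const para of text.split(/\n\n+/)) {
    const p = para.trim();
    if (!p) continue;
    // Pack paragraphs into the current chunk until it would overflow
    const combined = current ? current + '\n\n' + p : p;
    if (combined.length > maxChars && current) {
      chunks.push(current);
      current = p;
    } else {
      current = combined;
    }
  }
  if (current) chunks.push(current); // flush the remainder
  return chunks;
}
```

Each chunk is as full as it can be without crossing the limit, which minimizes the number of API calls.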
Once we have our chunks, we generate audio for each and use FFmpeg to concatenate them seamlessly:
```javascript
// Generate audio for each chunk
const chunkPaths = [];
for (let i = 0; i < chunks.length; i++) {
  const chunkPath = path.join(AUDIO_OUTPUT_DIR, `$(unknown)_chunk_${i}.mp3`);
  console.log(`  Generating chunk ${i + 1}/${chunks.length} (${chunks[i].length} chars)...`);

  await generateAudio(chunks[i], chunkPath);
  chunkPaths.push(chunkPath);
}

// Concatenate with FFmpeg
if (hasFfmpeg) {
  console.log(`  Concatenating ${chunks.length} chunks with ffmpeg...`);
  await concatenateAudioFiles(chunkPaths, audioPath);

  // Clean up chunk files
  for (const chunkPath of chunkPaths) {
    await fs.unlink(chunkPath);
  }
}
```
The FFmpeg concatenation ensures there are no gaps or glitches between chunks—the audio flows naturally as if it were generated in one piece.
To avoid unnecessary API calls and costs, I implement content-based caching:
```javascript
// Calculate hash of processed text
const contentHash = calculateHash(textContent);

// Check if audio already exists and content hasn't changed
if (!forceRegenerate && cache[filename] && cache[filename].hash === contentHash) {
  try {
    await fs.access(audioPath);
    console.log(`  ✓ Audio already exists and is up to date`);
    return { filename, audioFilename, status: 'cached' };
  } catch {
    console.log(`  Audio file missing, regenerating...`);
  }
}
```
The cache tracks a hash of each post's processed text, so audio is regenerated only when the spoken content actually changes.
Once audio files are generated, they're uploaded to S3 for global distribution:
```javascript
// Upload to S3 with caching headers
const command = new PutObjectCommand({
  Bucket: BUCKET_NAME,
  Key: `audio/$(unknown)`,
  Body: fileContent,
  ContentType: 'audio/mpeg',
  CacheControl: 'public, max-age=31536000', // Cache for 1 year
  Metadata: {
    'generated-by': 'blog-audio-generator',
    'source': 'openai-tts'
  }
});

await s3Client.send(command);
```
The upload script also generates a manifest file mapping post slugs to S3 URLs:
```json
{
  "2022_reflection": "https://tech-notes-blog.s3.us-west-2.amazonaws.com/audio/2022_reflection.mp3",
  "building_an_https_model_apI_for_cheap": "https://tech-notes-blog.s3.us-west-2.amazonaws.com/audio/building_an_https_model_apI_for_cheap.mp3",
  // ... more posts
}
```
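Because every entry follows the same slug-to-URL pattern, generating the manifest is mechanical. A sketch, with bucket and region matching the URLs above:

```javascript
// Build the slug -> S3 URL manifest for a list of uploaded posts
// (bucket and region are taken from the example URLs, not from config)
function buildManifest(slugs, bucket, region) {
  const manifest = {};
  for (const slug of slugs) {
    manifest[slug] = `https://${bucket}.s3.${region}.amazonaws.com/audio/${slug}.mp3`;
  }
  return manifest;
}
```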
The React audio player provides a clean interface with all the controls readers expect (full component):
```jsx
const AudioPlayer = ({ audioUrl, title }) => {
  const [isPlaying, setIsPlaying] = useState(false);
  const [currentTime, setCurrentTime] = useState(0);
  const [duration, setDuration] = useState(0);
  const [playbackRate, setPlaybackRate] = useState(1);

  // ... audio event handlers

  return (
    <div className="audio-player">
      <div className="audio-player-header">
        <span className="audio-player-title">{title}</span>
      </div>

      <div className="audio-player-controls">
        <button onClick={togglePlayPause}>
          {isPlaying ? <PauseIcon /> : <PlayIcon />}
        </button>

        <div className="audio-player-time">
          {formatTime(currentTime)} / {formatTime(duration)}
        </div>

        <div className="audio-player-progress" onClick={handleProgressClick}>
          <div
            className="audio-player-progress-fill"
            style={{ width: `${progressPercentage}%` }}
          />
        </div>

        <button onClick={handleSpeedChange}>
          {playbackRate}x
        </button>
      </div>
    </div>
  );
};
```
Features include:

- Play/pause toggle
- Elapsed and total time display
- Click-to-seek progress bar
- Cycling playback-speed control
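Two helpers referenced in the component, `formatTime` and the value behind `progressPercentage`, are elided above. Plausible versions (my sketch, not the component's actual code):

```javascript
// mm:ss display for the elapsed/total time readout; NaN duration
// (metadata not loaded yet) falls back to 0:00
function formatTime(seconds) {
  if (!Number.isFinite(seconds) || seconds < 0) return '0:00';
  const m = Math.floor(seconds / 60);
  const s = Math.floor(seconds % 60);
  return `${m}:${String(s).padStart(2, '0')}`;
}

// Percentage width for the progress-bar fill (guard against duration 0)
function progressFor(currentTime, duration) {
  return duration > 0 ? (currentTime / duration) * 100 : 0;
}
```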
The complete pipeline processes all 14 posts on this blog in about 15 minutes:
OpenAI TTS pricing:
AWS S3 costs:
Simple npm scripts make the whole process painless:
```bash
# Generate audio for all posts
npm run generate-audio

# Generate for specific post
npm run generate-audio post-name

# Force regenerate (ignore cache; note the -- so npm passes the flag through)
npm run generate-audio post-name -- --force

# Upload to S3
npm run upload-audio

# Full pipeline
npm run process-audio
```
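For reference, the scripts block backing these commands might look like this (the file paths are my assumption; the post doesn't show its package.json):

```json
{
  "scripts": {
    "generate-audio": "node scripts/generate-audio.js",
    "upload-audio": "node scripts/upload-audio.js",
    "process-audio": "npm run generate-audio && npm run upload-audio"
  }
}
```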
If you want to implement this for your own blog, you'll need an OpenAI API key, an AWS account with an S3 bucket, Node.js, and FFmpeg installed locally.
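Both SDKs pick up credentials from the environment. These are the standard variable names (the values are placeholders):

```shell
# Read by the OpenAI SDK and by direct REST calls
export OPENAI_API_KEY="sk-..."

# Read by the AWS SDK v3 default credential chain
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-west-2"
```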
The complete implementation is running on this blog—in fact, you can listen to this very post by clicking the audio player at the top.
All the code for this TTS pipeline is available on GitHub.
Happy listening!