The Podcasting Infrastructure Crisis: A $2 Billion Industry Built on Digital Duct Tape — The Bootstrapped Founder 402


Dear founder,

Over the last 18 months, I’ve been building Podscan, a business that processes millions of podcast episodes every day—tens of thousands of new episodes daily, with 33 million episodes now in our database. We transcribe everything and do AI-based analysis, search, and mention tracking for brands and marketers.

And here’s what I’ve learned: the success stories in podcasting happen despite the infrastructure, not because of it.

Podcasting is clearly having a moment. It’s a $2.1 billion industry in advertising revenue alone. Well over a hundred million people in the US listen to podcasts every month. Acquisitions are happening everywhere. But underneath all of this momentum is a technical infrastructure that feels like it’s held together with digital duct tape and 20-year-old protocols that were never designed for what we’re asking them to do.

Today, I want to break down the biggest technical and social problems holding this industry back—because every single one of these problems represents a massive opportunity for founders who want to build something meaningful.

The Paradox: RSS Meets Reality

Let’s start with the foundation. Podcasting is built on RSS feeds—a technology designed around the year 2000 for blog syndication. RSS was meant to distribute text files or links to blog posts that were easily digestible. Now we’re using it to distribute gigabyte audio files to millions of people.

Think about that for a moment. We’re taking a protocol designed for sharing lightweight text content and asking it to handle massive audio distribution at scale.

And it shows. There’s little standardization, even with efforts like Podcasting 2.0. Episode numbering is inconsistent or missing entirely. Descriptions are malformed—people put HTML into fields that should be text, or they’ll claim to include an SRT transcript but actually dump a full HTML page into that field. Published dates are completely arbitrary. Different platforms handle and interpret RSS feeds differently.
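
To make the field-level mess concrete, here is a minimal Python sketch of defensive feed parsing. It assumes a plain RSS 2.0 item with title, description, and pubDate elements; the helper names (clean_text, parse_pub_date, normalize_item) and the sample feed are made up for illustration and are not how Podscan does it.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html import unescape
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: Optional[str]) -> str:
    """Strip HTML tags and entities from a field that should be plain text."""
    if not raw:
        return ""
    text = unescape(TAG_RE.sub(" ", raw))
    return " ".join(text.split())


def parse_pub_date(raw: Optional[str]) -> Optional[datetime]:
    """Return a UTC datetime, or None when the date cannot be parsed."""
    if not raw:
        return None
    try:
        parsed = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: treat dates without a usable offset as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def normalize_item(item: ET.Element) -> dict:
    """Turn one <item> element into a dict with defensively cleaned fields."""
    return {
        "title": clean_text(item.findtext("title")),
        "description": clean_text(item.findtext("description")),
        "published": parse_pub_date(item.findtext("pubDate")),
    }


# Hypothetical feed: a double-escaped title, HTML in the description,
# and one publish date that is not a date at all.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Ep. 12 &amp;amp; friends</title>
    <description><![CDATA[<p>Hello <b>world</b></p>]]></description>
    <pubDate>Tue, 03 Jun 2025 10:00:00 +0200</pubDate>
  </item>
  <item>
    <title>Bonus</title>
    <description>Plain text</description>
    <pubDate>yesterday</pubDate>
  </item>
</channel></rss>"""

for item in ET.fromstring(SAMPLE_FEED).iter("item"):
    print(normalize_item(item))
```

The point of the sketch is the posture, not the specific rules: every field is treated as untrusted, and an unusable value becomes an explicit None instead of a guess.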

Every podcast hosting company has its own way of providing, updating, and caching these feeds. Everything is decentralized, which sounds good in theory, but creates chaos in practice.

Then on the other side, you have the walled gardens: Apple, Spotify, YouTube. These are proprietary systems with their own internal hosting and listening apps. The Spotify app only plays Spotify content. Apple Podcasts is exclusive to Apple’s ecosystem. From a founder’s perspective, these platforms are incredibly difficult to work with—authentication problems, rate limiting, inconsistent APIs.

Problem #1: The Data Quality Nightmare

When you process podcasts at the scale we do at Podscan, edge cases become the norm. All those weird little podcasts with their individual quirks add up to a systematic problem.

Here’s what I mean: over 60% of podcasts are missing basic metadata like proper categorization, good descriptions, or even contact emails where you can reach the podcaster. Show titles and episode titles are formatted completely differently across platforms. Some use markdown, some use HTML, some use no formatting at all.

Language detection is a particular nightmare. We’ve processed episodes that claim to be English but are actually Spanish, or vice versa. And there’s no reliable way to determine explicit content without actually analyzing the audio itself.

But here’s where it gets really messy: duplication. Popular podcasts that have switched hosting companies over their lifetime often end up with duplicate feeds. You’ll have one feed with the first 200 episodes, then they switch providers and create a new feed that brings over 100 episodes from the old feed plus their new content. Now you have overlapping, duplicate content scattered across multiple feeds, and no clear way to deduplicate or connect them.
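
One way to start connecting overlapping feeds is to fingerprint episodes by content rather than by where they are hosted. The sketch below is a simplified illustration, assuming that a migrated episode keeps roughly the same title and publish date. The Episode type, the fingerprint rule, and the example URLs are all hypothetical, and real feeds will need fuzzier matching than this.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    feed_url: str
    title: str
    published: date


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def fingerprint(ep: Episode) -> tuple:
    """Key that ignores which feed or host the episode came from."""
    return (normalize_title(ep.title), ep.published)


def group_duplicates(episodes: list) -> dict:
    """Map each fingerprint to every episode that shares it."""
    groups = {}
    for ep in episodes:
        groups.setdefault(fingerprint(ep), []).append(ep)
    return groups


def linked_feeds(episodes: list) -> set:
    """Return sets of feed URLs that share at least one episode."""
    links = set()
    for group in group_duplicates(episodes).values():
        feeds = frozenset(ep.feed_url for ep in group)
        if len(feeds) > 1:
            links.add(feeds)
    return links


# Hypothetical show that moved hosts and re-published an old episode.
episodes = [
    Episode("https://old.example/feed", "Ep. 150: Pricing!", date(2024, 3, 1)),
    Episode("https://new.example/feed", "Ep 150 - Pricing", date(2024, 3, 1)),
    Episode("https://new.example/feed", "Ep 201: Hiring", date(2025, 1, 10)),
]

for feeds in linked_feeds(episodes):
    print("Probably the same show:", sorted(feeds))
```

Title plus date is deliberately crude. Duration or an audio fingerprint would be stronger signals, but dynamically inserted ads change file length between downloads, so those need care as well.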

Problem #2: The Measurement Black Hole

This is probably the most frustrating problem for anyone trying to build a business around podcasts. In web content, you can track exactly how much someone read, which parts they engaged with, how long they spent on each section. YouTube knows precisely where someone started watching, where they paused, how long they stayed engaged.

But podcasting has a measurement black hole.

With RSS-based distribution, all you know as a podcast owner is that someone’s device requested the MP3 file. That’s it. You don’t know if they actually listened. You don’t know how long they listened. You don’t even know if it was a human—Podscan technically downloads all these files for analysis, and we’re just one of many automated systems crawling podcast feeds.
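
Here is roughly what a host can do with nothing but request logs. This sketch uses a common heuristic: drop requests whose user agent looks automated, then count at most one download per IP address, user agent, and episode within a 24-hour window. The Request type, the BOT_MARKERS list, and the window length are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str
    episode_id: str
    at: datetime


# Hypothetical markers; a real list would be much longer and maintained.
BOT_MARKERS = ("bot", "crawler", "spider", "python-requests", "curl")


def is_probably_automated(user_agent: str) -> bool:
    """Treat empty or bot-like user agents as non-human traffic."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


def count_downloads(requests: list, window: timedelta = timedelta(hours=24)) -> dict:
    """Count one download per (ip, user agent, episode) per window."""
    last_counted = {}
    counts = {}
    for req in sorted(requests, key=lambda r: r.at):
        if is_probably_automated(req.user_agent):
            continue
        key = (req.ip, req.user_agent, req.episode_id)
        previous = last_counted.get(key)
        if previous is not None and req.at - previous < window:
            continue  # repeat request inside the window, e.g. a resumed download
        last_counted[key] = req.at
        counts[req.episode_id] = counts.get(req.episode_id, 0) + 1
    return counts


start = datetime(2025, 6, 3, 9, 0)
log = [
    Request("203.0.113.5", "PodcastApp/4.2", "ep-1", start),
    Request("203.0.113.5", "PodcastApp/4.2", "ep-1", start + timedelta(minutes=5)),
    Request("198.51.100.9", "ExampleBot/1.0", "ep-1", start),
    Request("198.51.100.7", "PodcastApp/4.2", "ep-1", start + timedelta(hours=1)),
    Request("203.0.113.5", "PodcastApp/4.2", "ep-1", start + timedelta(hours=25)),
]
print(count_downloads(log))
```

Even after this cleanup, the number is still a count of file requests. Nothing in the log says whether anyone pressed play.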

Maybe someone subscribed to your show but never actually listens. Maybe they’re your biggest fan and listen to every episode immediately. You have no idea, because there’s no unified way to report listening behavior.

Apple and Spotify have this data because they control both the file delivery and the player. They know everything: when you pause, when you skip, whether you’re using headphones, whether the app is in the foreground. But this information stays locked in their walled gardens.

The rest of the ecosystem relies on a complex chain of tracking links: URLs that bounce through multiple analytics companies before finally delivering the file. It’s like a digital Rube Goldberg machine, and it’s the only way to get any measurement at all.

Problem #3: Discovery is Fundamentally Broken

Most podcast platforms can only search titles and descriptions. But what if your competitor gets mentioned at minute 23 of a 45-minute episode? There’s no way to find that content on established platforms.

At Podscan, we’ve solved this by transcribing everything and making the full content searchable. But imagine the possibilities if this were available everywhere. You could search semantically: “I want to find a podcast for kids that talks about dinosaurs and how they came to be.” You wouldn’t need exact keyword matches; the system would understand the intent and context.
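
To illustrate why transcripts change discovery, here is a toy Python sketch of mention search over timestamped transcript segments. It is plain keyword matching, not the semantic search described above and not Podscan's implementation. The Segment type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    episode_id: str
    start_seconds: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_mentions(segments: list, term: str) -> list:
    """Return (episode, timestamp, text) for every segment containing term."""
    needle = term.lower()
    return [
        (seg.episode_id, format_timestamp(seg.start_seconds), seg.text)
        for seg in segments
        if needle in seg.text.lower()
    ]


# Hypothetical transcript: the brand only comes up at minute 23.
transcript = [
    Segment("ep-1", 0.0, "Welcome back to the show."),
    Segment("ep-1", 1380.0, "We switched to Acme Analytics last year."),
    Segment("ep-1", 2640.0, "Thanks for listening."),
]

for episode_id, timestamp, text in find_mentions(transcript, "acme"):
    print(episode_id, timestamp, text)
```

A title-and-description search would never surface that second segment. Semantic search would replace the substring check with something like embedding similarity, while the timestamped segments stay the same.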

This discovery problem extends to competitive intelligence too. If you’re trying to understand how your competitors are perceived, what brands are being discussed, or what themes are trending in your industry, you’re essentially flying blind unless you have sophisticated transcription and analysis tools.

There’s so much valuable data buried in podcast conversations (demographic insights based on discussion themes, entity mentions, sponsor relationships, topic trends), but it’s all locked away because we can’t effectively search or analyze the actual content.

Problem #4: The Monetization Paradox

Monetization in podcasting is currently limited to injected ads and sponsorship reads. That’s basically it. And this limitation stems directly from the measurement problem I just described.

We’re stuck in a CPM world—cost per mille, or cost per thousand impressions—when we should be moving toward CPC, cost per click, or even better, cost per conversion. But you can’t optimize for clicks or conversions when you don’t even know if people are listening.
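
The arithmetic shows how much the pricing model depends on measurement. In this small Python sketch, every number (40,000 downloads, a $25 CPM, 200 clicks, 10 conversions) is invented for illustration.

```python
from typing import Optional


def cpm_cost(impressions: int, cpm: float) -> float:
    """Total cost when paying a fixed price per thousand impressions."""
    return impressions / 1000 * cpm


def cost_per_event(total_cost: float, events: int) -> Optional[float]:
    """Effective cost per click or per conversion; None if nothing was measured."""
    if events <= 0:
        return None
    return total_cost / events


# Hypothetical campaign numbers.
downloads = 40_000
spend = cpm_cost(downloads, cpm=25.0)

print("Spend:", spend)
print("Effective cost per click:", cost_per_event(spend, 200))
print("Effective cost per conversion:", cost_per_event(spend, 10))
print("With no click data:", cost_per_event(spend, 0))
```

The last line is the RSS world today: the spend is known, and the per-click and per-conversion figures cannot be computed because the events were never observed.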

Spotify is actually doing interesting work here with their mobile app. When there’s an ad, you can now click through directly, and Spotify can track that engagement. But this doesn’t exist in the broader, more chaotic world of RSS-based podcasting.

Demographic targeting is rough at best—basic age, gender, location data. There’s no behavioral or interest-based targeting because it’s so hard to gather that information. And context is completely ignored. Ads are mostly inserted without understanding episode content, which is honestly pretty stupid. If someone’s talking about productivity tools, that would be the perfect time for a productivity app ad.

Brand safety is still unsolved. There’s no reliable way to ensure ads don’t appear next to inappropriate content. And ROI tracking? Forget about it. You don’t know how many people actually heard your ad, let alone acted on it.

The Opportunity: Why This Matters for Founders

Here’s why I’m excited about these problems: every single one represents a massive business opportunity.

The companies that solve these infrastructure problems—particularly the measurement problem—will enable the next wave of innovation in podcasting. We’re still in the early days of what’s possible when podcast content becomes fully searchable and analyzable.

Think about the applications you could build:

  • Better recommendation engines based on actual content, not just metadata
  • Real competitive intelligence for brands monitoring their mentions
  • Contextual advertising that actually makes sense
  • Content intelligence that reveals trends as they happen
  • Tools that help creators understand their audience and improve retention

With better infrastructure and tooling, we could get much better at understanding the content that’s out there and actually using it effectively. Monitoring for brands, tracking social movements, figuring out who should be guests on your show, identifying emerging trends—all of this becomes possible with the right foundation.

The Data Moat

Here’s something I’ve learned from processing 33 million podcast episodes: the more podcast data you process, the better your understanding becomes. Individual show analysis—downloading all episodes of one show and transcribing them—could never reveal the insights that emerge from processing data at scale.

Every company that gets deeper into podcasting data is building a data moat. The patterns, the connections, the anomalies—they only become visible when you’re looking at the entire ecosystem.

The Race is On

The question isn’t whether somebody will solve these problems—it’s who will solve them first and who will be positioned to benefit from the market that emerges.

Podcasting growth is happening right now, despite these massive technical limitations. Imagine what becomes possible when we fix the foundation.

I think about this every day as we build Podscan. Every one of these problems is solvable with the right technical approach and the willingness to acknowledge that we won’t get it perfect from the start, but we can build something that enables new solutions to be built on top.

If you’re looking for a space to build something meaningful, to solve real problems that affect millions of creators and hundreds of millions of listeners, the podcasting infrastructure space is wide open. The industry is ready for solutions. The market is there. The problems are real and well-defined.

The only question is: who’s going to step up and solve them?



We're the podcast database with the best and most real-time API out there. Check out podscan.fm — and tell your friends!

Arvid Kahl

Being your own boss isn't easy, but it's worth it. Learn how to build a legacy while being kind and authentic. I want to empower as many entrepreneurs as possible to help themselves (and those they choose to serve).
