Dear founder,

The journey to product-market fit is always fascinating, and my experience with Podscan has been particularly enlightening — and challenging.

While we’re not yet profitable, we’re getting closer every month, and I want to share some insights about how we’re finding our way there.

From the start, I struggled with clarity about what Podscan was going to be. The potential seemed vast: with transcripts and insights from every podcast conversation out there, what couldn’t we do? We could be an alerting system, a mention tracking platform, a tool for downloading someone’s entire “thoughtscape,” or even a comprehensive data platform for analyzing podcast trends across time, categories, and geographic regions.

All of these are valid use cases, and people actually use Podscan for each of them. But as a solopreneur, focusing on multiple directions simultaneously is challenging. Each use case requires different positioning, different product emphasis, and different ways of communicating value.

When I talk to a podcast marketing agency, they don’t care about data extraction file formats or API specifications – they need to know whether they can effectively place their clients on podcasts. Conversely, businesses building on top of Podscan’s firehose of transcript data don’t need our alert tracking component; they just want reliable access to raw data.

Finding Our Focus

While juggling these different possibilities, a pattern emerged: there’s one particular customer type for whom Podscan’s value is immediately apparent – podcast agencies. When they log in, they instantly understand how they can search for podcasts, track their clients’ names, and get notifications that help them place their clients on shows. It’s a clear, actionable value proposition.

This realization led to some crucial product decisions. Initially, our notifications were basic: just the podcast name, episode title, thumbnail, and the mention text itself. I thought this would be sufficient for users to take action. But through conversations with customers who use Podscan for booking podcast appearances or reaching out for sponsorships, I discovered they needed answers to two critical questions:

Is this worth my time?
How can I most easily reach out and get my person placed on this show?

The Challenge of Podcast Analytics

The first question led us to tackle one of podcasting’s biggest challenges: audience size metrics. This information is notoriously hard to come by – podcast platforms don’t publicly share listener counts, and unless hosts voluntarily share their numbers, it’s completely opaque.

Even the major platforms like Apple Podcasts only show rankings within categories, never actual listener numbers. You won’t find download counts per episode anywhere public. Platforms and creators guard this data closely.

Building the Machine Learning System

This drove me to build something I initially thought impossible: a machine learning system for estimating podcast audience sizes. The journey started with manual data collection – I spent weeks gathering information about thousands of podcasts where hosts had publicly shared their listener counts in interviews, on social media, or during episodes.

For each podcast, I collected over 260 different data points. These include:

Public metrics: rankings, review counts, and ratings across different platforms
Content patterns: episode frequency, length, and guest appearance rates
Historical data: podcast age, episode count, and publishing consistency
Engagement signals: social media presence, website traffic indicators
Category-specific benchmarks: performance relative to similar shows

The real challenge came in building the ML model itself. I implemented it directly in my PHP application, which required careful architecture to handle the computational load efficiently. The system uses a neural network with multiple hidden layers, trained with gradient descent to minimize the gap between the audience sizes it predicts from our input features and the known audience sizes.
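To make the gradient descent step concrete, here is a minimal Python sketch. It is not Podscan’s model: it fits a single linear layer rather than a multi-layer network, the two input features and the toy data are made up, and predicting the base-10 logarithm of the audience size is an assumption chosen for illustration.

```python
def train_linear(features, targets, lr=0.1, epochs=2000):
    """Fit target ~ w . x + b with batch gradient descent on mean squared error."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            pred = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = pred - y
            # Derivative of the mean squared error with respect to each parameter.
            for j, xi in enumerate(x):
                grad_w[j] += 2.0 * err * xi / n
            grad_b += 2.0 * err / n
        # Step against the gradient to reduce the error.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict_audience(weights, bias, x):
    """Convert the predicted log10(audience) back into a listener count."""
    return 10 ** (sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical toy data: two normalized features per podcast, and
# log10(audience) targets generated from a known linear rule.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
targets = [0.5 + 2.0 * a + 1.0 * b for a, b in features]
weights, bias = train_linear(features, targets)
```

On this toy data the training loop recovers the rule that generated the targets. A real network adds hidden layers and non-linearities, but each update follows the same pattern: measure the error, compute its gradient, and step against it.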

One of the trickiest aspects was handling outliers and incomplete data. Not every podcast has all 260 data points available, so the system needed to work with partial information. I implemented a weighted feature system that adjusts the model’s confidence based on the quality and quantity of available data.
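One simple way to tie confidence to data availability is to measure how much of the total feature weight is actually present for a given podcast. The sketch below assumes that approach; the feature names and weights are hypothetical and are not taken from Podscan.

```python
def data_confidence(features, weights):
    """Return the share of total feature weight that is present (0.0 to 1.0).

    `features` maps feature names to values, with None marking missing data.
    `weights` maps feature names to their assumed importance.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = sum(
        weight
        for name, weight in weights.items()
        if features.get(name) is not None
    )
    return present / total


# Hypothetical importance weights for three of the data points.
weights = {"review_count": 3.0, "chart_rank": 2.0, "episodes_per_month": 1.0}

# This podcast has no chart rank, so part of the weight is missing.
podcast = {"review_count": 120, "chart_rank": None, "episodes_per_month": 4}
confidence = data_confidence(podcast, weights)
```

A podcast with every data point available scores 1.0, and the score drops as more heavily weighted data points go missing.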

The current system achieves a relative error below 3% for more than half of its estimations, meaning the projected audience size is within 3% of the actual number in these cases. Even in less accurate cases, the system typically stays within a factor of five of the real number – a podcast predicted to have 1,000 listeners might actually have between 200 and 5,000, which is still valuable for prioritization decisions.
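The two accuracy measures described here can be written down in a few lines. This is a sketch of how one might compute them, with hypothetical function names; it does not show how Podscan evaluates its models internally.

```python
def relative_error(predicted, actual):
    """Absolute error as a fraction of the actual audience size."""
    return abs(predicted - actual) / actual


def within_factor(predicted, actual, factor=5.0):
    """True if the actual value lies between predicted/factor and predicted*factor."""
    return predicted / factor <= actual <= predicted * factor


def share_within(pairs, max_relative_error=0.03):
    """Fraction of (predicted, actual) pairs whose relative error is within the limit."""
    hits = sum(1 for p, a in pairs if relative_error(p, a) <= max_relative_error)
    return hits / len(pairs)
```

For a prediction of 1,000 listeners, `within_factor` accepts any actual audience from 200 to 5,000, which matches the range above.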

To maintain and improve accuracy, I built a switchable model architecture. This allows me to deploy new models as they’re trained without any service interruption. Each model version is tracked and evaluated, with automated performance monitoring to ensure we’re always using the most accurate predictions.
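A switchable architecture like this can be pictured as a registry that holds several model versions and routes predictions to whichever one is active. The class below is a minimal sketch under that assumption; its name and methods are hypothetical, and it leaves out the tracking and monitoring parts.

```python
from typing import Callable, Dict, List, Optional

Model = Callable[[List[float]], float]


class ModelRegistry:
    """Keeps several model versions and routes predictions to the active one."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self._active: Optional[str] = None

    def register(self, version: str, model: Model) -> None:
        """Add a trained model without changing which one serves traffic."""
        self._models[version] = model

    def activate(self, version: str) -> None:
        """Switch traffic to a registered version; unknown versions are rejected."""
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        self._active = version

    @property
    def active_version(self) -> Optional[str]:
        return self._active

    def predict(self, features: List[float]) -> float:
        if self._active is None:
            raise RuntimeError("no model has been activated")
        return self._models[self._active](features)


# Two stand-in models that return fixed estimates, for illustration only.
registry = ModelRegistry()
registry.register("v1", lambda features: 1000.0)
registry.register("v2", lambda features: 1200.0)
registry.activate("v1")
first = registry.predict([0.2, 0.7])
registry.activate("v2")  # swap models without restarting anything
second = registry.predict([0.2, 0.7])
```

Because callers only ever talk to `predict`, a new version can be registered and activated while the old one keeps serving requests until the switch.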

Contact Data Enhancement

The second major challenge – streamlining outreach – required a different kind of technical solution. While we already had contact information in our database, it was scattered across different data sources:

RSS feed metadata
Episode descriptions
Show notes
Linked social profiles
Marketing websites
Historical interaction data

I built a contact information extraction pipeline that processes these sources, using natural language processing to identify and validate contact details. The system can distinguish between general show contact information and specific guest contact details, which is crucial for our users.

We also implemented a confidence scoring system for contact information. Email addresses found in official RSS feeds get higher confidence scores than those extracted from episode descriptions, helping users prioritize their outreach channels.
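Source-based scoring can be sketched as follows. A plain regular expression stands in for the natural language processing step, and the source names and score values are hypothetical. The only ordering taken from the text above is that RSS feed addresses outrank those found in episode descriptions.

```python
import re

# Deliberately simple pattern for illustration; real-world email
# validation needs more care than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical confidence values per source.
SOURCE_CONFIDENCE = {
    "rss_feed": 0.9,
    "marketing_website": 0.7,
    "show_notes": 0.6,
    "episode_description": 0.5,
}
DEFAULT_CONFIDENCE = 0.3


def extract_contacts(texts_by_source):
    """Return (email, confidence) pairs, highest confidence first.

    An address found in several sources keeps its best score.
    """
    best = {}
    for source, text in texts_by_source.items():
        score = SOURCE_CONFIDENCE.get(source, DEFAULT_CONFIDENCE)
        for email in EMAIL_RE.findall(text):
            email = email.lower()
            best[email] = max(best.get(email, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


contacts = extract_contacts({
    "rss_feed": "Contact: Host@Example.com",
    "episode_description": "Email host@example.com or booking@example.com.",
})
```

Here `host@example.com` appears in both sources and keeps the higher RSS feed score, so it ranks ahead of the address found only in the episode description.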

Streamlining the Outreach Process

With both the audience metrics and contact data systems in place, we completely redesigned our notification system. Now, when users receive a mention alert, they see:

Estimated audience size with confidence level
Audience growth trend over time
Direct contact options ranked by reliability
One-click export to popular CRM systems
Historical interaction data if available

We’ve also added bulk operations for agencies managing multiple clients. Users can create custom lists, apply filters based on audience size or category, and export entire contact datasets in formats compatible with major CRM platforms.
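A bulk filter-and-export step of this kind might look like the sketch below. The field names, the column layout, and the choice of CSV as the CRM-compatible format are all assumptions made for illustration.

```python
import csv
import io


def export_contacts(podcasts, min_audience=0, category=None):
    """Filter podcasts by audience size and category, then return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["podcast", "category", "estimated_audience", "contact_email"])
    for podcast in podcasts:
        if podcast["estimated_audience"] < min_audience:
            continue
        if category is not None and podcast["category"] != category:
            continue
        writer.writerow([
            podcast["name"],
            podcast["category"],
            podcast["estimated_audience"],
            podcast["contact_email"],
        ])
    return buffer.getvalue()


# Hypothetical list entries for an agency's client campaign.
shows = [
    {"name": "Indie Builders", "category": "Business",
     "estimated_audience": 12000, "contact_email": "hello@example.com"},
    {"name": "Tiny Startups", "category": "Business",
     "estimated_audience": 800, "contact_email": "team@example.org"},
    {"name": "Garden Talk", "category": "Leisure",
     "estimated_audience": 25000, "contact_email": "host@example.net"},
]
csv_text = export_contacts(shows, min_audience=5000, category="Business")
```

Only shows that pass both filters end up in the export, so the resulting file already reflects the agency’s prioritization.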

The results have been remarkable. Our trial-to-paid conversion rates improved significantly as users found clear value in these audience metrics for prioritizing their outreach efforts. The average time from receiving a notification to initiating contact dropped from hours to minutes.

Looking Forward

We’re not done evolving. Our next challenge is demographics – determining where a podcast’s listeners are located, along with their gender distribution and age range. This might involve audio analysis, text processing, or both. I’m particularly excited about:

Voice analysis for demographic insights
Natural language processing for audience targeting
Geographic distribution mapping
Content categorization improvements
Automated trend detection

What I’ve learned through this journey is that product-market fit isn’t just about having valuable features – it’s about making that value immediately obvious and actionable for your users. For Podscan, this meant moving away from a one-size-fits-all approach and toward optimizing specific features for our best-fit customers. The API serves businesses wanting to process podcast data professionally, while the alerting system focuses on agencies needing quick insights and action paths.

This isn’t a pivot – we’re not changing what Podscan fundamentally does. We’re just getting better at showing the right value to the right users at the right time. Sometimes that means building something you initially thought impossible, like our audience estimation system. But when it serves your users’ core needs, it’s worth the effort.

If you want to track your brand mentions on podcasts, please check out podscan.fm — and tell your friends!

Arvid Kahl

Product-Market Fit & Time-to-First-Value — The Bootstrapped Founder 360