Dear founder,

Coming out of MicroConf, I’ve gained clarity about something I’ve been feeling for a while.

It’s time to reposition Podscan.fm, shifting away from alerting and monitoring as main features, and moving more decisively toward what it’s already becoming: a comprehensive podcast database.

THE BOOTSTRAPPED FOUNDER • EPISODE 383

383: Repositioning Podscan: From Monitoring to Data Platform

This isn’t just a cosmetic change. It’s a genuine pivot in focus and functionality that aligns better with how people are actually using the product. Over the past several months, I’ve watched how customers interact with Podscan, and the signs have been increasingly clear.

The most telling indicator? API usage. More and more customers are using the API intensively to access our data. Some clients are making a six-figure number of API requests every single day. That’s significant! And it tells me something important about the value we’re providing.

I’ve also been receiving frequent requests for data exports of the transcripts that sit in our database. To put that in perspective, we’re talking about roughly 18 million fully transcribed episodes, with many more waiting for processing or ready for a second pass to improve accuracy. When customers are asking for direct access to that kind of data volume, they’re telling you something essential about what they value.

From Presentation Pivot to Focus Pivot

This repositioning isn’t just about how we present Podscan to the world. It’s about what we prioritize in development and where we focus our resources. The shift centers around two key principles:

Data Agility - Making sure data is available faster
Data Fidelity - Ensuring data is better and more reliable

These principles are now driving our development roadmap. I’m focusing on features that customers have been requesting for a while, but that previously felt unfeasible. Now, with our new direction clarified, these features have moved to the foreground.

Understanding Customer Use Cases

What’s fascinating about this pivot is how it’s forcing me to focus more intently on the specific ways customers use our data. This varies somewhat by customer segment, but there are common patterns emerging.

One major use case that keeps coming up is host and guest tracking. Many users aren’t just interested in the content of shows - they want to know about the people behind the voices. They’re asking questions like:

How trustworthy are the people speaking?
Who is saying what?
What are the credentials of this host or guest?
What’s their background and expertise?

It’s not just about the content; it’s about context. Who someone is shapes how we interpret what they say.

From Display-Only to Entity Tracking

I’ve actually been extracting this kind of information for months, but only in a display-only form. For each podcast episode, I would extract and show the host’s name, the guest’s name, social links to their profiles, maybe their website, their occupation if I could figure it out from the episode content… and that was it.

When a new episode came in, I’d do the same data extraction again. Sometimes the new extraction would be better, sometimes worse than previous attempts, because it was always done on a per-episode basis. There was no persistence or connection between occurrences of the same person across different shows.

Now I’m working on something much more powerful: entity tracking. The concept is simple but game-changing. If a person can be reliably detected by their name on a show, and then the same name with the same social media handles and occupation shows up on another show, it’s likely they’re the same person.

This means we can say, “This is one entity, and they appear here as a host, there as a guest, they’re mentioned in this episode, and they sponsor that show.” Entity recognition and attribution are becoming core capabilities of our data platform.
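As a rough illustration of the idea (not Podscan's actual implementation), per-episode extractions can be folded into persistent entities by requiring the same normalized name plus at least one shared social handle. The `Mention` and `Entity` types, their fields, and the matching rule are all hypothetical, and occupation is left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """One per-episode extraction of a person (hypothetical shape)."""
    name: str
    handles: frozenset  # social media handles found in this episode
    podcast: str
    role: str  # "host", "guest", "mentioned", or "sponsor"


@dataclass
class Entity:
    """A persistent person record built up from many mentions."""
    name: str
    handles: set = field(default_factory=set)
    appearances: list = field(default_factory=list)  # (podcast, role) pairs


def normalize(name: str) -> str:
    """Lowercase the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def resolve(mentions):
    """Merge mentions that share a normalized name and at least one handle."""
    entities = []
    for m in mentions:
        match = next(
            (e for e in entities
             if normalize(e.name) == normalize(m.name) and e.handles & m.handles),
            None,
        )
        if match is None:
            # No known entity fits, so start a new one.
            match = Entity(name=m.name)
            entities.append(match)
        match.handles |= m.handles
        match.appearances.append((m.podcast, m.role))
    return entities
```

Under this rule, a mention with no handles never merges into an existing entity, which errs on the side of keeping people separate rather than guessing.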

Building a Podcast Graph

Unlike before, where this information was just layered on top of transcription data, we’re now tracking these entities in their own database. This opens up incredible possibilities, such as:

Following the same person across all podcasts they’ve appeared on
Connecting appearances on one podcast with mentions on different shows
Creating a graph of interconnections between people in the podcast ecosystem

This is extremely valuable data for people using Podscan for research and outreach purposes. Imagine being able to query: “Give me a list of all people who have been on this show AND on these five other shows.” That kind of capability transforms Podscan from a useful tool into an essential data platform.
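The "this show AND these other shows" query boils down to a set intersection once appearances are tracked per entity. Here is a minimal sketch, assuming appearances are available as `(person, show)` pairs; the function names and the data shape are made up for illustration and say nothing about how the Podscan API will expose this.

```python
from collections import defaultdict


def build_index(appearances):
    """Map each show to the set of people who appeared on it."""
    people_by_show = defaultdict(set)
    for person, show in appearances:
        people_by_show[show].add(person)
    return people_by_show


def on_all_shows(people_by_show, shows):
    """Return the people who appeared on every show in `shows`."""
    sets = [people_by_show.get(show, set()) for show in shows]
    return set.intersection(*sets) if sets else set()
```

A call such as `on_all_shows(index, ["Show A", "Show B"])` returns only the people present on both shows, and an unknown show yields an empty result.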

The Technical Challenges

I won’t sugar-coat it - the recognition part is surprisingly complicated. Entity recognition is generally fraught with false positives.

Consider this scenario: someone says, “Hey, it’s me, I’m John from this podcast.” They never mention their full name, and there’s no additional information - no homepage, email address, or social media link. It’s just “John from podcast XYZ.”

Then there’s “John from podcast ABC.” Is that a different John? Does one person have two podcasts? Is he the host of one and a guest on another? It’s hard to determine this programmatically with limited data.

But this is where our comprehensive approach gives us an edge. Since we extract so much data from podcasts, episodes, and adjacent social media profiles, we have more reliable heuristics. We can say with reasonable confidence, “We know this person. We’ve seen them before on a similar show. It’s quite likely this is the John we’re talking about.”

The system I currently have running in my testing environment is quite reliable, but it took time to set up. The challenge was finding the right balance of flexibility and precision:

You need enough leniency to handle slightly different data that still belongs to the same entity. For example, a sponsor might use different tracking links for each episode they sponsor (different discount codes in URLs), but they’re still the same sponsor.
At the same time, you don’t want two people with similar-sounding social media profile names to be automatically attributed to one individual.
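One way to picture that balance is to be lenient where variation is expected and strict where it is not. The sketch below compares sponsor links by host only, so per-episode paths and discount codes are ignored, while social handles must match exactly apart from case and a leading "@". Both rules and all function names are assumptions for illustration, not the heuristics Podscan runs. Host-only matching would also wrongly merge sponsors that share a link shortener, so a real system needs more signals.

```python
from urllib.parse import urlsplit


def sponsor_key(url: str) -> str:
    """Reduce a sponsor link to its host, ignoring path, query, and fragment."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_sponsor(url_a: str, url_b: str) -> bool:
    """Lenient: different tracking links on the same host count as one sponsor."""
    key_a, key_b = sponsor_key(url_a), sponsor_key(url_b)
    return bool(key_a) and key_a == key_b


def same_handle(handle_a: str, handle_b: str) -> bool:
    """Strict: only case and a leading '@' are ignored, with no fuzzy matching."""
    def clean(handle: str) -> str:
        return handle.strip().lstrip("@").lower()

    return clean(handle_a) == clean(handle_b)
```

With these rules, two links to the same site with different discount codes resolve to one sponsor, while `@johnsmith` and `@john_smith` stay two separate people.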

There are entire businesses that focus solely on solving these problems. For our purposes, the system works reliably for standard names and brands. For people with names that are often mistranscribed or very common, it becomes less reliable - but that’s the nature of working with massive datasets.

I’m continually refining this system to make it more accurate. The goal is to make it searchable within the platform for all users, and to expose it through the API so people can build automations on top of it.

Understanding My Core Customer

This product pivot is a direct consequence of my MicroConf visit, where I finally understood who my main customer is and should be. That said, I’m not abandoning any user segments. One fascinating pattern I’ve observed is the transition path customers often take.

Many start out using Podscan manually. They’re often the first person in their agency to adopt the tool for a specific project. Later, they might get others in their organization interested and invite them to a team. Eventually, someone turns it into a more automated part of the business, and the agency starts using the API.

I’ve seen several customers follow this exact path, which tells me that maintaining both the manual interface and the powerful API is important. They serve different stages of the customer journey.

The Value of Focus

This repositioning has clarified not just what Podscan is, but what it’s not. We’re not just a monitoring tool anymore - we’re a comprehensive podcast data platform with unique capabilities around entity tracking and relationship mapping.

By focusing on this core value proposition, I can make better decisions about where to invest my development time. Features that enhance our data quality, comprehensiveness, and accessibility now take priority, while nice-to-have monitoring capabilities become secondary.

What This Means for Users

For our existing users, this pivot means more powerful data capabilities and more reliable information. For those using our API, it means richer datasets and more sophisticated query possibilities. For those considering Podscan, it means a clearer understanding of our unique value proposition.

The podcast landscape is crowded with monitoring tools, but there’s nothing quite like what Podscan is becoming: a comprehensive, interconnected database of podcast content and the people who create it.

Looking Forward

I’m incredibly excited about this direction. Building out entity tracking and attribution feels like the right next step for Podscan, one that aligns with how our most engaged users are already using the platform.

In the coming months, you’ll see these capabilities roll out both in the user interface and in our API. You’ll be able to follow entities across shows, understand connections between podcast personalities, and gain insights that simply weren’t possible before.

For those interested in the technical details or API capabilities, I’ll be publishing more detailed documentation soon. In the meantime, I’d love to hear your thoughts on this pivot and what kinds of entity-tracking capabilities would be most valuable to you.

We're the podcast database with the best and most real-time API out there. Check out podscan.fm — and tell your friends!

Arvid Kahl
