Dear founder,

Last week, I dove into who my ideal customer is. Now that I have chosen that elusive ideal customer profile, there are consequences. I have found the right people to build for, but what do they need?

I’ll walk you through how I handle product challenges today.

🎙️ Enjoy the blog post and the podcast

Let’s recap. I decided to turn Podscan into the most comprehensive podcast data platform it can be. My ideal customer is anyone who wants to build a product, a service, or a business on top of that data platform.

And that means that I’m selling something that is extremely easy to copy, clone, and abuse. The thing that I want to freely give to my paying customers —transcripts, rankings, metadata, and all other kinds of things related to podcasts— is also the thing I have to protect at all costs.

Before we dive into the details, let me introduce this week's sponsor:

Last chance to invest before this company becomes a household name

And I mean it: Ring and Nest are quite literally in my household. Where were you when Amazon acquired Ring for $1B? Or when Google bought Nest for a cool $3.2B? Not paying attention, like me?

But hey, the next groundbreaking Smart Home innovation has arrived: RYSE. Their automated window shade tech just launched in Best Buy stores and their pre-IPO investment offering is open to the public for a limited time. And hurry, their share price has already grown 20% from their last round! Take part in this exclusive public offering before RYSE becomes a household name. It will be for me.

Invest today

Back to building a data platform.

Here’s the bizarre thing about APIs: the easier it is to grab data, the more people want to use them. Yet the easier it is to grab a LOT of data, the riskier it gets for the business offering it.

There are a few problematic kinds of behaviors that software businesses have to contend with, and they are exacerbated for an API-centric business:

Scraping

The biggest threat is someone just grabbing the whole database in one go. Every single podcast, every single transcript, all connections, all ratings, the whole thing. Duplicating a valuable database is what the internet was built for. Every time we visit a website, a small copy is made on our computers, and most of the time, website owners want that. That’s how it works. But a fully-fledged database that costs hundreds of hours and tens of thousands of dollars to create?

Yeah, not so much.

So I need to prevent this. From the start, I need to stay ahead of those who would want to siphon this treasure trove into their own systems. With that in mind, I need to think defensively in a few ways:

I need to make it hard to iterate over my database entries. If you’re downloading record #4287, you know that there probably is a #4288 as well, so a scraper could be automated to grab every single record in a row. That’s why I created encoded IDs in my API, just like Stripe, that both obfuscate the underlying ID and make the record more recognizable. Podcast #4287 turns into pod_a8625b: something that looks more like a podcast and less like a random number. If someone were to get their hands on a list of these, of course, they could still scrape them, but all this needs to do is deter people from seeing an easy opportunity.
Any API I offer needs to be severely rate-limited. Podcast information, particularly historical data, doesn’t change after the fact, so even with mild scraping, someone could eventually explore the whole API within a few months. That’s where rate limits come in. My trial plan allows a measly 100 requests per day. For a scraper, this is used up within seconds. For someone evaluating the product, it’s more than enough. Paid plans have liberal but still sensible limits. If someone needs more, they can buy an enterprise plan and get in touch. For anyone else, these limits will be sufficient, and if they’re not, I can modify them as I learn more.
Finally: no freemium! I cannot and will not allow non-paying customers to access this data. If they can’t afford the $19/month plan, they can’t have it. People go to great lengths to automate account creation and data extraction in freemium products. Not going to happen here: Podscan is pay to play.
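
To make the first two defenses concrete, here is a minimal Python sketch. It is not Podscan's actual implementation: the 24-bit ID space, the multiplier, the mask, and the names encode_id, decode_id, and DailyQuota are all assumptions made up for illustration. Only the pod_ prefix style and the 100-requests-per-day trial limit come from the text above.

```python
import time
from collections import defaultdict

# --- Opaque IDs: reversible scrambling of a sequential integer ---------
# All constants are hypothetical. This is obfuscation, not encryption.
ID_BITS = 24                              # 24 bits gives six hex characters
ID_SPACE = 1 << ID_BITS
MULTIPLIER = 7_654_321                    # any odd number is invertible mod 2**24
MASK = 0x5A5A5A
INVERSE = pow(MULTIPLIER, -1, ID_SPACE)   # modular inverse (Python 3.8+)


def encode_id(prefix: str, record_id: int) -> str:
    """Turn an internal ID such as 4287 into a prefixed six-hex-digit public ID."""
    if not 0 <= record_id < ID_SPACE:
        raise ValueError("record_id outside the supported ID space")
    scrambled = ((record_id * MULTIPLIER) % ID_SPACE) ^ MASK
    return f"{prefix}_{scrambled:06x}"


def decode_id(public_id: str) -> tuple[str, int]:
    """Recover the prefix and the internal ID from a public ID."""
    prefix, _, hex_part = public_id.partition("_")
    scrambled = int(hex_part, 16)
    if not 0 <= scrambled < ID_SPACE:
        raise ValueError("public ID outside the supported ID space")
    record_id = ((scrambled ^ MASK) * INVERSE) % ID_SPACE
    return prefix, record_id


# --- Daily request quota per API key ------------------------------------
class DailyQuota:
    """In-memory counter; a real service would keep this in shared storage."""

    def __init__(self, limits: dict, clock=time.time):
        self.limits = limits              # plan name -> requests per day
        self.clock = clock                # injectable so it can be tested
        self.used = defaultdict(int)      # (api_key, day number) -> count

    def allow(self, api_key: str, plan: str) -> bool:
        day = int(self.clock() // 86_400)  # days since the Unix epoch (UTC)
        bucket = (api_key, day)
        if self.used[bucket] >= self.limits[plan]:
            return False
        self.used[bucket] += 1
        return True


if __name__ == "__main__":
    public_id = encode_id("pod", 4287)
    print(public_id, decode_id(public_id))

    quota = DailyQuota({"trial": 100})    # trial limit taken from the text
    print(quota.allow("demo-key", "trial"))
```

Because the scrambling is a one-to-one mapping, every public ID decodes back to exactly one record and no lookup table is needed. Anyone who learns the constants can reverse it, which matches the goal stated above: this only has to remove the easy opportunity, not stop a determined attacker.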

Copycats

I put up these barriers mostly because the easiest part of Podscan that a copycat founder could clone is the interface. The complicated and expensive stuff is all in the backend and the database. And that’s what people are after.

And product limitations aren’t the only barriers I can throw into their path.

Of course, I drafted terms & conditions for the API. I had that in place before I even activated it. The first sentence of these terms should make it absolutely clear what’s okay and what is not: “You can not use the Podscan API to create an application or service that competes directly with Podscan’s core products.”

I also added a few sentences about storing the data, which is also not allowed unless it’s meant for immediately serving their customers. That’s a limitation that every API user agrees to when connecting to the Podscan APIs.

The Problem with Limiting Access

When you limit access like this, you also limit opportunity, and that’s the hard balance to strike here. I want my users to feel they can build anything they want on top of the APIs, but I also want to very much stay in control of the data that powers these products.

I got a message earlier this week on my helpdesk chat widget from a founder who wondered just how long they could cache the data they receive from the API. Is a few seconds fine? Can it go into a cache to be sent out in an email later that day?

It got quite specific, and it reminded me just how much of running a software business is really about “just-in-time” decision-making. I found a solution that both the user and I were happy with, and we took it from there (after all, the phrase “we may be able to offer an exemption to these rules in certain circumstances” is part of the terms & conditions too).

The more data Podscan ingests, transcribes, and analyzes, the more critical these choices and partnership agreements will become. Right now, my users have personal access to me (and often a personal history from prior conversations on Twitter). But someday, these will be bigger and bigger businesses trying to get their hands on as much as they can.

What to Share and What to Hide

Which brings me to another conundrum. There are some kinds of data that I collect from a wide variety of sources that I might not want to share on the API at all. Audience size data is one of the best-kept secrets of the podcasting world. No hosting provider, no podcast player creator gives away even a glimpse at the actual numbers behind the podcasts they work with. The only people who know how many listeners they have are the owners of the podcasts themselves. And they don’t share.

In such a situation, what does one do? Guesstimates! One checks the Apple Podcast charts, looks for review counts, and the size of social media profiles, and then compiles them into some kind of score. Podchaser does this, ListenNotes too, and I’m working on something similar.

But I could share these metrics on my API. I have a full history of review counts on Apple. Why not add it to the API?

I struggle with this a lot. I want my users to be able to get as much as they can from the platform. But I also want to keep some secret sauce to myself. So I’ve been looking at how other platforms solve this. Most of them just don’t. If anything at all, they share a rough score — a simple ranking like “4/10” or “Top 10%”.

And even that tends to be only available in the more expensive tiers.

I think that’s what I’ll do with Podscan. Audience information is probably the most expensive non-AI work to do for Podscan. It involves constantly scanning the web and parsing websites. Occasionally, I need proxies to reliably get results. And that has a cost.

For that reason, I think I’ll make anything indicating reach, audience, or listener data a Premium-and-higher feature. The API will not return these fields for Essentials customers and only return example data or rounded numbers for trial accounts. I’ll have to figure out how I can communicate this in the documentation and inside the product, but I think that’s the way forward. If it costs me to create, it should cost to consume.
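
As a rough sketch of what that gating could look like, the Python below shapes one podcast record according to the caller's plan. This is only an illustration under assumptions: the field names (audience_estimate, listener_estimate), the lowercase plan identifiers, and the helper names coarse and shape_response are hypothetical, and the rounding rule is just one possible reading of "rounded numbers".

```python
# Hypothetical field and plan names; only the tier behavior comes from the text.
AUDIENCE_FIELDS = {"audience_estimate", "listener_estimate"}
FULL_ACCESS_PLANS = {"premium", "enterprise"}


def coarse(value: int) -> int:
    """Keep only the leading digit of a positive number, e.g. 48213 -> 40000."""
    if value <= 0:
        return 0
    magnitude = 10 ** (len(str(value)) - 1)
    return (value // magnitude) * magnitude


def shape_response(record: dict, plan: str) -> dict:
    """Return a copy of the record with audience fields gated by plan."""
    if plan in FULL_ACCESS_PLANS:
        return dict(record)               # Premium and higher: exact values

    # Everyone else starts without any audience fields at all.
    shaped = {k: v for k, v in record.items() if k not in AUDIENCE_FIELDS}

    if plan == "trial":
        # Trial accounts get a taste of the data, but only rounded numbers.
        for field in AUDIENCE_FIELDS:
            if field in record:
                shaped[field] = coarse(record[field])
    return shaped


if __name__ == "__main__":
    record = {"id": "pod_a8625b", "title": "Example Show", "audience_estimate": 48213}
    for plan in ("trial", "essentials", "premium"):
        print(plan, shape_response(record, plan))
```

Filtering in one place, right before the response is serialized, means the same rule can be reused by the user-facing website, so the protected fields cannot leak through a second code path.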

Of course, I’ll have to make sure that all these limitations and protections are also present in the user-facing website. Scraping often happens right at that level, and I can already feel that my eagerness to present all kinds of interesting data might lead to a kind of data extraction that isn’t easily fought with rate limits and IP blocks.

No doubt I’ll run into other API- and data-related issues in the future. You might even think of one that I missed right now. Please feel free to send me a Twitter DM or an email at arvid@podscan.fm. I really appreciate all the wonderful feedback I have been getting over the last week as I’ve shared the Podscan journey in public.

I'll share a few updates about my SaaS on the pod, and I'd love to know what you think about them! Please leave a voice message at podline.fm/arvid 🥰

And if you want to track your brand mentions on podcasts, check out podscan.fm!

Classifieds

I recently launched The Bootstrapper's Bundle, which contains Zero to Sold, The Embedded Entrepreneur, and Find your Following. If you want to start a bootstrapped business and build a validated product and a personal platform while doing it, check out this bundle. It contains all eBooks, audiobooks, video courses and extra materials I ever created. It's just $50, for now.

Thank you for reading this week’s essay edition of The Bootstrapped Founder. Did you enjoy it? If so, please spread the word and share this issue on Twitter.


Arvid Kahl

Challenges of Offering an API — The Bootstrapped Founder 313
