[{"content":" TL;DR\nFor most solo builders and small teams, content research is the bottleneck, not writing. A weekly Collect → Cluster → Score → Synthesize pipeline turns real customer conversations into a ranked list of topics you can scan over coffee. The pattern works for any niche where your audience talks publicly somewhere: Reddit, Indie Hackers, niche Discords, forums. The scoring prompt is where most of the leverage lives. Treat it as a markdown file you keep rewriting until its top three match the ones you\u0026rsquo;d pick by hand.\nAs I was building out Storkly, I was pretty confident about getting us up and running to an MVP/soft-launch phase. I had enough experience with software and web development, and while there was (and still is) a lot to learn, getting from zero to one wasn\u0026rsquo;t my primary worry. My worry was, \u0026ldquo;How do I drive people to this product?\u0026rdquo; I knew we needed to create content to get the word out, but I always found myself frozen on what to write about and what to post about.\nIf you\u0026rsquo;re running content for a small brand or building out a new product as a team of one, you\u0026rsquo;ve probably noticed that this is where the real challenge lies. It\u0026rsquo;s the kind of research that historically meant a content strategist with a Reddit tab (among many others) open, a spreadsheet, and a lot of intuition.\nThat research step is the natural place to point AI. It\u0026rsquo;s information processing, not voice. Get the topic right and the rest of the workflow protects itself.\nThis post walks through an AI content research engine I built: a weekly content topic discovery system that ranks 3–5 themes from real customer conversations and drops them into a phone-readable digest every Sunday. I built it for Storkly, but the architecture is generic. 
I\u0026rsquo;ll show the pattern, then point at where you\u0026rsquo;d swap pieces for your own niche.\nThe content sprint problem Most small content teams end up converging on roughly the same weekly workflow:\n1. Research: what should we write about this week? 2. Founder or brand voice memo: the take, in their actual words. 3. AI draft assembly: a long-form scaffold from the memo. 4. Human edit: voice and accuracy pass. 5. Distribution: fan one piece out across channels. 6. Schedule.\nSteps 2 through 6 keep a human in the loop because that\u0026rsquo;s where the brand actually lives. Step 1 doesn\u0026rsquo;t need one. It\u0026rsquo;s pattern recognition over public data, which is the kind of thing AI is genuinely good at. Automate it aggressively and you buy back the part of your week that only you can do.\nWhat I was optimizing for Before writing any code I pinned down what \u0026ldquo;good\u0026rdquo; looked like.\nDecision in 10 minutes. Output is a phone-readable Sunday morning digest. If I have to log into something to see it, I built the wrong thing. Anchored in real customer voice. Every theme is backed by 2–3 verbatim quotes with source links. An LLM saying \u0026ldquo;your audience cares about X\u0026rdquo; without quotes is hallucination with extra steps. Storkly uses Reddit because that\u0026rsquo;s where new parents are venting and asking questions. For SaaS or AI products you might use Indie Hackers, for hobbyists a Discord export. Brand-aligned scoring. No generic \u0026ldquo;topics in your category.\u0026rdquo; The system has to score against our specific brand narrative. A generic ranker surfaces the loudest topic in the niche every week, and that\u0026rsquo;s rarely the one that should be written about. Don\u0026rsquo;t Over-Engineer. Keep the focus narrow. One scheduled function, one table, three prompts. No vector DB, no agent framework, no orchestrator. When something breaks, I want to know exactly which file to open. 
The architecture Source ──► Collector (HTTP) ──► Cluster (LLM #1) ──► Score (LLM #2) ──► Synthesize (LLM #3) ──► Store (Airtable)\nFive stages, three of which are LLM calls. The Collect → Cluster → Score → Synthesize pattern is the meat of the flow and should apply to any niche where your audience has a public watering hole somewhere.\nFor Storkly, I started with Reddit (a handful of parenting subreddits) as the source, but will likely expand this over time. The whole thing runs as a single Modal scheduled function, Saturday 11pm ET, around 5–10 minutes per run.\nDecisions that took thinking Once the architecture and guiding principles were defined, most of the build was straightforward. A few choices weren\u0026rsquo;t, and they took some experimenting.\nUnauthenticated Reddit JSON instead of PRAW or OAuth Reddit serves every public subreddit as JSON at a predictable URL. No app registration, no client secret, no token refresh:\nGET https://www.reddit.com/r/{subreddit}/top.json?t=week\u0026amp;limit=30\nThe weekly job is roughly 210 requests, which at 1 req/sec pacing stays within the ~60 requests-per-minute unauthenticated cap. Skipping OAuth removed a setup step that fails silently, two secrets from .env, and one dependency. If I ever need authenticated access, the collector swap is local and doesn\u0026rsquo;t touch anything downstream.\nPorting this: for B2B tech audiences, Hacker News\u0026rsquo; Algolia API is similarly auth-free and excellent. For hobbyist niches, Discord exports or RSS from a forum work the same way. The pattern is \u0026ldquo;find the lowest-friction read endpoint for your audience\u0026rsquo;s watering hole.\u0026rdquo;\nThree LLM calls instead of one A lot of people reach for \u0026ldquo;let one big model do everything in one prompt.\u0026rdquo; I think that\u0026rsquo;s a mistake when each step is a different kind of reasoning. 
I\u0026rsquo;d be lying if I said I hadn\u0026rsquo;t been fighting that concept as well (see \u0026ldquo;Don\u0026rsquo;t Over-Engineer\u0026rdquo; above).\nClustering needs broad context, so you hand it everything and let it find structure. Scoring needs sharp criteria, so you hand it a tight rubric and constrain it. Synthesis needs polished prose, so you hand it a small set of scored themes and ask for one good sentence at a time.\nCombining them muddles every prompt. Splitting them means each stage has one testable job and one clear failure mode. When the briefs read flat, I know it\u0026rsquo;s the synthesis prompt. When the ranking is off, I know it\u0026rsquo;s the scoring prompt. That separation pays for itself the first time something goes wrong.\nPorting this: don\u0026rsquo;t combine. Even in a simpler niche, the debugging cost of a one-shot prompt outweighs the latency win.\nSonnet, not Opus Claude Sonnet is handling cluster and score fine. I\u0026rsquo;ve tested with both, and while the synthesis is slightly better with Opus, the gap wasn\u0026rsquo;t big enough to justify the extra cost. If synthesis ever starts reading flat, promoting that stage to Opus is a one-line change because the prompts are decoupled.\nPorting this: start with the cheaper model. Upgrade individual stages only when output quality is actually the bottleneck.\nPrompts live in version-controlled .md files, not Python strings The scoring prompt is the highest-leverage piece of the system, and it\u0026rsquo;s going to change every week for the first month. Markdown diffs cleanly in git, reviews well on GitHub mobile, and a prompt change doesn\u0026rsquo;t drag in a code review. Each stage loads its prompt at runtime from src/prompts/. I never want to worry about a breaking error in the code just to try a new rubric. 
Flexibility and speed are key here.\nMy default is often to lead with code solutions (again, see \u0026ldquo;Don\u0026rsquo;t Over-Engineer\u0026rdquo; above), but when flexibility, iteration, and letting the LLMs do what they do best are key, .md prompts are by far the best approach.\nPorting this: non-negotiable. If your prompts live in source code, your iteration loop is broken.\nAirtable, not Notion or Postgres Review happens on a phone over coffee, so mobile UX matters more than anything else. Airtable\u0026rsquo;s mobile app beats Notion\u0026rsquo;s table view, and Postgres is overkill until I\u0026rsquo;m storing 10k+ rows. In all honesty, Airtable was also what I was used to and already had available, so that\u0026rsquo;s what I went with. The key is something that is easy to update and review.\nPorting this: any mobile-friendly review surface works. Airtable, Notion, even a daily email. The constraint is \u0026ldquo;scannable on a phone in 10 minutes,\u0026rdquo; not the specific tool.\nNo SEO keyword data in v1 Audience engagement is a better proxy for topic intent than search volume. Search volume captures curiosity. A forum thread with 200 comments captures pain and engagement. Pain is what you want, because those are the conversations where someone is actively looking for an answer, not just typing a question into Google. I will likely add SEO keyword data later on to capture and synthesize intent with action. But as a phase 1 to get started, I kept the scope focused on audience engagement.\nPorting this: if your distribution is SEO-first, add Ahrefs or SEMrush later as an enrichment signal. Engagement still leads.\nThe most reusable artifact: the scoring prompt Most of the system can sit untouched for months. The scoring prompt is the one file I expect to keep editing.\nIt does two jobs. First, it tells the LLM what your brand is actually about: the fit hierarchy. 
This is the template that ports directly to any brand:\nCORE (fit score 8–10): [Topics that are 100% on-brand for you] ADJACENT (fit score 5–7): [Topics that touch your brand from the side] OFF-BRAND (fit score 1–4): [Topics in your category but not your story] (Score conservatively even with a clever angle — off-brand is off-brand.) For Storkly, CORE includes things like \u0026ldquo;managing extended-family communication during postpartum\u0026rdquo; and \u0026ldquo;photo privacy for newborn photos.\u0026rdquo; OFF-BRAND includes sleep training and baby gear, which are common parenting topics that just aren\u0026rsquo;t our story.\nThe fit hierarchy is the most important thing to write down for your own brand. It\u0026rsquo;s also the artifact that makes this system yours instead of generic.\nSecond, the prompt scores each theme on four dimensions:\nPain — how emotionally or practically acute is this? Signals: emotional language, exhaustion markers, late-night posts, conflict mentions. Volume — how often the theme shows up in the data. Fit — per the hierarchy above. Bias hard. Readiness — \u0026ldquo;I\u0026rsquo;d buy a solution right now\u0026rdquo; energy vs. venting. Composite:\nPain × Fit × Readiness × LOG(Volume + 1) Volume is logged so a high-volume but low-pain theme doesn\u0026rsquo;t drown out a sharp, low-volume one. Pain, Fit, and Readiness all matter linearly because they\u0026rsquo;re the dimensions that determine whether the topic is actually worth writing about. Volume is a tiebreaker.\nWhat\u0026rsquo;s actually built so far Reddit collector with 1-second pacing and 429 exponential backoff. Descriptive User-Agent header on every request, because Reddit aggressively rate-limits the default httpx/curl agents (which I learned the hard/slow way). Three Claude processing stages (cluster, score, synthesize), each loading its prompt at runtime via the Anthropic Python SDK. 
Airtable writer built on pyairtable, hitting a Weekly Topics schema with the composite score as a formula field. Local orchestrator plus a fixture cache so I can iterate prompts offline without re-hitting Reddit. Once you start tweaking prompts you want to push the same input data through 20 variations in quick succession without burning rate limit. The first end-to-end run produced 5 ranked topics with verbatim quotes and source links, reviewable on my phone in under 10 minutes. That was the bar I set going in.\nWhat\u0026rsquo;s next Prompt tuning. I\u0026rsquo;ll do a few weeks of manual Sunday runs. After each one, I log my picks and skips with the why. I use that to update the scoring prompt. The skips and picks with the why are the key training signal. Modal deploy. Wrap the orchestrator in a scheduled function. Airtable push notification when the digest lands. I\u0026rsquo;ve played around with scheduling this as a cron job on a local server, but Modal is cheap enough that the couple of dollars is worth not having to troubleshoot the cron job or my server setup. Multi-source enrichment. Pinterest Trends and Google Trends. I\u0026rsquo;m leaving this for later to keep the initial deployment focused and test my engagement-as-intent thesis. If I start to notice a lot of repeat themes/topics then I\u0026rsquo;ll want to start pulling in more sources for a broader base. Customer signal loop. Once we start driving more traffic to Storkly directly from blog posts, Reddit, Instagram, etc., I\u0026rsquo;ll be able to leverage that data to refine the system further. First-party signals are always the priority. Porting this to your own niche If you want to build something similar, here\u0026rsquo;s the rough porting guide:\nPick your audience watering hole. Consumer or lifestyle goes to Reddit. B2B and SaaS founders go to Indie Hackers and Hacker News (Algolia API). 
Developers go to GitHub Discussions and Stack Overflow tags (I miss when Stack Overflow was the go-to). Hobbyists go to Discord exports, niche forums, or subreddits. Vertical professionals go to industry Slack or Discord communities. Write your own fit hierarchy. The CORE / ADJACENT / OFF-BRAND template ports directly. Spend an actual hour on this. It\u0026rsquo;s the artifact that makes the system rank your topics, not generic ones. Keep the three-call structure. Cluster / Score / Synthesize is independent of niche. Resist the urge to combine. Use a mobile-friendly review surface. Airtable, Notion, a daily email. Anything you\u0026rsquo;ll actually open with coffee in the morning. Define your own success bar before you start tuning. Mine: in 4 of 6 weeks the top three includes a topic worth writing about, average review takes under 15 minutes, and at least one piece sourced from the system tops its primary channel. Yours will be different, but write it down before you start tuning or you\u0026rsquo;ll move the goalposts. Closing thought The interesting AI work in content workflows isn\u0026rsquo;t the generation step. It\u0026rsquo;s the ranking step.\nGeneration is cheap and getting cheaper, which means generation without taste mostly produces more of what already exists, converging on the average. The work that\u0026rsquo;s actually valuable is putting the right input in front of a human whose voice the audience already wants to hear, and then getting out of the way.\nI\u0026rsquo;ll post weekly updates from the tuning phase as Phase 2 runs. 
If you want the scoring prompt template as a starting point for your own niche, reach out; I\u0026rsquo;m happy to share.\n","permalink":"https://chasrobinson.com/posts/how-to-build-an-ai-content-research-engine/","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eTL;DR\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eFor most solo builders and small teams, content research is the bottleneck, not writing.\u003c/li\u003e\n\u003cli\u003eA weekly \u003cstrong\u003eCollect → Cluster → Score → Synthesize\u003c/strong\u003e pipeline turns real customer conversations into a ranked list of topics you can scan over coffee.\u003c/li\u003e\n\u003cli\u003eThe pattern works for any niche where your audience talks publicly somewhere: Reddit, Indie Hackers, niche Discords, forums.\u003c/li\u003e\n\u003cli\u003eThe scoring prompt is where most of the leverage lives. Treat it as a markdown file you keep rewriting until its top three match the ones you\u0026rsquo;d pick by hand.\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAs I was building out \u003ca href=\"https://getstorkly.com/\"\u003eStorkly\u003c/a\u003e, I was pretty confident about getting us up and running to an MVP/soft-launch phase. I had enough experience with software and web development, and while there was (and still is) a lot to learn, getting from zero to one wasn\u0026rsquo;t my primary worry. My worry was, \u0026ldquo;How do I drive people to this product?\u0026rdquo; I knew we needed to create content to get the word out, but I always found myself frozen on \u003cem\u003ewhat to write about\u003c/em\u003e and \u003cem\u003ewhat to post about\u003c/em\u003e.\u003c/p\u003e","title":"How to Build an AI Content Research Engine: a Claude + Reddit Case Study"},{"content":"We were waiting on the call from the hospital for our scheduled induction. Originally we were supposed to come in Friday afternoon. The call didn\u0026rsquo;t come until Saturday evening. 
Almost twenty-four hours of waiting, waiting, waiting — and a constant stream of texts, calls, and \u0026ldquo;any update?\u0026rdquo; messages from family and friends who loved us and just wanted to know.\nSomewhere around hour fifteen, half-laughing, half-broken, we said it out loud: there has to be a better way to do this.\nI had an unused domain name lying around, and an old computer collecting dust. In a few hours I turned that old machine into a little web server and spun up a simple update page. We sent the link to everyone we knew and told them: we\u0026rsquo;ll keep this updated as things go. Check here instead of texting.\nIt was a hit. Friends and family loved it. We loved it. And somewhere in the quiet between updates, we knew we were onto something.\n","permalink":"https://chasrobinson.com/storkly/2026-05-13-why-storkly/","summary":"\u003cp\u003eWe were waiting on the call from the hospital for our scheduled induction. Originally we were supposed to come in Friday afternoon. The call didn\u0026rsquo;t come until Saturday evening. Almost twenty-four hours of waiting, waiting, waiting — and a constant stream of texts, calls, and \u0026ldquo;any update?\u0026rdquo; messages from family and friends who loved us and just wanted to know.\u003c/p\u003e\n\u003cp\u003eSomewhere around hour fifteen, half-laughing, half-broken, we said it out loud: there has to be a better way to do this.\u003c/p\u003e","title":"Why we're building Storkly"}]