
Guides · May 1, 2026

Competitor price monitoring with AI: a Firecrawl + GPT + Slack cron, for $8.50/month


There is a moment in every founder's week when they wonder if Acme just dropped their Pro plan to undercut you. You open Acme's pricing page in one tab. Then ten more competitor pages in ten more tabs. You eyeball the numbers, squint, and try to remember what the headline price was last week. Maybe you do this every Monday. Maybe you used to and stopped, because nobody pays you to be a human price scraper.

Clay sells you a $185 to $495/month Growth plan to do this, and their own blog post on automated web scraping literally tells you the cheap alternative is "use cron jobs (via Crontab)". Reddit founders running on smaller budgets say the same thing: an r/smallbusiness thread on tracking competitor prices is full of people spending one to two hours a day on twenty listings, and r/dropshipping has a recurring "is there anything under $100/month" thread that never resolves.

Here is that alternative, with Firecrawl handling the scrape, GPT handling the structured extraction, Google Sheets holding the URL list and the history, Slack receiving the alert, and Crontap handling the clock. Twelve competitors, four plans each, scraped every six hours. About $8.50 a month.

If you want the short version: a Crontap schedule fires every 6 hours in your team's timezone against a backend route. The route reads competitor URLs from Sheets, calls Firecrawl /scrape for each one to get LLM-ready markdown, hands the markdown to gpt-4o-mini with a strict JSON schema, validates the extracted prices, diffs against the last run, and posts to Slack when any plan moves more than 5%.

Why the existing solutions are awkward

There are a lot of products in this space. None of them quite fit a solo founder watching twelve competitors.

Apify has actors for almost every site you can think of, and they will happily run them on a schedule. The catch is that the schedule lives inside Apify, the data lives inside Apify, and the alert routing lives inside Apify. The moment you want to run a quick diff against your own product's pricing logic, or post to a channel that is not on Apify's integrations list, you are back to gluing things together. The actor library is the wedge; the scheduler is just there.

n8n Cloud starts at $20/month for the Starter plan, which is fine in isolation, except your only reason to pay for it is the scheduling layer. The actual scrape and the actual LLM call still cost what they cost. You are paying $20 a month for a cron and a workflow editor.

Clay is the explicit comparison because they wrote the blog post that names the gap. Their Growth plan runs $185 to $495 a month depending on credits, and the credits go fast when you scrape pricing pages four times a day. Clay is great if you also need enrichment, outbound, and the rest of the GTM stack. For "watch twelve pricing pages and tell me when they move", it is overkill by a factor of twenty.

Visualping does change detection on web pages and starts at $14/month for 20 pages checked daily. It will catch a price change, but it sends you a screenshot diff, not structured data. You still have to open the email, find the number, and type it somewhere. There is no "Acme Pro went from $99 to $79" object you can feed into a spreadsheet or a Slack message.

A bash script with curl and grep works until the target site renders pricing in JavaScript, which is most modern SaaS pricing pages. At that point you are reaching for headless Chrome, and headless Chrome is its own ongoing problem.

The gap in the middle is: clean markdown of a JS-rendered page, structured extraction with a real schema, diff against history, alert to where your team actually reads things, on a clock that you control. That is the shape this post builds.

The shape

Four boxes, each doing one thing.

Crontap  →  HTTPS POST  →  /price-watch/sweep  →  Firecrawl  →  OpenAI  →  Sheets diff  →  Slack

The cron fires every six hours in your team's IANA timezone (the alert lands during working hours, not 03:00 UTC). The route reads the competitor URL list from a Google Sheet. For each URL it calls Firecrawl's /scrape with formats: ["markdown"], which handles the JS rendering and returns clean markdown without ads or nav clutter. The markdown goes to GPT with a strict JSON schema. The extract is validated server-side (numeric, in a sane range, plan name from a known list). The validated rows are compared against the previous run, with the last seven runs per URL kept as history rows in another Sheets tab. If any plan's monthly price moved more than 5%, the route posts to Slack via incoming webhook.

You can swap any of the boxes without touching the others. Move from Firecrawl to ScrapingBee, the schema does not change. Move from GPT to Claude Haiku, the cron does not care. Move from Sheets to Postgres, the alert format stays the same.

Worked example: 12 competitors, 4 plans each

The route is one handler. It reads the URL list, then does four things in order: scrape, extract, diff, alert. A sketch of the complete handler appears after Step 4.

Step 1: scrape with Firecrawl

Firecrawl's /scrape endpoint returns markdown by default, with JS rendering enabled, in one synchronous call (or polling for slow pages). Twelve competitors at one URL each is twelve calls. The Node SDK has a single helper that hides the polling for the slow ones.

import FirecrawlApp from "@mendable/firecrawl-js";
 
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
 
async function scrapePricing(url: string) {
  const result = await firecrawl.scrapeUrl(url, {
    formats: ["markdown"],  // LLM-ready markdown, not raw HTML
    onlyMainContent: true,  // strip nav, footer, cookie banners
    waitFor: 1500,          // give JS-rendered pricing tables time to mount
  });
  if (!result.success) {
    throw new Error(`Firecrawl failed for ${url}: ${result.error}`);
  }
  return result.markdown!;
}

onlyMainContent: true strips header, footer, cookie banners, and most of the chrome. waitFor: 1500 gives JS-rendered pricing tables a chance to mount. The output is a single markdown string with the pricing tiers as headings and bullet lists. That is the shape GPT is good at parsing.

Step 2: extract with a strict JSON schema

The temptation is to ask GPT for "the prices on this page" and let it freestyle. That is how you end up with $79 plans that do not exist and feature lists pulled from a competitor's old marketing page. The fix is OpenAI's structured outputs with strict: true, plus a known-plans allow-list, plus a numeric validator.

import OpenAI from "openai";
const openai = new OpenAI();
 
const schema = {
  type: "object",
  required: ["competitor", "scraped_at", "plans"],
  additionalProperties: false,
  properties: {
    competitor: { type: "string" },
    scraped_at: { type: "string" },
    plans: {
      type: "array",
      items: {
        type: "object",
        required: ["plan", "monthly_usd", "annual_usd", "seats", "headline_feature"],
        additionalProperties: false,
        properties: {
          plan: { type: "string" },
          monthly_usd: { type: ["number", "null"] },
          annual_usd: { type: ["number", "null"] },
          seats: { type: ["integer", "null"] },
          headline_feature: { type: "string" },
        },
      },
    },
  },
};
 
async function extractPlans(competitor: string, markdown: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: {
      type: "json_schema",
      json_schema: { name: "pricing_extract", schema, strict: true },
    },
    messages: [
      {
        role: "system",
        content:
          "You extract pricing plans from a competitor's pricing page markdown. Return only plans literally on the page. If a price is hidden behind 'Contact us', set it to null. Do not infer. Do not invent. Use commas, parentheses, or colons, not em-dashes.",
      },
      {
        role: "user",
        content: `Competitor: ${competitor}\n\nMarkdown:\n${markdown}`,
      },
    ],
  });
  return JSON.parse(completion.choices[0].message.content!);
}

The schema is strict, but the model can still return "Pro" when the page changed to "Professional", or a monthly price of 990 when the annual price is in the wrong field. The next thing the handler does is a server-side sanity check.

const KNOWN_PLANS = new Set(["Starter", "Pro", "Professional", "Business", "Enterprise", "Team"]);
 
// Server-side sanity check: reject unknown plan labels and out-of-range prices.
function validate(extract: any) {
  for (const p of extract.plans) {
    if (!KNOWN_PLANS.has(p.plan)) {
      throw new Error(`Unknown plan label: ${p.plan}`);
    }
    for (const field of ["monthly_usd", "annual_usd"] as const) {
      const v = p[field];
      if (v === null) continue;
      if (typeof v !== "number" || v < 1 || v > 10_000) {
        throw new Error(`Bad ${field} for ${p.plan}: ${v}`);
      }
    }
  }
}

If validation throws, the route returns 500 and Crontap retries on the next fire. You hear about a price-page redesign within six hours, not three weeks.

Step 3: diff against the previous run

The history lives in a Google Sheet, one row per (competitor, plan, scraped_at), with the last seven runs per URL kept as a rolling window. For each plan in the current extract, look up the same competitor and plan in the previous run. If the monthly price moved more than 5%, queue an alert.

type PriceRow = {
  competitor: string;
  plan: string;
  monthly_usd: number | null;
  annual_usd: number | null;
  scraped_at: string;
};
 
function diffPlans(prev: PriceRow[], next: PriceRow[]) {
  const alerts: string[] = [];
  for (const n of next) {
    if (n.monthly_usd === null) continue;
    const p = prev.find((r) => r.competitor === n.competitor && r.plan === n.plan);
    if (!p || p.monthly_usd === null) continue;
    const delta = (n.monthly_usd - p.monthly_usd) / p.monthly_usd;
    if (Math.abs(delta) >= 0.05) {
      const sign = delta > 0 ? "+" : "";
      alerts.push(
        `${n.competitor} ${n.plan}: $${p.monthly_usd} → $${n.monthly_usd} (${sign}${(delta * 100).toFixed(1)}%)`
      );
    }
  }
  return alerts;
}

The 5% threshold filters out rounding noise: most pricing pages do not move at all, and the ones that do tend to move by 10% or more.
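The Sheets plumbing is not shown above. Here is a minimal sketch with the official googleapis client, assuming a service account, a SHEET_ID environment variable, and tabs named urls and history; all of those names are assumptions, not requirements.

import { google } from "googleapis";

// Service-account auth; GOOGLE_APPLICATION_CREDENTIALS points at the key file.
const auth = new google.auth.GoogleAuth({
  scopes: ["https://www.googleapis.com/auth/spreadsheets"],
});
const sheets = google.sheets({ version: "v4", auth });
const spreadsheetId = process.env.SHEET_ID!;

// Tab "urls": one row per competitor, columns A=competitor, B=url.
async function readUrls(): Promise<{ competitor: string; url: string }[]> {
  const res = await sheets.spreadsheets.values.get({
    spreadsheetId,
    range: "urls!A2:B",
  });
  return (res.data.values ?? []).map(([competitor, url]) => ({ competitor, url }));
}

// Tab "history": one row per (competitor, plan, scraped_at), columns A-E.
async function readHistory(): Promise<PriceRow[]> {
  const res = await sheets.spreadsheets.values.get({
    spreadsheetId,
    range: "history!A2:E",
  });
  return (res.data.values ?? []).map(([competitor, plan, monthly, annual, scraped_at]) => ({
    competitor,
    plan,
    monthly_usd: monthly ? Number(monthly) : null,
    annual_usd: annual ? Number(annual) : null,
    scraped_at,
  }));
}

// Append the current sweep below the existing history rows.
async function appendHistory(rows: PriceRow[]) {
  await sheets.spreadsheets.values.append({
    spreadsheetId,
    range: "history!A2",
    valueInputOption: "RAW",
    requestBody: {
      values: rows.map((r) => [r.competitor, r.plan, r.monthly_usd ?? "", r.annual_usd ?? "", r.scraped_at]),
    },
  });
}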

Step 4: post to Slack

Slack's incoming webhook is the simplest destination on the internet: one POST, one JSON body, you are done.

async function postSlack(alerts: string[]) {
  if (alerts.length === 0) return;
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      text: `*Competitor pricing drift*\n${alerts.join("\n")}`,
    }),
  });
}
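Tying the four steps together: a minimal sketch of the route itself, assuming an Express app and the helpers above (readUrls, readHistory, and appendHistory are the Sheets helpers sketched in Step 3). It is a sketch under those assumptions, not the only way to wire it.

import express from "express";

const app = express();

// The route Crontap POSTs to every six hours.
app.post("/price-watch/sweep", async (req, res) => {
  // Only Crontap holds the shared secret from the schedule's Authorization header.
  if (req.headers.authorization !== `Bearer ${process.env.CRON_SECRET}`) {
    return res.status(401).end();
  }
  try {
    const prev = await readHistory();
    const next: PriceRow[] = [];
    for (const { competitor, url } of await readUrls()) {
      const markdown = await scrapePricing(url);
      const extract = await extractPlans(competitor, markdown);
      validate(extract);
      const scraped_at = new Date().toISOString(); // stamp server-side, not from the model
      for (const p of extract.plans) {
        next.push({ competitor, plan: p.plan, monthly_usd: p.monthly_usd, annual_usd: p.annual_usd, scraped_at });
      }
    }
    await appendHistory(next);
    await postSlack(diffPlans(prev, next));
    res.json({ ok: true, rows: next.length });
  } catch (err) {
    // A 500 is what makes Crontap retry and, on repeat failure, email you.
    res.status(500).json({ error: String(err) });
  }
});

app.listen(3000);

You can fire the route once by hand with a curl POST carrying the same bearer header before handing the clock to Crontap.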

The Crontap setup for the whole thing is one schedule:

  • URL: https://yourapp.com/price-watch/sweep
  • Method: POST
  • Headers: Authorization: Bearer <CRON_SECRET>
  • Cadence: 0 */6 * * * (every 6 hours)
  • Timezone: America/New_York (or wherever your team reads Slack)
  • Failure alert: email on 4xx/5xx, with retry on transient Firecrawl 503

When Firecrawl 503s (it happens, especially on heavy pages), Crontap retries. When the retry also fails, you get an email while the failure is fresh, instead of finding out three weeks later that your last seven Slack alerts were silence.

Fix this in 60 seconds with Crontap. Free tier available. No credit card. Schedule your first job →

Cost math

The pipeline is cheap because each piece is doing exactly one thing and nothing else.

  • Firecrawl: roughly $0.005 per scrape on the Hobby plan. Twelve competitors at four sweeps a day is 48 scrapes per day, or about 1,440 per month. That works out to roughly $7 a month. Firecrawl's Free tier covers the first 500 credits, so the first 10 days are essentially free.
  • OpenAI gpt-4o-mini: a pricing page in markdown is around 2,000 input tokens and the JSON extraction is around 500 output tokens. At 2026 list prices that is roughly $0.001 per scrape. 1,440 scrapes per month is about $1.50.
  • Google Sheets: free. The API is free up to 300 reads per minute per project, which a six-hour cadence cannot get near.
  • Slack incoming webhook: free on every workspace plan that supports webhooks.
  • Crontap: the free tier covers three schedules at hourly cadence. A six-hour cadence is well inside that. Pro is $3.25/month billed annually for unlimited schedules at every-1-minute cadence, in case you also have other jobs running.
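Spelled out, the arithmetic behind those bullets:

// 12 competitors × 4 sweeps/day × 30 days = 1,440 scrapes per month
const scrapesPerMonth = 12 * 4 * 30;
const firecrawlUsd = scrapesPerMonth * 0.005; // ≈ $7.20
const openaiUsd = scrapesPerMonth * 0.001;    // ≈ $1.44
const totalUsd = firecrawlUsd + openaiUsd;    // ≈ $8.64 on Crontap's free tier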

Total: about $8.50 a month on Crontap's free tier, or just under $12 if you add Crontap Pro for other schedules. Clay's Growth plan starts at $185 a month, and the credits to run this exact workflow run out fast. The DIY pattern is roughly a 95% saving for the specific job of "watch twelve pricing pages".

When this pattern is the wrong fit

Three cases where you should reach for something else.

The target site IP-blocks Firecrawl. Most public SaaS pricing pages are fine. A few aggressive marketplaces and review sites are not. If Firecrawl 403s on every retry, swap to a residential-proxy provider (Decodo, Bright Data, Smartproxy) and keep the rest of the pipeline. The cron and the schema do not change.

You need browser automation, not just a scrape. If the pricing tier you care about is hidden behind a "configure your seats" stepper that mutates the DOM, plain /scrape is not enough. Reach for Browse AI, Playwright on a hosted runtime, or Firecrawl's /extract with actions. Same Crontap schedule, different work in the handler.

The price is paywalled. If you need to log in to see the number, the project scope just doubled. You need credentials management, session cookies, and a much more careful ToS read. Stick to pages that are publicly published.

FAQ

Is this kosher under each competitor's terms of service?

Public pricing pages are the cleanest case for this tutorial: they are published precisely so prospects can read them, and you are reading them at the same cadence a curious prospect might. That said, check each target's robots.txt and ToS. If the site explicitly disallows automated access to /pricing, take it off the list. Marketplaces (Amazon, eBay) and review aggregators have stricter rules; this pattern is not for those.
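If you want that check automated, a naive sketch is below; it is a plain string scan, not a full robots parser, and it ignores user-agent groups, so treat a "disallowed" result as a prompt to read the file yourself.

// Quick scan of robots.txt before a URL goes on the sheet.
async function pricingAllowed(origin: string): Promise<boolean> {
  const res = await fetch(`${origin}/robots.txt`);
  if (!res.ok) return true; // no robots.txt means nothing is disallowed
  const txt = await res.text();
  return !txt.split("\n").some((line) => {
    const l = line.trim().toLowerCase();
    return l.startsWith("disallow:") && l.includes("/pricing");
  });
}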

How do I prevent GPT from hallucinating a price?

Three layers. First, structured outputs with strict: true force the response shape and the field types. Second, a known-plans allow-list rejects plan names the model invented, and also flags real new tiers the page added (you update the list when one appears). Third, a numeric validator rejects monthly prices outside a sane range (1 to 10,000 USD). Together they catch every hallucination mode I have seen in production, with the trade-off that real product launches require a tiny manual update.

Can I run this every minute instead of every six hours?

Technically yes, on Crontap Pro. In practice the target sites' WAFs will notice and Firecrawl will start 429ing you. Pricing pages do not move that often; six hours is the cadence where the cost stays low, the alerts stay meaningful, and nobody's WAF gets unhappy. If you really need real-time, you are not in this pattern's lane.

Does this work for marketplaces like Amazon or eBay?

The mechanics work; the ToS friction is real. Marketplaces invest heavily in anti-scraping, and the pricing data is often explicitly off-limits to automated access. This tutorial sticks to publicly published SaaS pricing pages where the trade-off is clean.

What if I want a daily email digest instead of per-drift Slack alerts?

Same handler, different destination. Skip the diff and the Slack POST. Append all twelve extracts to a Sheets tab. Have a second Crontap schedule run once a day at 08:00 in your timezone, read the last 24 hours of rows, render an HTML table, send it through Resend. The pattern is identical to the weekly Stripe digest pattern, just with a different data source.
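A minimal sketch of that second handler, assuming the Resend SDK, the readHistory helper from Step 3, and placeholder addresses:

import { Resend } from "resend";

const resend = new Resend(process.env.RESEND_API_KEY!);

// Second Crontap schedule: 0 8 * * * in your timezone.
async function sendDigest() {
  const cutoff = Date.now() - 24 * 60 * 60 * 1000;
  const rows = (await readHistory()).filter((r) => Date.parse(r.scraped_at) > cutoff);
  const body = rows
    .map((r) => `<tr><td>${r.competitor}</td><td>${r.plan}</td><td>${r.monthly_usd ?? "n/a"}</td></tr>`)
    .join("");
  await resend.emails.send({
    from: "pricewatch@yourapp.com", // placeholder sender on a verified domain
    to: "founders@yourapp.com",     // placeholder recipient
    subject: "Daily competitor pricing digest",
    html: `<table><tr><th>Competitor</th><th>Plan</th><th>Monthly USD</th></tr>${body}</table>`,
  });
}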

Why every 6 hours and not every 24?

Six hours catches a same-day price drop early enough to react before your sales team is asked about it. Twenty-four hours misses the morning standup window if the change shipped overnight. Six is the boring middle: cheap, frequent enough to be useful, infrequent enough that the WAFs do not notice. The cadence is one knob you tune to taste.
