You shipped a feature that runs OpenAI on user feedback. Sentiment cleanup, theme extraction, summary generation, all the usual suspects. The customer wants results within a few minutes of submission. Triggering on the user's session is the wrong shape: half the time the session is gone before the LLM call returns, and the other half you are paying for a long-running request that the user does not see. Spinning up a long-lived worker process is more infrastructure than you need for what is, at the end of the day, a periodic batch. The shape that actually fits is a scheduled HTTP route firing every few minutes against a small queue.
If you just want the short version: expose /openai/gpt-{task} routes in your backend, each one processing a small batch and returning 200, and let Crontap fire them every N minutes per task. You get steady throughput inside your OpenAI rate limits, per-task failure alerts, and a clear separation between the clock and the work, for $3.25 a month billed annually.
Why OpenAI batch work needs a clock, not a user session
A user session is a bad trigger for an LLM batch. The session lives for tens of seconds, the LLM round trip is multiple seconds, and most of the work you actually care about (cleanup, theme extraction, sentiment scoring) is asynchronous to the user. Tying it to a session creates two problems:
- Latency on the hot path. If the user submits feedback and the page waits for sentiment scoring before showing a thank-you state, you have shipped a UI that depends on a network call to OpenAI. The 95th-percentile latency on that page is now whatever your slowest GPT call is. You will get a support ticket about it within the week.
- Lost work on session drop. Browsers close, sessions expire, mobile apps background. Anything you started on session that did not finish, did not finish. Now you need a queue to track which submissions still need processing, and at that point you have already built half the scheduler.
The cleaner shape is to write the user submission to your database with a status: pending flag, return 200 to the user immediately, and let a separate process pick up the pending rows on a clock. The "separate process" is a small HTTP route. The "clock" is an external scheduler.
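A minimal sketch of that write path, using an in-memory array as a stand-in for your database table (`FeedbackRow`, `submitFeedback`, and the store are illustrative names, not a real codebase):

```typescript
type FeedbackRow = {
  id: number;
  text: string;
  status: "pending" | "processed";
};

// Stand-in for your database table.
const feedback: FeedbackRow[] = [];

// The submit path: write the row as pending and return immediately.
// No OpenAI call happens here, so the user never waits on the LLM.
function submitFeedback(text: string): FeedbackRow {
  const row: FeedbackRow = { id: feedback.length + 1, text, status: "pending" };
  feedback.push(row);
  return row;
}

// What the scheduled route pulls on each fire: a bounded slice
// of the rows still waiting for processing.
function pendingFeedback({ limit }: { limit: number }): FeedbackRow[] {
  return feedback.filter((r) => r.status === "pending").slice(0, limit);
}
```

The scheduled route calls `pendingFeedback` on its own clock; the submit handler never touches OpenAI.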
This is the same shape you would build for any periodic batch (refund reconciliation, cache warm, search index rebuild). LLM batches just happen to have rate-limit characteristics that make the cadence non-trivial.
Rate-limit realities
OpenAI's rate limits are token-and-request based per model, per organization. The exact numbers move as your tier moves, but the shape is the same: there is a requests-per-minute cap and a tokens-per-minute cap, and you are throttled by whichever one you hit first.
The naive shape ("run all pending sentiment cleanup right now") fails badly here. If you have 5,000 pending feedback rows and you fire 5,000 GPT calls in parallel, you are immediately rate-limited and most of those calls return 429. You retry, you retry harder, you are still rate-limited, your queue does not drain.
The shape that does work is the boring one: a small batch on a steady cadence. Pick a batch size that comfortably fits inside your TPM, run it on a cadence that comfortably fits inside your RPM, and let the queue drain at a predictable rate.
For sentiment cleanup with gpt-4o-mini, a working pattern is:
- Batch size: 50 to 200 rows per fire (depending on input length).
- Cadence: every 4 minutes.
- Throughput: roughly 750 to 3,000 rows per hour, with headroom under most rate-limit tiers.
You tune the numbers to your specific tier and prompt, but the principle is steady, small batches, on a regular clock. OpenAI's rate limits reward exactly that shape.
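The arithmetic behind those bullets is simple enough to keep as a helper when sizing a new task: a batch of B rows fired every M minutes drains B × 60 / M rows per hour.

```typescript
// Hourly drain rate for a scheduled batch route:
// `batchSize` rows per fire, one fire every `cadenceMinutes`.
function rowsPerHour(batchSize: number, cadenceMinutes: number): number {
  return batchSize * (60 / cadenceMinutes);
}

// The range quoted above: 50 to 200 rows every 4 minutes.
rowsPerHour(50, 4); // 750 rows per hour
rowsPerHour(200, 4); // 3,000 rows per hour
```

Run the same numbers against your TPM cap (rows per fire × average tokens per row) before settling on a batch size.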
The scheduler pattern
The contract is one HTTPS POST per task, per fire. Crontap owns the clock; your backend owns the batch logic; OpenAI owns the inference.
Crontap (cron) → HTTPS POST → /openai/gpt-sentiment-cleanup → pull batch from DB → OpenAI → write results
Three boxes, each doing one thing. You can swap the LLM provider tomorrow without touching the scheduler. You can change the cadence in the dashboard without redeploying the backend. You can add a new task type by adding a new route, with no scheduler changes.
Step 1: Your backend exposes /openai/gpt-{task} endpoints
Each task is a route. Same auth pattern as any other internal HTTP endpoint: a bearer token that the scheduler stores and presents on every fire.
export async function POST(request: Request) {
  // Crontap stores this bearer token and presents it on every fire.
  const auth = request.headers.get("authorization");
  if (auth !== `Bearer ${process.env.CRON_SECRET}`) {
    return new Response("Unauthorized", { status: 401 });
  }

  // Pull a bounded batch. The limit lives here, not in the schedule.
  const batch = await pendingFeedback({ limit: 100 });
  if (batch.length === 0) {
    return Response.json({ processed: 0, idle: true });
  }

  const results = await runSentimentCleanup(batch);
  await writeBack(results);

  return Response.json({
    processed: results.length,
    failed: results.filter((r) => r.error).length,
  });
}
Three things to call out:
- The route returns 200 in the idle case (no pending rows). Crontap should not alert when there is genuinely nothing to do.
- The batch size is hard-coded inside the handler, not in the schedule. The scheduler tells you when to run; the handler tells you how much.
- The handler returns quickly. If your OpenAI call takes 30 seconds and you batch 100 rows, you are already at a 30-second response. That is fine for most schedulers but you do not want to push it to 5 minutes; if the work grows, split into smaller batches and run more often.
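If the response time creeps toward the timeout, the cheapest fix is to split the batch into smaller chunks inside the handler and process them sequentially. A generic helper (a sketch, not part of the route above):

```typescript
// Split a batch into fixed-size chunks so each OpenAI call stays small
// and a single failure only affects one chunk, not the whole batch.
function chunk<T>(rows: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}
```

Processing `chunk(batch, 25)` sequentially keeps each OpenAI call small while the handler still drains the same 100 rows per fire.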
Step 2: Crontap fires them every N minutes
Each task is a separate schedule. Same backend, different URLs, different cadences.
For a sentiment + theme + summary pipeline, that looks like:
- https://yourapp.com/openai/gpt-sentiment-cleanup every 4 minutes.
- https://yourapp.com/openai/gpt-theme-extraction every 10 minutes.
- https://yourapp.com/openai/gpt-summary-rebuild every hour.
Each schedule has its own cadence, its own failure alert channel, and its own runbook. If sentiment cleanup is the noisy one, you can bump its cadence up or down without touching theme extraction. If theme extraction starts hitting rate limits, you can stretch its cadence and let the queue tolerate the wait.
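Expressed as standard cron, the three cadences are:

```
# sentiment cleanup: every 4 minutes
*/4 * * * *

# theme extraction: every 10 minutes
*/10 * * * *

# summary rebuild: at the top of every hour
0 * * * *
```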
Step 3: Each run processes a small batch and returns 200
The scheduler's job is to fire reliably. The handler's job is to drain the queue at a sustainable rate. These are two separate concerns and they should stay that way.
A run that processes a small batch and returns 200 is the right unit. It is small enough that a transient OpenAI failure (one bad request, one rate-limit hit) only affects 100 rows, not 5,000. It is short enough that a 502 from OpenAI does not block your scheduler. It is monitorable: you can graph "rows processed per fire" and immediately see backlogs.
If you find yourself wanting "process everything until the queue is empty" inside one handler call, that is the moment to switch from a scheduled HTTP route to a long-running worker. The scheduler pattern is for steady drains, not for catch-up jobs that span hours. (More on that in the Assistants API and Batch API section below.)
Worked example: sentiment cleanup every 4 minutes
A working customer pattern is an AI feedback platform team running sentiment cleanup every 4 minutes against an OpenAI batch route. The numbers are in the 28-job range across one Crontap account, covering sentiment, theme extraction, summary rebuilds, and a handful of operational pulses.
The full setup is one route per task and one schedule per route.
The route /openai/gpt-sentiment-cleanup reads pending rows from the feedback table (limit 100), runs a single OpenAI call with structured outputs to score each row on sentiment, writes the result back, and returns 200 with { processed, failed }.
The Crontap schedule looks like:
- URL: https://yourapp.com/openai/gpt-sentiment-cleanup
- Method: POST
- Headers: Authorization: Bearer <CRON_SECRET>
- Cadence: */4 * * * * (every 4 minutes)
- Timezone: UTC (the work is timezone-independent)
- Failure alert: email / webhook (Slack / Discord / Telegram), fire on 4xx/5xx
Across a 24-hour window that is 360 fires, each processing up to 100 rows, for a notional ceiling of 36,000 rows per day. Real traffic comes in well below that, which is the point: the cadence is sized to drain backlogs faster than they form.
When traffic spikes (a customer ships a survey to 50,000 users overnight), the queue grows ahead of the scheduler. The scheduler does not need to know about the spike; the next fire processes the next 100 rows and the queue drains over a few hours instead of being a fire drill.
Crontap's free tier is enough to run a handful of these schedules at minute cadence; Pro is $3.25/mo billed annually for unlimited schedules at every-1-minute cadence. The 4-minute cadence in this example is reachable on every Crontap tier. The wedge is not sub-minute throughput; the wedge is steady cadence, per-task failure alerts, and not running another scheduler service yourself.
Fix this in 60 seconds with Crontap. Free tier available. No credit card. Schedule your first job →
When to reach for Assistants API or Batch API instead
External cron is a shape, not a religion. OpenAI ships two primitives that cover specific cases better than a scheduled HTTP route:
OpenAI Batch API
The Batch API accepts a JSONL file of requests, processes them within 24 hours, and returns the results as a file. Pricing is half the synchronous rate. Use it when:
- The work is genuinely batchable as a single file (overnight backfills, full-corpus re-scoring, dataset cleanup before a model swap).
- You can tolerate up to 24-hour latency.
- Throughput matters more than freshness.
The Batch API is not a replacement for a 4-minute cadence. If your customers expect results within minutes of submission, the Batch API's window is too wide. If you also have an overnight catch-up job that processes a million rows, the Batch API is exactly the right tool for that job.
You can run both. A Crontap schedule fires every 4 minutes for the steady drain; a separate scheduled job submits a Batch API request every night for the bigger backfill. Same scheduler, different cadences, different OpenAI primitives.
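For reference, each line of the Batch API input file is one self-contained request in JSONL form. A sentiment-scoring line would look roughly like this (the custom_id, model choice, and prompt are illustrative):

```json
{"custom_id": "feedback-8841", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "Score the sentiment of this feedback from -1 to 1."}, {"role": "user", "content": "The new export flow saved me an hour a week."}]}}
```

The custom_id is how you join the results file back to your rows when the batch completes.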
OpenAI Assistants API
The Assistants API is for stateful, multi-turn agents with tool use. You hand it a thread, it manages the conversation, and it calls your tools when it needs to. Use it when:
- The work is conversational and stateful (a support agent, a research assistant).
- You need built-in tool calling, file search, code interpreter.
- You are comfortable with OpenAI managing the thread state.
For a periodic sentiment batch, Assistants is overkill. A single chat-completions call with structured outputs is cheaper and simpler. Reach for Assistants when you genuinely need a long-lived conversation with tools attached.
For a brief tour of an adjacent pattern (LLM brand monitoring on a Replicate-hosted Llama 3.1 model), see our AI brand monitor with Llama post. The shape there is the same (Crontap fires, the LLM does the work, the result lands somewhere) but the model is hosted differently and the prompt has different rate-limit characteristics.
Failure modes and what to do about them
Three failure modes show up in production. The scheduler should help you find each one fast.
OpenAI returned a 429. Your batch was too big or your rate-limit tier is too small. The handler should catch the 429, log the affected rows back to pending, and return a 5xx so Crontap pushes a failure alert. You then either drop the batch size or request a tier upgrade. The queue does not lose work because the rows are still pending.
OpenAI returned a 5xx. Transient infrastructure errors happen. Retry once with a short backoff inside the handler. If the retry also fails, return a 5xx and let Crontap alert. The scheduler will fire again in 4 minutes; the same rows will be retried then.
The handler timed out. Your OpenAI call took longer than the request timeout you have configured. Crontap will mark the schedule as failed (timeout = treat as 5xx) and alert. The fix is to drop the batch size so the round trip stays inside the timeout, or split the batch across multiple smaller routes.
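The 5xx handling above (one retry with a short backoff, then surface the failure) fits in a few lines. A sketch, with the OpenAI call abstracted as any async function:

```typescript
// Retry an async call once after a short backoff. If the retry also
// fails, the error propagates so the route can return a 5xx and
// Crontap can deliver the failure alert.
async function retryOnce<T>(
  call: () => Promise<T>,
  backoffMs = 2000,
): Promise<T> {
  try {
    return await call();
  } catch {
    await new Promise((resolve) => setTimeout(resolve, backoffMs));
    return call();
  }
}
```

Wrap the OpenAI call, not the whole handler, so a retry does not re-pull the batch from the database.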
In all three cases, the integration loop is the same: the route knows what is wrong, returns a useful status, Crontap delivers the alert. You spend time tuning the prompt or the batch size, not stitching alert plumbing together.
When to skip the scheduler entirely
A few cases where a scheduler is the wrong shape: one-off backfills (run from a script), truly real-time work where the user is waiting on the response (call OpenAI synchronously in the handler), and stateful conversations (use the Assistants API or your own thread management). For everything in between, the scheduled HTTP route is the boring shape that works.
FAQ
Can I use this with Anthropic, Mistral, or any other LLM provider?
Yes. The pattern is provider-agnostic. The route picks a provider, sends the batch, returns 200. Swap runSentimentCleanup to call Anthropic instead of OpenAI and the scheduler does not change.
Why every 4 minutes specifically?
It is the cadence an existing customer settled on for sentiment cleanup. The number is not magic; it is the cadence that drains their backlog faster than it builds up, while staying inside their OpenAI rate-limit tier. Yours might be every minute or every 15 minutes depending on volume and tier.
What is Crontap's minimum cadence?
Every 1 minute on Pro. The 4-minute cadence in the worked example is reachable on every Crontap tier including the free tier. Sub-minute is not the wedge here; steady cadence and per-task failure alerts are.
What about retries on individual rows?
Retries on individual rows live in your handler, not the scheduler. The scheduler retries the whole HTTPS call on 5xx; the handler retries individual OpenAI calls inside the batch. Two retry layers, two responsibilities.
Can I run this on Vercel or Cloud Run?
Yes. The route is just an HTTP endpoint; it runs anywhere your backend runs. See the related Vercel and Cloud Run posts for platform-specific deployment notes.
How do I monitor backlog depth?
Add a tiny /openai/queue-status route that returns the pending row count. Schedule it every 5 minutes against a healthcheck endpoint or graph it in your usual observability tool. If the count keeps climbing, your cadence is slower than your inflow.
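A sketch of that route, with the count query injected so the handler stays small (countPendingFeedback is a stand-in for whatever your database layer provides):

```typescript
// Factory so the database count can be injected. The handler returns
// the backlog depth; graph `pending` over time to spot a climbing queue.
function makeQueueStatusHandler(countPendingFeedback: () => Promise<number>) {
  return async function GET(): Promise<Response> {
    const pending = await countPendingFeedback();
    return Response.json({ pending });
  };
}
```

Auth it the same way as the batch routes; backlog depth is not something you want public.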
References
Related on Crontap
- Scheduled AI / LLM jobs use case. The use-case-first guide for AI teams wiring up a scheduler.
- AI brand monitor with Llama 3.1 on Replicate. The same pattern with a different LLM provider and a different prompt shape.
- Cron syntax cheat sheet. Every cron expression you will reach for, with examples.
