A NewsAPI.ai content pipeline that publishes to your own API

You want a steady stream of fresh, on-topic content flowing into your own product. A daily industry roundup on the page your users actually open, a curated brief in your app, draft article stubs your editors can polish instead of starting from a blank page. The content is out there, published by thousands of outlets every hour. The problem is getting it from "the news exists" to "a clean, summarized, attributed object sitting in my database" without it becoming a project.

Your options usually shake out to three, and none of them feel good. Hire a writer to read the news and type up a roundup every morning, which is a salary for something that is mostly summarizing. Build the pipeline yourself: a scraper or news API client, an LLM call to summarize, a publisher that writes to your CMS, plus the deploy and the monitoring and the on-call when it breaks. Or the one everyone actually does, which is paste a few headlines into ChatGPT each morning, copy the result, and hand-create the post. That last one is not a pipeline. It is a chore wearing a pipeline costume.

There is a fourth shape, and it does not need a backend service. A scheduled HTTP call to a real news API for your topic, an AI step that turns the raw articles into a publish-ready content object, and a POST to your own API. Crontap runs all three; the only code you write is the endpoint that receives the POST.

Why the obvious fixes are awkward

Raw news APIs are firehoses, not editors. NewsAPI.ai (the API side of Event Registry) will happily hand you a JSON array of articles with title, body, URL, date, source, plus enriched metadata: recognized entities, topic categories, sentiment, a social score, a near-duplicate flag, and source ranking. That is exactly the rich signal you want, and it is also exactly not the shape your CMS wants. Your CMS wants a title, a slug, a summary, a body, some tags, and a list of sources. Somebody has to do the editorial step that turns one into the other.

The DIY version of that editorial step is three moving parts and a deploy. A fetch (with the API key in an env var), an LLM call (with its own SDK, retries, and prompt), and a publisher that authenticates to your CMS and writes a draft. Each part is a few dozen lines, which sounds fine until you are also writing the cron trigger, the failure alerting, and the runbook for the morning it returns an empty array. It is a service. Services have to be owned.

And generic RSS does not save you here. RSS gives you titles and a blurb, but not the entities, categories, sentiment, or duplicate detection that make a roundup feel curated rather than scraped. If you want "the five most important EV stories today, deduplicated, with the carmakers tagged," a plain feed cannot tell you which stories are important or that three of them are the same wire story reprinted by three outlets. NewsAPI.ai can.

Meet the pipeline

Three roles, and you only build the last one.

Daily schedule  →  GET NewsAPI.ai getArticles  →  AI transform (your prompt, JSON)  →  POST your API
   (your tz)          (topic, lang, sorted)          (articles  →  content object)         (/drafts or CMS hook)

NewsAPI.ai supplies the articles. Crontap's AI Integration does the editorial transform. Your own API receives the finished content and inserts a draft. There is no fetch-and-transform service to write, because the part that would have been your service (fetch, transform, forward) is the schedule plus its AI card. All you own is the endpoint that accepts the draft.

The way the AI card works: on a Crontap schedule's form, next to the webhook "Integrations" card, there is an "AI Integrations" card. After the schedule makes its HTTP call, Crontap takes that run's response body, transforms it with a model using your plain-English prompt, and forwards the result to a URL you choose.

One constraint shapes everything else, so it is worth stating up front: the AI sees only this run's response body (truncated to roughly 100KB and handed over as untrusted, delimited data) plus a little run metadata (status code, ok or failed, duration, size). It has no tools, no browsing, no network of its own. It cannot call NewsAPI.ai itself. The Crontap schedule makes that call; the AI only reads the articles JSON that came back. So the design is: point the schedule at NewsAPI.ai, and let the AI edit whatever that one response contained.

Step 1: the NewsAPI.ai request

The endpoint you want is getArticles. It is a plain GET, which is the whole reason this works as a schedule URL:

https://eventregistry.org/api/v1/article/getArticles?resultType=articles&keyword=electric%20vehicles&lang=eng&articlesSortBy=date&articlesCount=8&articleBodyLen=600&includeArticleConcepts=true&includeArticleCategories=true&includeArticleSentiment=true&isDuplicateFilter=skipDuplicates&apiKey=API_KEY

Reading the query string left to right: resultType=articles asks for articles (not aggregates), keyword=electric vehicles is your topic (swap in conceptUri for a disambiguated entity, more on that in the FAQ), lang=eng keeps it English, and articlesSortBy=date puts the newest first. articlesCount=8 and articleBodyLen=600 cap how many articles come back and how much of each body you get. The three includeArticle flags switch on concepts, categories, and sentiment, isDuplicateFilter=skipDuplicates drops reprints, and apiKey authenticates. The host is eventregistry.org, which is the same service behind newsapi.ai. There is a free API key, and the sandbox will generate the exact request (REST URL, Python, or Node) once you click your filters, so you can copy a known-good URL instead of hand-assembling one.
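If you would rather assemble that URL in code than by hand, a minimal sketch follows. The parameter names are the ones in the URL above; the function name buildGetArticlesUrl and its defaults are illustrative and not part of any SDK. One difference to expect: searchParams encodes the space in the keyword as + rather than %20, which is the usual form encoding for query strings. If in doubt, compare the result against the URL the sandbox generates.

```javascript
// Sketch only: buildGetArticlesUrl is a hypothetical helper, not an Event Registry API.
// It reproduces the query parameters shown in the URL above.
function buildGetArticlesUrl({ apiKey, keyword, lang = "eng", count = 8, bodyLen = 600 }) {
  const url = new URL("https://eventregistry.org/api/v1/article/getArticles");
  const params = {
    resultType: "articles",
    keyword,
    lang,
    articlesSortBy: "date",
    articlesCount: String(count),
    articleBodyLen: String(bodyLen),
    includeArticleConcepts: "true",
    includeArticleCategories: "true",
    includeArticleSentiment: "true",
    isDuplicateFilter: "skipDuplicates",
    apiKey,
  };
  // searchParams.set percent-encodes each value for us.
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  return url.toString();
}

const scheduleUrl = buildGetArticlesUrl({ apiKey: "API_KEY", keyword: "electric vehicles" });
console.log(scheduleUrl);
```

Paste the printed URL into the schedule form; nothing in the pipeline needs this helper at run time.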

The response is a JSON object with an articles.results array. Here is a trimmed version of what lands, which is exactly what the AI will read:

{
  "articles": {
    "totalResults": 1180,
    "page": 1,
    "count": 8,
    "results": [
      {
        "uri": "8421550xxxx",
        "date": "2026-05-30",
        "dateTime": "2026-05-30T08:14:00Z",
        "url": "https://www.examplewire.com/ev/qx-battery-plant",
        "title": "QX Motors breaks ground on $2B battery plant",
        "body": "QX Motors said on Friday it had begun construction on a battery cell plant in Tennessee...",
        "isDuplicate": false,
        "source": { "uri": "examplewire.com", "title": "Example Wire", "ranking": { "importanceRank": 142 } },
        "concepts": [{ "label": { "eng": "Electric vehicle" }, "type": "wiki", "score": 96 }],
        "categories": [{ "label": "Business/Automotive", "wgt": 88 }],
        "sentiment": 0.31
      },
      {
        "uri": "8421551xxxx",
        "date": "2026-05-30",
        "dateTime": "2026-05-30T06:50:00Z",
        "url": "https://www.exampledaily.com/markets/ev-tax-credit",
        "title": "Revised EV tax credit clears committee",
        "body": "A revised version of the federal EV tax credit advanced out of committee on Thursday...",
        "isDuplicate": false,
        "source": { "uri": "exampledaily.com", "title": "Example Daily", "ranking": { "importanceRank": 89 } },
        "concepts": [{ "label": { "eng": "Tax credit" }, "type": "wiki", "score": 71 }],
        "categories": [{ "label": "Business/Automotive", "wgt": 74 }],
        "sentiment": 0.12
      }
    ]
  }
}

That is real signal (title, body, source, ranking, concepts, categories, sentiment, a duplicate flag) but it is not a blog post. Turning it into one is the next step.
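Before wiring up the schedule, it can help to eyeball a response locally and confirm the fields you expect are really there. The sketch below walks articles.results and pulls out the handful of fields the prompt will lean on. The function name listArticles is hypothetical, and the sample object is just the trimmed response above with the bodies left out.

```javascript
// Sketch only: listArticles is an illustrative helper for inspecting a response by hand.
function listArticles(response) {
  const results = response?.articles?.results ?? [];
  return results
    .filter((article) => !article.isDuplicate) // skip anything flagged as a reprint
    .map((article) => ({
      title: article.title,
      url: article.url,
      source: article.source?.title ?? "unknown",
      sentiment: article.sentiment ?? null,
      // Concept labels are keyed by language; this reads the English label.
      concepts: (article.concepts ?? []).map((c) => c.label?.eng).filter(Boolean),
    }));
}

const sampleResponse = {
  articles: {
    results: [
      {
        url: "https://www.examplewire.com/ev/qx-battery-plant",
        title: "QX Motors breaks ground on $2B battery plant",
        isDuplicate: false,
        source: { uri: "examplewire.com", title: "Example Wire" },
        concepts: [{ label: { eng: "Electric vehicle" }, type: "wiki", score: 96 }],
        sentiment: 0.31,
      },
      {
        url: "https://www.exampledaily.com/markets/ev-tax-credit",
        title: "Revised EV tax credit clears committee",
        isDuplicate: false,
        source: { uri: "exampledaily.com", title: "Example Daily" },
        concepts: [{ label: { eng: "Tax credit" }, type: "wiki", score: 71 }],
        sentiment: 0.12,
      },
    ],
  },
};

console.log(listArticles(sampleResponse));
```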

Step 2: the Crontap schedule

Set the schedule's URL to that getArticles call, method GET, and pick a daily cadence in your timezone (say 6:30am, so the draft is waiting when your editors log in). That is the entire schedule config. Crontap fires it once a day, NewsAPI.ai returns the day's articles, and the response becomes the input to the AI card.

The one number to respect is the ~100KB body cap: Crontap truncates the response before handing it to the model, so a 50-article response with full bodies might get cut off. If you hit that, reduce articlesCount, or trim each body with articleBodyLen=600 so you send a useful excerpt rather than entire articles. Only ask for the fields you actually need. A lean request keeps you comfortably under the cap and, conveniently, makes the AI step cheaper and faster too.
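A quick way to see how close a request runs to the cap is to save one response (from the sandbox, say) and measure it. The sketch below is an assumption-laden check, not Crontap's own logic: it treats "roughly 100KB" as 100 * 1024 bytes of UTF-8, and the function names are made up for illustration.

```javascript
// Assumption: the text only says "roughly 100KB", so the exact figure here is a guess.
const ASSUMED_CAP_BYTES = 100 * 1024;

// Measure the body as UTF-8 bytes, since non-ASCII characters take more than one byte.
function responseSizeBytes(bodyText) {
  return new TextEncoder().encode(bodyText).length;
}

function fitsUnderCap(bodyText, capBytes = ASSUMED_CAP_BYTES) {
  return responseSizeBytes(bodyText) <= capBytes;
}

// Rough guide to how many articles of the same average size would fit.
function estimateMaxArticles(bodyText, articleCount, capBytes = ASSUMED_CAP_BYTES) {
  if (articleCount <= 0) return 0;
  const bytesPerArticle = responseSizeBytes(bodyText) / articleCount;
  return Math.floor(capBytes / bytesPerArticle);
}

const savedBody = JSON.stringify({ articles: { results: [{ title: "example", body: "x".repeat(600) }] } });
console.log(responseSizeBytes(savedBody), fitsUnderCap(savedBody), estimateMaxArticles(savedBody, 1));
```

Leave yourself headroom rather than aiming for the estimate exactly, since article sizes vary from day to day.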

Step 3: the AI Integration (the editorial step)

This is where the raw articles become a content object. Open the AI Integrations card on the schedule and fill in three things.

Prompt. Plain English, describing the editorial job and the exact JSON you want back:

You are an editor. You are given a JSON response from a news API containing
recent articles on a topic (title, body excerpt, url, source, date, concepts,
categories, sentiment). Write a short, publish-ready roundup that summarizes
ONLY the articles provided. Do not add facts that are not in the articles.
Group near-duplicate stories into one item. Return JSON matching exactly:
{
  "title": string,
  "slug": string (kebab-case),
  "summary": string (max 2 sentences),
  "body_markdown": string (3 to 6 short paragraphs, no invented quotes),
  "tags": string[] (drawn from the articles' concepts/categories),
  "sources": [{ "title": string, "url": string }]
}
Every claim in body_markdown must trace to one of the sources you list.

Output format: JSON. The shape you described in the prompt comes back parsed, so your API receives an object, not a string it has to re-parse.

Forward to URL: your own API. A /drafts route, your CMS ingestion endpoint, or a headless CMS webhook. Whatever already knows how to create an unpublished post.

Leave "Also run on failure" off (you do not want a draft generated from a NewsAPI.ai error page), and decide on "Include schedule URL" deliberately, because the schedule URL contains your apiKey. If you would rather not forward your key, leave it off.

You can build and test all of this on any tier. The gate is only at Save: AI integrations are a Pro feature, and Crontap Pro starts at $2.99/mo with one AI integration per schedule at a minimum cadence of one day, which is exactly the daily roundup cadence. Ultra lifts that to unlimited integrations and a one-hour minimum. Hit "Perform test" first; you will see the content object the AI produces before you save or pay, so you can tune the prompt until the drafts read the way your editors want.

Fix this in 60 seconds with Crontap. Free tier available. No credit card. Schedule your first job →

Worked example: from articles to a draft

Take the two trimmed articles from Step 1 as the input. The AI, given the prompt above, returns this as aiOutput:

{
  "title": "EV roundup: a new battery plant and a revised tax credit",
  "slug": "ev-roundup-battery-plant-tax-credit",
  "summary": "QX Motors started building a $2B battery plant in Tennessee, and a revised federal EV tax credit cleared committee. Both point to a busier second half for the EV supply chain.",
  "body_markdown": "## Manufacturing\n\nQX Motors broke ground on a $2B battery cell plant in Tennessee, its first wholly owned cell facility. The move is aimed at reducing reliance on external suppliers.\n\n## Policy\n\nA revised version of the federal EV tax credit advanced out of committee, setting up a floor vote later this session.",
  "tags": ["electric-vehicles", "automotive", "policy"],
  "sources": [
    { "title": "QX Motors breaks ground on $2B battery plant", "url": "https://www.examplewire.com/ev/qx-battery-plant" },
    { "title": "Revised EV tax credit clears committee", "url": "https://www.exampledaily.com/markets/ev-tax-credit" }
  ]
}

Crontap wraps that object as aiOutput and POSTs an envelope to your forward URL. The envelope shape is fixed and small (aiOutput is abbreviated here to save space):

{
  "aiOutput": {
    "title": "EV roundup: a new battery plant and a revised tax credit",
    "slug": "ev-roundup-battery-plant-tax-credit",
    "summary": "QX Motors started building a $2B battery plant in Tennessee...",
    "tags": ["electric-vehicles", "automotive", "policy"],
    "sources": [ { "title": "QX Motors breaks ground on $2B battery plant", "url": "https://www.examplewire.com/ev/qx-battery-plant" } ]
  },
  "failed": false,
  "status": "Success",
  "statusCode": 200,
  "statusOk": true,
  "duration": 642,
  "durationUnit": "ms",
  "size": "71.4 kB",
  "verb": "POST",
  "goToUrl": "https://api.yourapp.com/drafts",
  "timestamp": 1748585400000,
  "url": "https://eventregistry.org/api/v1/article/getArticles?resultType=articles&keyword=electric%20vehicles&..."
}

In JSON mode aiOutput is the parsed object the model returned; the rest is run metadata. statusCode, duration, and size describe the NewsAPI.ai fetch, verb is the method Crontap used to forward (POST), goToUrl is your forward URL, timestamp is epoch milliseconds, and url is present only because you turned on "Include schedule URL."
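If you do forward the schedule URL and want the run metadata in your logs, it is worth masking the key first. The sketch below assumes the envelope fields shown above. The helper names redactApiKey and describeRun are hypothetical, and the sample URL is a shortened stand-in for the real one.

```javascript
// Sketch only: mask the apiKey query parameter before the URL reaches a log line.
function redactApiKey(rawUrl) {
  const url = new URL(rawUrl);
  if (url.searchParams.has("apiKey")) {
    url.searchParams.set("apiKey", "REDACTED");
  }
  return url.toString();
}

// Build a one-line summary of the fetch from the envelope's run metadata.
function describeRun(envelope) {
  const fetched = envelope.url ? redactApiKey(envelope.url) : "(schedule URL not included)";
  return `${envelope.status} ${envelope.statusCode} in ${envelope.duration}${envelope.durationUnit}, ${envelope.size}, from ${fetched}`;
}

const sampleEnvelope = {
  failed: false,
  status: "Success",
  statusCode: 200,
  duration: 642,
  durationUnit: "ms",
  size: "71.4 kB",
  url: "https://eventregistry.org/api/v1/article/getArticles?resultType=articles&apiKey=API_KEY",
};

console.log(describeRun(sampleEnvelope));
```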

On your side, the handler is a few lines. Read aiOutput from the JSON request body, and insert a draft:

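// Route handler in the Web Request/Response style (for example, a Next.js app route).
// `db` stands in for your own data layer; swap in your ORM or CMS client.
export async function POST(req) {
  // Crontap forwards an envelope; the content object sits under aiOutput.
  const { aiOutput } = await req.json();
  // Reject runs missing the fields a draft cannot live without.
  if (!aiOutput?.title || !Array.isArray(aiOutput.sources)) {
    return Response.json({ error: "bad shape" }, { status: 400 });
  }
  await db.posts.insert({
    title: aiOutput.title,
    slug: aiOutput.slug,
    summary: aiOutput.summary,
    body: aiOutput.body_markdown,
    tags: aiOutput.tags,
    sources: aiOutput.sources,
    status: "draft", // never auto-publish
  });
  return Response.json({ ok: true });
}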

No fetch, no LLM SDK, no scheduler in your codebase. Just a handler that validates and inserts a draft.
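If you want the handler to check more than title and sources, a stricter validator might look like the sketch below. It mirrors the JSON shape requested in the prompt. The function name validateContentObject and the exact rules (non-empty strings, a kebab-case pattern, http or https source URLs) are assumptions you would tune to your own CMS.

```javascript
// Sketch only: rules mirror the shape asked for in the prompt; adjust to taste.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function validateContentObject(aiOutput) {
  if (aiOutput === null || typeof aiOutput !== "object") {
    return ["aiOutput is not an object"];
  }
  const errors = [];
  const isText = (value) => typeof value === "string" && value.trim().length > 0;

  for (const field of ["title", "summary", "body_markdown"]) {
    if (!isText(aiOutput[field])) errors.push(`${field} must be a non-empty string`);
  }
  if (!isText(aiOutput.slug) || !KEBAB_CASE.test(aiOutput.slug)) {
    errors.push("slug must be kebab-case");
  }
  if (!Array.isArray(aiOutput.tags) || !aiOutput.tags.every(isText)) {
    errors.push("tags must be an array of strings");
  }
  const sources = Array.isArray(aiOutput.sources) ? aiOutput.sources : [];
  if (sources.length === 0) {
    errors.push("sources must list at least one article");
  }
  // Every source needs a title and a link back to the original article.
  const badSource = sources.some(
    (source) => !isText(source?.title) || !/^https?:\/\//.test(source?.url ?? "")
  );
  if (badSource) errors.push("each source needs a title and an http(s) url");

  return errors; // an empty array means the object is safe to insert as a draft
}

console.log(validateContentObject({ title: "EV roundup", slug: "Bad Slug" }));
```

Returning a list of problems rather than a boolean makes the 400 response more useful when a run goes wrong.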

Be a responsible publisher

This is the part to take seriously, because "AI summarizes the news into my CMS" goes wrong fast if you skip it.

Summarize and attribute, do not republish. Use the articles to write your own short summary and always link back to the original source. Do not pass full article bodies through verbatim and publish them as your own. That is why the prompt asks for a summary plus a sources array, and why the request trims articleBodyLen instead of pulling whole articles.
Respect copyright and terms. Article text is the publisher's, and how you may reuse it is governed by their copyright and by NewsAPI.ai's terms. Summarizing with attribution is a much safer footing than reproducing, but it is not a blanket license; if you are commercial, read the source terms. The U.S. Copyright Office on fair use is a reasonable primer on why "I summarized and linked" is treated differently from "I copied the article."
Keep a human in the loop. Insert as a draft, never auto-publish. The AI can be confidently wrong, and a draft queue means a person reads it before your readers do. This is an assist for your editors, not a replacement for them.

Reliability and limits

A few things that keep this honest in production:

JSON mode returns parsed JSON, but validate anyway. The model returns an object, not a string, which is the nice part. Still, check the shape at your API (the handler above rejects anything missing a title or sources) so a malformed run does not write a broken draft.
There is no cross-run memory. The AI sees one run's response and nothing else, so it cannot remember what it published yesterday and will happily re-summarize a story that is still in today's results. Three ways to avoid duplicate posts: add dateStart/dateEnd (or rely on articlesSortBy=date) to the request so you only pull recent articles, use NewsAPI.ai's isDuplicateFilter=skipDuplicates to drop reprints within a single response, and dedupe by source URL at your API before inserting. The schedule stays dumb; your API stays idempotent.
Cadence matches the tier. A daily roundup fits Pro's one-day minimum exactly. If you want an hourly pull (a breaking-news ticker rather than a morning roundup), that one-hour minimum lives on Ultra.
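The "dedupe by source URL" layer can be sketched as follows. It assumes you keep a record of source URLs you have already drafted; here that is an in-memory Set, where a real handler would query its database. The helper names and the normalization rules (drop the fragment, utm_ parameters, and a trailing slash) are illustrative choices, not anything NewsAPI.ai or Crontap prescribes.

```javascript
// Sketch only: normalize a URL so trivial variations do not defeat the dedupe.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  for (const name of [...url.searchParams.keys()]) {
    if (name.startsWith("utm_")) url.searchParams.delete(name);
  }
  return url.toString().replace(/[?\/]+$/, "");
}

// Split a draft's sources into ones not seen before and ones already covered.
function splitSources(sources, seenUrls) {
  const fresh = [];
  const repeated = [];
  for (const source of sources) {
    (seenUrls.has(normalizeUrl(source.url)) ? repeated : fresh).push(source);
  }
  return { fresh, repeated };
}

// Only insert a draft when it covers at least one story you have not drafted yet.
function shouldInsertDraft(sources, seenUrls) {
  return splitSources(sources, seenUrls).fresh.length > 0;
}

// seenUrls stands in for a lookup against URLs stored with earlier drafts.
const seenUrls = new Set([normalizeUrl("https://www.examplewire.com/ev/qx-battery-plant/")]);
const todaysSources = [
  { title: "QX Motors breaks ground on $2B battery plant", url: "https://www.examplewire.com/ev/qx-battery-plant?utm_source=feed" },
  { title: "Revised EV tax credit clears committee", url: "https://www.exampledaily.com/markets/ev-tax-credit" },
];

console.log(splitSources(todaysSources, seenUrls), shouldInsertDraft(todaysSources, seenUrls));
```

Whether a partly repeated roundup should be inserted, trimmed, or skipped is an editorial call; this sketch inserts it as long as one source is new.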

Here is how the approaches stack up:

Approach	Editorial step	Infra you own	Rich metadata	Cost shape
Hire a writer	Human	None	n/a	Salary
DIY fetch + LLM + publisher	Your code	A service + deploy + on-call	If you wire it	Hosting + tokens + your time
Generic RSS into CMS	None (raw items)	A parser	No (titles and a blurb)	Cheap, low value
NewsAPI.ai + Crontap AI Integration	The AI card	Just your `/drafts` handler	Yes (concepts, categories, sentiment, dedup)	NewsAPI.ai pay-per-use + Crontap Pro

FAQ

Which topics or queries can I pull?

Anything getArticles supports. Use keyword for a plain phrase ("electric vehicles", "interest rates"), and switch to conceptUri when you want a disambiguated entity rather than a string match (so "Apple" the company, http://en.wikipedia.org/wiki/Apple_Inc., not the fruit). You can also filter by categoryUri, sourceUri, sourceLocationUri, and lang. The sandbox generates the exact URL for whatever combination you click.
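Because a conceptUri is itself a URL, it has to be percent-encoded when it goes into the query string. A minimal sketch, assuming the same endpoint as in Step 1 and a hypothetical helper name, lets searchParams do the encoding:

```javascript
// Sketch only: buildConceptQueryUrl is an illustrative helper, not part of any SDK.
function buildConceptQueryUrl(conceptUri, apiKey) {
  const url = new URL("https://eventregistry.org/api/v1/article/getArticles");
  url.searchParams.set("resultType", "articles");
  url.searchParams.set("conceptUri", conceptUri); // ":" and "/" get percent-encoded here
  url.searchParams.set("lang", "eng");
  url.searchParams.set("apiKey", apiKey);
  return url.toString();
}

const appleUrl = buildConceptQueryUrl("http://en.wikipedia.org/wiki/Apple_Inc.", "API_KEY");
console.log(appleUrl);
```

The sandbox remains the safer source for a known-good URL; this only shows why the concept URI looks different once it is inside the query string.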

Will the AI invent facts?

It can, which is why the prompt and the workflow are built to prevent it. The prompt says summarize only the provided articles and trace every claim to a listed source, the model only ever sees the articles in that one response (it has no browsing to wander off and confabulate), and the output is inserted as a draft a human reviews. Grounding plus human review is the safety net; do not auto-publish.

How big can the response be?

Crontap truncates the response to roughly 100KB before the model reads it. For a roundup that is plenty if you keep the request lean: cap articlesCount (8 to 15), trim bodies with articleBodyLen, and request only the fields you need. A bloated request gets cut off mid-array and the AI only sees the front of it.

How do I avoid duplicate posts?

Three layers, use as many as you need: pull only recent articles with dateStart/dateEnd and articlesSortBy=date, drop reprints inside a single response with isDuplicateFilter=skipDuplicates, and dedupe by source URL at your API before inserting. There is no cross-run memory, so the dedupe lives in the query and in your handler, not in the AI.

What are the cadence limits?

A daily roundup is Pro (one-day minimum cadence, one AI integration per schedule). Hourly pulls need Ultra (one-hour minimum, unlimited integrations). You can build and run a "Perform test" on any tier, including free; the gate is only at Save.

References

NewsAPI.ai. The product home, where you get a free API key.
NewsAPI.ai documentation and sandbox. Endpoint reference plus a sandbox that generates the exact getArticles request.
Event Registry Node.js SDK. The official client if you ever want to build the request in code (a Python SDK is also available).
U.S. Copyright Office on fair use. Why summarizing with attribution sits differently from copying.

Related on Crontap

Introducing AI Integrations. The full tour of the card, the prompt, the output formats, and the forwarded envelope.
Scheduled AI / LLM jobs use case. The use-case-first guide for running models on a clock.
Automated data sync use case. Moving data from one API into your own systems on a schedule.
Feed to a morning brief. The lighter cousin of this post: a feed condensed into a brief instead of a CMS draft.
Normalize API responses before Zapier, Make, n8n. The JSON-shaping pattern, if your forward target is an automation platform.
