BNOD


Hacker News — top headlines to CSV

Open news.ycombinator.com, scrape the top story rows (title + url), download as CSV.

Install in BNOD

Opens the BNOD sidepanel with this template installed. Requires the BNOD extension.

Scrolling Hacker News every morning and copy-pasting interesting titles into a spreadsheet is the kind of busywork that nobody admits to doing, yet a lot of people do it. This template gives you a one-click way to grab the front page — every story title, every link — and save it as a CSV file you can open in Excel, Numbers, or import into a notebook. Useful if you write a weekly tech digest, track tone of voice on YC content, or just want a clean dataset of what got upvoted today.

How this workflow works

The workflow uses five blocks chained together in a straight line. No branches, no loops — it's the simplest possible scrape-and-export pipeline.

  1. manual_trigger: You hit Run in the sidepanel. The trigger is configured with targetTab: "new", so the workflow opens a fresh tab instead of hijacking the one you're reading.
  2. navigate: Loads https://news.ycombinator.com in that new tab.
  3. wait_for: Waits until at least one tr.athing row is visible. athing is the CSS class HN puts on every story row, and it has been stable for years. matchFirst: true means the block resolves the moment the first row paints, not when all 30 are loaded.
  4. scrape_list: The actual extraction. It iterates over every tr.athing row and reads two fields from each: title (the text of the .titleline > a anchor) and url (that same anchor's href attribute). The result is an array of {title, url} objects, available downstream as $('Scrape story rows').items.
  5. export_data: Takes that array, converts it to CSV, and triggers a browser download named hn-headlines.csv. The CSV header row is auto-generated from the field names you defined.

You'll see Chrome's native download bar appear with the file ready. No server round-trip, no clipboard juggling — the scrape happens inside the page you opened, in your own browser session.
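
If you want to see what the scrape_list and export_data steps boil down to, here is a minimal Python sketch of the same idea run against saved HTML. It is not BNOD's implementation: the class and function names (StoryRowParser, scrape_rows, to_csv) are made up for illustration, and it assumes the row markup matches the tr.athing and .titleline > a selectors described above.

```python
import csv
import io
from html.parser import HTMLParser


class StoryRowParser(HTMLParser):
    """Collects {title, url} from the `.titleline > a` anchor of each tr.athing row."""

    def __init__(self):
        super().__init__()
        self.items = []         # extracted rows, in page order
        self._in_row = False    # True while inside a <tr class="athing ...">
        self._span_depth = 0    # 1 inside span.titleline, 2+ inside nested spans
        self._current = None    # the anchor currently being read, if any
        self._row_done = False  # only the first matching anchor per row counts

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "tr":
            self._in_row = "athing" in classes
            self._row_done = False
        elif self._in_row and tag == "span":
            # Start counting at span.titleline, then track every nested span.
            if self._span_depth or "titleline" in classes:
                self._span_depth += 1
        elif tag == "a" and self._span_depth == 1 and not self._row_done:
            # Depth 1 means a direct child of span.titleline.
            self._current = {"title": "", "url": attrs.get("href") or ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.items.append(self._current)
            self._current = None
            self._row_done = True
        elif tag == "span" and self._span_depth:
            self._span_depth -= 1
        elif tag == "tr":
            self._in_row = False
            self._span_depth = 0


def scrape_rows(html):
    """Return a list of {title, url} dicts, one per story row found in `html`."""
    parser = StoryRowParser()
    parser.feed(html)
    return parser.items


def to_csv(items):
    """Serialise the rows to CSV text with a header built from the field names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()
```

Calling to_csv(scrape_rows(html)) on a saved copy of the front page should give roughly the same content as hn-headlines.csv, with titles containing commas quoted automatically by the csv module.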

Customising it for your case

A few changes you'll probably want to make.

Common gotchas

HN is one of the friendlier scraping targets: no JavaScript-heavy SPA, no Cloudflare gate, and nothing that a single front-page load is likely to trip. But two things bite people. First, if you run the workflow back-to-back, you'll get the same 30 stories, because the front page only refreshes every few minutes. Second, if you're logged into HN, your personal "hide" decisions affect what shows up; the scrape sees only the rows your account can see. Run in an incognito tab if you want the canonical front page.
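
If back-to-back runs leave you with overlapping files, one way to combine them is to de-duplicate on url after the fact. The helper below is a hypothetical post-processing step, not part of the template; it assumes rows shaped like the {title, url} objects the workflow exports.

```python
def merge_unique(existing, fresh):
    """Append rows from `fresh` whose url is not already present, keeping order."""
    seen = {row["url"] for row in existing}
    merged = list(existing)
    for row in fresh:
        if row["url"] not in seen:
            seen.add(row["url"])
            merged.append(row)
    return merged
```

Keying on url rather than title means a story whose headline was edited between runs still counts as the same row.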

FAQ

Do I need an API key for this? No. HN's front page is fully public HTML, and no authentication headers are required. If you'd rather hit their official API (hacker-news.firebaseio.com), see the json-feed-to-csv template instead; it's a better fit for clean JSON data.
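
For comparison, here is a rough sketch of the API route in plain Python, outside BNOD. It uses the public topstories.json and item/<id>.json endpoints of the official API; the function names and the injectable fetch parameter are illustrative choices, and the fallback URL for text-only posts is an assumption about what you'd want in the url column.

```python
import json
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def top_stories(limit=30, fetch=fetch_json):
    """Return up to `limit` rows of {title, url} from the API's top stories list."""
    ids = fetch(f"{API_BASE}/topstories.json")[:limit]
    rows = []
    for story_id in ids:
        item = fetch(f"{API_BASE}/item/{story_id}.json") or {}
        # Text posts (Ask HN and similar) carry no external url,
        # so fall back to the item's own discussion page.
        url = item.get("url") or f"https://news.ycombinator.com/item?id={story_id}"
        rows.append({"title": item.get("title", ""), "url": url})
    return rows
```

Note the trade-off: this makes one request per story plus one for the list, where the scrape needs a single page load.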

Can I run this on a schedule? Yes. Swap the manual_trigger block for a schedule_trigger and pick a cron expression like 0 8 * * * (every day at 8 AM). The download still triggers in your active browser session, so the browser needs to be open.
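
To make the cron expression concrete: the first field is the minute and the second is the hour, so 0 8 * * * fires at 08:00 every day. The sketch below works out the next firing time for that simple daily form only. It is an illustration, not how schedule_trigger is implemented, and next_daily_run is a made-up name.

```python
from datetime import datetime, timedelta


def next_daily_run(cron, now):
    """Next firing time strictly after `now` for a daily 'M H * * *' expression."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily 'M H * * *' expressions")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, next_daily_run("0 8 * * *", datetime(2024, 1, 1, 9, 0)) returns 08:00 on 2 January 2024.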

Will this break if HN redesigns the page? Probably. The tr.athing row class has been stable for years, but the title markup has changed before (older scrapers targeted a.storylink), so if YC ever ships a real redesign, you'll need to update the containerSelector and fields[].selector values. Automa users will recognise the pattern: same fragility, same fix.

Blocks used

  • manual_trigger
  • navigate
  • wait_for
  • scrape_list
  • export_data

Works on

  • https://news.ycombinator.com/*

Free. No signup required.
