How it works

From 35+ feeds to one coherent newsstand.

Every few minutes, we pull headlines from across the Nepali press, match them across languages, group them into stories, extract the people and places inside them, file each story under a subject and a location, and ship the result to the feed you see. Here’s each stage of that pipeline — with the trade-offs we made along the way.

Ingest · Normalize · Embed · Cluster · Entities · Summarise · Organise · Render

Ingestion

Pulling 35+ newsrooms, every few minutes

Every few minutes, a scheduled fetcher polls the RSS feeds and public sitemaps of every outlet we track. Each run collects fresh headlines, URLs, publication timestamps, and a short excerpt, but never the full article body.

Respectful crawling. We honour robots.txt and skip any outlet that disallows our User-Agent. We rate-limit per domain and back off on errors, so our requests never put undue load on a publisher’s servers.
Headlines only. We store title, canonical URL, timestamp, and an excerpt. Every item on the site links back to the publisher’s own page for the full read.
Deduplication. Canonical URLs and content hashes suppress the same story if a publisher re-pushes it under a slightly different link.
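
As a rough illustration of the crawling and deduplication rules above, the Python sketch below checks a URL against a robots.txt body, reduces a link to a canonical form, and hashes the title and excerpt. The `ExampleNewsBot` User-Agent, the list of tracking parameters, and every function name are assumptions made for this example, not the production fetcher. Per-domain rate limiting and back-off are left out.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleNewsBot"  # hypothetical User-Agent string
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}  # assumed list


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt body lets our User-Agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def canonical_url(url: str) -> str:
    """Lower-case the host, drop tracking parameters, fragments and trailing slashes."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def content_hash(title: str, excerpt: str) -> str:
    """Hash the whitespace-collapsed, lower-cased title and excerpt."""
    text = " ".join((title + " " + excerpt).split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remembers canonical URLs and content hashes that were already stored."""

    def __init__(self) -> None:
        self.seen_urls = set()
        self.seen_hashes = set()

    def is_new(self, url: str, title: str, excerpt: str) -> bool:
        key_url = canonical_url(url)
        key_hash = content_hash(title, excerpt)
        if key_url in self.seen_urls or key_hash in self.seen_hashes:
            return False  # same link or same text was seen before
        self.seen_urls.add(key_url)
        self.seen_hashes.add(key_hash)
        return True
```

An item is treated as a repeat if either key matches, which is what lets a re-pushed story under a slightly different link be suppressed.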

Are you a publisher?

We’re continually adding new Nepali and English publishers — if your newsroom publishes in the public interest and you’d like to be tracked, write in. We also honour takedown and delisting requests and will respond within 72 hours. Ownership or editorial-metadata corrections go to the same address.


Normalize

Two scripts, one schema

Devanagari and Latin headlines live side-by-side in the same database. Each incoming article is tagged with its language, attached to its publisher record (including ownership type, HQ, and editorial stance), and normalised so that केपी ओली and KP Oli are addressable as the same concept later on.

No transliteration, no auto-translation at this stage — both scripts are preserved exactly as the publisher wrote them. Script mixing happens only in downstream views, explicitly.
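
A minimal sketch of the language-tagging step, assuming it can be approximated by counting Devanagari versus Latin letters. The `Article` record, its field names, and the `"ne"`/`"en"` codes are illustrative assumptions. The title is stored untouched, in line with the no-transliteration rule above.

```python
from dataclasses import dataclass


def detect_language(text: str) -> str:
    """Guess 'ne' or 'en' from the dominant script (an assumed heuristic)."""
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097f")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "ne" if devanagari > latin else "en"


@dataclass(frozen=True)
class Article:
    title: str         # kept exactly as the publisher wrote it
    url: str
    publisher_id: str  # links to the publisher record (hypothetical field)
    language: str      # "ne" or "en"


def normalise(title: str, url: str, publisher_id: str) -> Article:
    """Attach a language tag and publisher id without altering the title."""
    return Article(title, url, publisher_id, detect_language(title))
```

Script counting cannot tell Nepali from other Devanagari-script languages, so a real pipeline would likely need more than this.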

Embed

Every headline becomes a point in meaning-space

A multilingual embedding model converts every headline (plus its excerpt) into a 1,024-dimensional vector. The key property: two headlines about the same event land close together in this space — even if one is in Nepali and the other in English, and even if they share zero words.

Vectors are L2-normalised, so cosine similarity (the cosine of the angle between two vectors) reduces to a plain dot product and gives a direct measure of how much two headlines are “about” the same thing. We store them in Postgres via pgvector with an HNSW index, so “find the nearest article to this one” is a millisecond query across the whole corpus.
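
To make the normalisation point concrete, here is a small Python sketch using two-dimensional toy vectors in place of the 1,024-dimensional embeddings. The function names are chosen for this example only.

```python
import math


def l2_normalise(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


a = l2_normalise([3.0, 4.0])  # [0.6, 0.8]
b = l2_normalise([4.0, 3.0])  # [0.8, 0.6]
similarity = cosine_similarity(a, b)  # 0.6*0.8 + 0.8*0.6 = 0.96
```

A value near 1 means the two vectors point in almost the same direction. A value near 0 means they are unrelated.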

Cluster

Grouping the coverage of one event

When a new article arrives, we search its embedding against every article from the last ten days. If its similarity to the closest neighbour exceeds our threshold (cosine ≈ 0.78), the new article joins that neighbour’s cluster. Otherwise it opens a new one.

This is a single-pass, online algorithm — cheap, incremental, and self-healing as more members join. A typical busy day yields 40–60 multi-publisher clusters out of roughly 1,000 headlines. Clusters with three or more distinct publishers get promoted to the Stories view — those are the pieces of news the whole press is talking about.
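
The single-pass loop and the promotion rule can be sketched as follows. This is an illustration under stated assumptions: a linear scan stands in for the HNSW index lookup, vectors are taken to be unit length already, and the class, field, and constant names are invented for the example. The 0.78 threshold, ten-day window, and three-publisher rule come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SIMILARITY_THRESHOLD = 0.78   # cosine threshold quoted above
WINDOW = timedelta(days=10)   # look-back window quoted above
MIN_PUBLISHERS = 3            # distinct publishers needed for the Stories view


@dataclass
class Article:
    id: str
    publisher: str
    published: datetime
    vector: List[float]       # assumed to be L2-normalised
    cluster_id: Optional[int] = None


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlineClusterer:
    def __init__(self) -> None:
        self.articles: List[Article] = []
        self.next_cluster_id = 0

    def add(self, article: Article) -> int:
        """Join the nearest recent neighbour's cluster, or open a new one."""
        cutoff = article.published - WINDOW
        best, best_sim = None, -1.0
        for other in self.articles:        # linear scan; an index would replace this
            if other.published < cutoff:
                continue
            sim = dot(article.vector, other.vector)
            if sim > best_sim:
                best, best_sim = other, sim
        if best is not None and best_sim > SIMILARITY_THRESHOLD:
            article.cluster_id = best.cluster_id
        else:
            article.cluster_id = self.next_cluster_id
            self.next_cluster_id += 1
        self.articles.append(article)
        return article.cluster_id

    def promoted(self) -> List[int]:
        """Cluster ids covered by at least MIN_PUBLISHERS distinct publishers."""
        publishers = {}
        for a in self.articles:
            publishers.setdefault(a.cluster_id, set()).add(a.publisher)
        return sorted(c for c, p in publishers.items() if len(p) >= MIN_PUBLISHERS)
```

Because membership is decided by the single nearest neighbour, an article can join a cluster whose earliest member it barely resembles. That is the transitive-chain effect that makes a diagnostics view useful.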

Edge cases (long transitive chains, near-duplicates, cross-event spillover) are surfaced in an internal diagnostics view so we can tune the threshold over time without changing the core loop.

Entities

People, places, organisations — surfaced, not guessed

ENTITY TAGS EXTRACTED FROM A HEADLINE

प्रधानमन्त्री केपी ओलीले सर्वोच्चको फैसलालाई स्वागत गरे (“Prime Minister KP Oli welcomed the Supreme Court’s verdict”)

Prime Minister (PER) · बालेन (PER) · नेपाल कांग्रेस (ORG) · Supreme Court (ORG) · काठमाडौं (LOC) · Pokhara (LOC) · UML (ORG)

PER = person · ORG = organisation · LOC = place

Before anything AI-driven runs, we scan each headline against a curated registry of Nepali entities — politicians, parties, ministries, districts, companies, institutions — with every alias they’re known by in both scripts. Most matches resolve here, deterministically, for free.

Anything missed by the registry goes to a second pass that identifies named entities, disambiguates them against known records, and queues genuinely new names for operator review before they enter the registry. Role-sensitive entries (Prime Minister, Mayor of Kathmandu) are resolved against time-bounded assignments, so a story from last year points to the right person — not whoever holds the office today.
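
The deterministic first pass and the time-bounded role lookup might look roughly like the Python sketch below. The registry entries, entity ids, role holders, and dates are all invented placeholders, and plain substring matching is an assumption made to keep the example short. It handles Nepali postpositions such as केपी ओलीले, but it would over-match short Latin aliases in a real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical registry: alias -> (entity id, entity type)
ALIASES = {
    "केपी ओली": ("kp-oli", "PER"),
    "KP Oli": ("kp-oli", "PER"),
    "सर्वोच्च": ("supreme-court", "ORG"),
    "Supreme Court": ("supreme-court", "ORG"),
}


@dataclass(frozen=True)
class RoleAssignment:
    role: str
    entity_id: str
    start: date
    end: Optional[date]  # None means the assignment is still current


# Placeholder holders and dates, for illustration only
ROLE_ASSIGNMENTS = [
    RoleAssignment("Prime Minister", "person-a", date(2020, 1, 1), date(2021, 6, 30)),
    RoleAssignment("Prime Minister", "person-b", date(2021, 7, 1), None),
]


def tag_entities(headline: str) -> List[Tuple[str, str]]:
    """Return (entity id, type) pairs for every registry alias found in the headline."""
    found = []
    for alias, entity in ALIASES.items():
        if alias in headline and entity not in found:
            found.append(entity)
    return found


def resolve_role(role: str, on: date) -> Optional[str]:
    """Return whoever held the role on the story's publication date."""
    for a in ROLE_ASSIGNMENTS:
        if a.role == role and a.start <= on and (a.end is None or on <= a.end):
            return a.entity_id
    return None
```

Looking the role up by the article's own date, rather than today's date, is what keeps an archived story pointing at the office-holder of the time.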

The result: every article carries a list of tagged entities, which powers the Topics page, entity profile pages, and the ability to filter the feed by who or what it’s about.

Summarise

A bilingual brief for every multi-publisher story

ENGLISH · AI

The Supreme Court reinstated the lower-house dissolution petition for a fresh hearing, six publishers reported. Parties welcomed the move but disagreed on the timeline.

cites 6 headlines · 4 publishers

नेपाली · AI

सर्वोच्च अदालतले प्रतिनिधिसभा विघटनको मुद्दा पुनः सुनुवाइका लागि दर्ता गरेको छ। छ वटा सञ्चारमाध्यमले समाचार दिए तर समय-तालिकामा मतभेद देखियो।

६ लेख · ४ प्रकाशक

Once a cluster has three or more headlines from different publishers, a scheduled job drafts a story summary in both Nepali and English and a single neutral headline for the cluster. The summaries use only facts that appear in the contributing articles — nothing is inferred or imported from background knowledge.

Summaries are labelled as AI-generated on every surface they appear.
Every summary cites only the articles that contributed to it; the source list is visible from the story page.
Nepali and English are generated at parity — neither is a translation of the other. Both are drafted directly from the source material.

Organise

Filing every story under a subject and a place

ONE STORY · FILED ON TWO AXES

Supreme Court reinstates the dissolution petition for a fresh hearing

SUBJECT

Politics (primary) · Economy & Business · World

PLACE

National · Bagmati · Koshi · Gandaki

SINGLE PRIMARY SUBJECT · ONE RESOLVED LOCATION · AI ON THE SUMMARY

Once a story has its bilingual summary, a separate, lightweight pass files it into one primary subject — Politics, Economy & Business, Sports, Health, and so on across thirteen subjects — plus a few secondary subjects where a story genuinely spans more than one. It also resolves a single location: National, International, or one of Nepal’s seven provinces.

This is a different axis from the entity Topics above — those are the specific people, parties and places a story mentions; this is the broad shelf it belongs on. An AI reads the clean Nepali-and-English summary and makes the call; because it works off the tidy summary rather than raw articles, it’s a cheap, self-contained step that can be re-run over the whole archive when the subject list changes.

One subject spine. Each story has a single primary subject, so it isn’t scattered across half a dozen sections — a deliberate choice that keeps each subject’s page coherent.
Provincial browsing. Stories rooted in Koshi, Madhesh, Bagmati, Gandaki, Lumbini, Karnali or Sudurpashchim are grouped by province — a view of Nepal’s news you can’t get anywhere else.
Same quality bar. Only summarised, multi-source stories are filed and shown: the subject and place feeds carry the same standard as the home page, never raw single-source headlines.
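
One way to picture the two-axis filing is as a small validated record. The sketch below assumes a `Filing` type with the field names shown. The subject set is deliberately partial, since only some of the thirteen subjects are named on this page, and the nine locations are the ones listed above.

```python
from dataclasses import dataclass
from typing import Tuple

PROVINCES = (
    "Koshi", "Madhesh", "Bagmati", "Gandaki",
    "Lumbini", "Karnali", "Sudurpashchim",
)
LOCATIONS = ("National", "International") + PROVINCES

# Partial list: only the subjects named on this page, not all thirteen
SUBJECTS = {"Politics", "Economy & Business", "Sports", "Health", "World"}


@dataclass(frozen=True)
class Filing:
    primary_subject: str
    secondary_subjects: Tuple[str, ...]
    location: str

    def __post_init__(self) -> None:
        if self.primary_subject not in SUBJECTS:
            raise ValueError(f"unknown primary subject: {self.primary_subject}")
        if self.primary_subject in self.secondary_subjects:
            raise ValueError("primary subject repeated as a secondary subject")
        unknown = set(self.secondary_subjects) - SUBJECTS
        if unknown:
            raise ValueError(f"unknown secondary subjects: {sorted(unknown)}")
        if self.location not in LOCATIONS:
            raise ValueError(f"unknown location: {self.location}")


filing = Filing("Politics", ("Economy & Business", "World"), "National")
```

Keeping the primary subject as a single required field, with secondaries in a separate tuple, enforces the one-subject spine at the data level.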

Browse a subject’s front page at Topics, pick a province under Places, or use the subject bar at the top of the home page. Each subject and place page ranks its stories by recency and how widely they were covered, twenty at a time. Like everything else, the headings switch between Nepali and English with the language toggle.

Render

Served as a reading surface, not a dashboard

PAST HOUR: New stories arriving

1–3H: Momentum picking up

3–12H: Top clusters consolidating

EARLIER: Long tail

TIME-BUCKETED · ROUND-ROBIN PUBLISHER MIX · BILINGUAL

The final stage is the site you’re on: server-rendered pages; a time-bucketed feed (past hour · past three hours · past twelve hours · earlier today · yesterday); a round-robin publisher mix so no single publisher dominates a segment; a bilingual language toggle; and an ownership badge on every card. Analytics run through PostHog with no ad tracking, and no account is required.
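
The time buckets and the publisher mix can be sketched in a few lines of Python. The bucket boundaries follow the labels above. The "older" fallback, the function names, and the `(publisher, title)` item shape are assumptions made for the example.

```python
from collections import deque
from datetime import datetime, timedelta


def bucket_for(published: datetime, now: datetime) -> str:
    """Assign an item to a feed segment by its age."""
    age = now - published
    if age <= timedelta(hours=1):
        return "past hour"
    if age <= timedelta(hours=3):
        return "past three hours"
    if age <= timedelta(hours=12):
        return "past twelve hours"
    if published.date() == now.date():
        return "earlier today"
    if published.date() == now.date() - timedelta(days=1):
        return "yesterday"
    return "older"  # assumed fallback, not named on the page


def round_robin(items):
    """Interleave (publisher, title) pairs so publishers take turns.

    Input order is preserved within each publisher.
    """
    queues = {}
    for publisher, title in items:
        queues.setdefault(publisher, deque()).append((publisher, title))
    mixed = []
    while queues:
        for publisher in list(queues):
            mixed.append(queues[publisher].popleft())
            if not queues[publisher]:
                del queues[publisher]
    return mixed
```

Applied within one segment, `round_robin` takes one item from each publisher per pass, so a publisher with many items in that segment still appears only once per round.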

Surfaces like Stories, Topics, and Publishers are different cuts of the same underlying pipeline — events, entities, and publishers, respectively.

Principles the pipeline follows

A few rules we don’t break

Headlines only
Full article bodies are never republished. Every item links out.
Ownership visible
Every headline carries its publisher’s ownership type: state, private, independent, or non-profit.
AI labelled
Summaries and entity extractions are tagged as AI-generated and cite their sources.
No invented facts
Summaries may only use facts that appear in contributing articles. No bridging, no inference.
Bilingual at parity
Nepali and English are first-class — neither is a translation of the other.
Outside Nepal
Infrastructure runs outside Nepal by design — a deliberate regulatory posture.

See it running

The pipeline is live right now.

Open the live feed, browse today’s clustered stories, explore trending topics, or meet the publishers we track.