We’ve just put something a little unusual online: an interactive map and reader of A Gazetteer of the World, a seven-volume reference book from 1856 that set out to describe, in alphabetical order, every place its compilers could find. Tens of thousands of towns, rivers, mountains, ruins, provinces and ports, each with a paragraph on where it is, what kind of place it is, who lives there and how many of them. We’ve turned those printed pages into data you can search, map, and read, all explorable in your browser with no server doing the work behind the scenes.
Before anything else: please treat it as a research demonstrator, not an authority. It was made automatically, end to end, and most of the map locations are currently wrong; the statistical tables, too, are only roughly rendered. More on why that’s interesting, rather than just embarrassing, below.
The book, and its rather elusive editor
A gazetteer is a geographical dictionary, and in the mid-nineteenth century they were serious undertakings. Ours was published in Edinburgh by A. Fullarton & Co. between 1850 and 1856, and it’s firmly in the public domain, which is what makes a project like this possible at all.
It’s also, charmingly, anonymous: the title page credits only “a Member of the Royal Geographical Society.” That member is now generally identified as George Godfrey Cunningham (c. 1802–1860), a Scottish writer, compiler and translator who was himself a partner in Fullarton, and who seems to have spent his career assembling other examples of this sort of vast reference work (he also produced an eight-volume Lives of Eminent and Illustrious Englishmen) and rendering German Romantic tales into English on the side. Beyond his memberships and a scatter of addresses across Scotland and England, remarkably little about him survives; the Gazetteer is reckoned his principal achievement, yet he put his name to none of it.
A gazetteer is never a neutral list, either. Cunningham’s world is the world as seen from mid-Victorian Britain, with all the imperial framing, uneven coverage and confident judgements that implies. That is worth keeping in mind, and it is also why the World Historical Gazetteer records sourced attestations rather than facts: an entry says “this source, at this date, called this place this, and placed it here”, not “this is the truth”. Cunningham’s 1856 view becomes one attestation among many.
How it was made (the short version)
Nobody typed any of this in. We started from public-domain page scans on HathiTrust and ran them through modern, layout-aware OCR (Surya) to turn the printed columns back into text. Then a large language model (the open Llama 3.3, with gpt-oss double-checking and Qwen3 repairing the cases it flagged) read each entry and pulled out structured facts: name, country, coordinates, population, and a feature type.
Those types are not free text. Each is drawn from the Getty Art & Architecture Thesaurus (AAT), a published controlled vocabulary in which every term has a stable web address, so a “river” or a “ruined city” carries an identifier that other datasets can point at. That is the idea behind Linked Open Data: shared identifiers instead of isolated labels, so the data can join up with the wider web rather than sitting in a silo. The statistical tables and the engraved plates were read by a vision model, Qwen2.5-VL.
Every place was then matched against the World Historical Gazetteer through its Reconciliation API, so the 1856 entry gets a modern location, and sometimes a boundary outline. All of the AI runs on our own machines at the University of Pittsburgh’s Center for Research Computing and Data: no per-token bills, nothing sent to a third party. The result is around 116,000 places, most of them linked (as-yet wrongly 😳) to a modern location, plus the tables and plates, served as a static website. If you want the gory detail, it is all on GitHub.
What this actually is, and isn’t
Here is the important part, and the reason we are writing it up rather than quietly shipping a demo. This is a scoping exercise, not a blueprint. It probes one possible strand of future WHG work, ingesting authoritative historical print gazetteers as reference data, and it is emphatically not a preview of “the WHG to come”. We built it to learn, on a deliberately awkward, large, genuinely historical source, where our current tools cope and where they don’t. And it threw back some genuinely useful failures.
Where it bumped into our reconciliation gaps
Matching a nineteenth-century place name to a modern gazetteer entry is hard, and we already knew automatic matching would never be perfect: it is an active area of work at WHG, and it improves as we fold more reference data into our indices. This experiment put a few specific gaps into sharp relief.
- Same name, wrong place. This was not a surprise so much as a confirmation. Where we cannot identify a suitable containment polygon (a parent region to match a place inside), or do not yet hold one in our indices, name similarity alone is a weak signal, and a confident-looking match is often a same-named place somewhere else entirely, occasionally on the far side of the planet.
- We were ignoring the coordinates the book hands us. Many entries print their own latitude and longitude. Once we checked the matches against those, well over half of the coordinate-bearing places sat hundreds (sometimes thousands) of kilometres from where the book puts them. So now, where coordinates exist, we trust them: we look for the best name match within a radius of the printed point, and otherwise leave the place located but explicitly unmatched rather than force a bad link.
- Stated region versus real coordinates. Cross-checking each entry’s printed coordinates against the region it claims to sit in flagged a lot of disagreement. Some of that is the ordinary drift between 1856 administrative geography and modern boundaries, but only some; the rest is genuine error worth surfacing.
None of these are solved here. They are surfaced here, which for a scoping exercise is exactly the point: each one translates fairly directly into a concrete improvement for reconciliation, such as stronger spatial priors, trusting coordinates when a source provides them, and treating “we know where it is but not what it is” as a proper, visible result.
Smaller worlds, deeper local knowledge
A single global gazetteer is an extraordinary feat of compilation, but its coverage is inevitably broad and uneven. Some of the most valuable historical place data comes instead from compilers with deep local knowledge. A favourite example, already in the WHG, is John Adams’s Index Villaris of 1680: an alphabetical table of some 24,000 cities, market towns, parishes, villages and private seats in England and Wales, each with its distance from London and a latitude and longitude that Adams worked out by triangulation. (Adams, an English barrister and surveyor, c. 1643–1690, never finished the wider survey it belonged to; there is more on him here.) Its precision and regional focus are exactly the qualities a worldwide gazetteer like Cunningham’s cannot match.
This is where we would welcome help. We are keen to find more sources of that kind: specialist, regionally-focused, authoritative print gazetteers that are out of copyright and available as PDFs, especially ones that would fill gaps in WHG’s current coverage. If you know your own corner of the world’s reference shelf, that local expertise is precisely what we are short of.
To get us started, my colleague Palak Vashist has put together a candidate bibliography of exactly this kind of source: public-domain print gazetteers (mostly nineteenth- and early twentieth-century, mostly available as scans on the Internet Archive, HathiTrust, the Library of Congress and the Digital Library of India), chosen for the gaps they could help fill. The list leans deliberately into South Asia (the Bombay, Bengal, Madras, Punjab, United Provinces, Central Provinces, Bihar & Orissa and Assam district series, the Imperial Gazetteer of India, Ceylon, Burma and the North-West Frontier), with a global comparator set spanning the Middle East, Africa, the Americas, Oceania and Eastern Europe. Each entry is graded against a selection rubric and tagged with a suggested next step, so the same pipeline used here can be pointed at any of them with relatively little new work. The full list, with sources and notes, is here; suggestions for additions or corrections are very welcome.
Have a look
The Gazetteer of the World Explorer is here. Search a place, wander the map, open a volume and read Cunningham’s prose with its plates set back in place. It is an early experiment and it shows, so do take it in that spirit: a first attempt to let a 170-year-old book speak to a modern index, with a great deal still to fix. We will have more to say as the reconciliation work it prompted takes shape.
Thanks to Humphrey Southall, whose nudge got this project started!



