A whole world, alphabetically: an 1856 gazetteer, read by 2026 tools

We’ve just put something a little unusual online: an interactive map and reader of A Gazetteer of the World, a seven-volume reference book from 1856 that set out to describe, in alphabetical order, every place its compilers could find. Tens of thousands of towns, rivers, mountains, ruins, provinces and ports, each with a paragraph on where it is, what kind of place it is, who lives there and how many of them. We’ve turned those printed pages into data you can search, map, and read, all explorable in your browser with no server doing the work behind the scenes.

Before anything else: please treat it as a research demonstrator, not an authority. It was made automatically, end to end, and most of the map locations are currently wrong; the statistical tables, too, are only roughly rendered. More on why that’s interesting, rather than just embarrassing, below.

The book, and its rather elusive editor

A gazetteer is a geographical dictionary, and in the mid-nineteenth century they were serious undertakings. Ours was published in Edinburgh by A. Fullarton & Co. between 1850 and 1856, and it’s firmly in the public domain, which is what makes a project like this possible at all.

It’s also, charmingly, anonymous: the title page credits only “a Member of the Royal Geographical Society.” That member is now generally identified as George Godfrey Cunningham (c. 1802–1860), a Scottish writer, compiler and translator who was himself a partner in Fullarton, and who seems to have spent his career assembling other examples of this sort of vast reference work (he also produced an eight-volume Lives of Eminent and Illustrious Englishmen) and rendering German Romantic tales into English on the side. Beyond his memberships and a scatter of addresses across Scotland and England, remarkably little about him survives; the Gazetteer is reckoned his principal achievement, yet he put his name to none of it.

A gazetteer is never a neutral list, either. Cunningham’s world is the world as seen from mid-Victorian Britain, with all the imperial framing, uneven coverage and confident judgements that implies. That is worth keeping in mind, and it is also why the World Historical Gazetteer records sourced attestations rather than facts: an entry says “this source, at this date, called this place this, and placed it here”, not “this is the truth”. Cunningham’s 1856 view becomes one attestation among many.

How it was made (the short version)

Nobody typed any of this in. We started from public-domain page scans on HathiTrust and ran them through modern, layout-aware OCR (Surya) to turn the printed columns back into text. Then a large language model (the open Llama 3.3, with gpt-oss double-checking and Qwen3 repairing the cases it flagged) read each entry and pulled out structured facts: name, country, coordinates, population, and a feature type.

Those types are not free text. Each is drawn from the Getty Art & Architecture Thesaurus (AAT), a published controlled vocabulary in which every term has a stable web address, so a “river” or a “ruined city” carries an identifier that other datasets can point at. That is the idea behind Linked Open Data: shared identifiers instead of isolated labels, so the data can join up with the wider web rather than sitting in a silo. The statistical tables and the engraved plates were read by a vision model, Qwen2.5-VL.

Every place was then matched against the World Historical Gazetteer through its Reconciliation API, so the 1856 entry gets a modern location, and sometimes a boundary outline. All of the AI runs on our own machines at the University of Pittsburgh’s Center for Research Computing and Data: no per-token bills, nothing sent to a third party. The result is around 116,000 places, most of them linked (as-yet wrongly 😳) to a modern location, plus the tables and plates, served as a static website. If you want the gory detail, it is all on GitHub.

What this actually is, and isn’t

Here is the important part, and the reason we are writing it up rather than quietly shipping a demo. This is a scoping exercise, not a blueprint. It probes one possible strand of future WHG work, ingesting authoritative historical print gazetteers as reference data, and it is emphatically not a preview of “the WHG to come”. We built it to learn, on a deliberately awkward, large, genuinely historical source, where our current tools cope and where they don’t. And it threw back some genuinely useful failures.

Where it bumped into our reconciliation gaps

Matching a nineteenth-century place name to a modern gazetteer entry is hard, and we already knew automatic matching would never be perfect: it is an active area of work at WHG, and it improves as we fold more reference data into our indices. This experiment put a few specific gaps into sharp relief.

  • Same name, wrong place. This was not a surprise so much as a confirmation. Where we cannot identify a suitable containment polygon (a parent region to match a place inside), or do not yet hold one in our indices, name similarity alone is a weak signal, and a confident-looking match is often a same-named place somewhere else entirely, occasionally on the far side of the planet.
  • We were ignoring the coordinates the book hands us. Many entries print their own latitude and longitude. Once we checked the matches against those, well over half of the coordinate-bearing places sat hundreds (sometimes thousands) of kilometres from where the book puts them. So now, where coordinates exist, we trust them: we look for the best name match within a radius of the printed point, and otherwise leave the place located but explicitly unmatched rather than force a bad link.
  • Stated region versus real coordinates. Cross-checking each entry’s printed coordinates against the region it claims to sit in flagged a lot of disagreement. Some of that is the ordinary drift between 1856 administrative geography and modern boundaries, but only some; the rest is genuine error worth surfacing.

None of these are solved here. They are surfaced here, which for a scoping exercise is exactly the point: each one translates fairly directly into a concrete improvement for reconciliation, such as stronger spatial priors, trusting coordinates when a source provides them, and treating “we know where it is but not what it is” as a proper, visible result.
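As a rough illustration of the second point, here is a minimal Python sketch of "trust the printed coordinates": keep only candidates within a radius of the printed point, pick the best name match among them, and otherwise return nothing so the place stays located but unmatched. The function names, the 50 km radius, the 0.8 similarity floor and the use of difflib for name similarity are all assumptions made for this example, not the settings or code of the actual pipeline.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (a stand-in for a real matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_near_printed_point(name, lat, lon, candidates,
                             radius_km=50.0, min_similarity=0.8):
    """Best-named candidate inside the radius, or None (located but unmatched).

    radius_km and min_similarity are illustrative defaults, not calibrated values.
    """
    best, best_score = None, min_similarity
    for cand in candidates:
        if haversine_km(lat, lon, cand["lat"], cand["lon"]) > radius_km:
            continue  # same name on the far side of the planet is ignored
        score = name_similarity(name, cand["name"])
        if score >= best_score:
            best, best_score = cand, score
    return best


# Hypothetical candidate records with approximate coordinates.
candidates = [
    {"id": "perth-scotland", "name": "Perth", "lat": 56.396, "lon": -3.437},
    {"id": "perth-australia", "name": "Perth", "lat": -31.95, "lon": 115.86},
]
match = match_near_printed_point("Perth", 56.4, -3.43, candidates)
```

With these toy records, the printed point near 56.4°N selects the Scottish candidate, and a printed point far from both returns None rather than forcing a link.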

Smaller worlds, deeper local knowledge

A single global gazetteer is an extraordinary feat of compilation, but its coverage is inevitably broad and uneven. Some of the most valuable historical place data comes instead from compilers with deep local knowledge. A favourite example, already in the WHG, is John Adams’s Index Villaris of 1680: an alphabetical table of some 24,000 cities, market towns, parishes, villages and private seats in England and Wales, each with its distance from London and a latitude and longitude that Adams worked out by triangulation. (Adams, an English barrister and surveyor, c. 1643–1690, never finished the wider survey it belonged to; there is more on him here.) Its precision and regional focus are exactly the qualities a worldwide gazetteer like Cunningham’s cannot match.

This is where we would welcome help. We are keen to find more sources of that kind: specialist, regionally-focused, authoritative print gazetteers that are out of copyright and available as PDFs, especially ones that would fill gaps in WHG’s current coverage. If you know your own corner of the world’s reference shelf, that local expertise is precisely what we are short of.

To get us started, my colleague Palak Vashist has put together a candidate bibliography of exactly this kind of source: public-domain print gazetteers (mostly nineteenth- and early twentieth-century, mostly available as scans on the Internet Archive, HathiTrust, the Library of Congress and the Digital Library of India), chosen for the gaps they could help fill. The list leans deliberately into South Asia (the Bombay, Bengal, Madras, Punjab, United Provinces, Central Provinces, Bihar & Orissa and Assam district series, the Imperial Gazetteer of India, Ceylon, Burma and the North-West Frontier), with a global comparator set spanning the Middle East, Africa, the Americas, Oceania and Eastern Europe. Each entry is graded against a selection rubric and tagged with a suggested next step, so the same pipeline used here can be pointed at any of them with relatively little new work. The full list, with sources and notes, is here; suggestions for additions or corrections are very welcome (please email Palak at PAV82@pitt.edu).

Have a look

The Gazetteer of the World Explorer is here. Search a place, wander the map, open a volume and read Cunningham’s prose with its plates set back in place. It is an early experiment and it shows, so do take it in that spirit: a first attempt to let a 170-year-old book speak to a modern index, with a great deal still to fix. We will have more to say as the reconciliation work it prompted takes shape.

Thanks to Humphrey Southall, whose nudge got this project started!

What’s New in the WHG Index

47 million places, 67 million toponyms, and a phonetic search engine that works across scripts.

By Stephen Gadd, WHG Technical Director


The World Historical Gazetteer helps researchers, educators, and students discover how places connect across time, language, and culture. This post describes the most substantial infrastructure change since the platform launched: a full rebuild of the reconciliation index, a new phonetic search capability, and an automated clustering system that links place records across independent gazetteers.

Infrastructure: University of Pittsburgh CRC

The new system runs on dedicated infrastructure provided by the Center for Research Computing (CRC) at the University of Pittsburgh, replacing the previous single-server deployment. The CRC environment provides the compute and storage capacity needed to index and serve tens of millions of records — including the GPU resources used to train and run phonetic embedding models.

47 Million Places, 67 Million Toponyms

The previous WHG reconciliation index drew on two sources (GeoNames and Wikidata) and contained approximately 13.6 million records. The new index incorporates authority data from six major global gazetteers:

Source               Places        Description
OpenStreetMap        ~15 million   Crowdsourced global mapping data
GeoNames             ~12 million   The world’s largest open geographical database
Wikidata             ~8 million    Community-curated structured knowledge base
Getty TGN            ~3 million    The Thesaurus of Geographic Names, with substantial historical depth
Pleiades             ~37,000       Gazetteer of the ancient Mediterranean world
Library of Congress  Extensive     Geographic authority records

The total distinct place count is approximately 47 million. More importantly, the index now contains approximately 67 million toponyms — the individual name forms by which those places are or have been known, across languages, scripts, and historical periods. Each toponym is linked to its source places and carries a phonetic embedding (see below), making it possible to search not just by exact string but by sound.

Symphonym: Phonetic Search Across Scripts and Centuries

A persistent difficulty in historical gazetteer work is that the same place may appear under many different names: transliterated into different scripts, adapted to different phonologies, abbreviated, or simply spelled according to conventions that are centuries out of date. Standard text search can match “Florence” but will miss “Firenze”; it can find “Constantinople” but not “Konstantiniyye” or “قسطنطنية”.

Symphonym is a phonetic search system developed for WHG that addresses this problem. Every toponym in the index is converted into a fixed-dimensional phonetic embedding — a vector representation of how the name sounds, derived from Grapheme-to-Phoneme (G2P) conversion and articulatory phonetic feature extraction. Names that sound similar end up close together in embedding space, regardless of script or orthography. A search for “Konstantiniyye” will retrieve “Constantinople” and “قسطنطنية”; “Firenze” will match “Florenz”; “Stamboul” will surface alongside “Istanbul” and “İstanbul”.

This is particularly valuable for work with archaic and historical spellings. Researchers working with early modern catalogues, medieval charters, colonial-era maps, or any primary source material will encounter place names in spellings that no longer appear in modern gazetteers. Symphonym’s phonetic matching can bridge this gap: variant historical spellings like “Lipsick” or “Venedig” can be matched to their standard forms (“Leipzig”, “Venezia”) on the basis of phonetic proximity. This enables the enrichment, linking, and geolocation of catalogue descriptions and historical documents that would otherwise require extensive manual identification.

Note that phonetic search finds names that sound alike — it does not resolve etymologically unrelated names for the same place (e.g. “Eboracum” and “York”, or “Thessaloniki” and “Solun”). Those connections are established through other signals in the clustering pipeline, such as authority cross-references and spatial co-occurrence.

Automated Clustering Across Gazetteers

When the same physical location is described independently by GeoNames, Wikidata, TGN, and Pleiades — each with their own identifiers and naming conventions — determining which records refer to the same place is a non-trivial problem. The new system includes an automated clustering pipeline that combines multiple signals:

  • Explicit authority cross-references (e.g. sameAs links between Pleiades and GeoNames)
  • Exact toponym co-attestation — places in different gazetteers sharing the same name string, filtered by spatial proximity and country-code overlap
  • Phonetic similarity between toponyms (via Symphonym embeddings), with thresholds calibrated automatically from the authority hard links
  • Spatial proximity of coordinates
  • Feature type alignment across classification systems

The pipeline runs in four phases, from high-confidence explicit links through to phonetic similarity matching. Thresholds for the phonetic phase are not set manually but are learned from the authority hard links themselves: the system samples known-same and known-different place pairs, computes their phonetic and spatial signals, and fits a logistic regression to determine optimal similarity and distance cutoffs. In the most recent run, this calibration yielded a cosine similarity threshold of 0.79 and a spatial distance threshold of 5 km — substantially tighter than the initial manual defaults.
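To make the two calibrated cutoffs concrete, here is a minimal Python sketch of how they might be applied to one candidate pair. It assumes that a pair must pass both tests at once, which the description above does not actually state, and it uses made-up three-dimensional vectors in place of real Symphonym embeddings. The calibration step itself (the logistic regression) is not shown.

```python
import math

# Cutoffs reported for the most recent calibration run.
COSINE_THRESHOLD = 0.79
DISTANCE_THRESHOLD_KM = 5.0


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_phonetic_link(emb_a, emb_b, distance_km):
    """Assumed rule: link only if the names sound alike AND the places are close."""
    similar = cosine_similarity(emb_a, emb_b) >= COSINE_THRESHOLD
    nearby = distance_km <= DISTANCE_THRESHOLD_KM
    return similar and nearby


# Toy vectors for illustration only; real embeddings have many more dimensions.
name_a = [1.0, 0.0, 0.0]
name_b = [0.9, 0.1, 0.0]
linked = is_phonetic_link(name_a, name_b, distance_km=2.0)
```

Requiring both conditions is the conservative reading: a similar-sounding name 20 km away is rejected just as a differently-named place next door is.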

The result is a set of approximately 7 million clusters grouping 19 million of the 47 million place records. Each cluster represents a single real-world location as attested across multiple gazetteers. For users reconciling their own datasets, this means a search can return a single grouped result for a location rather than a confusing set of separate entries from different sources.

Importantly, the clustering algorithm is designed to be adaptive. Users can assert that particular place records do not belong in a given cluster, and these assertions feed back into the system, improving clustering quality over time.

Clustering also unlocks richer contextual information. When a place record from one authority (e.g. GeoNames or TGN) is clustered with a Wikidata record, the system can follow Wikidata’s links to retrieve supplementary data from Wikipedia — descriptions, images, and other reference material — and present it alongside the place. This means that a search result can surface Wikipedia content even when the original matching authority has no such links itself.

Applications

  • Search retrieves results across scripts, languages, and historical spelling variants
  • Reconciliation matches uploaded data against a substantially larger and more diverse authority base than before
  • Data linking connects user places to the broader Linked Open Data ecosystem via clustered authority identifiers
  • Catalogue enrichment — institutions holding historical documents with place references can use phonetic matching to identify, link, and geolocate those references against modern authority records

Data Architecture

The underlying data model separates places from names (toponyms) and tracks which source attests which name at which point in time. This structure — built on Normalised Place Records, Toponyms, and Attestations — reflects the scholarly reality that place identity is complex and historically contingent.
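One way to picture that separation is the following Python sketch. The class and field names are invented for this example and should not be read as the WHG schema; the point is only that a name is never attached to a place directly, but always through an attestation that records which source used it and when.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Toponym:
    """A single name form, independent of any one place."""
    text: str
    language: Optional[str] = None


@dataclass(frozen=True)
class Attestation:
    """A claim: this source, at this date, used this name for this place."""
    place_id: str
    toponym: Toponym
    source: str
    year: Optional[int] = None


@dataclass
class Place:
    """A place identity; its names are reached only through attestations."""
    place_id: str
    attestations: List[Attestation] = field(default_factory=list)

    def names_attested_by(self, source):
        """All name forms that a given source uses for this place."""
        return [a.toponym.text for a in self.attestations if a.source == source]


# Hypothetical identifiers and source labels, for illustration only.
place = Place("place:1")
place.attestations.append(
    Attestation("place:1", Toponym("Constantinople", "en"), "source-A", 1856))
place.attestations.append(
    Attestation("place:1", Toponym("İstanbul", "tr"), "source-B"))
```

Because the source and date live on the attestation, two sources can disagree about a name without either one overwriting the other.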

The current indexing and clustering system, which runs batch computations over Elasticsearch, is the first major step towards a graph-based architecture in which pairwise links between place records are stored as edges and cluster membership is resolved by graph traversal at query time. Under this model, batch clustering becomes unnecessary: clusters can be computed on-the-fly for any query, and users can adjust confidence thresholds interactively (e.g. “show me only high-confidence links” vs “include tentative matches”).
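A minimal sketch of that idea in Python, assuming each pairwise link is stored as an edge carrying a confidence score: a cluster is then whatever a breadth-first traversal can reach from the queried record using only edges at or above the chosen threshold. The record identifiers, scores and function names here are invented for illustration, and a production graph store would look quite different.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Turn (record_a, record_b, confidence) triples into an undirected adjacency map."""
    graph = defaultdict(list)
    for a, b, confidence in edges:
        graph[a].append((b, confidence))
        graph[b].append((a, confidence))
    return graph


def cluster_for(record_id, graph, min_confidence):
    """Records reachable from record_id over edges with confidence >= min_confidence."""
    seen = {record_id}
    queue = deque([record_id])
    while queue:
        current = queue.popleft()
        for neighbour, confidence in graph.get(current, []):
            if confidence >= min_confidence and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical links between records from three authorities.
graph = build_graph([
    ("gn:1", "wd:1", 0.95),   # e.g. an explicit cross-reference
    ("wd:1", "tgn:1", 0.60),  # e.g. a tentative phonetic match
])
strict = cluster_for("gn:1", graph, min_confidence=0.9)
loose = cluster_for("gn:1", graph, min_confidence=0.5)
```

The same stored edges give a two-record cluster under the strict setting and a three-record cluster under the loose one, which is the interactive behaviour described above.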

The graph architecture also enables a fundamental shift in how contributions work. Rather than uploading datasets of place records and reconciling them after the fact, the predominant form of contribution becomes the attestation: a name, date, source reference, or classification attached to an existing place in the index. Contributors find the place and attach their evidence to it; new place identities are minted only when no existing record matches. This attestation-centric model better reflects scholarly practice — researchers typically have evidence about known places, not inventories of new ones — and the dense authority backbone of 47 million indexed places makes it practical. (See the design discussion for further detail.)

The new indexing and clustering system will be rolled out progressively. Updates will be posted at whgazetteer.org and on the documentation site.

New Published Datasets!

We’re excited to share the following newly published datasets from WHG:

The Belgian Historical Gazetteer – Provinces Antwerp and East-Flanders dataset brings together historical place names from the reduced cadastre (gereduceerd kadaster) of Belgium (1847–1855), focusing on the provinces of Antwerp and East Flanders. Contributed by Léa Hermenault, it forms part of the wider Belgian Historical Gazetteer Project (CLARIAH-VL+ and the University of Antwerp).

Cliopatria – A geospatial database of world-wide political entities from 3400BCE to 2024CE is a comprehensive open-source geospatial dataset of states, political groups, events, and rulers across that whole period. Presently it comprises over 1600 political entities sampled at varying timesteps and spatial scales. This dataset is edited by Ed Chalstrey, James Bennett, and Erin Mutch and was converted into Linked Places Format by Stephen Gadd.

La sfera (The Globe), written by the Florentine merchant Goro Dati, is a textbook designed to introduce the next generation of Florentine merchants to natural phenomena, navigation, and the topography of the Mediterranean. Dati’s Globe (La sfera) is a dataset of places from Book IV, which gives an itinerary of, and maps, the major Mediterranean and Black Sea ports.

WHG is always excited to welcome new contributions. If you’re interested in working with us, we’d love to hear from you!

WHG Creates Video for “What’s in a Name? Exploring Place Names as Forms of Social and Geographic Storytelling”

The World Historical Gazetteer was invited to contribute an instructional video as part of a set of online modules and classroom resources focused on exploring place names as meaningful forms of social and geographic storytelling. The instructional video was created for the Tennessee Geographic Alliance as an extension of “What’s in a Name? Exploring Place Names as Forms of Social and Geographic Storytelling,” an interactive workshop that invited educators to explore the profound social, historical, and political meanings behind the names attached to places.

The video includes an introduction to the World Historical Gazetteer website and its features, presented by Ruth Mostern; highlights of datasets in the WHG index that reflect the contested nature of place names, presented by Palak Vashist; and a tutorial on creating your own collection of places in the WHG, presented by Ali Straub.

Other videos in the series explore the power of place name repatriation, tools for analyzing place names in social texts, and the relationship between names, identity, and memory.

Explore the videos here!

ISHI at Linked Pasts Symposium 11

On December 9, 2025, members of the ISHI and World Historical Gazetteer (WHG) teams led a session at Linked Pasts Symposium 11: “Linking Knowledge Through Place: ISHI, WHG, and the Future of Gazetteer Collaboration.” The Linked Pasts Symposium is a goal-oriented forum focused on building, planning, and learning about the application of linked open data (LOD) to historical texts, events, and datasets.

During the event, Ruth Mostern, ISHI Director and WHG Project Director, introduced ISHI and its new role in the Pelagios Place Activity (formerly the Gazetteer Alignment Activity). Stephen Gadd, WHG Technical Director, presented recent updates to the World Historical Gazetteer, including ORCID-based authentication, new Entity and Reconciliation Service APIs, and improved reconciliation with Wikidata. Ali Straub and Palak Vashist then discussed new and improved contributor documentation, including a more user-friendly LP-TSV template and a Submission Readiness Checklist that outlines the steps and criteria required to successfully publish a dataset on the WHG.

After the presentations, an open discussion addressed challenges in modeling historical places from uncertain data, WHG’s plans for a graph data model in its next version, its curation strategy and current content gaps, and opportunities for partnerships and capacity building. Participants also expressed strong interest in developing a centralized portal or clearinghouse for syllabi, courses, and training materials related to spatial history and gazetteers.

View the session agenda and discussion summary here. 

WHG Transitions to ORCiD Authentication

The World Historical Gazetteer will now require authentication via ORCiD (Open Researcher and Contributor ID) for all registered users. The use of ORCiD will ensure accurate attribution, persistent researcher identity, interoperability, and scholarly credit within the linked open data ecosystem, while also maintaining accessibility beyond institutional affiliations. Using ORCiD also streamlines account access by removing the need to manage passwords or depend on third-party login services. Users with existing ORCiD accounts can sign in seamlessly and securely. We’ve adopted ORCiD so that users can benefit from a more secure, flexible, and research-friendly system.

The use of ORCiD will also enable secure and controlled access to the WHG’s APIs, including two new complementary APIs for use with WHG data. The Entity API will allow users to retrieve full metadata, names, types, geometries, temporal bounds, authority info, and linked resources from among datasets, collections, and over 2.2 million places. The Reconciliation Service API will allow users to match historical geographical entities with the WHG data in both automated workflows and manual tools such as OpenRefine.
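For readers who have not met a reconciliation service before, here is a small Python sketch of the kind of batch request that OpenRefine-style clients send. It follows the general shape of the Reconciliation Service API specification (a form field named queries holding a JSON object keyed q0, q1, and so on). Whether the WHG service expects exactly this form, which optional fields it supports, and what its endpoint URL is are not covered in this post, so the sketch only builds the request body and sends nothing.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_form(names, limit=3):
    """Form-encoded body for a batch of reconciliation queries, one per name.

    The 'query' and 'limit' keys follow the Reconciliation Service API
    specification; the helper name and defaults are this example's own.
    """
    queries = {
        f"q{i}": {"query": name, "limit": limit}
        for i, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries, ensure_ascii=False)})


body = build_reconciliation_form(["Leipzig", "Firenze"])
```

A client would POST a body like this to the service endpoint and receive, for each key, a list of scored candidate matches to accept or review.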

Our transition to ORCiD authentication was made possible thanks to a collaboration with the University of Pittsburgh Library System (ULS).

What This Means for Users 

  • Existing WHG users can link their WHG account with an ORCiD by using the Legacy WHG Login and Link ORCiD button on whgazetteer.org/accounts/login/.
  • If you are a new user who already has an ORCiD, you can simply log in using your existing ORCiD credentials.
  • If you are a new user who does not have an ORCiD, you can create one during sign-in on whgazetteer.org/accounts/login/.
  • Registration is NOT required to search WHG’s indexes or to view datasets or collections. Registration is required only for dataset contribution, the creation of collections, or use of our APIs. With your consent, it would also allow us to let you know about important updates and new features.

A Message about WHG Technical Director Karl Grossner’s Retirement

After more than seven years of dedicated work on the World Historical Gazetteer (WHG), Technical Director and Lead Developer Dr. Karl Grossner has announced his retirement from the project team. Karl has been instrumental in all aspects of envisioning, guiding, and building the WHG into groundbreaking digital infrastructure that includes a spatially and temporally referenced index of world historical place names and a linked data ecosystem. Karl has led the development of the platform through three versions, the most recent of which indexes over 3.4 million place names. 

Karl’s contributions have gone far beyond technical expertise. He has taken a leading role in setting the vision for the project, building a collaborative and robust community of scholars who work with linked open geodata, and soliciting and developing the content that we index. His dedication, expertise, and commitment have been fundamental to the project’s success and evolution. We are grateful that Karl remains committed to the success of the WHG and that he will continue contributing actively to it in his retirement. You can keep up with Karl’s continued work on his X account: @kgeographer. We’ve posted a statement from Karl on the WHG blog here: http://blog.whgazetteer.org/2024/07/26/a/

Karl’s accomplishments ensure that the WHG has a bright future. We will continue improving the platform, growing the community, and expanding the index of named places. We are pleased to announce that Dr. Stephen Gadd, a scholar of early modern economic history, has transitioned into the role of lead developer. Stephen has worked closely with Karl over the past year and shares a passion for the WHG and the linked open data community. In the coming months, Stephen and the project team will continue enhancing and extending the platform’s features including the API and the reconciliation process, accessioning historical place datasets, and building our community.

We hope that you will stay in touch during this transition and that you will join us in expressing gratitude and esteem to Karl and sharing good wishes for his future. Please use the contact form under About on whgazetteer.org to contact the project team.

Ruth Mostern, Principal Investigator and Project Director

Stephen Gadd, Lead Developer

Alexandra Straub, Project Manager

Linked Traces progress

As described in our last post, the World-Historical Gazetteer (WHG) and Pelagios projects have adopted the term “trace” to refer to historical entities for which there is spatial-temporal data of interest, including events, people, works, and other artifacts. Following the lead of Pelagios’ Peripleo, the WHG system (initial beta release July 2019) will index contributed trace data, linking them to places in an underlying knowledge graph that is a) navigable in graphical features, and b) queryable in an API.

We (WHG and Pelagios) have set out to create a standard Linked Traces data format (LTAnno), which will take the form of W3C web annotations. We welcome (need, actually) active collaboration in that modeling task, and feedback from interested observers.

An LTAnno target is an LOD web-published record of some entity, and its body contains a) a place record URI, b) a relation between the place and the target entity, and c) an optional temporal scoping for each. It should be possible to have multiple bodies per target (per example below) and multiple targets per body (e.g. several people having the same birthplace, several works having the same place as subject, and so on).

A Trace Data Example

WHG will fully support LTAnno format, and likely focus on a few types of trace data, including those related to geographic movement such as journeys and cultural diffusion. Figures 1 and 2 illustrate a test Journey record using the draft LTAnno format. In it, 38 annotation bodies referring to WHG place URIs are linked to a single target, the WorldCat record for the source of the waypoints, “Xuanzang: a Buddhist pilgrim on the Silk Road” (Wriggins, 1997). A user finding their way to any of the 38 places will learn they were waypoints on the journey, be able to see the others, and to navigate to their respective place pages. This only scratches the surface of what will be possible, given a growing volume of trace data for other events, people, and works.

Figure 1 – Portal page for Bamiyan in World-Historical Gazetteer (in development)

Figure 2 – Each waypoint for the Journey trace is linked to its own place portal

Next Steps

The draft examples of LTAnno for different trace types are only preliminary, for discussion. In the coming weeks, Rainer Simon and I will coordinate development of a spec our respective projects can support. There is a Google Group and email list for this working group, and we will go into further detail about the draft spec there shortly. Active collaboration by data modelers, data providers, and future users of the format is most welcome.

One of the first orders of business is gathering a few sample datasets for different types of traces, in order to better understand the variety of modeling circumstances. These will then have to be converted into early versions of the format to test usability and usefulness.

At a later stage, we’ll have to put together a simple Linked Pasts ontology describing terms introduced by both this new LTAnno format and the recently developed Linked Places format for gazetteer data connectivity.

Linked Traces

Linked Pasts is an annual symposium. Linked.Art and Linked Places are data models with associated format specifications. Can we manage one more Linked something? Rainer Simon and I have begun an initiative to develop a Linked Traces annotation model and file format as a standard for contributions to linked open data aggregation projects such as World-Historical Gazetteer and Pelagios. The effort could easily extend to software and systems for displaying, searching, and analyzing trace data. The idea has drawn considerable interest, so here are some thoughts to start a discussion…and action.

What is a trace?

For our purposes a trace is any historical entity having a spatial-temporal setting (i.e. footprint) of interest—very general! The types of traces we’re immediately focused on include: people and groups of people, events of any complexity, and artifacts of all kinds (e.g. objects, texts, art works).

What is trace data?

Trace data are annotations of web-published records about (and images of) trace entities. We posit here that the body of a trace annotation must include a place reference (URI and name/title), should include a relation (e.g. waypoint, findspot, birthplace), and could include a temporal scope for that relation. Properties like creator and date are musts also. Trace data should take the form spelled out in the W3C Web Annotation Model and Vocabulary, in the JSON-LD syntax of RDF. Draft examples of some trace annotations have been posted in a GitHub repository for discussion. There are a few outstanding issues that need community consensus to resolve, outlined below.

Why trace data?

The Peripleo pilot application launched a few years ago by the Pelagios project is an example of traces in action. Underlying Peripleo is an index of a) place records aggregated from multiple gazetteers, and b) what we are now calling trace data: annotations of records about ancient coins, coin hoards, and inscriptions with relevant locations such as find spots.

There are many other kinds of things associated with places—at times or during periods—we might like to see, compare, and analyze as elements of “deep” linked data place records in future Peripleo-like software (e.g. World-Historical Gazetteer, now in development). For a given place, discover not only what museum artifacts or inscriptions were found there, but what historical persons are associated with the place, and in what way; what journeys of exploration or pilgrimage it was a waypoint on; and what texts and art works it is a subject of.

We have already heard from people with Person and Event data, and Rainer notes that this should support annotations of IIIF-formatted manuscripts and other images.

One sample

Here is one sample draft annotation record for a Journey event. As mentioned, more examples are on GitHub.

{ "@context":[
    "http://www.w3.org/ns/anno.jsonld",
    { "lpo": "http://linkedpasts.org/ontology/lpo.jsonld"}
  ],
  "id": "http://my.org/annotations/92837",
  "type": "Annotation",
  "creator": {
    "id":"http://example.org/people/2345",
    "name":"Ima Tracemaker",
    "homepage":"http://tracemaker.org"},
  "created": "2019-03-18",
  "motivation": "linking",
  "body": [
    {"id": "http://whgazetteer.org/places/86880",
     "dc:title": "Tashkent",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans":[
       {"start":{"in":"630"},"end":{"in":"630"}}]}
    },
    {"id": "http://whgazetteer.org/places/84774",
     "dc:title": "Mathura",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans":[
        {"start":{"in":"634"},"end":{"in":"634"}}]}
    },
   // ... etc.
 ],
 "target": {
   "id": "http://my.org/events/90001",
   "type": "lpo:Journey",
   "dc:title": "Pilgrimage of Xuanzang"
 }
}
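
The must/should/could rules proposed under “What is trace data?” can be turned into a simple check against a record shaped like the sample above. The Python sketch below is only that: the function name is invented, the property keys are taken from the draft sample, and treating a missing target as an error is this example's assumption rather than part of any agreed spec.

```python
def check_trace_annotation(anno):
    """Return (errors, warnings) for a draft trace annotation held as a dict.

    Errors are broken "must" rules; warnings are unmet "should" rules.
    A temporal scope ("lpo:when") is a "could", so it is never reported.
    """
    errors, warnings = [], []

    for key in ("creator", "created"):
        if key not in anno:
            errors.append(f"missing required property: {key}")

    if "target" not in anno:  # assumed requirement: an annotation annotates something
        errors.append("missing target")

    bodies = anno.get("body", [])
    if isinstance(bodies, dict):  # a single body may be given without a list
        bodies = [bodies]
    if not bodies:
        errors.append("annotation has no body")

    for i, body in enumerate(bodies):
        if not body.get("id") or not body.get("dc:title"):
            errors.append(f"body {i}: place reference needs both a URI and a title")
        if "lpo:relation" not in body:
            warnings.append(f"body {i}: no relation given")

    return errors, warnings


# One body from the sample record, with the elided bodies left out.
sample = {
    "creator": {"id": "http://example.org/people/2345", "name": "Ima Tracemaker"},
    "created": "2019-03-18",
    "body": [{
        "id": "http://whgazetteer.org/places/86880",
        "dc:title": "Tashkent",
        "lpo:relation": "lpo:waypoint",
    }],
    "target": {"id": "http://my.org/events/90001", "type": "lpo:Journey"},
}
errors, warnings = check_trace_annotation(sample)
```

Keeping errors and warnings apart mirrors the must/should distinction: a contributor can be told what blocks acceptance and what would merely make the record more useful.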

Open questions

The next step is for a working group to collectively answer existing open questions, and to surface (and answer) questions we haven’t thought of. We welcome collaborators and observers. A few questions that came to mind while developing the prospective samples:

  1. What are the types of traces (annotation targets)?
  2. Should there be a vocabulary of type-specific relations? E.g. waypoint for Journey traces, or birthplace for Persons.
  3. How can bodies (place/time assertions) be combined as sequences in sets for a given target? E.g. Journey waypoints.
  4. How can relations be combined in sets? E.g. a Place was both a birthplace and deathplace for a Person.
  5. Where can “when” be expressed in an annotation? E.g. in #4, can a date be associated with each relation to the Place?
  6. How should “extension” terms we introduce (and allowed by the W3C spec) be defined? In a “Linked Pasts Ontology”? What will be its contents?

Undoubtedly more will surface.

Next steps

I’ve created a Google Group email list for tracking conversation amongst collaborators and observers, and posted parts of this document as an editable Google Doc. After some initial feedback perhaps we should have a Google Hangout. (My plan to begin extracting Google from my life is not going well!) Suggestions for other tools and platforms are welcome.