Version 1 Has Launched

After three years of development, we are pleased to announce the launch of Version 1 of the World Historical Gazetteer (WHG), at whgazetteer.org. Version 1 follows six beta releases over the past year or so. The WHG presently indexes 1.8 million modern place references and approximately 60,000 temporally scoped records.

In addition to filtered search and API access to data, we have developed a suite of tools that allow you to upload place datasets into a private workspace, augment them by reconciling them against the Getty Thesaurus of Geographic Names and Wikidata, publish them as Linked Open Data, and contribute them for accessioning to the WHG index.

We have a long list of planned improvements and a queue of in-progress data contributions. More data is very welcome of course, and your feedback is essential! You can use the site contact form, create an issue on GitHub, or simply write to us at whg@pitt.edu.

We have completed a Site Guide that describes the purposes, functionality, and data of the present system. The Tutorials and About sections of the site provide additional information. We will continue keeping our nearly 500 Twitter followers current with news of new data, new features, and bug fixes. We also plan to keep our blog updated with relevant announcements and discussion of the project’s future course.

We are pleased to announce this major step in our project and we look forward to your

The World Historical Gazetteer Team
Ruth Mostern, Karl Grossner, and Susan Grunewald

Linked [Places, Traces, Art, …]

This post aims to clarify the relationships between a few of the models now in development for various uses by members of the historical linked data community, particularly with regard to geography (place)—namely Linked Places, Linked Traces, and Linked Art. Figure 1 provides a conceptual overview (click to magnify).

Figure 1 – Place in Linked Places, Linked Art, and Linked Traces conceptual models

Background

In several key respects the World Historical Gazetteer project (WHG; now in beta release 0.3) builds upon software and data development work produced by the Pelagios project—particularly the historical gazetteer infrastructure underlying its Peripleo and Recogito software applications.

Peripleo is a pilot application (no longer in active development) built to demonstrate a few key linked-data-for-history functions: a) search of a central index aggregating historical gazetteer records published as Linked Data, b) the annotation of web-published records about historical objects with identifiers for relevant places (mostly coins and inscriptions in this case), and c) the display of search results for both in a map interface. WHG performs those functions also, along with some others.

Recogito is an annotation platform that among other things makes use of that historical gazetteer index by facilitating association of place references tagged in textual sources with the identifiers, coordinates, and name variants found in the indexed gazetteer records.

I have collaborated with Pelagios developer Rainer Simon and a few other interested folks to develop a Linked Places model and format particularly for contributions to the Pelagios and WHG platforms. The Pelagios and WHG indexes will have considerable overlap in coverage, but we anticipate that the WHG index will over time become considerably broader in space and time, due primarily to its built-in semi-automated data development and contribution pipeline and stated goal of global breadth.

Because both projects have interest in annotations, we have also begun jointly developing a Linked Traces format—more precisely a set of implementation patterns using the W3C Web Annotation format standard for digital history and GLAM applications.

With that introduction, what follows are some details about Linked Places and Linked Traces, and thoughts about their immediate and potential uses. Given the concurrent development of the Linked Art model and ontology, there are also some thoughts about how all of these might in time relate to each other in practice. Figure 1 above should provide useful reference.

Linked Places: model and data format

The Linked Places model and interconnection format (LPF) were developed to meet the particular requirements of the WHG and Pelagios platforms: a common data structure that both could ingest routinely without the need to accommodate on a case-by-case basis the enormous variety of data models in use by digital historical projects, large and small. LPF is a set of extensions to GeoJSON-LD, itself a Linked Data enabling extension to the most widely implemented text-based format for representing geographic features, GeoJSON (an IETF standard).

LPF also adds a standard means for adding time to GeoJSON features, introducing “when” objects to permit temporal scoping of a) an entire Feature, and/or b) its individual names, place types, geometry, and relations to other places, in any combination.
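As a rough illustration of that scoping, a Linked Places Feature can be sketched as a Python dictionary. The "when"/"timespans" shape mirrors the one used in the draft Linked Traces sample on this blog; the URI, names, and dates are invented, and the other key names ("names", "toponym") are assumptions that should be checked against the published LPF specification. The sketch also assumes plain year strings, which is a simplification.

```python
def when(start, end):
    """Build a "when" object with a single timespan of year strings."""
    return {"timespans": [{"start": {"in": start}, "end": {"in": end}}]}


# A hypothetical place record; the URI, names, and dates are invented.
feature = {
    "type": "Feature",
    "@id": "http://example.org/places/1",
    "properties": {"title": "Exampleville"},
    "when": when("1200", "1800"),   # scopes the entire Feature
    "names": [                      # each name carries its own scope
        {"toponym": "Old Example", "when": when("1200", "1499")},
        {"toponym": "Exampleville", "when": when("1500", "1800")},
    ],
}


def names_in_year(feature, year):
    """Return the toponyms whose timespans include the given year."""
    found = []
    for name in feature.get("names", []):
        for span in name.get("when", {}).get("timespans", []):
            if int(span["start"]["in"]) <= year <= int(span["end"]["in"]):
                found.append(name["toponym"])
                break
    return found


print(names_in_year(feature, 1300))  # ['Old Example']
print(names_in_year(feature, 1600))  # ['Exampleville']
```

The same pattern would apply to place types, geometry, and relations, each carrying its own "when" object where the source supports it.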

Uploads to WHG (and accessioning to both Pelagios and WHG indexes) require creating a serialization (i.e. transforming export) of place data from whatever form it is maintained in to LPF. We have also developed an abbreviated delimited text file format (LP-TSV) to meet the needs of contributors with relatively simple records.
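To give a sense of what such a serialization involves, here is a minimal Python sketch that turns a small tab-delimited table into a GeoJSON-style FeatureCollection. The column names (id, title, start, end, lon, lat), the sample rows, and the rule that a missing end year defaults to the start year are all assumptions made for illustration; they are not taken from the LP-TSV or LPF specifications.

```python
import csv
import io

# Hypothetical delimited input; column names and values are invented.
SAMPLE_TSV = (
    "id\ttitle\tstart\tend\tlon\tlat\n"
    "p1\tExampleville\t1500\t1800\t69.2\t41.3\n"
    "p2\tNowhere\t1200\t\t\t\n"
)


def row_to_feature(row):
    """Convert one delimited row into a Feature-like dictionary."""
    start = row["start"]
    end = row["end"] or start  # assumption: no end year means a single year
    feature = {
        "type": "Feature",
        "@id": row["id"],
        "properties": {"title": row["title"]},
        "when": {"timespans": [{"start": {"in": start}, "end": {"in": end}}]},
        "names": [{"toponym": row["title"]}],
        "geometry": None,  # GeoJSON permits a null geometry
    }
    if row["lon"] and row["lat"]:
        feature["geometry"] = {
            "type": "Point",
            "coordinates": [float(row["lon"]), float(row["lat"])],
        }
    return feature


def tsv_to_collection(text):
    """Read tab-delimited text and return a FeatureCollection dictionary."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        "type": "FeatureCollection",
        "features": [row_to_feature(row) for row in reader],
    }


collection = tsv_to_collection(SAMPLE_TSV)
print(len(collection["features"]))  # 2
```

Records without coordinates are kept rather than dropped, since reconciliation can supply geometry later.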

Figure 1 summarizes the LPF conceptual model.

Linked Art: model and format

A global consortium of organizations involved in the domains of art, cultural heritage and archaeology (principally large museums and universities) is jointly developing Linked Art, a “shared model based on Linked Open Data to describe Art,” along with software implementations of it. The conceptual model is being formalized in an ontology with a subset of CIDOC-CRM entities and relations, and expressed as a data format using JSON-LD, a syntax of RDF.

From the perspective of WHG, Linked Art is a format many prospective users of our platform may adopt to describe objects in their collections. Both WHG and Pelagios are agnostic as to what formats our users and data partners use, and as mentioned above, users will have to perform a serialization to Linked Places format to interact with our platforms.

Figure 1 shows how Place appears in the Linked Art model. The points of contact with Linked Places are identifiers. One kind of identifier in Linked Art is a URI to a linked data gazetteer resource. A serialization of Places from a Linked Art dataset to LPF should include as many such identifiers as can be managed. WHG can aid discovery of those URIs via reconciliation services to Getty TGN and Wikidata.

All that said, place data from Linked Art collections are unlikely to be good candidates for contributions to WHG; the great majority of places will already be indexed. Rather, it is Linked Traces data that will be more relevant.

Linked Traces: model and format

WHG is following on from the Peripleo pilot in experimentally indexing not only place data, but what we are calling trace data: “annotations of web-published records about historical objects with identifiers for relevant places.” We say “experimentally” because it seems likely that the most useful web interfaces to trace data will be distinct from those for place data. Certainly there will be significant scaling issues.

In order to continue exploring the linking of places and associated traces, we (Rainer Simon and I) have also initiated development of a Linked Traces format, as a potential standard for use by the WHG and Recogito platforms. Linked Traces is turning out to be a set of implementation patterns for the W3C Web Annotation format (WA).

Annotating records of “anything” with URIs for web-published place records is but one use case for WA. For example, in Recogito, users annotate texts with references to not only places, but also people, events, and relations between all three.

Figure 1 indicates the way that a set of one or more place records can form the body of an annotation. The JSON form of the body in that example corresponds to an early draft of a “Linked Traces place pattern” in development. The working group’s activities are paused at the moment, but WHG is developing some exemplar data according to that draft, to be explored in our Version 1 release, slated for late spring 2020.

Beta Release v0.2

At long last we are ready to offer a v0.2 beta release of the World Historical Gazetteer (WHG) at http://dev.whgazetteer.org. We hope that spatial historians and spatio-temporal infrastructure developers will be interested in taking a look at what we are building, experimenting with their data or provided samples. It is a “sandbox,” so nothing will be saved for the time being (that will change soon). There are 5-6 months remaining in the term of our initial NEH grant, time enough to complete most of what we planned for this phase, and to incorporate more suggestions from users and potential contributors as we move toward future planning and development.

The site includes a brief guide titled “WHG Beta Release: A Tour,” which outlines what is there, what you can do and how, remaining challenges, and what is in the works. What follows is a higher level introduction.

Places and Traces

The World Historical Gazetteer is a Linked Open Data platform for publishing, linking, discovering, and visualizing contributed records of attested historical places and traces. Our initial focus has been on places, but we are working experimentally to demonstrate their integration within the platform with what we now call traces–defined as web resources about historical entities for which location in time and space is of scholarly and general interest. We are considering three classes of traces for the time being: agents (people and groups), works (e.g. artifacts, texts, datasets), and events (e.g. journeys, conflict). Our objective has been to create the first large-scale spatial infrastructure for world history: oriented toward documenting the human past at the global scale, and particularly the geography of global and transregional connections.

Our accessioning process is intended to eventually be largely self-directed; getting it to that stage means working directly and hands-on with our early contributors.

LOD Publication

Registered users of WHG can publish their place records as Linked Open Data simply by uploading them in Linked Places format (or the LP-TSV version intended for relatively simpler records). We see LOD publication as a key feature for researchers who are not in a position to stand up their own web interfaces with per-place pages. Once uploaded, each record will have a permanent URI and be accessible in our graphical interface and API, putting it on its way to being LOD in good standing. The dataset can be browsed immediately by its owner in a searchable table and map, but turning the uploaded dataset into a contribution for accessioning requires some further steps. The data needs to have as many asserted links to name authorities as possible, and augmentations of geometry where that is missing and findable. We provide reconciliation services for that purpose.

Reconciliation

Simply put, reconciliation is the process of identifying matches between records of named entities. In this case the records are for places, and the matches are between a researcher’s records and those in existing place name authorities. So far, we provide reconciliation services for the Getty Thesaurus of Geographic Names (TGN) and Wikidata; DBpedia and GeoNames are planned.

The reconciliation process has two steps: 1) sending records to the authority, and 2) reviewing the prospective matches returned and accepting or declining them as appropriate. The results of this somewhat laborious process are 1) links, and 2) more geometry. Once augmented in this way, a dataset is ready for accessioning.
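The two steps can be sketched in a few lines of Python. This is only an illustration of the workflow, not WHG's implementation: the "authority" is an in-memory list standing in for an external service, the matching rule is a naive substring test, and every URI, name, and coordinate is invented.

```python
# Stand-in for an external name authority; all entries are hypothetical.
AUTHORITY = [
    {"uri": "http://example.org/authority/1", "name": "Tashkent",
     "coords": [69.2, 41.3]},
    {"uri": "http://example.org/authority/2", "name": "Tashkent Region",
     "coords": [69.8, 41.2]},
]


def find_candidates(record, authority):
    """Step 1: return authority entries whose name contains the record name."""
    needle = record["name"].lower()
    return [entry for entry in authority if needle in entry["name"].lower()]


def review(record, candidates, accept):
    """Step 2: apply a reviewer's decisions and return an augmented copy.

    `accept` is a callable (record, candidate) -> bool representing the
    human decision to accept or decline a prospective match.
    """
    augmented = dict(record, links=list(record.get("links", [])))
    for candidate in candidates:
        if accept(record, candidate):
            augmented["links"].append(candidate["uri"])   # result 1: links
            if augmented.get("coords") is None:
                augmented["coords"] = candidate["coords"]  # result 2: geometry
    return augmented


record = {"id": "r1", "name": "Tashkent", "coords": None}
candidates = find_candidates(record, AUTHORITY)
result = review(record, candidates, lambda r, c: c["name"] == r["name"])
print(result["links"])
```

In practice the `accept` decision is made by a person looking at each prospective match, which is what makes the process laborious.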

Accessioning

The last step is another reconciliation effort — this time to the WHG index. Each record is compared to the growing WHG index to determine if we have a contributed attestation for the place yet or not. If we do, the incoming record becomes a “child” or “leaf” in the set of attestations for the place. If the place is not yet accounted for, the new record becomes a “parent” — the seed for a new set of attestations. At this stage, an automatic linking can be made if two records share an authority match, but the rest will have to be reviewed as described above.
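The parent/child logic described above can be sketched as follows, assuming a toy in-memory index rather than the actual Elasticsearch index. The function names, the dictionary layout, and the identifiers such as "tgn:1" are hypothetical; the only rule taken from the text is that a shared authority match allows automatic linking, while everything else goes to review.

```python
def new_attestation_set(record):
    """Seed a new set of attestations with `record` as its parent."""
    return {
        "parent": record,
        "children": [],
        "links": set(record.get("links", [])),
    }


def accession(index, record):
    """Try to link `record` automatically to an existing attestation set.

    Returns "linked" when the record shares at least one authority link
    with an indexed set (it becomes a child), otherwise "needs_review".
    """
    links = set(record.get("links", []))
    for entry in index:
        if entry["links"] & links:
            entry["children"].append(record)
            entry["links"] |= links  # the set now knows the new links too
            return "linked"
    return "needs_review"


index = [new_attestation_set({"id": "a", "links": ["tgn:1"]})]
print(accession(index, {"id": "b", "links": ["tgn:1", "wd:Q1"]}))  # linked
print(accession(index, {"id": "c", "links": []}))                  # needs_review
```

After review, a record judged to be a new place would be added with `index.append(new_attestation_set(record))`; one judged to match an existing place would be appended to that set's children.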

Graphical Interface

The opening screen of WHG offers users search of places and traces. We try to offer enough context on the opening screen to identify the likeliest match. Once you identify a place of interest, clicking its name takes you to a “place portal” screen–where everything we have about the place, or linked to it in some way, will appear: attestations from contributors, associated traces, nearby places, physical geographic context (rivers, watersheds, ecoregions). The place portal is very much a work-in-progress at this stage. Several other features are also on our near-term to-do list, including advanced search; more and better maps; user data collections; project team ‘workspace’; batching of reconciliation tasks; and more.

A Word About Architecture

There are two data stores within the WHG platform: a relational database (PostgreSQL) and a high-speed index (Elasticsearch). All uploaded data gets imported to a set of relational tables whose names correspond to the elements of Linked Places format: places, place_name, place_type, place_geom, place_link, place_when, place_related, place_description, and place_depiction. Contributed data is most readily managed in that form. Upon accessioning, records are added to the index in the manner described under Accessioning above.
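A hedged sketch of how one Linked Places record might be split into rows for a few of those tables is shown below. Only the table names come from the list above; the column names, the shape of the input record, and the sample values are assumptions, and the real schema may differ considerably.

```python
def feature_to_rows(feature):
    """Split one Feature-like dictionary into per-table row lists.

    Column names are hypothetical; only the table names follow the post.
    """
    place_id = feature["@id"]
    rows = {"places": [], "place_name": [], "place_geom": [], "place_link": []}
    rows["places"].append(
        {"id": place_id, "title": feature["properties"]["title"]}
    )
    for name in feature.get("names", []):
        rows["place_name"].append(
            {"place_id": place_id, "toponym": name["toponym"]}
        )
    if feature.get("geometry"):
        rows["place_geom"].append(
            {"place_id": place_id, "geom": feature["geometry"]}
        )
    for link in feature.get("links", []):
        rows["place_link"].append(
            {"place_id": place_id, "identifier": link["identifier"]}
        )
    return rows


# Invented sample record.
sample = {
    "@id": "http://example.org/places/1",
    "properties": {"title": "Exampleville"},
    "names": [{"toponym": "Exampleville"}, {"toponym": "Old Example"}],
    "geometry": {"type": "Point", "coordinates": [69.2, 41.3]},
    "links": [{"identifier": "http://example.org/authority/1"}],
}
rows = feature_to_rows(sample)
print({table: len(items) for table, items in rows.items()})
```

Keeping one table per element makes it straightforward to add links and geometry during reconciliation without rewriting the whole record.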

An API

This part of the WHG platform is one of the most important, and the least developed right now. Stay tuned for further developments. Our intention is to provide access to both contributors’ individual records and datasets from the database (when designated by their owner as public), and to the aggregating index records; both with numerous and useful filtering capabilities.

Content

Our index has been instantiated with records from modern gazetteer resources: 1) about 1,000 of the world’s most populous cities from GeoNames, 2) ~1.8 million place records from Getty TGN, 3) about 1,500 societies from the D-Place anthropological repository; and 4) major rivers, lakes, and mountain ranges from Natural Earth and Wildlife Research Institute.

To this modern “core” we have begun adding historical data: 1) 10,600 entities harvested from the index of the Atlas of World History (Dorling Kindersley, 1995), offering broad but shallow global coverage; and 2) our first specialist gazetteer, HGIS de las Indias, which consists of approximately 15,000 settlements and territories in colonial Latin America. There are several additional large datasets in the queue, which we will be adding in partnership with contributors. Some are previewed as heat maps on our Maps page.

Broad coverage of modern names with increasing historical depth and connections supplied by trace data.

Our Pelagios Connections

The WHG platform borrows extensively from the Peripleo application developed by Rainer Simon of the Pelagios project, extending it significantly in a few ways. Our backend architecture closely mimics that underlying both Peripleo and the Recogito annotation tool, and we are actively collaborating with Rainer and the entire Pelagios Network team on several aspects of this work. In particular, we are co-developing the data format standards for contributions to both systems: Linked Places format, and a nascent Linked Traces annotation format.

Feedback

We welcome suggestions, critiques, even praise :^) – and there is an email form on the site which makes it easy to offer it. Please bear with us in this active development stage and check back as we realize the system’s potential more fully over the next several months. Look for further blog posts and follow us on Twitter; we tweet progress and related information as @WHGazetteer and @kgeographer.

 

Linked Traces progress

As described in our last post, the World-Historical Gazetteer (WHG) and Pelagios projects have adopted the term “trace” to refer to historical entities for which there is spatial-temporal data of interest, including events, people, works, and other artifacts. Following the lead of Pelagios’ Peripleo, the WHG system (initial beta release July 2019) will index contributed trace data, linking them to places in an underlying knowledge graph that is a) navigable in graphical features, and b) queryable in an API.

We (WHG and Pelagios) have set out to create a standard Linked Traces data format (LTAnno), which will take the form of W3C web annotations. We welcome (need, actually) active collaboration in that modeling task, and feedback from interested observers.

An LTAnno target is an LOD web-published record of some entity, and its body contains a) a place record URI, b) a relation between the place and the target entity, and c) an optional temporal scoping for each. It should be possible to have multiple bodies per target (per example below) and multiple targets per body (e.g., several people having the same birthplace, several works having the same place as subject, and so on).

A Trace Data Example

WHG will fully support LTAnno format, and likely focus on a few types of trace data, including those related to geographic movement such as journeys and cultural diffusion. Figures 1 and 2 illustrate a test Journey record using the draft LTAnno format. In it, 38 annotation bodies referring to WHG place URIs are linked to a single target, the WorldCat record for the source of the waypoints, “Xuanzang: a Buddhist pilgrim on the Silk Road” (Wriggins, 1997). A user finding their way to any of the 38 places will learn they were waypoints on the journey, be able to see the others, and to navigate to their respective place pages. This only scratches the surface of what will be possible, given a growing volume of trace data for other events, people, and works.

Figure 1 – Portal page for Bamiyan in World-Historical Gazetteer (in development)
Figure 2 – Each waypoint for the Journey trace is linked to its own place portal

Next Steps

The draft examples of LTAnno for different trace types are only preliminary, for discussion. In the coming weeks, Rainer Simon and I will coordinate development of a spec our respective projects can support. There is a Google Group and email list for this working group, and we will go into further detail about the draft spec there shortly. Active collaboration by data modelers, data providers, and future users of the format is most welcome.

One of the first orders of business is gathering a few sample datasets for different types of traces, in order to better understand the variety of modeling circumstances. These will then have to be converted into early versions of the format to test usability and usefulness.

At a later stage, we’ll have to put together a simple Linked Pasts ontology describing terms introduced by both this new LTAnno format and the recently developed Linked Places format for gazetteer data connectivity.

Linked Traces

Linked Pasts is an annual symposium. Linked.Art and Linked Places are data models with associated format specifications. Can we manage one more Linked something? Rainer Simon and I have begun an initiative to develop a Linked Traces annotation model and file format as a standard for contributions to linked open data aggregation projects such as World-Historical Gazetteer and Pelagios. The effort could easily extend to software and systems for displaying, searching, and analyzing trace data. The idea has drawn considerable interest, so here are some thoughts to start a discussion…and action.

What is a trace?

For our purposes a trace is any historical entity having a spatial-temporal setting (i.e. footprint) of interest—very general! The types of traces we’re immediately focused on include: people and groups of people, events of any complexity, and artifacts of all kinds (e.g. objects, texts, art works).

What is trace data?

Trace data are annotations of web-published records about (and images of) trace entities. We posit here that the body of a trace annotation must include a place reference (URI and name/title), should include a relation (e.g. waypoint, findspot, birthplace), and could include a temporal scope for that relation. Properties like creator and date are musts also. Trace data should take the form spelled out in the W3C Web Annotation Model and Vocabulary, in the JSON-LD syntax of RDF. Draft examples of some trace annotations have been posted in a GitHub repository for discussion. There are a few outstanding issues that need community consensus to resolve, outlined below.
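The must/should/could rules above lend themselves to a small checker. The Python sketch below is one possible reading, not an agreed specification: it assumes the key names used in the draft sample ("id", "dc:title", "lpo:relation", "creator", "created"), treats "date" as the "created" property, reports missing "must" items as errors and missing "should" items as warnings, and ignores the optional temporal scope.

```python
def check_trace_annotation(annotation):
    """Return (errors, warnings) for a draft trace annotation dictionary."""
    errors, warnings = [], []

    # Annotation-level "musts": creator and date.
    for key in ("creator", "created"):
        if key not in annotation:
            errors.append("missing " + key)

    bodies = annotation.get("body", [])
    if isinstance(bodies, dict):  # a single body may not be wrapped in a list
        bodies = [bodies]
    if not bodies:
        errors.append("missing body")

    for i, body in enumerate(bodies):
        if not body.get("id"):
            errors.append("body %d: missing place URI" % i)
        if not body.get("dc:title"):
            errors.append("body %d: missing place name/title" % i)
        if "lpo:relation" not in body:
            warnings.append("body %d: no relation given" % i)
    return errors, warnings


# Invented example: one complete body and one lacking a relation.
example = {
    "creator": {"name": "Ima Tracemaker"},
    "created": "2019-03-18",
    "body": [
        {"id": "http://example.org/places/1", "dc:title": "Exampleville",
         "lpo:relation": "lpo:waypoint"},
        {"id": "http://example.org/places/2", "dc:title": "Nowhere"},
    ],
}
print(check_trace_annotation(example))
```

A check like this could run at upload time, so that contributors see which records need attention before indexing.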

Why trace data?

The Peripleo pilot application launched a few years ago by the Pelagios project is an example of traces in action. Underlying Peripleo is an index of a) place records aggregated from multiple gazetteers, and b) what we are now calling trace data: annotations of records about ancient coins, coin hoards, and inscriptions with relevant locations such as find spots.

There are many other kinds of things associated with places—at times or during periods—we might like to see, compare, and analyze as elements of “deep” linked data place records in future Peripleo-like software (e.g. World-Historical Gazetteer, now in development). For a given place, discover not only what museum artifacts or inscriptions were found there, but what historical persons are associated with the place, and in what way; what journeys of exploration or pilgrimage it was a waypoint on; and what texts and art works it is a subject of.

We have already heard from people with Person and Event data, and Rainer notes that this should support annotations of IIIF-formatted manuscripts and other images.

One sample

Here is one sample draft annotation record for a Journey event. As mentioned, more examples are on GitHub.

{ "@context":[
    "http://www.w3.org/ns/anno.jsonld",
    { "lpo": "http://linkedpasts.org/ontology/lpo.jsonld"}
  ],
  "id": "http://my.org/annotations/92837",
  "type": "Annotation",
  "creator": {
    "id":"http://example.org/people/2345",
    "name":"Ima Tracemaker",
    "homepage":"http://tracemaker.org"},
  "created": "2019-03-18",
  "motivation": "linking",
  "body": [
    {"id": "http://whgazetteer.org/places/86880",
     "dc:title": "Tashkent",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans":[
       {"start":{"in":"630"},"end":{"in":"630"}}]}
    },
    {"id": "http://whgazetteer.org/places/84774",
     "dc:title": "Mathura",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans":[
        {"start":{"in":"634"},"end":{"in":"634"}}]}
    },
   // ... etc.
 ],
 "target": {
   "id": "http://my.org/events/90001",
   "type": "lpo:Journey",
   "dc:title": "Pilgrimage of Xuanzang"
 }
}
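
A record like this can also be read programmatically. The Python sketch below is a minimal illustration: it uses a trimmed copy of the sample in which the elided bodies are simply omitted so that it parses as strict JSON, and it assumes each body carries at most one timespan with a plain year as its start. The function name is hypothetical.

```python
import json

# Trimmed copy of the draft sample; elided bodies are left out.
SAMPLE = """
{ "@context": [
    "http://www.w3.org/ns/anno.jsonld",
    {"lpo": "http://linkedpasts.org/ontology/lpo.jsonld"}
  ],
  "id": "http://my.org/annotations/92837",
  "type": "Annotation",
  "created": "2019-03-18",
  "motivation": "linking",
  "body": [
    {"id": "http://whgazetteer.org/places/84774",
     "dc:title": "Mathura",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans": [{"start": {"in": "634"}, "end": {"in": "634"}}]}},
    {"id": "http://whgazetteer.org/places/86880",
     "dc:title": "Tashkent",
     "lpo:relation": "lpo:waypoint",
     "lpo:when": {"timespans": [{"start": {"in": "630"}, "end": {"in": "630"}}]}}
  ],
  "target": {
    "id": "http://my.org/events/90001",
    "type": "lpo:Journey",
    "dc:title": "Pilgrimage of Xuanzang"
  }
}
"""


def waypoints(annotation):
    """Return (year, title, uri) tuples for waypoint bodies, earliest first."""
    found = []
    for body in annotation["body"]:
        if body.get("lpo:relation") != "lpo:waypoint":
            continue
        spans = body.get("lpo:when", {}).get("timespans", [])
        year = int(spans[0]["start"]["in"]) if spans else None
        found.append((year, body["dc:title"], body["id"]))
    # Undated waypoints sort after dated ones.
    return sorted(found, key=lambda w: (w[0] is None, w[0] or 0))


annotation = json.loads(SAMPLE)
for year, title, uri in waypoints(annotation):
    print(year, title, uri)
```

Sorting by date is only one way to order waypoints; how sequences should be expressed in the format itself is one of the open questions listed below.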

Open questions

The next step is for a working group to collectively answer existing open questions, and to surface (and answer) questions we haven’t thought of. We welcome collaborators and observers. A few questions that came to mind while developing the prospective samples:

  1. What are the types of traces (annotation targets)?
  2. Should there be a vocabulary of type-specific relations? E.g. waypoint for Journey traces, or birthplace for Persons.
  3. How can bodies (place/time assertions) be combined as sequences in sets for a given target? E.g. Journey waypoints.
  4. How can relations be combined in sets? E.g. a Place was both a birthplace and deathplace for a Person.
  5. Where can “when” be expressed in an annotation? E.g. in #4, can a date be associated with each relation to the Place?
  6. How should “extension” terms we introduce (and allowed by the W3C spec) be defined? In a “Linked Pasts Ontology”? What will be its contents?

Undoubtedly more will surface.

Next steps

I’ve created a Google Group email list for tracking conversation amongst collaborators and observers, and posted parts of this document as an editable Google Doc. After some initial feedback perhaps we should have a Google Hangout. (My plan to begin extracting Google from my life is not going well!) Suggestions for other tools and platforms are welcome.

 

 

Of Historical Mapathons, Seeds, and Graphs

“To make a digital historical gazetteer, it would help to have a digital historical gazetteer.” — anonymous

The World-Historical Gazetteer project (WHG) is soliciting contributions of place data (attestations of places in historical sources) from any region and period, in any quantity, in order to link them in a “union index” and thereby link the research that discovered them. Almost all of the historical sources are texts or tabular datasets, and although text sources often include descriptions of relative location (e.g. in a province or near a river), rarely does either include geographical coordinates. Considering that one of the main reasons we record place names in such sources is to map them or further analyze and compare them, this presents a problem.

A Problem

If we look up the names in the global modern place authority sources like GeoNames, Getty Thesaurus of Geographic Names (TGN), DBpedia, or Wikidata (a process commonly called “reconciliation”) we obtain generally poor results. Many historical names are no longer in use, many refer to multiple places, and many potential matches get lost in the shuffle due to varying transliteration schemes, alternate spellings, and OCR transcription errors.
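Part of the spelling problem can be softened with simple normalization and approximate string comparison. The Python sketch below uses only the standard library (unicodedata and difflib); the 0.8 threshold and the sample names are arbitrary choices for illustration, and this is not a description of how WHG or any authority actually matches names. It does nothing for names that are no longer in use or that refer to several places.

```python
import difflib
import unicodedata


def normalize(name):
    """Strip diacritics, case, and surrounding whitespace from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold().strip()


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_matches(name, candidates, threshold=0.8):
    """Return the candidates whose similarity meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


print(normalize("México"))
print(likely_matches("Samarqand", ["Samarkand", "Tashkent"]))
```

A looser threshold recovers more variant spellings but also returns more false candidates for a human to decline.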

A typical scenario we encounter is that a sizable majority of places referenced in a historic text or corpus remain un-located after reconciliation against modern authorities and are therefore not mappable. Granted, mapping is not always the point, but it is often an essential goal. Even if it isn’t, some graphical or otherwise computable representation of the spatio-temporality in a historical source usually is.

We are coming to realize there are some steps we as a community can take to improve this situation: “historical mapathons,” “prioritizing seed datasets,” and “geographical graphs.”

Historical Mapathons

To quote Wikipedia, “a mapathon is a coordinated mapping event.” Until now they have almost always involved adding features to OpenStreetMap in an area for which they are relatively sparse, often in response to a disaster. Mapathons might occur in a single room, where some guidance (and/or pizza) is provided to participants, or “virtually” – where anyone across the globe makes contributions using web-based software like the iD map editor.

A historical mapathon is a coordinated mapping event where the activity is “feature extraction” for one or more historical maps. That is, tracing features as point, line, or area geometries, along with the associated place name and potentially other attributes the cartography offers. This activity has been performed routinely by creators of historical GISes and others, but we’re aware of only one web-based group “crowd-sourced” virtual mapathon – GB1900, the self-described “name transcription” project of University of Portsmouth in 2017-18 (results). Over a period of many months, volunteers transcribed over 2.5 million text strings represented on Ordnance Survey six inch to the mile County Series maps published between 1888 and 1914. The result is a dataset that should prove immensely valuable to historians of Britain in that period. We can assume that if the GB1900 data is indexed by WHG (our intention), future efforts to map texts of the period should be improved greatly. This is one example of the “seed” principle discussed below.

Seed Datasets

Thanks to the Pleiades project, “a community-built gazetteer and graph of ancient places,” researchers wishing to study the geographies of texts of and about the ancient Mediterranean can locate a very high proportion of place references found in their sources. The Pelagios Commons’ Peripleo project has used Pleiades data as a “seed” in a growing index of place attestations. Records of places within the index are continually augmented with further attestations of places contributed by others. Over time the index has been expanding, spatially and temporally. WHG is following on from Peripleo, extending its aims in a few ways and offering unlimited spatial and temporal coverage. Therefore, seed datasets for particular regions and periods are highly desirable for us.

A seed dataset can take a few forms and might come from a few directions: 1) a repository of attestations laboriously curated by a group of scholars over time (e.g. Pleiades); 2) a historical geographic information system (HGIS), also laboriously developed by one or more scholars and derived from cited primary and secondary sources (e.g. China HGIS, HGIS de las Indias); 3) a historical mapathon.

An Example

Two datasets at the top of our long queue of pending contributions are Werner Stangl’s HGIS de las Indias, and “the Alcedo gazetteer” [1]. Each is fairly comprehensive for 18th century Latin America (~15k and ~18k records respectively). Alcedo happens to be one of over 200 sources for HGIS de las Indias, and there is considerable overlap in coverage. The LatAm Gazetteer project, recently initiated under a Pelagios Commons micro-grant, developed digital text versions of Alcedo and its English translation from scanned images, and from those, a dataset of headwords and place types. At that stage, effective reconciliation with modern gazetteers was impossible. For example, Getty TGN has eight distinct Acapulco listings in modern day México. The original entry text reads, “situada en la Costa de la Mar del S,” which could narrow the possibilities considerably, but there is no ready way to send that phrase as actionable context when searching TGN.

LatAm research developer Nidia Hernández was then able to match 60% of the Alcedo entries to an Indias HGIS entry, and because it has containing districts, provinces, and countries (with geometry), we can record those topological relations and ultimately improve results of the reconciliation to Getty TGN. Still, we have thousands of records that are locatable only by painstaking reading of the original entry, which might still place them only in relation to entities that no longer exist or whose names have changed. A mess! And commonplace.
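The benefit of recorded containment can be shown with a toy filter: once a record is known to lie within a particular district or province, same-named candidates elsewhere can be set aside. In the Python sketch below every candidate entry and region name is an invented placeholder, not data from TGN or HGIS de las Indias, and the "within" key is a hypothetical field name.

```python
# Invented candidates that share a name but sit in different regions.
CANDIDATES = [
    {"uri": "http://example.org/authority/10", "name": "Acapulco",
     "within": "Region A"},
    {"uri": "http://example.org/authority/11", "name": "Acapulco",
     "within": "Region B"},
    {"uri": "http://example.org/authority/12", "name": "Acapulco",
     "within": "Region C"},
]


def narrow_by_container(candidates, containers):
    """Keep only candidates located within one of the known containers."""
    wanted = {c.casefold() for c in containers}
    return [c for c in candidates if c["within"].casefold() in wanted]


# Suppose the matched seed record says the place lies in "Region B".
remaining = narrow_by_container(CANDIDATES, ["region b"])
print([c["uri"] for c in remaining])
```

With real data the comparison would more likely be spatial (point within polygon) than by region name, since historical and modern region names often differ.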

What we can take away from this is a) it helps to have a large authoritative “seed” for any given region and period – like HGIS de las Indias in this case; and b) in any case, it would help immeasurably to have data from historical maps of the period in place before working with texts — data developed in historical mapathons, that is. Maps provide approximate geometry and a hierarchy of “within” relationships, making reconciliation to modern gazetteers easier, and a “nice to have” option rather than essential.

Realizing a Historical Mapathon

Extracting (tracing) place data from old maps can be tedious and time consuming, so it’s best if a) highly motivated groups do it; b) it’s limited to a few key maps per group; and c) tools are available to make it as easy as possible. The steps involved are:

  1. Choose a few maps having the desired coverage and a viable license (e.g. using Old Maps Online or the David Rumsey Map Collection).
  2. Decide on an encoding strategy: what to digitize and whether to geo-rectify each map to a modern map. This will vary according to the group’s purposes and the individual maps’ cartography. If the distortion is not too extreme, digitizing features will produce estimated geometries that may be of value.
  3. If indicated, geo-rectify maps using desktop GIS (QGIS, ArcMap) or a web-based tool like MapWarper or Georeferencer (built in to the Rumsey site or standalone). This important step is best done by someone with experience at it or willingness to master it.
  4. Have group members use an online tool to view the map(s) (overlaid on a “real-world map” if geo-rectified) and create point, line, and polygon features according to the encoding strategy of Step 2. Saved data can be downloaded and mapped at any stage, and when complete, uploaded to WHG as a contribution.

Once this work is complete, any subsequent groups attempting to map texts of the region and period in tools like Recogito should see radically better results. Over time this means more seeds like this one, an easier contribution workflow to WHG, and better coverage generally, leading to a true world historical gazetteer resource.

Next steps

The roadblocks to staging a historical mapathon right now are a) the lack of a single tool designed specifically for Step 4*; and b) the lack of a straightforward tutorial for Steps 1-4. The WHG team is committed to working on both of these, and to having them in place by mid-July 2019. We’ll post progress periodically. In the meantime, think of which maps you’d really like to mine in this way, and who you might get to join your mapathon team.

*It should be noted that in some scenarios for non-georeferenced maps, the Recogito tool can be used as is. In that case, all that’s missing is a workflow for converting “within” relationship tags to the Linked Places format used for contributions to WHG. The result will be a graph dataset that could be useful for some purposes.

————-

[1] The 1787 “Diccionario geográfico-histórico de las Indias Occidentales o América” (Alcedo) and its English translation (Thompson, 1812).

Progress Report, Dec 2018

Data pending accessioning to WHGazetteer, Dec. 2018 (partial)

The World-Historical Gazetteer project (WHG) is now just past the halfway mark in our three-year NEH grant period. We are on track to produce what was promised in the proposal, notwithstanding some expansion of scope and unforeseen developments and opportunities. We expect future project milestones roughly as follows:

  • March 2019: Alpha release of functional contributions system and basic API
  • July 2019: Limited public beta for comment: contributions, API, place pages
  • July 2019: Second Advisory Committee meeting
  • September – December 2019: Rolling beta releases and data additions
  • 1 April 2020: Launch
  • April – May 2020: Refinements

There are currently seven (!) activity tracks: 1) data development; 2) outreach to potential data partners; 3) format standards; 4) backend design and development (data stores and contribution pipeline); 5) frontend design and development; 6) API; and 7) ecosystem.

The primary focus right now is the contribution pipeline of #4 and some pilot screens below show progress. We have a considerable volume of data queued up for conversion to Linked Places format then accessioning, and much more is promised.

The functionality we are providing for contributing data has become more ambitious than originally envisioned. We need to accommodate contributions ranging in size from a handful to many thousands, with a minimum of hand-holding by WHG support staff.

Registered users will have a dashboard from which they can manage datasets through the entire workflow: (i) uploading files, (ii) ingesting to our database, (iii) initiating reconciliation against name authorities, (iv) reviewing and validating “hits”, and (v) submitting the resulting enhanced dataset for indexing. Users will also be able to create, download, and share “MyPlaces” collections of records tagged while using the site. That’s a lot!

I will be presenting this poster at the Mainz Linked Pasts IV meeting, 11-14 Dec. The following figures are excerpted from it.

I will also join presenters of the LatAm Gazetteer project funded by a Pelagios Commons Resource Development grant. The LatAm project has provided us two excellent datasets, which together will form within WHGazetteer a “seed” of sorts for Colonial Latin America. Progress working with that data is indicated on the following functional screens in our pilot (pre-alpha) site. Note that little to no attention has been paid to styling at this point.

Upload file to begin dataset creation
List and manage datasets in progress
Review results of reconciliation to Getty TGN, marking correct “hits”

Contributing in Linked Places format

Fig. 1 – HGIS de las Indias spatial footprint

We are pleased to announce version 1 of the Linked Places (LP) format, to be used for contributions to World-Historical Gazetteer (WHG) and to the Pelagios “registry index” that underlies its Recogito annotation tool and Peripleo search interface. We were joined in this effort by Rainer Simon of Pelagios and Graham Klyne and Arno Bosse of Oxford’s Cultures of Knowledge, so a big shout of thanks to all three.

A basic specification of the format has been published to GitHub, along with some relevant files:

The spatial footprint of the HGIS de las Indias contribution appears in Figure 1; more about that data and ongoing efforts to make it the “seed” for a virtual LatAm Gazetteer within WHG are forthcoming soon.

An Interconnection Format, Not “The One Model”

Historical research projects producing gazetteer data have distinctive data models reflecting their source data and project-specific requirements. We are completely agnostic as to contributors’ internal models and formats. The Linked Places format provides a uniform way to build links between different gazetteers.

Temporal Scoping

WHG and Pelagios are aggregating contributions which have great variety in scope and granularity. With LP format, we are striking a balance between accommodating detail and offering simplicity. Two conceptual models underlie the format: one of Place and one of attestations. We say (hopefully not controversially) that places have, over time,

  • one or more names in various languages
  • one or more functional types
  • zero or more known spatial footprints (geometries)
  • one or more relations to other places

Gazetteer data developers gather attestations of these properties, often temporally scoped. That is, names, types, geometries, and relations were/are true for some timespan—whether we have that detail to hand or not. Thing is, some projects do and some don’t, and we must accommodate both cases.

The simple subject, predicate, object form of RDF is not adequate for these kinds of relations and so we have reified them as attestations: NameAttestation, TypeAttestation, Setting, and RelAttestation. Each can be temporally scoped with a “when” element and include citation information, although this is not required. We also permit (and encourage) a global “when” element for the record—effectively the union of temporal attributes for names, types, geometries, and relations.
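The idea of a record-level “when” as the union of attestation timespans can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keys used here (names, types, geometries, relations, and a “when” holding integer start and end years) are hypothetical simplifications, not the element names of the published specification, and the union is approximated as a single envelope from the earliest start to the latest end.

```python
from typing import Optional

# Hypothetical attestation groups; the real LP element names may differ.
ATTESTATION_KEYS = ("names", "types", "geometries", "relations")


def global_when(record: dict) -> Optional[dict]:
    """Return the envelope of all scoped attestations, or None if none are scoped."""
    starts, ends = [], []
    for key in ATTESTATION_KEYS:
        for attestation in record.get(key, []):
            when = attestation.get("when")
            if when is None:  # temporal scoping is optional
                continue
            starts.append(when["start"])
            ends.append(when["end"])
    if not starts:
        return None
    return {"start": min(starts), "end": max(ends)}


# Made-up example record, for illustration only.
record = {
    "title": "Example place",
    "names": [
        {"toponym": "Old Name", "when": {"start": 1550, "end": 1700}},
        {"toponym": "New Name", "when": {"start": 1700, "end": 1820}},
    ],
    "types": [{"label": "settlement"}],  # no "when": not temporally scoped
    "relations": [
        {"relation": "partOf", "target": "Example province",
         "when": {"start": 1600, "end": 1750}},
    ],
}

print(global_when(record))
```

An envelope loses any gaps between timespans; a fuller treatment would keep the list of spans rather than collapsing them.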

Links

We ask that all contributions include, if at all possible, at least one closeMatch or exactMatch relation to a published place record having a de-referenceable URI. These links are “coin of the realm” so to speak. If an incoming record shares such a link with an existing record in our registry index, we add it to the existing record. If it shares no such link with any existing records and doesn’t otherwise match name, type, and geometry, a new registry index record will be created using it as a seed.

Three other kinds of LinkAttestations to existing web resources are recognized: subjectOf, primaryTopicOf, and seeAlso.
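The link-based accessioning rule described above might look roughly like the following. This is a minimal sketch, not the WHG or Pelagios implementation: the record layout (a “links” list of dicts with “type” and “identifier”), the in-memory index, and the example.org URIs are all assumptions made for illustration, and the fallback comparison of name, type, and geometry is deliberately left out.

```python
MATCH_TYPES = {"closeMatch", "exactMatch"}


def match_uris(record: dict) -> set:
    """Collect the URIs of closeMatch/exactMatch links; other link types are ignored."""
    return {
        link["identifier"]
        for link in record.get("links", [])
        if link["type"] in MATCH_TYPES
    }


def accession(incoming: dict, index: list) -> str:
    """Add the record to an index entry sharing a match URI, else seed a new entry."""
    uris = match_uris(incoming)
    for entry in index:
        if uris & entry["uris"]:
            entry["records"].append(incoming)
            entry["uris"] |= uris
            return "merged"
    # No shared link. A real pipeline would also compare name, type,
    # and geometry before seeding a new entry; that step is omitted here.
    index.append({"uris": set(uris), "records": [incoming]})
    return "created"


index = []
first = {"title": "A", "links": [
    {"type": "closeMatch", "identifier": "http://example.org/place/1"}]}
second = {"title": "B", "links": [
    {"type": "exactMatch", "identifier": "http://example.org/place/1"}]}

print(accession(first, index))   # seeds a new index entry
print(accession(second, index))  # shares a match URI with the first record
```

Note that a seeAlso link to the same URI would not trigger a merge in this sketch, since only the two match relations are treated as identity claims.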

Record-Level Properties

Several “record-level” properties round out this format, as detailed in the GitHub specification: title, ccode (modern country), description(s), and depiction(s).

Moving Forward

Please get in touch if you are interested in contributing to WHG. The Linked Places format is not set in stone; this v1 is subject to revision based on our experiences ingesting contributions. The ingest of HGIS de las Indias data as a gazetteer offered to users of Recogito has succeeded, following significant manual effort to make the transformation from that project’s complex model to LP format. Temporally scoped namings and parthood relations in that dataset made it over, but are not reflected in Recogito. Simpler example datasets will be added soon. The WHG web interface and API are being designed to expose all contributed place attributes, but they won’t be available for several months.

Linked Places Annotations

Next up in our format-creation effort is an update to the standard format for contributions of annotations. Just as Peripleo displays coins and inscriptions associated with places in its registry index, the WHG graphical search will display such annotation records, but in our case for historical routes, datasets, and bibliographic records. The format previously used by Pelagios/Peripleo will be updated soon; details to follow.

Contributing to World-Historical Gazetteer: a Preview

The World-Historical Gazetteer project (WHG) will soon begin aggregating and indexing historical gazetteer datasets, and exposing them as Linked Open Data via graphical and programmatic web interfaces, just as Pelagios Commons’ Peripleo project has done for a few years. And like Peripleo, WHG will also index contributions of annotation records that associate historical “items” with place identifiers. Typical items for Peripleo have included coins, coin hoards, and inscriptions of the Classical Mediterranean. Item records WHG will focus on include journey events, regions, and datasets. In fact, annotated items could be anything for which location is relevant, e.g. people and various types of events.

We are almost ready to begin accepting contributions; this post previews the pipeline and formats involved.

Contributions to WHG can include, in some combination: 1) gazetteer data, i.e. place records drawn from historical sources; 2) annotation records that associate a published record about an item with a place identifier; 3) collections of item metadata records referenced in annotations; and 4) a file describing the contributed dataset(s) in “Vocabulary of Interlinked Datasets” (VoID) format.

Over the past several weeks we have collaboratively developed a new Linked Places format (LPF here for short) with Rainer Simon of Pelagios, to be used for contributions of historical place data to both WHG and Pelagios’ Peripleo. The Linked Places format is designed around the JSON-LD syntax of RDF (it is also valid GeoJSON, with temporal extensions, as explained in the GitHub README). The new format makes use of several existing vocabularies and also introduces some terms specific to our shared purposes.

Several expert colleagues contributed valuable input, including Graham Klyne, Richard Light, Lex Berman, Arno Bosse, and Rob Sanderson [1]. We are in the process of updating the template Peripleo has used for annotation contributions (formerly Open Annotation in RDF Turtle, now its next-generation W3C Web Annotation in JSON-LD). Both are discussed in a little more detail below.

Contributing historical place records

There will be two separate workflows for contributions: from larger projects and from smaller ones. The distinction is whether a project has the capability and resources to meet two criteria which are accepted norms for publishing Linked Data: 1) publishing data in some syntax of RDF (in our case the new LPF); and 2) providing a unique URI and associated “landing page” for each resource described.

Case 1: Larger projects

If your project has (or will have) a web presence that provides public pages describing your individual places and/or “items” (routes, regions, etc.), then we ask that you perform a transformation and export of your data in the standard formats mentioned above – Linked Places, a future annotation format (see Contributing Annotations below), and VoID. Upon validation, we will ingest those records, link them with those already in the system, and expose them in a nice GUI and API. Details of WHG interfaces are forthcoming soon.

Case 2: Smaller projects

If your project does not entail creating a web site providing per-record landing pages, then we can accept your data contribution as CSV, mint unique URIs, and provide very basic landing pages for places and other items. The records will also be made available as JSON-LD (bona fide RDF) via our API. We will provide a Python program for converting CSV to LPF, but note that the CSV will have to conform to a template that aligns with LPF (available soon). Conversion from your native format to our CSV template will probably be more manageable than to LPF. In other words, once you submit CSV data we can parse, a semi-automated conversion and ingest procedure will result in its publication as Linked Open Data.
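To give a feel for what such a CSV conversion involves, here is a rough Python sketch. It is not the program mentioned above, and the CSV template had not been published at the time of writing, so every column name here (id, title, variants, lon, lat) is a hypothetical stand-in. The output is a plain GeoJSON FeatureCollection with an added “names” list, a simplified approximation of an LPF record rather than a valid one.

```python
import csv
import io
import json


def csv_to_features(csv_text: str) -> dict:
    """Convert rows of a (hypothetical) CSV template into GeoJSON-style features."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The title comes first, followed by any semicolon-separated variants.
        names = [{"toponym": row["title"]}]
        names += [
            {"toponym": v.strip()}
            for v in row["variants"].split(";")
            if v.strip()
        ]
        geometry = None  # GeoJSON permits a null geometry for unlocated places
        if row["lon"] and row["lat"]:
            geometry = {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            }
        features.append({
            "type": "Feature",
            "properties": {"id": row["id"], "title": row["title"]},
            "names": names,
            "geometry": geometry,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data, for illustration only.
SAMPLE = """id,title,variants,lon,lat
1,Example Town,Exampleville;Ejemplo,-70.5,-33.4
2,Unlocated Place,,,
"""

print(json.dumps(csv_to_features(SAMPLE), indent=2))
```

Rows without coordinates are kept rather than dropped, since many historical places can be named and typed long before they can be located.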

Contributing annotations

WHG will index metadata describing historical “items” annotated with gazetteer record identifiers. These annotation records assert, in effect: “this item is/was associated with this place, in this way;” and optionally, “at this time.”

The result of such annotations can be seen in the current Peripleo interface, where upon navigating to a given place, you can view metadata (including images) for coins and inscriptions associated with it in e.g. a foundAt or hasLocation relation. Annotations exposed in the WHG web interface will include historical journeys for which the given place was a waypoint, and regions, works, and datasets including or referring to the place.

Annotation contributions will comprise two sets of data: 1) collections of brief Item metadata records; and 2) collections of annotation records in W3C Web Annotation format. The contribution template now in use by Pelagios’ Peripleo is being updated to better account for typing of items and relations. Details of that new Linked Places annotation format (LPAF?) will be published soon. Collaborators in that modeling effort are most welcome!
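Since the new annotation format is not yet defined, the following Python sketch only suggests the general shape an annotation record could take. The “@context” URL, “type”, “motivation”, “body”, and “target” are standard W3C Web Annotation terms; the “relation” and “when” keys are hypothetical placeholders for the typing of relations and optional timespan described above, and the example.org URIs are invented.

```python
import json
from typing import Optional


def make_annotation(item_uri: str, place_uri: str, relation: str,
                    when: Optional[dict] = None) -> dict:
    """Assert: this item is/was associated with this place, in this way (at this time)."""
    body = {
        "id": place_uri,       # the gazetteer record being referenced
        "relation": relation,  # hypothetical key, e.g. "foundAt" or "hasLocation"
    }
    if when is not None:
        body["when"] = when    # hypothetical key for the optional timespan
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "body": body,
        "target": item_uri,    # the item being annotated
    }


annotation = make_annotation(
    "http://example.org/items/coin-1",
    "http://example.org/places/42",
    "foundAt",
    when={"start": 1550, "end": 1600},
)
print(json.dumps(annotation, indent=2))
```

The brief Item metadata records would live in a separate collection and be referenced here only by URI, which keeps the annotation itself small.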

[1] Twitter handles, in order: @gklyne, @RichardOfSussex, @mlex, @kintopp, and @azaroth42

Progress and Next Steps, Jan. 2018

The World-Historical Gazetteer project (WHG) has been under way for six months, and we’d like to let people know what progress we have made and our immediate next steps.

Progress

  • We digitized the index of “Atlas of World History,” a 1999 Dorling Kindersley volume edited by historian Jeremy Black. This has given us approximately 10,000 places to seed our “spine” gazetteer, including cultural places like settlements, states, regions, peoples, and archaeological sites, and natural features like rivers and mountain ranges. Each entry is associated with one or more of the atlas’s 450 maps, each of which has a temporal coverage.
Georeferenced places from Black atlas index (red) and societies from D-Place (green)
  • We have aligned approximately 70% of places in the atlas index with records in GeoNames and/or DBpedia so far, giving us geometry, additional name variants, and Wikipedia abstract text. In the near future we will also align records with the Getty Thesaurus of Geographic Names. A significant proportion of unmatched entries simply aren’t in existing gazetteers — but will be in ours!
  • We have gathered several data sets to augment the spine with additional cultural and natural features that will allow us to contextualize place records in our interfaces in some novel ways:

Societies (peoples) and related language regions (D-Place)
Rivers and lakes (Natural Earth)
Watersheds (World Resources Institute)
Mountain ranges (Natural Earth)
Terrestrial ecoregions (biomes; World Wildlife Fund)
Major ocean currents (NOAA)

  • In September, we held a kickoff meeting in Pittsburgh. Participants included members of our four data partner teams, several experts from the historical Linked Open Data “ecosystem,” and members of the Pitt and Carnegie Mellon digital humanities and library communities. We received a lot of valuable input, much of which we have compiled into a set of 70 “user stories.” These will inform design of our data models, graphical interfaces and API.
  • Our initial mapping of cultural features indicated that, as expected, some regions are somewhat under-represented. We are identifying a few significant historical print gazetteers and maps, digitization of which can help rectify the problem, and facilitating the work to extract and publish that data. The first such project will help fill a gap for 17-18c Latin America – the 1786 “Diccionario geográfico-histórico de las Indias Occidentales ó América,” and its 1812 English translation.
  • We have also worked to position our project in the ecosystem referred to earlier. Over the past year, Technical Director Karl Grossner served as co-coordinator of the Linked Pasts Working Group of Pelagios Commons, which recently published a white paper – now open for comment as a shared document – “From Linking Places to a Linked Pasts Network.” We view WHG as joining Pelagios’ Peripleo system as a place-centered Linked Pasts Network Hub; more about that framing effort in due course.
  • We have begun experimental integration of the Getty Thesaurus of Geographic Names (TGN) into the WHG system. This extraordinary resource (~2.5 million geo-referenced places; ~4 million place names) has, like most existing gazetteers, limited temporal information. We’d like to facilitate temporal annotations to all indexed place records, as we move towards a truly historical gazetteer system.

Next steps

Following a mid-January 2018 system design charrette, database and software development begins in earnest. Early design plans will be published for comment.

Data development will continue throughout the duration of the project, and discussions with several prospective data partners are ongoing. Our goal is supporting communities of researchers specializing in particular regions and periods, and guiding them to publishing place data we can incorporate in our “union index.” The initial focus is on Colonial Era Latin America and West Africa, Maritime Southeast Asia, and Early Modern Europe, but we welcome inquiries concerning any region/period combination.