Harmonising the Heterogeneous at the Cultures of Knowledge seminar series with Eero Hyvönen.
Notes are unedited.
Two forms of the Web: the WWW for humans, the GGG (Giant Global Graph) for data.
The core linked data cloud: 1,048 data sets and 59 billion triples.
Google’s Knowledge Graph and Microsoft’s Satori – graph engines in the search giants.
Why semantic web?
Intelligent search – meaning and specificity can be added to queries, and abstraction can be layered on top.
Multilingual search – using a URI as the identifier, linguistic variations can be indexed and stored against it.
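The URI-as-identifier idea can be sketched as follows. This is a minimal illustration, not CultureSampo's actual code; the URI and labels are made up.

```python
# One URI identifies a concept; language-tagged labels are indexed
# against it, so a search in any language resolves to the same URI.
LABELS = {
    "http://example.org/concept/sauna": {   # hypothetical URI
        "en": ["sauna", "steam bath"],
        "fi": ["sauna"],
        "sv": ["bastu"],
    },
}

def find_uris(term: str) -> set[str]:
    """Return every URI that carries `term` as a label in any language."""
    term = term.lower()
    return {
        uri
        for uri, by_lang in LABELS.items()
        for labels in by_lang.values()
        if term in (label.lower() for label in labels)
    }
```

A Swedish query like `find_uris("bastu")` and a Finnish query like `find_uris("sauna")` both resolve to the same identifier, which is what makes cross-lingual indexing work.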
Cultural Content Heterogeneity – items can be linked across collections.
Linking helps with the production of cultural content but also exposes the lack of standards: items that are not linked cannot be cross-referenced.
Semantic Web 2.0?
Enriching data through sharing – equality in sharing.
CultureSampo as a pilot example.
In part, relational DBs can be mapped to RDF (if they were created correctly). The team worked out the model, then put the ontology on top; the result appears to be a giant union of tables.
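The relational-to-RDF mapping can be sketched like this: each row becomes a subject, and each cell becomes a (subject, predicate, object) statement. The table name, column names, and URI scheme here are assumptions for illustration, not the project's actual mapping.

```python
# Map relational rows to RDF-style triples. Each row yields one subject
# URI; every non-key column becomes a predicate with the cell as object.
def rows_to_triples(table: str, rows: list[dict]) -> list[tuple]:
    triples = []
    for row in rows:
        subject = f"http://example.org/{table}/{row['id']}"
        for column, value in row.items():
            if column != "id":
                predicate = f"http://example.org/prop/{column}"
                triples.append((subject, predicate, value))
    return triples

# Hypothetical source table.
paintings = [{"id": 1, "title": "Kullervo", "year": 1899}]
triples = rows_to_triples("painting", paintings)
```

Standards such as W3C R2RML define this kind of mapping properly; the sketch only shows why a correctly modelled database translates so directly.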
Once the data is correct, it can be re-used: multilingual search works through the different ontologies, and the collections can be placed on a map.
Re-use nodes between data sets to make the union successful.
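Why node re-use makes the union work can be shown in a few lines. A sketch under assumed data: two collections describe the same actor with the same URI, so a plain union of their triples already joins them.

```python
# Hypothetical shared URI, re-used by both data sets.
AKSELI = "http://example.org/actor/gallen-kallela"

# Two independent collections, each a set of (subject, predicate, object).
museum_a = {(AKSELI, "created", "painting/123")}
archive_b = {(AKSELI, "born_in", "Pori")}

# Because the node is shared, the union cross-references automatically.
union = museum_a | archive_b
facts_about_akseli = {(p, o) for s, p, o in union if s == AKSELI}
```

If each collection had minted its own identifier for the actor, the union would contain both triples but no query could connect them.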
Align the metadata models between the schemes, using an event-based model as the foundation.
What kind of infrastructure?
Sharing ontologies: general concept, actor, place, time, event and domain specific. Collected ontologies into a cloud to share them with KOKO (https://onki.fi/en/browser/overview/koko).
Actor names change through linguistic variation, marriage, nicknames and so on. The team created a semantic DNB (dictionary of national biography) from a Finnish data set and linked it to other data sets to enrich it; a life story can be enriched with events. Even taxonomies change over time.
Shared metadata schemas.
Sharing linked open data: publish the data as services (http://onki.fi).
Use or create simple tools for users to get the data from the service. Used the five-star open data model, but added a schema and provide validation against it to build trust.
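The schema-plus-validation idea can be sketched as follows; the schema and record shapes are assumptions for illustration (in RDF practice this role is played by shape languages such as SHACL).

```python
# Hypothetical schema: required fields and their expected types.
SCHEMA = {"title": str, "year": int}

def validate(record: dict) -> list[str]:
    """Check a record against the schema; an empty list means valid."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

Publishing the schema alongside the data lets consumers run the same checks, which is where the trust comes from.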
Should be able to build sites on top of the services (http://www.ldf.fi).
Try to provide automatic documentation.
Open your data so that it links with others’ data: reduces redundancy, data is enriched, work can be shared and data can be re-used.
Used shared infrastructure to prevent data linking problems.
Validation is used to prove that the data, or the enriched data, is correct.
Users should be able to download from a native RDF repository.
Schema validation – a tool looks at the schemas and tries to resolve them. For an ordinary element, validate that its value comes from the controlled vocabulary. Validation is both semantic and vocabulary-based.
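The vocabulary side of that check can be sketched like this. The element name and vocabulary entries are hypothetical.

```python
# Controlled vocabulary per element: listed elements only accept
# values drawn from their vocabulary; unlisted elements are free text.
VOCAB = {"material": {"oil on canvas", "bronze", "wood"}}

def check_value(element: str, value: str) -> bool:
    """True if the value is allowed for this element."""
    allowed = VOCAB.get(element)
    return allowed is None or value in allowed
```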
Spatio-temporal queries – events do not happen at a single moment, so queries must handle periods of time.
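One primitive behind such queries can be sketched as interval overlap: because events span periods rather than instants, matching two events means testing whether their intervals intersect. The years used are arbitrary.

```python
# Two closed intervals [a_start, a_end] and [b_start, b_end] overlap
# exactly when each one starts before the other ends.
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end
```

For example, an artist active 1850-1900 overlaps an exhibition running 1899-1920, but not one running 1910-1920.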