2010-04-15 - FAO/MIMOS AGROVOC Workshop, Thursday, meeting notes

On Skype channel: Gerard Sylvester, Ahsan Morshed, Gudrun Johannsen, Sachit Rajbhandari, Armando Stellato, Prashanta Shrestha, Tom Baker, Johannes Keizer

Dickson: MIMOS STP Presentation. Applications built on top of the STP Multimodel Semantic Browser. Visualize and navigate large knowledge bases. Queries are built visually by dragging concepts and relations. The query graph is converted automatically into SPARQL. Testbed is 1.3 billion triples. Gather info from blogs: where are topics talked about? Allegro Knowledge Base server is indexed.
Gerard: All queries are in parallel -- intuitive pre-fetch.

SOA - Architecture Presentation. Application <-> KBI <-> KBase Server (also creates index).
Gerard: Lookup service in SOA acts as UDDI for the services.
Armando: When out on the Web, the delegator takes care of getting reg/req from distributed apps. They wrote their own load-balancing components, which measure the hardware load directly. There's one Data Index (a file), then each KBS has its own local index with its indexed data. Each KBS can have different data, but they're federated in some way.
Dickson: AllegroGraph with free-text indexing is the fastest. NASA uses AllegroGraph. Up to 15 million triples, AllegroGraph is free.
Dickson: There should be two installations. One for the public, one for knowledge managers - not in production mode. There should be a process in place to migrate to the production site.
Armando: Free-text indexing is embedded inside AllegroGraph; they provide an extension for SPARQL which brings free-text search inside SPARQL as a dedicated operator (filter).
Gerard: Migration policy to be put in place.
Gerard: Modelling - Knowledge Base - Architecture?
Johannes: Modelling and KBase - Technology - User Interaction? Apps need to talk to KBI. Can be public domain.
Johannes: Need three groups:
1) Modeling group - need final agreement on future modeling of AGROVOC Concept Scheme - what is the right approach?
I think we are far along on this and can agree. Feel more confident we can agree in this workshop.
2) User interface - user requirements - work has gone into the workbench - pretty good shape. This group: what are the things that have to be adjusted until we get it working? What really has to be done? Another way to think about user requirements - what would we like to do in a year - life-changing in handling this? User requirements that give a completely new way of modeling?
3) Technology group: first, a very crude discussion of the technology as is - how to get it to work? Then: transfer to something that works better. Armando outlined this yesterday. Then: integration into AllegroGraph. Should we try to do it now? Intrigued by benchmarking results! Users hate to wait. Then: how to manage mirroring of development and production servers. Need to see open-source discussion - not ideological here - how to become sustainable? Sustainability may be achievable in different ways besides open source. Fantasy: AGROVOC hosted at MIMOS, done in collaboration with FAO for the agri community - stable solution - maybe decide not to embark on a venture for an open-source SKOS editor - getting open-source projects - most projects are dead - what is the balance?

Johannes:
(a) Modelling Group - AGROVOC Concept Scheme - what is the right approach to it?
(b) User Interface/Requirements Group - working in two directions - what are the things that need to be adjusted - what would we like to do in a year? Future?
(c) Technology - what has to be done? Transfer to something that works better (based on Armando's presentation; AllegroGraph).
Johannes: SKOS editor as open source - useful - but not necessarily for us to do. AgriDrupal - we are collaborating - tens of thousands - an open-source community exists. Instead of embarking on open source - keep an open mind for now. Where could this evolve if we are on the MIMOS platform?
----------------------------------------------------------------------
User Interface and Structural groups merged into one group, aka the Modelling and User Interaction group.

Tom: Check if there are really 43K synonym rels in the original DB.
ACTION: Synonyms should be checked in the AGROVOC database.
Gerard: 8,251 hasSynonyms tuples in the current AGROVOC MySQL DB.
Gudrun: Yes, it's possible.
ACTION: Merge classes and related instances.
Gerard: hasMainLabel to determine if it is Pref label or Alt label?
ACTION: Come up with an algorithm that uses the isMainLabel triple to determine whether a label is a preferredLabel or an altLabel.
Gerard: Then aos:isMainLabel can be dropped after determining this.
ACTION: Remove extra c_ and i_ after merging class/instance.
ACTION: Examine the datatype declaration for hasTermType, hasStatus, takenFromSource, isPartOfSubvocabulary.
-- hasTermType needs to be reviewed - is it modelled as a data type? Maybe SKOS term type can be related to hasTermType with a range and identified with a URI instead of a string?
-- isPartOfSubVocabulary: SKOS collection?
-- FOAF status? FOAF uses some term_status property. hasStatus: can any other standard property be used?
ACTION: Flag handling of term types in general as an issue; the model has to be changed.
Gerard: We have 15 scopeIDs in English.
ACTION: hasEditorialNote: is it significantly different from scopeNote? We can use SKOS native properties for this.
Gerard: 53,000 CONCEPTS and 500,000 TERMS (approx).
ACTION: The data is only 80% complete, but it just crashes at a certain point, so there may be holes in it and the data may be even less complete than it seems. We just have to generate a complete snapshot.
ACTION: The nature of hasDefinition (first-class citizen?) should be clarified.
Gerard: Lexicalization (+ noun) = SKOS Label?
Tom: No functional requirements, does not add any additional info, for noun... can it be dropped?
ACTION: Check "noun" for possible deletion. Replace lexicalization+noun with SKOS-XL constructs.
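The isMainLabel action item above could be approached roughly as follows. This is only a sketch under stated assumptions: the flat record layout, the sample concept ID, and the function name `assign_skos_labels` are hypothetical and not taken from the AGROVOC database. It follows the SKOS rule that a concept has at most one preferred label per language, so any extra label flagged as main in the same language is demoted to altLabel.

```python
from collections import defaultdict

# Hypothetical flat records: (concept_id, language, label_text, is_main_label).
# The ID and labels are made up for illustration only.
terms = [
    ("c_0001", "en", "Rice", True),
    ("c_0001", "en", "Paddy", False),
    ("c_0001", "fr", "Riz", True),
    ("c_0001", "fr", "Riz paddy", True),  # second "main" label: demoted below
]

def assign_skos_labels(records):
    """Group labels per (concept, language) into one prefLabel plus altLabels."""
    grouped = defaultdict(lambda: {"prefLabel": None, "altLabel": []})
    for concept, lang, text, is_main in records:
        slot = grouped[(concept, lang)]
        if is_main and slot["prefLabel"] is None:
            # First label flagged as main becomes the preferred label.
            slot["prefLabel"] = text
        else:
            # Non-main labels, and any further "main" labels, become altLabels.
            slot["altLabel"].append(text)
    return dict(grouped)

labels = assign_skos_labels(terms)
```

Once every label has been assigned this way, the isMainLabel flag carries no further information, which matches Gerard's remark that it can then be dropped.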
ACTION: Discuss possible requirement of a meta-vocabulary for linking external resources (to provide homogeneous semantics for them).
JKeizer: Alignment with MeSH, WordNet, EuroWordNet, and other vocabularies.
ACTION: Create a separate concept scheme for the AGRIS/CARIS Classification. The intention was to integrate the AGRIS/CARIS Classification directly into the thesaurus, while maintaining them as different facets (perspectives, having a separate classification scheme). Automatic linking to AGROVOC. The AGROVOC CS facilitates extraction of any classification scheme as needed. Need to flag AGROVOC concepts used in various classification schemes.
Tom: In SKOS, a concept can belong to many concept schemes - the same effect can be achieved by tagging a concept as belonging to (skos:inScheme) the AGRIS/CARIS classification or to (skos:inScheme) AGROVOC.
ISSUE FLAGGED: A number of properties are declared to have a range of xsd:string - is this the correct usage? In Dublin Core, rdfs:Literal is used.
ACTION: Other declared vocabularies should be made available separately as linked data.
ACTION: WBDatatypeProperty, conceptEditorialDatatypeProperty... better not to put these on linkedData (they are just used to drive the application through the model). Armando's proposal: they could be used inside dedicated NamedGraphs (which would not be output to the linkedData export).
----------------------------------------------------------------------
15:30 - Johannes
For the breakout group on interface:
1) What short-term changes are needed to the Workbench interface?
2) What kind of interface do we want for the future?
3) How could the transformation of AGROVOC from the current model to SKOS take place? Are there content problems related to it?
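Tom's point about skos:inScheme can be illustrated with a minimal sketch. The scheme and concept URIs below are placeholders (example.org), not real AGROVOC or AGRIS/CARIS identifiers, and the triples are held in a plain Python set rather than a triple store; only the skos:inScheme property URI is the real one.

```python
SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"

# Placeholder URIs, assumed for illustration only.
AGROVOC_SCHEME = "http://example.org/scheme/agrovoc"
AGRIS_CARIS_SCHEME = "http://example.org/scheme/agris-caris"

# One concept is tagged as belonging to both schemes, the other to AGROVOC only.
triples = {
    ("http://example.org/concept/1", SKOS_IN_SCHEME, AGROVOC_SCHEME),
    ("http://example.org/concept/1", SKOS_IN_SCHEME, AGRIS_CARIS_SCHEME),
    ("http://example.org/concept/2", SKOS_IN_SCHEME, AGROVOC_SCHEME),
}

def concepts_in_scheme(graph, scheme):
    """Extract the concepts tagged as members of one concept scheme."""
    return sorted(s for s, p, o in graph if p == SKOS_IN_SCHEME and o == scheme)

def schemes_of(graph, concept):
    """List every concept scheme a given concept is tagged with."""
    return sorted(o for s, p, o in graph if s == concept and p == SKOS_IN_SCHEME)

agris_caris_members = concepts_in_scheme(triples, AGRIS_CARIS_SCHEME)
```

Extracting a classification scheme is then just a filter on the skos:inScheme tag, with no need to duplicate the concept itself.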
Technical breakout group...:
-- What are the tech problems that must be resolved for Open Archive by May - with a working interface - taking into consideration that many triples may be eliminated? If we cannot change the structure by the end of May, we can do some interventions - renouncing some triples to improve performance?
-- Put everything into the MIMOS system?

URIs:
-- agrovoc should *really* be in the URI.
Johannes: We were heavily criticized by the NeOn evaluators for having non-human-understandable IDs. In Jo's opinion, no language is globally human-understandable, and he prefers to keep the original AGROVOC IDs.
Tom: http://id.loc.gov/authorities offers an example of a template for writing understandable URIs while maintaining codes: ...initialpartofNS/code#understandablename
http://id.loc.gov/authorities/sh85097196#concept -- for the concept, resolves to the page http://id.loc.gov/authorities/sh85097196
Armando: For the hashVSslash issue (pro and contra), some history: http://esw.w3.org/HashVsSlash
Tom: Regarding the earlier discussion of URIs, see http://groups.google.com/group/pedantic-web/msg/fc698ddd51115d60, where Richard Cyganiak writes:

In my experience, getting content negotiation right is difficult, and it's a persistent source of interoperability problems. At the same time, I perceive the 303 approach as more complex than the hash approach. With hindsight, I think it's a bit unfortunate that "303 + conneg to HTML and RDF" has become the "canonical" way of deploying RDF on the Web, because it's complex and error-prone, and there are simpler methods. The focus on "303 + conneg" has created a perception that deploying RDF is harder than it actually is. With RDFa, we're in the happy situation that there is a selection of technologies that make it possible to do away with all the complex server configuration around content negotiation and 303 redirects. So RDFa plus hash URIs hits a sweet spot in many situations.
> I'm particularly interested in emerging best practice for publishing large, continually growing vocabularies.

I think that the question of "hash vs. slash" and "RDFa vs. HTML+RDF/XML" is largely orthogonal to the question of the size or change rate of the dataset. You could use which 303s to a describing document specific to that resource; or you could use without 303s and conneg. I would use the latter if I wanted to keep my server setup simple and if I can live with the slightly uglier URIs.
----------------------------------------------------------------------
Workbench performance

Gudrun: Increasing the speed of the search functionality - prime importance.
Gerard: Some features of Advanced Search, like exact match, starting with, etc., to be incorporated into the search.
Johannes: Comparing the search string with all labels...?
Rowena: Clicked on the selected concept, but it doesn't go to the exact one.
Gerard: Selecting concepts sometimes gives different (wrong) results.
Gudrun: It very often happens that you have to click three or four times - reload - takes a long time. Or you get a list, or often get "no results found". Wrong results after a search query - it consistently takes a lot of time to display search results. Sometimes have to wait 3-5 minutes for a result. SEARCH should be the highest priority! From my POV the highest priority. Takes ages until you can find the concept you want to relate to the concept. System seems not to remember the last action. System should know how to go back to where you started. Values are not passed from one page to another through actions. The system doesn't remember what the search was after you choose to click on "show URI" or "non preferred term". Need memory of where you came from. Example: search for Rice, then select non-preferred name -> search needs to be executed again.
Lavagna: Even while adding new relationships - finding the concept to add.
Gerard: Instead of searching, can we type the concept directly?
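The search modes mentioned above (exact match, starting with, and comparing the search string against all labels) might look like the following sketch. It is not the Workbench implementation: the function name `search_labels`, the mode names, and the sample labels are assumptions, and a real system would query an index rather than scan a list.

```python
def search_labels(labels, query, mode="contains"):
    """Return the labels matching `query` under one of three match modes."""
    q = query.casefold()
    matchers = {
        "exact": lambda s: s == q,
        "starts_with": lambda s: s.startswith(q),
        "contains": lambda s: q in s,
    }
    if mode not in matchers:
        raise ValueError(f"unknown search mode: {mode!r}")
    match = matchers[mode]
    # Case-insensitive comparison; original spelling is preserved in the result.
    return [label for label in labels if match(label.casefold())]

# Made-up sample labels for illustration.
sample_labels = ["Rice", "Rice flour", "Wild rice", "Forest"]
starts_with_rice = search_labels(sample_labels, "rice", mode="starts_with")
```

A "starts with" match on a sorted label index can be served without loading the full list, which is the behaviour the participants ask for when typing a concept name directly.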
Gudrun: As an alternative to browse, it should be possible to type "Rice" or "Forest" and have it show me terms starting with "Rice" instead of waiting for everything to load. Suggest a search with content to browse instead of waiting for the entire list to load.
Gerard: In creating a new relationship to another concept for Related Concept, can we redefine the BROWSE to do a SEARCH with contains [text] and then select the relevant concept? For move concept there is no validation.
Lavagna: Validation process - when moving a concept, validation needs to be done afterwards - but there is no validation process, and you don't see it in the list either - very dangerous - can damage the system by moving to the wrong place and not knowing.
Gerard: [Home] to be rechristened as [Recent Changes]. If a term exists, then a warning should be given while adding new terms: "exists in this context, do you want to create it in a different context?" It is the validator's role to make sure that there is no duplicate term; in any case there should be a warning, otherwise there is too much load on the validator. A validator to ensure terms are not duplicated. Should there be a prompt or sign to highlight terms that have been duplicated?
Gerard: Rights to be granted ONLY to people we know. AGROVOC could be spammed. "No term available in selected language" - UI to be 'beautified' to hide this message?
Rowena: Concepts with "No term available in selected language" are shown at the top of the list whenever the system starts up.
Gerard: Add if(isNull(termspell)) then don't display on the WB on SEARCH - based on language selected.
Rowena: Frustrating when a user gets such a long list of output but is not able to view the parent concept.
Gerard: Parent concept hovers above the child concept in the display?? Screen real estate to be looked into.
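Two of the suggestions above, hiding concepts with no term in the selected language and warning before a duplicate term is added, can be sketched as follows. The record layout and the function names `visible_concepts` and `find_existing_contexts` are hypothetical; Gerard's `isNull(termspell)` check is approximated here by treating a missing or empty label as null.

```python
# Hypothetical concept records; IDs and labels are made up for illustration.
concepts = [
    {"id": "c_0001", "labels": {"en": "Rice", "fr": "Riz"}},
    {"id": "c_0002", "labels": {"fr": "Forêt"}},   # no English term
    {"id": "c_0003", "labels": {"en": None}},      # null English term
]

def visible_concepts(records, lang):
    """Keep only concepts that have a non-empty term in the selected language."""
    return [c for c in records if c["labels"].get(lang)]

def find_existing_contexts(records, lang, new_term):
    """Return IDs of concepts that already carry `new_term` in `lang`.

    A non-empty result means the UI should warn before the term is created.
    """
    wanted = new_term.strip().casefold()
    return [
        c["id"]
        for c in records
        if (c["labels"].get(lang) or "").strip().casefold() == wanted
    ]

shown_in_english = [c["id"] for c in visible_concepts(concepts, "en")]
```

Raising the warning at entry time keeps most duplicates from ever reaching the validator, which is the load concern raised in the discussion.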
ACTION: In order to use terminology consistently, decide between:
-- Label (pro: used in SKOS, con: some people expect "Term")
-- Lexicalization (pro: used in legacy AGROVOC model, con: AGROVOC-specific)
-- Term (pro: some people expect this, con: different from SKOS terminology, which could lead to confusion in capacity building)
Gerard: Image links should be checked periodically.
Rowena: Have to copy the image twice, at least from Internet Explorer.
Gerard: What happens if we find that an image has been moved/deleted? Image template? Option to add other resources also. For the next version, option for mapping.
Gerard: Geographical location to be tagged with the concept? (like spatiallyIncludedIn for countries)