2010-04-14 - FAO/MIMOS AGROVOC Workshop, Wednesday, meeting notes

On Skype channel:
Gerard Sylvester
Sachit Rajbhandari
Gudrun Johannsen
Armando Stellato
Prashanta Shrestha
Tom Baker
Ahsan Morshed

Armando presenting:
-- Purpose of "domain_concept" is simply to "cluster" domain concepts.
-- A sub-class may have multiple parents (multiple inheritance), but this is not common.
-- Instead of a hierarchy of classes, it would be more natural to model concepts as instances of the class Domain Concept (ag:c_domain_concept).
-- What are the semantics of hasTranslation - only lexicalization?
-- Is there any relation like hasTranslation in the AGROVOC thesaurus?

Tom says: I have checked the triples - there are no concepts that are subClassOf more than one class (except for those that are _explicitly_ declared to be subClassOf owl:Thing, which is unnecessary)...
Currently the hasTranslation relation is created (automatically) between all the terms in different languages. If there is no specification that hasTranslation occurs only between some terms in different languages, we can infer that hasTranslation occurs between all the terms in different languages and reduce the store by 400k triples.
Tom says: There are 1,167,435 triples with "hasTranslation" in AGROVOC...! Armando was looking at an older version when he counted 400k - every time a translation is added, it explodes further...
Armando: It is more precise to say we want to give up the class-instance paradigm. The main limitation of OWL-DL is that it did not allow metaclassing.

----------------------------------------------------------------------
Tom Baker - AGROVOC in SKOS
-- Proposing that AGROVOC move to SKOS format
-- SKOS XL - extension to SKOS that allows for ref of label
-- SKOS is standardized as a metamodel for thesauri
-- AGROVOC is a source of requirements; played a major role in the SKOS standardization process as a use case
-- Can the workbench be re-engineered to consume a SKOS format?

FAO says: I think that SKOS provides more possibilities than OWL.
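A rough sketch of why the hasTranslation count discussed earlier in these notes grows so quickly. It assumes (this is not confirmed in the notes) that a triple is stored in both directions for every pair of terms in different languages; the function names and the sample concept are made up for illustration. The comparison figure is simply one triple per term attached to its concept, from which the translation pairs could be inferred instead of stored.

```python
from itertools import combinations


def materialized_translation_triples(terms_per_language):
    """Count hasTranslation triples if every term is linked to every term
    in a different language, assuming both directions are stored."""
    counts = list(terms_per_language.values())
    # Each unordered pair of languages contributes (terms in A) * (terms in B) pairs.
    cross_language_pairs = sum(a * b for a, b in combinations(counts, 2))
    return 2 * cross_language_pairs


def label_attachment_triples(terms_per_language):
    """Count triples if each term is only attached to its concept and
    translations are inferred rather than stored."""
    return sum(terms_per_language.values())


# Hypothetical concept with one term in each of four languages.
concept = {"en": 1, "fr": 1, "es": 1, "ar": 1}
print(materialized_translation_triples(concept), label_attachment_triples(concept))

# Adding a fifth language adds eight stored triples but only one attachment.
concept["zh"] = 1
print(materialized_translation_triples(concept), label_attachment_triples(concept))
```

Under these assumptions the stored count grows roughly with the square of the number of languages per concept, while the attachment count grows linearly, which matches the remark that the number "explodes further" with each added translation.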
JKeizer - definite need to keep differentiation among terms (labels - hasSynonyms).
Goal - to publish AGROVOC as Linked Data. URIs haven't been promoted for Linked Data - option to revisit before doing so. Also, what type of URIs? Use PURLs? Trade-offs between using "hash" or "slash". Version number is 'out-of-fashion'! AOS is the banner under which a lot of this development has taken place - does it need to be part of the URI?

----------------------------------------------------------------------
Gudrun presenting on the Workbench

Johannes says: The concept server often, and wrongly, uses the word "ontology".
Language Management: Spanish and Spanish LA are listed as different languages; is this correct?
For the RSS feed you should have the possibility to choose which changes should be submitted.
Ahsan says: Still having a search problem - takes too much time!
Link AGROVOC on http://aims.fao.org/website/AOS-%3A-Registries leads to a page in Arabic ("Sorry you provided an invalid URL") at http://aims.fao.org/ar/pages/374/sub
Johannes: Strings should be checked to see whether they are already present in the concept scheme, to avoid a concept being inserted twice.
Gerard: I think this was discussed yesterday... how does the system differentiate between, say, Hyderabad in India and Hyderabad in Pakistan?
Sachit says there are 4,000,000 triples in the current snapshot. My snapshot, dated 2010-01-29, has 2,000,000 triples. Do I not have a complete snapshot?
Johannes: What are the criteria for the languages in the language list? It seems that there are also languages in which AGROVOC does not exist: "nederlands".
Sachit says: More languages are added because the same WB is used for authority control, and they have data in these other languages also.
Tom: Sachit, do you mean the 4,000,000 triples are for AGROVOC (per se) plus other authority files - not just for AGROVOC? Or that there are languages not represented in my 2,000,000 triples?
Sachit: No. Authority control is a completely different one.
Sachit: The triple store I sent you on 29 January 2010.
Gerard: The most complete AGROVOC SQL file can be downloaded from http://agrovoc.icrisat.ac.in/.
Tom: My file of 2010-01-29 is in OWL. I converted it into N-Triples. Is it more-or-less up-to-date?
Johannes: The performance is not on a production level at the moment.
Sachit: It is on the development server. I'll send the updated OWL file once we complete the conversion.
Johannes: The definitions from other sources should be linked in as "linked data"! Automatically! Is MeSH already available as linked data?
Tom: In my understanding, MeSH is available in SKOS, but I'm not sure if it is in the public domain.
Johannes: We had a Japanese project that should have produced images for nearly all AGROVOC concepts; we have to investigate where they are.
Johannes: The language lists for AGROVOC and the authority files should not be mixed but displayed separately, sensitive to the scheme that is being edited.
Johannes: Gudrun says something has to be changed: what is this precisely?
Tom wondering about: aos:hasImageLink rdfs:subPropertyOf aos:WBDatatypeProperty ...?
Gerard: Also, shouldn't there be a template for images - size, resolution, mode of citation (reference)?
Tom: Sixteen properties in the aos: namespace have rdfs:range of xmls:string. Unsure whether this is proper usage. Wondering if rdfs:Literal is more correct.
Armando: Regarding MeSH: it is available from http://thesauri.cs.vu.nl/eswc06/mesh/rdf/meshdata.rdf, where it is being made available by Vrije Universiteit Amsterdam. http://www.nlm.nih.gov/mesh/filelist.html - this one is the original site: you just need to register to get the files in all formats.
Johannes: So we only have to insert an equivalence statement from the AGROVOC URI to the MeSH URI and we will automatically get the description displayed?
Tom: equivalence statement = SKOS mapping properties? Maybe in a drop-down menu?
Wondering if the "relationships" module is limited to AGROVOC, or can relationships be created to MeSH terms?
Johannes: This would be important.
Tom: Maybe two ways to do it: extend the Relationship module, or create a separate Alignment module.
Armando: In principle there is no big problem in doing it, since we just need to load (with the actual model) also the MeSH SKOS triples (which won't be displayed since they're not governed by domain_concept). Then it suffices to add a library of "linkable" resources. I would also suggest creating pointers to important linguistic resources, such as WordNet, which is a very common practice for making ontologies/KOSes more easily linkable to other resources via automatic processes (see for example FOAF, though it uses bad links - subclassOf, grr! - to an outdated "WordNet Ontology").
Tom: The SKOS mapping properties could be used to express the alignment.
Armando: Yes, absolutely.
Tom: I do not believe there are any "mapping" properties among the refinement properties used within AGROVOC.
Armando: Yes, I would use SKOS mapping properties in any case (I mean both with the current model and, even better, with your proposed one).
Ahsan: Can we add any special property for mapping?
Tom: Some Linked Data people make liberal use of owl:equivalentClass, owl:equivalentProperty, and owl:sameAs - but it is easy to get into trouble.
Armando (to Ahsan): In principle, yes. If we want to assign given semantics to specific links, one can create subproperties of the SKOS mapping ones, so that yours assume the specific meaning that you need while you maintain compatibility with generic SKOS-compliant tools.
Johannes: It seems that any search in the Workbench, in particular, causes problems.
Tom: Would cutting the number of triples by half make search significantly faster?
Armando: It is partially related to a bad network (there were problems also in downloading normal files).
Also, we still have to embed the new indexing engine (we'll try to solve this with Sachit in the coming days).
Armando: In this case, the improvement is linear in the cut (more precisely, in the number of concepts rather than triples): that is, half the concepts, half the time. But in any case, it would not solve the problem at the root. The new indexing engine will solve it (log complexity), but we need to assess a few things.
Johannes: We looked, and all editors were stand-alone. Anyone aware of other shared editing environments?
Armando: Both technical (embedding it) and conceptual, like what is to be indexed. Web Protege gives some concurrent editing facility. But there is (I think) no real support for multiple (different) users.
Tom: pOWL - web-based platform for collaborative Semantic Web development. http://powl.sourceforget.net/overview.php
Armando: Also, http://art.uniroma2.it/publications/docs/2009_AIIA09_CONGAS%20a%20COllaborative%20ontology%20development%20framework%20based%20on%20Named%20GrAphS.pdf - this one (realized by my group) WAS very cool (obviously, in my opinion, ahahh :) )
Armando: But we made so many changes to the new edition of the main platform that we had to abandon the collaborative one.
Johannes: Ontology editors tend to have performance problems with lots of triples. Even TopBraid struggles.
Armando: It depends on how they are configured. For example, if a reasoner is always active (and it may be by default, because they may be set up for small ontologies), then reasoning takes really a lot of time (and not even AllegroGraph will be really fast under reasoning pressure).
Dickson: Maybe need bigger machines, with 32GB/64GB memory.
Johannes: An application that cannot run on a 32 gig machine is a bad application.
Armando: Not only a matter of memory (though it is also important). Reasoning takes a lot of time... usually it should be done offline.
Johannes: But we are not doing any reasoning.
Armando: Yes, ok, I was speaking in general; in our case it's just the Protege API that sucks :)
Sachit: Software architecture and model of the Concept Server workbench. Have been working on it for 2 yrs. Using GWT - helps when the data in windows has to be reloaded; removes the need to refresh the whole page. Protege OWL to manage the triple store - backend SQL (MySQL). GWT also handles compatibility with different browsers (IE, Firefox, Safari, Chrome).
Gerard: Gilead is the interface between Hibernate and GWT. Graph visualization via Web services directly from the MySQL triple store.
Sachit: Performance of the app is better when storing triples in MySQL (rather than file based?).
Dickson: Protege API already has optimized the triple store for traditional RDBMS. An intermediate layer helps to swap the technology (e.g. Protege) used to access the triple store... it's integrated in the current version.
Prashanta: [??] webservices = AgrovocWB API
Gerard: A Web Service supports only Query<->Response. So it is very different from the WB. Right?
Johannes: Are the webservices complete? Do they cater for all the needs of the FAO OpenArchive?
Gerard: No provision to 'write-back'.
Prashanta: Webservices (WS) provide a subset of the requests available in the WB. WS is not completely different from the WB. At the base level they access the triple store using the Protege API, which basically means they follow a similar procedure to fulfill the requests.
Java program, using Protege OWL API and Hibernate, accesses MySQL of Thesaurus and writes to Triple Store (MySQL).
Sachit: Performance issues are due to Protege.
Dickson: Swap out Protege and see if performance improves.
Sachit: Once AGROVOC is in the WB, will still need to export MySQL to Thesaurus users. Whenever we create a new concept, cannot delete it. Can only change status to Deprecated. Relationship created is not validated ... Only Admin and Publisher have access to relationship creation. Status is only for Concepts and Labels - no status for definition.
PROPOSED concepts can be deleted. Only accepted concepts cannot be deleted.
Tom: Wondering whether proposals that are not accepted are documented in a historical log.
Dickson: Validation needs to be done stepwise. First V1, then V2.
Prashanta: Proposals that are rejected are stored in the log. They appear in the recent changes page.
Sachit: How to handle export from the triple store?
Gerard: The WB is quite resource hungry. Shouldn't the export be handled 'outside' of the application?
Prashanta: Yes, one of the options is to run the export routine separately from the web application. So basically get the export request from the user (and also email id)... the request is sent to some separate app, which runs the export routine. When it's complete, it can email a link to the user to download the export file.
Gerard: Armando - WB Architecture
Gerard: Performance and Maintainability - important issues.
Tom: Does anyone know SKOSEd and skosapi? http://lists.w3.org/Archives/Public/public-swd-wg/2009Apr/0084.html
Dickson: Need to benchmark-test the AGROVOC WB.
Sachit: Fails in search module.
Dickson: Queries should not be "lost". How many online users are we expecting for the May release?
Johannes: We can tune this. This year, have two promotion workshops - Latin America - so three-four experts in the world. For this year, do not need to get beyond 10-11 concurrent users. That is: editors. There might be a lot more users, retrieving. As soon as we open it, may have an enormous number of online users. Must protect against Google indexing. If you pull AGROVOC into Google, it gets millions of hits. Thousands of Thai users - where else would they find certain terms than in AGROVOC? First few months, should protect it for use of the community. During this emergency situation, can have an export and make that available. Have MySQL problems in Rome; had to close some facilities because people asked "give me all terms in all languages" and Apache crashed. That is what happens as soon as you allow people to access big data streams.
Dickson: By end of workshop, list of priority items.
Tackle the rest in version 2.
Ahsan: Working group for interface, software, architecture.
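
Prashanta's option of running the export outside the web application, then emailing a download link, could be sketched roughly as below. This is only an illustration under stated assumptions: the queue, the function names, the file path, and the email address are all hypothetical, the export itself is a placeholder, and the notification is recorded in a list instead of being sent by email.

```python
import queue
import threading

# Hypothetical in-memory queue standing in for whatever hands requests
# from the web application to the separate export process.
export_requests = queue.Queue()
sent_notifications = []


def run_export(fmt):
    """Placeholder for the real export routine; returns a download link."""
    return f"/exports/agrovoc.{fmt}"


def notify(email, link):
    """Stand-in for emailing the link; here it is only recorded."""
    sent_notifications.append((email, link))


def submit_export(email, fmt):
    """Called by the web application: enqueue and return immediately."""
    export_requests.put((email, fmt))


def worker():
    """Runs apart from request handling and processes exports one by one."""
    while True:
        request = export_requests.get()
        if request is None:  # sentinel: stop the worker
            break
        email, fmt = request
        notify(email, run_export(fmt))


export_thread = threading.Thread(target=worker, daemon=True)
export_thread.start()
submit_export("editor@example.org", "owl")
export_requests.put(None)
export_thread.join()
print(sent_notifications)
```

The point of the design is that the web application only enqueues the request, so a long-running export does not compete with editors for the resources Gerard mentions.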