
Excel add-in Rosanne: islands of structure in unstructured data

Rosanne is a software tool, produced by COMMIT/Project eFoodLab, that allows users to add semantics when creating new datasets. It is implemented as an add-in for the popular Microsoft Excel software package. Users select areas inside a sheet to become semantic tables. They can then select terms from proprietary or public ontologies, including the publicly available and widely applicable OM-ontology (quantities and units of measure), to annotate the data in the table.

Data management

Several industrial food companies and research centres have expressed a need for data management.

In industrial research and development, but also in production settings, trade and retail, spreadsheets with unstructured data are abundant. The terminology used in these datasets typically differs between individuals, departments and organizations. Moreover, jargon and personal abbreviations are often used, and units of measure are frequently omitted. Poor understanding of the data, and the needless overhead of clarifying it, make work processes inefficient and can give rise to serious errors. Without high-quality metadata it is very difficult to find, interpret and reuse the data, let alone integrate data from different sources.

Rosanne is a software tool that aims to facilitate adding ‘islands of structure’, in the form of semantic tables, within otherwise unstructured formats. This means that tables are annotated and handled using shared vocabularies. These annotations supply the metadata needed to find, interpret and reuse the data easily. Rosanne also supports integrating data from different spreadsheets, using the information in the annotations.

The aim of this valorisation project was to produce a proof-of-concept version of Rosanne, which would offer all functionality within Excel and be robust, fast and user-friendly. In addition, it would link to company ontologies, created in this project, to offer easy customisation to each company’s unique needs while maintaining generic applicability. An essential part of the project was to test Rosanne with several industrial partners.

Rosanne

Within Excel, users select areas of a sheet to become semantic tables. They then annotate the data in those tables with terms from proprietary or public ontologies, such as the OM ontology of quantities and units of measure.

These vocabularies (ontologies) ensure that unambiguous concepts are used, making it easier to find and understand the data. However, users still have the freedom to use their own text in the headers (for example departmental terminology, or a local language) and other areas of the spreadsheet.
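As an illustration of how a free-text header can coexist with an unambiguous concept, the following Python sketch models one annotated table. It is not Rosanne's actual data model: the class names, the sheet name, the cell range and the identifiers such as `om:Mass` and `ex:Product` are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnAnnotation:
    """One annotated column (hypothetical structure, for illustration only)."""
    header: str                 # free text as typed by the user in the sheet
    concept: str                # ontology term identifier (placeholder value)
    unit: Optional[str] = None  # unit of measure term, if the column has one


@dataclass
class SemanticTable:
    """A selected area of a sheet plus the annotations of its columns."""
    sheet: str
    cell_range: str
    columns: List[ColumnAnnotation]

    def find_column(self, concept: str) -> Optional[int]:
        """Return the position of the column annotated with `concept`."""
        for index, column in enumerate(self.columns):
            if column.concept == concept:
                return index
        return None


# Headers use local abbreviations; the annotations carry the shared meaning.
table = SemanticTable(
    sheet="Batch 12",
    cell_range="B3:D20",
    columns=[
        ColumnAnnotation("Monster", "ex:Product"),
        ColumnAnnotation("gew.", "om:Mass", "om:gram"),
        ColumnAnnotation("visc", "om:DynamicViscosity", "om:pascalSecond"),
    ],
)
```

With such a structure, a column can be looked up by its concept, for example `table.find_column("om:Mass")`, regardless of what the header text says.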

Behind the scenes, Rosanne uses the RDF Record Table model to represent the structure of the annotated data table. This permits the data to be converted to an entirely semantic form, which enables advanced computer support in processing the data. Rosanne takes advantage of this to support users when integrating files. Users can select two or more spreadsheet files, and select any number of tables from those spreadsheets for integration. The user does not have to know the cell address of the data or even which file it is in. Instead, they define the integration in terms of the semantic concepts; for example, they may choose to see ‘mass’, ‘viscosity’ and ‘creaminess’ of a ‘Product’. The add-in then compiles the table with this information automatically.
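The idea of integrating by concept rather than by cell address can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Rosanne's implementation and not the RDF Record Table model itself: tables are plain dictionaries, the function `integrate` and the concept identifiers are hypothetical, and unit conversion is ignored.

```python
def integrate(tables, key, wanted):
    """Combine rows from several annotated tables on a shared key concept.

    Each table is a dict with 'concepts' (one annotation per column) and
    'rows'. Columns are located by their annotation, never by position.
    """
    merged = {}
    for table in tables:
        # Map each annotated concept to its column position in this table.
        position = {concept: i for i, concept in enumerate(table["concepts"])}
        if key not in position:
            continue  # this table says nothing about the key concept
        for row in table["rows"]:
            record = merged.setdefault(row[position[key]], {})
            for concept in wanted:
                if concept in position:
                    record[concept] = row[position[concept]]
    # One output row per key value; missing values are left as None.
    return [
        [key_value] + [record.get(concept) for concept in wanted]
        for key_value, record in merged.items()
    ]


# Two tables from different files, with different column orders.
lab = {
    "concepts": ["ex:Product", "om:Mass", "om:DynamicViscosity"],
    "rows": [["P1", 250, 1.2], ["P2", 300, 0.9]],
}
panel = {
    "concepts": ["ex:Creaminess", "ex:Product"],
    "rows": [[7, "P1"], [4, "P2"]],
}

result = integrate(
    [lab, panel],
    key="ex:Product",
    wanted=["om:Mass", "om:DynamicViscosity", "ex:Creaminess"],
)
```

Here `result` holds one row per product with its mass, viscosity and creaminess, even though the two source tables store the product column in different positions.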

Uniqueness

Rosanne is seamlessly embedded in Excel, extending the standard ribbon to enable user actions. Once a user has obtained permission to install Rosanne, it can simply be downloaded and installed locally as a component of Excel. It connects to public ontologies using web services.

Rosanne is unique due to its online connection to a library of ontologies and its easy-to-use interface. The ontology library includes OM, the comprehensive ontology of quantities and units of measure, and several food-related ontologies, all available online at http://www.wurvoc.org. A user or company may also add their own ontologies to describe their own data accurately. Rosanne allows users to manipulate tables in terms of objects and quantities rather than indices of rows and columns. It is also integrated in standard spreadsheet software, which is essential for usability and user acceptance. Moreover, to the best of our knowledge, it is the only solution with built-in support for semantic data integration, so that the annotations yield a direct benefit by making integration quick and easy.

Read the whole report below.