Authors: Gianluca Colombo, Ettore Colombo, Andrea Bonomi, Alessandro Mosca, Simone Bassis
ArXiv: 1309.7697
Abstract URL: http://arxiv.org/abs/1309.7697v1
Over the last few decades, the amount of data of all kinds available
electronically has increased dramatically. Data are accessible through a range
of interfaces, including Web browsers, database query languages, and
application-specific interfaces, built on top of a number of different data
exchange formats. These data range from unstructured to highly structured.
Very often, some of them have a structure, even if it is implicit and not as
rigid or regular as that found in standard database systems. Spreadsheet
documents are prototypical in this respect. Spreadsheets
are a lightweight technology that supplies companies with easy-to-build
business management and business intelligence applications, and business people
widely adopt spreadsheets as smart vehicles for generating and sharing data
files. Indeed, the more spreadsheets grow in complexity (e.g., when used in
product development plans and quoting), the more their arrangement,
maintenance, and analysis appear to be a knowledge-driven activity. The
algorithmic approach to the automatic extraction of data structure from
spreadsheet documents (i.e., grid-structured and free topological-related data)
emerges from the WIA project (Worksheets Intelligent Analyser). The
WIA-algorithm shows how to describe spreadsheet contents in terms of
higher-level abstractions or conceptualisations. In particular, the
WIA-algorithm targets the extraction of i) the calculation workflow
implemented in the spreadsheet formulas and ii) the logical role played by the
data that take part in the calculation. The aim of the resulting
conceptualisations is to provide spreadsheets with abstract representations
useful for further model refinement and optimization through evolutionary
algorithms.
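
The two extraction targets named above (the workflow encoded in the formulas
and the logical role of the data involved) can be illustrated with a small
sketch. This is not the WIA-algorithm, whose details the abstract does not
give. It is a minimal illustration under stated assumptions: the worksheet is
a hypothetical dictionary of cells, references are found with a simple regular
expression that ignores ranges and sheet names, and the role labels "input",
"intermediate", and "output" are invented here for illustration.

```python
import re

# Hypothetical toy worksheet (an assumption, not data from the paper):
# cell address -> constant value or formula string.
SHEET = {
    "A1": 120.0,      # unit price
    "A2": 3,          # quantity
    "A3": "=A1*A2",   # subtotal
    "B1": 0.22,       # tax rate
    "B2": "=A3*B1",   # tax
    "B3": "=A3+B2",   # total
}

# Simplified A1-style reference pattern. It does not expand ranges such as
# A1:A3 and may misread function names that end in digits (e.g. LOG10).
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?([0-9]+)")


def references(value):
    """Return the cell addresses referenced by a formula, or an empty set."""
    if isinstance(value, str) and value.startswith("="):
        return {col + row for col, row in CELL_REF.findall(value)}
    return set()


def dependency_graph(sheet):
    """Map each cell to the set of cells its formula reads."""
    return {cell: references(value) for cell, value in sheet.items()}


def classify_roles(graph):
    """Assign an illustrative role to each cell from its position in the graph."""
    used = set().union(*graph.values())
    roles = {}
    for cell, deps in graph.items():
        if not deps:
            roles[cell] = "input"          # reads nothing
        elif cell in used:
            roles[cell] = "intermediate"   # reads cells and is read by others
        else:
            roles[cell] = "output"         # reads cells, read by no one
    return roles


def evaluation_order(graph):
    """Order cells so that every cell follows the cells it depends on."""
    order, state = [], {}

    def visit(cell):
        if state.get(cell) == "done":
            return
        if state.get(cell) == "visiting":
            raise ValueError(f"circular reference involving {cell}")
        state[cell] = "visiting"
        for dep in sorted(graph.get(cell, ())):
            visit(dep)
        state[cell] = "done"
        order.append(cell)

    for cell in sorted(graph):
        visit(cell)
    return order


graph = dependency_graph(SHEET)
roles = classify_roles(graph)
order = evaluation_order(graph)
```

Here `order` is one possible reading of the workflow (inputs first, the total
last) and `roles` separates raw data from derived values. A real analyser
would need a proper formula parser and would also have to account for the
layout of the grid, which this sketch ignores.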