EuSpRIG 2011 Abstracts and notes by Patrick O'Beirne

EuSpRIG 2011: Spreadsheet Governance – Policy and Practice

www.eusprig.org

Abstract in italics, my notes in regular font.

Breviz: Visualizing Spreadsheets using Dataflow Diagrams

Felienne Hermans, Martin Pinzger, Arie van Deursen (swerl.tudelft.nl)

We have addressed the need of spreadsheet professionals for support with the transition of a spreadsheet to a colleague with the generation of data flow diagrams. We apply these data flow diagrams for the purpose of understanding a spreadsheet with three example cases and suggest it for the assessment of the quality of the spreadsheet’s design.

This paper won the EuSpRIG 2011 David Chadwick Student Prize. I had always discounted previous research into DFDs because of the spaghetti diagram from real-sized spreadsheets, but Felienne has developed it further into a layered diagram of semantically related blocks. Early days, but promising. http://www.slideshare.net/Felienne/breviz-at-eusprig

Spreadsheets in Financial Departments: An automated analysis

Dr. Kevin McDaid, Dr Ronan MacRuairi, Mr. Neil Clynch, Mr. Kevin Logue, Mr. Cian Clancy, Mr. Shane Hayes (dkit.ie, cuspatechnologies.com)

65,000 spreadsheets for the financial departments of both a government and a private commercial organisation were analysed using the ‘Luminous’ technology. This provides an important insight into the nature and structure of these spreadsheets, the links between them, the existence and nature of macros and the level of repetitive processes performed through the spreadsheets.

All commercial spreadsheet discovery and monitoring applications analyse repositories. What is unique about the DKIT approach is an innovative system which can at last give companies a handle on the amount of time people are spending with Excel, on which project, doing what operations in Excel. This at last holds out the possibility of spreadsheet productivity improvements, which is the Holy Grail of researchers like Tom Grossmann and practitioners like us. Up to now, we had to rely on over-the-shoulder observation and on-the-job training and consulting along the lines of "here's an easier way in Excel to do this", "There's a better solution to this problem", "This could be automated completely if you adopted x technology" etc. Now we still need Excel/VBA experts to assess the findings but they will have a better view of the overall process and daily use of spreadsheets.

Requirements for Automated Assessment of Spreadsheet Maintainability

José Pedro Correia, Miguel A. Ferreira (sig.eu)

In this position paper we argue for the need to create a model to estimate the maintainability of a spreadsheet based on (automated) measurement. We propose to do so by applying a structured methodology that has already shown its value in the estimation of maintainability of software products. We also argue for the creation of a curated, community-contributed repository of spreadsheets.

This starts from existing software quality assessment techniques to map metrics of static analysis to ratings. As we know, all the main commercial assessment frameworks already collect spreadsheet metrics. Their contribution is to group these into a five-point scale of maintainability. They have made a start but need more than the commonly available teaching or example spreadsheets to work with. They offer to curate a repository of serious spreadsheets. Send them some big ones, anonymised! I have a list of about 100 big ones I’ve given them.

From Good Practices to Effective Policies for Preventing Errors in Spreadsheets

Daniel Kulesz (informatik.uni-stuttgart.de)

While there are many proposals for such practices in literature written by practitioners and researchers, they are often not consistent with each other. This paper proposes an expert-based, retrospective approach to the identification of good practices for spreadsheets. It is based on an evaluation loop that cross-validates the findings of human domain experts against rules implemented in a semi-automated spreadsheet workbench, taking into account the context in which the spreadsheets are used.

Unlike the sig.eu paper, they combine human ratings (which sig,eu avoid) with metrics from static analysis. They intend to allow the rules engine to be expanded using a Java API. This, again, is very similar to what the commercial spreadsheet assessment platforms do.

Controls over Spreadsheets for Financial Reporting in Practice

Nancy Coster, Linda Leon, Lawrence Kalbers, and Dolphy Abraham (lmu.edu)

This paper describes a survey involving 38 participants from the United States, representing companies that were working on compliance with the Sarbanes-Oxley Act of 2002 (SOX) as it relates to spreadsheets for financial reporting. The findings of this survey describe specific controls organizations have implemented.

25 of the 38 companies surveyed have assets between $1Bn and $100Bn. Change management was the no.1 self-identified area of concern. Although over half say they have a review process with a checklist, 30% report the reviewers have no domain experience. 11 indicated that internal reviews have discovered lapses in controls. Astonishingly most say they plan to implement Excel Track Changes, apparently without realising the problems that brings in its wake.

Leveraging User Profile and Behavior to Design Practical Spreadsheet Controls for the Finance Function

Nancy Wu (backofficemechanics.com)

This paper discusses a major barrier that is preventing more organizations from adopting enterprise spreadsheet management programs. But even without a corporate mandated effort to improve spreadsheet controls, finance functions can still take simple yet effective steps to start managing the risk of errors in key spreadsheets by strategically selecting controls that complement existing user practice.

The most implementable controls are those that do not require advanced Excel knowledge and enhance existing paper review activities – validation, labelling and separating assumptions.

Towards Evaluating the Quality of a Spreadsheet: The Case of the Analytical Spreadsheet Model

Thomas A. Grossman, Vijay Mehrotra, Johncharles Sander (usfca.edu)

We suggest four principles. First, state the domain—the spreadsheets to which the guidelines apply. Second, distinguish between the process by which a spreadsheet is constructed from the resulting spreadsheet artefact. Third, guidelines should be written in terms of the artefact, independent of the process. Fourth, the meaning of “quality” must be defined. We define the domain of “analytical spreadsheet models”, which are used in business, finance, engineering, and science. We suggest that ultimate goal is a set of guidelines for an evaluator, and a checklist for a developer.

They select six quality dimensions: Suitable, Readable, Transferable, Accurate, Reusable, Modifiable. For ‘analytical’ spreadsheets, ie those that take some inputs, perform some non-obvious calculations, and present reports to decision makers, they propose these design principles: Modular, Structured, Input-Output, Information Flow, Separate Inputs & Computations & Reports.

In Search of a Taxonomy for Classifying Qualitative Spreadsheet Errors

Zbigniew Przasnyski, Linda Leon, and Kala Chand Seal (lmu.edu)

In this paper, we propose a taxonomy for categorizing qualitative errors in spreadsheet models that offers a framework for evaluating the readiness of a spreadsheet model before it is released for use by others in the organization. Inspection of the errors [in the Panko ‘Wall’ problem] reveals four logical groupings of the errors creating four categories of qualitative errors. The usability and limitations of the proposed taxonomy and areas for future extension are discussed.

This consists of a list of what I call ‘bad smells’ in spreadsheet structures. One, ‘Poor layout for model extension’ is a common problem addressed by the FormulaDataSleuth presentation later. Some of these can be identified by automated tools, some only by expert inspection.

Workbook Structure Analysis – “Coping with the Imperfect”

Bill Bekenn and Ray Hooper (fairwayassociates.co.uk)

The FormulaDataSleuth® toolset developed for the analysis of workbook structure comprises: the identification of layout in terms of filled areas formed into “Stripes”, the identification of all the Formula Blocks/Cells and the identification of Data Blocks/Cells referenced by those formulas. It is essential for the initial “Watching” of an existing workbook and enables the workbook to be subsequently managed and protected from damage.

This is particularly suited to spreadsheets designed by people with a ‘product’ focus where there is a block of similar calculations repeated for a large number of products; as distinct from the data-oriented tabular approach or the calculation-oriented modelling approach. The tool uses algorithms to identify stripes, areas, and blocks, and facilitates safer copying of these blocks with the correct relative and absolute addresses.

Effect of Range Naming Conventions on Reliability and Development Time for Simple Spreadsheet Formulas

Ruth McKeever, Kevin McDaid (dkit.ie)

A new experiment measures the effect of range names on the correctness of, and the time it takes to develop, simple summation formulas. Formulas developed by non-experts using range names are more likely to contain errors and take longer to develop. Less structured naming conventions result in poorer performance by novice users.

This experiment asked people to create formulas using only names and prevented them from checking what cells the names referred to. So, the less meaningful the name, the worse the user’s guesses were. It shows that you can’t expect automatic improvements by simply applying commonly recommended practices without the implicit context that goes with them. This kind of research shows the need for practitioners to more carefully define the implicit ‘terms and conditions’ of their best practice guidelines.

An Empirical Study on End-users Productivity Using Model-based Spreadsheets

Laura Beckwith, Jácome Cunha, João Paulo Fernandes, João Saraiva (uminho.pt)

To improve end-users productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets. In this paper we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end-users worked with two different model-based spreadsheets, and we present and analyze here the results achieved.

They apply database normalisation techniques to refactor spreadsheets with data tables, then add automatic data validation based on inferences from the table relationships. The tradeoff is that the refactored model is harder to understand for the first time.

Spreadsheet on Cloud – Framework for Learning and Health Management System (SocialCalc)

1K.S. Preeti, Vijit Singh, Sushant Bhatia , Ekansh Preet Singh, Manu Sheel Gupta (seeta.in)

We have proposed a Spreadsheet on the cloud as the framework for building new web applications, which will be useful in various scenarios, specifically a School administration system and governance scenarios, such as Health and Administration.

This describes the SocialCalc project, a shared spreadsheet for the XO Sugar operating system of the One Laptop Per Child (OLPC) project. The case study is a school administration system.

A Platform for Spreadsheet Composition (DISCOM)

Pierpaolo Baglietto, Martino Fornasa, Simone Mangiante, Massimo Maresca, Andrea Parodi, Michele Stecca (m3s.it)

The process of elaborating spreadsheet data is often performed in a distributed, collaborative way, [which] may lead to errors in copy-paste operations, loss of alignment and coherency due to multiple spreadsheet copies in circulation, as well as loss of data due to broken cross-spreadsheet links. In this paper we describe a methodology, based on a Spreadsheet Composition Platform, which greatly reduces these risks.

This consists of a client plugin which manages the automatic export and import of data from a shared platform DISCOM. They describe it using a car dealership example. I would welcome a more detail comparison with the MS Sharepoint system.

Beyond The Desktop Spreadsheet

Gordon Guthrie, Stephen McCrory

Hypernumbers is a new commercial web-based spreadsheet. Barriers between data and programme instructions can be simply and reliably reimposed, allowing spreadsheet usage to be split into two distinct phases – development, where audit and testing can reduce errors, and deployment, where what would be risky practices in other spreadsheet paradigms are simply engineered out.

They provide a web-based platform with a hierarchical view of data. This can be contrasted with the approaches of the other two presentations of SocialCalc and DISCOM.

Spreadsheets on the Move: An Evaluation of Mobile Spreadsheets

Derek Flood, Rachel Harrison, Kevin McDaid (dkit.ie)

The increasingly mobile nature of business has meant that more users will need access to spreadsheets while away from their desktop and laptop computers. Existing mobile applications suffer from a number of usability issues that make using spreadsheets in this way more difficult. This work represents the first evaluation of mobile spreadsheet applications available for the Apple iOS platform.

The first limitation is the screen, the second the input method; it may take up to 4 keystrokes to enter a single digit or equals sign.

An Insight into Spreadsheet User Behaviour through an Analysis of EuSpRIG Website Statistics

Grenville J. Croll (spreadsheetrisks.com)

The European Spreadsheet Risks Interest Group (EuSpRIG) has maintained a website almost since its inception in 2000. We present here longitudinal and cross-sectional statistics from the website log in order to shed some light upon end-user activity in the EuSpRIG domain.

In 2010 there were about 35,000 unique visitors to eusprig.org. The traffic trends to the site mirror other trends before and after a financial collapse. A few thousand people worldwide have a qualified interest in the integrity and quality of spreadsheets.

Living with spreadsheets

Dean Buckner (fsa.gov.uk)

Spreadsheets are the tool of choice for linking separated corporate systems. (Think Tower of Babel, Stovepipe solutions, Islands of Automation, EAI). The FSA found one ‘awesome’ system with 9,000 worksheets. Date, source, and type of data is completely opaque, the nature and location of transformations completely unclear. The solution is left as an exercise for the reader.

Estimating Spreadsheet Review Workload in Project Finance

David Colver (operis.com)

He analysed a large number of projects and concluded that the ease of reviewing spreadsheets, and re-reviewing the corrections the client makes, is related to the skill and cooperative attitude of the client rather than to any obvious static spreadsheet metrics.

This is relevant to a number of researchers who are trying to measure quality in spreadsheets by static (structural) inspection techniques. Presumably this is so that some indication can be derived of the workload needed to remediate them. But David’s point is that there is no relationship – it depends on the ability of the author to do what the professional concludes has to be done. This relationship will be familiar to professional software testers as the defect injection rate versus the detection and removal efficiency.

This was a rare opportunity to hear a long-experienced practitioner of both model building and auditing open the kimono on their in-house activity data on work on significantly sized spreadsheet projects. His slides are not publicly available and if we do webcast EuSpRIG conferences in future, this may be the kind of one for which we turn off the camera. There was a 2004 paper from Mercer consultants on the same topic “Financial Modelling of Project Financing Transactions,” which is now removed from the web, email me for my archive copy if you’re interested.

Concluding remarks

I’m pleased with the level of debate and discussion this year. Our task is to make all these good ideas more widely known. I wish researchers would make more of an effort to contact members of the EuSpRIG committee before embarking on a year’s work, which could be accelerated by pointing them to research already done and giving an opinion on how realistic their proposed approach is in the real world. Not that that is definitive – it’s always satisfying in research to confound the ‘experts’!

Patrick O’Beirne

pob_at_sysmod_dot_com