PraxIS Aug. 2007

07-08 Contents: Medical Diagnostic Spreadsheet Risks, Quality, report from Eusprig 2007

ISSN 1649-2374 This issue online at http://www.sysmod.com/praxis/prax0708.htm [Previous] [Index] [Next]

Systems Modelling Ltd.: Managing reality in Information Systems - strategies for success
IN THIS ISSUE
1) IT Risk & Security Risks from untested medical spreadsheets Spreadsheets latest spam message carriers
2) Quality Spreadsheet Quality
3) Spreadsheets Report from Eusprig 2007 conference The Bug Hunt results
4) Off Topic Spreadsheets considered harmful :-)
11 Web links in this newsletter
About this newsletter and Archives Disclaimer Subscribe and Unsubscribe information

_______________________________________________________

Welcome to PraxIS

This is my report from the Eusprig 2007 conference in Greenwich. I will be on vacation in August, returning on 3rd September. Enjoy your break!

Patrick O'Beirne

_______________________________________________________ _______________________________________________________

1) IT Risk & Security

Healthcare IT Risk

In the Eusprig 2007 conference reported below, Simon Thorne of the University of Wales Institute Cardiff (UWIC) cited the Cardiac Anaesthesia Risk Evaluation (CARE) model from the Medical Algorithms website www.medal.org . His analysis revealed some clearly incorrect results from its formulas, which may have been automatically generated. So, we have to wonder: is this CARE spreadsheet in use? Has nobody reported back that it gives wrong results? If that is because they have not been noticed, does that mean that there have been misdiagnoses? If so, with what consequences? For many years, the FDA has been applying stringent controls to spreadsheets used in pharmaceutical companies. Shouldn't we have similar regulatory insistence on data quality in spreadsheets and databases used by medical professionals and hospital administrators?

Spreadsheets latest spam message carriers

http://blog.trendmicro.com/spam-excel28s29/

Spam Excel(s) July 23rd, 2007 by Miray Lozada. "Spammers are Excel-ing, literally. Text and image spam as PDF files are now old news as MS Excel enters the spam scene. Last July 22, Trend Micro researchers started noticing email messages that carry ZIP-packed Excel files. When opened, these Excel files stink of pump-and-dump schemes that spam mails are now notorious for. "

____________________________________________________________
____________________________________________________________

2) Quality

I see a connection between Kath McGuire's talk on training and Ray Panko’s talk on human error and other references to organisational culture. In organisations with a low level of maturity, errors are blameful, therefore they are covered up or masked by compensation rather than correction, or responsibility displaced. Simon Murphy told of a user who ‘had to hide from an angry manager because a supplier had unilaterally changed a data structure without notice.’ Others become defensive and say “Don’t touch my spreadsheet or you will hurt value in the business.” Some people have a fatalistic attitude to difficulties, they seem to accept that that’s the way things have to be. Kath told of one who user suffered for four hours every month individually correcting hundreds of cells with excess spaces. She said that she would rather eat through a foot of concrete than put up with that, and showed the user how to use the search & replace command. Other people will think that tedious and repetitive work should not have to be that way, and are motivated to spend the time to look for more efficient solutions. Where we draw the line between compliance, acceptance, and fatalism, and reflection, creativity, and critical thinking, depends on our self-motivation and curiosity. When we reach a point where we think our work is good enough, we stop learning and never realise how much more work we could avoid by working smarter.

When training is given at the time the pain is felt, the justification is clear and the payback continues. As people build up a repertoire of spreadsheet patterns, they are able to integrate these into larger models. They don’t have to re-invent them every time and commit the problem of Thinking, as Ray Panko might say. They can re-use chunks of familiar algorithms, structure, and code, whether for generic spreadsheet formulas and structures, or for application domain calculations.

An example of a spreadsheet data quality problem

http://www.eiminstitute.org/current-magazine/volume-1-issue-5-july-2007-edition/a-foundation-for-data-quality
"When a manufacturing firm was developing an addition to its global data warehouse involving the purchasing of raw materials on a global basis, it tried to verify the data collected with previous financial statements and their supporting documents. The subsidiary financial data was collected using spreadsheets which were completed in each subsidiary and processed in the corporate accounting system. The data for the global data warehouse was captured directly from existing computer systems used by the subsidiary. When checking the data, it was found that an error had been made while transforming the data from the spreadsheet into the corporate accounting system. A decimal point had been misplaced resulting in a difference in the millions of dollars. The reason this error occurred was that the base unit of measurement differed for each. In the operational systems the monetary values were recorded in actual currency units and the spreadsheet recorded the data in thousands of dollars. The decision was made to record the data in the global data warehouse as collected from the operational systems and to not restate the financial statements because the difference was not material. "

____________________________________________________________
____________________________________________________________

3) Spreadsheets

Notes from the Eusprig 2007 conference

Dean Buckner of the UK Financial Services Authority gave the Regulator’s View on the progress in the control of End User Computing (EUC) in the financial markets. His themes were echoed by many subsequent speakers (my comments in brackets):

1. Change of mindset. He referred to the acceptance that spreadsheets are not going to be replaced by bigger systems, but rather that they are here to stay.

2. User training. This is still shockingly neglected; he still finds dumb solutions that could be replaced by cleaner methods. (This was echoed by Kath McGuire’s stories.)

3. There is no accepted base of ‘good practice’. (Although there is a mass of advice in circulation, best practice is always dependent on the context of the creation, use and maintenance of the spreadsheet with its specific intended user and surrounding technology. The 47 skills in the ECDLF syllabus are a starting point. David Chadwick is looking for more.)

4. Because of (3) there is therefore no accreditation of spreadsheet skills. (There are of course Excel exams one can take, but I am still waiting for organisations to require key users to be qualified to the ECDLF standard.)

5. He sees increasing mention of spreadsheet controls in audit reports. (A sign that this iceberg is becoming visible)

6. Data standards – including data quality in Access databases, created to get over the 65535 row limit in Excel prior to the 2007 version.

7. Software support – more tools and technologies are becoming available to manage spreadsheets. (It’s a pity we didn’t have a talk from Microsoft about their Excel Services using Sharepoint)

Steve Powell of the Tuck School of Business at Dartmouth College revealed the results of their audits into quantitative errors that showed impacts of up to $100 million in real world spreadsheets. These are reasonably large spreadsheets, having on average 15 sheets and 1.5MB in size. They discovered how difficult auditing really is, even with tools like Spreadsheet Professional. The first fully documented study of its kind, this paper received the highest rating by the attendees of all the papers at the conference. I just have to extract this one quote describing one end of the scale:

Organisation 5 is a small consulting company with highly educated employees and a culture that demands excellence. The spreadsheets we audited from this firm were works of art: thoughtfully designed, well documented, easy to understand, and error free.

Ray Panko of the University of Hawaii provocatively titled his paper Thinking is bad. He quoted human error research indicating that logical thought is the most difficult thing we creatures can do. In his categorisation of Blameful vs Blameless errors, he echoed the call in my paper to be frank about unavoidable error rates. He ended with a challenge to those prescribing good or even best practices to produce the evidence that their recommendations are safe and effective.

Paul Bach of Compassoft, the main conference sponsor, reported that 30 to 40% of corporate data is in uncontrolled end-user applications. He described the processes necessary to get them under control: discovery, change management, and validation. He mentioned one user who guessed that they had 300 critical spreadsheets; they discovered millions (not all critical of course).

Ralph Baxter of ClusterSeven, another conference sponsor, outlined Enterprise Spreadsheet Management (ESM) and described the efficiencies that come from saving wasted time. Taking Dean Buckner’s 6^th point for example, the ‘spot the difference’ task of data integrity checking is easier with ESM.

Simon Murphy of Codematic sympathised with the tortured souls in Spreadsheet Hell. Maintenance represents 90% of spreadsheet use, yet the initial development phase gets most of the attention. Most users have only had at most two days formal training in Excel, so they build unstable and fragile frankensheets which are terminally undebuggable and break if anyone changes anything. ‘Keeping it working’ therefore becomes the desperate state of these users like the little Dutch boy keeping his finger in the sea wall.

Soheil Saadat of Prodiance highlighted the larger-scale problems in multiple linked workbooks. Links become broken because paths are changed when these files are moved into a document management system. He described their WebDAV system for discovery, risk analysis and automatic updating such links.

Andy Kumiega of the Stuart Graduate School of Business described hybrid applications using Matlab for serious matrix manipulation and Excel for the user interface and data integrity control. For example, Excel cannot handle matrices bigger than 52x52 but Matlab’s interface is text-based.

My own paper ( http://www.sysmod.com/psp2.pdf) titled Facing the Facts argued for making visible the cost of rework. Track the time (and therefore the cost) spent on each phase of spreadsheet development, distinguishing when you are creating new work, or correcting existing work. You may be surprised to discover that more than half your time is spent on rework, on correcting defects. It would then be good engineering practice to trace the causes of these defects back to their root and put in earlier checks to prevent and detect errors. For example, it is much less costly to catch a specification or design mistake by early proofing and review, rather than having to unwind it after implementation, with all the knock-on effects on dependencies of such late changes.

David Chadwick of Greenwich University presented a questionnaire on a Minimum Generic Skill Set for spreadsheet development. The first such list of testable skills that I know of was created by the European Computer Driving Licence Foundation in their syllabus Spreadsheet Check and Control. My book of that title is based on the 47 points in the syllabus. David Chadwick expands on this to include teaching practices such as auditing models seeded with errors.

Kath McGuire of SmallSpark asserted that training based on the tasks that people need to accomplish in their context is more effective than generic classroom-type training. People learn more effectively when they are told what they need to know, at the time when they need to know it, in the situation where the need arises. In such cases, the productivity benefits are immediate and obvious.

Simon Thorne of the University of Wales Institute Cardiff (UWIC) had several distinctive points in his presentation. Firstly, the topic Example Driven Modelling was novel enough, but secondly he provided statistical analysis of the significance of the results found, which is welcome whenever we get it. For classification problems (ie grading) people find it easier to provide examples than to create spreadsheet formulas to do the classification. Such examples can be fed into neural network software so it can learn how to classify attributes where the answer is not yet known. (Of course, such an approach does not suit computational algorithms which are the more common uses of spreadsheets.) As an example of a classification spreadsheet, he cited the Cardiac Anaesthesia Risk Evaluation (CARE) model from the Medical Algorithms website medal.org. His analysis revealed some clearly incorrect results from its formulas, which may have been automatically generated.

Mukul Madahar also from UWIC described various measures of risk and impact, including urgency. (One of the classic problems in time management is to distinguish between the important and the urgent. Procrastination or short-term focus on personally interesting tasks has the effect of making all important work also urgent!)

Philip Howard of Bloor Research presented an overview of the types of software tools in the marketplace: auditing, automation, control and compliance. Along with the presence of the four other vendors (Compassoft, SecureXLS, ClusterSeven, and Prodiance) that is an indicator of the maturing of the market.

Derek Flood of Dundalk Institute of Technology (DKIT) reported on the limitations of voice control technology that experimental subjects encountered when using it to debug spreadsheets. When entering a formula the Dragon software does not adapt the vocabulary in order to offer relevant keywords first. It uses the same vocabulary in all situations so it persistently offered "sun" as opposed to "sum" during the auditing process.

Brian Bishop also of DKIT investigated end-user behaviour while correcting a bebugged spreadsheet. His novel approach was to record the timing of their navigation and editing. This gave some insight into patterns of scanning and fixing, which would be worth more research.

David Colver let us in on one of the secret weapons in the Operis armoury: inclusion/exclusion analysis. For efficiency in his final review of the work of the analysts, he has client spreadsheets transformed into a standardised form. As part of that institutionalised process, the analysts rearrange the data to make explicit which items are included and which excluded from key bottom line figures such as IRR and financial ratios. This is a deceptively simple but powerful method to raise questions about the appropriateness of the choices.

Tom Grossman of the University of San Francisco in his paper "Source Code Protection for Applications Written in Microsoft Excel and Google Spreadsheet" discussed the different possibilities of spreadsheet protection available in Excel and in Software as a Service (SaaS).

Patrick Kemmis and Giles Thomas of Resolver Systems described their Python-based spreadsheet that unusually has both a grid and a code tab in view at the same time.

Jocelyn Paine of Spreadsheet Factory pushed the boundaries of spreadsheet deconstruction with his tutorial on modularity, and a paper on Semantic Wikis and Literate Programming. The latter featured an economical and elegant way of writing documentation with embedded code, contrasted to the usual approach of code with embedded comments.

The organisation of the conference received favourable comments from the attendees. All the speakers kept within their time slot and had time for questions.

At the AGM, the new committee was elected, with Grenville Croll as the new chair (I stepped down after being four years in the job):

David Ball (Treasurer), University of Wales Institute Cardiff
Ray Butler (Previous Eusprig chair), Highways Agency, UK
David Chadwick (Previous Eusprig chair), University of Greenwich, UK
Pat Cleary (Secretary and Conference Organiser), University of Wales Institute Cardiff, UK
David Colver (Member), Operis, UK
Grenville Croll (Chair), Spreadsheet Engineering, UK
Roland Mittermeir (2004 organiser), University of Klagenfurt, Austria
Simon Murphy (Member), Codematic Ltd, UK
Patrick O'Beirne (Member), Systems Modelling Ltd, Ireland
Jocelyn Paine (Member), Virtual Worlds Ltd, UK
Simon Thorne (Member), University of Wales Institute Cardiff, UK
David Ward (Member), Baker Tilly, UK

The Bug Hunts

There were two tests. The first was a simple costing spreadsheet with seven unique formulas and three errors.

Of the ten results returned after ten minutes: Four found all three defects completely correct. Four more found two. An omission defect was found by six. An implementation defect was found by seven. Another implementation defect (interpretation of the spec) was found by eight. Another one noticed something was wrong but did not clearly state what it was.

Overall, 21 of the 3*10 defects were found, an average of 70% defect detection efficiency. The hardest one to find, as predicted by Ray, was the omission defect at only 60%.

The second was a tricky spreadsheet with four unique formulas only one of which was in a cell, the other three were in defined names. There were seven errors and one weakness (lack of input error trapping)

Of the fourteen results returned after one hour:

Eleven managed to unprotect the worksheet (it had no password)
Ten said it should check for negative / blank / zero inputs
Seven that it should check for whole numbers (* see below)
Six found the logic errors
Two said it should trap for non-numeric input
None reported the spelling mistake thrown in for an easy mark; like other professionals in such studies, they didn't regard it as material.

As six found the logic errors, I'll rate that as a 6/14 or 43% defect detection efficiency.

*Postscript: There were two clear winners who achieved 74% marks. The preferred method of testing for whole numbers is INT(x)=x. The other way is MOD(x,1)=0 which fails if the ratio between the number and the divisor is greater than 2^27 In other words MOD(134217728,1) returns #NUM!. For more information, see http://support.microsoft.com/kb/119083

Press / web coverage

http://www.accountingweb.co.uk/cgi-bin/item.cgi?id=170552 Auditors praised by FSA for improving spreadsheet compliance

http://www.accountingweb.co.uk/cgi-bin/item.cgi?id=170555 'Access is the new Excel' says Simon Murphy

http://www.accountingweb.co.uk/cgi-bin/item.cgi?id=171091 Spreadsheet error research: Wasted time worse than mistakes.

http://smurfonspreadsheets.wordpress.com/2007/07/14/eusprig-2007-3 Simon Murphy's blog

http://www.accountingweb.co.uk/cgi-bin/item.cgi?id=171111 Corporate strength spreadsheet compliance with online eXpresso system. Set up by the data integration specialist SmartDB, the eXpresso website will host collaborative workspaces for Excel users in secure communities. eXpresso users can upload and share their worksheets, track changes and compare versions online. The service is designed to help companies comply with Sarbanes-Oxley internal control requirements and reduces the need to email around sensitive information, the developer claims.

Spreadsheet Check and Control: 47 best practices to detect and prevent errors

http://www.sysmod.com/az.php?a=190540400X&b=Spreadsheet+Check+Control Available worldwide from Amazon.

http://sysmod.buy.ie/catalog/product_info.php?products_id=188 Our offer - free shipping to EU .

_______________________________________________________
_______________________________________________________

FEEDBACK

Simply send your comments to FEEDBACK (at) SYSMOD (dot) COM

Thank you! Patrick O'Beirne, Editor

_______________________________________________________ _______________________________________________________

5) Off Topic

Spreadsheets Considered Harmful :-)

http://wbsci.org/Miscellaneous/34183/Spreadsheets-Considered-Harmful.html

Read it to the end ... especially the end of the page :-)

http://pdos.csail.mit.edu/scigen/
One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards.

And of course, there is a tool for the opposite test:

http://montana.informatics.indiana.edu/cgi-bin/fsi/fsi.cgi
Paste any text in the textbox. The chance that your submission is a human-written authentic scientific document will be output. Text over 50% chance will be classified as authentic.

http://technology.guardian.co.uk/weekly/story/0,,2134304,00.html

"There is a sadness when ideas are not rewarded. The inventor of the modern spreadsheet, Dan Bricklin, has to make money from lecturing nowadays, because he never patented his idea of laying out information on a grid, and his spreadsheet, Visicalc, was overtaken by later programs which ran on faster PCs, such as Lotus 123."

_______________________________________________________
_______________________________________________________

Copyright (c) Systems Modelling Limited, http://www.sysmod.com . Reproduction allowed provided the newsletter is copied in its entirety and with this copyright notice.

We appreciate any feedback or suggestions for improvement. If you have received this newsletter from anybody else, we urge you to sign up for your personal copy by sending a blank email to EuroIS-subscribe (at) yahoogroups (dot) com

For those who would like to do more than receive the monthly newsletter, the EuroIS list makes it easy for you to discuss issues raised, to share experiences with the rest of the group, and to contribute files to a common user community pool independent of the sysmod.com web site. I moderate posts to the EuroIS list, to screen out inappropriate material.

Patrick O'Beirne, Editor
_______________________________________________________
ABOUT THIS NEWSLETTER
"Praxis" means model or example, from the Greek verb "to do". The name is chosen to reflect our focus on practical solutions to IS problems, avoiding hype. If you like acronyms, think of it as "Patrick's reports and analysis across Information Systems".
Please tell a friend about this newsletter.
We especially appreciate a link to www.sysmod.com from your web site!
______________________________________________________
ARCHIVES
To read previous issues of this newsletter please visit our web site at http://www.sysmod.com/praxis.htm

DISCLAIMER
This newsletter is prepared in good faith and the information has been taken from observation and other sources believed to be reliable. Systems Modelling Ltd. (SML) does not represent expressly or by implication the accuracy, truthfulness or reliability of any information provided. It is a condition of use that users accept that SML has no liability for any errors, inaccuracies or omissions. The information is not intended to constitute legal or professional advice. You should consult a professional at Systems Modelling Ltd. directly for advice that is specifically tailored to your particular circumstances.
_______________________________________________________
PRIVACY POLICY:
We guarantee not to sell, trade or give your e-mail address to anyone.
To subscribe to this Newsletter send an email to
EuroIS-subscribe (at) yahoogroups (dot) com
To unsubscribe from this Newsletter send an email to
EuroIS-unsubscribe (at) yahoogroups (dot) com
EuroIS is the distribution list server of the PraxIS newsletter. It also offers a moderated discussion list for readers and a free shared storage area for user-contributed files. The archives of this group are on YahooGroups website http://finance.groups.yahoo.com/group/EuroIS/
_______________________________________________________

PraxIS Aug. 2007

IN THIS ISSUE

Welcome to PraxIS

1) IT Risk & Security

Healthcare IT Risk

Spreadsheets latest spam message carriers

2) Quality

An example of a spreadsheet data quality problem

3) Spreadsheets

Notes from the Eusprig 2007 conference

The Bug Hunts

Press / web coverage

Spreadsheet Check and Control: 47 best practices to detect and prevent errors

FEEDBACK

5) Off Topic

Spreadsheets Considered Harmful :-)