PraxIS August 2006

06-08 Contents: Poison Data, Error Analysis article, Information Quality, Software Testing, Conference reports

ISSN 1649-2374 This issue online at http://www.sysmod.com/praxis/prax0608.htm  [Previous] [Index] [Next]

Systems Modelling Ltd.: Managing reality in Information Systems - strategies for success  

IN THIS ISSUE

1) Risk & Security
     Poisoned data and code
2) Software quality
     What is an error?
3) Quality and Testing events
     European DM/IQ conference
     SoftTest Ireland, afternoon 5th Sept: Test automation, Model Based Testing
4) Spreadsheets
     Eusprig 2006 Conference, Cambridge, UK, July 5-7
     London Excel User Conference July 19-21
     Bonus materials for owners of  'Spreadsheet Check and Control'
5) Off Topic
     Unusual articles on Wikipedia    
21 Web links in this newsletter
About this newsletter and Archives
Disclaimer
Subscribe and Unsubscribe information

_______________________________________________________

Welcome to PraxIS

This month, I ask for your opinion on how useful it would be to have a Eusprig accreditation of conformity to good spreadsheet practice.

I started my article on errors and mistakes with the intention of putting down some definitions to clarify some terms I see used loosely. It expanded to cover software engineering, testing, software craftsmanship, and more. Let me know if there is any aspect you would like to hear more about in future newsletters, or in detail in an in-company course.

Patrick O'Beirne

_______________________________________________________ _______________________________________________________

1)  IT Risk and Security

Poisoned data and code

A discussion on the Excel-L list recently about protecting code included the suggestion to embed some apparently important but actually non-functioning code in an application, so that if copying was ever suspected, the presence of that code would be the 'smoking gun'. However, it does rather depend on the copy being found. Perhaps the 'Easter Eggs' in some commercial software could be an elaborate form of trap as well as a sign that the development team were getting bored.

A Nihilartikel is a fake article placed in a dictionary or encyclopedia either to detect plagiarism (rather easier in the printed world) or simply as a puzzle for the cognoscenti.

A Trap street in maps is a street that is deliberately out of scale or nonexistent, again to detect unauthorised reproductions. It should not, however, lead legitimate users of the map astray.

Honeytokens can exist in almost any form, from a dead, fake account to a database entry that would only be selected by malicious queries, making the concept ideally suited to ensuring data integrity—any use of them is inherently suspicious if not necessarily malicious. http://en.wikipedia.org/wiki/Honeytoken

Dictionary of Information Security

http://www.sysmod.com/az.php?a=1597491152&b=Dictionary+of+Information+Security Rob Slade wrote this as a resource for those studying for the CISSP exam.

____________________________________________________________
____________________________________________________________   

2) Software Quality

What is an error?

That apparently innocent question was at the base of some discussion at the Eusprig conference and Excel training classes mentioned below. Taxonomies (classifications) of spreadsheet errors have been presented in journals and at Eusprig conferences that classify:

The problem I have with these is that they focus on errors as things that exist in spreadsheets and therefore on detection. My interest is in what to do to prevent these errors, so we should be looking at the activities that create them or surround their use. This article is intended to separate cause and effect, actions and outcomes, errors and defects. For more detail, see Watts Humphrey's books and articles on the software development process.

Errors are mistakes made by people performing activities

It is a fact universally acknowledged that people make mistakes. To err is human – about 0.5% of mechanical actions are errors. In the world of software development (of which spreadsheet development is but one example) the complex logic that we grapple with pushes the error rate up to 5%. If you want more data, the standard reference is the Human Error Research website of Ray Panko at: http://panko.cba.hawaii.edu/HumanErr/ 

What activities are we talking about?

The data processing of which the spreadsheet is but one part is the means by which the decision maker arrives at their decision. The problem owner has to cope with mistakes in every part of the process - the choice of which problem to solve, who says it's a problem, the feasibility of any solution, the definition, selection, and collection of data even before it gets to processing, the methods of assessment, the time constraints, the political and social agendas, the competence of those involved, the frequency with which the processing is repeated, and so on, are all activities in which humans are involved and can make mistakes. All activities take place over time which therefore determines how many opportunities one has to improve the process, but also means that the problem environment may change, the people involved may change, and so on. In this article, I'll only focus on spreadsheet development. To expand it requires writing a book, which indeed may happen!

Errors (mistakes) in development inject defects (faults)

These errors create defects in the code, also called faults. In programming, syntax errors are detected immediately by the compiler, and therefore do not escape past the programmer. Depending on the language, data type errors may only be visible at execution time. In spreadsheets, these are the immediately recognisable #VALUE errors. Other faults such as logic errors can escape the developer. The important question is, when did the error happen - at what stage in the process? Only then do we know what mistakes are being made, and when is the soonest they can be prevented.

Faults (defects) cause failures - eventually

A failure happens when the defective code is executed. Therefore, one fault may give rise to many incidents of failure. But only if the conditions are met for that piece of code to be executed. One may think that in spreadsheets all logic is always recalculated, but the presence of any conditional functions (IF, AND, OR, CHOOSE, INDIRECT, MIN, MAX) means that some calculation dependency chains are only followed for certain data values.

Failures have a cost - to some extent

The cost of a failure is its impact, which can be a loss of time, money, reputation, or even health or life in the case of safety-critical and clinical applications. The rating of the impact may depend on who is doing the evaluation, and whose interest is served by the disclosure. To be objective, the assessment of cost needs to be done by all stakeholders, either directly or by proxy. For example, an internal test team may regard itself as the customer's proxy.

The classic mistake of project sponsors is to weight the benefits of completion so heavily that they do not consider the costs of failure. The psychological aspects of 'damn the torpedoes' leadership, organisational behaviour and groupthink, are outside the scope of this article.  Project delay introduced by prolonged testing may cost more than the simple personnel cost of testers. To reduce this, replace an inefficient reliance on late testing by more up-front testing of requirements, specification, design, and construction quality.

When can failures be detected?

Because errors are inevitable, a mature organisation (or developer) checks their work. The degree of formality of the checking, when it is done, and by whom, should be related to the cost of failure at each stage of development. A failure is defined by a deviation from desired behaviour. Therefore there must be some definition of what is expected, in terms of requirements, specifications, designs, coding standards, and test cases.

The earlier that faults are detected, the cheaper it is to fix them, because less work has to redone. The most expensive time is after they escape into use, when each failure can have consequential costs that could far exceed the costs of prevention. Failures are not always immediately visible, or if visible are not always recognised, so costs can accumulate. Therefore the purpose of testing is make faults visible. Good practice in constructing software is to build in self-testing features that make faults visible as early as possible, and to make testing easier. This 'defensive programming' is a more careful and considered approach that trades off quick development for more efficient testing and safer use.

Errors made in specification are usually the most expensive because effort is wasted in building the solution to the wrong problem. They can be reduced by asking questions, by not making assumptions about what is required.

Errors made in design are also expensive because effort is wasted in building the wrong solution to the problem. Misunderstandings can be reduced by software verification, which checks that requirements, specifications and design all correspond. This is not to say that you must in all cases do Big Design Up Front (BDUF). Rather, this is to recognise that there are always many ways to achieve a given behaviour in software. Design=Choices. It depends on the range of features available in the software available (eg simple Excel, VBA, XLA or XLL add-ins etc), and on the breadth of experience and training of the developer. Because of the amateur and self-taught nature of many Excel users, they find it difficult to know how to do better, it's as hard as pulling themselves up by their bootstraps. Call upon professional reviewers to speed up the evaluation of code quality, to replace hacky, laborious, fragile, complex, and error-prone code by understandable, maintainable, efficient code written with software craftsmanship. Such experts are readily available - just look at the list of names of trainers and developers presenting at Eusprig and the Excel User Conference, and the Microsoft Valued Professionals on the many Excel lists.

Excel provides instant feedback on syntax errors. Structural analysis tools can identify weaknesses that are recognised by experienced developers as latent problems, likely to trip someone up in the future. For example, hard coded constants in formulas, inconsistent formulas, or tables that can be corrupted by insertions or deletions elsewhere in the sheet.

The final test is to execute the software, to validate whether it meets the needs of the user. Execution testing is an expensive way to find defects. Not just because rework is expensive, but also because by this stage the software may have reached a level of complexity such that the question 'when do we stop testing' is only answerable by 'when we are out of time'. For that reason, proponents of Agile methodologies recommend Test-Driven Development, requiring the developer to create test cases before they even write the code. This causes the developer to check their understanding of what the code is required to do.

 Whether testing can make the latent defects visible depends on how well chosen the test cases are. Here is a sample question from the aptitude test for software testers provided by Grove Consultants: 'You have run your tests and they all pass. Is that good news?' A novice tester will say yes; an experienced tester will say that it depends on the quality of the tests. Software testing is a whole discipline in itself. To learn more, join your local testing special interest group (SIG) such as SoftTest Ireland or the BCS SIGIST.

Does a tree falling in the forest make a noise if nobody hears it?

That classic question in beginning philosophy has a counterpart here: if a spreadsheet has a failure but it is not detected, is it an error? If the failure is never detected over the lifetime of the spreadsheet, it remains a philosophy (or maybe an ethics?) problem. But when it is seen, there are various possible outcomes.

1) The fault is discovered but it has insufficient impact. For example, an unimportant spelling mistake.

2) The fault can be corrected before any loss is incurred. This is the ideal state to which most review and testing efforts are directed, as discussed below.

3) A loss is recognised, in time to recover. Note that the loss can be to any party involved. There is the alleged story of a public utility that accidentally overcharged consumers; when the problem was discovered, it was politically unsafe to admit to it, so the following year they trumpeted their success in achieving economies and lowered the charges.

4) An actual loss is realised and acknowledged, although too late to do anything about it. Example: the Transalta $24M loss

5) A loss is realised but the stakeholders are not informed. The further outcome then depends on whether the story is discovered, or leaked, and what means of redress are available. This brings up some interesting ethical problems about whistleblowers and responsibility. 

6) The correct answer is redefined to be whatever the spreadsheet result is. This may seem back-to-front but some legal contracts specify it in order to achieve finality in a negotiation constrained by a deadline. In fact, this is simply a more drastic version of case (1). The fault may be recognised, but it is tolerated because the cost of rework in correcting it, or the opportunity cost of the delay introduced, is greater than the cost of accepting it and proceeding.

A failure is detected - what do you do about it?

The fault that gave rise to the failure can be corrected. Depending on the choice made as to how to fix the fault, the degree of change involved, this may be more or less time-consuming and error-prone and it will require regression testing to make sure that no other fault has been created as a side effect.

Rather than throwing away the information on the defect once it is corrected, a mature organisation or developer will track the occurrence of defects, their cause, and the time it takes to resolve them. An analysis of faults can show what activity can benefit from improvement - requirements gathering, specification and design, coding standards, testing effectiveness, user training, and so on.

Root cause analysis can be done by a fishbone (ishikawa) diagram. By iteratively asking 'why did this happen', the real cause of the problem can be found and corrective measures taken.

In summary

Mistakes are inevitable, so we must check for them as near to the point of injection as possible. To do this, we need to know what kind of mistakes can happen in each activity. Training courses and good books teach the generic types of mistake that can happen. For each project, retain a list of the defects that were found, especially the failures that escaped into final testing - or worse - into use. Analyse each defect to find why it occurred, and for the root causes, plan how to improve both the activity and the controls around that activity. This system of continuous improvement is what gains speed and saves money over time.

Related books

http://www.sysmod.com/az.php?a=0201548097&b=Introduction+to+the+Personal+Software+Process Watts Humphrey's Introduction to the Personal Software Process

http://www.sysmod.com/az.php?a=0201733862&b=Software+Craftsmanship  Software Craftsmanship. Pete McBreen uses the metaphor of apprentice/journeyman/master

http://www.sysmod.com/az.php?a=0321146530&b=Test+Driven+Development  Kent Beck's classic book Test Driven Development: By Example

____________________________________________________________
____________________________________________________________

3) Quality and Testing events

European DM+IQ conference

The IRM Data Management and Information Quality Conference will be held from 30th October to 2nd November 2006 in the Victoria Park Plaza Hotel in London. I am presenting on 1st November on 'Minimizing risks in IQ spreadsheets'. You can get in for £100 less than the advertised price by just citing me as the reference.

http://www.irmuk.co.uk/dm2006

Related book

http://www.sysmod.com/az.php?a=0471253839&b=Data+Information+Quality  Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits, by Larry P. English

 

SoftTest Ireland, afternoon 5th Sept: Test automation, Model Based Testing

http://www.softtest.ie/events/  5th Sept 2006, 2-5pm at the Holiday Inn Hotel, Pearse Street, Dublin 2

In this SoftTest Ireland educational event, members will learn from two international experts: Mark Fewster of Grove Consultants (UK) and Harry Robinson of Google (US).

Mark Fewster of Grove Consultants on 'Keyword Driven Test Automation Illuminated'.

Test Automation has come a long way over the past 20 years. Many organisations that have been successful with test execution automation have employed a data-driven approach. This in turn has been developed further into more sophisticated approaches that are often termed keyword-driven (though other terms are used).

Anyone involved with test execution automation should find this presentation of interest. For those with little or no experience of keyword-driven approaches then the presentation will give a solid understanding of what can be achieved and in outline, how. For those already familiar with keyword-driven approaches then the presentation will bring greater clarity and possibly some new ideas for how they can improve their own keyword-driven approaches.

Harry Robinson of Google on 'Model-based Testing'

Software is getting more complicated, and the standard methods of testing it are not doing a thorough job. Manual testing is labour-intensive and does not always find the deeper, serious bugs in an application. Traditional test automation, on the other hand, is costly and fragile. This is where model-based testing comes in, the idea of automatically generating tests from models of an application's expected behaviour.

This presentation will focus on how to do model-based testing, and how test generation is changing the world of software quality. What is model-based testing? Why is it so different from most testing today? Where is model-based testing pushing the industry, and where will it end up? And, most importantly, how can you prepare yourself and your team for the changes that are coming?

Attendance is free to all members and guests are welcome. To learn more about the speakers and topics visit:- http://www.softtest.ie/events/  To register to attend contact:05394 22294 or 

Related books

http://www.sysmod.com/az.php?a=0201331403&b=Software+Test+Automation  Software Test Automation by Mark Fewster and Dorothy Graham

http://www.sysmod.com/az.php?a=0672327988&b=Software+Testing  Software Testing (2nd Edition) by Ron Patton

____________________________________________________________
____________________________________________________________

4) Spreadsheets

Notes from the Eusprig 2006 Conference, Cambridge, UK, July 5-7

http://www.uwic.ac.uk/eusprig/2006/index.htm The seventh annual conference and AGM of the European Spreadsheet Risks Interest Group ( www.eusprig.org ) theme was Managing Spreadsheets: Improving corporate performance, compliance and governance.

Of local interest is that Alan Rust and Brian Bishop from Kevin McDaid's team at the Software Technology Research Centre of the Dundalk Institute of Technology in Ireland won the student prize for their paper on Agile techniques with an emphasis on Test-Driven Development. The prize for best student paper is sponsored by the ISACA Northern England Chapter.

The keynote address from Dean Buckner of the UK Financial Services Authority (FSA) highlighted that the risks around the uncontrolled development and use of spreadsheets are now being recognised. Some financial institutions are accepting the de facto strategic use of spreadsheets and are no longer trying to assert that they will switch from spreadsheets to a 'big solution'. This recognition is leading to the development of the relevant risk mitigation and management procedures within these organisations. Nonetheless, the FSA visits continue to find a lack of good training and internal auditing skills, in spite of the good effect of Sarbanes-Oxley compliance. SOX only applies to financial reporting, and his interest includes risk reporting and operations. He says "I am still seeing 'dumb solutions' that training would have easily prevented." The bad news, from his point of view is that there is no industry view on good practice, and no accreditation available. He sees the biggest problem as the 'hacky' use of spreadsheets for data processing and no interest in good data quality. (See above for news of the Information Quality conference). He points out that an over-emphasis on 'spreadsheet errors' overlooks the control environment in firms.

Tom Grossman of the School of Business and Management at the University of San Francisco gave the after-dinner speech and presented the student prizes. He also stressed the point that Eusprig should be more in the forefront of fixing the problem. After all, it has done a very good job for seven years now of raising the issue and proving a forum for researchers to investigate the many causes and propose solutions. The basic cause is, of course, human error, which I discuss above.

The question is, and this is where I'd like to ask for feedback from you, what can be done by an independent body of academics and practitioners to be more focused on the solution?

Education

Tools & Solutions

So, in summary, it looks to me like there are plenty of solutions and methods available if people want to use them. So, let's look at the accreditation point raised by Dean Buckner. Should Eusprig create recommended guidelines for spreadsheet development? Is it possible to have products or services accredited as conforming with these guidelines, in the same way as, for example, companies can be certified to conform to TickIT or ISO9000 for software development? Accreditation does not come cheap. Taking the test for the BASDA standards for VAT compliance and Euro Conversion compliance cost the software companies about UK£30,000, roughly EU€45,000 or US$60,000. And that was for well-defined legally prescribed calculation methods as implemented in accounting software. An audit based on the following of general principles could give rise to much argument as to how well the criteria were met, or how relevant they were to the target market of the product concerned, and any rating for other features important to some, but not all, marketplaces. Nonetheless, is conformity to a Eusprig standard something you would look for and respect in a product or service, and would it affect your buying decision? What do you think? Email me to let me have your opinions.

 

Notes from the London Excel User Conference July 19-21

http://www.exceluserconference.com/2006UKEUC.html 

This was a very practical conference focused on Excel techniques. Many speakers have earned their MVP (Microsoft Valued Professional) helping people freely on the many Microsoft-sponsored and independent Excel mail lists and online discussion forums. Simon Murphy, who also presented at Eusprig 2006, gave a class on spreadsheet design and good practice in VBA. Andy Pope gave a dazzling display of chart types that I never knew you could achieve, showing along the way how much better Excel charts look when you delete that grey background. Bob Phillips and Martin Greene showed advanced Excel and custom functions, VBA and add-ins. Nick Hodge excelled in database processing and pivot tables. Charles Williams disclosed some of the speed techniques behind his FastExcel product, claiming that he does not have a spreadsheet that takes more than ten seconds to recalculate. My sessions were on preventing and detecting spreadsheet errors.

The 2006 West Coast Excel / Access User Conference will be in October 25-27th at the Marina del Rey Hotel in California.

See http://www.sysmod.com/spreads.htm#Books for a list of recommended books on Excel

 

Bonus materials for owners of  'Spreadsheet Check and Control' 

As the first anniversary of my book approaches, I am making available some expanded material in response to requests for more detail:

1) A 303K 11 page PDF: Understanding the recalculation mode, Lookup and Transition Formula Evaluation, Pie charts with negative data, Using Excel Scenarios for test cases, Comparing worksheets.

2) An expanded chapter on Data Validation, 16 pages, 468K PDF..

3) Bonus material outside the scope of the ECDL syllabus. Mainly VBA examples,16 pages, 320K PDF.

To download, please have the book to hand in order to enter a password from a page and then visit:

http://www.sysmod.com/sbp 

Links to buy the book:

http://sysmod.buy.ie/catalog/product_info.php?products_id=188  Our offer - free shipping to EU in August 2006.

http://www.sysmod.com/az.php?a=190540400X&b=Spreadsheet+Check+Control Available worldwide from Amazon.

 

ScanXLS finds the links between many Excel workbooks in directories

http://www.sysmod.com/scanxls.htm  SCANXLS is my Excel utility to scan directories and create an inventory of spreadsheets. It also builds a cross-reference of their dependencies, and helps assess their quality. Many programs will show the links IN (ie TO) a spreadsheet; SCANXLS is one of the very few tools in the marketplace that inspect entire directories and construct a list of XLS files that are found to have links FROM other files.

_______________________________________________________
_______________________________________________________

FEEDBACK

Simply send your comments to FEEDBACK (at) SYSMOD (dot) COM

Thank you! Patrick O'Beirne, Editor

_______________________________________________________ _______________________________________________________

5) Off Topic

Diverting August reading

http://en.wikipedia.org/wiki/Wikipedia:Unusual_articles

Unusual places and place names, numbers and dates, language, inventions, popular culture and urban myths, politics, religion, wars, and death. A grab bag of candidates for 'Would you believe' articles.

http://en.wikipedia.org/wiki/Culture_jamming

An example of culture jamming is the Bubble Project a street art movement which involves placing empty "speech bubbles" on posters and advertisements.

http://en.wikipedia.org/wiki/SCIgen

SCIgen From Wikipedia, the free encyclopedia Jump to: navigation, search SCIgen is a program that randomly generates nonsense in the form of computer science research papers, including graphs, figures, and citations. It uses a custom-made context-free grammar to form all elements of the papers.

In 2005, a paper generated by SCIgen, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted as a "non-reviewed" paper to the World Multiconference on Systemics, Cybernetics and Informatics (WMSCI 2005), and the authors were invited to speak. The authors of SCIgen described their hoax on their website, and it soon received great publicity when picked up by Slashdot.

 

_______________________________________________________
_______________________________________________________

Copyright 2006 Systems Modelling Limited, http://www.sysmod.com . Reproduction allowed provided the newsletter is copied in its entirety and with this copyright notice.

We appreciate any feedback or suggestions for improvement. If you have received this newsletter from anybody else, we urge you to sign up for your personal copy by sending a blank email to   EuroIS-subscribe (at) yahoogroups (dot) com - it's free!

For those who would like to do more than receive the monthly newsletter, the EuroIS list makes it easy for you to discuss issues raised, to share experiences with the rest of the group, and to contribute files to a common user community pool independent of the sysmod.com web site. I will be moderating posts to the EuroIS list, to screen out inappropriate material.

Patrick O'Beirne, Editor
_______________________________________________________
ABOUT THIS NEWSLETTER
"Praxis" means model or example, from the Greek verb "to do". The name is chosen to reflect our focus on practical solutions to IS problems, avoiding hype. If you like acronyms, think of it as "Patrick's reports and analysis across Information Systems".
Please tell a friend about this newsletter.
We especially appreciate a link to www.sysmod.com from your web site!
______________________________________________________
ARCHIVES
To read previous issues of this newsletter please visit our web site at http://www.sysmod.com/praxis.htm

DISCLAIMER
This newsletter is prepared in good faith and the information has been taken from observation and other sources believed to be reliable. Systems Modelling Ltd. (SML) does not represent expressly or by implication the accuracy, truthfulness or reliability of any information provided. It is a condition of use that users accept that SML has no liability for any errors, inaccuracies or omissions. The information is not intended to constitute legal or professional advice. You should consult a professional at Systems Modelling Ltd. directly for advice that is specifically tailored to your particular circumstances.
_______________________________________________________
PRIVACY POLICY:
We guarantee not to sell, trade or give your e-mail address to anyone.
To subscribe to this Newsletter send an email to
EuroIS-subscribe (at) yahoogroups (dot) com
To unsubscribe from this Newsletter send an email to
EuroIS-unsubscribe (at) yahoogroups (dot) com
EuroIS is the distribution list server of the PraxIS newsletter. It also offers a moderated discussion list for readers and a free shared storage area for user-contributed files. The archives of this group are on YahooGroups website http://finance.groups.yahoo.com/group/EuroIS/
_______________________________________________________