1) Risk & Security
     Euro conversion looms again
     Testing software and electronic calculators
     Stock spam pays off if they're quick


2) Data Quality
     Review of Tufte's 'The Visual Display of Quantitative Information'
     European DM+IQ conference, London, Oct 30-Nov 2


3) Spreadsheets
     Excel 2007 databars misrepresent zero
     More rave reviews and more downloads for 'Spreadsheet Check and Control'
     Free books on Excel


4) Off Topic
     C More or Less


Welcome to PraxIS

This month, I do a reader's digest of Edward Tufte's book on graphics, without using a single graphic. You'll have to buy the book to get them!

Patrick O'Beirne

_______________________________________________________ _______________________________________________________

1) IT Risk and Security

Euro conversion looms again

It's nearly that time again, for the recently joined members of the EU. First up is Slovenia:

From 1 January 2007 the euro will replace Slovenia's currency, the tolar, at the fixed and irrevocable conversion rate of 239.640 tolars for one euro. The European Commission reported that the practical preparations for the introduction of the euro in Slovenia (population: 2 million) were at an advanced stage, but the final preparations must be speeded up to ensure a smooth changeover and to address citizens' fears about price increases. Slovenia became a member of the EU in May 2004 together with nine other countries. Most of the other countries wish to join the euro area between 2008 and 2010.

Testing software and electronic calculators for conversion accuracy

Back in 2001, we helped a public agency avoid the potentially expensive error of selecting a low-cost but inaccurate euro calculator for public distribution. My attention was drawn recently to an official web site where I found an incorrect conversion rate - a simple typo, but they just had not checked for the crucial rounding boundaries in their test cases.
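
As a rough illustration of what a rounding-boundary test case looks like, the Python sketch below uses the tolar rate quoted earlier in this issue. The function names are hypothetical, and rounding half up to the nearest cent is an assumption of the sketch, not a statement of any particular product's behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

SIT_PER_EUR = Decimal("239.640")  # fixed conversion rate quoted in this issue
CENT = Decimal("0.01")

def to_euro(tolars, rounding=ROUND_HALF_UP):
    """Convert a tolar amount to euro, rounded to the cent (half up assumed)."""
    return (Decimal(tolars) / SIT_PER_EUR).quantize(CENT, rounding=rounding)

def boundary_pair(euro_cents):
    """Return the two 0.01-tolar amounts on either side of the point where
    the converted result flips from euro_cents to euro_cents + 1 cents."""
    flip = (Decimal(euro_cents) + Decimal("0.5")) / 100 * SIT_PER_EUR
    below = flip.quantize(CENT, rounding=ROUND_DOWN)
    return below, below + CENT

low, high = boundary_pair(100)            # amounts around the 1.00 / 1.01 euro flip
print(low, to_euro(low))                  # just under the boundary
print(high, to_euro(high))                # just over the boundary
print(high, to_euro(high, ROUND_DOWN))    # a truncating calculator gets this one wrong
```

A calculator that truncates instead of rounding agrees with the correct one on the lower amount and disagrees on the higher one, which is why test amounts need to sit right at the boundary.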

A software tester needs to be like a sculptor or diamond cutter looking for the flaw line where a small tap cracks the whole thing open. You can purchase from us very detailed test data sets based on the mathematics of the conversion between the euro and the national currency.

The Systems Modelling euro certification scheme was launched in August 2000. It follows the guidelines of ISO/IEC 12119:1994 - "Information Technology - Software packages - Quality requirements and testing". The test cases cover the effects of changing scales of value, detecting the use of inverse rates, rounding, truncation, accounting with master and detail records, and account and base conversion in the transition scenario. They also assess reliability, error protection, audit trails, user interface (usability) issues, documentation, and support. The product will be certified to have fulfilled specified conditions. This is not a certification of the producer's business, their quality system or their software production process. For further information, contact me.
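
To show why the test cases vary the scale of value when looking for inverse rates, here is a minimal Python sketch. It compares dividing by the fixed tolar rate with multiplying by an inverse rate rounded to six decimal places; the six-digit figure and the function names are assumptions chosen for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

SIT_PER_EUR = Decimal("239.640")  # fixed conversion rate quoted in this issue
CENT = Decimal("0.01")

def direct(tolars):
    """Divide by the fixed rate, then round to the cent."""
    return (Decimal(tolars) / SIT_PER_EUR).quantize(CENT, rounding=ROUND_HALF_UP)

def via_rounded_inverse(tolars, digits=6):
    """Faulty variant: multiply by an inverse rate rounded to `digits` places."""
    inverse = (Decimal(1) / SIT_PER_EUR).quantize(Decimal(10) ** -digits)
    return (Decimal(tolars) * inverse).quantize(CENT, rounding=ROUND_HALF_UP)

def differs(tolars):
    """True when the inverse-rate shortcut gives a different euro amount."""
    return direct(tolars) != via_rounded_inverse(tolars)

for amount in ("1000", "1000000"):
    print(amount, direct(amount), via_rounded_inverse(amount), differs(amount))
```

With these assumptions the two methods agree on 1,000 tolars but differ by several cents on 1,000,000 tolars, so a test set containing only small amounts would not detect the shortcut.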

Related links
Frequently Asked Questions about Euro conversion
Euro conversion calculator using the official daily ECB exchange rates
One day workshop on converting IT systems to the euro
UK Treasury business factsheets "the euro: it's your business" updated July 2006.


Stock spam pays off if they're quick

"Spam Works: Evidence from Stock Touts and Corresponding Market Activity" by Laura Frieder and Jonathan Zittrain, Berkman Center Research Publication No 2006-11 (July 2006)

"We suggest that the profitability of spammed stock touting calls for adjustments to securities regulation models that rely principally on the proper labeling of information and disclosure of conflicts of interest in order to protect consumers. Based on a large sample of touted stocks listed on the Pink Sheets quotation system, we find that stocks experience a significantly positive return on days when they are heavily touted via spam, and on the day preceding such touting. Returns in the days following touting are significantly negative. Investors who respond to touting are losing, on average, 5.25% in the two day period following touting."

I attempt to filter such spam in Eudora by applying a filter that moves emails flagged by Mailscanner with SARE_GIF_STOX to Trash. I tried HTML_IMAGE_ONLY_0 but found that it caught too much real mail from people who don't know better.
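
For readers who would rather script the same idea than build a mail-client filter, a minimal Python sketch follows. It looks for the rule name anywhere in a message's headers; the function name and the sample header in the usage lines are hypothetical, since the exact header a scanner writes depends on how it is configured.

```python
from email import message_from_string

STOCK_RULE = "SARE_GIF_STOX"  # rule name mentioned above

def is_stock_spam(raw_message, rule=STOCK_RULE):
    """Return True if any header of the message mentions the given rule name.

    Only headers are searched, so a message that merely discusses the rule
    in its body is not flagged.
    """
    msg = message_from_string(raw_message)
    return any(rule in value for _name, value in msg.items())

# Hypothetical message: the header name and layout are assumptions.
sample = (
    "From: tout@example.com\n"
    "Subject: Hot stock tip\n"
    "X-Spam-Status: Yes, tests=SARE_GIF_STOX\n"
    "\n"
    "Buy now!\n"
)
print(is_stock_spam(sample))
```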

Related Book  Frauds, Spies, and Lies: and How to Defeat Them by Fred Cohen, 2005


2) Data Quality

Review: The Visual Display of Quantitative Information, by Edward R. Tufte

Amazon ranks this in the best 100 non-fiction books of the 20th century. It's a classic, which usually means that it is more frequently referenced than read. Because established publishers seemed "appalled at the prospect that an author might govern design", Edward Tufte remortgaged his home to pay for self-publishing this volume. He hired a first-rate book designer to ensure the book layout exemplified his own principles, such as eliminating the usual separation of text and image.

Part 1, the first third of the book, deals with graphical practice, especially integrity.

Chapter 1, 'Graphical Excellence' illustrates how graphics reveal patterns hidden in tables of numbers, providing examples down the centuries of the evolution of statistical X-Y charts, cartographic data maps, and time series. His most famous example is the chart by Minard showing the terrible shrinking of Napoleon's army in Russia on dimensions of army size, date, temperature and location. He gives five principles including "Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency."

Chapter 2, 'Graphical Integrity' tackles deceptions such as distorted axes, areas, and non-comparable time periods. His funniest example is the Day Mines 1974 annual report that conceals the zero axis on the profit barchart so as to make the bars appear tall. He defines a 'Lie Factor' as the ratio of the size of the effect shown in the graphic to the size of the effect in the data, and takes newspapers to task for such misuse. He gives six principles for integrity: proportionality, labelling, variation, standardized units, dimensions, and quoting data in context. Of all the chapters, this one should be required reading not just for creators of charts but for their readers.
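
The Lie Factor is easy to compute for your own charts. The Python sketch below assumes that the size of an effect is measured as the relative change between two values; the function names and the numbers are invented for illustration and are not taken from the book.

```python
def size_of_effect(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data.

    A value near 1 means the graphic is proportional to the data.
    """
    return (size_of_effect(graphic_first, graphic_second)
            / size_of_effect(data_first, data_second))

# Invented example: profit rises from 100 to 110, but the bar chart's axis
# starts at 90, so the bars are drawn 10 and 20 units tall.
print(lie_factor(10, 20, 100, 110))
```

In this invented case a 10% rise in the data is drawn as a doubling of bar height, which is the kind of distortion a concealed zero axis produces.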

Chapter 3, 'Sources of Graphical Integrity and Sophistication' attacks the idea that graphics are only for the unsophisticated reader. His analysis of textbooks, newspapers and magazines by their usage of relational (scatter) diagrams compared to non-explanatory charts ranks the Frankfurter Allgemeine in a similar position to Pravda.

Part 2 deals with the theory of data graphics. Essentially, he recommends maximising the amount of ink devoted to data and eliminating chart junk.

Chapter 4, 'Data Ink' was put into practice at the Excel User Conference by Andy Pope, who deleted the automatic grey background of an Excel chart every time he created one. He also told me about the Stephen Few book that I'll review at a later date.

Chapter 5, 'Chartjunk' shows the awful cluttered effects of noise, shading, hatching, and even the grid. He lampoons the use of unnecessary graphical items as 'ducks', after a duck-shaped building that places form before function. (Although I think he may be losing his sense of humour there.)

Chapter 6, 'Data-Ink Maximization' begins with a wonderful quote about painting from a statement by Ad Reinhardt in an exhibition catalogue: "Clarity .. no noise ... no humbugging .. no mixing things up". Tufte strips familiar presentations such as the box-plot, bar-chart, and scatterplot down to their essentials. His dot-dash-plot is a little minimalist even for my austere tastes.

Chapter 7, 'Multifunctioning Graphical Elements' describes some unusual ways to encode data into pictorial and verbal presentations, such as data-based coordinate lines. Shading and colours should of course be used to convey meaning and not create puzzles for the reader.

Chapter 8, 'Data Density and Small Multiples' defines data density as the ratio of the number of data points to the area of the graphic. He says "The average published graphic is rather thin ... very few statistical graphics achieve the information display rates found in maps ... Graphics can be shrunk way down." True, but the results tend to give me a headache. The smallest graphics - not dealt with in this book - are 'sparklines' which I described last month.
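
The data density definition above is simple arithmetic, sketched here in Python. The function name, units and figures are assumptions made up for the example.

```python
def data_density(data_points, width, height):
    """Number of data points per unit of graphic area."""
    area = width * height
    if area <= 0:
        raise ValueError("graphic area must be positive")
    return data_points / area

# Invented example: the same 50 points on a 10 x 8 cm chart, then shrunk to 5 x 4 cm.
print(data_density(50, 10, 8))
print(data_density(50, 5, 4))
```

Halving both dimensions quadruples the density, which is the sense in which graphics "can be shrunk way down".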

Chapter 9, 'Aesthetics and Techniques' discusses making complexity accessible, combining words, numbers, and pictures, with typography, colour, proportion, and scale. A table compares the attributes of friendly and unfriendly graphics.


Related books
The Visual Display of Quantitative Information: Edward R. Tufte. 2nd Ed, 2001. 
Show Me the Numbers: Designing Tables and Graphs to Enlighten by Stephen Few



European DM+IQ conference, Oct 30-Nov 2

The IRM Data Management and Information Quality Conference will be held from 30th October to 2nd November 2006 in the Victoria Park Plaza Hotel in London. I am presenting on 1st November on 'Minimizing risks in IQ spreadsheets'. You can get in for 100 less than the advertised price by just citing me as the reference.

Related book  Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits, by Larry P. English


3) Spreadsheets

Excel 2007 databars misrepresent zero

In the light of Edward Tufte's book, it is amusing to hear of the decision by Microsoft to deliberately distort Excel graphics on the grounds that it is what users want.

Juice Analytics - On misrepresenting data

"[What's wrong with this chart?] The graduated shading makes it hard to see where the bar graphs end. Zero pounds of sprouts were consumed, but the bar shows a value. The brussel sprouts badness is based on Microsoft's implementation of databars in the upcoming version of Excel. To quote the Excel 2007 blog: 'The answer is that when we were doing usability testing of this area in Excel, we found that users preferred not to see blank data bars, so Excel's default was set to a 10% minimum width.' [...] the rest of us can use the in-cell graphing to do everything databars can do and more."

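The effect of a minimum bar width can be sketched in a few lines of Python. This is only an illustration of the distortion described in the quote, assuming the minimum is applied as a simple floor on a proportional width; it is not a description of Excel's actual scaling algorithm, and the function names are hypothetical.

```python
def proportional_width(value, max_value):
    """Bar width as a percentage of the cell, proportional to the value."""
    return 100.0 * value / max_value

def databar_width(value, max_value, minimum=10.0):
    """Bar width with a minimum floor (10% assumed, as in the quote)."""
    return max(minimum, proportional_width(value, max_value))

# Invented figures: pounds of vegetables consumed, with zero sprouts.
consumption = {"carrots": 50, "peas": 25, "sprouts": 0}
largest = max(consumption.values())
for name, pounds in consumption.items():
    print(name, proportional_width(pounds, largest), databar_width(pounds, largest))
```

Under this assumption a value of zero is drawn as a bar one tenth of the cell wide, so the reader cannot tell nothing from a little.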

Are your spreadsheets error free? Look again!

Martin Green of reviewed my book 'Spreadsheet Check and Control' on Amazon:

'I hate proofreading my work. I know what's supposed to be there so that's what I see. But if a typo slips through the net there's no real harm done and someone else usually spots it and lets me know. But auditing a spreadsheet is a different matter. A simple error could easily go unnoticed and its consequences might cascade through the workbook without ever being discovered. So this book is a Godsend! It explains, clearly and with illustrated examples, how to design and build reliable, error-free spreadsheets and how to use the tools that Excel provides for auditing and error-checking. Each section concludes with a self-test and there is a support website. I thought I knew my way around Excel pretty well, but reading this book I found myself saying "I didn't know you could do that!". If you build spreadsheets you should read this book.'


Bonus materials for owners of  'Spreadsheet Check and Control' 

As this is the first anniversary of my book, I am making available some expanded material in response to requests for more detail:

1) A 303K 11 page PDF: Understanding the recalculation mode, Lookup and Transition Formula Evaluation, Pie charts with negative data, Using Excel Scenarios for test cases, Comparing worksheets.

2) An expanded chapter on Data Validation, 16 pages, 468K PDF.

3) Bonus material outside the scope of the ECDL syllabus. Mainly VBA examples, 16 pages, 320K PDF.

To download, please have the book to hand in order to enter a password from a page and then visit:

Links to buy the book:  Our offer - free shipping to EU in August 2006. Available worldwide from Amazon.


ScanXLS finds the links between many Excel workbooks in directories

SCANXLS is my Excel utility to scan directories and create an inventory of spreadsheets. It also builds a cross-reference of their dependencies, and helps assess their quality. Many programs will show the links IN (i.e. TO) a spreadsheet; SCANXLS is one of the very few tools in the marketplace that inspect entire directories and construct a list of XLS files that are found to have links FROM other files.
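
The difference between the two kinds of list can be shown with a small Python sketch. It is not how SCANXLS works internally; it only assumes that the outgoing links of each workbook are already known and inverts that map. The function name and file names are hypothetical.

```python
from collections import defaultdict

def linked_from(links_to):
    """Invert {workbook: [workbooks it links to]} into
    {workbook: [workbooks that link to it]}."""
    inverted = defaultdict(list)
    for source, targets in links_to.items():
        for target in targets:
            inverted[target].append(source)
    return {target: sorted(sources) for target, sources in inverted.items()}

# Hypothetical inventory: each workbook and the files its formulas refer to.
links_to = {
    "budget.xls": ["rates.xls", "sales.xls"],
    "forecast.xls": ["sales.xls"],
    "rates.xls": [],
}
print(linked_from(links_to))
```

Opening a single workbook reveals only its own entry in the first map; the inverted map can be built only after every file in the directories has been inspected.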


Free books on Excel

This amazing set of free books is available:

Statistical Analysis With Excel [1.6 MB]
Excel For Beginners [2.4 MB]
Charting In Excel [1.6 MB]
Excel-- Beyond The Basics [1.8 MB]
Managing & Tabulating Data in Excel [1.9 MB]
Financial Analysis Using Excel [1.7 MB]



Thank you! Patrick O'Beirne, Editor

_______________________________________________________ _______________________________________________________

4) Off Topic

New Programming Language C+-

"There's finally a replacement for the commonly used programming language, C++ -- yes, it's C+- (pronounced "C More or Less"). Unlike C++, C+- is a subject-oriented language. Each C+- class instance, known as a subject, holds hidden members, known as prejudices or undeclared preferences, which are impervious to outside messages, as well as public members known as boasts or claims. C+- is a strongly typed language based on stereotyping and self-righteous logic. C+- supports information hiding and, among friend classes only, rumor sharing."


