Sunday 25 August 2013

Fail fast, fail often, fail always

Apparently failing fast and failing often will lead to an eventual success, assuming of course that you are able to give the project or whatever it is time to develop and mature to a sufficient degree so that "failure" can be meaningfully assessed.

Failure does not necessarily mean that the project will never be a success, but rather that the technology, knowledge and product do not satisfy the current business goals. That is quite different from something that is an outright failure.

The mentality that everything should be an immediate success and that the smallest flaw is evidence of failure is killing not just research and innovation but the whole research and development process.

Furthermore, the tendency to rapidly kill projects perceived to be failures, without understanding what can be learned from those projects and applied to bolster the next or parallel projects, will eventually kill your business.

Innovation only comes from allowing scientists and engineers time to develop their ideas and not by rapidly killing those ideas before enough learning can be gained from them to benefit the overall business.

Indeed, if you do want to fail fast and fail often, ensure that you are learning from your mistakes and capitalising on the technologies and knowledge gained so that "fail often" doesn't become "fail always".

Saturday 24 August 2013

Wikileaks, NSA and a cryptographic cat and mouse game

So we know that certain government agencies are collecting data and we also know that Wikileaks has at least 400Gb of encrypted data (a.k.a. "insurance") now in the public domain.

It is encrypted and no-one knows what's there...right?

Schneier posted an article on his blog about the upcoming cryptopocalypse, referring to a presentation from the Black Hat conference entitled The Factoring Dead: Preparing for the Cryptopocalypse.

We are fairly confident that the NSA has an enormous amount of computing power available to it - though probably still a few orders of magnitude short of what a brute force attack on Wikileaks' encrypted data would require. They do, however, have some pretty competent mathematicians working in that field and a vast amount of data, including a lot of unencrypted originals.

Now, as mathematics starts chipping away at the encryption algorithms themselves, the amount of computation and time needed to decrypt without a key comes down dramatically (it might only take a few billion years now).
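
To put some rough numbers on that - purely illustrative figures of my own, not anything known about this particular file - a quick back-of-the-envelope calculation in Python shows how brutally the effective key length dominates brute-force time:

# Back-of-the-envelope brute-force estimate; all figures are illustrative assumptions.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def brute_force_years(effective_key_bits, guesses_per_second):
    # Expected time to search half the keyspace at the given guess rate.
    return (2 ** (effective_key_bits - 1)) / guesses_per_second / SECONDS_PER_YEAR

# A hypothetical adversary testing 10^18 keys per second:
print(brute_force_years(256, 1e18))   # full AES-256: ~1.8e51 years
print(brute_force_years(128, 1e18))   # AES-128: ~5.4e12 years
print(brute_force_years(117, 1e18))   # an algorithm "chipped away" to ~117 effective bits: ~2.6 billion years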

What helps in these scenarios is having some knowledge of what the encrypted data contains and of its encryption method. If the Wikileaks file contains what it is supposed to contain (whatever that is, but it probably implicates the USA) then we have a head-start towards decryption - essentially a known-plaintext attack, not too far from what Alan Turing and the Bletchley Park code breakers did.

Furthermore, rarely in the past have we had such huge amounts of ciphertext to work with...encryption algorithms generate data with extremely high entropy. The question now is: is there enough entropy in 400Gb of encrypted data? Do patterns start emerging at some point? Admittedly, even 400Gb is not considered a large amount of data these days.
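
As a sketch of what "measuring the entropy" of such a file might look like, here is a minimal Python estimate of the Shannon entropy per byte; well-encrypted data should come out vanishingly close to the theoretical maximum of 8 bits per byte, and anything noticeably lower would be interesting. The file name is just a placeholder:

import math
from collections import Counter

def entropy_bits_per_byte(path, chunk_size=1 << 20):
    # Estimate Shannon entropy from the byte-frequency histogram of the file.
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            counts.update(chunk)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(entropy_bits_per_byte("insurance.aes256"))   # hypothetical file name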

Furthermore, the encryption algorithm used is one approved by the NSA. What if the NSA starts matching already-encrypted documents they hold against the Wikileaks data? Is there a weakness in the keys? Are there patterns in the encrypted data that are independent of the keys being used? Then there's the interesting idea of a backdoor in the algorithm too.
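
To make the "patterns independent of the key" idea concrete, here is a hedged sketch of one such check - spotting repeated ciphertext blocks, the classic giveaway of ECB-mode encryption, where identical plaintext blocks produce identical ciphertext blocks. This only illustrates the kind of analysis meant, not what anyone is actually doing, and a real scan of 400Gb would need hashing or sampling rather than an in-memory counter:

from collections import Counter

def repeated_block_ratio(path, block_size=16):
    # Fraction of fixed-size blocks occurring more than once: near zero for
    # well-randomised ciphertext, noticeably higher for ECB-mode data.
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if len(block) < block_size:
                break
            counts[block] += 1
    total = sum(counts.values())
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / total if total else 0.0

print(repeated_block_ratio("insurance.aes256"))   # hypothetical file name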

So (tin foil hats on, please), one of the things Wikileaks will be looking out for is accidental disclosure of some of the information in that release, which might suggest that the NSA has successfully decrypted the data. I'm sure the NSA aren't stupid enough to tell us that, but governments are large things and, even despite apparently good security practices, breaches happen either deliberately or accidentally.

From a game-theoretic point of view, the best I can come up with at the moment is that any release of information by the NSA will reveal subtle hints about what they can and cannot do with regard to their knowledge of encryption algorithms.

Shares in popcorn and in publishers of books on game theory just went through the roof. Ultimately it might just be that physics, mathematics and plain old information theory have the last laugh...

Wednesday 21 August 2013

DNT lives!

The W3C's DNT proposal continues to march on. Thanks to an anonymous colleague for the link; his DNT value is undefined, so I assume it to be "1".

From a technical viewpoint, DNT is quite simple: adding a line to the HTTP headers is an instruction to any downstream carrier or user of that information that you do not wish to be tracked. The definition of what it means to be tracked, and how this is enforced, is of course somewhat hard to ascertain, but:

DNT: 0              // you can track me
DNT: 1              // don't track me
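
For example, a client simply sends the header with every request - a minimal sketch using Python's standard library, with a placeholder URL:

import urllib.request

# Attach the DNT header to an ordinary GET request; "1" means "do not track me".
req = urllib.request.Request("http://example.com/", headers={"DNT": "1"})
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.headers.get("Content-Type"))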

This got me wondering, given recent revelations, maybe we should add extra parameters or extend DNT to take into account other organisations who might be viewing our data?

I propose a 2-bit value:

DNT: 00     // please track me and supply me with obscure, irrelevant advertisements etc.
DNT: 01     // DNT as currently defined by W3C (meaning advertisers)
DNT: 10     // NSA, GCHQ etc are not allowed to track me, but advertisers can
DNT: 11     // no-one, not even government agencies or advertisers may track me.
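
And, purely as a sketch of how an unusually cooperative server might interpret my hypothetical 2-bit value (this is my own extension, not anything in the W3C specification):

def parse_dnt2(value):
    # First bit: government agencies; second bit: advertisers. 0 = may track, 1 = may not.
    if value not in {"00", "01", "10", "11"}:
        raise ValueError("not a 2-bit DNT value: %r" % value)
    return {
        "agencies_may_track": value[0] == "0",
        "advertisers_may_track": value[1] == "0",
    }

print(parse_dnt2("10"))   # {'agencies_may_track': False, 'advertisers_may_track': True}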


Saturday 10 August 2013

Flying planes, Surgery and Privacy - the presentation

Here's a version of a talk I gave with a colleague on our experiences in using checklists in our daily work auditing systems for privacy compliance. To me the surgical checklist is probably the most apt for the types of work we perform during a privacy audit. Then privacy steps beyond the bounds of just "privacy" to become more of an information safety issue:

[Embedded slides: Flying Planes, Surgery and Privacy]

Thursday 8 August 2013

Facebook Postings and Hair Pins

Two things I noticed today: firstly, a posting appeared on my Facebook wall linking to an article from the Finnish newspaper Iltalehti about 'hair pins' and their correct usage - more about that later - and of course people posted replies. However, I was wondering how many noticed the small grey text underneath:

[Screenshot: the small grey text noting that replies may be visible to the article's entire readership]

What might be an innocent (or not) reply could potentially go not just to your friends, and their friends, but also to the entire readership of Iltalehti.

We can get into an interesting argument about whose fault this is and whether the person posting should have read that comment about the potential readership; however, discussing fault and apportioning blame isn't too helpful, as we've discussed earlier in an article about understanding accidents.

None of the three parties involved here - Facebook, Iltalehti and the writer of the comment - is entirely to blame, nor entirely innocent; but, just like the pilot who let his airspeed decay and crashed the aircraft, we have an overall education and situational awareness problem that is going to be very difficult to change.

I'm not even advocating not posting to Facebook, but just be more aware of where comments are going to end up in addition to what you think your Facebook privacy settings are.

Note for non-Finnish speakers: apparently hair pins are being used the wrong way round...you can put the above article's URL into Bing or Google to translate.

Aside: the first time I used Bing Translate it worked perfectly...though it did translate Finnish to...Finnish...

Secondly, regarding the above article on hair pins...it reminded me of a quote from Douglas Adams' Hitchhikers Guide to the Galaxy:

The sign said: Hold stick near centre of its length. Moisten pointed end in mouth. Insert in tooth space, blunt end next to gum. Use gentle in-out motion. 
"It seemed to me," said Wonko the Sane, "that any civilization that had so far lost its head as to need to include a set of detailed instructions for use in a packet of toothpicks, was no longer a civilization in which I could live and stay sane."

Monday 5 August 2013

Understanding Software Engineering Accidents

The cause of 99.999...% of accidents is easy to ascertain: quite simply, it is the pilot's/driver's fault. In the cases of two recent aviation and railway accidents (Asiana 214 and the train crash in Galicia) these were caused by the pilot and driver respectively....?

Finding the causes of an accident doesn't actually involve finding who is to blame, but rather understanding the whole context of what led up to the point where the accident was inevitable - and then, even after that, exploring what actually occurred during and after the accident.

The reasoning here is that if we concentrate on who to blame then we will miss all the circumstances that allowed the accident to occur in the first place. For example, in the case of the Galician train crash it has been ascertained that the driver was speeding, on the phone and had a history of breaking the rules. However, this misses the more subtle questions: why did the train derail, why could the driver break the speed limit, what was the reason for the phone call, why did the carriages fail catastrophically after the derailment, did the safety systems work sufficiently, what state was the signalling in, did the driver override systems, was the driver sufficiently trained, etc. etc. etc.

In other words, the whole context of the systems and organisation needs to be taken into account before apportioning final blame; and even then very, very rarely is it a single point of failure: the Swiss Cheese Model.

If we apply this model to software engineering and specifically accidents such as hacking and data breaches we become very aware of how many holes in our computing Swiss cheese we have.

Take a simple data breach where "hackers" have accessed a database on some server via an SQL injection through some web pages. If we apply our earlier thinking, it is obviously the fault of the 'stupid' system administrators who can't secure a system and the 'stupid' software engineers who can't write good code. Hindsight is great here, isn't it?

To answer 'who to blame?' or better still 'why did things go wrong and how can we prevent this in the future?' we need to put ourselves, as other accident investigators do, in the position of those software engineers, system administrators, architects, hackers and managers AT THE POINT IN TIME WHERE THEY ACTED, and NEVER in hindsight.

Why did the trained, intelligent software engineer write code susceptible to SQLi in the first place?
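
To see what is actually at stake, here's a minimal sketch (illustrative table, column and payload values only) of the kind of code that slips through under pressure, next to the parameterised query that should have been written instead:

import sqlite3   # stand-in for whatever database the breached system actually used

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

user_input = "' OR '1'='1"   # a classic injection payload

# Vulnerable: user input concatenated straight into the SQL string.
query = "SELECT * FROM users WHERE name = '" + user_input + "'"
print(conn.execute(query).fetchall())   # returns every row in the table

# Safe: the same query with a bound parameter; the payload is treated as plain data.
print(conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall())   # returns []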

Maybe they were under time pressure; maybe there was no proper, formal design documentation and no coding standards; maybe they were never trained to spot those kinds of errors? Actually, just stating that we have a failure of discipline is already pointing to wholesale failures across the whole organisation rather than just one individual.

Even if, in the above case, it was malice by the programmer, why didn't the testing pick it up? Why was the code released without testing or checking? Now we have a failure elsewhere too.

Were there no procedures for this? Was the code signed-off by management with the risk in place? Now the net widens further across the organisation

and so on...

In aviation there is a term that explains most accidents: loss of situational awareness, which, when explored, invariably ends up revealing a multitude of 'wrong' choices made over a longer period of time rather than just in those few critical minutes or hours in the cockpit.

I'm of the opinion that in software engineering we almost always operate in a mode where we have little or no situational awareness. Our systems are complex and we lack formal models that clearly and concisely explain how they work; indeed, one of the maxims of some practitioners of agile methods actively eschews formality and modelling tools. Coupled with tight deadlines, a code-is-king mentality and rapidly and inconsistently changing requirements, we have a fantastic recipe for disaster.

Bringing this back to an aviation analogy again, consider Turkish Airlines Flight 1951, which crashed at Schiphol in 2009. It was initially easy to blame the pilots for allowing the plane to stall on final approach, but the full accident investigation revealed deficiencies in the training, the approach procedures at Schiphol, a non-fault-tolerant autothrottle and radar altimeter combination, a massively high-workload situation for the pilots and, ultimately, a fault which manifested itself in precisely the behaviour that the pilots were requiring and expecting on their approach - that is, the aircraft losing speed.

As an exercise, how does the above accident map to what we experience every day in software engineering? Given high workloads, changing requirements, inconsistent planning and deadlines to get something (anything!) out that sort of works, we start getting answers as to why intelligent administrators and programmers make mistakes.